Monday, March 31, 2008

Learn more on Network backup: server components (Job scheduler, Error handling, Metadata database, Media manager)

 SERVER COMPONENTS

Back-up servers consist of a whole range of component parts. In the following we will discuss the main components:

•  Job scheduler

•  Error handling

•  Metadata database

•  Media manager

 Job scheduler

The job scheduler determines what data will be backed up when. It must be carefully configured; the actual back-up then takes place automatically. With the aid of job schedulers and tape libraries many computers can be backed up overnight without the need for a system administrator to change tapes on site. Small tape libraries have a tape drive, a magazine with space for around ten tapes and a media changer that can automatically move the various tapes back and forth between magazine and tape drive. Large tape libraries have several dozen tape drives, space for several thousands of tapes and a media changer or two to insert the tapes in the drives.
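To make the idea concrete, here is a minimal sketch in Python of how a job scheduler might decide which back-up jobs are due in the current window. The client names, windows and priorities are invented for illustration; no particular back-up product works exactly this way.

from datetime import datetime, time

# Illustrative schedule: which client is backed up in which window.
# The client names, windows and priorities are invented for this sketch.
SCHEDULE = [
    {"client": "fileserver1", "window": (time(22, 0), time(2, 0)), "priority": 1},
    {"client": "mailserver", "window": (time(23, 0), time(4, 0)), "priority": 2},
    {"client": "db-server", "window": (time(1, 0), time(5, 0)), "priority": 1},
]

def in_window(now, start, end):
    """True if 'now' falls within a window that may wrap past midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end

def due_jobs(now):
    """Return the jobs whose window is open, highest priority first."""
    open_jobs = [j for j in SCHEDULE if in_window(now, *j["window"])]
    return sorted(open_jobs, key=lambda j: j["priority"])

# The scheduler loop would hand each due job a free tape drive; the media
# changer mounts the tapes, so no administrator is needed on site.
for job in due_jobs(datetime.now().time()):
    print("starting back-up of", job["client"])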

 Error handling

If a regular automatic back-up of several systems has to be performed, it becomes difficult to monitor whether all automated back-ups have run without errors. The error handler helps to prioritize and filter error messages and generate reports. This avoids the situation in which problems in the back-up are not noticed until a back-up needs to be restored.
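The filtering and prioritizing could look roughly like the following sketch; the severity levels and the message format are assumptions for the example, not the interface of any real product.

# A minimal sketch of the error handler's filtering and reporting.
SEVERITY = {"info": 0, "warning": 1, "error": 2, "fatal": 3}

messages = [
    {"client": "pc12", "level": "info", "text": "back-up completed"},
    {"client": "fileserver1", "level": "error", "text": "tape drive offline"},
    {"client": "pc7", "level": "warning", "text": "3 files skipped"},
]

def nightly_report(messages, min_level="warning"):
    """Keep only messages at or above min_level, worst first, so that
    failed back-ups are noticed before a restore is needed."""
    kept = [m for m in messages if SEVERITY[m["level"]] >= SEVERITY[min_level]]
    return sorted(kept, key=lambda m: SEVERITY[m["level"]], reverse=True)

for m in nightly_report(messages):
    print(f"{m['level'].upper():8} {m['client']}: {m['text']}")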

 Metadata database

The metadata database and the media manager represent two components that tend to be hidden. The metadata database is the brain of a network back-up system. It contains the following entries for every backed-up object: name, computer of origin, date of last change, date of last back-up, name of the back-up medium, and so on. For example, an entry is made in the metadata database for every file to be backed up. The cost of the metadata database is worthwhile: in contrast to the back-up tools provided by operating systems, network back-up systems permit the implementation of the incremental-forever strategy, in which a file system is fully backed up only in the first back-up. In subsequent back-ups, only those files that have changed since the previous back-up are backed up. The current state of the file system can then be calculated on the back-up server, by means of database operations, from the original full back-up and all subsequent incremental back-ups, so that no further full back-ups are necessary. These calculations in the metadata database are generally performed faster than a new full back-up. Even more is possible: if several versions of the files are backed up on the back-up server, a whole file system or a subdirectory as it looked three days ago, for example, can be restored (point-in-time restore) – the metadata database makes it possible.
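The following toy Python model illustrates how such a metadata database supports the incremental-forever strategy and point-in-time restores. The field names and entries are invented for this sketch; real products record far more metadata per object.

from dataclasses import dataclass
from datetime import date

@dataclass
class Entry:
    name: str            # file name
    host: str            # computer of origin
    backed_up: date      # date of this back-up version
    tape: str            # back-up medium holding this version
    deleted: bool = False  # tombstone written when the file disappeared

DB = [
    Entry("/home/a.txt", "pc1", date(2008, 3, 1), "TAPE01"),
    Entry("/home/a.txt", "pc1", date(2008, 3, 20), "TAPE07"),
    Entry("/home/b.txt", "pc1", date(2008, 3, 1), "TAPE01"),
    Entry("/home/b.txt", "pc1", date(2008, 3, 25), "TAPE08", deleted=True),
]

def point_in_time(host, as_of):
    """State of the file system on 'host' as it looked on 'as_of': for
    every file, the newest version backed up on or before that date."""
    state = {}
    for e in sorted(DB, key=lambda e: e.backed_up):
        if e.host == host and e.backed_up <= as_of:
            if e.deleted:
                state.pop(e.name, None)
            else:
                state[e.name] = e.tape
    return state

# Restore the view from three days after the first full back-up:
print(point_in_time("pc1", date(2008, 3, 4)))
# -> {'/home/a.txt': 'TAPE01', '/home/b.txt': 'TAPE01'}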

 Media manager

Use of the incremental-forever strategy can considerably reduce the time taken by the back-up in comparison to a full back-up. The disadvantage is that over time the backed-up files can become distributed over numerous tapes. This is critical for the restoring of large file systems, because tape mounts cost time. This is where the media manager comes into play. It can ensure that only files from a single computer are located on one tape. This reduces the number of tape mounts involved in a restore process, which means that the data can be restored more quickly.

A further important function of the media manager is so-called tape reclamation. As a result of the incremental-forever strategy, more and more data that is no longer needed accumulates on the back-up tapes. If, for example, a file is deleted or changed very frequently over time, earlier versions of the file can be deleted from the back-up medium. The gaps that thus become free on the tapes cannot be directly overwritten using current techniques. In tape reclamation, the media manager copies the remaining data that is still required from several tapes, of which only a certain percentage is used, onto a common new tape. The tapes that have thus become free are then added to the pool of unused tapes.

There is one further technical limitation in the handling of tapes: current tape drives can only write data to the tapes at a certain speed. If the data is transferred to the tape drive too slowly, the drive interrupts the write process, rewinds the tape a little and restarts the write process. The repeated rewinding of the tapes costs performance and causes unnecessary wear, so the tapes have to be discarded more quickly. It is therefore better to send the data to the tape drive quickly enough that it can write the data onto the tape in one go (streaming).

The problem with this is that in network back-up the back-up clients send the data to be backed up via the LAN to the back-up server, which forwards the data to the tape drive. On the way from back-up client via the LAN to the back-up server there are repeated fluctuations in the transmission rate, which means that the streaming of tape drives is repeatedly interrupted. Although it is possible for individual clients to achieve streaming by additional measures, such as the installation of a separate LAN between back-up client and back-up server (Section 7.7), these measures are expensive and not technically scalable at will, so they cannot be realized economically for all clients.

The solution: the media manager manages a storage hierarchy within the back-up server. To achieve this, the back-up server must be equipped with both hard disks and tape libraries. If a client cannot send the data fast enough for streaming, the media manager first stores the data to be backed up on hard disk. When writing to a hard disk it makes no difference at what speed the data is supplied. When enough of the data to be backed up has been temporarily saved to the hard disk of the back-up server, the media manager automatically moves large quantities of data from the hard disk of the back-up server to its tapes. This process only involves recopying the data within the back-up server, so streaming is guaranteed when writing the tapes.

This storage hierarchy is used, for example, for the back-up of user PCs. Many user PCs are switched off overnight, which means that an overnight back-up cannot be guaranteed. Therefore, network back-up systems often use the midday period to back up user PCs. Use of the incremental-forever strategy means that the amount of data to be backed up every day is so low that such a back-up strategy is generally feasible. All user PCs are first backed up to the hard disk of the back-up server in the time window from 11:15 to 13:45. The media manager in the back-up server then has a good twenty hours to move the data from the hard disks to tapes. The hard disks are then free again, so that the user PCs can once more be backed up to hard disk in the next midday break.

In all the operations described here the media manager checks whether the correct tape has been placed in the drive. To this end, the media manager writes an unambiguous signature to every tape, which it records in the metadata database. Every time a tape is inserted, the media manager compares the signature on the tape with the signature in the metadata database. This ensures that no tapes are accidentally overwritten and that the correct data is written back during a restore operation. Furthermore, the media manager monitors how often a tape has been used and how old it is, so that old tapes are discarded in good time. If necessary, it first copies data that is still required to a new tape. Older tape media formats also have to be wound back and forwards now and then so that they last longer; the media manager can automate the winding of tapes that have not been used for a long time.

A further important function of the media manager is the management of data in a so-called off-site store. To this end, the media manager keeps two copies of all data to be backed up. The first copy is always stored on the back-up server, so that data can be quickly restored when it is required. However, in the event of a large-scale disaster (fire in the data centre) the copies on the back-up server could be destroyed. For such cases the media manager keeps a second copy in an off-site store that can be several kilometres away. The media manager supports the system administrator in moving the correct tapes back and forwards between back-up server and off-site store. It even supports tape reclamation for tapes that are currently in the off-site store.
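As an illustration of the tape reclamation described above, the following Python sketch copies the still-needed data from poorly utilized tapes onto one new tape and returns the emptied tapes to the scratch pool. The utilization threshold and the tape contents are assumptions for the example.

RECLAIM_THRESHOLD = 0.5   # reclaim tapes that are less than half used

# tape name -> list of (object name, still needed?) pairs
tapes = {
    "TAPE01": [("a.txt", False), ("b.txt", True), ("c.txt", False)],
    "TAPE02": [("d.txt", True)],
    "TAPE03": [("e.txt", True), ("f.txt", True), ("g.txt", True)],
}

def utilization(contents):
    """Fraction of the objects on a tape that are still needed."""
    return sum(1 for _, needed in contents if needed) / len(contents)

def reclaim(tapes, scratch_pool):
    """Copy still-needed data from poorly utilized tapes onto one new
    tape and return the emptied tapes to the pool of unused tapes."""
    target = scratch_pool.pop()
    tapes[target] = []
    for name in [t for t in list(tapes) if t != target]:
        if utilization(tapes[name]) < RECLAIM_THRESHOLD:
            tapes[target] += [(o, n) for o, n in tapes[name] if n]
            del tapes[name]
            scratch_pool.append(name)   # this tape is free again
    return tapes, scratch_pool

tapes, pool = reclaim(tapes, ["TAPE99"])
print(tapes)   # TAPE01's remaining data now lives on TAPE99
print(pool)    # ['TAPE01'] is back in the pool of unused tapes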

Free tutors on Network Back-up administration: GENERAL CONDITIONS FOR BACK-UP, NETWORK BACK-UP SERVICES

Network back-up systems can back up heterogeneous IT environments incorporating several thousands of computers largely automatically. In their classical form, network back-up systems move the data to be backed up via the LAN; this is where the name 'network back-up' comes from. This chapter explains the basic principles of network back-up and shows typical performance bottlenecks of conventional server-centric IT architectures. Finally, it shows how storage networks and intelligent storage systems help to overcome these performance bottlenecks.

Before getting involved in technical details, we will first discuss a few general conditions that should be taken into account in back-up. Then the back-up, archiving and hierarchical storage management services will be discussed and we will show which components are necessary for their implementation. This is followed by a summary of the measures discussed up to that point that are available to network back-up systems to increase performance. Then, on the basis of network back-up, further technical limits of server-centric IT architectures will be described, and we will explain why these performance bottlenecks can only be overcome to a limited degree within the server-centric IT architecture. Then we will show how data can be backed up significantly more efficiently with a storage-centric IT architecture (Section 7.8). Building upon this, the protection of file servers and databases using storage networks and network back-up systems will be discussed. Finally, organizational aspects of data protection will be considered. The consideration of network back-up concludes our discussion of the use of storage networks.

 GENERAL CONDITIONS FOR BACK-UP

Back-up is always a headache for system administrators. Increasing amounts of data have to be backed up in ever shorter periods of time. Although modern operating systems come with their own back-up tools, these tools only represent isolated solutions, which are completely inadequate in the face of the increasing number and heterogeneity of systems to be backed up. For example, there may be no option for monitoring centrally whether all back-ups have been successfully completed overnight, or there may be a lack of overall management of the back-up media. Changing preconditions represent an additional hindrance to data protection. There are three main reasons for this:

1. As discussed in Chapter 1, installed storage capacity doubles every four to twelve months depending upon the company in question. The data set is thus often growing more quickly than the infrastructure in general (personnel, network capacity). Nevertheless, the ever-increasing quantities of data still have to be backed up.

2. Nowadays, business processes have to be adapted to changing requirements all the time. As business processes change, so the IT systems that support them also have to be adapted. As a result, the daily back-up routine must be continuously adapted to the ever-changing IT infrastructure.

3. As a result of globalization, the Internet and e-business, more and more data has to be available around the clock: it is no longer feasible to block user access to applications and data for hours whilst data is backed up. The time window for back-ups is becoming ever smaller. Network back-up can help us to get to grips with these problems.

 NETWORK BACK-UP SERVICES

Network back-up systems such as Arcserve (Computer Associates), NetBackup (Veritas), Networker (EMC/Legato) and Tivoli Storage Manager (IBM) provide the following services:

• back-up

• archive

• hierarchical storage management.

The main task of network back-up systems is to back data up regularly. To this end, at least one up-to-date copy must be kept of all data, so that it can be restored after a hardware or application error ('file accidentally deleted or destroyed by editing', 'error in the database programming').

The purpose of archiving is to freeze a certain version of the data so that this precise version can be restored later on. For example, after the conclusion of a project its data can be archived on the back-up server and then deleted from the local hard disk. This saves local disk space and accelerates back-up and restore processes, since only the data that is actually being worked with has to be backed up or restored.

Hierarchical storage management (HSM), finally, leads the end user to believe that any desired size of hard disk is present. HSM moves files that have not been accessed for a long time from the local disk to the back-up server; only a directory entry remains in the local file system. The entry in the directory contains meta information such as file name, owner, access rights, date of last modification and so on. The metadata takes up hardly any space in the file system compared to the actual file contents, so space is gained by moving the file content from the local disk to the back-up server. If a process accesses the content of a file that has been moved in this way, HSM blocks the accessing process, copies the file content back from the back-up server to the local file system and only then gives clearance to the accessing process. Apart from the longer access time, this process remains completely hidden to the accessing processes and thus also to end users. Older files can thus be automatically moved to cheaper media (tapes) and, if necessary, fetched back again without the end user having to alter his behaviour.

Strictly speaking, HSM on the one hand and back-up and archive on the other are separate concepts. However, HSM is a component of many network back-up products, so the same components (media, software) can be used for back-up, archive and HSM. When HSM is used, the back-up software must at least be HSM-capable: it must back up the metadata of the moved files and the moved files themselves without moving the file contents back to the client. HSM-capable back-up software can speed up back-up and restore processes because only the meta-information of the moved files has to be backed up and restored, not their file contents.

A network back-up system realizes the above-mentioned functions of back-up, archive and hierarchical storage management by the co-ordination of a back-up server and a range of back-up clients. The server provides central components, such as the management of back-up media, that are required by all back-up clients. However, different back-up clients are used for different operating systems and applications. These are specialized for the individual operating systems or applications in order to increase the efficiency of data protection or the efficiency of the movement of data.

The use of terminology regarding network back-up systems is somewhat sloppy: the main task of network back-up systems is the back-up of data. Server and client instances of network back-up systems are therefore often known as the back-up server and back-up client, regardless of what tasks they perform or what they are used for. A particular server instance of a network back-up system could, for example, be used exclusively for HSM, so that this instance should actually be called an HSM server; nevertheless this instance would generally be called a back-up server. A client that provides the back-up function usually also supports archive and the restore of back-ups and archives; nevertheless this client is generally just known as a back-up client. In this book we follow the general, untidy conventions, because the phrase 'back-up client' reads better than 'back-up-archive-HSM-and-restore client'.

The two following sections discuss details of the back-up server and the back-up client. We then turn our attention to the performance and the use of network back-up systems.
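The migrate-and-recall mechanism of HSM described above can be mimicked with a toy Python sketch. A real HSM client hooks into the file system and recalls file contents transparently; here the recall is triggered explicitly in a read function, and the paths and the stub format are invented for the example.

import os, shutil

BACKUP_STORE = "/srv/backup_store"
STUB_MARKER = b"HSM-STUB\n"

def migrate(path):
    """Move the file content to the back-up store and leave a stub;
    the directory entry and the file name remain in place."""
    os.makedirs(BACKUP_STORE, exist_ok=True)
    shutil.copy2(path, os.path.join(BACKUP_STORE, os.path.basename(path)))
    with open(path, "wb") as f:
        f.write(STUB_MARKER)

def read(path):
    """Transparent access: recall the content first if only a stub is left."""
    with open(path, "rb") as f:
        data = f.read()
    if data == STUB_MARKER:   # blocked access: recall from the store, retry
        shutil.copy2(os.path.join(BACKUP_STORE, os.path.basename(path)), path)
        with open(path, "rb") as f:
            data = f.read()
    return data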

Free Tutors on APPLICATION OF STORAGE NETWORKS in Web applications based upon the ‘travel portal’ case study

This section uses the 'travel portal' case study to demonstrate the implementation of a so-called web application. The case study is transferable to web applications for the support of business processes. It thus shows the possibilities opened up by the Internet and highlights the potential and the change that stand before us with the transformation to e-business. Furthermore, the example demonstrates once again how storage networks, server clusters and the five-tier architecture can fulfil requirements such as the fault-tolerance, adaptability and scalability of IT systems.

Figure: the three-tier architecture – tasks of the three tiers

Representation
• Graphical user interface and interfaces to external systems
• Converts user interactions into application calls
• Represents the returns of the applications

Applications
• Carries out operations initiated by users in the representation tier
• Reads, processes and deletes data that the data tier stores

Data
• Converts block-oriented storage into tables (databases) or file systems
• Stores application data permanently on block-oriented storage devices

Figure 6.36 shows the realization of the travel portal in the form of a web application. Web application means that users can use the information and services of the travel portal from various end devices such as PC, PDA and mobile phone if these are connected to the Internet. The travel portal initially supports only editorially prepared content (including film reports, travel catalogues, transport timetable information) and content added by the users themselves (travel tips and discussion forums), which can be called up via conventional web browsers. Figure 6.37 shows the expansion of the travel portal by further end devices such as mobile phones and PDAs and by further services.

To use the travel portal, users first of all build up a connection to the representation server by entering the URL. Depending upon its type, the end device connects to a web server (HTTP server) or, for example, to a WAP server. The end user perceives the web server as being a single web server. In fact, a cluster of representation servers is working in the background. The load balancer of the representation servers accepts the request to build up a connection and passes it on to the computer with the lowest load. Once a connection has been built up, the web browser transfers the user identifier, for example in the form of a cookie or the mobile number, and the properties of the end device (for example, screen resolution). The web server calls up the user profile from the user management. Using this information the web server dynamically generates websites (HTML, WML or iMode) that are optimally oriented towards the requirements of the user. Thus the representation of content can be adjusted to suit the end device in use at the time. Likewise, content, adverts and information can be matched to the preferences of the user; one person may be interested in the category of city tips for good restaurants, whilst another is interested in museums.

The expansion of the travel portal to include the new 'hotel tips' application takes place by the linking of the existing 'city maps' and 'hotel directory' databases (Figure 6.37). The application could limit the selection of hotels by a preferred price category stored in the user profile or by the current co-ordinates of the user transmitted by a mobile end device equipped with GPS. Likewise, the five-tier architecture facilitates the support of new end devices without the underlying applications having to be modified. For example, in addition to the conventional web browsers and WAP phones shown in Figure 6.37, you could also implement mobile PDAs (low-resolution end devices) and a pure voice interface for car drivers.
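The device- and profile-dependent page generation could be pictured as in the following sketch; the device classes, templates and profile fields are invented for illustration, and a real representation server would of course use full template engines.

TEMPLATES = {
    "browser": "<html><body><h1>{title}</h1><p>{body}</p></body></html>",  # HTML
    "wap": "<wml><card title='{title}'><p>{body}</p></card></wml>",        # WML
}

def render(device, profile, results):
    """Generate a page matched to the end device and the user's interests."""
    items = results.get(profile["interest"], [])
    return TEMPLATES[device].format(title="Travel portal", body=" ".join(items))

profile = {"interest": "restaurants"}       # fetched from the user management
results = {"restaurants": ["City tip: trattoria at the harbour"],
           "museums": ["Special exhibition this week"]}
print(render("wap", profile, results))      # WML for the mobile phone
print(render("browser", profile, results))  # HTML for the PC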

Representation client (web browser)
• Represents the graphical user interface generated by the representation tier
• Passes on user interactions to the representation servers

Representation server (web server)
• Generates the graphical user interface and interfaces to external systems
• Converts user interactions into application calls

Applications
• Carries out operations initiated by users in the representation tier
• Reads, processes and deletes data that the data tier stores

Data
• Converts block-oriented storage into tables (databases) or file systems

Storage
• Stores application data permanently on block-oriented storage devices

Figure: in the five-tier architecture the representation layer is split up into representation server and representation client, and the data layer is split into data management and storage devices.

All server machines are connected together via a fast network. Today primarily Gigabit Ethernet is used; in future InfiniBand will presumably also be used. With the aid of appropriate cluster software, applications can be moved from one computer to another. Further computers can be added to the cluster if the overall performance of the cluster is not sufficient. Storage networks bring with them the flexibility needed to provide the travel portal with the necessary storage capacity. The individual servers impose different requirements on the storage network:

• Databases

The databases require storage space that meets the highest performance requirements. To simplify the administration of the databases, the data should not be stored directly upon raw devices; instead it should be stored within a file system, in files that have been specially formatted for the database. Nowadays (2003), only disk subsystems connected via Fibre Channel are considered as storage devices; NAS servers cannot yet be used in this situation due to the lack of standardized high-speed network file systems such as RDMA-enabled NFS. In future, it will be possible to use storage virtualization on file level here.

• Representation server and media servers

The representation servers augment the user interfaces with photos and small films. These are stored on separate media servers that the end user's web browser can access directly over the Internet. As a result, the media do not need to travel through the internal buses of the representation servers, thus freeing these up. Since the end users access the media over the Internet via comparatively slow connections, NAS servers are very suitable. Depending upon the load upon the media servers, shared-nothing or shared-everything NAS servers can be used. Storage virtualization on file level again offers itself as an alternative here.

• Replication of the media servers

The users of the travel portal access it from various locations around the globe. Therefore, it is a good idea to store pictures and films at various sites around the world so that the large data quantities are supplied to users from a server located near the user (Figure 6.38). This saves network capacity and generally accelerates the transmission of the data. The data on the various cache servers is synchronized by appropriate replication software. Incidentally, the use of the replication software is independent of whether the media servers at the various sites are configured as shared-nothing NAS servers, shared-everything NAS servers, or as a storage virtualization on the file-level.

 

In the first part of the book the building blocks of storage networks were introduced. Building upon these, this chapter has explained the fundamental principles of the use of storage networks and shown how storage networks help to increase the availability and the adaptability of IT systems.

As an introduction to the use of storage networks, we elaborated upon the characteristics of storage networks by illustrating the layering of the techniques for storage networks, investigated various forms of storage networks in the I/O path and defined storage networks in relation to data networks and voice networks. Storage resource sharing was introduced as a first application of storage networks; individually, disk storage pooling, tape library partitioning, tape library sharing and data sharing were considered.

To increase the availability of applications and data we described redundant I/O buses and multipathing software, redundant servers and cluster software, redundant disk subsystems with volume manager mirroring or disk subsystem remote mirroring, and finally redundant storage virtualization. Based upon the case study 'protection of an important database' we showed how these measures can be combined to protect against the failure of a data centre.

With regard to adaptability and scalability, the term 'cluster' was expanded to include the property of load distribution. Individually, shared-null configurations, shared-nothing clusters, enhanced shared-nothing clusters and shared-everything clusters were introduced. We then introduced the five-tier architecture, a flexible and scalable architecture for IT systems. Finally, based upon the case study 'travel portal', we showed how clusters and the five-tier architecture can be used to implement flexible and scalable web applications.

As a further important application of storage networks, the next chapter discusses network back-up (Chapter 7). A flexible and adaptable architecture for data protection is introduced and we show how network back-up systems can benefit from the use of disk subsystems and storage networks.

 

Know more about ADAPTABILITY AND SCALABILITY OF IT SYSTEMS and Clustering for load distribution, Web architecture

A further requirement of IT systems is that of adaptability and scalability: successful companies have to adapt their business processes to new market conditions in ever shorter cycles. Along the same lines, IT systems must be adapted to new business processes so that they can provide optimal support for these processes. Storage networks are also required to be scalable: on average the storage capacity required by a company doubles in the course of each year. This means that anyone who has 1 terabyte of data to manage today will have 32 terabytes in five years' time; a company with only 250 gigabytes today will reach 32 terabytes in seven years' time. This section discusses the adaptability and scalability of IT systems on the basis of clusters for load distribution (Section 6.4.1), the five-tier architecture for web application servers (Section 6.4.2) and the case study 'structure of a travel portal' (Section 6.4.3).
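The arithmetic behind these figures is simple doubling, as this short calculation shows (assuming the capacity doubles exactly once per year):

print(1.0 * 2**5)           # 1 terabyte today -> 32.0 terabytes in 5 years
print(250.0 * 2**7 / 1000)  # 250 gigabytes today -> 32.0 terabytes in 7 years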

 Clustering for load distribution

The term 'cluster' is used very frequently in information technology, but it is not clearly defined. The meaning of the term varies greatly depending upon context. As the greatest common denominator we can only state that a cluster is a combination of components or servers that perform a common function in one form or another. This section expands the cluster concept for protection against the failure of a server introduced in Section 6.3.2 to include clustering for load distribution. We discuss three different forms of cluster based upon the example of a file server; the three forms are comparable to the modes of multipathing software.

The starting point is the so-called shared-null configuration (Figure 6.24). The components are not designed with built-in redundancy. If a server fails, the file system is no longer available, even if the data is mirrored on two different disk subsystems and redundant I/O buses are installed between server and disk subsystems (Figure 6.25).

In contrast to the shared-null configuration, shared-nothing clusters protect against the failure of a server. The basic form of the shared-nothing cluster was discussed in Section 6.3.2 in relation to the protection of a file server against the failure of a server. Figure 6.26 once again shows two shared-nothing clusters, each with two servers. Shared-nothing clusters can be differentiated into active/active and active/passive configurations. In the active/active configuration, applications run on both computers; for example, the computers 'server 1' and 'server 2' in Figure 6.26 each export a file system. If one of the two computers fails, the other computer takes over the tasks of the failed computer in addition to its own (Figure 6.27, top). This taking over of the applications of the failed server can lead to performance bottlenecks in active/active configurations. The active/passive configuration can help in this situation. In this approach the application runs only on the primary server; the second computer in the cluster (stand-by server) does nothing in normal operation. It is exclusively there to take over the applications of the primary server if this fails. If the primary server fails, the stand-by server takes over its tasks (Figure 6.27, bottom).

The examples in Figures 6.26 and 6.27 show that shared-nothing clusters with only two servers are relatively inflexible. More flexibility is offered by shared-nothing clusters with more than two servers, so-called enhanced shared-nothing clusters. Current shared-nothing cluster software supports clusters with several dozens of computers. Figures 6.28 and 6.29 show the use of an enhanced shared-nothing cluster for static load distribution: during the daytime, when the system is busy, three different servers each export two file systems (Figure 6.28). At night, access to the data is still needed; however, a single server can manage the load for the six file systems (Figure 6.29). The two other servers are freed up for other tasks in this period (data mining, batch processes, back-up, maintenance). One disadvantage of the enhanced shared-nothing cluster is that it can only react to load peaks very slowly. Appropriate load balancing software can, for example, move the file system '/fs2' to one of the other two servers even during the day if the load on the file system '/fs1' is higher. However, this takes some time, which means that this process is only worthwhile for extended load peaks.

A so-called shared-everything cluster offers more flexibility in comparison to enhanced shared-nothing clusters. For file servers, shared disk file systems are used as local file systems here, so that all servers can access the data efficiently over the storage network. Figure 6.30 shows a file server that is configured as a shared-everything cluster with three servers. The shared disk file system is distributed over several disk subsystems. All three servers export this file system to the clients in the LAN over the same virtual IP address by means of a conventional network file system such as NFS or CIFS. Suitable load balancing software distributes new incoming accesses on the network file system equally amongst all three servers. If the three servers are not powerful enough, a fourth server can simply be linked into the cluster.

The shared-everything cluster also offers advantages in the event of the failure of a single server. For example, the file server in Figure 6.30 is realized in the form of a distributed application. If one server fails, as in Figure 6.31, recovery measures are only necessary for those clients that have just been served by the failed computer. Likewise, recovery measures are necessary for the parts of the shared disk file system and the network file system that have just been managed by the failed computer. None of the other clients of the file server notice the failure of a computer, apart from a possible reduction in performance.

Despite their advantages, shared-everything clusters are very seldom used. The reason for this is quite simply that this form of cluster is the most difficult to realize, so most cluster products and applications only support the more simply realized variants of shared-nothing or enhanced shared-nothing.

 Web architecture

In the 1990s the so-called three-tier architecture established itself as a flexible architecture for IT systems (Figure 6.32). The three-tier architecture isolates the tasks of data management, applications and representation into three separate layers. Figure 6.33 shows a possible implementation of the three-tier architecture. Individually the three layers have the following tasks:

• Data
Information in the form of data forms the basis for the three-tier architecture. Databases and file systems store the data of the applications on block-oriented disks or disk subsystems. In addition, the data layer can provide interfaces to external systems and legacy applications.

• Applications
Applications generate and process data. Several applications can work on the same databases or file systems. Depending upon changes to the business processes, existing applications are modified and new applications added. The separation of applications and databases makes it possible that no changes, or only minimal changes, have to be made to the underlying databases or file systems in the event of changes to applications.

• Representation
The representation layer provides the user interface for the end user. In the 1990s the user interface was normally realized in the form of the graphical interface on a PC.

The corresponding function calls of the application are integrated into the graphical interface so that the application can be controlled from there.

Currently, the two outer layers are being broken down into sublayers, so that the three-tier architecture is further developed into a five-tier architecture (Figures 6.34 and 6.35):

• Splitting of the representation layer
In recent years the representation layer has been split up by the World Wide Web into web servers and web browsers. The web servers provide statically or dynamically generated websites that are represented in the browsers. Websites with a functional scope comparable to that of conventional user interfaces can currently be generated using Java and various script languages. The arrival of mobile end devices such as mobile phones and PDAs has meant that web servers have had to modify websites heavily to bring them into line with the properties of the end devices. In future there will be user interfaces that are exclusively controlled by means of the spoken word, for example navigation systems for use in the car that are connected to the Internet for requesting up-to-date traffic data.

• Splitting of the data layer
In the 1990s, storage devices for data were closely coupled to the data servers (server-centric IT architecture). In the previous chapters storage networks were discussed in detail, so at this point of the book it should be no surprise to learn that the data layer is split into the organization of the data (databases, file servers) and the storage space for data (disk subsystems).

Free storage tutors: failure of virtualization in the storage network and a case study on the failure of a data centre

 Failure of virtualization in the storage network

Virtualization in the storage network is currently (2003) treated as the solution for the consolidation of storage resources in large storage networks, with which the storage resources of several disk subsystems can be centrally managed (Chapter 5). However, it is necessary to be clear about the fact that precisely such a central virtualization instance represents a single point of failure: even if the virtualization instance is protected against the failure of a single component by measures such as clustering, the data of an entire data centre can be lost as a result of configuration errors or software errors in the virtualization instance, since the storage virtualization spans all the storage resources of a data centre. The same considerations therefore apply to the protection of a virtualization instance positioned in the storage network (Section 5.7, 'Symmetric and Asymmetric Storage Virtualization in the Network') against failure as to the measures to protect against the failure of a disk subsystem discussed in the previous section. The mirroring of important data from the server via two virtualization instances should therefore also be considered in the case of virtualization in the storage network.

 Failure of a data centre based upon the case study 'protection of an important database'

The measures of server clustering, redundant I/O buses and disk subsystem mirroring (volume manager mirroring or remote mirroring) discussed above protect against the failure of a component within a data centre. However, these measures are useless in the event of the failure of a complete data centre (fire, water damage). To protect against the failure of a data centre it is necessary to duplicate, in a back-up data centre, the infrastructure necessary for the operation of the most important applications.

Figure 6.22 shows the interaction between the primary data centre and the back-up data centre based upon the case study 'protection of an important database'. In the case study, all the measures discussed in this section for protection against the failure of a component are used. In the primary data centre all components are designed with built-in redundancy. The primary server is connected via two independent Fibre Channel SANs (dual SAN) to two disk subsystems, on which the data of the database lies. Dual SANs have the advantage that even in the event of a serious fault in one SAN (for example a defective switch that floods the SAN with corrupt frames), the connection via the other SAN remains intact. The redundant paths between servers and storage devices are managed by appropriate multipathing software. Each disk subsystem is configured using a RAID procedure so that the failure of individual physical disks within the disk subsystem in question can be rectified. In addition, the data is mirrored in the volume manager so that the system can withstand the failure of a disk subsystem. The two disk subsystems are located at a distance from one another in the primary data centre; they are separated from one another by a fire protection wall.

Like the disk subsystems, the two servers are spatially separated by a fire protection wall. In normal operation the database runs on one server; in the meantime the second server is used for other, less important tasks. If the primary server fails, the cluster software automatically starts the database on the second computer. It also terminates all other activities on the second computer, thus making all its resources fully available to the main application.

Remote mirroring takes place via an IP connection. Mirroring utilizes knowledge of the data structure of the database: in a similar manner to journaling in file systems (Section 4.1.2), databases write each change into a log file before then integrating it into the actual data set. In the example, only the log files are mirrored to the back-up data centre. The complete data set was transferred to the back-up data centre only once, at the start of mirroring; thereafter this data set is only ever updated with the aid of the log files. This has two advantages. First, the powerful network connection between the primary data centre and the remote back-up data centre is very expensive, and the necessary data rate for this connection can be halved by transferring only the changes to the log file. This cuts costs.

Second, in the back-up data centre the log files are integrated into the data set after a delay of two hours. As a result, a copy of the data set that is two hours old is always available in the back-up data centre. This additionally protects against application errors: if a table space is accidentally deleted in the database, then the user has two hours to notice the error and interrupt the copying of the changes in the back-up data centre.

A second server and a second disk subsystem are also operated in the back-up data centre; in normal operation they can be used as a test system or for other, less time-critical tasks such as data mining. If the operation of the database is moved to the back-up data centre, these activities are suspended (Figure 6.23). The second server is configured as a stand-by server for the first server in the cluster; the data of the first disk subsystem is mirrored to the second disk subsystem via the volume manager. Thus a completely redundant system is available in the back-up data centre.

The realization of the case study discussed here is possible with current technology. However, it comes at a price; for most applications this cost will certainly not be justified. The main point of the case study is to highlight the possibilities of storage networks. In practice you have to decide how much failure protection is necessary and how much this may cost. At the end of the day, protection against the loss of data or the temporary non-availability of applications must cost less than the data loss or the temporary non-availability of applications itself.
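The delayed integration of the log files could be sketched as follows; the record format and the handling of the two-hour delay are assumptions for illustration, not the interface of any particular database product.

from collections import deque

DELAY_SECONDS = 2 * 60 * 60

incoming = deque()   # log records received from the primary data centre
dataset = {}         # the (deliberately out-of-date) copy of the data set

def receive(record):
    """Called whenever the primary data centre ships a log record."""
    incoming.append(record)

def apply_due(now):
    """Integrate every record that is at least two hours old. If an
    operator error is noticed in time, apply_due is simply no longer
    called and the copy still reflects the state before the error."""
    while incoming and now - incoming[0]["time"] >= DELAY_SECONDS:
        rec = incoming.popleft()
        if rec["op"] == "put":
            dataset[rec["key"]] = rec["value"]
        elif rec["op"] == "delete":
            dataset.pop(rec["key"], None)

receive({"time": 0.0, "op": "put", "key": "table17", "value": "..."})
apply_due(now=7200.0)   # two hours later the record is integrated
print(dataset)          # {'table17': '...'}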

Learn how to avoid the failure of an I/O bus, protect against the failure of an entire server and handle the failure of a disk subsystem

Protection against the failure of an I/O bus is relatively simple and involves the installation of several I/O buses between server and storage device. Figure 6.11 shows a scenario for SCSI and Figure 6.12 shows one for Fibre Channel. In Figure 6.12 protection against the failure of an I/O bus is achieved by two storage networks that are independent of one another. Such separate storage networks are also known as a 'dual storage network' or 'dual SAN'. The problem here: operating systems manage storage devices via the triple of host bus adapter, SCSI target ID and SCSI LUN. If, for example, there are two connections from a server to a disk subsystem, the operating system recognizes the same disk twice (Figure 6.13). So-called multipathing software recognizes that a storage device can be reached over several paths. Figure 6.14 shows how multipathing software reintegrates the disk found twice in Figure 6.13 to form a single disk again. Multipathing software can act at various points depending upon the product:

• in the volume manager (Figure 6.14, right);

• as an additional virtual device driver between the volume manager and the device driver of the disk subsystem (Figure 6.14, left);

• in the device driver of the disk subsystem;

• in the device driver of the host bus adapter card.

Fibre Channel plans to realize this function in the FC-3 layer. However, this part of the Fibre Channel standard has not yet been realized in real products. We believe it is rather unlikely that these functions will ever actually be realized within the Fibre Channel protocol stack: in networks, the principle of keeping the network protocol as simple as possible and realizing the necessary intelligence in the end devices has prevailed in the past.

The multipathing software currently available on the market differs in the mode in which it uses redundant I/O buses:

• Active/passive mode
In active/passive mode the multipathing software manages all I/O paths between server and storage device, but only one of the I/O paths is used for the actual data traffic. If the active I/O path fails, the multipathing software activates one of the other I/O paths in order to send the data via that one instead.

• Active/active mode
In active/active mode the multipathing software uses all available I/O paths between server and storage device and distributes the load evenly over all available I/O channels. In addition, the multipathing software continuously monitors the availability of the individual I/O paths and activates or deactivates them depending upon their availability. It is obvious that the active/active mode utilizes the underlying hardware better than the active/passive mode, since it combines fault-tolerance with load distribution.
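The difference between the two modes can be pictured with a small sketch. Real multipathing software sits inside the I/O stack, not in application code, and the path names here are invented.

import itertools

class Multipath:
    def __init__(self, paths, mode="active/active"):
        self.paths = {p: True for p in paths}   # path -> still available?
        self.mode = mode
        self._rr = itertools.cycle(paths)

    def path_failed(self, path):
        self.paths[path] = False

    def next_path(self):
        """Pick the path over which the next I/O is sent."""
        available = [p for p, ok in self.paths.items() if ok]
        if not available:
            raise IOError("no path to storage device")
        if self.mode == "active/passive":
            return available[0]          # always the first healthy path
        while True:                      # active/active: round robin over
            p = next(self._rr)           # all healthy paths (load sharing)
            if self.paths[p]:
                return p

mp = Multipath(["hba0->port0", "hba1->port1"])
print(mp.next_path(), mp.next_path())   # alternates over both paths
mp.path_failed("hba0->port0")
print(mp.next_path())                   # traffic continues via hba1->port1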

Failure of a server

Protection against the failure of an entire server is somewhat trickier. The only thing that can help here is to provide a second server that takes over the tasks of the actual application server in the event of its failure. So-called cluster software monitors the state of the two computers and starts the application on the second computer if the first computer fails.

Figure 6.15 shows a cluster for a file server, the disks of which are connected over a Fibre Channel SAN. Both computers have access to the disks, but only one computer actively accesses them. The file system stored on the disks is exported over a network file system such as NFS or CIFS. To this end a virtual IP address is configured for the cluster; clients access the file system via this virtual IP address. If the first computer fails, the cluster software automatically initiates the following steps:

1. Activation of the disks on the stand-by computer.

2. File system check of the local file system stored on the disk subsystem.

3. Mounting of the local file system on the stand-by computer.

4. Transfer of the virtual cluster IP address.

5. Export of the local file system via the virtual cluster IP address.

This process is invisible to the clients of the file server, apart from the fact that they cannot access the network file system for a brief period, so file accesses may possibly have to be repeated (Figure 6.16).

Server clustering and redundant I/O buses are two measures that are completely independent of each other. In practice, as shown in Figure 6.17, the two measures are nevertheless combined. The multipathing software reacts to errors in the I/O buses significantly more quickly than the cluster software, so the extra cost of the redundant I/O buses is usually justified.
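The five failover steps listed above could be scripted roughly as in the following sketch; the functions are placeholders for the operating-system and file-system tools that real cluster software would drive.

# Placeholder actions: real cluster software would call OS tools here;
# these functions only print what would happen.
def activate_disks(disks):          print("activate disks:", disks)
def check_file_system(fs):          print("check file system:", fs)
def mount_file_system(fs):          print("mount file system:", fs)
def take_over_ip(ip):               print("take over virtual IP:", ip)
def export_file_system(fs, ip):     print("export", fs, "via", ip)

def failover_file_server(cluster_ip, disks, fs):
    """The five steps the cluster software runs if the primary fails."""
    activate_disks(disks)               # 1. activate disks on the stand-by
    check_file_system(fs)               # 2. check the local file system
    mount_file_system(fs)               # 3. mount it on the stand-by
    take_over_ip(cluster_ip)            # 4. transfer the virtual cluster IP
    export_file_system(fs, cluster_ip)  # 5. export via the virtual IP

failover_file_server("10.0.0.42", ["/dev/sdb"], "/fs1")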

 Failure of a disk subsystem

In Chapter 2 we discussed how disk subsystems implement a whole range of measures to increase their own fault-tolerance. Nevertheless, disk subsystems can sometimes fail, for example in the event of physical impairments such as fire or water damage, or due to faults that according to the manufacturer should not happen at all. The only thing that helps in the event of faults in the disk subsystem is to mirror the data on two disk subsystems.

Mirroring (RAID 1) is a form of virtualization, for which various realization locations were discussed in Section 5.1. In contrast to classical RAID 1 within the disk subsystem for protection against its failure, here the data is mirrored on two different disk subsystems, which are wherever possible separated by a fire protection wall and connected to two independent electric circuits. From the point of view of reducing the load on the server, the realization of the mirroring by the disk subsystem in the form of remote mirroring is optimal (Figure 6.18, cf. also Section 2.7.2 and Section 5.1). From the point of view of fault-tolerance, however, remote mirroring through the disk subsystem represents a single point of failure: if the data is falsified on the way to the first disk subsystem (controller faults, connection port faults), the copy of the data is also erroneous. Therefore, from the point of view of fault-tolerance, mirroring in the volume manager or in the application itself is optimal (Figure 6.19). In this approach the data is written to two different disk subsystems via two different physical I/O paths.

A further advantage of volume manager mirroring compared to remote mirroring is due to the way the two variants are integrated into the operating system. Volume manager mirroring is a solid component of every good volume manager: the volume manager reacts automatically to the failure and the restarting of a disk subsystem. On the other hand, today's operating systems in the open-system field are not yet good at handling copies of disks created by a disk subsystem. Switching to such a copy generally requires manual intervention. Although, technically, an automated reaction to the failure or the restarting of a disk subsystem is possible, this currently (2003) requires specially written scripts due to the lack of integration in the operating system.

On the other hand, there are some arguments in favour of remote mirroring. In addition to the performance benefits discussed above, we should also mention the fact that remote mirroring is supported over greater distances than volume manager mirroring. As a rule of thumb, volume manager mirroring can be used up to a maximum distance of six to ten kilometres between server and disk subsystem; for greater distances remote mirroring currently has to be used.

Figure 6.20 shows how volume manager mirroring, server clustering and redundant I/O buses can be combined. In this configuration the management of the disks is somewhat more complicated: each server sees each disk made available by the disk subsystem four times, because each host bus adapter finds each disk over two connection ports of the disk subsystem. In addition, the volume manager mirrors the data on two disk subsystems. Figure 6.21 shows how the software in the server brings the disks recognized by the operating system back together again: the file system writes the data to a logical disk provided by the volume manager. The volume manager mirrors the data on two different virtual disks, which are managed by the multipathing software. The multipathing software also manages the four different paths to each of the two disks. It is not visible here whether the disks exported from the disk subsystems are also virtualized within the disk subsystem.

The configuration shown in Figure 6.20 offers good protection against the failure of various components, whilst at the same time providing a high level of availability of data and applications. However, this solution comes at a price. Therefore, in practice, sometimes one and sometimes another protective measure is dispensed with for cost reasons. Often, for example, the following argument is used: 'The data is mirrored within the disk subsystem by RAID and additionally protected by means of network back-up. That should be enough.'
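The layering shown in Figure 6.21 can be pictured as in the following sketch: the volume manager mirrors each write onto two virtual disks, each of which the multipathing software reaches over several paths. All classes are illustrative stand-ins for the real software layers.

class MultipathDisk:
    """One virtual disk reachable over several physical paths."""
    def __init__(self, name, paths):
        self.name, self.paths = name, paths
    def write(self, block, data):
        path = self.paths[0]          # multipathing picks a healthy path
        print(f"write block {block} to {self.name} via {path}")

class MirroredVolume:
    """The volume manager writes every block to both disk subsystems."""
    def __init__(self, plexes):
        self.plexes = plexes
    def write(self, block, data):
        for disk in self.plexes:      # RAID 1 across two disk subsystems
            disk.write(block, data)

volume = MirroredVolume([
    MultipathDisk("subsystemA-lun0", ["hba0->portA1", "hba1->portA2"]),
    MultipathDisk("subsystemB-lun0", ["hba0->portB1", "hba1->portB2"]),
])
volume.write(7, b"data")   # one logical write, two physical copies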

Learn more on Application and Management of Storage Networks: data sharing and availability of data

In contrast to the device sharing (disk storage pooling, tape library partitioning and tape library sharing) discussed earlier, in which several servers share a storage device at block level, data sharing is the shared use of data by several applications. In data sharing we differentiate between data copying and real-time data sharing.

In data copying, as the name suggests, data is copied. This means that several versions of the data are kept, which is fundamentally a bad thing: each copy of the data requires storage space and care must be taken to ensure that the different versions are copied according to the requirements of the applications at the right times. Errors occur in particular in the maintenance of the various versions of the data set, so that subsequent applications repeatedly work with the wrong data. Despite these disadvantages, data copying is used in production environments. The reasons for this can be:

• Generation of test data
Copies of the production data are helpful for the testing of new versions of applications and operating systems and for the testing of new hardware. In Section 1.3 we used the example of a server upgrade to show how test data for the testing of new hardware could be generated using instant copies in the disk subsystem. The important point here was that the applications are briefly interrupted so that the consistency of the copied data is guaranteed. As an alternative to instant copies, the test data could also be generated using snapshots in the file system.

• Data protection (back-up)
The aim of data protection is to keep up-to-date copies of data at various locations as a precaution against the loss of data by hardware or operating errors. Data protection is an important application in the field of storage networks; it is therefore dealt with separately in Chapter 7.

• Data replication
Data replication is the name for the copying of data for access to data on computers that are far apart geographically. The objective of data replication is to accelerate data access and save network capacity. There are many applications that automate the replication of data. Within the World Wide Web, data is replicated at two points: first, every web browser caches data locally in order to accelerate access to pages called up frequently by an individual user. Second, many Internet Service Providers (ISPs) install a so-called proxy server, which caches the contents of web pages that are called up by many users. Other examples of data replication are the mirroring of FTP servers (FTP mirror), replicated file sets in the Andrew File System (AFS) or the Distributed File System (DFS) of the Distributed Computing Environment (DCE), and the replication of mail databases.

• Conversion into more efficient data formats
It is often necessary to convert data into a different data format because certain calculations are cheaper in the new format. In the days before the pocket calculator, logarithms were often used for calculations because, for example, the addition of logarithms yields the same result as multiplication in the original space, only more simply. For the same reasons, in modern IT systems data is converted to different data formats. In data mining, for example, data from various sources is brought together in a database and converted into a data format in which the search for regularities in the data set is simpler.

• Conversion of incompatible data formats
A further reason for the copying of data is the conversion of incompatible data formats. A classic example is when applications originally developed independently of one another are brought together over time.

Real-time data sharing represents an alternative to data copying. In real-time data sharing all applications work on the same data set. Real-time data sharing saves storage space, avoids the cost and errors associated with the management of several data versions, and all applications work on the up-to-date data set. For the reasons mentioned above for data copying, it is particularly important to replace the conversion of incompatible data formats by real-time data sharing.

The logical separation of applications and data is continued in the implementation. In general, applications and data in the form of file systems and databases are installed on different computers. This physical separation aids the adaptability, and thus the maintainability, of overall systems. Figure 6.8 shows several applications that work on the same data set, with applications and data being managed independently of one another. This has the advantage that new applications can be introduced without existing applications having to be changed.

However, in the configuration shown in Figure 6.8 the applications may generate so much load that a single data server becomes a bottleneck and the load has to be divided amongst several data servers. There are two options for resolving this bottleneck without data copying: first, the data set can be partitioned (Figure 6.9) by splitting it over several data servers, as sketched below. If this is not sufficient, then several parallel access paths can be established to the same data set (Figure 6.10). Parallel databases and shared disk file systems, such as the General Parallel File System (GPFS) introduced in Section 4.3.1, provide the functions necessary for this.
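The first option, partitioning the data set over several data servers, could be pictured as in this sketch; the server names and the hash scheme are assumptions for illustration.

import hashlib

DATA_SERVERS = ["dataserver1", "dataserver2", "dataserver3"]

def server_for(key):
    """Deterministically map an object to one partition."""
    digest = hashlib.md5(key.encode()).digest()
    return DATA_SERVERS[digest[0] % len(DATA_SERVERS)]

for f in ["/orders/2003/01.db", "/orders/2003/02.db", "/customers.db"]:
    print(f, "->", server_for(f))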

 AVAILABILITY OF DATA

Nowadays, the availability of data is an important requirement made of IT systems. This section discusses how the availability of data and applications can be maintained in various fault situations. Individually, the following will be discussed: the failure of an I/O bus (Section 6.3.1), the failure of a server (Section 6.3.2), the failure of a disk subsystem (Section 6.3.3) and the failure of a storage virtualization instance placed in the storage network (Section 6.3.4). The case study 'protection of an important database' discusses a scenario in which the protective measures discussed previously are combined in order to protect an application against the failure of an entire data centre.

Know more about Data networks, voice networks and storage networks, Storage sharing, Disk storage pooling, Dynamic tape library sharing


Today we can differentiate between three different types of communication networks: data networks, voice networks and storage networks. Behind the term 'voice network' hides the omnipresent telephone network. Data networks describe the networks developed in the 1990s for the exchange of application data; they are subdivided into LAN, MAN and WAN, depending upon range. Storage networks were defined in the first chapter as networks that are installed in addition to the existing LAN and are primarily used for data exchange between computers and storage devices.

In introductions to storage networks, storage networks are often called SANs and compared with conventional LANs (a term for data networks with low geographic extension). Fibre Channel technology is often drawn upon as a representative for the entire category of storage networks, clearly because Fibre Channel is currently the dominant technology for storage networks. Two reasons lead us to compare LAN and SAN: first, LANs and Fibre Channel SANs currently have approximately the same geographic range. Second, quite apart from capacity bottlenecks, separate networks currently have to be installed for LANs and SANs because the underlying transmission technologies (Ethernet and Fibre Channel) are incompatible.

We believe it is very likely that the three network categories – storage networks, data networks and voice networks – will converge in the future, with TCP/IP, or at least IP, being the transport protocol jointly used by all three network types. We discussed the economic advantages of storage networks over Ethernet in Section 3.5 ('IP Storage'). We see it as an indication of the economic advantages of voice transmission over IP (Voice over IP, VoIP) that more and more reputable network manufacturers are offering VoIP devices.

 STORAGE SHARING

In Part I of the book you heard several times that one advantage of storage networks is that several servers can share storage resources via the storage network. In this context, storage resources mean both storage devices such as disk subsystems and tape libraries and also the data stored upon them. This section discusses various variants of storage device sharing and data sharing based upon the examples of disk storage pooling (Section 6.2.1), dynamic tape library sharing (Section 6.2.2) and data sharing (Section 6.2.3).

 Disk storage pooling

Disk storage pooling describes the possibility that several servers share the capacity of a disk subsystem. In a server-centric IT architecture each server possesses its own storage: Figure 6.4 shows three servers with their own storage. Server 2 needs more storage space, but the free space on servers 1 and 3 cannot be assigned to server 2. Therefore, further storage must be purchased for server 2, even though free storage capacity is available on the other servers.

In a storage-centric IT architecture the storage capacity available in the storage network can be assigned much more flexibly. Figure 6.5 shows the same three servers as Figure 6.4, with the same total storage capacity installed. However, in Figure 6.5 only one storage system is present, which is shared by several servers (disk storage pooling). In this arrangement, server 2 can be assigned additional storage capacity by reconfiguring the disk subsystem, without changes to the hardware or the purchase of a new disk subsystem.

In Section 5.2.2 ('Implementation-related limitations of storage networks') we discussed why storage pooling across several storage devices from various manufacturers is currently (2003) no simple matter; the main reason is the incompatibility of the device drivers for various disk subsystems. In the further course of Chapter 5 we showed how virtualization in the storage network can help to overcome these incompatibilities and, in addition, further increase the efficiency of the storage pooling.
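As a rough illustration of disk storage pooling, the following sketch assigns capacity from one shared pool; the class name, server names and capacities are hypothetical:

    # Minimal sketch of disk storage pooling: several servers draw their
    # storage from one shared pool in the disk subsystem, so free
    # capacity can be reassigned by reconfiguration instead of by buying
    # new hardware. Capacities are in GB.
    class DiskSubsystemPool:
        def __init__(self, capacity_gb: int):
            self.free_gb = capacity_gb
            self.assigned: dict[str, int] = {}

        def assign(self, server: str, gb: int) -> None:
            if gb > self.free_gb:
                raise RuntimeError("pool exhausted - extend the subsystem")
            self.free_gb -= gb
            self.assigned[server] = self.assigned.get(server, 0) + gb

    pool = DiskSubsystemPool(capacity_gb=1000)
    pool.assign("server-1", 300)
    pool.assign("server-3", 200)
    pool.assign("server-2", 400)  # server 2 gets more space without new hardware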

 Dynamic tape library sharing

Tape libraries, like disk subsystems, can be shared among several servers. In tape library sharing we distinguish between static partitioning of the tape library and dynamic tape library sharing. In static partitioning the tape library is broken down into several virtual tape libraries; each server is assigned its own virtual tape library (Figure 6.6). Each tape drive and each tape in the tape library is unambiguously assigned to one virtual tape library; all virtual tape libraries share the media changer that moves the tape cartridges back and forth.
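A minimal sketch of static partitioning might look as follows; the drive and cartridge identifiers are invented for illustration:

    # Minimal sketch of static tape library partitioning: every drive
    # and every cartridge is unambiguously assigned to one virtual
    # library, while all virtual libraries share the physical media
    # changer.
    virtual_libraries = {
        "server-1": {"drives": ["drive-0"], "tapes": ["T001", "T002"]},
        "server-2": {"drives": ["drive-1"], "tapes": ["T003", "T004"]},
    }

    def mount(server: str, tape: str, drive: str) -> None:
        vlib = virtual_libraries[server]
        # Enforce the static partitioning: a server may only use its own
        # drives and tapes, even though the media changer is shared.
        assert drive in vlib["drives"] and tape in vlib["tapes"]
        print(f"media changer: moving {tape} into {drive} for {server}")

    mount("server-1", "T001", "drive-0")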

Free tutors on Application and Management of Storage Networks (DEFINITION OF THE TERM 'STORAGE NETWORK', Networks in the I/O path)

Application of Storage Networks

In the first part of the book we introduced the fundamental building blocks of storage networks such as disk subsystems, file systems, virtualization and transmission techniques. In the second part of the book our objective is to show how these building blocks can be combined in order to fulfil the requirements of IT systems such as flexibility, fault-tolerance and maintainability. As a prelude to the second part, this chapter discusses the fundamental requirements that are imposed independently of a particular application. First of all, Section 6.1 contrasts the characteristics of various kinds of networks in order to bring out what characterizes a storage network. Section 6.2 then introduces various possibilities for device sharing and data sharing among several servers in the storage network. The final part of the chapter deals with a fundamental requirement of IT systems: the availability of data (fault-tolerance).

 DEFINITION OF THE TERM 'STORAGE NETWORK'

In our experience, ambiguities regarding the definition of the various transmission techniques for storage networks crop up again and again. This section therefore considers storage networks once again from various points of view: it examines the layering of the various protocols and transmission techniques, discusses once again at which points in the I/O path networks can be implemented, and defines the terms LAN, MAN, WAN and SAN.

Layering of the transmission techniques and protocols

If we greatly simplify the OSI reference model, we can broadly divide the protocols for storage networks into three layers that build upon one another: transmission techniques, transport protocols and application protocols (Figure 6.1). The transmission techniques provide the necessary physical connection between end devices. Building upon these, transport protocols facilitate the data exchange between end devices via the underlying networks. Finally, the application protocols determine which type of data the end participants exchange over the transport protocol.

Transmission techniques represent the necessary prerequisite for data exchange between several participants. In addition to the already established Ethernet, the first part of the book introduced Fibre Channel and InfiniBand. They all define a medium (cable, radio frequency) and the encoding of data in the form of physical signals that are transmitted over the medium.

Transport protocols facilitate the exchange of data over a network. In addition to the tried and tested and omnipresent TCP protocol, the first part of the book introduced Fibre Channel and the Virtual Interface Architecture (VIA). Transport protocols can either be based directly upon a transmission technique (for example, Virtual Interfaces over Fibre Channel, InfiniBand or Ethernet) or they can use an alternative transport protocol as a medium; examples are Fibre Channel over IP (FCIP) and IP over Fibre Channel (IPFC). Additional confusion is caused by the fact that Fibre Channel defines both a transmission technique (FC-0, FC-1, FC-2) and a transport protocol (FC-2, FC-3) plus various application protocols (FC-4).

Application protocols define the type of data that is transmitted over a transport protocol. With regard to storage networks we differentiate between block-oriented and file-oriented application protocols. SCSI is the mother of all block-oriented application protocols; all further block-oriented application protocols such as FCP, iFCP and iSCSI were derived from it. File-oriented application protocols transmit files or file fragments. Examples of file-oriented application protocols discussed in this book are NFS, CIFS, FTP, HTTP and DAFS.
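The three-layer classification can be summarized in a small lookup sketch; it records only the examples named above and is not an exhaustive taxonomy:

    # The three protocol layers for storage networks (cf. Figure 6.1),
    # populated only with the examples named in the text.
    protocol_layers = {
        "transmission technique": ["Ethernet", "Fibre Channel FC-0/FC-1/FC-2",
                                   "InfiniBand"],
        "transport protocol": ["TCP", "Fibre Channel FC-2/FC-3", "VIA",
                               "FCIP", "IPFC"],
        "application protocol": ["SCSI", "FCP", "iFCP", "iSCSI",        # block-oriented
                                 "NFS", "CIFS", "FTP", "HTTP", "DAFS"], # file-oriented
    }

    def layer_of(name: str) -> str:
        return next(layer for layer, examples in protocol_layers.items()
                    if name in examples)

    print(layer_of("iSCSI"))  # application protocol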

 Networks in the I/O path

The logical I/O path offers a second point of view for the definition of transmission techniques for storage networks. Figure 6.2 illustrates the logical I/O path from the disk to the application and shows at which points in the I/O path networks can be used. Different application protocols are used depending upon location; the same transport protocols and transmission techniques can be used regardless of this.

Below the volume manager, block-oriented application protocols are used. Depending upon the technique, these are SCSI and SCSI offshoots such as FCP, iFCP and iSCSI. Today, block-oriented storage networks are found primarily between computers and storage systems. However, within large disk subsystems too, the SCSI cable is increasingly being replaced by a network transmission technique (Figure 6.3).

Above the volume manager and file system, file-oriented application protocols are used. Here we find application protocols such as NFS, CIFS, HTTP, FTP and DAFS. In Chapter 4 three different fields of application for file-oriented application protocols were discussed: traditional file sharing, high-speed LAN file sharing and the World Wide Web. Shared disk file systems, which realize the network within the file system, should also be mentioned as a special case.

 

Sunday, March 30, 2008

FREE TUTORS ON SYMMETRIC AND ASYMMETRIC STORAGE VIRTUALIZATION IN THE NETWORK (advantages and disadvantages of asymmetric virtualization)

KNOW MORE ABOUT SYMMETRIC AND ASYMMETRIC STORAGE VIRTUALIZATION IN THE NETWORK

The symmetric and asymmetric virtualization models are representatives of storage virtualization in the network. In both approaches it is possible to perform virtualization both on block and on file level. In both models the virtualization entity that undertakes the separation between physical and logical storage is placed in the storage network in the form of a specialized server or a device. This entity holds all the meta-information needed for the virtualization and is therefore also called the metadata controller. Its duties also include the management of storage resources and the control of all storage functions that are offered in addition to virtualization.

Symmetric and asymmetric virtualization differ primarily with regard to their distribution of data flow and control flow. The data flow is the transfer of the application data between the servers and the storage devices. The control flow consists of all metadata and control information necessary for virtualization that passes between the virtualization entity on the one hand and the storage devices and servers on the other. In symmetric storage virtualization the data flow and the control flow travel down the same path. By contrast, in asymmetric virtualization the data flow is separated from the control flow.

 Symmetric storage virtualization

In symmetric storage virtualization the data and control flow go down the same path. This means that the abstraction from physical to logical storage necessary for virtualization must take place within the data flow. As a result, the metadata controller is positioned precisely in the data flow between server and storage devices, which is why symmetric virtualization is also called in-band virtualization. In addition to controlling the virtualization, the metadata controller now carries all data flowing between servers and storage devices. To this end virtualization is logically structured in two layers: the layer for the management of the logical volumes and the data access layer.

1. The volume management layer is responsible for the management and configuration of the storage devices that can be accessed directly or via a storage network and it provides the aggregation of these resources into logical disks.

2. The data access layer makes the logical drives available for access either on block or on file level, depending upon what degree of abstraction is required. These logical drives can thus be made available to the application servers by means of appropriate protocols: in the case of virtualization on block level in the form of a virtual disk, and in the case of virtualization on file level in the form of a file system. (A minimal sketch of this two-layer structure follows after this discussion.)

In symmetric virtualization all data flows through the metadata controller, which means that the controller represents a potential bottleneck. To increase performance, therefore, the metadata controller is upgraded by the addition of a cache. With the use of caching, symmetric virtualization can even improve the performance of an existing storage network, provided the applications are not exclusively write-intensive.

A further issue is fault-tolerance. A single metadata controller represents a single point of failure. The use of cluster technology (Section 6.3.2) makes it possible to remove this single point of failure by using several metadata controllers in parallel; a corresponding load distribution additionally provides a performance increase. However, a configuration failure or a software failure of such a cluster can lead to data loss on all virtualized resources. In the case of a network-based virtualization spanning several servers and storage devices, this can halt the activity of a complete data centre.
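A minimal sketch of the two-layer structure on block level, assuming an invented extent size and mapping (no particular product is implied):

    # Minimal sketch of symmetric (in-band) virtualization on block
    # level: the metadata controller sits in the data path and maps
    # every logical block to a physical device and block.
    EXTENT = 1024  # blocks per extent

    def read_from_device(device: str, block: int) -> bytes:
        return b"..."  # hypothetical stand-in for the actual device access

    class MetadataController:
        def __init__(self) -> None:
            # volume management layer: logical extent -> (device, physical extent)
            self.mapping = {0: ("disk-A", 0), 1: ("disk-B", 0), 2: ("disk-A", 1)}

        def read(self, logical_block: int) -> bytes:
            # data access layer: every single I/O passes through the controller
            device, phys_extent = self.mapping[logical_block // EXTENT]
            phys_block = phys_extent * EXTENT + logical_block % EXTENT
            return read_from_device(device, phys_block)

    virtual_disk = MetadataController()
    virtual_disk.read(1500)  # block 1500 is transparently served from disk-B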

Thus the advantages of symmetric virtualization are evident:

• The application servers can easily be provided with data access both on block and file level, regardless of the underlying physical storage devices.

• The administrator has complete control over which storage resources are available to which servers at a central point. This increases security and eases the administration.

• Assuming that the appropriate protocols are supported, symmetric virtualization is not restricted to specific operating system platforms. It can thus also be used in heterogeneous environments.

• The performance of existing storage networks can be improved by the use of caching and clustering in the metadata controllers.

• The use of a metadata controller means that techniques such as snapshots or mirroring can be implemented in a simple manner, since the controller controls the storage access directly. These techniques can also be used on storage devices such as JBODs or simple RAID arrays that do not provide them themselves.

The disadvantages of a symmetric virtualization are:

• Each individual metadata controller must be administered. If several metadata controllers are used in a cluster arrangement, the administration is relatively complex and time-consuming, particularly due to the cross-computer data access layer. This disadvantage can, however, be reduced by the use of a central administration console for the metadata controllers.

• Several controllers plus cluster technology are indispensable to guarantee the fault tolerance of data access.

• As an additional element in the data path, the controller can lead to performance problems, which makes the use of caching or load distribution over several controllers indispensable.

• It can sometimes be difficult to move data between storage devices if they are managed by different metadata controllers.

Asymmetric storage virtualization

In contrast to symmetric virtualization, in asymmetric virtualization the data flow is separated from the control flow. This is achieved by moving all mapping operations from logical to physical drives to a metadata controller outside the data path. The metadata controller now only has to look after the administrative and control tasks of virtualization; the flow of data takes place directly from the application servers to the storage devices. As a result, this approach is also called out-band virtualization. The communication between metadata controller and agents generally takes place via the LAN (out-band), but can also be realized in-band via the storage network. Hence, in our opinion the terms 'in-band virtualization' and 'out-band virtualization' are a little misleading, and we use instead the terms 'symmetric virtualization' and 'asymmetric virtualization' to refer to the two network-based virtualization approaches.

As in the symmetric approach, the metadata controller is logically structured in two layers (Figure 7.9). The volume management layer has the same duties as in the symmetric approach. The second layer is the control layer, which is responsible for the communication with the agent software that runs on the servers.

The agent is required in order to enable direct access to the physical storage resources. It is made up of a data access layer with the same tasks as in symmetric virtualization and a control layer. Via the latter it loads the appropriate location and access information about the physical storage from the metadata controller when the virtual storage is accessed by the operating system or an application. In this manner, access control to the physical resources is still centrally managed by the metadata controller. An agent need not necessarily run in the memory of the server; it can also be integrated into a host bus adapter, which has the advantage that the server is freed from the processes necessary for virtualization.

In asymmetric storage virtualization – as in symmetric storage virtualization – advanced storage functions such as snapshots, mirroring or data migration can be realized. The asymmetric model is not as easy to realize as the symmetric one, but performance bottlenecks as a result of an additional device in the data path do not occur here. If we want to increase performance by the use of caching for both application data and metadata, this caching must be implemented locally on every application server. The caching algorithm becomes very complex, since this is a distributed environment in which every agent holds its own cache.
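Before turning to the caching problem in more detail, the following sketch illustrates the division of labour between agent and metadata controller; all names and the mapping are illustrative assumptions:

    # Minimal sketch of asymmetric (out-band) virtualization: the agent
    # asks the metadata controller only for the mapping (control flow),
    # then performs the I/O itself directly against the storage device
    # (data flow).
    def read_directly(device: str, offset: int) -> bytes:
        return b"..."  # hypothetical stand-in for direct block access via the SAN

    class MetadataController:
        """Holds the mapping; sits outside the data path."""
        mapping = {0: ("disk-A", 0), 1: ("disk-B", 0)}

        def locate(self, logical_extent: int):
            return self.mapping[logical_extent]   # control flow only

    class Agent:
        def __init__(self, controller: MetadataController):
            self.controller = controller
            self.map_cache: dict[int, tuple] = {}  # avoids repeated lookups

        def read(self, logical_extent: int) -> bytes:
            if logical_extent not in self.map_cache:
                self.map_cache[logical_extent] = self.controller.locate(logical_extent)
            device, offset = self.map_cache[logical_extent]
            return read_directly(device, offset)   # data flow: server -> device

    agent = Agent(MetadataController())
    agent.read(1)  # data flows directly from disk-B, bypassing the controller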

Data inconsistencies as a result of different cache contents for the same underlying physical storage contents must be avoided, as must error situations in which an application that still has data in its cache crashes. Therefore, additional mechanisms are necessary to guarantee the consistency of the distributed cache. Alternatively, the installation of a dedicated cache server in the storage network that devotes itself exclusively to the caching of the data flow would also be possible. Unfortunately, such products are not currently (end of 2003) available on the market.

Metadata controllers can also be clustered for load distribution of the control flow and to increase fault-tolerance. The implementation is easier with the asymmetric approach than with the symmetric one, since only the control flow has to be divided over several computers; in contrast to the symmetric approach, the splitting of the data flow is dispensed with.
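One conceivable consistency mechanism – invalidating the agent caches before a mapping changes – can be sketched as follows; this is an assumption about how such a mechanism might look, not a description of an actual product:

    # Minimal sketch of one way to keep distributed agent caches
    # consistent: the controller invalidates the affected entry in every
    # registered agent cache before it publishes a new mapping. A real
    # implementation would also need locking and fault handling.
    class AgentCache:
        def __init__(self):
            self.entries: dict[int, tuple] = {}

    class Controller:
        def __init__(self, caches: list[AgentCache]):
            self.caches = caches
            self.mapping: dict[int, tuple] = {}

        def update_mapping(self, extent: int, location: tuple) -> None:
            for cache in self.caches:          # invalidate stale copies first
                cache.entries.pop(extent, None)
            self.mapping[extent] = location    # then publish the new mapping

    c1, c2 = AgentCache(), AgentCache()
    controller = Controller([c1, c2])
    controller.update_mapping(7, ("disk-C", 0))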

 The following advantages of asymmetric virtualization can be established:

• Complete control of storage resources by an absolutely centralized management on the metadata controller.

• Maximum throughput between servers and storage devices by the separation of the control flow from the data flow, thus avoiding additional devices in the data path.

• In comparison to the development and administration of a fully functional volume manager on every server, the porting of the agent software is associated with a low cost.

• As in the symmetric approach, advanced storage functions such as snapshots or mirroring can be used on storage devices that do not themselves support these functions.

• To improve fault-tolerance, several metadata controllers can be brought together to form a cluster. This is easier than in the symmetric approach, since no physical connection from the servers to the metadata controllers is necessary for the data flow.

The disadvantages of asymmetric virtualization are:

• Special agent software is required on the servers or the host bus adapters. This can make it more difficult to use this approach in heterogeneous environments, since such software or a suitable host bus adapter must be present for every platform. Incompatibilities between the agent software and existing applications may sometimes make the use of asymmetric virtualization impossible.

• The agent software must be absolutely stable in order to avoid errors in storage accesses. In situations where many different platforms have to be supported, this is a very complex development and testing task.

• The development cost increases further if the agent software and the metadata controller are also to permit access on file level in addition to access on block level.

• A performance bottleneck can arise as a result of the frequent communication between agent software and metadata controller. These performance bottlenecks can be remedied by the caching of the physical storage information.

• Caching to increase performance requires an ingenious distributed caching algorithm to avoid data inconsistencies. A further option would be the installation of a dedicated cache server in the storage network.
