Failure of virtualization in the storage network and case study on the failure of a data centre
Failure of virtualization in the storage network
Virtualization in the storage network is currently (2003) treated as the solution for consolidating storage resources in large storage networks, because it allows the storage resources of several disk subsystems to be managed centrally (Chapter 5). However, it must be clearly understood that precisely such a central virtualization instance represents a single point of failure. Even if the virtualization instance is protected against the failure of a single component by measures such as clustering, configuration errors or software errors in the virtualization instance can cause the data of an entire data centre to be lost, since storage virtualization aims to span all the storage resources of a data centre. The same considerations therefore apply to protecting a virtualization instance positioned in the storage network (Section 5.7, 'Symmetric and Asymmetric Storage Virtualization in the Network') as to protecting against the failure of a disk subsystem, discussed in the previous section. In particular, mirroring important data from the server via two virtualization instances should also be considered.
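The idea of mirroring from the server via two virtualization instances can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `VirtualizationInstance`, `mirrored_write` and `mirrored_read` are hypothetical names, and each instance is modelled as an in-memory dictionary rather than a real product interface.

```python
class VirtualizationInstance:
    """Hypothetical stand-in for one virtualization instance in the storage network."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}       # block_id -> data held behind this instance
        self.available = True  # False simulates a failed instance

    def write(self, block_id, data):
        if not self.available:
            raise IOError(f"{self.name} is unavailable")
        self.blocks[block_id] = data

    def read(self, block_id):
        if not self.available:
            raise IOError(f"{self.name} is unavailable")
        return self.blocks[block_id]  # KeyError if the block was lost


def mirrored_write(instances, block_id, data):
    """Write to every reachable instance; fail only if no copy was written."""
    written = 0
    for instance in instances:
        try:
            instance.write(block_id, data)
            written += 1
        except IOError:
            continue
    if written == 0:
        raise IOError("no virtualization instance accepted the write")
    return written


def mirrored_read(instances, block_id):
    """Return the block from the first instance that can still deliver it."""
    for instance in instances:
        try:
            return instance.read(block_id)
        except (IOError, KeyError):
            continue
    raise IOError("block not readable from any instance")
```

In this sketch, if a configuration error wipes the mapping of one instance (for example `instance_a.blocks.clear()`), `mirrored_read` still returns the data from the second instance. A real volume manager would also have to resynchronize the damaged mirror afterwards, which is omitted here.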
Failure of a data centre based upon the case study 'protection of an important database'
The measures of server clustering, redundant I/O buses and disk subsystem mirroring (volume manager mirroring or remote mirroring) discussed above protect against the failure of a component within a data centre. However, these measures are useless if a complete data centre fails (fire, water damage). To protect against the failure of a data centre, the infrastructure needed to operate the most important applications must be duplicated in a back-up data centre.
Figure 6.22 shows the interaction between the primary data centre and back-up data
centre based upon the case study 'protection of an important database'. In the case study, all the measures discussed in this section for protection against the failure of a component are used. In the primary data centre all components are designed with built-in redundancy. The primary server is connected via two independent Fibre Channel SANs (Dual SAN) to two disk subsystems, which hold the data of the database. Dual SANs have the advantage that even in the event of a serious fault in one SAN (for example a defective switch that floods the SAN with corrupt frames), the connection via the other SAN remains intact. The redundant paths between servers and storage devices are managed by appropriate multipathing software.
Each disk subsystem is configured using a RAID procedure so that the failure of individual physical disks within the disk subsystem in question can be rectified. In addition, the data is mirrored in the volume manager so that the system can withstand the failure of a disk subsystem. The two disk subsystems are located at a distance from one another in the primary data centre and are separated by a fire protection wall.
Like the disk subsystems, the two servers are spatially separated by a fire protection
wall. In normal operation the database runs on one server; meanwhile the second server is used for other, less important tasks. If the primary server fails, the cluster software automatically starts the database on the second server. It also terminates all other activities on the second server, thus making all its resources fully available to the main application.
Remote mirroring takes place via an IP connection. Mirroring utilizes knowledge of the
data structure of the database: in a similar manner to journaling in file systems (Section 4.1.2), databases write each change into a log file before integrating it into the actual data set. In the example, only the log files are mirrored in the back-up data centre. The complete data set was transferred to the back-up data centre only once, at the start of mirroring. Thereafter this data set is only ever updated with the aid of the log files. This has two advantages. First, the powerful network connection between the primary data centre and the remote back-up data centre is very expensive. The necessary data rate for this connection can be halved by transferring only the changes to the log file, since each change would otherwise be transmitted twice, once for the log file and once for the data set. This cuts costs. Second, in the back-up data centre the log files are integrated into the data set after a delay of
two hours. As a result, a copy of the data set that is two hours old is always available in the back-up data centre. This additionally protects against application errors: if a table space is accidentally deleted in the database, the user has two hours to notice the error and interrupt the copying of the changes in the back-up data centre.
A second server and a second disk subsystem are also operated in the back-up data centre; in normal operation these can be used as a test system or for other, less time-critical tasks such as data mining. If the operation of the database is moved to the back-up data centre, these activities are suspended (Figure 6.23). The second server is configured as a stand-by server for the first server in the cluster; the data of the first disk subsystem is mirrored to the second disk subsystem via the volume manager. Thus a completely redundant system is available in the back-up data centre.
The case study discussed here can be realized with current technology. However, it comes at a price; for most applications this cost will certainly not be justified. The main point of the case study is to highlight the possibilities of storage networks. In practice you have to decide how much failure protection is necessary and how much it may cost. Ultimately, protection against the loss of data or the temporary non-availability of applications must cost less than the data loss or the temporary non-availability of applications itself.
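The delayed integration of log files described in the case study can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the mechanism of any particular database: the names `LogRecord`, `BackupSite`, `receive` and `apply_due` are hypothetical, the data set is modelled as a simple dictionary, and timestamps are plain seconds.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

APPLY_DELAY_SECONDS = 2 * 60 * 60  # the two-hour delay from the case study


@dataclass
class LogRecord:
    timestamp: float      # when the change was logged at the primary site
    key: str              # hypothetical identifier of the changed object
    value: Optional[str]  # None marks a deletion


@dataclass
class BackupSite:
    data_set: dict = field(default_factory=dict)
    pending: deque = field(default_factory=deque)  # shipped, not yet applied
    paused: bool = False  # set to True to interrupt the copying of changes

    def receive(self, record):
        """Store a shipped log record without touching the data set."""
        self.pending.append(record)

    def apply_due(self, now):
        """Apply, in order, every record at least APPLY_DELAY_SECONDS old."""
        if self.paused:
            return 0
        applied = 0
        while self.pending and now - self.pending[0].timestamp >= APPLY_DELAY_SECONDS:
            record = self.pending.popleft()
            if record.value is None:
                self.data_set.pop(record.key, None)
            else:
                self.data_set[record.key] = record.value
            applied += 1
        return applied
```

For example, if a record deleting an object arrives an hour after the record that created it, calling `apply_due` two hours after the first record applies only the creation. Setting `paused = True` before the deletion becomes due keeps the older, intact state in the back-up data set.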