Learn more on Application and Management of Storage Networks: Data Sharing and Availability of Data
In contrast to device sharing (disk storage pooling, tape library partitioning, and tape library sharing) discussed earlier, in which several servers share a storage device at block level, data sharing is the use of the same data by several applications. In data sharing we differentiate between data copying and real-time data sharing.
In data copying, as the name suggests, data is copied. This means that several versions of the data are kept, which is fundamentally a bad thing: each copy of the data requires storage space and care must be taken to ensure that the different versions are copied according to the requirements of the applications at the right times. Errors occur in particular in the maintenance of the various versions of the data set, so that subsequent applications repeatedly work with the wrong data. Despite these disadvantages, data copying is used in production environments. The reasons for this can be:
• Generation of test data: Copies of the production data are helpful for the testing of new versions of applications and operating systems and for the testing of new hardware. In Section 1.3 we used the example of a server upgrade to show how test data for the testing of new hardware could be generated using instant copies in the disk subsystem. The important point here was that the applications are briefly interrupted so that the consistency of the copied data is guaranteed. As an alternative to instant copies, the test data could also be generated using snapshots in the file system.
• Data protection (back-up): The aim of data protection is to keep up-to-date copies of data at various locations as a precaution to protect against the loss of data by hardware or operating errors. Data protection is an important application in the field of storage networks. It is therefore dealt with separately in Chapter 7.
• Data replication: Data replication is the name for the copying of data for access to data on computers that are far apart geographically. The objective of data replication is to accelerate data access and save network capacity. There are many applications that automate the replication of data. Within the World Wide Web the data is replicated at two points: first, every web browser caches local data in order to accelerate access to pages called up frequently by an individual user. Second, many Internet Service Providers (ISPs) install a so-called proxy server. This caches the contents of web pages that are called up by many users. Other examples of data replication are the mirroring of FTP servers (FTP mirror), replicated file sets in the Andrew File System (AFS) or the Distributed File System (DFS) of the Distributed Computing Environment (DCE), and the replication of mail databases.
• Conversion into more efficient data formats: It is often necessary to convert data into a different data format because certain calculations are cheaper in the new format. In the days before the pocket calculator, logarithms were often used for calculations because, for example, adding the logarithms of two numbers gives the logarithm of their product, so a multiplication could be replaced by a simpler addition. For the same reasons, in modern IT systems data is converted to different data formats. In data mining, for example, data from various sources is brought together in a database and converted into a data format in which the search for regularities in the data set is simpler.
• Conversion of incompatible data formats: A further reason for the copying of data is the conversion of incompatible data formats. A classic example is when applications originally developed independently of one another are being brought together over time.
Real-time data sharing represents an alternative to data copying. In real-time data sharing all applications work on the same data set. Real-time data sharing saves storage space, avoids the cost and errors associated with the management of several data versions, and ensures that all applications work on the up-to-date data set. Of the reasons for data copying listed above, it is particularly important to replace the conversion of incompatible data formats with real-time data sharing.
The logical separation of applications and data is continued in the implementation. In general, applications and data in the form of file systems and databases are installed on different computers. This physical separation aids the adaptability, and thus the maintainability, of overall systems. Figure 6.8 shows several applications that work on the same data set, with applications and data being managed independently of one another. This has the advantage that new applications can be introduced without existing applications having to be changed.
However, in the configuration shown in Figure 6.8 the applications may generate so much load that a single data server becomes a bottleneck and the load has to be divided amongst several data servers. There are two options for resolving this bottleneck without data copying. First, the data set can be partitioned (Figure 6.9) by splitting it over several data servers. If this is not sufficient, then several parallel access paths can be established to the same data set (Figure 6.10). Parallel databases and shared disk file systems such as the General Parallel File System (GPFS) introduced in Section 4.3.1 provide the functions necessary for this.
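The first option, partitioning the data set over several data servers, can be illustrated with a small sketch. This is not taken from the text: it assumes that every record is identified by a string key and uses simple hash partitioning, which is only one of several possible schemes. The server names, record keys, and function names are hypothetical.

```python
import hashlib


def server_for(key: str, servers: list[str]) -> str:
    """Return the data server responsible for a record key (hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def partition(keys, servers):
    """Group record keys by the server that holds them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement


# Hypothetical server names and record keys, for illustration only.
SERVERS = ["data-server-1", "data-server-2", "data-server-3"]
KEYS = [f"customer-{n}" for n in range(12)]

if __name__ == "__main__":
    for server, keys in partition(KEYS, SERVERS).items():
        print(server, keys)
```

Each key is assigned to exactly one server, so the load is divided without creating copies of the data. With this simple modulo scheme, adding or removing a server changes the placement of most keys, which is one reason why real systems may choose a different partitioning method.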
AVAILABILITY OF DATA
Nowadays, the availability of data is an important requirement placed on IT systems. This section discusses how the availability of data and applications can be maintained in various fault situations. Individually, the following will be discussed: the failure of an I/O bus (Section 6.3.1), the failure of a server (Section 6.3.2), the failure of a disk subsystem (Section 6.3.3), and the failure of a storage virtualization instance which is placed in the storage network (Section 6.3.4). The case study 'protection of an important database' discusses a scenario in which the protective measures that have previously been discussed are combined in order to protect an application against the failure of an entire data centre.