Friday, March 28, 2008

SHARED DISK FILE SYSTEMS and Case study: the General Parallel File System (GPFS)


The greatest performance limitation of NAS servers and self-configured file servers is that each file must pass twice through the internal buses of the file server before it arrives at the computer where it is required (Figure 4.7). Even DAFS and its alternatives, such as NFS over RDMA, cannot get around this 'eye of the needle'. With storage networks it is possible for several computers to access a storage device simultaneously. The I/O bottleneck in the file server can therefore be circumvented if all clients fetch the files from the disk directly via the storage network (Figure 4.11). The difficulty here: today's file systems treat their storage devices as local. They concentrate upon caching and the aggregation of I/O operations; they increase performance by reducing the number of disk accesses needed.

So-called shared disk file systems can deal with this problem. They incorporate special algorithms that synchronize the simultaneous accesses of several computers to common disks. As a result, shared disk file systems make it possible for several computers to access files simultaneously without causing version conflicts. To achieve this, shared disk file systems must synchronize write accesses in addition to providing the functions of local file systems. As in local file systems, it must be ensured that new files are written to different areas of the hard disk. It must also be ensured that stale cache entries are marked as invalid. Let us assume that two computers each have a file in their local cache and one of the computers changes the file. If the second computer subsequently reads the file again, it must not take the now invalid copy from the cache.

The great advantage of shared disk file systems is that the computers accessing files and the storage devices in question communicate with each other directly. The diversion via a central file server, which represents the bottleneck in conventional network file systems and also in DAFS and RDMA-enabled NFS, is no longer necessary. In addition, the load on the CPU of the accessing machine is reduced, because communication via Fibre Channel places less of a load on the processor than communication via IP and Ethernet. For sequential access to large files this can more than make up for the extra cost of access synchronization. On the other hand, for applications with many small files, or with many random accesses within the same file, it should be checked whether the use of a shared disk file system is really worthwhile.

One side-effect of file sharing over the storage network is that the availability of the shared disk file system can be better than that of conventional network file systems, because a central file server is no longer needed. If one machine in the shared disk file system cluster fails, the other machines can carry on working. The availability of the underlying storage devices therefore largely determines the availability of shared disk file systems.

 Case study: the General Parallel File System (GPFS)

We have decided at this point to introduce a product of our employer, IBM, for once. The General Parallel File System (GPFS) is a shared disk file system that has for many years been used on cluster computers of the type RS/6000 SP (currently IBM eServer Cluster 1600). We believe that this section on GPFS illustrates the requirements of a shared disk file system very nicely. The reason for introducing GPFS at this point is quite simply that it is the shared disk file system that we know best.

The RS/6000 SP is a cluster computer. It was, for example, used for Deep Blue, the computer that beat the chess champion Garry Kasparov. An RS/6000 SP consists of up to 512 conventional AIX computers that can also be connected together via a so-called high performance switch (HPS). The individual computers of an RS/6000 SP are also called nodes.

GPFS was originally based upon so-called Virtual Shared Disks (Figure 4.12). The VSD subsystem makes hard disks that are physically connected to one computer visible to the other nodes of the SP, so that several nodes can access the same physical hard disk. The VSD subsystem ensures consistency at block level, which means that a block is either written completely or not written at all. From today's perspective we could say that VSDs emulate the function of a storage network. In more recent versions of GPFS the VSD layer can be replaced by an SSA SAN or a Fibre Channel SAN.
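
The block-level contract can be illustrated in a few lines of Python. This is a deliberately simplified, hypothetical sketch, not the actual VSD implementation: it keeps the "disk" in memory, serializes access with a lock, and rejects partial writes as a crude stand-in for the rule that a block is written completely or not at all.

```python
# Hypothetical sketch of a VSD-like block store with all-or-nothing
# writes per block. The class name, dict-based storage and fixed block
# size are assumptions made for this example.

import threading

BLOCK_SIZE = 4096

class VirtualSharedDisk:
    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.lock = threading.Lock()

    def read_block(self, idx):
        with self.lock:
            return self.blocks[idx]

    def write_block(self, idx, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("partial block writes are rejected: a block "
                             "is written completely or not at all")
        with self.lock:
            self.blocks[idx] = bytes(data)   # the block is replaced in one step

disk = VirtualSharedDisk(num_blocks=128)
disk.write_block(0, b"\x01" * BLOCK_SIZE)    # accepted: a complete block
# disk.write_block(1, b"partial")            # would raise ValueError
```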

GPFS uses the VSDs to ensure the consistency of the file system, i.e. to ensure that the metadata structure of the file system is maintained; for example, no file name is allocated twice. Furthermore, GPFS implements some RAID functions, such as the striping and mirroring of data and metadata.
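
As a rough illustration of striping, the following hypothetical sketch maps the logical blocks of a file round-robin across several disks, so that a large sequential transfer is spread over all disks and host bus adapters at once. Real GPFS striping is considerably more sophisticated (configurable block sizes, failure groups); this only shows the basic RAID 0 idea.

```python
# RAID 0 style placement: logical block i of a file lands on disk
# (i % num_disks), at stripe position (i // num_disks) on that disk.

def stripe_location(logical_block, num_disks):
    disk = logical_block % num_disks        # round-robin across disks
    offset = logical_block // num_disks     # stripe index on that disk
    return disk, offset

# A 10-block file over 4 disks: blocks 0..3 hit disks 0..3 in parallel.
for block in range(10):
    print(block, stripe_location(block, 4))
```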

Figure 4.12 illustrates two benefits of shared disk file systems. First, they can use RAID 0 to stripe the data over several hard disks, host bus adapters and even disk subsystems, which means that shared disk file systems can achieve a very high throughput. All applications that have an at least partially sequential access pattern profit from this. Second, the location of the application becomes independent of the location of the data. In Figure 4.12 the system administrator can start applications on the four GPFS nodes that have the most resources (CPU, main memory, buses) available at the time. A so-called workload manager can move applications from one node to another depending upon load. In conventional file systems this is not possible: applications have to run on the nodes on which the file system is mounted, since access via a network file system such as NFS or CIFS is generally too slow.

The unusual thing about GPFS is that there is no individual file server. Each node in the GPFS cluster can mount a GPFS file system. For end users and applications a GPFS file system behaves, apart from its significantly better performance, like a conventional local file system.

GPFS introduces the so-called node set as an additional management unit. Several node sets can exist within a GPFS cluster, with a single node only ever being able to belong to at most one node set (Figure 4.13). GPFS file systems are only ever visible within a node set, and several GPFS file systems can be active in every node set.
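
These membership rules can be captured in a small data-model sketch. Everything here (class and method names, string identifiers) is invented for illustration; it merely encodes the two constraints just described: a node belongs to at most one node set, and a file system is visible only within its node set.

```python
# Illustrative data model for GPFS node sets, not a real GPFS API.

class GPFSCluster:
    def __init__(self):
        self.node_to_set = {}       # node name -> node set name
        self.fs_to_set = {}         # file system name -> node set name

    def add_node(self, node, node_set):
        if node in self.node_to_set:
            raise ValueError(f"{node} already belongs to a node set")
        self.node_to_set[node] = node_set

    def create_fs(self, fs, node_set):
        self.fs_to_set[fs] = node_set   # several file systems per set are fine

    def can_mount(self, node, fs):
        return self.node_to_set.get(node) == self.fs_to_set.get(fs)

cluster = GPFSCluster()
cluster.add_node("node1", "setA")
cluster.add_node("node2", "setB")
cluster.create_fs("/gpfs/data", "setA")
print(cluster.can_mount("node1", "/gpfs/data"))  # True
print(cluster.can_mount("node2", "/gpfs/data"))  # False: different node set
```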

The GPFS Daemon must run on every node in the GPFS cluster. GPFS is realized as a distributed application, with all nodes in a GPFS cluster having the same rights and duties. In addition, depending upon the configuration of the GPFS cluster, the GPFS Daemon must take on further administrative functions over and above the normal tasks of a file system:

• Configuration Manager

In every node set one GPFS Daemon takes on the role of the Configuration Manager. The Configuration Manager determines the File System Manager for every file system and monitors the so-called quorum. The quorum is a common procedure in distributed systems for maintaining the consistency of a distributed application in the event of a network split. For GPFS, more than half of the nodes of a node set must be active. If the quorum is lost in a node set, the GPFS file system is automatically deactivated (unmounted) on all nodes of the node set. A minimal quorum check is sketched after this list.

• File System Manager

Every file system has its own File System Manager. Its tasks include the following:

– configuration changes of the file system;
– management of the hard disk blocks;
– token administration;
– management and monitoring of quotas; and
– security services.

Token administration is particularly worth highlighting. One of the design objectives of GPFS is the support of parallel applications that read and modify common files from different nodes. Like every file system, GPFS buffers files or file fragments in order to increase performance. GPFS uses a token mechanism to synchronize the cache entries on the various computers in the event of parallel write and read accesses (Figure 4.14). However, this synchronization only ensures that GPFS behaves precisely in the same way as a local file system that is mounted on just one computer. This means that in GPFS, as in every file system, parallel applications still have to synchronize their accesses to common files themselves, for example by means of locks.

• Metadata Manager

Finally, one GPFS Daemon takes on the role of the Metadata Manager for every open file. GPFS guarantees the consistency of the metadata of a file by allowing only the Metadata Manager to change a file's metadata. Generally, the GPFS Daemon of the node on which the file has been open for the longest is the Metadata Manager for the file. The assignment of the Metadata Manager of a file to a node can change depending on the access behaviour of the applications.
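
The quorum rule ("more than half of the nodes of a node set must be active") is simple to state in code. The following check is illustrative only, not GPFS's implementation; in reality the daemons determine activity by exchanging heartbeats, whereas here the counts are simply passed in.

```python
# Illustrative quorum check. If quorum is lost, the file system is
# unmounted on all nodes of the node set.

def has_quorum(active_nodes: int, node_set_size: int) -> bool:
    """More than half of the nodes of the node set must be active."""
    return active_nodes > node_set_size // 2

# An 8-node node set split 4/4 by a network failure: neither half has
# quorum, so neither half may keep the file system mounted. This is
# what prevents the two halves from modifying the shared disks
# independently of each other.
print(has_quorum(5, 8))  # True
print(has_quorum(4, 8))  # False
```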

The example of GPFS shows that a shared disk file system has to achieve a great deal more than a conventional local file system, which is managed on only one computer. GPFS has been used successfully on the RS/6000 SP for some years. The complexity of shared disk file systems is illustrated by the fact that IBM is only gradually porting GPFS to other operating systems, such as Linux, which IBM supports strategically, and to new I/O technologies such as Fibre Channel.
