High Salary Software Jobs: How to Protect Against the Failure of an I/O Bus, an Entire Server, or a Disk Subsystem

Failure of an I/O bus

Protection against the failure of an I/O bus is relatively simple: several I/O buses are installed between the server and the storage device. Figure 6.11 shows a scenario for SCSI and Figure 6.12 shows one for Fibre Channel. In Figure 6.12 protection against the failure of an I/O bus is achieved by two storage networks that are independent of one another. Such separate storage networks are also known as a 'dual storage network' or 'dual SAN'.

The problem here is that operating systems identify storage devices by the triple of host bus adapter, SCSI target ID and SCSI LUN. If, for example, there are two connections from a server to a disk subsystem, the operating system recognizes the same disk twice (Figure 6.13). So-called multipathing software recognizes that a storage device can be reached over several paths. Figure 6.14 shows how multipathing software reintegrates the disk found twice in Figure 6.13 to form a single disk again. Multipathing software can act at various points depending upon the product:

• in the volume manager (Figure 6.14, right);

• as an additional virtual device driver between the volume manager and the device driver of the disk subsystem (Figure 6.14, left);

• in the device driver of the disk subsystem;

• in the device driver of the host bus adapter card.
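The duplicate-disk problem described above can be made concrete with a minimal Python sketch. It assumes that every device entry carries some unique disk identifier (the hypothetical `disk_id` field, standing in for something like a serial number); real multipathing products obtain such an identifier in product-specific ways. The names `DevicePath` and `group_paths` are illustrative only.

```python
from collections import defaultdict
from typing import NamedTuple


class DevicePath(NamedTuple):
    """One device entry as the operating system sees it."""
    hba: int        # host bus adapter number
    target_id: int  # SCSI target ID
    lun: int        # SCSI LUN
    disk_id: str    # hypothetical unique identifier reported by the disk


def group_paths(entries):
    """Group OS device entries that lead to the same physical disk."""
    disks = defaultdict(list)
    for entry in entries:
        disks[entry.disk_id].append(entry)
    return dict(disks)


# Two host bus adapters reach the same disk, so the OS reports two entries.
entries = [
    DevicePath(hba=0, target_id=1, lun=0, disk_id="disk-A"),
    DevicePath(hba=1, target_id=1, lun=0, disk_id="disk-A"),
]
multipath_disks = group_paths(entries)
print(len(entries), "OS entries ->", len(multipath_disks), "multipath disk")
```

The two entries differ in the host bus adapter component of the triple, which is why the operating system treats them as two disks until they are grouped by a property of the disk itself.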

The Fibre Channel standard plans to realize multipathing in the FC-3 layer. However, this part of the standard has not yet been implemented in real products. We believe it is rather unlikely that these functions will ever actually be realized within the Fibre Channel protocol stack: in the past, the principle of keeping the network protocol as simple as possible and placing the necessary intelligence in the end devices has prevailed in networks.

The multipathing software currently available on the market differs in the mode in which it uses redundant I/O buses:

• Active/passive mode: In active/passive mode the multipathing software manages all I/O paths between server and storage device, but only one of the I/O paths is used for actual data traffic. If the active I/O path fails, the multipathing software activates one of the other I/O paths and sends the data over that path instead.

• Active/active mode: In active/active mode the multipathing software uses all available I/O paths between server and storage device. It distributes the load evenly over all available I/O channels. In addition, the multipathing software continuously monitors the availability of the individual I/O paths and activates or deactivates them accordingly.

The active/active mode utilizes the underlying hardware better than the active/passive mode, since it combines fault-tolerance with load distribution.
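The difference between the two modes can be sketched in a few lines of Python. This is a toy model, not the behaviour of any particular product: the `Multipath` class, its method names and the path names `hba0`/`hba1` are assumptions, and round-robin is used here only as one simple way of distributing load evenly.

```python
class Multipath:
    """Toy model of a multipathing driver for one storage device."""

    def __init__(self, paths, mode="active/passive"):
        if mode not in ("active/passive", "active/active"):
            raise ValueError(f"unknown mode: {mode}")
        self.paths = list(paths)
        self.mode = mode
        self.available = {p: True for p in self.paths}
        self._next = 0  # round-robin position, used in active/active mode

    def mark_failed(self, path):
        self.available[path] = False

    def mark_restored(self, path):
        self.available[path] = True

    def select_path(self):
        """Return the path to use for the next I/O request."""
        live = [p for p in self.paths if self.available[p]]
        if not live:
            raise IOError("no I/O path to the storage device is available")
        if self.mode == "active/passive":
            # Always use the first healthy path; the others are stand-by.
            return live[0]
        # active/active: rotate over all healthy paths.
        path = live[self._next % len(live)]
        self._next += 1
        return path


passive = Multipath(["hba0", "hba1"], mode="active/passive")
print([passive.select_path() for _ in range(3)])  # every request uses hba0
passive.mark_failed("hba0")
print(passive.select_path())                      # fails over to hba1

active = Multipath(["hba0", "hba1"], mode="active/active")
print([active.select_path() for _ in range(4)])   # alternates between paths
```

In both modes a failed path is simply skipped; the only difference is whether the healthy stand-by paths carry traffic before a failure occurs.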

Failure of a server

Protection against the failure of an entire server is somewhat trickier. The only thing that can help here is to provide a second server that takes over the tasks of the actual application server in the event of its failure. So-called cluster software monitors the state of the two computers and starts the application on the second computer if the first computer fails.

Figure 6.15 shows a cluster for a file server, the disks of which are connected over a Fibre Channel SAN. Both computers have access to the disks, but only one computer actively accesses them. The file system stored on the disks is exported over a network file system such as NFS or CIFS. To this end a virtual IP address is configured for the cluster. Clients access the file system via this virtual IP address. If the first computer fails, the cluster software automatically initiates the following steps:

1. Activation of the disks on the stand-by computer.

2. File system check of the local file system stored on the disk subsystem.

3. Mounting of the local file system on the stand-by computer.

4. Transfer of the virtual cluster IP address.

5. Export of the local file system via the virtual cluster IP address.
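The takeover sequence above can be sketched as a small Python routine. It is a simulation only: the step names paraphrase the list, and `fail_over`, `monitor` and the caller-supplied `run_step` function are hypothetical names, not part of any real cluster software.

```python
FAILOVER_STEPS = [
    "activate disks",
    "check file system",
    "mount file system",
    "take over virtual IP address",
    "export file system",
]


def fail_over(run_step):
    """Run the takeover steps in order and stop at the first failure.

    run_step is a caller-supplied function that performs one step and
    returns True on success. Returns the list of completed steps.
    """
    completed = []
    for step in FAILOVER_STEPS:
        if not run_step(step):
            break
        completed.append(step)
    return completed


def monitor(primary_alive, run_step):
    """Start the failover only if the primary server is reported as failed."""
    if primary_alive:
        return []
    return fail_over(run_step)


# Simulated run in which the primary has failed and every step succeeds.
log = monitor(primary_alive=False, run_step=lambda step: True)
print(log)
```

The sketch keeps the order given in the list, so the file system is exported only after it has been checked and mounted and the virtual IP address has moved to the stand-by computer.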

The failover process is invisible to clients of the file server, apart from the fact that they cannot access the network file system for a brief period, so file accesses may have to be repeated (Figure 6.16).

Server clustering and redundant I/O buses are two measures that are completely independent of each other. In practice, as shown in Figure 6.17, the two measures are nevertheless combined. The multipathing software reacts to errors in the I/O buses significantly more quickly than the cluster software, so the extra cost of the redundant I/O buses is usually justified.

Failure of a disk subsystem

In Chapter 2 we discussed how disk subsystems implement a whole range of measures to increase their own fault-tolerance. Nevertheless, disk subsystems can sometimes fail, for example as a result of physical damage such as fire or water damage, or due to faults that should not happen at all according to the manufacturer. The only thing that helps in the event of faults in the disk subsystem is to mirror the data on two disk subsystems.

Mirroring (RAID 1) is a form of virtualization, for which various realization locations were discussed in Section 5.1. In contrast to classical RAID 1 within a disk subsystem, protection against the failure of the disk subsystem itself requires the data to be mirrored on two different disk subsystems, which are wherever possible separated by a fire protection wall and connected to two independent electric circuits.

From the point of view of reducing the load on the server, the realization of the mirroring by the disk subsystem in the form of remote mirroring is optimal (Figure 6.18; cf. also Section 2.7.2 and Section 5.1). From the point of view of fault-tolerance, however, remote mirroring through the disk subsystem represents a single point of failure: if the data is corrupted on the way to the disk subsystem (controller faults, connection port faults), the copy of the data is also erroneous. Therefore, from the point of view of fault-tolerance, mirroring in the volume manager or in the application itself is optimal (Figure 6.19). In this approach the data is written to two different disk subsystems via two different physical I/O paths.

A further advantage of volume manager mirroring compared to remote mirroring is due to the way the two variants are integrated into the operating system. Volume manager mirroring is an integral component of every good volume manager: the volume manager reacts automatically to the failure and the restarting of a disk subsystem. On the other hand, today's operating systems in the Open System field are not yet good at handling copies of disks created by a disk subsystem. Switching to such a copy generally requires manual intervention. Although, technically, an automated reaction to the failure or the restarting of a disk subsystem is possible, this currently (2003) requires specially written scripts due to the lack of integration in the operating system.
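How a volume manager might react automatically to the failure and restart of a disk subsystem can be illustrated with a minimal Python sketch. It does not describe the implementation of any real volume manager: the classes `DiskSubsystem` and `MirroredVolume`, the in-memory block map and the bookkeeping of missed ("stale") blocks are all assumptions made for illustration.

```python
class DiskSubsystem:
    """Stand-in for one disk subsystem: a block map plus an online flag."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}


class MirroredVolume:
    """Toy volume-manager mirror across two disk subsystems."""

    def __init__(self, first, second):
        self.mirrors = [first, second]
        self.stale = {first.name: set(), second.name: set()}

    def write(self, block_no, data):
        if not any(m.online for m in self.mirrors):
            raise IOError("both disk subsystems have failed")
        for m in self.mirrors:
            if m.online:
                m.blocks[block_no] = data
                self.stale[m.name].discard(block_no)
            else:
                # Remember which blocks the failed mirror has missed.
                self.stale[m.name].add(block_no)

    def read(self, block_no):
        for m in self.mirrors:
            if m.online and block_no not in self.stale[m.name]:
                return m.blocks[block_no]
        raise IOError("no up-to-date copy available")

    def resync(self, restarted):
        """Copy missed blocks to a mirror that has come back online."""
        source = next(m for m in self.mirrors
                      if m is not restarted and m.online)
        for block_no in sorted(self.stale[restarted.name]):
            restarted.blocks[block_no] = source.blocks[block_no]
        self.stale[restarted.name].clear()


a, b = DiskSubsystem("A"), DiskSubsystem("B")
vol = MirroredVolume(a, b)
vol.write(0, b"x")
b.online = False     # disk subsystem B fails
vol.write(1, b"y")   # the write still succeeds on A
b.online = True      # B restarts
vol.resync(b)
print(a.blocks == b.blocks)
```

The point of the sketch is that the application keeps writing through the same logical volume while one subsystem is down, and the mirror is brought up to date again without manual steps.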

On the other hand, there are some arguments in favour of remote mirroring. In addition to the performance benefits discussed above, we should also mention the fact that remote mirroring is supported over greater distances than volume manager mirroring. As a rule of thumb, volume manager mirroring can be used up to a maximum distance of six to ten kilometres between server and disk subsystem; for greater distances remote mirroring currently has to be used.

Figure 6.20 shows how volume manager mirroring, server clustering and redundant I/O buses can be combined. In this configuration the management of the disks is somewhat more complicated: each server sees each disk made available by the disk subsystem four times because each host bus adapter finds each disk over two connection ports of the disk subsystem. In addition, the volume manager mirrors the data on two disk subsystems.

Figure 6.21 shows how the software in the server brings the disks recognized by the operating system back together again: the file system writes the data to a logical disk provided by the volume manager. The volume manager mirrors the data on two different virtual disks, which are managed by the multipathing software. The multipathing software also manages the four different paths to each of the two disks. It is not visible here whether the disks exported from the disk subsystem are also virtualized within the disk subsystem.

The configuration shown in Figure 6.20 offers good protection against the failure of various components, whilst at the same time providing a high level of availability of data and applications. However, this solution comes at a price. Therefore, in practice, sometimes one and sometimes another protective measure is dispensed with for cost reasons. Often, for example, the following argument is used: 'The data is mirrored within the disk subsystem by RAID and additionally protected by means of network back-up. That should be enough.'


Monday, March 31, 2008
