KNOW MORE ABOUT SYMMETRIC AND ASYMMETRIC STORAGE VIRTUALIZATION IN THE NETWORK
The symmetric and asymmetric virtualization models are the two representative approaches to storage virtualization in the network. Both can perform virtualization on block level and on file level. In both models, the virtualization entity that separates physical from logical storage is placed in the storage network in the form of a specialized server or device. This entity holds all the meta-information needed for virtualization and is therefore also called the metadata controller. Its duties also include the management of storage resources and the control of all storage functions that are offered in addition to virtualization.
Symmetric and asymmetric virtualization differ primarily in how they distribute data flow and control flow. Data flow is the transfer of application data between the servers and the storage devices. Control flow consists of all the metadata and control information that the virtualization entity exchanges with the storage devices and servers for the purposes of virtualization. In symmetric storage virtualization the data flow and the control flow travel down the same path. By contrast, in asymmetric virtualization the data flow is separated from the control flow.
Symmetric storage virtualization
In symmetric storage virtualization the data flow and the control flow go down the same path. This means that the abstraction from physical to logical storage necessary for virtualization must take place within the data flow. As a result, the metadata controller is positioned precisely in the data flow between servers and storage devices, which is why symmetric virtualization is also called in-band virtualization. In addition to the control of the virtualization, all data between servers and storage devices now flow through the metadata controller. To this end, virtualization is logically structured in two layers, the layer for the management of the logical volumes and the data access layer:
1. The volume management layer is responsible for the management and configuration of the storage devices that can be accessed directly or via a storage network and it provides the aggregation of these resources into logical disks.
2. The data access layer makes the logical drives available for access either on block or file level, depending upon what degree of abstraction is required. These logical drives can thus be made available to the application servers by means of appropriate protocols. In the case of virtualization on block level, this occurs in the form of a virtual disk and in the case of virtualization on file level it takes place in the form of a file system.
In symmetric virtualization all data flow through the metadata controller, which therefore represents a potential bottleneck. To increase performance, the metadata controller is equipped with a cache. With caching, symmetric virtualization can even improve the performance of an existing storage network, as long as the applications in use are not exclusively write-intensive.
A further issue is fault-tolerance. A single metadata controller represents a single point of failure. The use of cluster technology (6.3.2) makes it possible to remove the single point of failure by using several metadata controllers in parallel. In addition, a corresponding load distribution provides a performance increase. However, a configuration failure or a software failure of that cluster can lead to data loss on all virtualized resources. In the case of a network-based virtualization spanning several servers and storage devices, this can halt the activity of a complete data centre.
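The two layers of the symmetric model can be illustrated with a minimal Python sketch. It is not taken from any product: the class names PhysicalDevice, VolumeManager and InBandController are hypothetical, devices are simulated in memory, and simple concatenation is assumed as the aggregation scheme. The point it shows is that every read and write passes through the controller, which is what places it in the data path.

```python
class PhysicalDevice:
    """A toy block device: a fixed number of blocks held in memory."""

    def __init__(self, name, num_blocks):
        self.name = name
        self.blocks = [b""] * num_blocks


class VolumeManager:
    """Volume management layer: concatenates devices into one logical disk."""

    def __init__(self, devices):
        self.devices = devices

    def size(self):
        return sum(len(d.blocks) for d in self.devices)

    def resolve(self, logical_block):
        """Map a logical block number to (device, physical block number)."""
        if logical_block < 0:
            raise IndexError("negative logical block")
        offset = logical_block
        for device in self.devices:
            if offset < len(device.blocks):
                return device, offset
            offset -= len(device.blocks)
        raise IndexError("logical block beyond end of virtual disk")


class InBandController:
    """Data access layer: all application data passes through this object."""

    def __init__(self, volume_manager):
        self.volume_manager = volume_manager
        self.blocks_transferred = 0  # data flow seen by the controller

    def write(self, logical_block, data):
        device, physical_block = self.volume_manager.resolve(logical_block)
        device.blocks[physical_block] = data
        self.blocks_transferred += 1

    def read(self, logical_block):
        device, physical_block = self.volume_manager.resolve(logical_block)
        self.blocks_transferred += 1
        return device.blocks[physical_block]


jbod_a = PhysicalDevice("jbod_a", 4)
jbod_b = PhysicalDevice("jbod_b", 4)
controller = InBandController(VolumeManager([jbod_a, jbod_b]))

controller.write(5, b"payload")       # logical block 5 lands on jbod_b, block 1
print(jbod_b.blocks[1])               # b'payload'
print(controller.blocks_transferred)  # 1
```

The application server only ever sees logical block numbers. Because the counter on the controller grows with every transfer, the sketch also makes the potential bottleneck visible.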
The advantages of symmetric virtualization are:
• The application servers can easily be provided with data access both on block and file level, regardless of the underlying physical storage devices.
• The administrator has complete control over which storage resources are available to which servers at a central point. This increases security and eases the administration.
• Assuming that the appropriate protocols are supported, symmetric virtualization is not limited to specific operating system platforms. It can thus also be used in heterogeneous environments.
• The performance of existing storage networks can be improved by the use of caching and clustering in the metadata controllers.
• The use of a metadata controller means that techniques such as snapshots or mirroring can be implemented in a simple manner, since the controller manages the storage access directly. These techniques can also be used on storage devices such as JBODs or simple RAID arrays that do not provide them themselves.
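The effect of a cache in the metadata controller can be sketched as follows. This is an illustration under stated assumptions, not a description of any real controller: the names Backend and CachingController are hypothetical, the eviction policy is assumed to be LRU, and the cache is assumed to be write-through, so every write still reaches the physical storage.

```python
from collections import OrderedDict


class Backend:
    """Stands in for the physical storage behind the controller."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read(self, block):
        self.reads += 1
        return self.blocks.get(block, b"")

    def write(self, block, data):
        self.writes += 1
        self.blocks[block] = data


class CachingController:
    """In-band controller with a small LRU read cache (write-through)."""

    def __init__(self, backend, capacity):
        self.backend = backend
        self.capacity = capacity
        self.cache = OrderedDict()

    def _remember(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)
            return self.cache[block]        # served without touching storage
        data = self.backend.read(block)
        self._remember(block, data)
        return data

    def write(self, block, data):
        self.backend.write(block, data)     # every write still reaches storage
        self._remember(block, data)


backend = Backend()
controller = CachingController(backend, capacity=2)
controller.write(0, b"a")
for _ in range(10):
    controller.read(0)
print(backend.reads, backend.writes)  # 0 1
```

Under these assumptions, repeated reads are absorbed by the cache while each write costs one access to the storage device. This is consistent with the observation that caching helps unless the workload is exclusively write-intensive.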
The disadvantages of symmetric virtualization are:
• Each individual metadata controller must be administered. If several metadata controllers are used in a cluster arrangement, then the administration is relatively complex and time-consuming, particularly due to the cross-computer data access layer. This disadvantage can, however, be reduced by the use of a central administration console for the metadata controllers.
• Several controllers plus cluster technology are indispensable to guarantee the fault tolerance of data access.
• As an additional element in the data path, the controller can lead to performance problems, which makes the use of caching or load distribution over several controllers indispensable.
• It can sometimes be difficult to move data between storage devices if these are managed by different metadata controllers.
Asymmetric storage virtualization
In contrast to symmetric virtualization, in asymmetric virtualization the data flow is separated from the control flow. This is achieved by moving all mapping operations from logical to physical drives to a metadata controller outside the data path. The metadata controller now only has to look after the administrative and control tasks of virtualization, while the data flows directly between the application servers and the storage devices. As a result, this approach is also called out-band virtualization. The communication between metadata controller and agents generally takes place via the LAN (out-band) but can also be realized in-band via the storage network. Hence, in our opinion the terms 'in-band virtualization' and 'out-band virtualization' are a little misleading. Therefore, we use instead the terms 'symmetric virtualization' and 'asymmetric virtualization' to refer to the two network-based virtualization approaches.
As in the symmetric approach, the metadata controller is logically structured in two layers (Figure 7.9). The volume management layer has the same duties as in the symmetric approach. The second layer is the control layer, which is responsible for the communication with agent software that runs on the servers.
The agent is required in order to enable direct access to the physical storage resources. It is made up of a data access layer, with the same tasks as in symmetric virtualization, and a control layer. Via the latter it loads the appropriate location and access information about the physical storage from the metadata controller when the virtual storage is accessed by the operating system or an application. In this manner, access control to the physical resources is still centrally managed by the metadata controller. An agent need not necessarily run in the memory of the server. It can also be integrated into a host bus adapter. This has the advantage that the server is freed from the processing necessary for virtualization.
In asymmetric storage virtualization, as in symmetric storage virtualization, advanced storage functions such as snapshots, mirroring or data migration can be realized. The asymmetric model is not as easy to realize as the symmetric one, but performance bottlenecks caused by an additional device in the data path do not occur. If performance is to be increased by caching both application data and metadata, this caching must be implemented locally on every application server. The caching algorithm becomes very complex because the environment is distributed and every agent holds its own cache. Data inconsistencies as a result of different cache contents for the same underlying physical storage contents must be avoided, and error situations must be prevented in which an application that still has data in the cache crashes. Therefore, additional mechanisms are necessary to guarantee the consistency of the distributed cache. Alternatively, a dedicated cache server that devotes itself exclusively to caching the data flow could be installed in the storage network. Unfortunately, such products are not currently (end of 2003) available on the market.
Metadata controllers can also be constructed as clusters to distribute the load of the control flow and to increase fault-tolerance. The implementation is easier with the asymmetric approach than with the symmetric one, since only the control flow has to be divided over several computers. In contrast to the symmetric approach, the data flow does not have to be split.
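The separation of control flow and data flow can be made concrete with a small Python sketch. All names here (StorageDevice, MetadataController, Agent, locate) are hypothetical, the devices are simulated in memory, and concatenation is again assumed as the mapping scheme. The agent asks the controller only for the location of a block and then accesses the device directly. It also keeps a simple local cache of location information, which in this sketch is never invalidated.

```python
class StorageDevice:
    """A toy block device reachable directly from the application server."""

    def __init__(self, name, num_blocks):
        self.name = name
        self.blocks = [b""] * num_blocks


class MetadataController:
    """Holds the logical-to-physical mapping; never touches application data."""

    def __init__(self, devices):
        self.devices = devices
        self.control_requests = 0

    def locate(self, logical_block):
        """Control flow: return (device name, physical block)."""
        self.control_requests += 1
        offset = logical_block
        for device in self.devices:
            if 0 <= offset < len(device.blocks):
                return device.name, offset
            offset -= len(device.blocks)
        raise IndexError("logical block outside the virtual disk")


class Agent:
    """Runs on the application server (or on a host bus adapter)."""

    def __init__(self, controller, devices):
        self.controller = controller
        self.devices = {d.name: d for d in devices}  # direct paths to storage
        self.location_cache = {}                     # assumed: no invalidation

    def _locate(self, logical_block):
        if logical_block not in self.location_cache:
            location = self.controller.locate(logical_block)
            self.location_cache[logical_block] = location
        return self.location_cache[logical_block]

    def write(self, logical_block, data):
        name, physical_block = self._locate(logical_block)
        self.devices[name].blocks[physical_block] = data  # bypasses controller

    def read(self, logical_block):
        name, physical_block = self._locate(logical_block)
        return self.devices[name].blocks[physical_block]


devices = [StorageDevice("array_a", 4), StorageDevice("array_b", 4)]
controller = MetadataController(devices)
agent = Agent(controller, devices)

agent.write(6, b"payload")
print(agent.read(6))                # b'payload'
print(controller.control_requests)  # 1
```

The controller answers one control request and sees none of the payload. A real agent would additionally have to invalidate cached locations when the controller migrates data, which is where the consistency mechanisms discussed above come in.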
The following advantages of asymmetric virtualization can be established:
• Complete control of storage resources by an absolutely centralized management on the metadata controller.
• Maximum throughput between servers and storage devices by the separation of the control flow from the data flow, thus avoiding additional devices in the data path.
• In comparison to the development and administration of a fully functional volume manager on every server, the porting of the agent software is associated with a low cost.
• As in the symmetric approach, advanced storage functions such as snapshots or mirroring can be used on storage devices that do not themselves support these functions.
• To improve fault-tolerance, several metadata controllers can be brought together to form a cluster. This is easier than in the symmetric approach, since no physical connection from the servers to the metadata controllers is necessary for the data flow.
The disadvantages of asymmetric virtualization are:
• Special agent software is required on the servers or the host bus adapters. This can make it more difficult to use this approach in heterogeneous environments, since such software or a suitable host bus adapter must be present for every platform. Incompatibilities between the agent software and existing applications may sometimes make the use of asymmetric virtualization impossible.
• The agent software must be absolutely stable in order to avoid errors in storage accesses. In situations where there are many different platforms to be supported, this is a very complex development and testing task.
• The development cost increases further if the agent software and the metadata controller are also to permit access on file level in addition to access on block level.
• A performance bottleneck can arise as a result of the frequent communication between agent software and metadata controller. These performance bottlenecks can be remedied by the caching of the physical storage information.
• Caching to increase performance requires an ingenious distributed caching algorithm to avoid data inconsistencies. A further option would be the installation of a dedicated cache server in the storage network.