Data & System Availability


 

The dynamics of applications and desktops are making them location, device and time independent. Data and systems have completely different availability requirements. They are typically stored in a datacenter that is not dynamically provisioned, although we may see that change in the near future with the upcoming cloud computing initiatives.

 

Data & System Availability Solutions Schema

Figure: PQR Data & System Availability solutions

In a modern ICT infrastructure, users work with applications. The applications use data, and that data is delivered by servers and stored and backed up centrally. To give an overview of the products needed for datacenter solutions and to show clearly which components are used, we have developed the Data & System Availability solutions schema shown here.

The various parts are explained below. To download the whitepaper, please go to the Data & System Availability whitepaper.

 

 

 

Dynamic Datacenter

The dynamic datacenter is often referred to as 'the Cloud'. To build and install a complete dynamic datacenter, the basic infrastructure needs to be set up accurately. What distinguishes a dynamic datacenter is that its workflows are automated as completely as possible. This makes the costs fully transparent, so the ROI becomes directly measurable. The dynamic datacenter thus becomes a strategic tool that can support business decisions. Key to a dynamic datacenter are the management tools that lift the datacenter from a static to a flexible and dynamic infrastructure. To give a complete view of the dynamic datacenter, we have developed a special landing page: www.dynamischdatacenter.com (in Dutch).

 

Servers

Servers provide users and applications with the services they need. Services can be anything: web services, file and print services, authentication, database services, etc. In a traditional datacenter, these services mostly run on physical servers. A physical server rarely matches what a service needs: it has either far too much storage, CPU power and memory, or too little. When there are not enough resources available, an upgrade usually adds more than is required, so the server ends up over-dimensioned. Physical servers with local storage also have disadvantages that limit their availability. If a physical server fails, the service is no longer available. A new server has to be set up, data restored and settings reconfigured, a process that can take up to several days.

 

Storage

To cope with these availability problems, it makes sense to start with centralizing the storage. This makes it easier to allocate the right amount of storage to a service and makes it easier for the service to access it from another location, thus enhancing its availability. Centralizing storage also has some disadvantages. All storage is now on one system that becomes a new single point of failure. If it fails, the whole infrastructure fails. So this central storage has to be redundant in every aspect. It needs redundant connections, redundant switches, redundant power, redundant hard disks, redundant everything. This is what makes a Storage Area Network (SAN) more expensive than local storage.

 

The SAN infrastructure

Connectivity to the SAN can be divided into two main groups: Fibre Channel (FC) and Ethernet.

Fibre Channel provides the best performance, but it is also the most expensive. A very valid question in designing a storage infrastructure therefore is 'does the customer really need that high-end performance?'. The alternatives aren't that far behind anymore. Ethernet-based infrastructures are less expensive because connectivity takes place over regular Ethernet switches and regular Network Interface Cards (NICs). Not too long ago, iSCSI was the main storage protocol used over Ethernet. It allows LUNs to be presented as full disks to a host. With the rise of virtualization technology, however, NAS has become a strong contender too. Whether it's NFS or CIFS, a host simply connects to a network share and stores its data on the file system that the storage provides. This flexibility has some disadvantages though. Hosts no longer manage the storage, and proprietary file systems like VMFS don't work on it. On the other hand, a storage solution with a smart file system like NetApp's Write Anywhere File Layout (WAFL) makes it very easy, with the right toolset, to work with (consistent!) snapshots.


Thin provisioning

With a SAN, data per gigabyte is more expensive than with local storage. The advantages of having it available independent of the servers make up for a lot of the cost but it's still better to be conservative with allocating storage. Application developers or server administrators tend to ask for more storage than they actually need.

One solution to this problem is to give them the storage they need, but only actually store what they really use. This is called 'thin provisioning'. It's a smart way to dynamically size the LUN on the array as it's needed.

Figure 2: Thin Provisioning
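
The idea of thin provisioning can be illustrated with a minimal Python sketch. It is not how any particular array implements it; the class name `ThinVolume`, the 4096-byte block size and the in-memory dictionary are assumptions made purely for illustration. The volume advertises its full size to the host but only consumes space for blocks that have actually been written.

```python
class ThinVolume:
    """Toy thin-provisioned volume: large advertised size, blocks allocated on first write."""

    def __init__(self, advertised_blocks, block_size=4096):
        self.advertised_blocks = advertised_blocks
        self.block_size = block_size
        self._blocks = {}  # block number -> data; only written blocks are stored

    def _check(self, block_no):
        if not 0 <= block_no < self.advertised_blocks:
            raise IndexError("block outside the advertised volume size")

    def write(self, block_no, data):
        self._check(block_no)
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        # Pad to a full block so every allocated block has the same size.
        self._blocks[block_no] = data.ljust(self.block_size, b"\0")

    def read(self, block_no):
        self._check(block_no)
        # Unwritten blocks read back as zeros without consuming any space.
        return self._blocks.get(block_no, b"\0" * self.block_size)

    @property
    def advertised_bytes(self):
        return self.advertised_blocks * self.block_size

    @property
    def allocated_bytes(self):
        return len(self._blocks) * self.block_size
```

Comparing `advertised_bytes` with `allocated_bytes` shows the gap between what was requested and what is really in use, which is the space thin provisioning avoids reserving up front.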

Linked Clones

Another way to save storage is to use linked clones. The principle of this technique is that it provides one set of data to multiple virtual machines, while keeping track of the differences between them and storing those differences in a separate location. When this is done on the array, the performance impact is negligible.

A physical server can also provide virtual machines with linked clone disks. This is a little bit slower and does take some CPU resources away from the VMs but it doesn't need an intelligent storage array and is also a very good solution.

Figure 3: Linked Clones
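
The principle behind linked clones is copy-on-write. The sketch below is a simplified illustration, not a vendor implementation; the class name `LinkedClone` and the dictionary-based "disks" are hypothetical. Each clone reads from a shared base image and keeps only its own changed blocks in a private delta.

```python
class LinkedClone:
    """Toy linked clone: shared read-only base image plus a private delta of changes."""

    def __init__(self, base):
        self._base = base   # shared by all clones, never modified here
        self._delta = {}    # block number -> data written by this clone only

    def read(self, block_no):
        # A block changed by this clone wins over the shared base block.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base[block_no]

    def write(self, block_no, data):
        # Writes never touch the base; they land in the clone's own delta.
        self._delta[block_no] = data

    def delta_blocks(self):
        return len(self._delta)


base_image = {0: b"operating system", 1: b"applications"}
vm_a = LinkedClone(base_image)
vm_b = LinkedClone(base_image)
vm_a.write(1, b"applications + patch")
```

After the write, only `vm_a` stores an extra block; `vm_b` and the base image are unchanged, so the storage cost of each clone is limited to its differences.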

Deduplication

At the moment, deduplication is mainly used in backup scenarios. That means that data is first stored on a main storage system and is deduplicated at backup time on a separate system or a different tier in the storage system. The reason it is not yet used on active data is mainly that deduplication is a very computation-intensive process that, at the moment, simply isn't fast enough for modern storage demands.

The deduplication process works by first accepting all data. Then, either inline or in a background process, it compresses the data and checks, at block level, whether each block already exists. If it does, the process simply points to the existing block; if not, the new block is stored. This can reduce the data size of multiple backups by 50% to even 90% compared with a traditional backup data set.

Figure 4: Deduplication
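
The block-level check described above can be sketched as follows. This is a minimal illustration under stated assumptions: it identifies blocks by their SHA-256 digest, uses a fixed 4096-byte block size, and leaves out the compression step; the class name `DedupStore` is hypothetical. Real products differ in how they chunk, fingerprint and store blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class DedupStore:
    """Toy block-level deduplicating store keyed by SHA-256 digest."""

    def __init__(self):
        self._blocks = {}       # digest -> block data, each unique block stored once
        self.logical_bytes = 0  # total bytes handed to put()

    def put(self, data):
        """Store data and return its 'recipe': the ordered list of block digests."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Known block: only the pointer is kept. New block: store it.
            self._blocks.setdefault(digest, block)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self._blocks[digest] for digest in recipe)

    @property
    def stored_bytes(self):
        return sum(len(block) for block in self._blocks.values())


store = DedupStore()
monday = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
tuesday = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # only the second block changed
monday_recipe = store.put(monday)
tuesday_recipe = store.put(tuesday)
```

Both backups remain fully restorable, while the unchanged first block is stored only once; the ratio of `stored_bytes` to `logical_bytes` is the saving.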

Archiving

Because a central high-performance storage system can be quite expensive, a lot of companies decide to move less-used data to less expensive high-capacity storage. This is typically done by setting up the storage in multiple tiers. The process can take place entirely inside the storage system, which moves data at block level to a slower, and therefore less expensive, set of disks. When the data is accessed again, it is moved back to the fast storage tier. Clients and applications can access all the data because it stays online at all times. Another way to archive data is to have a data management solution decide what data to move. This is then done at file, mail or database object level. The advantage of this approach is that it actually moves data out of the systems, possibly leaving a so-called 'stub' behind as a reference for clients and applications. The drawback is that when the data is accessed again, it needs to be restored from another location, which can be a time-consuming process. On the other hand, this significantly reduces the active data size, which in turn reduces backup time by large factors.


Indexing service

When data is moved between different storage tiers or systems, clients, applications and backup systems can get confused about where the data is actually stored.

An indexing server keeps track of the location of all the data in a storage system. It interfaces with the archiving solution and provides a transparent interface to clients and applications. The archiving solution moves data back to other tiers on demand.

Figure 5: Indexing server
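
How an index can hide the physical location of archived data is shown in the hedged sketch below. The class `TieredStore`, the two tier names and the recall-on-read behaviour are assumptions for illustration only; they do not describe a specific indexing or archiving product.

```python
class TieredStore:
    """Toy two-tier store with an index that records where each object lives."""

    def __init__(self):
        self._tiers = {"fast": {}, "archive": {}}
        self._index = {}  # object name -> name of the tier currently holding it

    def put(self, name, data):
        self._tiers["fast"][name] = data
        self._index[name] = "fast"

    def archive(self, name):
        """Move an object to the archive tier; the index entry acts like a stub."""
        if self._index[name] == "archive":
            return
        self._tiers["archive"][name] = self._tiers["fast"].pop(name)
        self._index[name] = "archive"

    def locate(self, name):
        return self._index[name]

    def read(self, name):
        """Clients read by name only; archived data is recalled on demand."""
        if self._index[name] == "archive":
            self._tiers["fast"][name] = self._tiers["archive"].pop(name)
            self._index[name] = "fast"
        return self._tiers["fast"][name]


files = TieredStore()
files.put("report.doc", b"quarterly figures")
files.archive("report.doc")
```

A client that calls `files.read("report.doc")` never needs to know which tier held the object; the index resolves the location and the data is moved back before it is returned.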


Virtualization

Once the availability of data is improved, it's time to do the same for the servers. Having data online is only half the solution. Without services to deliver it to the clients and applications, it is of no more use than a backup. A physical solution to improve server availability is clustering. Clustered systems require shared storage or have their own copy of the data that is kept in sync by using application-level replication. Another way to improve server availability is virtualization. Virtual machines are independent of the physical hardware and can very easily be moved from one host to another, whether this host is on the same site or a failover site. Higher server availability can be achieved by a virtualization solution that actively monitors all virtual machines and, in case of a physical host failure, automatically restarts the virtual machines on another host. Depending on the management tools available, it's also possible to load balance all virtual machines across the available physical hosts by implementing live migration options.

There are two main types of hypervisors for virtualization solutions: the thin hypervisor, also called microkernelized hypervisor, and the thick hypervisor, also called monolithic hypervisor. Thin hypervisors are used by virtualization solutions like XenServer and Hyper-V, while ESX uses a thick hypervisor.

Thin hypervisor

Thin hypervisors are intended only to translate calls from guests to hypercall instructions. All the emulation, drivers and scheduling are handled by a separate operating system that provides just that. As a result, the hypervisor cannot exist on its own: it always needs an OS like Linux or Windows to provide full virtualization functionality. This does make it very accessible, though. The hypervisor simply becomes a part of an operating system that everybody knows.

Thick hypervisor

The thick or monolithic hypervisor provides everything needed to run virtual machines, such as drivers, the scheduler, VM monitoring and maintenance. This means that this type of hypervisor can run completely standalone. The management OS (service console) is not necessary for normal operation; it just makes monitoring and configuring the host easier.


Storage replicated failover site

The story doesn't end when the datacenter is protected against hardware component failures. Bigger calamities can cause a whole site to fail, because of a flood or fire or simply because of a power outage that lasts longer than the UPSs can handle. There are different scenarios that help insure against these kinds of catastrophes. Requirements on how much data may be lost (Recovery Point Objective, RPO) and how long systems can be down (Recovery Time Objective, RTO) dictate the right solution.

 

First of all, to failover to another site, the data has to be available there. It's a matter of requirements what the right method of replication is. If no data can be lost, the only option is synchronous replication. This is however the most critical and expensive solution. With asynchronous replication, the data loss is typically very low, but not zero. The disadvantage of storage replication is that it's unaware of applications or data consistency. If data corruption occurs on the main site, it gets replicated to the standby site immediately.
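
The difference between the two replication modes can be made concrete with a small simulation. This is a sketch under simplifying assumptions, not a model of any storage product: the class `ReplicatedVolume` and its methods are hypothetical, and network latency is reduced to a queue of writes that have been acknowledged but not yet shipped.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of synchronous versus asynchronous storage replication."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.standby = []
        self._in_flight = deque()  # acknowledged on the primary, not yet on the standby

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            # The write is acknowledged only once the standby has it as well.
            self.standby.append(record)
        else:
            # The write is acknowledged at once and shipped later.
            self._in_flight.append(record)

    def ship(self, count):
        """Transfer up to `count` queued writes to the standby site."""
        for _ in range(min(count, len(self._in_flight))):
            self.standby.append(self._in_flight.popleft())

    def lost_on_failover(self):
        """Writes that would be missing if the primary site failed right now."""
        return list(self._in_flight)


sync_volume = ReplicatedVolume(synchronous=True)
async_volume = ReplicatedVolume(synchronous=False)
for record in ["w1", "w2", "w3"]:
    sync_volume.write(record)
    async_volume.write(record)
async_volume.ship(2)  # the link has only caught up with two of the three writes
```

In this model the synchronous volume never has anything to lose on failover, while the asynchronous volume loses whatever is still queued, which is the data loss that the RPO has to allow for.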

 

Once the data is available at the failover site, servers are needed to access it and provide services. Servers at the recovery site can be virtual as well as physical, identical to the main datacenter. With physical servers it is, generally speaking, more difficult to restore services than it is with virtual servers. The main reason is that virtual machines are hardware independent and can move, even without downtime, from one physical machine to another. When an outage of the main site lasts longer, data changes made by clients and applications need to be backed up. This means that a replication site that can take over production needs its own full offsite infrastructure.

 

Application replication failover site

In order to replicate at application level, both sides need to have an active server. These servers run an application that has its own method of synchronizing application data. For example, databases or mail servers store transactions in local log files that are transmitted to a secondary system. This passive system applies the transaction logs to its own copy of the data. This method can result in data loss if transactions completed on the main site are not replicated to the standby system before the main system crashes. If set up properly, this data loss can be minimized but will never be zero. The advantage of application replication is that the standby system can be guaranteed to be consistent.
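
Log-based application replication can be sketched as follows. The classes `Database` and `Standby` and the function `ship_log` are hypothetical names for a simplified illustration; real databases and mail servers use their own log formats and shipping mechanisms. The standby only applies complete log entries in sequence, which is why it stays consistent even when it lags behind.

```python
class Database:
    """Toy primary: every committed change is appended to a transaction log."""

    def __init__(self):
        self.data = {}
        self.log = []  # entries of (sequence number, key, value)

    def commit(self, key, value):
        sequence = len(self.log) + 1
        self.log.append((sequence, key, value))
        self.data[key] = value


class Standby:
    """Toy passive system: replays shipped log entries strictly in order."""

    def __init__(self):
        self.data = {}
        self.applied_seq = 0

    def replay(self, entries):
        for sequence, key, value in entries:
            if sequence != self.applied_seq + 1:
                raise ValueError("gap in transaction log")
            self.data[key] = value
            self.applied_seq = sequence


def ship_log(primary, standby, up_to_seq):
    """Send the log entries the standby has not applied yet, up to `up_to_seq`."""
    pending = [e for e in primary.log if standby.applied_seq < e[0] <= up_to_seq]
    standby.replay(pending)


primary = Database()
primary.commit("a", 1)
primary.commit("b", 2)
primary.commit("a", 3)       # committed, but the primary "crashes" before shipping it
standby = Standby()
ship_log(primary, standby, up_to_seq=2)
unshipped = len(primary.log) - standby.applied_seq
```

Here the standby holds a consistent state as of the second transaction, and `unshipped` counts the transactions that would be lost on failover.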


System portability

System portability (SP) is in essence a form of server replication. It replicates the system and data drives without awareness of the applications. This can be done at file system level, at block level or using the Volume Shadow Copy Service (VSS). The first and third options can maintain consistency of the data on the replicated server; the second one can't.

This replication process uses a technique called X2V or Any To Virtual. This means it can replicate virtual as well as physical machines. This process can be a full replication or an incremental replication. Besides replicating to Direct Attached Storage (DAS) or Network Attached Storage (NAS) on the system portability host, a typical solution also incorporates a virtualization solution. That means that once the protected systems are replicated, they can easily be started on the host itself. Replication can be on a file level basis or on a block level basis. For relatively static servers, it's usually better to use file level replication while for transaction rich hosts it's usually better to choose block level.

Figure 6: OS Portability
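
The difference between a full and an incremental block-level replication can be illustrated with the sketch below. It is an assumption-laden toy, not the X2V mechanism of any product: disks are plain lists of byte blocks, changed blocks are detected by comparing SHA-256 digests, and the function name `replicate` is hypothetical.

```python
import hashlib


def _digest(block):
    return hashlib.sha256(block).digest()


def replicate(source, replica):
    """Copy only new or changed blocks from source to replica; return how many were copied."""
    copied = 0
    for number, block in enumerate(source):
        if number >= len(replica):
            replica.append(block)      # block does not exist on the replica yet
            copied += 1
        elif _digest(replica[number]) != _digest(block):
            replica[number] = block    # block changed since the last run
            copied += 1
    del replica[len(source):]          # drop blocks the source no longer has
    return copied


protected_disk = [b"boot", b"operating system", b"data v1"]
replica_disk = []
full_run = replicate(protected_disk, replica_disk)         # first run copies everything
protected_disk[2] = b"data v2"
incremental_run = replicate(protected_disk, replica_disk)  # later runs copy only changes
```

The first run behaves as a full replication; every later run transfers only the blocks that differ, which keeps the load on the protected system low.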


Backup

It makes sense to have the backup offsite from the main site. In case of loss of the main datacenter, the data will still be available somewhere else. Having the backup offsite does not qualify as a disaster recovery solution since there are no servers at the recovery site to deliver the data to clients and applications. Also, there is usually no infrastructure present at a site that just delivers backup services. Building a production environment at such a site is often physically impossible.

 

Creating a traditional backup requires an agent inside the machine. It doesn't matter if this is a physical or a virtual machine. From this agent, data is sent directly to the backup server over the LAN. When these servers are virtual, it's also possible to create a backup straight from the virtual hard disk that resides on the SAN. Creating a backup this way leaves all resources available for virtual machines to run production instead of needing them to run backups. The disadvantage of this method is that it takes a snapshot of the data without application awareness so the data that's being backed up is in an inconsistent state. This method therefore only works for crash recoverable services.

Figure 7: Proxy VM backup


More information

If, after reading this whitepaper, more information is needed, please contact PQR at 0031‑30‑6629‑729. A printed copy of the diagrams from this article is available at http://www.pqr.com or http://www.virtuall.nl, or can be requested from PQR by email.

 
