Saturday, August 11, 2012

Cloud Data Storage

1. Advantages of Cloud Data Storage

Storing extremely large volumes of information on a local area network (LAN) is expensive. High capacity electronic data storage devices like file servers, Storage Area Networks (SAN) and Network Attached Storage (NAS) provide high performance, high availability data storage accessible via industry standard interfaces. However, electronic data storage devices have many drawbacks, including that they are costly to purchase, have limited lifetimes, require backup and recovery systems, have a physical presence requiring specific environmental conditions, require personnel to manage and consume considerable amounts of energy for both power and cooling.
Cloud data storage providers, such as AmazonS3, provide cheap, virtually unlimited electronic data storage in remotely hosted facilities. Information stored with these providers is accessible via the internet or Wide Area Network (WAN). Economies of scale enable providers to supply data storage cheaper than the equivalent electronic data storage devices.
Cloud data storage has many advantages. It’s cheap, doesn’t require installation, doesn’t need replacing, has backup and recovery systems, has no physical presence, requires no environmental conditions, requires no personnel and doesn’t require energy for power or cooling. Cloud data storage however has several major drawbacks, including performance, availability, incompatible interfaces and lack of standards.

2. Disadvantages of Cloud Data Storage

Performance of cloud data storage is limited by bandwidth. Internet and WAN speeds are typically 10 to 100 times slower than LAN speeds. For example, accessing a typical file on a LAN takes 1 second, accessing the same file in cloud data storage may take 10 to 100 seconds. While consumers are used to slow internet downloads, they aren’t accustomed to waiting long periods of time for a document or spreadsheet to load.
Availability of cloud data storage is a serious issue. Cloud data storage relies on network connectivity between the LAN and the cloud data storage provider. Network connectivity can be affected by any number of issues including global networks disruptions, solar flares, severed underground cables and satellite damage. Cloud data storage has many more points of failure and is not resilient to network outages. Network outages mean the cloud data storage is completely unavailable.
Cloud data storage providers use proprietary networking protocols often not compatible with normal file serving on the LAN. Accessing cloud data storage often involves ad hoc programs to be created to bridge the difference in protocols.
The cloud data storage industry doesn’t have a common set of standard protocols. This means that different interfaces need to be created to access different cloud data storage providers. Swapping or choosing between providers is complicated as their protocols are incompatible.
The cloud drive data storage is small enough to be used on laptops while having enterprise class features that enable it to be scaled out to the largest organization.

3. Cloud Drive Architecture

Cloud Drive is a gateway to cloud storage. Cloud Drive supports many cloud data storage providers including Microsoft Azure, Amazon S3, Amazon EC2, Rackspace, EMC Atmos, Nirvanix, GoGrid, vcloud, Zetta, Scality, Dunkel, Mezeo,, Webdav and FTP. Cloud Drive hides the complexity of the underlying protocols allowing you to deploy cloud storage as simply as deploying storage via an IP SAN.
Cloud Drive is like an IP SAN that never runs out of space. As usage increases, Cloud Drive starts “offloading” data to the cloud data provider. Cloud Drive caches and optimizes traffic to/from cloud storage dramatically increasing performance and availability while also reducing network traffic.
Computers on the LAN access data via the block based iSCSI protocol. The storage service communicates via an internet connection with the cloud data storage provider. When the iSCSI initiator saves data to the data storage server, it initially stores the data in the local cache. Each data unit is uniquely located within the local cache and is flagged as either “online” in the local cache or “offline” in the cloud data storage provider. All data units in the local cache are checked periodically for usage. Least recently used (or “dormant”) data units are uploaded to the cloud data storage provider, flagged as “offline” and deleted from the local cache.

4. Cloud Drive Storage Service

The cloud drive storage service is simple to install and configure. It can be installed on a range of hardware, from laptop for personal use, a server in the office, or a cluster of high end 64 bit servers for the enterprise. Once the service in installed and configured, many clients can connect to it using the iSCSI protocol.
The storage service reduces the data storage requirements while maintaining performance by moving the least recently used data to the cloud data storage provider as well as one or more of the data storage accelerators. Cloud Drive accelerates performance by assuming that actual writes to data can happen anytime before a subsequent read to the same data. Cloud Drive accelerates performance by scheduling this “delayed” write data to periods of low activity and by not downloading data from the cloud data storage provider when the “delayed” write data has wholly overwritten data stored in the cloud. Cloud Drive further accelerates performance by assuming that delete operations can happen anytime after the data is downloaded.

5. Cloud Drive Optimizer

An optional component, cloud drive optimizer, improves performance, reduces bandwidth and reduces your data storage requirements. The optimizer should be installed on all iSCSI clients using the cloud drive storage service.
The data storage optimizer has access to the virtual hard drive to optimize the data stored in the local cache. The optimizer periodically reads virtual hard drive or virtual file share metadata including directories, filenames, permissions and attributes in order to maintain that data in the local cache. In this way, the data storage optimizer also accelerates performance of the data storage server by preventing data other than file data from being identified as “dormant”. The data storage optimizer also reduces storage requirements of the data storage server by periodically overwriting “all zeros” to unused parts of the virtual hard drive. The data storage optimizer is also adapted to periodically run disk checking utilities against the virtual hard drive to prevent important internal file systems data structures from being marked as dormant.

6. Cloud Drive Network Accelerator

An optional component, cloud drive network accelerator, improves the performance and availability of the Storage Service. This component can be installed on all computers in the home, office or enterprise.
The network accelerators allow the office to reclaim all those “small spaces” of data storage already available on the 10’s, 100’s or 1000’s of computers within the enterprise. A typical office with 100 computers having on average 100 GB of space available could potentially reclaim 100 x 100 GB = 10 TB of data storage space by reclaiming and consolidating this unused space. Network accelerators boost performance and improve resilience to slowness or unavailability of the cloud data storage providers by redundantly storing data uploaded to the cloud data storage provider on the local network in the already existing “unused spaces”
Network accelerators work like a massive cache within the enterprise. In the above example, the Storage Services local cache is complimented by a 10 TB onsite cache running throughout the enterprise.

7. Cloud Drive Solution

Cloud Drive increases the apparent availability of the cloud data storage provider. If the local cache satisfies 99% of requests for data without requiring the cloud data storage provider, the apparent availability of the cloud data storage provider is increased 100 fold and 99% of data accesses occur at local network speeds rather than the network connection speeds to the cloud data storage provider. Cloud Drive also manages the data formatting and communication with the cloud data storage provider while allowing seamless access to data using standard protocols such as iSCSI and NFS. Further, Cloud Drive allows concurrent processing of read and writes requests to different data as well as synchronized and serialized access to the same data.
Cloud Drive virtualizes data storage by allowing a limited amount of physical data storage to appear many times larger than it actually is. Cloud Drive allows fast, expensive physical data storage to be supplemented by cheaper, slower remote data storage without incurring substantial performance degradation Cloud Drive also reduces the physical data storage requirements to a small fraction of the total storage requirements, while the rest of the data can be “offloaded” into slower, cheaper online cloud data storage providers.