
Wednesday, 26 January 2011

The Cloud Vs Grid!!!

Some of the major issues with grid computing at the moment are:
  1. Its static nature: Users who require access to computational resources must ask the resource providers to host their apps on the providers' nodes; the apps then become available as a service to anyone with the right login credentials.
  2. Data transfer time: Data needs to be transferred manually to any resource that requires it. This can take a long time, further delaying the start of execution.
  3. Difficulties with fine-grained parallelism: Due to the latency of inter-node communication on the grid, approaches that rely on fine-grained parallelism, using parallel programming models such as MPI or OpenMP, are rendered impracticable. This limits the range of applications that can be run on the grid.
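To make the third point concrete, here is a minimal sketch of a toy model in which every compute step is followed by one message exchange. The function name and both latency figures are assumptions chosen for illustration, not measurements of any real grid or cluster.

```python
def parallel_efficiency(compute_s: float, latency_s: float) -> float:
    """Fraction of wall-clock time spent computing when each compute step
    of compute_s seconds is followed by one exchange costing latency_s."""
    if compute_s <= 0 or latency_s < 0:
        raise ValueError("compute_s must be positive and latency_s non-negative")
    return compute_s / (compute_s + latency_s)


# Hypothetical latencies, assumed purely for illustration.
WIDE_AREA_LATENCY_S = 50e-3   # nodes spread across sites
CLUSTER_LATENCY_S = 50e-6     # nodes on one cluster interconnect

if __name__ == "__main__":
    workloads = [("fine-grained", 1e-3), ("coarse-grained", 10.0)]
    networks = [("wide-area", WIDE_AREA_LATENCY_S), ("cluster", CLUSTER_LATENCY_S)]
    for workload, compute_s in workloads:
        for network, latency_s in networks:
            eff = parallel_efficiency(compute_s, latency_s)
            print(f"{workload:15s} {network:10s} efficiency = {eff:.1%}")
```

Under these assumed numbers, a step of one millisecond spends about 2% of its time computing over the wide-area link, while a ten-second step is barely affected. That is the sense in which latency rules out fine-grained work but not coarse-grained work.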
However, Cloud Computing can be used to resolve many of these issues:
  1. Pay-as-you-go paradigm: Cloud computing allows resources to be provisioned on demand and billed on a usage basis.
  2. Data transfer: As data held with Infrastructure as a Service (IaaS) providers can be moved by reference, data that is already on the cloud can be passed by reference to the node(s) that require it.
  3. Fine-grained parallelism: With HPC on the public Cloud becoming a reality through offerings such as Amazon’s Cluster Compute, it is possible to purchase time on a virtualised cluster that may be used for tightly-coupled parallelised processes (for instance, parallel computing applications for bioinformatics that cannot be solved using the Map/Reduce model, such as the construction of a Bayesian network).
As Cloud Computing can solve several of the issues inherent in grid computing, there is an emerging need to wrap up several of the tools used for (distributed and HPC) scientific programming into an SDK, overlaid with a workflow management system. This could be provided as a Platform as a Service.

The most interesting part of such an SDK, from this perspective, would be the aforementioned workflow management system*. The workflow management system could automate resource provisioning by requesting and obtaining the right type of resource (a cluster on the Cloud, or a number of nodes without spatial locality) for the type of process that is to be executed.
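The selection step described here could look something like the sketch below. Every name in it (Coupling, ResourceKind, Job, ResourceRequest, plan_resources) is hypothetical and made up for illustration; it calls no real cloud provider API, and a real system would pass the resulting request on to the provider's own provisioning interface.

```python
from dataclasses import dataclass
from enum import Enum


class Coupling(Enum):
    TIGHT = "tight"   # fine-grained: processes communicate frequently
    LOOSE = "loose"   # coarse-grained: tasks run largely independently


class ResourceKind(Enum):
    CLUSTER = "cluster"                      # co-located nodes on the Cloud
    INDEPENDENT_NODES = "independent_nodes"  # nodes without spatial locality


@dataclass(frozen=True)
class Job:
    name: str
    coupling: Coupling
    nodes: int


@dataclass(frozen=True)
class ResourceRequest:
    kind: ResourceKind
    nodes: int


def plan_resources(job: Job) -> ResourceRequest:
    """Pick the type of resource to request for a job (hypothetical policy)."""
    if job.nodes < 1:
        raise ValueError("a job needs at least one node")
    if job.coupling is Coupling.TIGHT:
        kind = ResourceKind.CLUSTER
    else:
        kind = ResourceKind.INDEPENDENT_NODES
    return ResourceRequest(kind=kind, nodes=job.nodes)


if __name__ == "__main__":
    jobs = [
        Job("bayesian-network", Coupling.TIGHT, nodes=8),
        Job("parameter-sweep", Coupling.LOOSE, nodes=20),
    ]
    for job in jobs:
        print(job.name, "->", plan_resources(job))
```

The point of the sketch is only that the user describes the job, and the decision between a cluster and scattered nodes is made for them.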
This approach could potentially be more convenient and cost-effective than using the grid because:
  • an automated approach to resource allocation would save the time consumed in resource request and provisioning in the "static" grid,
  • the Cloud can be used for processes that use either coarse- or fine-grained parallelism,
  • small commercial organisations in a variety of domains (Oil and Gas, Bioinformatics, etc.) can save on the immense cost of purchasing cluster hardware and of training users (who may be scientists without an informatics background) in the use of the cluster, and
  • IaaS providers like Amazon provide credits for the use of resources on the public Cloud for research in educational institutions.
*It is a moot point whether we may refer to a workflow management system as part of an SDK. An IDE for the Cloud might perhaps be the mot juste in this case.
