
Monday, 14 January 2013

Comparing Amazon VPC connectivity options

In August 2009 Amazon announced its Virtual Private Cloud (VPC) service, giving enterprise customers worried about security and control in the cloud a way to address those concerns. Since then the Amazon VPC has matured as more and more services have become available from within it.

Amazon Virtual Private Cloud allows IT administrators to provision a private, isolated section of the Amazon Web Services (AWS) Cloud where they can launch AWS resources in a virtual network that they define. They have complete control over the virtual networking environment, including selection of IP address ranges and configuration of route tables, subnets and network gateways.
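
As a rough illustration, provisioning such a network with the boto3 Python SDK might look like the sketch below; the region, CIDR ranges and resource layout are arbitrary examples, not prescriptions.

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")  # example region

    # Create a VPC with an IP address range we choose ourselves.
    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

    # Carve a subnet out of the VPC's address range.
    subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

    # Attach an internet gateway and route outbound traffic through it.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=rtb_id,
                     DestinationCidrBlock="0.0.0.0/0",
                     GatewayId=igw_id)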

Furthermore, customers can connect their existing data centers and branch offices to the Amazon VPC and access the AWS cloud as if it were an extension of the corporate network. This connectivity between the corporate offices and the Amazon VPC can be accomplished in several ways.

In this short blog post, we will explore the options available for connecting the enterprise network to the Amazon VPC, comparing and contrasting their advantages, disadvantages and associated costs.


AWS Direct Connect


AWS Direct Connect is a service that allows you to establish a dedicated network connection between your WAN and the Amazon Web Services global network. If your corporate network has a presence in one of the AWS Direct Connect locations, Direct Connect facilitates dedicated 1G or 10G connectivity between your network equipment at that location and Amazon's routers.

Pricing information can be found on the AWS Direct Connect pricing page.

If connecting at Telecity in London, a single 1G port will cost at least $223 per month in port connection-hours. Additionally, you pay $0.03 per GB for data transferred outbound from the VPC to the corporate network. Furthermore, if your corporate offices and datacenters are already reachable from the Direct Connect peering location across the enterprise WAN, only minimal configuration is required to route traffic between the VPC and those offices.
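
To make the arithmetic concrete, here is a small back-of-the-envelope calculation in Python; the $0.30 hourly port rate is inferred from the ~$223 monthly figure above, and the 5 TB of monthly egress is just an example workload.

    # Rough Direct Connect cost estimate for a 1G port in a 31-day month.
    PORT_RATE_PER_HOUR = 0.30   # USD/hour, inferred from ~$223/month
    DATA_OUT_PER_GB = 0.03      # USD/GB, VPC -> corporate network

    hours_per_month = 31 * 24   # 744 port connection-hours
    egress_gb = 5000            # example: 5 TB transferred out per month

    port_cost = PORT_RATE_PER_HOUR * hours_per_month      # $223.20
    transfer_cost = DATA_OUT_PER_GB * egress_gb           # $150.00
    print("Total: $%.2f" % (port_cost + transfer_cost))   # Total: $373.20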

Advantages

  • Reduces bandwidth costs for traffic-heavy applications.
  • Provides more consistent network performance than the other options.
  • Can be used for accessing AWS services outside the VPC.

Disadvantages

  • Requires existing network presence in a very limited set of locations.
  • Requires more complex network hardware and configuration, for example 802.1Q VLANs and BGP.
  • If the traffic loads are not heavy enough, this is an expensive option.
  • Not very elastic: the options are 1G or 10G ports, with nothing in between.

Friday, 4 February 2011

The Cloud = High Performance Computing

The cloud is a well-suited platform for solving many high-performance computing problems. It may actually be cheaper than traditional clusters, and may return results faster, for both occasional tasks and periodic use.

For a number of years, science and analytics users have been using clusters for high-performance computing in areas such as bioinformatics, climate science, financial market prediction, data mining and finite element modelling. Companies working with vast amounts of data, such as Google, Yahoo! and Facebook, run huge dedicated clusters to crawl, index and search websites.

Dedicated company clusters
Often a company will own its own dedicated cluster for high-performance computations. Utilisation will likely be below 100% most of the time, because the cluster needs to be sized for peak demand, e.g. overnight analyses. The cluster will likely become business-critical very quickly, and it may become difficult or prohibitively risky to schedule longer maintenance shutdowns; hence the cluster may end up running outdated software. If the cluster has grown in an ad-hoc fashion from a very small start, there will come a critical point at which any further growth requires a disruptive hardware infrastructure upgrade and a software reconfiguration or upgrade, i.e. a long shutdown. This may not be an option, or may carry an unacceptable risk.

Shared institutional clusters
In the case of a shared cluster (such as the UK’s HECToR) the end users will likely face availability challenges:
  • There may not be enough task slots in the job pool for “surge” needs
  • Job queues may cause a job to wait for a few days
  • Departments will often need to watch monthly cluster utilisation quotas or face temporary blacklisting from the job pool
Clusters are finite and don’t grow on demand
Given the exponential growth of the data we process, our needs (e.g. an experiment in next-generation sequencing) may simply outgrow the pace at which clusters can be expanded.

The Cloud Alternative

For those who feel constrained by the above problems, Amazon Web Services offer a viable HPC alternative:
  • AWS Elastic Compute Cloud (EC2) provides on-demand compute instances
  • The recently (late 2010) introduced AWS Cluster Compute Instances are high-performance instances running inside a high-speed, low-latency sub-network
  • For loosely coupled, easily parallelised problems, AWS Elastic MapReduce offers Hadoop (version 0.20.2), Hive and Pig as a service, well integrated with the rest of the AWS stack, such as S3 storage (see the word-count sketch after this list)
  • For tightly coupled problems, Message Passing Interface, OpenMP and similar technologies will benefit from the fast network
  • For analyses requiring a central, clustered database, MySQL is offered as a service called AWS Relational Database Service (RDS), with Oracle Database announced as coming next
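
To illustrate the loosely coupled case, here is a minimal Hadoop Streaming word count in Python; the file name and the exact streaming invocation are illustrative assumptions.

    import sys

    def mapper():
        # Emit "word<TAB>1" for every word read from standard input.
        for line in sys.stdin:
            for word in line.split():
                print(word + "\t1")

    def reducer():
        # Input arrives sorted by key, so counts for a word are adjacent.
        current, count = None, 0
        for line in sys.stdin:
            word, n = line.rstrip("\n").split("\t")
            if word != current and current is not None:
                print(current + "\t" + str(count))
                count = 0
            current = word
            count += int(n)
        if current is not None:
            print(current + "\t" + str(count))

    if __name__ == "__main__":
        {"map": mapper, "reduce": reducer}[sys.argv[1]]()

With a file like this (say, wc.py), an Elastic MapReduce streaming step would use "python wc.py map" as the mapper command and "python wc.py reduce" as the reducer command; Hadoop handles splitting the input and sorting the intermediate keys.
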
The downside of the cloud approach: the data localisation challenge (and solutions)
The fact that the customer’s data (potentially in vast amounts) needs to get to AWS over the public Internet is a limiting factor. Often the customer’s own network is the actual bottleneck. There are two considerations to make:
  • Many NP-complete problems are actually above the rule-of-thumb break-even point for moving data over a slow link versus the CPU power gained (1 byte per 100,000 CPU cycles); see the worked example after this list
  • Often the actual “big data” are the reference datasets, which are mostly static (e.g. reference genomes in bioinformatics). AWS already hosts a number of public datasets. For others, it may make sense to send the first batch of data to AWS on a physical medium by post, and later apply only incremental changes.
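
As a worked example of the break-even rule of thumb above, compare upload time against compute time for a hypothetical 10 GB input; the link speed and CPU clock are assumed values.

    # Break-even check: ship 10 GB to the cloud, or compute locally?
    CYCLES_PER_BYTE = 100000      # rule-of-thumb threshold quoted above
    CPU_HZ = 3.0e9                # assume one 3 GHz core
    LINK_BYTES_PER_S = 100e6 / 8  # assume a 100 Mbit/s uplink

    data_bytes = 10e9             # a 10 GB input dataset

    transfer_h = data_bytes / LINK_BYTES_PER_S / 3600
    compute_h = data_bytes * CYCLES_PER_BYTE / CPU_HZ / 3600

    print("Upload:  %.1f h" % transfer_h)   # ~0.2 h
    print("Compute: %.1f h" % compute_h)    # ~92.6 h on a single core
    # Compute time dwarfs transfer time, so moving the data pays off.
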
Martin Kochan
Cloud Developer
Cloudreach