Wednesday, July 28, 2010

To "cloud" or not to "cloud"

Lately I have interviewed many candidates for our engineering positions. A common question from the interviewees is "Why don't you host your service on the cloud?"

This is actually a question that we often ask ourselves. We love the idea of hosting all of our services on the cloud, so we don't have to manage hardware. So why haven't we done so?
  1. We started Livemocha in early 2007, when cloud computing wasn't mature enough. The only cloud service out there was Amazon S3, so we simply could not set up an entire data center (DC) on the cloud.
  2. Better control of hardware specs. Most cloud computing services run on VMs, so we still can't fully control the hardware spec: number of CPUs, hard disk size, hard disk speed, memory size, etc.
  3. NFS solution. S3 is the most mature cloud file storage system, but to this day it still can't replace good old simple NFS.
    1. S3 can be mounted on multiple EC2 instances, but it's slow, and you can't stream data to S3 drives.
    2. There is no good solution for backing up S3 data. With traditional NFS, we can use either hardware or software solutions to back up an entire disk in real time.
  4. No LB support. Amazon only started offering load balancing (LB) last year, and its configuration is very basic: there is not much you can do beyond simple round-robin load balancing. We use F5 load balancers, which can be configured to do hardware-based HTTPS acceleration, reverse proxying, and dynamic caching.
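
To make "simple round robin" concrete, here is a minimal sketch in Python. It is only an illustration of the technique, not how Amazon's LB or our F5 boxes are implemented, and the backend names are made up.

```python
from itertools import cycle


def round_robin(backends):
    """Return an endless iterator that hands out backends in a fixed order.

    Nothing here looks at load, health, or request content, which is
    exactly why round robin alone is so limited.
    """
    return cycle(backends)


# Hypothetical backend names, for illustration only.
picker = round_robin(["web1", "web2", "web3"])
first_five = [next(picker) for _ in range(5)]
print(first_five)
```

Each request simply goes to the next server in the list, wrapping around at the end.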
Here is a list of things that we do use on the cloud:
  1. EC2 computing on demand. If we want to generate tons of PDFs or videos, we request new EC2 instances and schedule the jobs there.
  2. S3 as secondary storage. We keep a copy of all user data on our NFS, then transfer duplicates to S3.
  3. CloudFront. CloudFront is awesome. It's cheap, and it's faster.
  4. SQS. We have more than 1000 queues running in SQS. They are persistent and guarantee delivery.
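
The "keep everything on NFS, then duplicate to S3" idea from item 2 can be sketched roughly as below. This is a minimal sketch, not our actual transfer job: `mirror_tree` and the `upload` callback are hypothetical names, a temporary directory stands in for the NFS mount, and a real version would pass in a function that wraps whatever S3 client you use.

```python
import os
import tempfile


def mirror_tree(root, upload):
    """Walk `root` and hand every file to `upload(key, path)`.

    `upload` is a hypothetical callback standing in for an S3 client.
    Files are only read, so the primary copy stays on local (NFS) storage.
    """
    keys = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Use the path relative to root, with forward slashes, as the object key.
            key = os.path.relpath(path, root).replace(os.sep, "/")
            upload(key, path)
            keys.append(key)
    return sorted(keys)


# Demo: a temporary directory plays the role of the NFS mount,
# and a dict plays the role of the S3 bucket.
uploaded = {}


def fake_upload(key, path):
    with open(path, "rb") as fh:
        uploaded[key] = fh.read()


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "users", "42"))
    with open(os.path.join(root, "users", "42", "avatar.png"), "wb") as f:
        f.write(b"fake image bytes")
    mirrored = mirror_tree(root, fake_upload)
```

Keeping the uploader as a plain callback means the same walk can be pointed at a real bucket or at a fake one for testing.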
