Thursday, February 12, 2009

NFS to S3/SQS/CF

For the last 18 months, our site has been running on NFS: all user images and audio files are stored on an NFS server that's mounted on each web host.  So far, we have 5.8 million files.


Sometime last year, we realized that we had made a huge mistake in designing the NFS file structure.  We only had two directories for storing all user-generated files, one for images and one for audio.  With more than a million files stored in a single directory, a simple "ls" command takes hours or even days.  Trying to delete or move files around could even bring the NFS server to its knees.  We made some changes to audio recording so that at least audio files are now stored in a hierarchy.
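One common way to build such a hierarchy, shown here only as a sketch and not necessarily the layout we ended up with, is to derive nested directory names from a hash of the file name.  The function name, the root path and the two-level, two-character layout below are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, levels: int = 2, width: int = 2) -> str:
    """Return a path under root that spreads files across nested directories.

    Hypothetical layout: the first `levels * width` hex characters of the
    SHA-1 of the file name become the directory names.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return str(PurePosixPath(root, *parts, filename))


# Example: prints something like /mnt/nfs/audio/xx/yy/clip123.mp3
print(sharded_path("/mnt/nfs/audio", "clip123.mp3"))
```

With two levels of two hex characters there are 65,536 leaf directories, so 5.8 million files would average roughly 90 files per directory instead of millions in one.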

Another incident happened last week.  Our primary NFS server went down due to a driver issue.  The backup NFS server didn't kick in in time, so we had a few hours of downtime on the production site.  We finally decided to move to Amazon S3, starting with user images only.

The design is simple.  Users still upload images to our own NFS server.  When the file upload job is done, we send a message to SQS to indicate that a user has uploaded an image.  A cron job polls the SQS queue.  When the cron job runs and finds a message in SQS, it uploads the file from our NFS to S3 and changes the file pointer in the DB.  Our view layer code uses the pointer to determine where the file is located, either on NFS or in S3.  Since we have also enabled CloudFront, the view layer code uses our CloudFront URLs to render images that have been moved to S3.
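The pointer flip and the view-layer lookup can be sketched roughly as follows.  This is an illustration under stated assumptions, not our production code: the record type, the "nfs"/"s3" pointer values, both base URLs and the `upload_to_s3` callable are hypothetical stand-ins for the real DB row, hosts and S3 client.

```python
from dataclasses import dataclass

# Hypothetical base URLs; the real hosts are not given in this post.
NFS_BASE = "http://www.example.com/files"
CDN_BASE = "http://cdn.example.com"


@dataclass
class ImageRecord:
    key: str                # relative file name, e.g. "u/1.jpg"
    location: str = "nfs"   # the "file pointer": "nfs" or "s3"


def migrate(record: ImageRecord, upload_to_s3) -> None:
    """Worker step: copy the file to S3 first, then flip the pointer.

    `upload_to_s3` is any callable that raises on failure, so a failed
    upload leaves the pointer on "nfs" and the image keeps rendering.
    """
    upload_to_s3(record.key)
    record.location = "s3"


def image_url(record: ImageRecord) -> str:
    """View-layer step: choose the base URL from the pointer."""
    base = CDN_BASE if record.location == "s3" else NFS_BASE
    return f"{base}/{record.key}"
```

Flipping the pointer only after the upload succeeds means a page never links to an S3 object that does not exist yet.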

There are several key decisions in this design:
1. We created a backup queue in our DB in case SQS is down.  Chances are that our own site goes down before SQS does, but we just want to be ready.
2. We always try to write to SQS first.  If SQS is down, we write to our DB queue instead, and we keep writing to the DB queue as long as there is something in it.
3. There's a cron job that moves messages from the DB queue to SQS.  This way we make sure the order is maintained (SQS doesn't really guarantee ordering, but we'll do our best).
4. The cron job that polls SQS only needs to worry about SQS, not the DB queue.
5. We created 2 CNAMEs for our CloudFront distribution and randomly pick one each time we render an image.  This allows browsers to use additional connections to retrieve data (most browsers only allow up to 4 connections to the same server).
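The list above can be sketched in a few lines.  Everything here is a simplified illustration under assumptions: an in-memory deque stands in for the DB queue table, `send_to_sqs` is a hypothetical callable that raises `SqsDown` when SQS is unreachable, and the two CDN host names are made up.

```python
import random
from collections import deque


class SqsDown(Exception):
    """Hypothetical error raised when sending to SQS fails."""


class Enqueuer:
    def __init__(self, send_to_sqs):
        self.send_to_sqs = send_to_sqs  # callable; raises SqsDown on failure
        self.db_queue = deque()         # stand-in for the backup DB queue

    def enqueue(self, message):
        # While the DB queue still holds older messages, keep appending to
        # it so newer messages never jump ahead of them.
        if self.db_queue:
            self.db_queue.append(message)
            return "db"
        try:
            self.send_to_sqs(message)
            return "sqs"
        except SqsDown:
            self.db_queue.append(message)
            return "db"

    def drain(self):
        """Cron step: move the backlog from the DB queue to SQS, oldest first."""
        moved = 0
        while self.db_queue:
            try:
                self.send_to_sqs(self.db_queue[0])
            except SqsDown:
                break                   # still down; retry on the next run
            self.db_queue.popleft()
            moved += 1
        return moved


# Hypothetical CNAMEs pointing at the same CloudFront distribution.
CDN_HOSTS = ["cdn1.example.com", "cdn2.example.com"]


def cdn_url(key, rng=random):
    """Pick one of the CNAMEs at random for each rendered image."""
    return f"http://{rng.choice(CDN_HOSTS)}/{key}"
```

Because the worker only ever reads from SQS, the fallback stays invisible to it: the drain job is the single place where DB-queued messages re-enter the normal path.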
