Sunday, May 5, 2013

Test in Production!



Does anyone have a problem with the picture above?  I certainly don't!  However, I'll add this: "When I test on production, I test carefully."

In software service development, testing is critical.  Services often go through multiple tiers of test environments before the bits are finally released to customers.  But why can't we do one-stop testing directly on production?

The typical answer to this question is "NO, YOU CANNOT IMPACT CUSTOMERS!"  That is because the traditional practice of "testing on production" is to take a slice of the production traffic and route it to a pre-release version of the service, placed behind the VIP (virtual IP) alongside the current-version instances.  This way, only a small portion of customers is impacted.  But some customers ARE impacted!

This is where "port forwarding" comes in handy.  To ensure that NO customers are impacted at all, we can "clone" a slice of the production traffic and send the copies to a pre-release version running alongside the current-version services.  The cloned traffic is one-way only: it goes into the pre-release service, but that service's responses are never returned to the customer.  This way, NO customers are impacted by the behavior of the "untested" pre-release service.

What do we get out of this?

  • Free stress test!  Instead of trying to set up a stress-testing environment that simulates production volumes and request patterns, you have a service running in production, receiving production requests at production throughput.
  • Free regression test!  If you compare the responses from the current-version service and the pre-release service, you get a simple regression test suite.
  • Frequent regression test!  Hook this into your favorite CI framework and you can run regression tests 24/7, provided you have production traffic 24/7.
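
To make the "free regression test" idea concrete, here is a minimal sketch in Python of comparing responses from the two versions.  Everything in it is an assumption for illustration: the `Response` type, the `diff_responses` and `regression_report` helpers, and the choice to compare only status code and body.  A real service would probably need to ignore fields that legitimately differ, such as timestamps or request IDs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Response:
    """A minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    body: str


def diff_responses(current: Response, candidate: Response) -> List[str]:
    """Describe how the pre-release response differs from the current one.

    An empty list means no difference was seen for this request.
    """
    differences = []
    if current.status != candidate.status:
        differences.append(f"status: {current.status} != {candidate.status}")
    if current.body != candidate.body:
        differences.append("body differs")
    return differences


def regression_report(pairs: List[Tuple[Response, Response]]) -> dict:
    """Summarize (current, pre-release) response pairs from cloned traffic."""
    mismatches = [pair for pair in pairs if diff_responses(*pair)]
    return {"total": len(pairs), "mismatches": len(mismatches)}
```

A CI job could call `regression_report` on a window of captured response pairs and fail the build when the mismatch count is above zero.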
There are multiple ways to do "port forwarding".  Advanced load balancers have built-in features to clone a subset of requests.  If you are running Linux, you can configure iptables to forward specific ports.  Since I'm not a sysadmin, I prefer software solutions that I can modify and manage myself.

Node.js has a nice plugin, node-proxy.  You can run a proxy service that bridges traffic to a "target" service and a "forwarding" service.  The target service is the one that handles real traffic; its responses are sent back to the customer through the proxy.  The forwarding service is one-way only: it gets the same requests as the target service, but never returns anything to the customer.  With this setup, you can TEST on PRODUCTION!
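
The target/forwarding split can be sketched independently of any particular proxy library.  The Python below is not the node-proxy API; it is a simplified illustration in which requests and responses are plain strings, and `mirror`, `target`, `shadow`, `pool`, and `shadow_log` are all hypothetical names.  It shows the two properties that matter: the customer only ever sees the target's response, and whatever the forwarding ("shadow") service returns or throws stays on the server side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Simplified handler: takes a request string, returns a response string.
Handler = Callable[[str], str]


def mirror(request: str, target: Handler, shadow: Handler,
           pool: ThreadPoolExecutor,
           shadow_log: List[Tuple[str, object]]) -> str:
    """Serve `request` from `target` and copy it one-way to `shadow`."""

    def call_shadow() -> None:
        # The shadow's result (or its exception) is only recorded for
        # later inspection; it is never returned to the customer.
        try:
            shadow_log.append((request, shadow(request)))
        except Exception as exc:
            shadow_log.append((request, exc))

    # Hand the cloned request to a worker thread so a slow or broken
    # pre-release service cannot delay the real response.
    pool.submit(call_shadow)

    # Only the target's response goes back to the caller.
    return target(request)
```

In a real proxy, `target` and `shadow` would be HTTP clients pointed at the current and pre-release services, and the entries in `shadow_log` are what a regression comparison would consume.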
