Chaos Engineering: Testing known unknowns using ToxiProxy

Ravindra Elicherla
4 min read · Feb 25, 2018

How do we build resilient applications that have no single point of failure? ToxiProxy comes to the rescue for testing this. It is a framework, open sourced by Shopify, for simulating network conditions, and it is used to verify that your application doesn’t have a single point of failure. Successfully testing known failure modes with frameworks like ToxiProxy prepares application teams to adopt chaos engineering, which tests unknown unknowns. Thanks to Simon for a talk on this topic last year at the @scale conference.

Before delving into how to use the platform, let's look at what happens in practice in distributed systems. Take the Amazon landing page as an example. Its data is populated by hundreds of microservices working tirelessly to show what is relevant to each particular customer. Not only that, these microservices talk to each other over HTTP and make decisions on the fly.

Amazon internal services architecture, circa 2009 (source: https://apigee.com/about/blog/api-technology/microservices-amazon)

A few challenges with this type of architecture are latency, services being fully down, and request overload. Application developers should think about what should happen to the application if a service is fully down or its latency exceeds 200 ms (or any other threshold). How do we simulate these situations in a test environment? This is where ToxiProxy comes in handy.
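To make the question concrete, here is a minimal Python sketch of one possible answer: give a dependency a time budget and fall back to a default when it is slow or down. The names `call_with_fallback`, `slow_recommendations` and `broken_service` are hypothetical, and the 200 ms budget simply reuses the number mentioned above; this is an illustration, not a recommended production pattern.

```python
import concurrent.futures
import time


def call_with_fallback(fetch, fallback, timeout_s=0.2):
    """Return fetch() if it finishes within timeout_s, otherwise fallback.

    Exceptions raised by fetch (for example a refused connection when the
    service is down) also produce the fallback value.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both the timeout and any error raised inside fetch.
        return fallback
    finally:
        # Do not block on a slow call; its worker thread finishes on its own.
        pool.shutdown(wait=False)


def slow_recommendations():
    time.sleep(0.5)  # stands in for a dependency slower than the budget
    return ["personalised"]


def broken_service():
    raise ConnectionError("service is down")


if __name__ == "__main__":
    print(call_with_fallback(slow_recommendations, ["bestsellers"]))  # ['bestsellers']
    print(call_with_fallback(broken_service, ["bestsellers"]))        # ['bestsellers']
```

A tool like ToxiProxy is what lets you check that such a fallback path really triggers, by making the dependency slow or unavailable on demand.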

“Chaos Engineering is great for exposing unknown weaknesses in your production system, but if you are certain that a Chaos Engineering experiment will lead to a significant problem with the system, there’s no sense in running that experiment. Fix that weakness first. Then come back to Chaos Engineering and it will either uncover other weaknesses that you didn’t know about, or it will give you more confidence that your system is in fact resilient.” (From the Chaos Engineering book)

How does ToxiProxy work?

ToxiProxy lets us create proxies on the fly. These proxies in turn let us tamper with the connections to various services. For example, a toxic can add latency and jitter to a service, limit the bandwidth, slice the data into small bits with an optional delay, and so on.
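The idea behind a latency toxic can be sketched in a few lines of Python. This is not how ToxiProxy itself is implemented (it is written in Go); it is a minimal illustration of a TCP proxy that delays data flowing from the upstream service back to the client. The names `pipe` and `start_latency_proxy` are hypothetical.

```python
import socket
import threading
import time


def pipe(src, dst, delay_s):
    """Copy bytes from src to dst, sleeping delay_s before forwarding each chunk."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            time.sleep(delay_s)
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream on
        except OSError:
            pass


def start_latency_proxy(upstream_port, delay_s, host="127.0.0.1"):
    """Listen on an OS-assigned port and forward to upstream_port.

    Only upstream-to-client traffic is delayed. Returns the listening port.
    Sockets are not cleaned up carefully; this is a sketch, not a tool.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))
    listener.listen()

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return
            try:
                upstream = socket.create_connection((host, upstream_port))
            except OSError:
                client.close()  # upstream is down: drop the client
                continue
            threading.Thread(target=pipe, args=(client, upstream, 0.0), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client, delay_s), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()[1]
```

A client that connects to the returned port instead of the real service sees every response arrive roughly `delay_s` seconds late, which is the effect the latency toxic produces in the walkthrough below.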

Let's get hands-on now.

Install ToxiProxy:

On OS X (macOS):

$ brew tap shopify/shopify
$ brew install toxiproxy

Docker image:

$ docker pull shopify/toxiproxy
$ docker run -it shopify/toxiproxy

If the installation is successful, type the following in a terminal window:

$ toxiproxy-server

The ToxiProxy server is now running on port 8474.

$ toxiproxy-cli

Your screen will look something like this.

$ toxiproxy-cli list

As we have not created any proxies yet, the list will be empty.

Before creating a proxy, let's run a Redis server.

Redis is running on port 6379.

Two servers are now running on two different ports, and they are not communicating with each other yet. Create a proxy in front of Redis:

$ toxiproxy-cli create redis1 --listen localhost:20000 --upstream localhost:6379

If you type

$ toxiproxy-cli list

you will now see that redis1 has been created.

Now access Redis through port 20000:

$ redis-cli -p 20000

Now let's add latency to the proxy.

$ toxiproxy-cli toxic add redis1 --type latency --attribute latency=2000
$ toxiproxy-cli inspect redis1

You can see the latency toxic, set to 2000 ms.
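The same setup can be scripted. To my understanding, toxiproxy-cli talks to the server over an HTTP API on port 8474, so the proxy and the toxic can also be created with plain HTTP requests. The sketch below builds those requests in Python; the endpoint paths and JSON field names follow the ToxiProxy README as I recall it and should be checked against your installed version. The helper names (`build_post`, `create_proxy_request`, `add_latency_request`, `send`) are hypothetical.

```python
import json
import urllib.request

TOXIPROXY_API = "http://localhost:8474"  # server port from the walkthrough


def build_post(path, payload):
    """Build (but do not send) a JSON POST request to the ToxiProxy server."""
    return urllib.request.Request(
        TOXIPROXY_API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_proxy_request(name, listen, upstream):
    # Assumed equivalent of: toxiproxy-cli create <name> --listen ... --upstream ...
    return build_post("/proxies", {"name": name, "listen": listen, "upstream": upstream})


def add_latency_request(proxy, latency_ms, jitter_ms=0):
    # Assumed equivalent of: toxiproxy-cli toxic add <proxy> --type latency ...
    return build_post(
        f"/proxies/{proxy}/toxics",
        {
            "type": "latency",
            "stream": "downstream",  # delay data flowing back to the client
            "attributes": {"latency": latency_ms, "jitter": jitter_ms},
        },
    )


def send(request):
    """Send a prepared request; requires a running toxiproxy-server."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())
```

With the server running, `send(create_proxy_request("redis1", "localhost:20000", "localhost:6379"))` followed by `send(add_latency_request("redis1", 2000))` should reproduce the two CLI steps above, which is convenient when the toxics need to be set up from a test suite.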

Let's test it now.

You can now see a delay of 2 seconds.
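If you prefer a number over eyeballing the delay in redis-cli, a small Python sketch can time a single PING through the proxy. It relies on Redis accepting inline commands over a raw socket; `timed_ping` is a hypothetical helper name, and the port you pass is whatever your proxy listens on (20000 in this walkthrough).

```python
import socket
import time


def timed_ping(port, host="127.0.0.1", timeout_s=10.0):
    """Send an inline PING and return (reply bytes, seconds until the reply)."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        start = time.monotonic()
        conn.sendall(b"PING\r\n")
        reply = b""
        # Read until the end of the first reply line.
        while not reply.endswith(b"\r\n"):
            chunk = conn.recv(64)
            if not chunk:
                break
            reply += chunk
        return reply, time.monotonic() - start


# Example (needs Redis and the redis1 proxy from above to be running):
#   reply, elapsed = timed_ping(20000)
#   print(reply, round(elapsed, 2))
```

Comparing `timed_ping(6379)` with `timed_ping(20000)` gives the direct and proxied round-trip times side by side.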

I created one more toxic, called redis2, with a 5000 ms delay.

Try experimenting with other use cases.
