Mockintosh: Chaos and Resilience Testing

This part covers chaos and resilience testing with Mockintosh.

Defining Performance Profiles

Mockintosh's configuration syntax lets you define a list of performance profiles at the top level.

performanceProfiles:
    profile1:
        ratio: 1
        delay: 1.5
        faults:
            "200": 0.3
            "201": 0.1
            "400": 0.1
            "500": 0.2
            "503": 0.1
            PASS: 0.4
            RST: 0.2
            FIN: 0.1
    profile2:
        ratio: 0.3
        delay: 4.8

The purpose of performance profiles in Mockintosh is to simulate a service outage, or a service under so much stress that it becomes slow or even unresponsive.

Issues of this kind in the network or the data center infrastructure are often unpredictable, scheduled maintenance aside. The root cause can also be faulty software deployed to production, which is just as unpredictable for a client or a microservice that depends on the affected service.

Software that depends on a service should therefore be tested nondeterministically against such outage scenarios.

To replicate this nondeterministic behavior, the config syntax has the required ratio field, which sets the probability that the performance profile is triggered for a given request.

The optional delay field adds a delay, in seconds, to the response time. The delay is applied only when the performance profile is triggered.

The optional faults field is a set of key-value pairs that defines probabilistic status code overrides. The keys are the status codes and the values define the probability distribution. The special key PASS means that no fault is injected.

In this example, services and endpoints that use profile1 always get a 1.5-second delay and a status code override drawn from the distribution under the faults field. Services and endpoints that use profile2 get a 4.8-second delay with 30% probability and never experience status code overrides.
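The semantics described above can be modelled in a few lines of Python. This is only a sketch of the behavior as the text describes it, not Mockintosh's actual implementation. The names PROFILES, simulate_request and triggered_share are hypothetical. Because the example fault values add up to 1.5 rather than 1, the sketch assumes they act as relative weights, and it treats RST and FIN as opaque labels.

```python
import random

# Hypothetical in-memory copy of the two profiles from the config above.
PROFILES = {
    "profile1": {
        "ratio": 1,
        "delay": 1.5,
        "faults": {"200": 0.3, "201": 0.1, "400": 0.1, "500": 0.2,
                   "503": 0.1, "PASS": 0.4, "RST": 0.2, "FIN": 0.1},
    },
    "profile2": {"ratio": 0.3, "delay": 4.8},
}


def simulate_request(profile, rng):
    """Return (delay_seconds, fault) for one simulated request.

    fault is None when the profile is not triggered, defines no faults,
    or the PASS key is drawn.
    """
    # ratio is the probability that the profile is triggered at all.
    if rng.random() >= profile["ratio"]:
        return 0.0, None
    delay = profile.get("delay", 0.0)
    faults = profile.get("faults")
    if not faults:
        return delay, None
    # Assumption: fault values are relative weights and need not sum to 1.
    keys = list(faults)
    choice = rng.choices(keys, weights=[faults[k] for k in keys], k=1)[0]
    return delay, (None if choice == "PASS" else choice)


def triggered_share(name, n=10_000, seed=0):
    """Fraction of n simulated requests that were delayed by the named profile."""
    rng = random.Random(seed)
    hits = sum(simulate_request(PROFILES[name], rng)[0] > 0 for _ in range(n))
    return hits / n
```

Under these assumptions, triggered_share("profile1") is exactly 1.0, while triggered_share("profile2") settles close to 0.3 as the number of simulated requests grows.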

Using the Performance Profiles

With the performanceProfile field, you can set a performance profile for a service or an endpoint.

performanceProfiles:
    profile1:
        ratio: 1
        delay: 1.5
        faults:
            "200": 0.3
            "201": 0.1
            "400": 0.1
            "500": 0.2
            "503": 0.1
            PASS: 0.4
            RST: 0.2
            FIN: 0.1
    profile2:
        ratio: 0.3
        delay: 4.8

services:
    - name: "Service One"
      port: 8081
      performanceProfile: profile1
      endpoints:
          - path: /example1
            response: example1

    - name: "Service Two"
      port: 8082
      endpoints:
          - path: /example2
            response: example2

          - path: /example3
            performanceProfile: profile2
            response: example3

In this example, the performance profile "profile1" is applied to all endpoints under "Service One". So if we visit localhost:8081/example1, we always experience a 1.5-second delay and sometimes a fault.

The performance profile "profile2" is applied only to the second endpoint of "Service Two", while the first endpoint is not affected by any performance profile. Therefore, if we visit localhost:8082/example2, we experience neither a delay nor a fault, while localhost:8082/example3 gives us a 4.8-second delay with 30% probability.
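One way to observe these effects from the client side is to call an endpoint repeatedly and record how often it is slow and which status codes come back. The following is a minimal sketch using only the Python standard library. The helper names fetch_status and probe are hypothetical, and the fetch and clock parameters exist only so the logic can be exercised without a running server.

```python
import http.client
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status of a GET request, or None on a connection-level failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses are raised as HTTPError
    except (OSError, http.client.HTTPException):
        return None  # refused, reset, timed-out or otherwise broken connection


def probe(url, attempts=20, slow_after=1.0, fetch=fetch_status, clock=time.monotonic):
    """Call url repeatedly; return the share of slow calls and a count per status."""
    slow = 0
    statuses = {}
    for _ in range(attempts):
        start = clock()
        status = fetch(url)
        # A call counts as slow when it took at least slow_after seconds.
        if clock() - start >= slow_after:
            slow += 1
        statuses[status] = statuses.get(status, 0) + 1
    return slow / attempts, statuses
```

Assuming the example configuration is being served locally, probe("http://localhost:8082/example3", slow_after=4.0) should report a slow share that approaches 0.3 as attempts grows, while the same call against http://localhost:8081/example1 with slow_after=1.0 should report a share of 1.0 together with a mix of status codes.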

Support

You can view these examples in action in this video, and learn more at Mockintosh and UP9. Join our community Slack at up9.slack.com.