Mockintosh: Chaos and Resilience Testing
In this part, we will cover chaos and resilience testing with Mockintosh.
Defining Performance Profiles
Mockintosh's configuration syntax lets you define a list of performance profiles at the top level.
```yaml
performanceProfiles:
  profile1:
    ratio: 1
    delay: 1.5
    faults:
      "200": 0.3
      "201": 0.1
      "400": 0.1
      "500": 0.2
      "503": 0.1
      PASS: 0.4
      RST: 0.2
      FIN: 0.1
  profile2:
    ratio: 0.3
    delay: 4.8
```
The purpose of performance profiles in Mockintosh is to simulate a service outage, or a service under
stress that becomes slow or even unresponsive.
Such issues in the network or the data center infrastructure are often unpredictable,
with the exception of scheduled maintenance. The root cause can also be faulty software deployed
to production, which is again an unpredictable outcome for a client or a microservice that depends
on such a service.
Therefore, software that depends on a service should be tested nondeterministically against such service outage scenarios.
To replicate this nondeterministic behavior, the config syntax has the required
`ratio` field, which sets the probability of
the performance profile being triggered on a given request.
The `delay` field can be used to apply a certain amount of delay, in seconds, to the response time.
The delay is applied only when the performance profile is triggered.
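To make the `ratio`/`delay` semantics concrete, here is a small Python sketch (an illustration of the documented behavior, not Mockintosh's actual implementation) that models a profile triggering on some fraction of requests:

```python
import random

def apply_profile(ratio, delay):
    """Return the delay (in seconds) a single request experiences.

    Mirrors the documented semantics: with probability `ratio` the
    profile triggers and the delay applies; otherwise no delay.
    """
    if random.random() < ratio:
        return delay
    return 0.0

# profile2: triggers for roughly 30% of requests, delaying them 4.8 s
random.seed(0)
delays = [apply_profile(0.3, 4.8) for _ in range(10_000)]
triggered = sum(1 for d in delays if d > 0)
print(f"{triggered / len(delays):.0%} of requests delayed")
```

With `ratio: 1`, as in `profile1`, every request would be delayed.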
The `faults` field is a mapping that defines probabilistic status code overrides. The keys are
the status codes (plus a few special keys), and the values form the probability distribution.
The special key `PASS` means no fault is injected, while `RST` and `FIN` terminate the TCP connection
(with a TCP RST or FIN packet, respectively).
So for this example, services and endpoints that use
`profile1` will always have a 1.5-second delay
and a status code override according to the probability distribution under the
`faults` field, while services and
endpoints that use
`profile2` will, with a 30% probability, have a 4.8-second delay and no status code override.
Using the Performance Profiles
With the `performanceProfile` field, it's possible to set a performance profile for a whole service or a single endpoint.
```yaml
performanceProfiles:
  profile1:
    ratio: 1
    delay: 1.5
    faults:
      "200": 0.3
      "201": 0.1
      "400": 0.1
      "500": 0.2
      "503": 0.1
      PASS: 0.4
      RST: 0.2
      FIN: 0.1
  profile2:
    ratio: 0.3
    delay: 4.8
services:
  - name: "Service One"
    port: 8081
    performanceProfile: profile1
    endpoints:
      - path: /example1
        response: example1
  - name: "Service Two"
    port: 8082
    endpoints:
      - path: /example2
        response: example2
      - path: /example3
        performanceProfile: profile2
        response: example3
```
In this example, the performance profile "profile1" is applied to all endpoints under "Service One". So if we visit
localhost:8081/example1, we experience a 1.5-second delay and, sometimes, a fault.
The performance profile "profile2" is applied only to the second endpoint of "Service Two", while the first endpoint
is not under the effect of any performance profile. Therefore, if we visit
localhost:8082/example2, we experience neither delays nor faults, while
localhost:8082/example3 gives us a 4.8-second delay with a 30% probability.
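Since the point of these profiles is to exercise client-side resilience, here is a hedged sketch of what a dependent client might do: a timeout-plus-retry wrapper. The `flaky_call` stand-in below is purely illustrative (it mimics `profile2`'s 30% slow-response behavior in-process); in a real test you would call the mocked endpoint over HTTP instead.

```python
import random

def flaky_call():
    """Stand-in for calling a mocked endpoint: returns (status, latency_s).

    With 30% probability (profile2's ratio) the call is slow (4.8 s);
    otherwise it responds quickly.
    """
    if random.random() < 0.3:
        return 200, 4.8   # profile triggered: slow response
    return 200, 0.05      # normal: fast response

def call_with_timeout_and_retry(timeout, retries):
    """Give up on responses slower than `timeout` and retry a few times."""
    for _ in range(retries):
        status, latency = flaky_call()
        if latency <= timeout:
            return status
    return None  # all attempts timed out

random.seed(2)
results = [call_with_timeout_and_retry(timeout=1.0, retries=3) for _ in range(1000)]
failures = results.count(None)
print(f"{failures} of 1000 logical calls still failed after retries")
```

With three attempts, a 30%-per-request slowdown only defeats the client about 0.3³ ≈ 2.7% of the time, which is the kind of property such a mock lets you verify.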