As product owners, we need to know if and when an outage happens, and we need to get ahead of it rather than learn about it from a customer calling support. The good news is that most teams understand the need and are ramping up their monitoring efforts. Fortunately, most of our modern applications and APIs also expose a health check, which our monitoring relies on heavily. That is an excellent start, since we need to know whether the APIs are up and running, but health checks alone are not enough.
Let me introduce a new term, and a must-have as we embark on building new monitoring and alerting: ‘Synthetic Transaction Monitoring.’ Synthetic transaction monitoring means running a script that simulates the most typical user behavior in our application, end to end. Think of it as your core smoke test. Checking that the site is up and the APIs are all responding is one thing; checking that the application as a whole behaves as it is supposed to is another. Find the most critical and most frequently used user path in the application, script that behavior, and run it in production as often as you can.
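A synthetic transaction is a script that walks through a user journey one step at a time. It stops at the first step that misbehaves and records how long each step took. The sketch below is a minimal Python version of that idea. It assumes each user action is wrapped in a function that raises an exception on failure. The step names (`open_home_page`, `search_catalog`, `checkout`) and their stubbed bodies are hypothetical. They stand in for real calls against your application, such as HTTP requests or browser automation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A step is a (name, action) pair. The action performs one user-level
# operation and raises an exception if the application misbehaves.
Step = Tuple[str, Callable[[], None]]


@dataclass
class TransactionResult:
    ok: bool
    failed_step: Optional[str] = None
    error: Optional[str] = None
    step_durations: Dict[str, float] = field(default_factory=dict)  # seconds per step


def run_synthetic_transaction(steps: List[Step]) -> TransactionResult:
    """Run the scripted user journey in order, stopping at the first failure."""
    durations: Dict[str, float] = {}
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
        except Exception as exc:  # any failing step fails the whole journey
            durations[name] = time.perf_counter() - start
            return TransactionResult(False, name, str(exc), durations)
        durations[name] = time.perf_counter() - start
    return TransactionResult(True, step_durations=durations)


# Hypothetical stub steps; real ones would call the application.
def open_home_page() -> None:
    pass  # e.g. request the landing page and check for HTTP 200


def search_catalog() -> None:
    pass  # e.g. run a typical search and check that results come back


def checkout() -> None:
    raise AssertionError("checkout returned HTTP 500")  # simulated outage


result = run_synthetic_transaction([
    ("open_home_page", open_home_page),
    ("search_catalog", search_catalog),
    ("checkout", checkout),
])
print(result.ok, result.failed_step, result.error)
# False checkout checkout returned HTTP 500
```

The script reports which step failed, not just that something is down. That distinction separates this check from a plain health check.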
During a production release, most products run some smoke tests before handing the release over for manual testing, so we can quickly reuse one or more of those smoke tests as synthetic transactions.
The core principles of Synthetic Transaction Monitoring are:
- It should mimic the most typical user behavior of our application, end to end.
- There can be more than one.
- It must be one of the first tests you run right after a production release, for validation.
- We need to run this script as often as we can in production.
- Alert if and when it fails.
- These monitors need to run in all environments, not just production.
- Make these synthetic transactions part of the performance tests in the CI pipeline so that no one accidentally degrades performance.
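The principles "run as often as we can" and "alert if and when it fails" together amount to a scheduled loop with an alert hook. The Python sketch below shows the idea. `monitor` and its parameters are hypothetical names. The `alert` callable is a placeholder for whatever paging or chat integration your team actually uses. In practice you would more likely hand the schedule to an existing scheduler or monitoring service than run a bare loop.

```python
import time
from typing import Callable, Optional


def monitor(
    transaction: Callable[[], bool],
    alert: Callable[[str], None],
    interval_seconds: float = 60.0,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `transaction` every `interval_seconds` and call `alert` on each failure.

    Runs forever when max_runs is None; returns the number of failed runs otherwise.
    """
    failures = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            ok = bool(transaction())
            reason = "transaction reported failure"
        except Exception as exc:  # an unexpected error counts as a failed run
            ok = False
            reason = f"transaction raised {exc!r}"
        runs += 1
        if not ok:
            failures += 1
            alert(f"Synthetic transaction failed on run {runs}: {reason}")
        if max_runs is not None and runs >= max_runs:
            break  # bounded mode (tests, one-off checks): no need to wait again
        # Wait out the rest of the interval so runs start roughly every interval_seconds.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_seconds - elapsed))
    return failures


# Demo: three runs, the second one fails; sleeping is skipped so it finishes instantly.
outcomes = iter([True, False, True])
alerts = []
failed = monitor(lambda: next(outcomes), alerts.append, max_runs=3, sleep=lambda s: None)
print(failed, alerts)
```

The `sleep` and `max_runs` parameters exist only so the loop can be exercised quickly in tests. The same transaction function can be pointed at each environment, not just production.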
The fourth point, “We need to run this script as often as we can in production,” is deceptively simple, but it is critical. We should be able to execute the synthetic transaction every minute. If for some reason a synthetic transaction takes more than a minute to run, we need to ask ourselves:
- Is that transaction really a core user behavior?
- If it is a core behavior, is that performance acceptable?
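The one-minute rule can be turned into an automated check. Time a single run and, if it exceeds the budget, surface the two questions above. The Python sketch below does this. `check_runtime_budget` is a hypothetical helper, and the 60-second default simply mirrors the "every minute" target. The demo uses a deliberately tiny budget so that it runs quickly.

```python
import time
from typing import Callable, Optional, Tuple

RUN_INTERVAL_SECONDS = 60.0  # "every minute": a run must finish before the next one starts


def check_runtime_budget(
    transaction: Callable[[], None],
    budget_seconds: float = RUN_INTERVAL_SECONDS,
) -> Tuple[float, Optional[str]]:
    """Run the transaction once; return (duration in seconds, warning or None)."""
    start = time.perf_counter()
    transaction()
    duration = time.perf_counter() - start
    if duration <= budget_seconds:
        return duration, None
    warning = (
        f"Synthetic transaction took {duration:.2f}s, over the {budget_seconds:.2f}s budget. "
        "Is this transaction really a core user behavior? "
        "If it is, is this performance acceptable?"
    )
    return duration, warning


# Demo with a tiny budget: a 0.2 s "transaction" against a 0.1 s budget.
duration, warning = check_runtime_budget(lambda: time.sleep(0.2), budget_seconds=0.1)
print(warning)
```

The same check could also back the CI performance test from the principles list, for example by failing the build whenever `warning` is not `None`.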
One final note: when we start building synthetic transactions, we rely on domain experts and business analysts to define these behaviors, but in the long run we should be able to look at the logs to identify true customer usage and adjust these synthetic transactions accordingly.
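Logs can reveal which paths real customers follow most often. A simple approach groups requests by session and counts the most common page sequences. The Python sketch below does exactly that. It assumes a hypothetical, simplified log format of `<timestamp> <session_id> <path>` per line, already in time order. Real access logs will need their own parsing, and long sessions may need trimming to a prefix before counting.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_user_journeys(
    log_lines: Iterable[str], n: int = 3
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the n most common per-session page sequences with their counts.

    Assumes each line is "<timestamp> <session_id> <path>" and lines are in time order.
    """
    sessions: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that don't match the assumed format
        _, session_id, path = parts
        sessions[session_id].append(path)
    return Counter(tuple(paths) for paths in sessions.values()).most_common(n)


logs = [
    "2024-01-01T10:00:00 s1 /home",
    "2024-01-01T10:00:05 s1 /search",
    "2024-01-01T10:00:09 s2 /home",
    "2024-01-01T10:00:12 s1 /checkout",
    "2024-01-01T10:00:15 s2 /search",
    "2024-01-01T10:00:20 s2 /checkout",
    "2024-01-01T10:00:30 s3 /home",
]
for journey, count in top_user_journeys(logs):
    print(count, " -> ".join(journey))
# 2 /home -> /search -> /checkout
# 1 /home
```

The top sequences are candidates to compare against the journeys the domain experts originally scripted.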