Site reliability engineering (SRE), first developed by Google, is rapidly growing in popularity. For organizations that want to get started with SRE, Senior Consultant Daniel Tharp argues that you need to start with observability first. In this Tech in 2, he shares three tips to help you harness observability in SRE.
3 Tips To Harness Observability In SRE
Observability is the starting point for SRE, because you can’t quantify what you can’t measure. If your team wants to get started with site reliability engineering, it has to start with observability.
We have three tips for getting started with observability:
- Empower your developers with an ‘error budget’. This is a number set in collaboration between the team and leadership. The error budget lets your developers iterate more rapidly by giving them a pre-agreed amount of downtime. Whatever failures may happen, a certain allowance has already been set aside for them.
- Make your alerts meaningful. A human being can only respond to an alert with a real level of urgency 2-3 times a day. If they’re getting alerted more than that, they’re going to get fatigued, and they won’t treat an alert as urgently as it may deserve. If they’re getting alerted much more than that, it’s worth investigating what’s going on with the alerts. If an alert doesn’t require any action and is only letting people know about something, nobody should get paged for it.
- Use the first two tips to automate away your problems. Once you have observability in place and understand your errors and your common failures, you have the groundwork for fixing them through automation.
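To make the first two tips concrete, here is a minimal Python sketch, not a definitive implementation. It assumes the error budget is expressed as allowed downtime derived from an availability target over a time window, and that each alert carries a flag saying whether a human needs to act. The names `ErrorBudget`, `Alert`, `should_page`, and `is_fatiguing`, the 99.9% target, and the 30-day window are all illustrative assumptions rather than anything prescribed above. The daily limit of 3 pages reflects the 2-3 alerts a day mentioned in the second tip.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Downtime allowance agreed between the team and leadership (hypothetical model)."""
    slo_target: float    # assumed availability target, e.g. 0.999 for 99.9%
    window_minutes: int  # assumed measurement window, in minutes

    def allowed_downtime_minutes(self) -> float:
        # The budget is the share of the window that is allowed to fail.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_minutes(self, downtime_minutes: float) -> float:
        # Budget left after subtracting the downtime already observed.
        return self.allowed_downtime_minutes() - downtime_minutes

    def can_release(self, downtime_minutes: float) -> bool:
        # While budget remains, developers can keep iterating quickly.
        return self.remaining_minutes(downtime_minutes) > 0


@dataclass
class Alert:
    name: str
    actionable: bool  # hypothetical flag: does someone need to act right now?


def should_page(alert: Alert) -> bool:
    # Informational alerts go to a dashboard or channel, not to a pager.
    return alert.actionable


def is_fatiguing(pages_today: int, daily_limit: int = 3) -> bool:
    # More pages than the daily limit is a cue to review the alert rules.
    return pages_today > daily_limit


# Example: a 99.9% target over a 30-day window allows about 43.2 minutes of downtime.
budget = ErrorBudget(slo_target=0.999, window_minutes=30 * 24 * 60)
allowed = round(budget.allowed_downtime_minutes(), 1)
release_ok = budget.can_release(downtime_minutes=20)
```

Tracking these numbers over time is also what shows which failures recur often enough to be worth automating, which is the point of the third tip.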
By following these three tips, you enable your developers to work more confidently and develop more quickly, and you keep the entire team much more in tune with the real health of their application.