-
How Jaeger tracing fits into software observability
In this article, I will share how tracing, and more specifically Jaeger tracing, can fit into your wider software observability strategy. Before we get into tracing, let’s define observability. What is observability? Observability is a comprehensive means of gaining data on how software services perform in production. This data gives you a picture of the…
-
How 6 system resilience patterns increase software reliability
System resilience thinking can inform better Site Reliability Engineering decisions. Specifically, it can affect how the SRE culture unfolds and handles critical situations. The system resilience concept is rooted in theoretical computer science. Don’t panic. I will explain how it can – in a practical way – support increased software reliability in production. We…
-
Runbooks for better incident response
I can confidently tell you that runbooks form a critical part of the incident response toolkit. I will also tell you that SREs are well-placed to start and oversee the development of runbooks. If you don’t have a runbook yet, let me entice you with the thought of checklist-type documentation to follow when you’re…