-
#14 Faster Incident Resolution through Data-Driven Notebooks (with Ivan Merrill)
Episode 14 [SREpath Podcast] Ash Patel interviews Ivan Merrill, who is head of solutions engineering at Fiberplane. Ivan shares insights about making sense of the big data that comes from observability and incident response, to improve learning and drive faster incident resolution in the future. He also sheds light on the importance of fostering collaboration…
-
#13 Making Sense of OpenTelemetry and Observability (with Adriana Villela)
Episode 13 [SREpath Podcast] Ash Patel interviews Adriana Villela, who is a CNCF ambassador, OpenTelemetry contributor, and senior developer advocate at Lightstep. Adriana talks about her experiences discovering observability, life as a team leader, and the promise of OpenTelemetry. She sheds light on the importance of observability practices and the role of OpenTelemetry in standardizing…
-
#12 From Incident Firefighting to Reliability First (with Robert Ross)
Episode 12 [SREpath Podcast] Ash Patel interviews Robert Ross, who is the founder and CEO of FireHydrant, an incident management platform. Robert talks about his experiences as an SRE and building tools that make developers’ lives easier. He also shares his insights from offering incident management software to SREs and other software incident responders. Highlights…
-
#11 Rising to Staff Engineer in DevOps and SRE (with Rajesh Reddy N)
Episode 11 [SREpath Podcast] Ash Patel interviews Rajesh Reddy N, who is a senior DevOps architect at CoinDCX. Rajesh shares his thoughts on effective patterns in SRE, DevOps, and platform engineering. He emphasizes the importance of understanding the systems, prioritizing issues, and avoiding buzzword-driven decisions. Rajesh also highlights challenges like alert noise and treating all…
-
#10 Using AI for Kubernetes troubleshooting self-service (with Kyle Forster)
Episode 10 [SREpath Podcast] Ash Patel interviews Kyle Forster of RunWhen about his perspective on AI and its usefulness in achieving reliability goals. RunWhen has developed a tool that uses visual cluster mapping and GenAI for troubleshooting Kubernetes problems. Its localhost version has hit over 1900 downloads in the 6 weeks since launch. Transcript Don’t want to…
-
#9 Inside Booking.com’s Site Reliability Engineering Practice
Episode 9 [SREpath Podcast] Ash Patel interviews Samuele Tonon and Yoann Fouquet about their experiences in managing and growing the Site Reliability Engineering (SRE) function at Booking.com. Booking.com is one of the world’s largest travel sites with a market capitalization of over $100 billion and over 1.5 million bookings per day. Here are key highlights…
-
How developers can survive “you build it, you run it”
Introduction As a developer, your involvement can range from having nothing to do with your code once it’s been committed, all the way to looking after the code right up to production. The latter is called the “you build it, you run it” model. It’s not going away. But that depends on your organization. It’s likely to…
-
#8 Software Reliability Ninja Who is NOT An SRE (with Pablo Bouzada)
Episode 8 [SREpath Podcast] Ash Patel interviews Pablo Bouzada about his beliefs on software reliability as a non-SRE software engineering leader. They discuss the importance of leadership to drive effective reliability changes in the software system, as well as the challenges of providing reliable service at video streaming giant Viaplay. Read the Episode Transcript Don’t…
-
10 Tips for Onboarding New SRE Hires
How new SRE hires can get stuck There’s more than one way to mess up your new SRE hire and get them stuck in a loop. Here are 6 ways new hires will know you’ve made this mistake: This article will unpack these 6 sticking points and show how to solve them. Later on, I…
-
Cost-benefit analysis of infrastructure-as-code (IaC)
You might have heard that Infrastructure-as-code (IaC) contributes to better cloud-native software architecture. But what is IaC, what are its benefits & trade-offs and how can it be improved? This guide aims to give clarity around IaC through: It can serve as a starting point for business-specific conversations with stakeholders. At some point, senior management…