Category: Podcast

Check out episodes covering SRE, DevOps, and platform engineering. Some feature only our own insights. Others are interviews with amazing SREs and engineers. ⬇️⬇️⬇️

  • #14 Faster Incident Resolution through Data-Driven Notebooks (with Ivan Merrill)

    Episode 14 [SREpath Podcast] Ash Patel interviews Ivan Merrill, who is head of solutions engineering at Fiberplane. Ivan shares insights about making sense of the big data that comes from observability and incident response, to improve learning and drive faster incident resolution in the future. He also sheds light on the importance of fostering collaboration…

  • #13 Making Sense of OpenTelemetry and Observability (with Adriana Villela)

    Episode 13 [SREpath Podcast] Ash Patel interviews Adriana Villela who is a CNCF ambassador, OpenTelemetry contributor, and senior developer advocate at Lightstep. Adriana talks about her experiences discovering observability, life as a team leader, and the promise of OpenTelemetry. She sheds light on the importance of observability practices and the role of OpenTelemetry in standardizing…

  • #12 From Incident Firefighting to Reliability First (with Robert Ross)

    Episode 12 [SREpath Podcast] Ash Patel interviews Robert Ross, who is the founder and CEO of FireHydrant, an incident management platform. Robert talks about his experiences as an SRE and building tools that make developers’ lives easier. He also shares his insights from offering incident management software to SREs and other software incident responders. Highlights…

  • #11 Rising to Staff Engineer in DevOps and SRE (with Rajesh Reddy N)

    Episode 11 [SREpath Podcast] Ash Patel interviews Rajesh Reddy N who is a senior DevOps architect at CoinDCX. Rajesh shares his thoughts on effective patterns in SRE, DevOps, and platform engineering. He emphasizes the importance of understanding the systems, prioritizing issues, and avoiding buzzword-driven decisions. Rajesh also highlights challenges like alert noise and treating all…

  • #10 Using AI for Kubernetes troubleshooting self-service (with Kyle Forster)

    Episode 10 [SREpath Podcast] Ash Patel interviews Kyle Forster of RunWhen about his perspective on AI and its usefulness in achieving reliability goals. RunWhen has developed a tool that uses visual cluster mapping and GenAI for troubleshooting Kubernetes problems. Its localhost version has hit over 1900 downloads in the 6 weeks since launch. Transcript Don’t want to…

  • #9 Inside Booking.com’s Site Reliability Engineering Practice

    Episode 9 [SREpath Podcast] Ash Patel interviews Samuele Tonon and Yoann Fouquet about their experiences in managing and growing the Site Reliability Engineering (SRE) function at Booking.com. Booking.com is one of the world’s largest travel sites with a market capitalization of over $100 billion and over 1.5 million bookings per day. Here are key highlights…

  • #8 Software Reliability Ninja Who is NOT An SRE (with Pablo Bouzada)

    Episode 8 [SREpath Podcast] Ash Patel interviews Pablo Bouzada about his beliefs on software reliability as a non-SRE software engineering leader. They discuss the importance of leadership in driving effective reliability changes in the software system, as well as the challenges of providing reliable service at video streaming giant Viaplay. Read the Episode Transcript Don’t…

  • Spotify’s Site Reliability Engineering (SRE) Culture and Practices [Audio]

    Listen to this audio case study now: In this episode, we explore Spotify’s Site Reliability Engineering (SRE) practice and how it transformed the company’s software operations. We’ll unravel the hypergrowth challenge that Spotify faced before it implemented SRE practices. We will also deep dive into how SRE has helped Spotify’s broader engineering work. You’ll get…

  • SREs are risk managers, IAC hate and more! [Audio]

    Episode 1 [SRE Review Podcast] Listen to this episode now: In this maiden episode of SRE Review, I cover the following articles:

  • #6 Building a successful SRE practice through capabilities

    Episode 6 [SREpath Podcast] We discuss the need for a framework to guide the development of Site Reliability Engineers (SREs) and drive value for organizations. You will learn about our pillar view of areas like observability and service management, to identify areas for improvement and emphasize the importance of focusing on a few key areas…