-
#32 Clarifying Platform Engineering’s Role (with Ajay Chankramath)
Episode 32 [SREpath Podcast] Show notes: Platform Engineering is a hot topic right now, with some pundits saying it will replace DevOps, SRE, or both. I don't think this is the case at all. Neither does Ajay Chankramath. He is the Head of Platform Engineering at ThoughtWorks North America, an innovative consulting group. I'd…
-
#23 – The Danger of Unreliable Platforms (with Jade Rubick)
Episode 23 [SREpath Podcast] Show notes: Jade Rubick needs no introduction in the reliability and observability space. He was VP of Engineering at New Relic from 2010 to 2019. It was my pleasure to have him talk about issues like managing expectations with teams, especially platform-based teams. We even had a spicy take on their…
-
#10 Using AI for Kubernetes troubleshooting self-service (with Kyle Forster)
Episode 10 [SREpath Podcast] Ash Patel interviews Kyle Forster of RunWhen about his perspective on AI and its usefulness in achieving reliability goals. RunWhen has developed a tool that uses visual cluster mapping and GenAI to troubleshoot Kubernetes problems. Its localhost version has passed 1,900 downloads in the six weeks since launch. Transcript: Don't want to…
-
Inside Spotify’s Site Reliability Engineering (SRE) practice
You've undoubtedly caught wind of the latest Netflix series, "The Playlist," a show loosely inspired by the birth of Spotify. Chances are, you've already devoured it in one glorious binge-watching session. As for me, I only got around to it recently. I was enticed by a YouTube ad that hinted at a…
-
Spotify’s Site Reliability Engineering (SRE) Culture and Practices [Audio]
Listen to this audio case study now. In this episode, we explore Spotify's Site Reliability Engineering (SRE) practice and how it transformed the company's software operations. We'll unravel the hypergrowth challenge Spotify faced before it adopted SRE practices. We'll also take a deep dive into how SRE has helped Spotify's broader engineering work. You'll get…
-
#3 SRE vs DevOps vs Platform Engineering [Audio]
Episode 3 [SREpath Podcast] In this episode of SREpath, Ash and Sebastian discuss the unnecessary debate surrounding Site Reliability Engineering (SRE), DevOps, and platform engineering. They argue that these disciplines should not be pitted against each other, but rather seen as complementary and able to coexist within an organization. The focus should be on continuous…
-
Analysis of SRE and platform setup at 10+ tech companies
In this article, you will see a breakdown of the platform setup and SRE practices within 12 non-FAANG technology companies. This is based on the case studies by Andrios Robert. “There is a lot of content available on how Google did [Site Reliability Engineering]; let’s uncover what happens with the rest of the world.” —…
-
Is platform engineering at risk of shiny object syndrome?
Much has been debated lately about the emergence of "Platform Engineering" as a solution to software operations problems. It's an interesting proposition. However, it is not a silver bullet that will fix everything that didn't work out with Dev versus Ops, DevOps, or SRE. We are missing something very important in our…
-
Reduce software outage risk with passive guardrails
Shocking fact: only 10-25% of software outages are caused by hardware or network failure. The rest result from human error, such as misconfiguration (paraphrasing Martin Kleppmann, Designing Data-Intensive Applications). In this article, I will share how setting up passive guardrails in and around developer workflows can reduce the frequency and severity…
-
How cloud infrastructure teams evolve – from start to maturity
I recently read a post by Will Larson, who started SRE at Uber. The post is called "Trunks and branches model for scaling infrastructure organizations." Several passages in the post cover how infrastructure teams can evolve from the startup phase. I felt the dense, rich advice would be easier to comprehend with a visual…