{"id":794,"date":"2022-10-12T01:51:54","date_gmt":"2022-10-11T15:51:54","guid":{"rendered":"https:\/\/sysmit.com\/cf22\/?p=794"},"modified":"2023-12-13T15:28:02","modified_gmt":"2023-12-13T05:28:02","slug":"team-topologies-for-site-reliability-engineering","status":"publish","type":"post","link":"https:\/\/sysmit.com\/cf22\/team-topologies-for-site-reliability-engineering\/","title":{"rendered":"Where in team topologies does Site Reliability Engineering fit in?"},"content":{"rendered":"\n
We will explore the workings of the Team Topologies model and how Site Reliability Engineering (SRE) teams can fit into it.<\/p>\n\n\n\n
In more detail, I will share with you the following:<\/p>\n\n\n\n
Let\u2019s get started.<\/p>\n\n\n
Team topologies is a relatively new model\/framework, having been officially introduced in 2019.<\/p>\n\n\n\n
It\u2019s a response by the authors, Manuel Pais and Matthew Skelton, to fundamental and recurring issues in software delivery.<\/p>\n\n\n
Modern software systems typically call for a fast flow of change. Examples of change include:<\/p>\n\n\n\n
A slower flow of change can lead to a backlog of work that keeps piling up. Meanwhile, the software falls behind in meeting user needs and staying competitive in the market.<\/p>\n\n\n\n
Here\u2019s the thing: cognitively overloaded software engineering teams can only keep up with a certain speed of change.<\/p>\n\n\n\n
This is why it\u2019s important to reduce their cognitive load, in order to support higher release velocity.<\/p>\n\n\n
The authors have created a new level of clarity around how various software teams – from product teams all the way to platform teams – should operate.<\/p>\n\n\n\n
A well-defined team topology within an org will give engineers the luxury of one of the most precious things lacking in modern business: focus.<\/p>\n\n\n\n
Software teams in the last decade have experienced a myriad of changes. They have been:<\/p>\n\n\n\n
Team topologies is a conceptual framework that overlays dynamic team structures. These structures can enhance the service that NFR-responsible software engineers like SREs provide to feature teams.<\/p>\n\n\n\n
By NFR, I mean non-functional requirements, which refer to areas of software engineering other than developing features of the software.<\/p>\n\n\n\n
These include areas like platform, reliability and performance.<\/p>\n\n\n\n
\ud83d\udcad Side note:<\/em> I\u2019m personally not a fan of the term NFR because the above-mentioned areas are critical to what end-users perceive as functional <\/em>software.<\/p>\n\n\n\n The foremost aim of proposing unique and dynamic team structures is to optimise for team cognitive load, which can weigh heavily on a team\u2019s effectiveness.<\/p>\n\n\n\n Cognitive load: The total amount of mental effort being used in the working memory \u2014 John Sweller<\/p>\n<\/blockquote>\n\n\n\n Failing to optimise for cognitive load can lead to lower work quality, delays and unmotivated engineers.<\/p>\n\n\n\n I suppose that means many teams today are poorly optimised for cognitive load!<\/p>\n\n\n\n A lot of the metrics that are now considered de rigueur in software delivery – like MTTR, deployment frequency etc. – are difficult to improve when teams are mentally overloaded.<\/p>\n\n\n\n Moving on.<\/em><\/p>\n\n\n \ud83d\udca1 My key highlight from the book:<\/strong> Effective value creation requires forming dynamic team structures that reflect the kind of value that needs to be released.<\/p>\n\n\n I have noticed that Conway\u2019s Law is talked about often in DevOps and SRE circles. Perhaps because it\u2019s a law and we analytical types love rules? 
I jest.<\/p>\n\n\n\n Very quickly, Conway\u2019s Law holds that a system\u2019s design mirrors the communication structure of the organization that builds it. In practice, software architecture won\u2019t change effectively without changing how the people working on it are organized.<\/p>\n\n\n\n It fundamentally comes down to how the people are organized around the work.<\/p>\n\n\n\n How many orgs have you worked in where there is a methodical process for making the above happen?<\/p>\n\n\n\n Taking an SRE angle for a moment, I find from my conversations with engineering managers that the SRE needs of software are not being effectively covered.<\/p>\n\n\n\n And so the endless hiring rat race continues\u2026<\/p>\n\n\n\n Perhaps identifying the work at a deep level, then adapting the team structure to it, can alleviate some of this tension. <\/p>\n\n\n\n (I am developing an SRE capability auditing service to help with this)<\/p>\n\n\n\n\n I particularly love how the authors refer to Naomi Stanford, who is one of the foremost thinkers on organizational design.<\/p>\n\n\n\n They draw from her 5 rules for org design. These 5 rules are:<\/p>\n\n\n\n As basic as the above sounds, very few orgs have the know-how, bandwidth or interest to do this in a proper fashion. Let\u2019s change that, shall we?<\/p>\n\n\n\n For a complex area of practice like SRE, it is critical to implement the right team design and continue to evolve it as the software system\u2019s needs change.<\/p>\n\n\n\n Now, let\u2019s explore the 4 team modalities that team topologies proposes\u2026<\/p>\n\n\n Let\u2019s work out which team topology Site Reliability Engineers and SRE teams can fit into, as one of them is better suited than the others. 
<\/p>\n\n\n\n We will explore each team topology one by one.<\/p>\n\n\n From a team topology perspective, an SRE team is unlikely to be considered a stream-aligned team.<\/p>\n\n\n\n The work of SREs cuts across the full business domain, not a specific sliver of the value stream.<\/p>\n\n\n\n SREs are better considered as enabling teams supporting stream-aligned teams, which brings us to\u2026<\/p>\n\n\n At their core, Site Reliability Engineers exist to support the ongoing reliability of software in production.<\/p>\n\n\n\n It makes sense for SRE teams to exist to enable the work of stream-aligned \u201cfeature\u201d teams.<\/p>\n\n\n\n SRE teams can work of their own volition to assure system reliability by doing on-call work, capacity planning, code reviews etc.<\/p>\n\n\n\n Site Reliability Engineers themselves can embed into stream-aligned teams to:<\/p>\n\n\n\n From an SRE purist\u2019s perspective, Site Reliability Engineers would rarely own a complicated subsystem.<\/p>\n\n\n\n SREs are known to run APM tools and play around with the Chaos suite, but that would only be a part of their wider role in assuring reliable software-in-production.<\/p>\n\n\n\n They may work with specialists such as performance and chaos engineers to support the implementation of a service across the org, but rarely would an SRE team or an individual SRE focus on developing the end-to-end service.<\/p>\n\n\n Site Reliability Engineers are well aware of DIY efforts from developers that can bring down software services, so they will play a role in the platform.<\/p>\n\n\n\n They can support the platform by building passive guardrails that keep developers\u2019 workflows within safe confines.<\/p>\n\n\n\n The extent of their role in this will depend on past platform issues, the current developer climate and platform complexity.<\/p>\n\n\n\n SREs are not, and should not become, the primary owners of the underlying platform and its tooling.<\/p>\n\n\n There is so much more to 
the Team Topologies model and book. I have covered the key concepts at a high level to give you an understanding of how it applies to SRE.<\/p>\n\n\n\n Be sure to check out the book and the authors\u2019 videos for a deeper understanding of the model.<\/p>\n\n\n\n Additional reading:<\/strong><\/p>\n\n\n\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[60,29],"tags":[13,7,36],"_links":{"self":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/794"}],"collection":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/comments?post=794"}],"version-history":[{"count":7,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/794\/revisions"}],"predecessor-version":[{"id":5757,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/794\/revisions\/5757"}],"wp:attachment":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/media?parent=794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/categories?post=794"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/tags?post=794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}\n
Here\u2019s a quick rundown of key learnings from the book on the concept of organization itself:<\/h3>\n\n\n
\n
The following parameters apply for ideal team dynamics:<\/h3>\n\n\n
\n
Several outside influences are at play in the team topologies model. They include:<\/h3>\n\n\n
\n
Team topologies nods to modern organizational design principles<\/h3>\n\n\n
\n
4 types of team topologies<\/strong><\/h2>\n\n
1. Stream-aligned team<\/h3>\n\n\n
\n
2. Enabling team<\/h3>\n\n\n
\n
3. Complicated subsystem team<\/h3>\n\n\n
\n
4. Platform team<\/h3>\n\n\n
\n
How SRE fits into team topologies<\/h2>\n\n\n
SREs as stream-aligned teams<\/h3>\n\n\n
SREs as enabling teams<\/h3>\n\n\n
\n
SREs as complicated subsystem teams<\/h3>\n\n\n
SREs as platform teams<\/h3>\n\n\n
Parting words<\/h2>\n\n\n
\n