{"id":385,"date":"2022-04-01T23:42:00","date_gmt":"2022-04-01T13:42:00","guid":{"rendered":"https:\/\/sysmit.com\/cf22\/?p=385"},"modified":"2023-12-13T15:28:02","modified_gmt":"2023-12-13T05:28:02","slug":"6-system-resilience-patterns-for-increasing-software-reliability","status":"publish","type":"post","link":"https:\/\/sysmit.com\/cf22\/6-system-resilience-patterns-for-increasing-software-reliability\/","title":{"rendered":"How 6 system resilience patterns increase software reliability"},"content":{"rendered":"
System resilience thinking can inform better Site Reliability Engineering (SRE) decisions. Specifically, it can shape how an SRE culture develops and how teams handle critical situations.<\/p>\n\n\n\n
The system resilience <\/em>concept is rooted in systems engineering theory. <\/p>\n\n\n\n Don’t panic. I will explain how, in practical terms, it can support increased software reliability<\/strong> in production. <\/p>\n\n\n\n We will cover six patterns that make up system resilience:<\/p>\n\n\n These terms may not mean much yet, but we will unpack each one shortly. <\/p>\n\n\n\n First, let’s define system resilience in the software context:<\/p>\n\n\n\n System resilience is the ability of organizational, hardware and software systems to mitigate the severity and likelihood of failures<\/strong> or losses<\/strong>, to adapt to changing conditions, and to respond appropriately after the fact.<\/em><\/p>\n\u2014 Jackson, Scott. (2007). System Resilience: Capabilities, Culture and Infrastructure. INCOSE International Symposium.<\/cite><\/blockquote>\n\n\n\n It’s an academic definition, but a precise one. System resilience matters because it helps you proactively address software performance and reliability<\/strong>. <\/p>\n\n\n\n Now, let’s unpack each of the six patterns of system resilience:<\/p>\n\n\n Monitor for and detect adverse events in a timely manner, well before they can snowball into a critical issue. <\/p>\n\n\n You can build a superior monitoring effort by:<\/p>\n\n\n\n Respond to the adverse event in a timely and effective manner.<\/p>\n\n\n Weigh these trade-offs when you detect an adverse event and decide how to respond to it:<\/p>\n\n\n\n\n
<\/figure>\n\n<\/div><\/div><\/div>\n<\/div>\n\n\n
\n
Resilience pattern #1: Superior monitoring<\/h2>\n\n
What does it mean?<\/h3>\n\n\n
How to apply it to SRE practice<\/h3>\n\n\n
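Here is a minimal illustration of detecting an adverse event before it snowballs. The sketch tracks the error rate over a sliding window of recent requests. It flags an adverse event once that rate crosses a threshold. The `ErrorRateMonitor` class, the 100-request window and the 5% threshold are hypothetical choices for illustration. They are not values taken from any specific monitoring tool. In practice, you would usually express the same idea as an alerting rule in your monitoring system.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window detector for an elevated error rate."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05, min_samples: int = 20):
        # Keep only the most recent `window_size` outcomes (True = request failed).
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold
        # Avoid alerting on a handful of requests, e.g. right after startup.
        self.min_samples = min_samples

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_adverse(self) -> bool:
        # Flag an adverse event only once there is enough data and the
        # recent error rate exceeds the threshold.
        return len(self.outcomes) >= self.min_samples and self.error_rate() > self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window_size=100, threshold=0.05)
    # 90 healthy requests followed by 10 failures: a 10% error rate in the window.
    for _ in range(90):
        monitor.record(False)
    for _ in range(10):
        monitor.record(True)
    print(f"error rate = {monitor.error_rate():.2f}, adverse = {monitor.is_adverse()}")
```

Window size involves a trade-off. A short window fires on noise, while a long window delays detection. The `min_samples` guard is one simple way to balance the two.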
\n
Resilience pattern #2: Adaptive response<\/h2>\n\n
What does it mean?<\/h3>\n\n\n
How to apply it to SRE practice<\/h3>\n\n\n
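The right response to an adverse event depends on your system. A circuit breaker is one widely used example of an automated adaptive response. After repeated failures, it stops sending calls to a struggling dependency. After a cooldown, it lets a trial call through to check for recovery. The sketch below is a minimal, single-threaded illustration under stated assumptions. The class names, thresholds and injectable `clock` are hypothetical and do not come from any particular resilience library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open after repeated failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of piling more load onto a struggling dependency.
            raise CircuitOpenError("circuit open; rejecting call")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call in half-open state re-opens the breaker;
            # in closed state the breaker opens once the threshold is reached.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success (including the half-open trial call) closes the breaker.
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def flaky() -> str:
        raise ConnectionError("dependency unavailable")

    for _ in range(2):
        try:
            breaker.call(flaky)
        except ConnectionError:
            pass
    print(breaker.state)  # open
    now[0] = 10.0
    print(breaker.state)  # half-open
    print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

Failing fast trades a short period of rejected calls for protecting the dependency from retry storms. Tuning `failure_threshold` and `reset_timeout` is itself a balancing decision: you want to react quickly without over-reacting to transient errors. A production breaker would also need thread safety and per-dependency state.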