{"id":5728,"date":"2023-08-16T09:34:17","date_gmt":"2023-08-15T23:34:17","guid":{"rendered":"https:\/\/sysmit.com\/cf22\/?p=5728"},"modified":"2023-12-13T15:26:50","modified_gmt":"2023-12-13T05:26:50","slug":"cost-benefit-analysis-of-infrastructure-as-code-iac","status":"publish","type":"post","link":"https:\/\/sysmit.com\/cf22\/cost-benefit-analysis-of-infrastructure-as-code-iac\/","title":{"rendered":"Cost-benefit analysis of infrastructure-as-code (IAC)"},"content":{"rendered":"\n<p>You might have heard that infrastructure-as-code (IaC) contributes to better cloud-native software architecture. <\/p>\n\n\n\n<p>But what is IaC, what are its benefits &amp; trade-offs and how can it be improved?<\/p>\n\n\n\n<p>This guide aims to clarify IaC through:<\/p>\n\n\n\n<ul>\n<li>a rundown of infrastructure-as-code (IaC)<\/li>\n\n\n\n<li>a cost-benefit view of implementing IaC<\/li>\n<\/ul>\n\n\n\n<p>It can serve as a starting point for business-specific conversations with stakeholders.<\/p>\n\n\n\n<p>At some point, senior management may want to know what exactly they\u2019re buying into (or have bought into already).<\/p>\n\n\n<h2 class=\"wp-block-heading\" id=\"introduction-to-iac\">Introduction to IaC<\/h2>\n\n\n<p><strong>Fact #1<\/strong> \u2014 Running reliable software in the real world depends in large part on properly provisioned and configured infrastructure.<\/p>\n\n\n\n<p><strong>Fact #2<\/strong> \u2014 Your codebase is constantly evolving to support new features and improvements to existing ones.<\/p>\n\n\n\n<p><strong>Fact #3<\/strong> \u2014 Infrastructure these days is a complicated combination of bare metal, VMs, networks and software-based abstractions like Kubernetes.<\/p>\n\n\n\n<p><strong>Fact #4<\/strong> \u2014 The combination of infrastructure and evolving code needs careful orchestration in order to support the growing and volatile demands of users on production software systems.<\/p>\n\n\n\n<p>Infrastructure-as-code (IaC) allows engineers to 
operate effectively with these facts in mind. <\/p>\n\n\n\n<p>Engineers define and manage infrastructure using code, automating what was previously a series of costly manual processes.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"the-changing-infrastructure-landscape-that-spawned-iac\">The changing infrastructure landscape that spawned IaC<\/h3>\n\n\n<p>In the past, the infrastructure that supported applications was seen as something that didn\u2019t change often \u2014 at least not without noticeable capital investment. <\/p>\n\n\n\n<p>This was a time when engineers were limited to the physical infrastructure tucked away in the basement or at a nearby data center.<\/p>\n\n\n\n<p>The software development lifecycle (SDLC) was similarly restrictive \u2014 developers wrote code, which then had to pass unit tests. <\/p>\n\n\n\n<p>If all was okay, the code got deployed with the help of systems administrators who manually set up the infrastructure.<\/p>\n\n\n\n<p>The work was clunky because:<\/p>\n\n\n\n<ul>\n<li>Code was often wrangled back and forth between developers and operations as it failed tests or performed poorly in the production environment.<\/li>\n\n\n\n<li>If software needed to scale up, system administrators had to physically connect more bare-metal servers to the network and then provision them.<\/li>\n<\/ul>\n\n\n\n<p>The type of infrastructure in this olden-day scenario is called <em>on-prem<\/em>, short for on-premises infrastructure. <\/p>\n\n\n\n<p>This approach lacks the speed and scalability to cope with today\u2019s volatile infrastructure demands.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"demands-on-infrastructure-are-highly-fluid\">Demands on infrastructure are highly fluid<\/h3>\n\n\n<p>The on-prem model was adequate for most software at a time when there were fewer users and they were less dependent on technology. 
<\/p>\n\n\n\n<p>But today, <strong>there are far more users, and they demand consistently high application performance<\/strong>.<\/p>\n\n\n\n<p>Infrastructure needs to be fluid and react quickly to these shifts in demand. <\/p>\n\n\n\n<p>A cloud-first architecture makes it easier to scale up to meet demand and then scale back down.<\/p>\n\n\n\n<p>The machines are no longer in your data center, and with virtual machine (VM) technology, they\u2019re not even physical. <\/p>\n\n\n\n<p>You can spin up hundreds of new VMs at your cloud provider in minutes.<\/p>\n\n\n\n<p>But spinning up VMs regularly and at scale comes with its own challenges, in particular what can become a lot of manual work.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"manual-infrastructure-work-is-fine-until-it-isnt\">Manual infrastructure work is fine \u2014 until it isn\u2019t<\/h3>\n\n\n<p>Ask any Site Reliability Engineer (SRE) what their unique superpower is, and they will tell you it\u2019s their relentless pursuit of <em>reducing toil<\/em>, i.e. manual, repetitive work.<\/p>\n\n\n\n<p>Without IaC, which I\u2019ll get to in a moment, the engineer manually spins up VMs from the command line or by clicking around an administration panel. <\/p>\n\n\n\n<p>Alternatively, they send a ticket asking the assigned infrastructure engineer to spin up a new VM.<\/p>\n\n\n\n<p>This adds significant human lag time, as well as the risk of errors from the <em>handoff effect<\/em>. 
<\/p>\n\n\n\n<p>Very briefly, the handoff effect means that work passed from one person to another is more likely to pick up errors.<\/p>\n\n\n\n<p>Handoff errors can occur because:<\/p>\n\n\n\n<ul>\n<li>each person along the chain may interpret the work differently<\/li>\n\n\n\n<li>requirements can change in the gap between request and action, but only the original request gets actioned<\/li>\n<\/ul>\n\n\n\n<p>This might seem like an acceptable risk if you\u2019ve got a non-critical use case with small infrastructure needs, but at any reasonable scale, it becomes burdensome.<\/p>\n\n\n\n<p>In my experience, a fleet in the high tens of VMs starts to chip away at the cognitive capacity of an infrastructure-focused engineer. <\/p>\n\n\n\n<p>At hundreds of VMs and up, manual work and handoffs debilitate an engineer\u2019s workflow and make it harder to provide reliable services to end users.<\/p>\n\n\n\n<p>This is compounded by the fact that VMs that used to run for months to years now only run for a few days to weeks. <\/p>\n\n\n\n<p>Infrastructure now changes far more frequently, so a build-up of tickets to spin VMs up and down is a real possibility.<\/p>\n\n\n\n<p>To me, this is a clear case of onerous manual work (a waste of highly paid engineer time) at best, and a high risk of compounding human error at worst.<\/p>\n\n\n\n\n<h3 class=\"wp-block-heading\" id=\"iac-reduces-toil-in-infrastructure-provisioning\">IaC reduces toil in infrastructure provisioning<\/h3>\n\n\n<p>IaC solves the toil problem by maintaining the infrastructure\u2019s properties as code in a file. When you need to change the infrastructure, you modify the code in that file. No endless clicking around control panels. 
No tickets to send.<\/p>\n\n\n\n<p><strong>With IaC, the code you add and amend automatically drives the infrastructure changes you need<\/strong>.<\/p>\n\n\n\n<p>I know I repeated myself there, but this defining feature of IaC is worth letting sink in.<\/p>\n\n\n<h2 class=\"wp-block-heading\" id=\"costbenefit-view-of-implementing-iac\">Cost-benefit view of implementing IaC<\/h2>\n\n\n<p>For implementations of reasonable scale \u2014 high tens of VMs, as I mentioned earlier \u2014 the benefits outweigh the time, money, and energy required to adopt IaC. <\/p>\n\n\n\n<p>If you\u2019re starting out on IaC, two factors will minimize your transformation cost:<\/p>\n\n\n\n<p>(1) your primary infrastructure is already in the cloud <em>and<\/em><\/p>\n\n\n\n<p>(2) there\u2019s institutional or team knowledge of IaC tools and methods<\/p>\n\n\n\n<p>The former is already a given in most modern organizations, and the latter can be developed rapidly with an effective approach to building IaC capability.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"heres-a-quick-rundown-of-the-benefits\">Here\u2019s a quick rundown of the benefits:<\/h3>\n\n\n<ul>\n<li><strong>code integrity<\/strong> &#8211; reduces the technical debt of change through auditable, version-controlled code<\/li>\n\n\n\n<li><strong>lower cost<\/strong> &#8211; less engineer time ($$$) is spent on toil (repetitive, manual tasks)<\/li>\n\n\n\n<li><strong>faster deployments<\/strong> &#8211; new VMs spin up with little lag once code changes are deployed<\/li>\n\n\n\n<li><strong>lower human error<\/strong> &#8211; removing handoffs lowers the risk of human errors that lead to downtime and performance degradation<\/li>\n\n\n\n<li><strong>higher availability<\/strong> &#8211; infrastructure is less likely to be unavailable during spikes in demand<\/li>\n<\/ul>\n\n\n\n<p>Another sellable benefit of IaC is that it supports DevOps, which is very <em>in<\/em> right now. 
<\/p>\n\n\n\n<p>This is because infrastructure defined as easy-to-share code allows developers to get more involved in configuration and collaborate with production-focused engineers.<\/p>\n\n\n\n<p>Now, let\u2019s cover some costs and benefits of IaC in more detail.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"cost-iac-takes-time-to-learn\">Cost: IaC takes time to learn<\/h3>\n\n\n<p>IaC is a new paradigm for engineers who may be used to SSHing into a server and making modifications directly. <\/p>\n\n\n\n<p>With IaC, engineers will notice an additional step between writing a change and that change being deployed to the infrastructure.<\/p>\n\n\n\n<p>The engineer makes the necessary code additions or adjustments and pushes them to the provisioning tool, which then applies the changes to the infrastructure. <\/p>\n\n\n\n<p>Engineers first need to learn the IaC language and tooling, and then need to keep the habit of resisting the temptation to make direct \u201cdial-in\u201d changes to infrastructure.<\/p>\n\n\n\n<p>The engineering group will need to invest in the ongoing development of engineers to ensure this happens. <\/p>\n\n\n\n<p>One path involves a culture change that fosters continuous professional development. <\/p>\n\n\n\n<p>This could manifest as ongoing feedback and learning loops.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"benefit-iac-is-flexible-to-many-kinds-of-infrastructure\">Benefit: IaC works with many kinds of infrastructure<\/h3>\n\n\n<p>Infrastructure-as-code isn\u2019t limited to public cloud use cases. You can also use it to define the physical infrastructure that you have on premises.<\/p>\n\n\n\n<p>The benefit of using IaC in this situation is that every application gets assigned a distinct set of resources from the outset. 
<\/p>\n\n\n\n<p>You gain greater visibility into, and finer-grained control over, how resources get allocated to applications.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"benefit-iac-assures-consistency-across-environments\">Benefit: IaC assures consistency across environments<\/h3>\n\n\n<p>In my own experience, I&#8217;ve seen far too much code crumble in production. <\/p>\n\n\n\n<p>The cause was sometimes as simple as differing environments between stages of the software development lifecycle.<\/p>\n\n\n\n<p>Developers were testing on a different environment (\u201clocalhost\u201d) from the one production would use. <\/p>\n\n\n\n<p>Developers\u2019 local machines were often souped up in comparison with the production environment planned by operators.<\/p>\n\n\n\n<p>Having a single source of infrastructure code for all stages reduces the risk of different resource allocations \u2014 and therefore different performance \u2014 for the same feature or story.<\/p>\n\n\n\n<p>This extends all the way down to matching OS versions, patch levels, and similar details. <\/p>\n\n\n\n<p>Differences in these granular properties are often the culprit behind code that works well in testing but not in production.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A live environment clone, created using the exact same IaC as the live environment, has the absolute guarantee that if it works in the cloned environment it will work in live. \u2014 Dan Merron &amp; Shanika Wickramasinghe, DevOps consultants at BMC<\/p>\n<\/blockquote>\n\n\n\n<p>IaC also helps ensure that each layer of infrastructure supporting your code is defined to suit your production requirements. 
These layers include:<\/p>\n\n\n\n<ul>\n<li>IaaS artifacts like VMs, load balancers, and databases<\/li>\n\n\n\n<li>On-premises hardware<\/li>\n\n\n\n<li>Platforms like Kubernetes<\/li>\n<\/ul>\n\n\n<h3 class=\"wp-block-heading\" id=\"cost-iac-needs-consistent-maintenance\">Cost: IaC needs consistent maintenance<\/h3>\n\n\n<p>The infrastructure code that you have today may not be viable in the near future. This is because <strong>the underlying infrastructure is constantly changing<\/strong>. <\/p>\n\n\n\n<p>Kubernetes releases updates all the time. Operating systems need constant patching. New security rules get recommended.<\/p>\n\n\n\n<p>IaC is a constantly moving target.<\/p>\n\n\n\n<p>Consequently, the code needed to control your infrastructure keeps evolving. <\/p>\n\n\n\n<p>This calls for a <strong>consistent testing routine, so you can ensure that the code stays current<\/strong> with your IaaS artifacts and platforms.<\/p>\n\n\n\n<p>It also goes without saying that the engineers responsible for IaC will need to stay on top of these changes continuously.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You might have heard that Infrastructure-as-code (IaC) contributes to better cloud-native software architecture. But what is IaC, what are its benefits &amp; trade-offs and how can it be improved? This guide aims to give clarity around IaC through: It can serve as a starting point for business-specific conversations with stakeholders. 
At some point, senior management [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[60,5],"tags":[33,34],"_links":{"self":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/5728"}],"collection":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/comments?post=5728"}],"version-history":[{"count":10,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/5728\/revisions"}],"predecessor-version":[{"id":5816,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/5728\/revisions\/5816"}],"wp:attachment":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/media?parent=5728"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/categories?post=5728"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/tags?post=5728"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}