{"id":5728,"date":"2023-08-16T09:34:17","date_gmt":"2023-08-15T23:34:17","guid":{"rendered":"https:\/\/sysmit.com\/cf22\/?p=5728"},"modified":"2023-12-13T15:26:50","modified_gmt":"2023-12-13T05:26:50","slug":"cost-benefit-analysis-of-infrastructure-as-code-iac","status":"publish","type":"post","link":"https:\/\/sysmit.com\/cf22\/cost-benefit-analysis-of-infrastructure-as-code-iac\/","title":{"rendered":"Cost-benefit analysis of infrastructure-as-code (IAC)"},"content":{"rendered":"\n
You might have heard that infrastructure-as-code (IaC) contributes to better cloud-native software architecture.<\/p>\n\n\n\n
But what is IaC, what are its benefits & trade-offs, and how can it be improved?<\/p>\n\n\n\n
This guide aims to give clarity around IaC through:<\/p>\n\n\n\n
It can serve as a starting point for business-specific conversations with stakeholders.<\/p>\n\n\n\n
At some point, senior management may want to know what exactly they\u2019re buying into (or have bought into already).<\/p>\n\n\n
Fact #1<\/strong> \u2014 Running reliable software in the real world depends in significant part on properly provisioned and configured infrastructure.<\/p>\n\n\n\n Fact #2 \u2014<\/strong> Your codebase is constantly evolving to support new features and feature improvements.<\/p>\n\n\n\n Fact #3<\/strong> \u2014 Infrastructure these days is a complicated combination of bare metal, VMs, networks and software-based abstractions like Kubernetes.<\/p>\n\n\n\n Fact #4<\/strong> \u2014 The combination of infrastructure and evolving code needs careful orchestration in order to support the growing and volatile demands of users on production software systems.<\/p>\n\n\n\n Infrastructure-as-code (IaC) allows engineers to operate effectively with these facts in mind. <\/p>\n\n\n\n With IaC, engineers define and manage infrastructure using code, automating what was previously a series of costly manual processes.<\/p>\n\n\n In the past, the infrastructure that supported applications was seen as something that didn\u2019t change often \u2014 at least not without noticeable capital investment. <\/p>\n\n\n\n This was a time when engineers were limited to the physical infrastructure tucked away in the basement or at a nearby data center.<\/p>\n\n\n\n The software development lifecycle (SDLC) was similarly restrictive \u2014 developers wrote code, which then had to pass unit tests. <\/p>\n\n\n\n If all was okay, the code got deployed with the help of systems administrators who manually set up the infrastructure.<\/p>\n\n\n\n The work was clunky because:<\/p>\n\n\n\n The type of infrastructure in this older scenario is called on-prem<\/em>, short for on-premises infrastructure. <\/p>\n\n\n\n This approach lacks the speed and scalability to cope with the volatile infrastructure demands of today.<\/p>\n\n\n The on-prem model was adequate for most software at a time when users were fewer and less dependent on technology. 
<\/p>\n\n\n\n But there are far more users today, and they demand a consistently high level of application performance<\/strong>.<\/p>\n\n\n\n There is a clear need to be fluid and react quickly to changes in demand for infrastructure. <\/p>\n\n\n\n With a cloud-first architecture, scaling up to meet demand and winding down afterward is easier.<\/p>\n\n\n\n The machines are no longer in your data center, and with virtual machine (VM) technology, they\u2019re not even physical. <\/p>\n\n\n\n You can spool up 100s of new VMs at your cloud provider in minutes.<\/p>\n\n\n\n But doing this kind of spooling up regularly and at scale comes with its own challenges, in particular a large amount of manual work.<\/p>\n\n\n Ask any Site Reliability Engineer (SRE) what their unique superpower is, and they will tell you it\u2019s their relentless pursuit of reducing toil<\/em>, i.e. manual, repetitive work.<\/p>\n\n\n\n Without IaC, which I\u2019ll get to in a moment, the engineer manually loads up VMs through command line prompts or clicks around on an administrator panel. <\/p>\n\n\n\n Either that or they send a ticket to the assigned infrastructure engineer to spool up a new VM.<\/p>\n\n\n\n This of course adds significant human-related lag time, as well as error risk from the handoff effect<\/em>. <\/p>\n\n\n\n Very briefly, the handoff effect means that work passed from one person to another is more likely to contain errors.<\/p>\n\n\n\n The errors in handoffs can occur because:<\/p>\n\n\n\n The above might seem like an acceptable risk if you\u2019ve got a non-critical use case with small infrastructure needs, but for anything else with a reasonable scale, this becomes burdensome.<\/p>\n\n\n\n From my experience, high 10s of VMs and up can start chipping away at the cognitive capacity of an infrastructure-focused engineer. 
<\/p>\n\n\n\n At 100s of VMs and up, manual work and handoffs are debilitating to an engineer\u2019s workflow and hinder the delivery of reliable services to end-users.<\/p>\n\n\n\n This is compounded by the fact that VMs that used to run for months to years now only run for a few days to weeks. <\/p>\n\n\n\n Infrastructure now changes more frequently, so a build-up of tickets to spool up and spin down VMs is a real possibility.<\/p>\n\n\n\n To me, this seems like a clear case of onerous manual work \u2014 a waste of highly paid engineer time \u2014 at best, and a high risk for compounding human error at worst.<\/p>\n\n\n\n\n IaC solves the toil problem by maintaining the infrastructure\u2019s properties as code within a file. When you need to change the infrastructure, you modify the code within the file. No endless clicking around control panels. No tickets to send.<\/p>\n\n\n\n With IaC,<\/strong> the code you add and amend automatically drives the infrastructure changes you need<\/strong>.<\/p>\n\n\n\n I know I repeated myself there, but this distinction is important to let sink in.<\/p>\n\n\n For implementations with a reasonable scale \u2014 high 10s of VMs as I mentioned earlier \u2014 the benefits outweigh the time, money, and energy costs required to do IaC. <\/p>\n\n\n\n If you\u2019re starting out on IaC, your transformation cost will be minimized by two factors:<\/p>\n\n\n\n (1) your primary infrastructure is already in the cloud and<\/em><\/p>\n\n\n\n (2) there\u2019s institutional or team knowledge of IaC tools and methods<\/p>\n\n\n\n The former is already a given in most modern organizations and the latter can be developed rapidly with an effective IaC capability development approach.<\/p>\n\n\n Another sellable benefit of IaC is that it supports DevOps, which is very in<\/em> right now. 
<\/p>\n\n\n\n This is the case because an easy-to-share code paradigm allows developers to get more involved in configuration and collaborate with production-focused engineers.<\/p>\n\n\n\n Now, let\u2019s cover some costs and benefits of IaC in more detail.<\/p>\n\n\n IaC is a new paradigm for engineers who may be used to SSHing into a server and directly making modifications. <\/p>\n\n\n\n With IaC, engineers will notice an additional step between writing a change and that change being deployed to the infrastructure.<\/p>\n\n\n\n The engineer makes the necessary code additions or adjustments and pushes them to the provisioning tool, which then applies the changes to the infrastructure. <\/p>\n\n\n\n Engineers first need to learn the code, and then need to keep the habit and avoid the temptation to make direct \u201cdial-in\u201d changes to infrastructure.<\/p>\n\n\n\n The engineering group will need to invest in the ongoing development of engineers to ensure this happens. <\/p>\n\n\n\n One path involves implementing a culture (change) that fosters continuous development. <\/p>\n\n\n\n This could manifest as ongoing feedback and learning loops.<\/p>\n\n\n Infrastructure-as-code isn\u2019t limited to public cloud computing use cases. You can use it to define the physical infrastructure that you have on premises.<\/p>\n\n\n\n The benefit of using IaC in this situation is that every application gets assigned a distinct set of resources from the outset. <\/p>\n\n\n\n You gain greater visibility and granularity into how resources get allocated to applications.<\/p>\n\n\n In my own experience, I\u2019ve seen way too much code crumble in production. <\/p>\n\n\n\n The cause was sometimes as simple as differing environments between stages of the software development lifecycle.<\/p>\n\n\n\n Developers were testing in a different environment \u2014 \u201clocalhost\u201d \u2014 from the one used in production. 
<\/p>\n\n\n\nThe changing infrastructure landscape that spawned IaC<\/h3>\n\n\n
\n
Demands on infrastructure are highly fluid<\/h3>\n\n\n
Manual infrastructure work is fine \u2014 until it isn\u2019t<\/h3>\n\n\n
\n
IaC reduces toil in infrastructure provisioning<\/h3>\n\n\n
Cost-benefit view of implementing IaC<\/h2>\n\n\n
Here\u2019s a quick rundown of the benefits:<\/h3>\n\n\n
\n
Cost: IaC takes time to learn<\/h3>\n\n\n
Benefit: IaC is flexible to many kinds of infrastructure<\/h3>\n\n\n
Benefit: IaC assures consistency across environments<\/h3>\n\n\n