In this blog, we look at how SRE can help ITSM professionals gain a decisive advantage and deliver consistent value.
As a Service Management professional since 2005, I have watched the shift from a process-centric approach (ITIL V2) to the Service Lifecycle (ITIL V3), and now to the Service Value System and Service Value Chain (ITIL 4).
The shift is clearly towards customer-centricity and, within that, towards the value of the products and services delivered to the consumer. While ITIL is the de facto framework for ITSM, it has never been prescriptive, so much depends on how we interpret it and apply it in the context of the organization (Adopt and Adapt).
With the advent of the Digital Enterprise, organizations are struggling to release new features in sprints (Agile) and deploy them rapidly (DevOps). Agile, as a set of principles and values, has matured into practices (Scrum, Kanban, TDD, etc.), and DevOps has become the norm for organizations moving up the value chain.
With this trend, the concept of Site Reliability Engineering (SRE) has emerged as the next big movement towards building solutions and services with higher reliability and stability.
SLAs vs SLOs
Customers and service providers enter into a Service Level Agreement (SLA) that defines performance measures such as Availability, Capacity, Response Time and Resolution Time. While we have had fair success in meeting these targets, customers often experience the watermelon syndrome: the reports are green on the outside while the actual experience is red on the inside. There have also been several instances of SLA breaches, which make it hard for service providers to deliver a consistent user experience.
In this regard, SRE recommends defining a Service Level Objective (SLO), an internal target set stricter than the SLA, so that a slide in performance is caught proactively before the SLA is missed. The SLO is tied to a Service Level Indicator (SLI), which is measured as a useful ratio: total successful outcomes / total possible outcomes. It serves as a yardstick for continual improvement.
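To make the relationship between SLI, SLO and SLA concrete, here is a minimal Python sketch. The function names, the 99.95% SLO, the 99.9% SLA and the request counts are all hypothetical figures chosen for illustration, not values from any real contract or tool.

```python
def sli(successful: int, total: int) -> float:
    """Service Level Indicator: successful outcomes / total possible outcomes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total


def classify(sli_value: float, slo: float, sla: float) -> str:
    """Compare a measured SLI with the internal SLO and the contractual SLA."""
    if slo < sla:
        raise ValueError("the SLO is expected to be at least as strict as the SLA")
    if sli_value >= slo:
        return "healthy"
    if sli_value >= sla:
        # Early warning zone: the internal target is missed, the contract is not
        return "SLO missed, SLA still met"
    return "SLA breached"


# Hypothetical figures for one reporting period (assumptions for illustration)
availability = sli(successful=99_920, total=100_000)
status = classify(availability, slo=0.9995, sla=0.999)
print(f"{availability:.4%} -> {status}")
```

The middle outcome is the useful one: the service provider gets a signal to act while the customer-facing agreement is still intact.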
Warranty and Stability
DevOps brings iterative releases, an automated deployment pipeline and improved velocity of features deployed in sprints. However, Non-Functional Testing (NFT) and meeting the Warranty aspects (Capacity, Availability, IT Service Continuity and Security) often remain an aspiration rather than a reality.
In this regard, SRE suggests that Infrastructure and Operations teams adopt a software engineering approach to operations in order to improve warranty and stability. This is a great opportunity for ITSM professionals to gain deep insight into how they can help improve overall stability and reliability in client environments.
Error Budget vs CSI
While we have seen Continual Service Improvement (CSI) as the backbone for any improvement, SRE highlights Error Budgets, the amount of unreliability a service is allowed within its SLO (1 minus the SLO), and how spending them can trigger tangible service improvements.
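As a rough illustration of how an error budget can act as an improvement trigger, consider the Python sketch below. The function names, the 99.9% SLO, the request and failure counts, and the 25% trigger threshold are all assumptions made for the example. Each team would set its own.

```python
def error_budget(slo: float, total_events: int) -> float:
    """Allowed number of failed events in the period: (1 - SLO) * total."""
    return (1.0 - slo) * total_events


def budget_remaining(slo: float, total_events: int, failed_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, total_events)
    if budget <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    return 1.0 - failed_events / budget


# Hypothetical month: 1,000,000 requests, 99.9% SLO, 400 failed requests
remaining = budget_remaining(slo=0.999, total_events=1_000_000, failed_events=400)
if remaining < 0.25:  # illustrative trigger threshold, an assumption
    print("Budget nearly spent: prioritise reliability work")
else:
    print(f"{remaining:.0%} of the error budget remains")
```

Read this way, the error budget gives CSI a measurable trigger: when the remaining budget drops below the agreed threshold, improvement work takes priority over new features.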
While Agile teams run Sprint Retrospectives, ITSM can take advantage of Blameless Postmortems for Major Incidents and unplanned outages. Every failure helps us understand shortcomings and vulnerabilities, and shift our focus to improving the overall reliability of services.
These are some of the salient benefits that ITSM professionals stand to gain by developing an interest in and knowledge of SRE. What has been your experience?
Suresh GP is the Managing Director of TaUB solutions Pte Ltd, Singapore & TaUB Solutions LLP in Bangalore. He has more than 18 years of experience in IT Service Management, IT Governance, BRM, Agile, DevOps and Organizational Change Management. He is a regular blogger and International speaker at itSMF UK, itSMF (Australia, USA, Finland, Norway, Singapore).