Reliability Engineering Coaching

The Reliability Engineering Coaching workshop introduces ways to scale services economically and reliably across an organization. It explores strategies to improve agility, cross-functional collaboration, and transparency of service health, with the aim of building resiliency through design, automation, and closed-loop remediation.

Training Schedule

Sep 9th to 11th: Virtual
Oct 14th to 16th: Virtual
Nov 4th to 6th: Virtual

Features

Three Days

24 hours of Instructor-led training classes

Case Study

Shares relevant industry insights

By Practitioners

Shares real-world experience

Workshop Objective

This workshop aims to equip participants with the practices, methods, and tools to engage everyone in the organization who is involved in reliability, using real-life scenarios and case stories. On completing the workshop, participants will have tangible takeaways to apply back in the office, such as implementing reliability models that fit their organizational context, building advanced observability in distributed systems, building resiliency by design, and running effective incident responses.

The workshop was developed by drawing on key sources, engaging with thought leaders in the DevOps space, and working with organizations that are embracing Reliability Engineering to extract real-life best practices. It is designed to coach the key principles and practices.

WORKSHOP OBJECTIVES

At the end of the workshop, the following learning objectives are expected to be achieved:

  1. Practical view of how to successfully implement a flourishing reliability culture in your organization.
  2. The underlying principles of reliability, what reliability is not (its anti-patterns), and how to recognize and avoid those anti-patterns.
  3. The organizational impact of introducing reliability.
  4. Acing the art of SLIs and SLOs in a distributed ecosystem, and extending the usage of Error Budgets beyond the normal to innovate and avoid risks.
  5. Building security and resilience by design in a distributed, zero-trust environment.
  6. How to implement full-stack observability and distributed tracing, and how to foster an observability-driven development culture.
  7. Curating data using AI to move from reactive to proactive and predictive incident management, and using DataOps to build clean data lineage.
  8. Why Platform Engineering is so important in building consistency and predictability for reliability.
  9. Implementing practical Chaos Engineering.
  10. Major incident response responsibilities in reliability, based on an incident command framework, and examples of the anatomy of unmanaged incidents.
  11. Perspective of why Reliability Engineering can be considered as the purest implementation of DevOps.
  12. The Reliability Engineering execution model.
  13. Understanding why reliability is everyone’s problem.
  14. Success story learnings from a Reliability perspective.

Course Agenda

Module 1: Anti-patterns

● Rebranding Ops as Reliability Engineering
● Users notice an issue before you do
● Measuring until my Edge
● False positives are worse than no alerts
● Configuration management trap for snowflakes
● The Dogpile: Mob incident response
● Point fixing
● Production Readiness Gatekeeper
● Fail-Safe really?
● Use Case Discussion

Module 2: SLO is a Proxy for Customer Happiness

● Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
● Choose appropriate SLO targets, including how to perform statistical and probabilistic analysis
● Use error budgets to help your team have better discussions and make better data-driven decisions
● Use Case Discussion

Module 3: Building Secure, Scalable and Reliable Systems

● Reliability Engineering and its role in Building Secure and Reliable systems
● Design for Changing Architecture
● Fault-tolerant Design
● Design for Security
● Design for Resiliency
● Design for Reliability
● Use Case Discussion

Module 4: Full-Stack Observability

● Modern Apps are Complex & Unpredictable
● Slow is the new down
● Pillars of Observability
● Using OpenTelemetry
● Use Case Discussion

Module 5: Platform Engineering and AIOps

● Taking a Platform Centric View
● AIOps: a big data view for moving from reactive to proactive to predictive management
● Technology becomes more human through ML, allowing ubiquitous self-service
● Use Case Discussion

Module 6: Incident Response Management

● Key Responsibilities towards incident response
● DevOps & ITIL
● OODA and Reliability Incident Response
● Closed Loop Remediation and the Advantages
● Swarming – Food for Thought
● Use Case Discussion

Module 7: DiRT and Chaos Engineering

● Disaster Recovery Testing
● Fault Injection
● Chaos Engineering
● Tools that can be instrumented for Chaos Engineering
● Use Case Discussion

Module 8: Reliability is the Purest form of DevOps

● Key Principles of Reliability Engineering
● How to increase Reliability across the spectrum
● Metrics for Success
● Possible Implementation Model
● Culture and Behavioral Skills are key
● Case Study & Use Case Discussion

Pricing

Pricing: 2000 SGD per participant for the 3-day workshop.
1500 USD per participant for the 3-day workshop.
Note: Prices may vary depending on your country of residence. Please contact us at info@taubsolutions.com for further details.

Are there any pre-requisites for this course?

It is highly recommended that learners attend the SRE Foundation course with an accredited DevOps Institute Education Partner and earn the SRE Foundation certification before attending the Reliability Engineering Coaching workshop and exam. Knowledge of common SRE terminology, concepts, and principles, along with related work experience, is also recommended.

The session was delivered effectively. It was very interactive with on the job examples. I learned a lot and plan to implement them. Thanks Suresh GP for such a wonderful session.

Sheethal Sudheer, BRMP, BRM, Business Owner DevSecOps tools, Digital Transformation, Process Management

It was a very interactive session with a lot of real-life examples. Very good and informative session by Suresh GP.

Vijaykumar TC, Vice President - Citi Technology Infrastructure

The CBRM training offered through TaUB Solutions is an excellent course. I first heard Suresh speak a few years back at the BRMConnect conference and taking this course solidified my opinion and respect of Suresh’s vast knowledge of BRM discipline.

Michelle Day, CBRM®, Director, Business Relationship Management

The training was absolutely fantastic, extremely valuable. It changes the way we look at SRE. Suresh was the BEST instructor you could ask for; more importantly, he was helping with real-world problems based on his experiences.

Praveen Patil, Senior Engineering Manager at Lowe's Companies, Inc.