SRE Practitioner
Introduces a range of practices for advancing service reliability engineering through a mixture of automation, organizational ways of working and business alignment. Tailored for those focused on large-scale service scalability and reliability.
Features
24 hours of instructor-led training classes
Shares relevant industry insights
Shares real-world experience
Course Objective
The SRE (Site Reliability Engineering) Practitioner course introduces ways to scale services in an organization economically and reliably. It explores strategies to improve agility, cross-functional collaboration, and transparency of service health, working towards resiliency by design, automation, and closed-loop remediation.
Course Agenda
Module 1: SRE Anti-patterns
● Rebranding Ops as SRE
● Users notice an issue before you do
● Measuring until my Edge
● False positives are worse than no alerts
● Configuration management trap for snowflakes
● The Dogpile: Mob incident response
● Point fixing
● Production Readiness Gatekeeper
● Fail-Safe, really?
Module 2: SLO is a Proxy for Customer Happiness
● Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
● Choose appropriate SLO targets, including how to perform statistical and probabilistic analysis
● Use error budgets to help your team have better discussions and make better data-driven decisions
Module 3: Building Secure, Scalable and Reliable Systems
● SREs and their role in Building Secure and Reliable Systems
● Design for Changing Architecture
● Fault-Tolerant Design
● Design for Security
● Design for Resiliency
● Design for Reliability
Module 4: Full-Stack Observability
● Modern Apps are Complex & Unpredictable
● Slow is the new down
● Pillars of Observability
● Using OpenTelemetry
Module 5: Platform Engineering and AIOps
● Taking a Platform Centric View
● AIOps: a big data view to go from reactive to proactive to predictive management
● Technology becomes more human through ML, allowing ubiquitous self-service
Module 6: SRE & Incident Response Management
● SRE Key Responsibilities towards incident response
● DevOps, SRE and ITIL
● OODA and SRE Incident Response
● Closed Loop Remediation and the Advantages
● Swarming – Food for Thought
Module 7: DiRT and Chaos Engineering
● Disaster Recovery Testing
● Fault Injection
● Chaos Engineering
● Tools that can be instrumented for Chaos Engineering
Module 8: SRE is the Purest form of DevOps
● Key Principles of SRE
● SREs help increase Reliability across the spectrum
● Metrics for Success
● Possible implementation Model
● Culture and Behavioral Skills are key
● SRE Case Study
Exam & Certification
Passing the 90-minute examination, which consists of 40 multiple-choice questions, with a score of at least 65% leads to the SRE Practitioner certificate. The certification is governed and maintained by DevOps Institute.
FAQs
Are there any prerequisites for this course?
It is highly recommended that learners attend the SRE Foundation course with an accredited DevOps Institute Education Partner and earn the SRE Foundation certification prior to attending the SRE Practitioner course and exam. An understanding and knowledge of common SRE terminology, concepts, principles and related work experience are recommended.
How do I take the exam?
TaUB Solutions will request the examinations together with your registration. The exam will be taken at the end of the course. Results are available within 5 working days.