Millions of people rely on Uber for their livelihood and transportation. We take this responsibility very seriously and hold ourselves to a 10-minute SLA for mitigating user-impacting outages. However, Uber Engineering uses a ‘test in production’ approach to change rollout, which enables us to move fast but also means we sometimes break things. In this talk, we explore how Uber has historically handled the inherent conflict between these two philosophies. We’ll look at Uber’s long-standing regional failover mitigation tool and the new mitigation tooling that is quickly taking its place.
Carissa Blossom is a member of the Production Engineering team for Eats & Delivery at Uber. She is also an Incident Commander for Ring0, Uber Engineering’s primary task force for critical outage mitigation. In her four and a half years at Uber, she has served as Production Engineer for Eats, Software Networking Edge and the core Marketplace team.