Observability has become a hot topic in reliability engineering and DevOps. But if you ask four different engineers what it means, you will probably get four different definitions. In this talk, I’ll cover the definition that I think is most useful as part of managing distributed systems: observability is the practice of understanding change in software systems. I’ll talk about the building blocks of observability, give some examples, and also relate it to the time-tested practice of monitoring production systems. While it may be tempting to think that new tools obviate the need to monitor systems, I’ll argue that there are some aspects of monitoring that are here to stay. At the end of this talk, you’ll have a better sense of the value of the tools that you already have and some ideas as to when you should consider expanding your toolbox.
Daniel “Spoons” Spoonhower is a co-founder and Chief Architect at Lightstep, where he’s building next-generation monitoring and observability tools. He is an author of Distributed Tracing in Practice (O’Reilly Media, 2020). Previously, Spoons spent almost six years at Google, where he worked as part of Google’s infrastructure and Cloud Platform teams. He has published papers on the performance of parallel programs, garbage collection, and real-time programming. He has a PhD in programming languages from Carnegie Mellon University but still hasn’t found one he loves.