Cloudflare operates a global anycast edge network serving content for 6 million websites. This talk explains how we monitor our network and the architecture we chose to make that monitoring as reliable as possible. We'll also discuss the impact of alert fatigue and how we reduced alert noise by analysing data, making alerts more actionable, and alerting on symptoms rather than causes.
Matt is a Platform Engineer at Cloudflare. He was previously tech lead for the GOV.UK Infrastructure team and is a keen contributor to open source software. He also loves bacon, avocado, running, and the Oxford comma.