Who owns this outage? Building intelligent, automated escalation chains
The Stack Overflow Podcast - A podcast by The Stack Overflow Podcast
 
If your organization runs code on production servers 24/7, you need a process for handling failures in that code or in the infrastructure it runs on. No code is bug-free, so failures will happen. That means your SREs and developers will have to spend some time on call, ready to respond when the application breaks down. On this sponsored episode of the podcast, we talk to Eric Maxwell, a solution architect at xMatters, about building automated, intelligent escalation chains.