About the VOID
Public-facing write-ups of software incidents are scattered across the internet, in scrolling status pages that can’t be linked to directly, or sequestered in corners of company websites, often in (intentionally or unintentionally) obfuscated ways. They’re hard to find, and virtually impossible to compare or study in a structured way. The Verica Open Incident Database (VOID) makes public software-related incident reports available to everyone, raising awareness and increasing understanding of software-based failures in order to make the internet a more resilient and safe place.
Collecting these reports matters because software has long moved on from hosting pictures of cats online to running transportation, infrastructure, power grids, healthcare software and devices, voting systems, autonomous vehicles, and many critical (often safety-critical) societal functions.
Software outages and incidents aren't going to magically stop. We can't make our systems "flawless" or systematically map out all the potential faults—what we are building now far outmatches what any single person, team, or organization can mentally model regarding how the system is built, much less how it functions in varying, high-pressure, or otherwise unexpected conditions. On top of that, customer/user and organizational demands are only accelerating the pace and complexity of software development.
Collecting and sharing this information can keep us ahead of the potentially catastrophic consequences of a software-driven world. What wasn’t safety-critical a few years ago may now be: Slack and Microsoft Teams are being used for communication and coordination in emergencies, and Facebook, Twitter, and other social media are now squarely in that category. In much the same way that airline companies set aside competitive concerns in the late 1990s and beyond in order to improve the safety of their industry, the tech industry has an immense body of commoditized knowledge that we could be sharing in order to learn from each other and push software safety forward.
The good news is that this is already underway: the Learning From Incidents community (started by Nora Jones) brings together technologists, researchers, and related practitioners to share their expertise and experience with software failures and incidents, reshaping how the software industry thinks about incidents, software reliability, and the critical role people play in keeping their systems running. The VOID is a direct result of participating in this community, and owes a great deal of thanks to everyone who helped shape and inform its creation and mission.
Learning From Incidents
Anyone who has ever been involved in an incident that was written up and published knows that what’s in a public incident report isn’t the whole story. A “whole story” isn’t even possible: we can’t have perfect knowledge of a past event, and factors like time constraints or organizational pressures and priorities mean that at some point, an investigation has to be concluded and written up. More often than not, a public incident report exists more to assuage customer or shareholder concerns than to convey concrete details of what happened and what the team or organization learned.
Most incident reports, notably external or public-facing ones, focus on action items and ensuring that “this won’t happen again.” But what about the next unanticipated event? Being unexpected and surprising is typically what defines this type of software failure. The past event’s action items may or may not prevent the next one. (Even if they do, teams likely won’t notice events that don’t happen.) What can help is for the teams that operate those systems to further evolve their understanding of how those systems work (most critically, what their safety boundaries are) and share what they’ve learned with others.
Help Us Fill The VOID
We can't do this alone. Our goals are to:
- Collect as many public-facing reports as possible in the VOID so it becomes a reliably comprehensive resource and a unique source for practitioners and researchers to ask more and better questions about incidents.
- Encourage companies that aren’t yet doing so to write detailed incident reports (and submit them).
- Raise the quality and detail of these reports industry-wide.
The VOID includes everything from tweets to status page updates, conference talks, media articles, and lengthy company post-mortems. Nearly all the information is taken verbatim from the report artifacts themselves, along with a set of metadata that we’ve collected based on the report contents. Yet we know that what’s currently in the VOID isn’t even close to the total number of public incident reports out there. We’d love your help in making it more comprehensive. You can submit any reports that aren’t included in the VOID with this short form.
Become a Member or Partner
Drop us a line at firstname.lastname@example.org.
Subscribe to the Newsletter
The newsletter will look at patterns, stories from people involved in incidents, and other details of interest to people who care about software-based incidents.
Send Us Feedback and Ideas
Something amiss with a report in the VOID, or a bug with the VOID itself? Have an idea for a great feature or something you’d like to see that isn’t there? Get in touch; we’d love to hear from you. You can also follow the VOID on Twitter.