Chaos Monkey for Managers: Building Resilient Teams

Who is hoarding all the knowledge?

By Ibrahim Diallo

Published Jun 2 2025 ~ 5 minutes read

Imagine a mischievous agent of disruption. Not hidden somewhere in a server room where no ordinary human would notice it. No. Imagine it right here in your office building. Forget about unplugged servers for a moment. Our "chaos monkey" is a little more… creative. This monkey lurks around meeting rooms, and in the blink of an eye, you see a chair swiveling, empty. The monkey has snatched your coworker away.

I’m feeling a bit dramatic today, but I want to highlight the fragility that can exist in unexpected places within an organization.

For engineers, the concept of a "Chaos Monkey" is well-established. It's an automated tool, popularized by Netflix, that randomly terminates instances in a production environment. The goal? To proactively identify and eliminate single points of failure, forcing teams to build resilient, self-healing systems. If a server goes down unexpectedly, the system should recover seamlessly, with no discernible impact on the end user.

But what if this "monkey" isn't targeting our technology, but our people? What happens when a critical individual, whether a key decision-maker, a vital source of knowledge, or a seemingly indispensable team member, suddenly becomes unavailable?

At a previous company, the CTO, a brilliant technical mind, had single-handedly devised the intricate algorithm for calculating crucial financial transactions: cash-backs and cash-forwards. He then coded this logic directly into the core application. The problem? This vital piece of business intelligence lived solely in the Ruby on Rails code he wrote, buried deep within a sprawling monolithic repository.

There was no documentation, no shared understanding, no discussion outlining the rules. It wasn't until he left the company that the impact of this missing knowledge became apparent. It was a trial by fire. We had to decipher his code just to understand the fundamental rules governing a significant part of the business. In retrospect, perhaps he was our organizational chaos monkey, inadvertently exposing a critical weakness: the concentration of crucial information in a single individual. As you can imagine, documentation and knowledge sharing became a priority shortly after.

In contrast, my approach in my current role has been to build in redundancy from the get-go. My philosophy is simple: when I take time off, the team should not just survive, but thrive. I actively avoid becoming a bottleneck. This means no hoarding of information and no secret strategies locked away. I believe in giving away the "secret sauce." When I resolve an issue, I call my team, and we discuss how and why it was done. Beyond sharing everything I know, I delegate significant responsibilities and grant real autonomy. By equipping my team with the necessary context, I empower them to make informed decisions, even in my absence.

This philosophy stands in direct opposition to the often-cited fear that sharing knowledge puts your job at risk. The idea that holding onto vital information makes you indispensable is a fallacy. As I’ve written about before, it creates a fragile system, vulnerable to disruption. At a previous company, I worked with a .NET developer who was the only one in the company. After he quit, he would occasionally ask whether the chaos had started. The truth was, his departure had minimal impact. His responsibilities were quickly absorbed, and projects continued without a hitch. The system, while not intentionally designed for his absence, proved more resilient than he had realized.

So, how do I introduce a bit of "chaos" into my own team to test our resilience? Ironically, life often does it for me, throwing curveballs my way, like when I have to suddenly drop everything to care for my children. These unplanned absences force the team to adapt and solve problems independently. Beyond these unplanned events, I also intentionally grant last-minute PTO requests whenever feasible. It may seem like a simple act of flexibility, but it serves a crucial purpose: it quickly reveals any information silos or single points of dependency. If a team member needs to take unexpected time off, can their responsibilities be covered? Does critical knowledge reside solely in their head? These unplanned "chaos events" provide invaluable insight into our team's true resilience and highlight where we need to improve knowledge sharing and cross-training.

Building a resilient team, much like building a resilient system, requires a proactive approach. It means embracing the idea that unexpected disruptions are inevitable and designing our teams and processes to withstand them. It's about moving beyond the fear of redundancy and recognizing that true strength lies in a team where knowledge is distributed, individuals are empowered, and an unexpected absence doesn't bring everything to a standstill.