Lost Trust from Your Other Departments? How to Win Them Back

Incidents not only wear down your engineering teams, but they also erode trust in your entire engineering organization across your company. Data indicates that recurring issues and prolonged downtime can significantly impact a company's reputation and customer trust, leading to financial costs, hampered productivity, and a general loss of confidence in IT.¹ ²

Incidents Happen, It’s a Fact of Software Development

You can have the best processes in place, but sometimes a backhoe cuts a fiber line. This happened to my team. We had a small data center from an acquisition that we were migrating to the cloud, and one weekend all hell broke loose when local construction severed the fiber lines into the data center. What defines us isn't whether incidents happen, but how we respond.

How to Bolster Your Response to Keep Trust High

1. Communication: The Absolute Core of Incident Response

While technical prowess is crucial for fixing issues, communication is the single most important factor in effective incident response. It's the glue that binds the entire effort together. Without it, even the most brilliant engineering teams can struggle to maintain internal and external confidence. Poor communication amplifies chaos, leads to speculation, and can cause significant damage to your reputation and bottom line. Studies show that miscommunication in IT security can lead to cybersecurity incidents in over 60% of companies, and that 98% of non-IT respondents have experienced miscommunications regarding IT security, resulting in serious project delays and a diminished sense of cooperation.²

Engineers are usually great at swarming around incidents and working on them, but I've seen many teams stumble with proactive communication. If you don't have communication protocols that your teams adhere to, set them up now. Regular, transparent updates to the rest of your company throughout the incident are fundamental because, in the absence of information, people draw their own conclusions, and the usual conclusion is that engineering doesn't know what it is doing. Timely and accurate communication minimizes internal panic and helps manage external expectations.

Here's a prescriptive approach to effective incident communication:

1.1 Acknowledge Publicly and Quickly

The very first step is to publicly acknowledge that your team is aware of an issue and is actively investigating it. Set up a dedicated channel in your company's messaging platform (e.g., Slack, Microsoft Teams) specifically for public incident communications. Your teams should aim to acknowledge within your service level agreement (SLA), ideally in under 10 minutes. This immediate public acknowledgment sets expectations and builds initial trust.
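As a rough illustration of the acknowledgment step, here is a minimal Python sketch that builds the first public message and records whether it landed inside the 10-minute target. The function name, the message wording, and the dictionary fields are all hypothetical; posting the message to Slack or Microsoft Teams is deliberately left out, since that depends on your own integration.

```python
from datetime import datetime, timedelta, timezone

# Assumed target taken from the text above; adjust to your own SLA.
ACK_SLA = timedelta(minutes=10)


def build_acknowledgment(summary: str, detected_at: datetime, now: datetime) -> dict:
    """Build a hypothetical acknowledgment payload for the incident channel."""
    elapsed = now - detected_at
    return {
        "text": f"We are aware of an issue and are actively investigating: {summary}",
        "minutes_since_detection": int(elapsed.total_seconds() // 60),
        "within_sla": elapsed <= ACK_SLA,
    }


if __name__ == "__main__":
    detected = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    message = build_acknowledgment(
        "elevated error rates on checkout", detected, detected + timedelta(minutes=7)
    )
    print(message["text"])
    print("Within SLA:", message["within_sla"])
```

Tracking `within_sla` on every incident gives you a simple number to review later: how often the team actually acknowledged in time.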

1.2 Regular Updates Throughout the Incident Lifecycle

We think there are three critical phases to consider, and each is an opportunity for your teams to provide crucial updates on the progress of the incident:

Acknowledged Phase: This is about confirming awareness as quickly as possible. As noted above, teams should acknowledge within your SLA (sub-10 minutes).
Assessing Phase: During this phase, teams are actively assessing if this is an actual incident and determining its full impact. It's important to give frequent updates so the broader company can start to formulate a response to customers. Aim for updates every 20 minutes, even if it's the same update ("Still investigating, no new information at this time"). This consistent presence significantly lowers stress and reassures stakeholders that, yes, engineering is still actively looking into it.
Fixing Phase: You've acknowledged publicly, verified the issue, and provided high-level impact details. Now it's time to fix it. During this critical phase, it's important to bring people along with updates every 30-60 minutes. These updates can detail progress, potential solutions being tested, or estimated times to resolution (if known). All this proactive, consistent communication builds invaluable trust. If you don't already have SLAs set up for your engineering teams for communication during incidents, now is the time to define those.
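The cadence described in the three phases above can be written down as a small lookup table, which makes it easy to check automatically whether an update is overdue. The sketch below is only one way to express it: the names `Phase`, `MAX_UPDATE_GAP`, `next_update_due`, and `is_update_overdue` are hypothetical, and the 60-minute value is the upper bound of the 30-60 minute range for the fixing phase.

```python
from datetime import datetime, timedelta
from enum import Enum


class Phase(Enum):
    ACKNOWLEDGED = "acknowledged"
    ASSESSING = "assessing"
    FIXING = "fixing"


# Longest allowed gap between updates per phase (assumed from the text above).
MAX_UPDATE_GAP = {
    Phase.ACKNOWLEDGED: timedelta(minutes=10),  # time from detection to first acknowledgment
    Phase.ASSESSING: timedelta(minutes=20),
    Phase.FIXING: timedelta(minutes=60),        # upper bound of the 30-60 minute range
}


def next_update_due(phase: Phase, last_update: datetime) -> datetime:
    """Return the latest time the next update should go out.

    For the acknowledged phase, pass the detection time as last_update.
    """
    return last_update + MAX_UPDATE_GAP[phase]


def is_update_overdue(phase: Phase, last_update: datetime, now: datetime) -> bool:
    """True if the team has gone longer than the allowed gap without an update."""
    return now > next_update_due(phase, last_update)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    now = last + timedelta(minutes=25)
    print("Assessing update overdue:", is_update_overdue(Phase.ASSESSING, last, now))
    print("Fixing update overdue:", is_update_overdue(Phase.FIXING, last, now))
```

A check like this could drive a reminder to the Incident Commander, so that even a "still investigating" update goes out on schedule.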

2. Assign an Incident Commander (IC)

This is a non-negotiable role for effective incident response. The Incident Commander is the single source of truth, providing centralized leadership and decision-making during high-stakes situations.² ³ ⁴ ⁵

Responsibilities of an IC include:

Overall Incident Management: The IC has leadership responsibility for the incident, overseeing the entire response process from detection to resolution. They are responsible for setting priorities and determining incident objectives and strategies.
Decision-Making & Delegation: The IC quickly assesses the situation and makes critical decisions about what to do, which team members are needed, and what actions come next. They delegate tasks so that everyone knows their role, and they escalate issues or bring in additional resources as needed.
Maintaining Calm and Focus: Incidents are stressful. A key part of the IC's job is to keep teams calm, focused, and aligned, ensuring conversations are brief and productive.
Communication Hub: Acting as the primary point of contact and source of truth for all internal and external stakeholders, providing regular, concise updates. This role is crucial for executing the communication protocols mentioned above.
Post-Incident Follow-through: Guiding the post-mortem process, including documentation and recommendations for preventing future incidents.

3. Follow Through: Actions Speak Louder Than Words

Once the incident is over, it's time to follow through. Hold your Root Cause Analysis (RCA) meetings promptly. Publish your findings, at least internally, explaining what happened, why, and what steps you're taking to prevent recurrence. Most importantly, ensure you close all action items created. This demonstrates accountability and a commitment to continuous improvement, which is vital for rebuilding trust.
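Closing every action item is easier to enforce when the list is tracked explicitly. The following Python sketch shows one possible shape for that tracking. The `ActionItem` fields and both helper functions are hypothetical and not tied to any particular ticketing system.

```python
from dataclasses import dataclass


@dataclass
class ActionItem:
    """A hypothetical RCA action item."""
    title: str
    owner: str
    closed: bool = False


def open_items(items: list[ActionItem]) -> list[ActionItem]:
    """Return the action items that still need work."""
    return [item for item in items if not item.closed]


def follow_through_complete(items: list[ActionItem]) -> bool:
    """True only when no action item remains open (an empty list counts as complete)."""
    return not open_items(items)


if __name__ == "__main__":
    items = [
        ActionItem("Add redundant fiber path", owner="infra", closed=True),
        ActionItem("Document failover runbook", owner="sre"),
    ]
    for item in open_items(items):
        print(f"Still open: {item.title} (owner: {item.owner})")
    print("Follow-through complete:", follow_through_complete(items))
```

Reporting the open items by owner in the same channel where the RCA was published keeps the follow-through as visible as the incident itself.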

4. Talk to Your Customer Success (CS) Team

If you have a CS team, reach out to them. They are on the front lines, bearing the brunt of customer frustration during outages. Listening to their experiences, understanding the customer impact, and validating their efforts goes a long way. They can provide invaluable insights into how incidents affect customers and how your communication efforts are perceived. This collaboration not only improves incident management but also strengthens inter-departmental relationships.

5. Invest in the Right Tooling to Help

Managing through outages is hard work; don't go it alone. Modern incident management tools are designed to streamline the entire process. This is where the right technology truly shines. Phoenix Incidents' main value proposition is its AI-powered communication automation platform for incidents. It goes beyond basic alerts, providing intelligent, real-time communication that keeps everyone in the loop without burdening your engineers with manual updates. Phoenix automates global awareness and communication during incidents, from crafting real-time chat responses to guiding RCA processes and facilitating Jira-native follow-ups. It integrates seamlessly with your existing tools, providing AI-powered automation for resolution steps, consistent communication across channels, and advanced reporting. By centralizing information, automating status updates, and facilitating post-incident analysis, Phoenix Incidents allows your team to focus on fixing the problem, not managing the chaos of communication, ultimately building stronger internal trust and external confidence.

Closing Thoughts

By focusing on these areas, you can transform incidents from trust-eroding events into opportunities to demonstrate resilience, competence, and a strong commitment to reliability. It's not just about getting systems back online; it's about rebuilding and strengthening the foundational trust within your organization.