Mastodawn

# How to Use the Low-Cost Airline Model to Handle Urgent Production Issues in Entertainment SaaS

An entertainment SaaS multinational running DSDM across several teams, more than fifty people in all, has a problem with urgent production issues. The company provides a streaming platform for independent film distributors. The platform handles content ingestion, transcoding, digital rights management, content delivery, subscriber management, and analytics. The company has been around for eleven years and has eight hundred employees. The product development organization has sixty-four people across eight feature teams of seven to eight people each.

agile 1d ago

The urgent production issues are handled poorly. The handling is slow, chaotic, and inconsistent. When a production issue occurs, the on-call engineer receives a PagerDuty alert, investigates, identifies the affected component, and contacts the team that owns it. That team is not always available. They might be in a different time zone, in a sprint planning session, or offline. The on-call engineer waits. The waiting causes delays, and the delays cause customer impact.

Last month, a transcoding failure occurred at 2:00 AM Eastern time. The failure affected all new content uploads. Independent film distributors could not upload new films. The on-call engineer received the alert at 2:05 AM, identified the transcoding service as the affected component, and contacted Team Three in London, where it was 7:05 AM. Team Three was in sprint planning and did not respond for forty-five minutes. The issue was resolved at 3:30 AM Eastern time. Total time from alert to resolution was eighty-five minutes, with forty-five minutes caused by the team handoff. During that window, fourteen film distributors attempted uploads. All fourteen failed. Three escalated to account management. Two threatened to switch to a competitor.
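
These figures are easy to get wrong when worked out by hand, so it helps to script them and compute them the same way in every incident review. Below is a minimal Python sketch that uses only the clock times from this incident. It assumes the alert and the resolution fall on the same calendar day. Like the thread, it counts Team Three's whole forty-five-minute silence as handoff delay.

```python
from datetime import datetime, timedelta

# Clock times from the transcoding incident (US Eastern time).
# Assumes alert and resolution fall on the same calendar day.
alert = datetime.strptime("02:05", "%H:%M")     # PagerDuty alert reaches the on-call engineer
resolved = datetime.strptime("03:30", "%H:%M")  # transcoding restored
handoff_delay = timedelta(minutes=45)           # Team Three silent during sprint planning


def minutes(delta: timedelta) -> int:
    """Whole minutes in a timedelta."""
    return int(delta.total_seconds() // 60)


time_to_resolve = resolved - alert
handoff_share = handoff_delay / time_to_resolve  # timedelta / timedelta gives a float

print(f"Alert to resolution: {minutes(time_to_resolve)} min")
print(f"Handoff delay: {minutes(handoff_delay)} min ({handoff_share:.0%} of the total)")
```

Run as is, it reports eighty-five minutes from alert to resolution and forty-five minutes of handoff delay. That is about 53% of the total, so more than half of the outage window was spent waiting.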

The urgent production issues must be handled faster.

Herb Kelleher built Southwest Airlines on the low-cost airline model. The model was simple. Kelleher realized the biggest cost in airlines was turnaround time, the time an airplane spends on the ground between flights. The longer the plane sits, the more money the airline loses. The plane only makes money when it is flying.

Kelleher attacked the turnaround time. The industry standard was sixty minutes. He reduced it to fifteen. He did it through three principles.

First, every employee does every job. Pilots helped load baggage. Flight attendants cleaned the cabin. Gate agents refueled the plane. Job boundaries were eliminated, and with them went the handoff delays.

Second, every decision is made at the lowest level. A gate agent does not need approval from a manager. The decision is made fast, and the waiting disappears.

Third, every process is standardized. Boarding is the same at every gate. Cleaning is the same on every plane. Refueling is the same at every airport. Variation is eliminated.

These three principles made Southwest profitable. Kelleher applied the same thinking to crisis management. When a flight was delayed, the gate agent did not wait for instructions from headquarters. The decision was made quickly, at the lowest level, which minimized both the delay and the customer impact.

For an entertainment SaaS multinational, the urgent production issue problem is the same. The issues are handled slowly because of handoff delays caused by job boundaries. Kelleher's model says: eliminate job boundaries, make decisions at the lowest level, and standardize the process. Handoff delays disappear. Waiting disappears. Variation disappears. Resolution time drops, and customer impact drops with it.

## The Core Principle

Kelleher's low-cost airline model was built on a simple insight. The best way to handle urgent issues is to eliminate handoff delays by removing job boundaries, making decisions at the lowest level, and standardizing the process. He eliminated the handoff between the ground crew and the cleaning crew by having every employee do every job. Turnaround time went from sixty minutes to fifteen.

Applied to the streaming platform, the slow step is the handoff between the on-call engineer and the component team. Kelleher's model says to eliminate that handoff: let the on-call engineer fix the issue directly. The delay disappears, resolution time drops, and customer impact drops with it.

## Four Steps to Apply the Low-Cost Airline Model

1. Map the Current Incident Response Process and Identify Every Handoff Point

Kelleher mapped the aircraft turnaround process in 1971. The mapping identified six handoff points between the ground crew, the cleaning crew, the refueling crew, the catering crew, the boarding gate, the flight crew, and air traffic control. Those six handoff points were the source of delay. Mapping them created a target: eliminate them.

Your team should map the current incident response process and identify every handoff point with the same discipline. For an entertainment SaaS multinational, the mapping might look like this. The engineering manager leads a two-hour session with all eight team leads. The session maps the current seven-step process: PagerDuty alert fires, on-call engineer acknowledges, investigates, identifies the affected component, contacts the component team, component team responds, component team resolves the issue.

The mapping identifies three handoff points. Handoff one is the communication handoff when the on-call engineer contacts the component team and must explain the issue. That takes time. Handoff two is the availability handoff when the component team must respond. If they are in a meeting, the response is delayed. Handoff three is the knowledge handoff when the component team must understand the issue. If the explanation was unclear, follow-up questions add more delay.

Three handoff points identified. Three targets for elimination.
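
The output of the mapping session can be kept as data rather than as a whiteboard photo, so the map can be compared before and after changes. The sketch below records the seven steps and tags the three handoff kinds by hand. The `Step` type, the actor names, and the placement of each tag on a particular step are illustrative assumptions. The helper `actor_changes` is a mechanical cross-check: it flags every point where work passes to a different actor. It only finds where ownership moves (to the on-call engineer at step 2 and to the component team at step 6), so it cannot tell a communication delay from a knowledge gap. Those distinctions still need the manual tags.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    description: str
    actor: str
    handoff: Optional[str] = None  # kind of handoff this step waits on, if any


# The current seven-step process as mapped with the team leads.
# Actor names and the step each handoff tag is attached to are illustrative.
CURRENT_PROCESS: List[Step] = [
    Step("PagerDuty alert fires", "PagerDuty"),
    Step("Acknowledge the alert", "on-call engineer"),
    Step("Investigate", "on-call engineer"),
    Step("Identify the affected component", "on-call engineer"),
    Step("Contact the component team and explain the issue", "on-call engineer", "communication"),
    Step("Component team responds", "component team", "availability"),
    Step("Component team understands and resolves the issue", "component team", "knowledge"),
]


def handoff_points(process: List[Step]) -> List[Tuple[int, str, str]]:
    """Return (step number, description, handoff kind) for every tagged step."""
    return [(i, s.description, s.handoff)
            for i, s in enumerate(process, start=1)
            if s.handoff is not None]


def actor_changes(process: List[Step]) -> List[Tuple[int, str, str]]:
    """Return (step number, previous actor, new actor) wherever the actor changes."""
    return [(i, prev.actor, cur.actor)
            for i, (prev, cur) in enumerate(zip(process, process[1:]), start=2)
            if prev.actor != cur.actor]


for number, description, kind in handoff_points(CURRENT_PROCESS):
    print(f"step {number}: {kind} handoff ({description})")
```

The loop lists steps 5, 6, and 7 as the communication, availability, and knowledge handoffs. All three cluster around the single point where the work leaves the on-call engineer.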

For a DSDM organization of fifty-plus people, the mapping should take a single session of no more than two hours and identify at least three handoff points. In DSDM terms, it belongs in the feasibility study.

2. Cross-Train Every On-Call Engineer on Every Component So They Can Fix Issues Directly

Kelleher cross-trained every Southwest employee on every job. Pilots loaded baggage. Flight attendants cleaned the cabin. Gate agents refueled the plane. Cross-training eliminated job boundaries, and job boundaries were the biggest source of handoff delays.

Your team should cross-train every on-call engineer on every component so they can fix issues directly. For an entertainment SaaS multinational, the cross-training program has four phases.

Phase one is a component inventory. The engineering manager lists all eight components: content ingestion, transcoding, digital rights management, content delivery, subscriber management, analytics, payment processing, and search and recommendation.

Phase two is knowledge transfer. Each team creates a runbook for their component covering architecture overview, common failure modes, diagnostic steps, fix procedures, and escalation criteria. The runbooks are stored in a shared repository accessible to all on-call engineers.
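
Every runbook must cover the same five sections, so completeness can be checked automatically before hands-on training starts. Below is a minimal sketch. It assumes each runbook is a Markdown document whose headings use the section names listed above. The thread does not say how runbooks are laid out in the shared repository, so loading them is left to the caller. The `audit` function and its input format are hypothetical.

```python
import re
from typing import Dict, List, Set

# The eight components from the phase-one inventory.
COMPONENTS = [
    "content ingestion", "transcoding", "digital rights management",
    "content delivery", "subscriber management", "analytics",
    "payment processing", "search and recommendation",
]

# The five sections every runbook must cover (phase two).
REQUIRED_SECTIONS = [
    "architecture overview", "common failure modes", "diagnostic steps",
    "fix procedures", "escalation criteria",
]

# A Markdown ATX heading: one to six '#' characters, spaces or tabs, then the title.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def sections_in(markdown: str) -> Set[str]:
    """Collect the lower-cased heading titles of a Markdown runbook."""
    return {title.strip().lower() for title in HEADING.findall(markdown)}


def audit(runbooks: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each component to the required sections its runbook lacks.

    `runbooks` maps component name to Markdown text. A component with no
    runbook at all is reported as missing every section.
    """
    gaps: Dict[str, List[str]] = {}
    for component in COMPONENTS:
        present = sections_in(runbooks.get(component, ""))
        missing = [s for s in REQUIRED_SECTIONS if s not in present]
        if missing:
            gaps[component] = missing
    return gaps
```

For example, passing a dictionary that holds only a complete transcoding runbook reports the other seven components as missing all five sections. An empty result means every component is ready for phase three.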

Phase three is hands-on training. Every on-call engineer completes a two-hour training session for each component, covering the runbook and including a simulated incident they must diagnose and fix.

Phase four is certification. Every on-call engineer passes a certification test for each component by diagnosing and fixing a simulated incident within thirty minutes. Certification is valid for six months and must be renewed.
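
The certification rules (a fix within thirty minutes, valid for six months) are simple enough to encode. That would let whoever manages the on-call rotation check coverage before each shift. This is a sketch under stated assumptions:

- Finishing in exactly thirty minutes counts as a pass.
- A certificate lapses on the same calendar date six months later, clamped to the end of shorter months. One earned on 31 August 2024 lapses on 28 February 2025.
- The `Attempt` record and all function names are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

PASS_LIMIT_MINUTES = 30  # the simulated incident must be fixed within thirty minutes
VALID_MONTHS = 6         # a certification must be renewed six months after it is earned


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the length of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass(frozen=True)
class Attempt:
    """One certification attempt (hypothetical record layout)."""
    engineer: str
    component: str
    taken_on: date
    minutes_to_fix: Optional[int]  # None if the simulated incident was not fixed


def passed(attempt: Attempt) -> bool:
    """An attempt passes if the incident was fixed within the time limit."""
    return attempt.minutes_to_fix is not None and attempt.minutes_to_fix <= PASS_LIMIT_MINUTES


def is_certified(attempts: List[Attempt], engineer: str, component: str, today: date) -> bool:
    """True if some passing attempt for this engineer and component is still valid today."""
    return any(
        a.engineer == engineer
        and a.component == component
        and passed(a)
        and a.taken_on <= today < add_months(a.taken_on, VALID_MONTHS)
        for a in attempts
    )


def gaps(attempts: List[Attempt], engineers: List[str],
         components: List[str], today: date) -> Dict[str, List[str]]:
    """Map each engineer to the components they are not currently certified on."""
    result: Dict[str, List[str]] = {}
    for engineer in engineers:
        missing = [c for c in components
                   if not is_certified(attempts, engineer, c, today)]
        if missing:
            result[engineer] = missing
    return result
```

`gaps` returns an empty dictionary only when every engineer holds a current certificate for every component. That is exactly the state the thread describes as cross-training complete.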

With cross-training complete, every on-call engineer can handle every component. Job boundaries are gone, and handoff delays are gone with them.

Consider the transcoding failure from last month. In the old process, the on-call engineer contacted Team Three and waited forty-five minutes for a response: forty-five minutes of pure handoff delay. In the new process, the on-call engineer opens the transcoding runbook, checks the message queue, finds it full, and restarts the queue worker. The restart takes two minutes. The issue is resolved at 2:15 AM.