How to Use the Engineering Excellence Method to Handle External System Dependencies in Technology B2C (1/63)

A technology B2C family business running Lean with a small team of two to five people has an external system dependency problem. The company makes a mobile app for personal fitness tracking. The app handles workout logging, nutrition tracking, social sharing, and integration with wearable devices. The company has been around for four years. It has eleven employees. The product development organization has four people. The organization runs Lean. One small team. Four people. (2/63)

The external system dependencies are causing failures. The failures are frequent. The frequency creates instability. The instability creates user frustration. The user frustration creates churn. The churn is thirty-one percent per quarter. At thirty-one percent churn, the company is losing users faster than it is gaining them. Losing users means that revenue is declining. The declining revenue is a problem. The problem is caused by external system dependencies. (3/63)

There are three external system dependencies. Dependency one. The wearable device API. The wearable device API is from Fitbit. The Fitbit API changes without notice. The unannounced changes cause integration failures. The integration failures cause data sync errors. The data sync errors cause user frustration. Dependency two. The payment processing API. The payment processing API is from Stripe. The Stripe API has rate limits. (4/63)

The rate limits cause transaction failures during peak hours. The peak hours are six AM to eight AM. The six AM to eight AM window is when forty-three percent of users log workouts. The transaction failures cause subscription activation delays. The subscription activation delays cause user frustration. Dependency three. The social sharing API. The social sharing API is from Facebook. The Facebook API has versioning issues. The versioning issues cause sharing failures. (5/63)

The sharing failures cause user frustration.

The three external system dependencies are causing failures. The failures are costing the company thirty-eight thousand dollars per quarter. The thirty-eight thousand dollars is the cost of lost subscriptions, support credits, and emergency engineering. (6/63)

Soichiro Honda built Honda on the engineering excellence method. The model was simple. Honda realized that the biggest problem in engineering was the reliance on external components that you cannot control. The reliance created vulnerability. The vulnerability created failures. The failures killed products. Honda attacked the vulnerability. He created the engineering excellence method. (7/63)

The method was based on one principle. The principle was. Control what you can. Anticipate what you cannot. Controlling what you can was about building internal excellence. Anticipating what you cannot was about preparing for external failures. The combination of internal excellence and external anticipation created resilience. The resilience built Honda. (8/63)

When Honda faced an external dependency problem, he did not complain. He did not blame. He did not wait. He engineered. The engineering was a solution. The solution was internal. The internal solution reduced the reliance on the external dependency. The reduction of reliance created independence. The independence created stability. The stability built Honda. (9/63)

Honda applied the same thinking to supply chain management. When Honda's supply chain was disrupted, he did not panic. He anticipated. The anticipation was a plan. The plan was a backup. The backup was internal. The internal backup ensured continuity. The continuity built Honda. (10/63)

The Core Principle (11/63)

Honda's engineering excellence method was built on a simple insight. The best way to handle external system dependencies is to stop relying on external systems that you cannot control and start building internal excellence that anticipates and absorbs external failures. The team controls what they can by building robust internal systems. They anticipate what they cannot by preparing for external failures before they happen. (12/63)

Honda did not handle external dependencies by complaining about suppliers, blaming external partners, waiting for external fixes, or hoping that the external systems would become more reliable. He handled them by controlling what he could and anticipating what he could not. The internal excellence created resilience. The resilience absorbed the failures. (13/63)

For a technology B2C family business, the external system dependency problem is the same. The external system dependencies are causing failures. The failures create instability. The instability creates churn. The churn costs thirty-eight thousand dollars per quarter. Honda's engineering excellence method says: control what you can. Anticipate what you cannot. The internal excellence creates resilience. The resilience keeps external failures from reaching users. (14/63)

Four Steps to Apply the Engineering Excellence Method to Handling External System Dependencies

1. Map Every External System Dependency and Classify Each One by Controllability and Failure Impact (15/63)

Honda mapped every external dependency at Honda. The mapping was a classification. The classification was by controllability and failure impact. The controllability was a measure. The measure was how much control the team had over the dependency. The failure impact was a measure. The measure was how much damage the failure caused. The combination of controllability and failure impact created a priority. The priority determined the response. (16/63)

You should map every external system dependency and classify each one by controllability and failure impact, so that your map sets priorities the same way. For a technology B2C family business, the dependency mapping might look like this. The lead engineer maps every external system dependency. The mapping is a document. The document is a matrix. The matrix has two axes. Axis one. Controllability. The controllability axis runs from low to high. Low means the team has no control. (17/63)

High means the team has full control. Axis two. Failure impact. The failure impact axis runs from low to high. Low means the failure causes minor disruption. High means the failure causes major disruption. (18/63)

The matrix has four quadrants. Quadrant one. Low controllability, high failure impact. This quadrant is the danger zone. The danger zone requires immediate action. Quadrant two. High controllability, high failure impact. This quadrant is the investment zone. The investment zone requires building internal solutions. Quadrant three. Low controllability, low failure impact. This quadrant is the monitoring zone. The monitoring zone requires watching. Quadrant four. (19/63)

High controllability, low failure impact. This quadrant is the maintenance zone. The maintenance zone requires routine care. (20/63)

The lead engineer maps the three dependencies. Dependency one. Fitbit API. The Fitbit API has low controllability. The low controllability is because Fitbit controls the API. The Fitbit API has high failure impact. The high failure impact is because data sync errors affect all users. The Fitbit API is in quadrant one. Quadrant one is the danger zone. The danger zone requires immediate action. Dependency two. Stripe API. The Stripe API has low controllability. (21/63)

The low controllability is because Stripe controls the API. The Stripe API has high failure impact. The high failure impact is because transaction failures affect revenue. The Stripe API is in quadrant one. Quadrant one is the danger zone. The danger zone requires immediate action. Dependency three. Facebook API. The Facebook API has low controllability. The low controllability is because Facebook controls the API. The Facebook API has medium failure impact. (22/63)

The medium failure impact is because sharing failures affect engagement but not core functionality. The matrix has only low and high, so medium counts as low here. The Facebook API is in quadrant three. Quadrant three is the monitoring zone. The monitoring zone requires watching. (23/63)
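The matrix above can be sketched in a few lines of Python. This is only an illustration, not the team's actual document: the names `Level`, `Zone`, `Dependency`, and `classify` are hypothetical, and the rule that "medium" impact is treated as low is an assumption made to match the thread's placement of the Facebook API in quadrant three.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Zone(Enum):
    DANGER = "quadrant one: act immediately"
    INVESTMENT = "quadrant two: build internal solutions"
    MONITORING = "quadrant three: watch"
    MAINTENANCE = "quadrant four: routine care"


@dataclass(frozen=True)
class Dependency:
    name: str
    controllability: Level
    failure_impact: Level


def classify(dep: Dependency) -> Zone:
    """Place a dependency in one of the four quadrants."""
    # Assumption: the matrix has only low/high cells, so anything below HIGH
    # (including "medium") is treated as low.
    high_control = dep.controllability is Level.HIGH
    high_impact = dep.failure_impact is Level.HIGH
    if high_impact:
        return Zone.INVESTMENT if high_control else Zone.DANGER
    return Zone.MAINTENANCE if high_control else Zone.MONITORING


# The three dependencies, rated as in the thread.
DEPENDENCIES = [
    Dependency("Fitbit API", Level.LOW, Level.HIGH),
    Dependency("Stripe API", Level.LOW, Level.HIGH),
    Dependency("Facebook API", Level.LOW, Level.MEDIUM),
]

if __name__ == "__main__":
    for dep in DEPENDENCIES:
        zone = classify(dep)
        print(f"{dep.name}: {zone.name} ({zone.value})")
```

Keeping the ratings as data means the team can re-rate a dependency and see its zone change without editing the rule itself.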

The mapping is complete. The completion of the mapping creates clarity. The clarity reveals that two of the three dependencies are in the danger zone. The two danger zone dependencies require immediate action. The immediate action is anticipation. For a Lean team of two to five, the dependency mapping should be a matrix. The matrix should have two axes. The matrix should have four quadrants. The mapping should be done immediately. (24/63)

For Lean, the dependency mapping should be part of the team's value stream mapping. The mapping is a value stream activity.

2. Build an Internal Anticipation Layer That Detects External Dependency Failures Before They Reach Users (25/63)

Honda built an internal anticipation layer at Honda. The internal anticipation layer was a system. The system detected external failures. The detection of external failures before they reached users prevented user impact. The prevention of user impact created stability. The stability built Honda. (26/63)

You should build an internal anticipation layer that detects external dependency failures before they reach users, to get the same stability. For a technology B2C family business, the internal anticipation layer might look like this. The lead engineer builds an internal anticipation layer. The internal anticipation layer is a monitoring system. The monitoring system is a set of automated checks. The automated checks run every sixty seconds. (27/63)

Every sixty seconds, the checks test the three external dependencies. (28/63)

Check one. Fitbit API health check. The Fitbit API health check sends a test request. The test request is a data sync. The data sync is for a test user. The test user is a fake account. The fake account has fake data. The fake data is synced. The syncing of the fake data tests the API. If the sync fails, the check triggers an alert. The alert is sent to the team. The team is notified. The notification is immediate. The immediacy of the notification creates speed. (29/63)

The speed of the response prevents user impact. (30/63)

Check two. Stripe API health check. The Stripe API health check sends a test transaction. The test transaction is a small charge in Stripe's test mode. The small charge is set at or above Stripe's minimum charge amount, because Stripe rejects charges below it and a one cent charge would fail on its own. The charge is for a test card. The test card is a Stripe test card. The test card is charged. The charging of the test card tests the API. If the charge fails, the check triggers an alert. The alert is sent to the team. The team is notified. The notification is immediate. The immediacy of the notification creates speed. (31/63)

The speed of the response prevents user impact. (32/63)

Check three. Facebook API health check. The Facebook API health check sends a test share. The test share is a post. The post is for a test page. The test page is a Facebook test page. The test page is shared. The sharing of the test page tests the API. If the share fails, the check triggers an alert. The alert is sent to the team. The team is notified. The notification is immediate. The immediacy of the notification creates speed. The speed of the response prevents user impact. (33/63)
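The three checks share one shape: run a probe, and alert the team if it fails. A minimal Python sketch of that shape follows. The names `run_checks`, `monitor`, and the probe and alert callables are hypothetical; the real probes would wrap the test sync, test charge, and test share described above, and the real alert would go to whatever channel the team uses.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Probe = Callable[[], None]        # raises an exception when the dependency is unhealthy
Alert = Callable[[str, str], None]  # receives (check name, failure detail)


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def run_checks(probes: Dict[str, Probe], alert: Alert) -> List[CheckResult]:
    """Run every probe once and alert on each failure."""
    results = []
    for name, probe in probes.items():
        try:
            probe()
            results.append(CheckResult(name, True, "ok"))
        except Exception as exc:  # any probe error counts as a failed check
            detail = f"{type(exc).__name__}: {exc}"
            alert(name, detail)
            results.append(CheckResult(name, False, detail))
    return results


def monitor(
    probes: Dict[str, Probe],
    alert: Alert,
    interval_seconds: float = 60.0,   # the thread's sixty second cadence
    cycles: Optional[int] = None,     # None means run until stopped
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeat run_checks, sleeping between passes."""
    done = 0
    while cycles is None or done < cycles:
        run_checks(probes, alert)
        done += 1
        if cycles is None or done < cycles:
            sleep(interval_seconds)


if __name__ == "__main__":
    def healthy_probe() -> None:
        pass  # stand-in for a successful test request

    def failing_probe() -> None:
        raise TimeoutError("test sync timed out")  # stand-in for an outage

    monitor(
        {"fitbit": failing_probe, "stripe": healthy_probe},
        alert=lambda name, detail: print(f"ALERT {name}: {detail}"),
        cycles=1,
    )
```

Passing `sleep` and `cycles` as parameters is a design choice for testability: the loop can be exercised without waiting a real minute.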

The monitoring system is built. The building of the monitoring system takes one week. The one week of building creates an anticipation layer. The anticipation layer detects failures. The detection of failures before they reach users prevents user impact. Last month, the monitoring system detected a Fitbit API failure. The detection was at three AM. The three AM detection was before the peak hours. The peak hours are six AM to eight AM. (34/63)

Detecting the failure before peak hours gave the team three hours. The three hours allowed the team to implement a workaround. The workaround was a manual sync option. The manual sync option allowed users to sync their data manually. The manual sync option prevented user impact. The prevention of user impact saved the company twelve thousand dollars. The twelve thousand dollars was the cost of the lost subscriptions that would have happened without the monitoring system. (35/63)

For a Lean team of two to five, the internal anticipation layer should be a monitoring system. The monitoring system should run automated checks. The automated checks should run at least every sixty seconds. The monitoring system should trigger alerts. For Lean, the internal anticipation layer should be part of the team's build-measure-learn cycle. The layer is a measure activity. (36/63)

3. Create Internal Fallback Systems for Every Danger Zone Dependency So the Product Keeps Working When the External System Fails

Honda created internal fallback systems at Honda. The internal fallback systems were backups. The backups ensured that the product kept working. Keeping the product working when the external system failed created resilience. The resilience built Honda. (37/63)

You should create internal fallback systems for every danger zone dependency, so the product keeps working when the external system fails and you get the same resilience. For a technology B2C family business, the internal fallback systems might look like this. The lead engineer creates internal fallback systems. The internal fallback systems are for the two danger zone dependencies. The two danger zone dependencies are the Fitbit API and the Stripe API. (38/63)

Fallback one. Fitbit API fallback. The Fitbit API fallback is a local data store. The local data store is on the device. The device is the user's phone. The user's phone stores the workout data locally. The local storing of workout data ensures that the data is available even when the Fitbit API is down. The availability of data when the Fitbit API is down prevents data loss. The prevention of data loss creates continuity. The continuity of data creates user trust. (39/63)

The user trust reduces churn.
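One way to read the local data store is as a write-local-first buffer: every workout is saved on the device, and the sync is retried when the API comes back. The Python sketch below illustrates that reading under stated assumptions. `LocalWorkoutStore` and the `upload` callable are hypothetical, the store is held in memory only (a real app would persist it on the phone), and the thread does not say which direction the sync runs.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Workout = Dict[str, object]


class LocalWorkoutStore:
    """In-memory stand-in for on-device storage with deferred sync."""

    def __init__(self, upload: Callable[[Workout], None]) -> None:
        # `upload` is a hypothetical sync call; it is assumed to raise
        # ConnectionError while the external API is unavailable.
        self._upload = upload
        self._all: List[Workout] = []
        self._pending: Deque[Workout] = deque()

    def log(self, workout: Workout) -> None:
        """Save locally first, then attempt to sync."""
        self._all.append(workout)
        self._pending.append(workout)
        self.flush()

    def flush(self) -> int:
        """Sync pending workouts in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._upload(self._pending[0])
            except ConnectionError:
                break  # API still down; keep the rest for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def history(self) -> List[Workout]:
        """Everything the user logged, whether or not it has synced."""
        return list(self._all)

    def pending_count(self) -> int:
        return len(self._pending)
```

Because `history()` reads from the local copy, the user still sees their workouts during an outage, which is the continuity the fallback is meant to provide.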

Fallback two. Stripe API fallback. The Stripe API fallback is a queue. The queue is a transaction queue. The transaction queue stores failed transactions. The stored transactions are retried once the rate limit clears. Storing and retrying failed transactions ensures that no transaction is lost. Losing no transactions creates reliability. The reliability of transactions creates revenue protection. The revenue protection reduces losses. (40/63)
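A transaction queue of this kind can be sketched as follows. This is an illustration, not the team's implementation: `TransactionQueue`, `Transaction`, `RateLimited`, and the `charge` callable are hypothetical names, and the queue is in memory only. Stripe documents idempotency keys for safely retrying requests; how the key is passed to Stripe is left to the hypothetical `charge` callable.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


class RateLimited(Exception):
    """Hypothetical error raised by `charge` when the provider throttles."""


@dataclass
class Transaction:
    customer_id: str
    amount_cents: int
    # Reused on every retry so the provider can de-duplicate repeated attempts.
    idempotency_key: str = field(default_factory=lambda: uuid.uuid4().hex)


class TransactionQueue:
    def __init__(self, charge: Callable[[Transaction], None]) -> None:
        self._charge = charge
        self._queue: Deque[Transaction] = deque()

    def submit(self, tx: Transaction) -> bool:
        """Try once; queue on rate limit. Returns True if charged immediately."""
        try:
            self._charge(tx)
            return True
        except RateLimited:
            self._queue.append(tx)
            return False

    def drain(self) -> int:
        """Retry queued transactions in order; stop at the first rate limit."""
        charged = 0
        while self._queue:
            try:
                self._charge(self._queue[0])
            except RateLimited:
                break  # still throttled; try again on the next drain
            self._queue.popleft()
            charged += 1
        return charged

    def __len__(self) -> int:
        return len(self._queue)
```

A scheduled job could call `drain()` after the peak window, so that queued subscriptions activate without the user retrying by hand.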

The fallback systems are built. The building of the fallback systems takes two weeks. The two weeks of building creates resilience. The resilience ensures that the product keeps working. Keeping the product working when the external system fails prevents user impact. Last month, the Stripe API had a rate limit failure. The rate limit failure happened at seven AM. The seven AM failure was during peak hours. The peak hours are six AM to eight AM. (41/63)

The peak-hour failure affected one hundred and forty-seven transactions. The one hundred and forty-seven transactions were queued. The queuing of the one hundred and forty-seven transactions ensured that no transaction was lost. Losing no transactions created reliability. The reliability of transactions prevented revenue loss. The prevention of revenue loss saved the company eighteen thousand dollars. (42/63)

The eighteen thousand dollars was the cost of the lost subscriptions that would have happened without the fallback system. (43/63)

For a Lean team of two to five, the internal fallback systems should be built for every danger zone dependency. The fallback systems should ensure that the product keeps working. The fallback systems should be built within two weeks. For Lean, the internal fallback systems should be part of the team's build-measure-learn cycle. The fallback systems are a build activity. (44/63)