What are some design patterns in machine learning systems?

Here are a few I've seen:

1. Cascade: Break a complex problem into a sequence of simpler ones. Cheap, high-precision stages handle the easy cases first; each subsequent model focuses on the harder or more specific cases that earlier stages couldn't resolve.

Stack Exchange has a cascade of defenses against spam: https://stackoverflow.blog/2020/06/25/how-does-spam-protection-work-on-stack-exchange/

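The cascade idea can be sketched with a hypothetical spam check: a cheap rule-based stage catches the obvious cases, and only what passes reaches a more expensive model. Both stages below are toy stand-ins, not Stack Exchange's actual defenses:

```python
def keyword_filter(text):
    """Cheap first stage: block obvious spam via a keyword blocklist."""
    blocklist = {"free money", "click here", "buy now"}
    return any(phrase in text.lower() for phrase in blocklist)

def ml_classifier(text):
    """Stand-in for a heavier model that scores the trickier cases.
    Toy heuristic: a high ratio of exclamation marks to words."""
    return text.count("!") / max(len(text.split()), 1) > 0.5

def is_spam(text):
    # Stage 1: cheap rules catch the easy cases.
    if keyword_filter(text):
        return True
    # Stage 2: the costlier model only sees what stage 1 passed.
    return ml_classifier(text)
```

The point of the pattern is that the expensive stage runs on a small fraction of traffic, keeping average cost low.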

2. Reframing: Redefine the original problem, target, or input to make the problem easier to solve.

Sequential recommender systems reframed the paradigm from modeling item co-occurrence (matrix factorization) to predicting the next event (e.g., with transformers).

Alibaba's BST: https://arxiv.org/abs/1905.06874

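The reframing is visible in the data preparation: the same interaction log that would feed a co-occurrence model becomes (context sequence, next item) training pairs for a sequence model. A sketch, with an illustrative function name and window size:

```python
def to_next_item_examples(events, window=3):
    """Turn one user's interaction history into
    (context sequence, next item) training pairs."""
    examples = []
    for i in range(1, len(events)):
        context = tuple(events[max(0, i - window):i])
        examples.append((context, events[i]))
    return examples
```

Each pair asks the model to predict the next event given recent history, which is the objective sequence models like BST train on.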

3. Human-in-the-loop: Collect labeled data from users, annotation services, or domain experts.

Stack Exchange lets users flag posts as spam, and LinkedIn lets users report messages as harassment: https://engineering.linkedin.com/blog/2020/fighting-harassment

Recently, LLMs have been applied here too: https://twitter.com/eugeneyan/status/1640530851489259522

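A human-in-the-loop pipeline can be sketched as a small review queue: user flags accumulate, a human reviewer confirms or rejects them, and confirmed decisions become labeled training data. The class and method names below are hypothetical:

```python
class FlagQueue:
    """Collect user reports, let a reviewer decide, emit labels."""

    def __init__(self):
        self.pending = []   # (post_id, reason) awaiting review
        self.labels = []    # (post_id, is_violation) for training

    def flag(self, post_id, reason):
        """A user reports a post (e.g., as spam or harassment)."""
        self.pending.append((post_id, reason))

    def review(self, decisions):
        """A human reviewer maps post_id -> True/False; reviewed
        items move from the pending queue into the label set."""
        for post_id, _ in self.pending:
            if post_id in decisions:
                self.labels.append((post_id, decisions[post_id]))
        self.pending = [p for p in self.pending if p[0] not in decisions]
```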

4. Data Augmentation: Synthetically increase the size and diversity of training data to improve model generalization and reduce overfitting.

DoorDash varied sentence order and randomly removed information such as menu category: https://doordash.engineering/2020/08/28/overcome-the-cold-start-problem-in-menu-item-tagging/

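That style of augmentation can be sketched as follows: shuffle sentence order and randomly drop optional fields such as menu category. The drop probability and field names here are made up for illustration, not DoorDash's actual settings:

```python
import random

def augment_item(sentences, fields, drop_prob=0.3, seed=None):
    """Produce one augmented copy of a training example by
    shuffling sentence order and randomly dropping fields."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    kept = {k: v for k, v in fields.items() if rng.random() >= drop_prob}
    return shuffled, kept
```

Running this several times per item with different seeds yields multiple distinct training examples from one labeled item.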

5. Data flywheel: Positive feedback loop where more data improves ML models, which leads to more users and data.

Tesla collects data via its cars, finds and labels errors, retrains models, and deploys the updated models back to the cars, which then gather more data.

https://twitter.com/karpathy/status/1599852921541128194

Andrej Karpathy: “Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine: iterated data aquisition, re-training, evaluation, deployment, telemetry. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general”

6. Business Rules: Add logic or constraints based on domain knowledge and/or business requirements to augment or adjust the output of ML models.

Twitter applies various hand-tuned weights to predicted engagement probabilities: https://github.com/twitter/the-algorithm-ml/tree/main/projects/home/recap

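In spirit, that looks like combining model outputs with hand-tuned weights plus hard constraints. A sketch with made-up weight values and rule names (Twitter's actual weights and rules are in the linked repo):

```python
def rank_score(engagement_probs, weights):
    """Weighted sum of predicted engagement probabilities;
    the weights are hand-tuned business knobs, not learned."""
    return sum(weights.get(action, 0.0) * p
               for action, p in engagement_probs.items())

def apply_business_rules(candidate, score):
    """Hard constraints override the model's score."""
    if candidate.get("flagged_by_trust_and_safety"):
        return 0.0  # never surface flagged content
    return score
```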

A few more that I'll cover in a write-up:
• Aggregate raw data once: To reduce compute cost
• Evaluate before deploy: For safety and reliability
• Hard mining: To better learn difficult instances
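For instance, hard mining could be sketched as selecting the highest-loss examples to oversample in the next training pass. This is a toy illustration, not any specific library's API:

```python
def mine_hard_examples(per_example_loss, top_k):
    """Return the ids of the examples the model currently
    gets most wrong, to oversample them next epoch."""
    ranked = sorted(per_example_loss.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [example_id for example_id, _ in ranked[:top_k]]
```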

What other design patterns or industry examples are there? Please share!