The workflow had been running for eleven days
when the errors started appearing.

Not every execution. Roughly one in seven.
The same node, the same failure, no obvious
pattern in the data.

The trigger was firing correctly. Enrichment
was returning results. The HubSpot write was
succeeding most of the time. But on some runs
the Clay API returned a 429 (Too Many Requests),
and the node failed silently.

The records that failed were not being flagged.
They were being dropped.
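
A minimal sketch of the alternative, retrying on a
429 and keeping whatever still fails instead of
dropping it. Everything named here is hypothetical:
`RateLimited` stands in for however your API wrapper
signals a 429, and `enrich` is any function that
makes the call. It is not the workflow tool's own
retry mechanism.

```python
import time
from typing import Callable, Iterable


class RateLimited(Exception):
    """Hypothetical error raised by the caller's API wrapper on HTTP 429."""


def process_records(
    records: Iterable[dict],
    enrich: Callable[[dict], dict],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], list[dict]]:
    """Return (succeeded, failed) so no record is silently dropped."""
    succeeded, failed = [], []
    for record in records:
        for attempt in range(max_attempts):
            try:
                succeeded.append(enrich(record))
                break
            except RateLimited:
                if attempt + 1 < max_attempts:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    sleep(base_delay * 2 ** attempt)
        else:
            # Every attempt was rate limited: keep the record for review.
            failed.append(record)
    return succeeded, failed
```

The `failed` list is the point. Write it to a sheet,
a table, or an alert, anywhere a person will see it.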

Tests ran at two or three records. In production:
forty records every fifteen minutes at peak.

API rate limit: two hundred calls per minute.
Calls at peak: two hundred and forty per minute.
Nobody had thought to check the limit before
scaling.
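
The comparison is simple arithmetic. A rough sketch
using the two figures above; the function and field
names are made up for illustration.

```python
def check_headroom(limit_per_min: int, peak_per_min: int) -> dict:
    """Compare expected peak call volume against the API rate limit."""
    return {
        # Fraction of the limit used at peak; above 1.0 means trouble.
        "utilization": peak_per_min / limit_per_min,
        # Calls per minute that the API would reject at peak.
        "excess_calls_per_min": max(0, peak_per_min - limit_per_min),
        "over_limit": peak_per_min > limit_per_min,
    }


report = check_headroom(limit_per_min=200, peak_per_min=240)
print(report)
```

At these numbers the workflow runs at 1.2 times the
limit, with forty calls a minute over the line.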

What I now build into every automation: a rate
limit check before production.

What is the limit? What is the volume at peak?
At 3× peak, because that will happen? Does the
workflow need a delay node? A queue with
a throttle? These questions take twenty minutes
to answer.
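
If the answer is a throttle, one possible shape is a
fixed minimum gap between calls. This is a sketch
under assumptions, not what any particular delay
node does: the `Throttle` class is hypothetical, and
the clock and sleep functions are injectable only so
it can be tested without waiting.

```python
import time
from typing import Callable


class Throttle:
    """Space calls at least 60 / limit_per_min seconds apart."""

    def __init__(
        self,
        limit_per_min: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / limit_per_min
        self._clock = clock
        self._sleep = sleep
        # No call made yet, so the first call is allowed immediately.
        self._next_allowed = float("-inf")

    def wait(self) -> float:
        """Block until the next call is allowed; return seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the next slot one interval after this call's slot.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call `wait()` before each API request. At two
hundred calls per minute the gap is 0.3 seconds, so
a burst of 240 calls takes about 72 seconds instead
of failing partway through.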

They take much longer after the workflow has
been dropping records in production for
eleven days.

The test environment almost never surfaces rate
limits. Test data is small. Runs are spaced out.
The limit only shows when the workflow meets
real volume.

Plan for the limit before the volume arrives.