What's the AI strategy for companies that don't have the capital to train their own foundation models?
The world is full of companies that know their customers, their domain, and their social value well, but lack the capital to turn this position into foundation models of their own.
How are they to survive and even prosper in the AI revolution?
Well, they need to play the cards they've been dealt, not the cards they want. They need to position themselves as gardens of knowledge creation. They can use frontier models through APIs, or open-weights models internally, but they will need to tend their data assets so that they grow into knowledge and skill assets for AIs.
There are a few principles to follow here to succeed. First, approach LLM/VLM-based automation the way everyone does: automate the work and scale it up. This goes without saying, really; everyone does it.
But while doing it, aggregate your data asset:
- Store every inference call you make in durable, long-term storage, with ample metadata so you can later reconstruct what each call was about (a minimal logging sketch follows this list).
- Build and refine knowledge bases and RAG assets automatically.
- Systematically document tacit knowledge, and make your company's internal discussions and processes saved and accessible to AIs: Slack conversations, emails, Confluence pages, and the like.
- Ingest external data related to your domain, in proper #DataHoarder style.
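As a minimal sketch of the first point, here is one way to wrap inference calls so every request and response lands in an append-only JSONL log. The log path, field names, and `call_fn` signature are illustrative assumptions, not a standard:

```python
# Minimal sketch: wrap every LLM call so the request, response, and metadata
# land in an append-only JSONL log. Field names and the log path are
# illustrative assumptions; adapt them to your own storage and schema.
import json
import time
import uuid
from pathlib import Path

LOG_PATH = Path("inference_log.jsonl")  # in practice: durable, versioned storage

def log_inference(call_fn, prompt: str, *, task: str, model: str, **kwargs):
    """Run an inference call and persist it with enough metadata to
    reconstruct later what the call was about."""
    started = time.time()
    response = call_fn(prompt=prompt, model=model, **kwargs)
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": started,
        "latency_s": time.time() - started,
        "task": task,        # business-level purpose of the call
        "model": model,
        "params": kwargs,    # temperature, max_tokens, etc.
        "prompt": prompt,
        "response": response,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return response
```

Appending to JSONL is a pragmatic start; a serious deployment would land these records in a warehouse or object store, but the principle is the same: never make an inference call that leaves no trace.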
Build processes that refine all this data further into knowledge, for example by loading it into a RAG-enabled graph database.
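One possible shape for that refinement step, sketched below: chunk documents into graph nodes, link adjacent chunks, and link chunks that mention the same entities. The `embed` and `extract_entities` functions are toy stand-ins so the sketch runs; a real pipeline would plug in an embedding model and a proper entity extractor.

```python
# Sketch: refine raw documents into a retrieval-ready knowledge graph.
# Chunks become nodes carrying embeddings; edges link adjacent chunks and
# chunks sharing entity mentions. embed() and extract_entities() are toy
# stand-ins for a real embedding model and entity extractor.
import networkx as nx

def embed(text: str) -> list[float]:
    # Toy stand-in so the sketch runs: hash words into a fixed-size vector.
    # Replace with a real embedding model.
    vec = [0.0] * 64
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec

def extract_entities(text: str) -> set[str]:
    # Toy stand-in: treat capitalized words as entities.
    # Replace with NER or an LLM extraction prompt.
    return {w.strip(".,") for w in text.split() if w.istitle()}

def ingest(graph: nx.Graph, doc_id: str, text: str, chunk_size: int = 400) -> None:
    """Chunk a document into nodes, link adjacent chunks, then link chunks
    across the whole graph that mention the same entity."""
    prev = None
    for n, start in enumerate(range(0, len(text), chunk_size)):
        chunk = text[start:start + chunk_size]
        node = f"{doc_id}:{n}"
        graph.add_node(node, text=chunk, embedding=embed(chunk),
                       entities=extract_entities(chunk))
        if prev is not None:
            graph.add_edge(prev, node, kind="adjacent")
        prev = node
    # O(n^2) linking pass; fine for a sketch, replace with an index at scale.
    nodes = list(graph.nodes(data=True))
    for i, (a, da) in enumerate(nodes):
        for b, db in nodes[i + 1:]:
            shared = da["entities"] & db["entities"]
            if shared and not graph.has_edge(a, b):
                graph.add_edge(a, b, kind="shared_entity", entities=shared)
```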
Then you will need to build some level of refinement process, at the very least rejection sampling over your collected inference data. There are many techniques to apply here in a synergistic, mutually supporting way; enough to fill many books.
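At its simplest, rejection sampling here means scoring each logged prompt/response pair with a judge and keeping only the records above a threshold. In the sketch below, `score_with_judge` is a toy stand-in for an LLM judge or domain-specific checks, the threshold is an arbitrary example, and the file format matches the logging sketch above:

```python
# Sketch: rejection-sample the inference log into a clean training export.
# score_with_judge() is a placeholder for your grader (an LLM judge,
# heuristics, or human review); the 0.8 threshold is an arbitrary example.
import json

def score_with_judge(prompt: str, response: str) -> float:
    # Toy stand-in so the sketch runs: reward non-empty, non-refusal answers.
    # Replace with an LLM judge or domain-specific checks.
    if not response.strip() or "i cannot" in response.lower():
        return 0.0
    return 1.0

def rejection_sample(log_path: str, out_path: str, threshold: float = 0.8) -> None:
    with open(log_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            rec = json.loads(line)
            if score_with_judge(rec["prompt"], rec["response"]) >= threshold:
                dst.write(json.dumps(
                    {"prompt": rec["prompt"], "response": rec["response"]},
                    ensure_ascii=False) + "\n")

rejection_sample("inference_log.jsonl", "training_export.jsonl")
```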
You'll end up with datasets good for fine-tuning and, separately, for benchmarking. The benchmarking datasets you can use right away to select the best available foundation models for your use cases. But you should also measure and prove the value of the training-data exports you produce.
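Model selection with such a benchmark can be a plain loop: run every candidate over held-out prompt/answer pairs and compare scores. The sketch below assumes the official OpenAI Python client and a crude exact-match metric; the model names, file name, and metric are placeholders for your own setup.

```python
# Sketch: pick the best available foundation model for your use case by
# scoring candidates on your own benchmark. Assumes the official OpenAI
# Python client; model names and the metric are illustrative only.
import json
from openai import OpenAI

client = OpenAI()
CANDIDATES = ["model-a", "model-b"]  # placeholders for real model names

def benchmark(model: str, path: str = "benchmark.jsonl") -> float:
    hits = total = 0
    for line in open(path, encoding="utf-8"):
        item = json.loads(line)  # {"prompt": ..., "answer": ...}
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": item["prompt"]}],
        )
        text = resp.choices[0].message.content or ""
        hits += int(item["answer"].strip().lower() in text.lower())
        total += 1
    return hits / total

scores = {m: benchmark(m) for m in CANDIDATES}
print(max(scores, key=scores.get), scores)
```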
You do this by fine-tuning smaller models on the data and noting how much better they become at your use case. You don't have to train the best foundation models here; you just want to prove that your data asset is valuable and builds knowledge and skills into existing foundation models.
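A minimal sketch of that proof, assuming a small open-weights causal LM you can fine-tune locally with Hugging Face transformers and datasets; the model name, file names, and hyperparameters are placeholders, and the benchmark format matches the selection sketch above:

```python
# Sketch: prove the data asset's value by measuring a small model on your
# benchmark before and after fine-tuning on your exported dataset.
# Model name, paths, and hyperparameters are placeholders.
import json
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "your-org/small-base-model"  # placeholder for a small open model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

def exact_match(path: str = "benchmark.jsonl") -> float:
    hits = total = 0
    for line in open(path, encoding="utf-8"):
        item = json.loads(line)  # {"prompt": ..., "answer": ...}
        inputs = tok(item["prompt"], return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False,
                             pad_token_id=tok.pad_token_id)
        text = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
        hits += int(item["answer"].strip().lower() in text.lower())
        total += 1
    return hits / total

before = exact_match()

# Fine-tune on the rejection-sampled export (prompt + response as one text).
ds = load_dataset("json", data_files="training_export.jsonl")["train"]
ds = ds.map(lambda r: tok(r["prompt"] + "\n" + r["response"],
                          truncation=True, max_length=512),
            remove_columns=ds.column_names)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.eval()

after = exact_match()
print(f"exact match: {before:.3f} -> {after:.3f}")
```

A positive delta on your own benchmark is exactly the evidence you want: the exported data teaches models things they didn't already know.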
This data asset will be valuable in a future where generalist AIs try to serve your social purpose. Leverage it.
If the data comes with constraints, such as personally identifiable information or other limitations, even better. Then you can take a position as the synthetic data generator for your domain: generate synthetic data that doesn't contain the constrained aspects, and produce valuable training and fine-tuning data through that indirection layer.
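One concrete shape for that indirection layer: detect the constrained fields in each real record and substitute realistic fakes while keeping the domain content intact. In the sketch below the field names are hypothetical, and the Faker library is one possible substitution strategy among many:

```python
# Sketch: turn PII-constrained records into shareable synthetic training
# data by swapping identifying fields for realistic fakes while keeping
# the domain signal. Field names are hypothetical; Faker is one possible
# substitution strategy among many.
import json
from faker import Faker

fake = Faker()

# Which fields are constrained, and how to synthesize replacements.
PII_FIELDS = {
    "customer_name": fake.name,
    "email": fake.email,
    "phone": fake.phone_number,
    "address": fake.address,
}

def synthesize(record: dict) -> dict:
    out = dict(record)
    for field, generator in PII_FIELDS.items():
        if field in out:
            out[field] = generator()
    return out

with open("real_records.jsonl", encoding="utf-8") as src, \
     open("synthetic_records.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(json.dumps(synthesize(json.loads(line)),
                             ensure_ascii=False) + "\n")
```

Note that structured fields are the easy case; free-text fields can still leak identifying details and need a scrubbing pass (NER-based or LLM-based) before anything leaves the indirection layer.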
You will need to reimagine your company and direct it to become a garden of knowledge creation in your domain, to carry your purpose forward.
What if your valuable data asset is copied or stolen? Don't worry about it too much. You're not building a static asset but a living process.
You are the closed feedback loop through which AIs improve at serving your purpose. You can only be displaced from this position if someone else fulfills your purpose better, by building a better garden of knowledge and skills around which intelligent entities orbit and gather.




