Why we (mostly) don't use LLM frameworks
LangChain, LlamaIndex, Haystack — we have used them all and we ship most of our agents without them. Here is when frameworks help, when they hurt, and why we usually choose the harder-looking option.
- LLM
- Engineering
- Opinion
We have used the major LLM frameworks. We do not use them by default anymore. Here is the working theory.
What frameworks promise
The pitch is identical across LangChain, LlamaIndex, Haystack and friends: we will abstract over the boring parts so you can focus on the interesting ones. Tool use, retrieval, agents, eval — it is all there in a few imports.
What frameworks deliver
Sometimes that. Often, the opposite. The boring parts are not abstracted away — they are abstracted over. The decisions get made by the framework, not by you. Three failure modes we keep hitting:
- The abstraction does not match the problem. Your agent needs a kind of memory the framework does not model. You end up monkey-patching the internals, which is harder than writing it from scratch.
- The abstraction hides cost. A single call inside the framework can turn into seven LLM calls, two embedding calls, and a vector lookup. Until you instrument it, you have no way to know.
- The abstraction hides errors. Failures get caught and re-thrown inside the framework. By the time you see them, the stack trace points at framework internals and tells you nothing about what actually went wrong.
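The cost and error points above are easier to see with a counter in place. What follows is a minimal sketch, not a description of any framework's internals: `instrument`, `CallStats`, `completeText`, `embedText`, and `answerQuestion` are hypothetical names, and the two wrapped functions are stand-ins for whatever SDK calls you actually make.

```typescript
// Hypothetical helper: wraps any async function and records how often it is
// called, how often it fails, and how long the calls take in total.
type CallStats = { calls: number; failures: number; totalMs: number };

function instrument<Args extends unknown[], R>(
  label: string,
  stats: Map<string, CallStats>,
  fn: (...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return async (...args: Args): Promise<R> => {
    const entry = stats.get(label) ?? { calls: 0, failures: 0, totalMs: 0 };
    stats.set(label, entry);
    entry.calls += 1;
    const started = Date.now();
    try {
      return await fn(...args);
    } catch (err) {
      entry.failures += 1;
      throw err; // rethrow the original error object so its stack trace survives
    } finally {
      entry.totalMs += Date.now() - started;
    }
  };
}

// Stand-ins for real SDK calls (assumptions, not a real provider API).
const callStats = new Map<string, CallStats>();
const completeText = instrument(
  "llm.complete",
  callStats,
  async (prompt: string): Promise<string> => `echo: ${prompt}`,
);
const embedText = instrument(
  "embeddings.create",
  callStats,
  async (text: string): Promise<number[]> => [text.length],
);

// A pretend "single" high-level call that fans out into several calls underneath.
async function answerQuestion(question: string): Promise<string> {
  await embedText(question);
  const plan = await completeText(`plan: ${question}`);
  return completeText(`answer using ${plan}`);
}
```

After one `answerQuestion` call, `callStats` holds one entry per label, which is the number you want on a dashboard before you trust any layer that makes calls on your behalf.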
When we do reach for them
- Vector store clients. LlamaIndex's connectors are genuinely useful.
- Document loaders. PDF-to-text in particular is a swamp; let the framework handle it.
- Prototype phase. First-week prototypes get built fast with whatever is fastest. If the project ships, we usually rewrite that layer.
What we do instead
For most production agents, the actual code is small:
- An LLM client (the official SDK, not a wrapper).
- A small set of typed tools, each a plain async function.
- A loop that calls the model, handles tool calls, accumulates context.
- A retrieval function that calls the vector DB directly.
- An eval harness that runs the whole thing on a fixture set.
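The tools and the loop from that list fit in a few dozen lines. This is an illustration under stated assumptions, not our production runtime: `AgentMessage`, `ToolCall`, `ModelFn`, `ToolFn`, and `runAgent` are hypothetical names, and the model is treated as a plain function so that any official SDK call can be adapted behind it. Real SDKs have their own message and tool-call shapes.

```typescript
// Hypothetical shapes; a real provider SDK defines its own.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };

type AgentMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

type ModelReply = { content: string; toolCalls: ToolCall[] };
type ModelFn = (messages: AgentMessage[]) => Promise<ModelReply>;
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

async function runAgent(
  model: ModelFn,
  tools: Map<string, ToolFn>,
  userInput: string,
  maxTurns = 8,
): Promise<string> {
  const messages: AgentMessage[] = [{ role: "user", content: userInput }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(messages);
    messages.push({ role: "assistant", content: reply.content, toolCalls: reply.toolCalls });

    // No tool calls requested: the model has produced its final answer.
    if (reply.toolCalls.length === 0) return reply.content;

    for (const call of reply.toolCalls) {
      const tool = tools.get(call.name);
      let result: string;
      try {
        // Unknown tools and tool failures are reported back to the model
        // as text instead of crashing the loop.
        result = tool ? await tool(call.args) : `error: unknown tool "${call.name}"`;
      } catch (err) {
        result = `error: ${err instanceof Error ? err.message : String(err)}`;
      }
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
  throw new Error(`agent did not finish within ${maxTurns} turns`);
}

// Example tool: a plain async function that validates its own arguments.
const demoTools = new Map<string, ToolFn>([
  [
    "wordCount",
    async (args: Record<string, unknown>): Promise<string> => {
      const text = args["text"];
      if (typeof text !== "string") throw new Error("text must be a string");
      return String(text.split(/\s+/).filter((w) => w.length > 0).length);
    },
  ],
]);
```

Because the model is just a function, a test can pass in a scripted fake and step through the loop without touching the network. The `maxTurns` cap is a design choice of this sketch: a loop that cannot terminate is the most expensive bug an agent can have.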
The whole stack is usually under 1,500 lines of TypeScript. It is faster to debug than any framework version, and it does not break when a provider ships a new feature.
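The eval harness from the list above can be just as small. A minimal sketch, assuming the agent is exposed as a single async function and each fixture carries its own pass/fail check. `AgentFn`, `Fixture`, `runEvals`, and `passRate` are hypothetical names, and real harnesses usually add scoring, retries, and reporting on top.

```typescript
// Hypothetical types: the agent under test is any function from input to output.
type AgentFn = (input: string) => Promise<string>;
type Fixture = { name: string; input: string; check: (output: string) => boolean };
type EvalResult = { name: string; passed: boolean; output: string; error?: string };

// Runs every fixture in order; a thrown error counts as a failed case
// rather than aborting the whole run.
async function runEvals(agent: AgentFn, fixtures: Fixture[]): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const fixture of fixtures) {
    try {
      const output = await agent(fixture.input);
      results.push({ name: fixture.name, passed: fixture.check(output), output });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name: fixture.name, passed: false, output: "", error: message });
    }
  }
  return results;
}

// Fraction of fixtures that passed; 0 for an empty run.
function passRate(results: EvalResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.passed).length / results.length;
}
```

Running fixtures sequentially keeps the sketch simple and the output ordered; a real harness would likely run them with bounded concurrency.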
The bigger argument
Frameworks make sense when the underlying surface is stable. LLM APIs are not. The capabilities, the cost shapes, the tool-use semantics — all moving. Every six months something fundamental changes, and frameworks either chase it (badly) or abstract over it (worse). The thinnest possible layer over the official SDKs has been a winning bet for two years running.
The frameworks will eventually win, when the surface stabilizes. We do not think it has stabilized yet.
When this take is wrong
It is wrong for teams that do not have engineering capacity to maintain their own agent runtime. For those teams, a framework is correct — the cost of the abstraction is lower than the cost of rolling your own.
The honest test: can your team debug a malfunctioning agent at 2 a.m.? If yes, skip the framework. If no, a framework is buying you something.