The Monorepo Lesson I Needed: Why Your API Should Be Boring
Admin User
Author
Last month, I inherited a "decentralized" platform that was anything but. The landing page promised peer-to-peer compute. The backend was two Django views, a Redis instance that nobody monitored, and approximately 400 lines of undocumented business logic scattered across three different apps. When someone asked me why payouts were wrong last month, I literally couldn't trace the calculation. I spent a week reading Python files and git blame instead of fixing the actual bug.
Then I read about Sarva — a three-tier monorepo for GPU compute that's small enough that one person can understand it end-to-end. What struck me wasn't the architecture itself. It was the philosophy: sometimes the hardest part of building a distributed system isn't the distributed part. It's keeping your boring infrastructure so boring that you can actually reason about it when it breaks at 2 AM.
Why Most Monorepos Become Disasters
I've seen this pattern before. A team starts with a monorepo because it seems efficient. One repo, one CI pipeline, shared utilities. Six months later, you've got tangled dependencies, circular imports, and nobody knows where the authorization logic actually lives. It's distributed physics — the more you add, the more forces pulling in different directions.
Sarva sidesteps this by accepting harsh constraints upfront. The API is purely a hub — no ML knowledge, no presentation logic. The node agent is just a polling client that claims work and reports results. The web dashboard reads and writes through the API only, full stop. No shortcuts. No "just this once" business logic in Next.js.
These constraints aren't accidents. They're the price of maintainability.
The Architecture That Actually Works
The beauty of what's described here is its deliberate simplicity. The FastAPI hub is 340 lines. You can read it in 45 minutes. Job dispatch is first-come, first-served — no fancy scheduling, no GPU affinity matching. The pricing is three dictionaries at the top of the file.
GPU_MULT = {
"rtx-4090": 3.0,
"rtx-4070": 2.5,
"cpu": 0.8
}
GEO_RATE = {
"in": 0.7,
"us": 1.0,
"eu": 0.95
}
PLATFORM_FEE = 0.20
I love this because the first person who needs to debug pricing can find it in 30 seconds. More importantly, the geographic multiplier isn't hiding a tax — it's surfacing economic arbitrage honestly. An Indian contributor gets paid fairly for their compute at their cost of living. That's not a system hack, that's a feature.
The node agent is even more brutal: 12 dependencies, fallback GPU detection that gracefully degrades from GPUtil to nvidia-smi to CPU-only. It works on Windows, Linux, macOS, a ThinkPad from 2015, or a brand new 4090 rig. No threads, no async, not clever. A single polling loop that blocks on compute until v0.2. The author acknowledges the limitation explicitly and moves on.
That's what I'm talking about.
My Take: The Separation That Matters
Where most developers get this wrong is trying to be clever too early. We want to abstract everything, pre-optimize for scale, build the perfect separation of concerns. Then we end up with business logic scattered everywhere because we were too abstract to put it anywhere.
Sarva's approach is refreshingly inverted: be brutally concrete about where decisions live. Credit calculations live in the API, period. The web never does math. The node never knows about jobs — it just fetches scripts and executes them. This means debugging a payout issue is a database query plus API logs. No cross-tier investigation.
But here's where I'd push back: naive scheduling will bite you faster than you think. "First-come first-served by priority" works until someone submits 500 penny-jar jobs and starves the high-value work. The author knows this — v0.2 plans a smarter scheduler. I just think this needs to be in the first month, not the first year, because humans will absolutely game a system this simple.
Also, I'm skeptical about the shared monorepo long-term. Three tiers in one repo only works if your deployment story keeps pace. The moment you need to deploy API changes independently of node changes, that monorepo becomes friction.
The Question That Stays With Me
Why is the API the loneliest part? Because it's the source of truth. The node and web are just clients. They can be rewritten, swapped out, rebuilt. But if your API breaks, everything breaks. The author mitigates this by keeping it so simple that breakage becomes unlikely. Honest approach.
What would you do differently?
Source: This post was inspired by "I built a three-tier monorepo for a peer-to-peer GPU grid — here is why the API is the loneliest part" by Dev.to. Read the original article