
Holos

GitHub repo: https://github.com/vaedlabs/holos

Holos is an AI fitness application built around a coordinator-agent architecture. Three domain-specialized agents handle physical fitness, nutrition, and mental health. A fourth coordinates between them. All four share the same medical context about the user.

This was an AI architecture prototype, not a healthcare product. Privacy, compliance, and ship-quality concerns were consciously deferred in favour of validating the multi-agent design.


Why the domains are split the way they are

Physical fitness and mental fitness are not the same problem. Physical fitness is a coaching problem - workouts, progressive overload, recovery. Mental fitness is closer to a therapist problem - consistency, showing up, finding the energy for tasks, managing stress. Collapsing them into one agent means one always gets watered down.

Nutrition gets Gemini over OpenAI for two reasons: cost, and vision. For a feature where you photograph a meal and get macros back, Google's image training data gives it an edge - particularly on non-Western food, which was a deliberate early test. The accuracy held up well there. Where it didn't was on prompt sensitivity: smaller models were less robust to prompt variation, producing inconsistent outputs on the same input. The fix was moving to a larger model. The real finding: vision task accuracy scales with model size faster than text tasks do, and prompt engineering compensates less than you'd expect.
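The photo-to-macros flow can be sketched as a prompt plus a parser. The response schema and the fence-stripping here are assumptions, not the actual implementation - but the parser reflects the prompt-sensitivity finding above: smaller models would sometimes wrap the JSON in markdown fences or drift from the schema.

```python
import json

# Hypothetical prompt - the real one differs.
MACRO_PROMPT = (
    "Estimate the macros for the meal in this photo. "
    'Reply with JSON only: {"calories": int, "protein_g": int, '
    '"carbs_g": int, "fat_g": int}'
)

def parse_macros(raw: str) -> dict:
    # Smaller models sometimes wrap the JSON in markdown fences;
    # strip them before parsing.
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(cleaned)
```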


Routing

The coordinator uses a prompt-based classifier. One explicit constraint built into the routing prompt - don't default to physical fitness. Without it, everything becomes a workout recommendation.

A concrete example: the query "I keep stress-eating at night" routes to mental fitness, not nutrition, because intent is about behaviour and mood rather than meal planning. A query like "build me a plan to lose 5kg before a trek" triggers a holistic plan - the coordinator calls nutrition and physical fitness in parallel, then synthesises their responses into a single output.
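A minimal sketch of that classifier, with the anti-default constraint written into the prompt. The agent names, JSON schema, and wording are illustrative - the real routing prompt differs - but the shape is the same: one LLM call, structured output, explicit holistic option.

```python
import json

# Illustrative routing prompt - the key line is the explicit
# instruction not to default to physical fitness.
ROUTING_PROMPT = (
    "You coordinate three specialist agents: physical_fitness, nutrition, "
    "mental_fitness. Classify the user's query by underlying intent, not "
    "surface keywords. Do NOT default to physical_fitness. If the goal "
    'spans domains, reply {"route": "holistic", "agents": [...]}; '
    'otherwise {"route": "<agent_name>"}. Query: '
)

def route(query: str, llm_call) -> dict:
    """llm_call: any callable mapping a prompt string to model text."""
    return json.loads(llm_call(ROUTING_PROMPT + query))
```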


Medical context

On signup, the user fills out a medical history and preferences questionnaire. That data gets pulled into agent context at query time - before responding, the agent cross-checks its suggestion against what the user can actually eat, perform, or do.

The current implementation is: stored, retrieved, injected. What it needs eventually is a summary agent - likely running as a background job on medical history updates, not on every query - that maintains a compressed representation of context and refreshes it when history changes. The data model anticipates this: PostgreSQL over SQLite specifically so pgvector or a similar extension can be added later without rewriting the stack, and because fewer platforms means fewer privacy surfaces.
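The stored-retrieved-injected path can be sketched as simple prompt assembly. The field names here are hypothetical, not the real questionnaire schema:

```python
def build_agent_prompt(system_prompt: str, medical: dict, query: str) -> str:
    # Injected verbatim today; a future summary agent would replace this
    # with a compressed representation refreshed on history updates.
    def fmt(items):
        return ", ".join(items) if items else "none"
    return (
        f"{system_prompt}\n\n"
        "User medical context (cross-check every suggestion against it):\n"
        f"- Conditions: {fmt(medical.get('conditions', []))}\n"
        f"- Dietary restrictions: {fmt(medical.get('diet', []))}\n"
        f"- Injuries / limitations: {fmt(medical.get('injuries', []))}\n\n"
        f"Query: {query}"
    )
```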


The latency problem

The coordinator triggering a holistic plan is the most expensive operation in the system. Sequential agent calls ran around 37s. Parallelising brought that down to roughly 10s. The implementation uses async parallel querying with millisecond-jittered request offsets - staggered dispatches to avoid thundering herd on a shared OpenAI key across the coordinator, physical fitness, and mental health agents. Nutrition runs on a separate Gemini key. Rate limit headroom across both keys was factored into the dispatch timing.
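A sketch of that dispatch pattern, assuming an asyncio implementation (the offsets and jitter bounds here are placeholders, not the tuned values):

```python
import asyncio
import random

async def holistic_plan(query, agents, call_agent):
    """call_agent(agent, query) -> coroutine; provider wiring lives elsewhere."""
    async def staggered(i, agent):
        # A small fixed offset per agent plus millisecond jitter, so the
        # requests sharing one OpenAI key never land at the same instant.
        await asyncio.sleep(i * 0.05 + random.uniform(0, 0.02))
        return await call_agent(agent, query)

    # Dispatch all agents concurrently: total latency tracks the slowest
    # agent rather than the sum of all of them - the 37s -> 10s win.
    return await asyncio.gather(*(staggered(i, a) for i, a in enumerate(agents)))
```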

The codebase also has explicit retry logic, circuit breaker patterns, model fallbacks, and agent execution tracing - scaffolding that turned out to be necessary before the latency work was stable enough to rely on.
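The retry-with-fallback piece of that scaffolding looks roughly like this - a sketch only, with the circuit breaker and tracing omitted, and the model names and backoff values as placeholders:

```python
import time

def call_with_fallback(prompt, models, attempt_fn, retries=2, backoff=0.1):
    """Try each model in order, retrying transient failures with
    exponential backoff before falling through to the next model."""
    last_err = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return attempt_fn(model, prompt)
            except Exception as err:  # real code narrows this to transient errors
                last_err = err
                time.sleep(backoff * (2 ** attempt))
    raise last_err
```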


What's unfinished and why

Rate limiting middleware exists but isn't mounted. It was working - but repeated testing kept tripping the limits mid-build, so it got unmounted with the intent to re-enable before any real deployment. The app stayed local.
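The unmounted limiter is framework middleware; the core of it reduces to a token bucket along these lines (an assumed shape - the actual limits and per-user keying are project-specific):

```python
import time

class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, float(capacity)
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

During repeated test runs the bucket drains and every subsequent request gets rejected mid-build - which is exactly why it got unmounted locally.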

The privacy surface is an acknowledged debt. Medical data in PostgreSQL with JWT auth is functional, not hardened. DPDP compliance, proper encryption, and a more serious auth layer would all need to come before this goes anywhere near production. These were conscious deferrals on a project scoped to validate AI architecture - not to ship a healthcare product.


What it gave me

The architecture decisions were largely settled before the first line was written - prior work in the same domain had already resolved most of the hard questions. The build was mostly execution and verification.

What it sharpened was agent management at a real, user-facing scale: four agents, shared state, a coordinator, latency constraints, retry scaffolding, and a product sitting on top of all of it. That understanding fed directly into Rostrom - a self-hosted multi-agent orchestration system built afterward - where the same problems recur with more agents, longer context windows, harder memory constraints, and no tolerance for dropping state.