Architecture

VANESSA is designed as a multi-container system with clear boundaries.

System Diagram

VANESSA container architecture

The diagram is generated from:

  • infra/docker-compose.yml (service inventory and dependencies)
  • infra/architecture/metadata.yml (labels, groups, communication semantics)

To regenerate artifacts:

python scripts/generate_architecture.py --write
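
A minimal sketch of that derivation, assuming both files are plain YAML and that metadata.yml carries an edges list with from/to/kind fields (the actual schema, and the real generate_architecture.py, may differ):

# Hypothetical sketch of how the edge list could be derived; the `edges`
# schema shown here is an assumption, not the file's documented format.
import yaml

with open("infra/docker-compose.yml") as f:
    services = set(yaml.safe_load(f)["services"])

with open("infra/architecture/metadata.yml") as f:
    metadata = yaml.safe_load(f)

for edge in metadata.get("edges", []):
    # Only keep edges between services present in the compose inventory.
    if edge["from"] in services and edge["to"] in services:
        print(f"{edge['from']} -> {edge['to']} ({edge['kind']})")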

Legend:

  • Solid blue edges: HTTP calls
  • Purple edges: SQL/data access
  • Dashed orange edges: event/webhook flow
  • Dashed gray edges: internal runtime/dependency links

Container Boundaries

  1. Frontend: browser UI, HTTP calls only to backend API.
  2. Backend (Flask API): public API entrypoint, validation, orchestration.
  3. LLM API: private model-serving HTTP gateway for inference/discovery requests.
  4. LLM Runtime Inference: hardware-adaptive local vLLM runtime engine backing text generation on CPU or GPU hosts.
  5. LLM Runtime Embeddings: hardware-adaptive local vLLM runtime engine backing embeddings on CPU or GPU hosts.
  6. llama.cpp: optional OpenAI-compatible local inference runtime used as an alternate llm_inference provider.
  7. Agent Engine: multi-step agent logic and tool workflows.
  8. Sandbox: isolated Python code execution environment and native runtime provider for Python execution tools.
  9. MCP Gateway: normalized HTTP provider for MCP-backed tools such as web search.
  10. SearXNG: local token-free metasearch backend used only by MCP Gateway for web search.
  11. KWS: offline wake-word detection and wake-event emission.
  12. Weaviate: persistent semantic index for RAG context retrieval.
  13. Qdrant: optional vector database for alternate retrieval provider binding.
  14. PostgreSQL: persistent relational data for auth and metadata.

Interaction semantics in the generated graph represent directional runtime communication paths (who calls whom), not Docker Compose startup dependencies:

  • Frontend -> Backend API
  • Backend API -> Agent Engine, LLM API, optional llama.cpp, Sandbox, MCP Gateway, Weaviate, optional Qdrant, PostgreSQL
  • Agent Engine -> LLM API, Sandbox, MCP Gateway, Weaviate, optional Qdrant, PostgreSQL
  • MCP Gateway -> SearXNG
  • LLM API -> LLM Runtime Inference, LLM Runtime Embeddings
  • KWS -> Backend API

GenAI Control Plane Terms

The runtime architecture now distinguishes container topology from capability binding:

  • capability: a platform function such as llm_inference, embeddings, vector_store, mcp_runtime, or sandbox_execution
  • provider: an implementation family for a capability such as vllm_local, llama_cpp_local, openai_compatible_cloud_embeddings, weaviate_local, qdrant_local, mcp_gateway_local, or sandbox_local
  • provider_origin: a backend-owned family classification, either local or cloud, inherited by provider instances and serialized into provider, deployment, and runtime payloads
  • deployment profile: the named set of active capability-to-provider bindings, plus any binding-level resource selection required by that capability
  • adapter: the capability-specific backend client that talks to a provider
  • resource: the deployment-bound capability resource chosen by a binding, including managed models and provider-native resources such as vector indexes

This control plane lives in the backend and PostgreSQL. It complements the container topology rather than replacing it.
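
As a rough mental model, the terms compose as below (illustrative Python; field names are assumptions, not the backend's persisted schema):

# Illustrative composition of the control-plane terms; every field name
# here is an assumption, not the backend's actual schema.
from dataclasses import dataclass, field

@dataclass
class Provider:
    key: str                 # e.g. "vllm_local"
    capability: str          # e.g. "llm_inference"
    origin: str              # provider_origin: "local" or "cloud"
    config: dict = field(default_factory=dict)  # endpoint/auth settings

@dataclass
class Binding:
    capability: str                 # e.g. "vector_store"
    provider_key: str               # e.g. "weaviate_local"
    resource_id: str | None = None  # managed model or provider-native resource

@dataclass
class DeploymentProfile:
    name: str                       # e.g. "local-default"
    bindings: list[Binding] = field(default_factory=list)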

ModelOps Domain

ModelOps is the managed-model domain layered on top of the GenAI control plane.

  • It owns model catalog records, lifecycle, validation, sharing, and usage.
  • It does not replace capability/provider/deployment selection in /control/platform.
  • A model must be active, validation-current, visible to the caller, and runtime-compatible before it is invokable or eligible for deployment binding as a managed-model resource.

See ModelOps service documentation for the domain model, lifecycle rules, and canonical APIs.
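
Read as a predicate, the invokability rule is roughly the following (hypothetical attribute names, not the ModelOps API):

# Hypothetical sketch of the ModelOps invokability rule; every attribute
# name here is an assumption.
def is_invokable(model, caller, runtime) -> bool:
    return (
        model.status == "active"              # lifecycle: active
        and model.validation_current          # validation has not gone stale
        and model.visible_to(caller)          # sharing/visibility allows the caller
        and runtime.is_compatible(model)      # runtime compatibility holds
    )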

Context Management Domain

Context Management is the managed knowledge-base domain layered beside the GenAI control plane and ModelOps.

  • It owns reusable knowledge-base metadata, document source-of-truth, upload/manual document ingestion, and vector synchronization.
  • It also owns knowledge-base sync diagnostics, operator-triggered rebuilds, and retrieval QA against the active deployment runtime.
  • It now also owns repeatable local_directory knowledge sources and persisted, worker-backed sync runs, so operators can reconcile knowledge-base content from an allowlisted local source root instead of hand-managing every document. Sync runs carry queued/running/ready/error state and progress counters for the control UI (see the sketch after this list).
  • PostgreSQL stores knowledge-base and document records; Weaviate remains the derived serving index for the current v1 implementation.
  • Deployment profiles still decide which vector_store provider is active and which managed knowledge bases are explicitly bound through binding resources plus default_resource_id.
  • Knowledge Chat now resolves only against knowledge bases bound to the active deployment profile rather than a fixed global retrieval index.
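
The sync runs mentioned above might be modeled along these lines (illustrative Python; the actual persistence schema may differ):

# Illustrative sync-run model; field names are assumptions.
import enum
from dataclasses import dataclass

class SyncRunState(enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    READY = "ready"
    ERROR = "error"

@dataclass
class SyncRun:
    knowledge_base_id: str
    source_root: str                          # must sit under the allowlisted root
    state: SyncRunState = SyncRunState.QUEUED
    documents_seen: int = 0                   # progress counters for the control UI
    documents_synced: int = 0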

Current provider proof state (see the sketch after this list):

  • local-default keeps llm_inference -> vllm_local, embeddings -> vllm_embeddings_local, and vector_store -> weaviate_local.
  • When LLAMA_CPP_URL is configured, backend also seeds local-llama-cpp with llm_inference -> llama_cpp_local, embeddings -> vllm_embeddings_local, and vector_store -> weaviate_local.
  • When QDRANT_URL is configured, backend also seeds local-qdrant with llm_inference -> vllm_local, embeddings -> vllm_embeddings_local, and vector_store -> qdrant_local.
  • local-default also binds sandbox_execution -> sandbox_local and mcp_runtime -> mcp_gateway_local.
  • Shared cloud provider families are also available for OpenAI-compatible LLM and embeddings endpoints. OpenAI-compatible cloud provider instances hold endpoint/auth config, including optional modelops://credential/<credential-id> refs to saved ModelOps credentials, while deployment bindings choose explicit managed-model resources.
  • Offline runtime profile enforcement uses persisted provider_origin, not provider-key naming. Cloud providers can be created and listed while offline, but validation, deployment activation, runtime snapshot resolution, and provider dispatch fail closed with offline_provider_blocked before any cloud provider client is created.
  • embeddings bindings now require a managed model with task_key=embeddings; bootstrap profiles intentionally leave that resource slot empty until an operator selects one.
  • vector_store bindings in explicit mode may now reference managed knowledge bases as binding resources; the runtime-facing provider resource remains the provider index name resolved from that knowledge base.
  • Switching deployment profiles changes the active inference and retrieval targets without changing frontend or ModelOps APIs. Tool runtime capabilities remain modeled as optional platform capabilities, but local staging now seeds and binds both the sandbox and MCP runtimes by default; enforcement still happens per execution, and only when an agent references tools that require them.
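
Expressed as data, the seeded bindings and the offline guard look roughly like this (illustrative; the backend's actual seeding code and schema differ):

# Illustrative summary of the seeded profiles described above; the dict
# shape is an assumption, not the backend's representation.
SEEDED_PROFILES = {
    "local-default": {
        "llm_inference": "vllm_local",
        "embeddings": "vllm_embeddings_local",  # managed-model slot left empty
        "vector_store": "weaviate_local",
        "sandbox_execution": "sandbox_local",
        "mcp_runtime": "mcp_gateway_local",
    },
    # Seeded only when LLAMA_CPP_URL is configured:
    "local-llama-cpp": {
        "llm_inference": "llama_cpp_local",
        "embeddings": "vllm_embeddings_local",
        "vector_store": "weaviate_local",
    },
    # Seeded only when QDRANT_URL is configured:
    "local-qdrant": {
        "llm_inference": "vllm_local",
        "embeddings": "vllm_embeddings_local",
        "vector_store": "qdrant_local",
    },
}

def guard_offline_dispatch(provider, offline_mode: bool) -> None:
    # Fail closed on the persisted provider_origin, never on key naming.
    if offline_mode and provider.origin == "cloud":
        raise RuntimeError("offline_provider_blocked")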

Tool Runtime Convergence

Agent tools now use a hybrid split:

  • Tool definitions remain registry entities, referenced by agents via tool_refs.
  • Tool transport runtimes are control-plane capabilities resolved from the active deployment profile.

Current v1 transports:

  • mcp: remote/general-purpose tools executed through the MCP gateway provider.
  • sandbox_http: native Python execution tools executed through the sandbox provider.

Current canonical tools:

  • tool.web_search -> transport: mcp, tool_name: web_search, served by MCP Gateway through local SearXNG; token-free but internet-required
  • tool.python_exec -> transport: sandbox_http, tool_name: python_exec

Tool execution is LLM-driven. The agent engine passes tool definitions to the active OpenAI-compatible llm_inference provider, dispatches returned tool calls through the appropriate runtime provider, appends tool results back into the conversation, and loops for up to three rounds before returning the final answer plus normalized tool_calls metadata.
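
A minimal sketch of that loop, with hypothetical helper names (the agent engine's real implementation differs):

# Hypothetical sketch of the LLM-driven tool loop; `llm.chat`,
# `dispatch_tool`, and the reply object shape are assumptions.
def run_agent_turn(llm, tools, messages, dispatch_tool):
    tool_calls_meta = []
    reply = None
    for _ in range(3):  # up to three rounds
        reply = llm.chat(messages=messages, tools=tools)  # OpenAI-compatible call
        if not reply.tool_calls:
            return reply.content, tool_calls_meta         # final answer
        messages.append({"role": "assistant", "content": None,
                         "tool_calls": reply.tool_calls})
        for call in reply.tool_calls:
            # Route through the transport's runtime provider:
            # mcp for tool.web_search, sandbox_http for tool.python_exec.
            result = dispatch_tool(call)
            tool_calls_meta.append({"tool": call.name, "arguments": call.arguments})
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})
    return reply.content, tool_calls_meta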

Design Principles

  • Keep agent logic in agent_engine/, not in Flask route handlers.
  • Use service abstractions for LLM, vector store, and data access.
  • Preserve sandbox isolation. Do not bypass it from backend/frontend paths.
  • Keep services modular so they can evolve independently.
  • Keep infrastructure provider binding separate from user-facing model governance.

Product AI Domains

The product-facing AI surface now has its own domain split, separate from the control plane and ModelOps:

  • playgrounds
      • Canonical user-facing workspace for both plain chat and knowledge-grounded chat.
      • Frontend entrypoints now live under the AI Playground section at /playgrounds, with dedicated /playgrounds/chat and /playgrounds/knowledge routes.
      • Backend persists one session model with playground_kind, assistant_ref, model_selection, knowledge_binding, and messages (sketched after this list).
      • Public API lives under /v1/playgrounds/*.
  • agent-projects
      • Builder-facing authoring domain for end-user agents and workflow definitions.
      • Publish compiles project specs into catalog-managed runtime artifacts instead of exposing raw registry entities directly.
      • Public API lives under /v1/agent-projects/*.
  • vanessa-core
      • First-party Vanessa behavior is intended to plug into shared execution seams instead of branching generic execution code.
      • Frontend entrypoints now live under the Vanessa AI section at /ai, with Vanessa Core remaining at /ai/vanessa.
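
The playground session model above might look like this in outline (illustrative; the persisted schema is an assumption):

# Illustrative playground session model; field names follow the bullets
# above, but types and defaults are assumptions.
from dataclasses import dataclass, field

@dataclass
class PlaygroundSession:
    playground_kind: str                    # "chat" or "knowledge"
    assistant_ref: str | None = None        # bound assistant, if any
    model_selection: dict | None = None     # explicit model choice, if any
    knowledge_binding: dict | None = None   # knowledge bases for grounded chat
    messages: list[dict] = field(default_factory=list)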

Frontend work now lands under frontend/src/features/*, backend product APIs under backend/app/api/http, and engine execution seams under agent_engine/app/execution_pipeline. Admin builder/catalog work follows the same rule, with builder-facing authoring under frontend/src/features/agent-builder, catalog administration under frontend/src/features/catalog-admin, and the canonical backend HTTP owners under backend/app/api/http/catalog.py, backend/app/api/http/registry.py, and backend/app/api/http/registry_models.py.

Source of Truth

Container responsibilities are defined in AGENTS.md.

Owner: Core platform maintainers. Update cadence: whenever service responsibilities or interfaces change.