Backend (Flask API)¶
The backend is the HTTP entry point for frontend clients and for service orchestration.
Responsibilities¶
- Request validation and error handling
- API endpoints for frontend clients
- Orchestration with agent engine, vector store, and data layer
- Authentication and authorization surface (present and future)
- GenAI control plane for capability/provider/deployment-profile management
- Context Management for reusable knowledge bases and document ingestion
HTTP Ownership¶
- Canonical backend HTTP domains now register from `backend/app/api/http`. `playgrounds`, `agent-projects`, `platform`, `context`, `catalog`, `registry`, `registry_models`, `modelops`, `runtime`, `executions`, `policy`, `quotes`, and `content` are the current domain-owned HTTP modules.
- Legacy `backend/app/routes/*` modules may remain as thin shims for import compatibility, but bootstrap registration should point at `api/http` modules instead of flat route files.
Current Voice Endpoints¶
- `POST /voice/wake-events`
- `GET /voice/health`
Registry and Runtime Endpoints¶
- `POST /v1/registry/models`
- `POST /v1/registry/agents`
- `POST /v1/registry/tools`
- `GET /v1/runtime/profile` (authenticated users; read-only for non-superadmins)
- `PUT /v1/runtime/profile` (superadmin only; global runtime mode)
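A minimal client-side sketch for the runtime-profile endpoints, using only the stdlib. The paths and methods come from this document; the base URL and the `{"mode": ...}` body shape are assumptions, not a confirmed request schema.

```python
# Hedged sketch: build requests for the runtime-profile endpoints above.
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumed local backend address

def build_get_profile(token: str) -> urllib.request.Request:
    # Readable by any authenticated user; read-only for non-superadmins.
    return urllib.request.Request(
        f"{BASE_URL}/v1/runtime/profile",
        headers={"Authorization": f"Bearer {token}"},
    )

def build_set_profile(token: str, mode: str) -> urllib.request.Request:
    # PUT is superadmin-only and switches the global runtime mode.
    return urllib.request.Request(
        f"{BASE_URL}/v1/runtime/profile",
        data=json.dumps({"mode": mode}).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```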
Typed Catalog Endpoints¶
- `GET /v1/catalog/agents` (superadmin)
- `POST /v1/catalog/agents` (superadmin)
- `GET /v1/catalog/agents/{id}` (superadmin)
- `PUT /v1/catalog/agents/{id}` (superadmin)
- `DELETE /v1/catalog/agents/{id}` (owner or superadmin; platform agents are blocked)
- `POST /v1/catalog/agents/{id}/validate` (superadmin)
- `GET /v1/catalog/tools` (superadmin)
- `POST /v1/catalog/tools` (superadmin)
- `GET /v1/catalog/tools/{id}` (superadmin)
- `PUT /v1/catalog/tools/{id}` (superadmin)
- `POST /v1/catalog/tools/{id}/validate` (superadmin)
Catalog orchestration now resolves through the application-layer catalog-management service, while the generic registry surface remains the lower-level runtime artifact store behind those typed catalog DTOs.
Product AI Endpoints¶
Playgrounds¶
- `GET /v1/playgrounds/sessions` (authenticated)
- `POST /v1/playgrounds/sessions` (authenticated)
- `GET /v1/playgrounds/sessions/{id}` (authenticated)
- `PATCH /v1/playgrounds/sessions/{id}` (authenticated)
- `DELETE /v1/playgrounds/sessions/{id}` (authenticated)
- `POST /v1/playgrounds/sessions/{id}/messages` (authenticated)
- `POST /v1/playgrounds/sessions/{id}/messages/stream` (authenticated)
- `GET /v1/playgrounds/options` (authenticated)
Playground semantics:
- `playground_kind=chat` and `playground_kind=knowledge` are variants of the same canonical session model.
- Session payloads carry `assistant_ref`, `model_selection`, `knowledge_binding`, persisted `messages`, and timestamps.
- Knowledge playground sessions are backend-owned and persisted; browser-only local storage is no longer the source of truth for user conversations.
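The canonical session shape described above can be illustrated as follows. Only the top-level field names (`assistant_ref`, `model_selection`, `knowledge_binding`, `messages`, `playground_kind`) come from this document; the nested shapes and example values are assumptions.

```python
# Illustrative session payload; nested shapes are assumptions, not a schema.
session = {
    "id": "sess-123",
    "playground_kind": "knowledge",  # or "chat" — same canonical model
    "assistant_ref": "agent.knowledge_chat",
    "model_selection": {"model_id": "example-model"},        # assumed shape
    "knowledge_binding": {"knowledge_base_id": "kb-42"},     # assumed shape
    "messages": [
        {"role": "user", "content": "What does the handbook say about leave?"},
    ],
    "created_at": "2024-01-01T00:00:00Z",
    "updated_at": "2024-01-01T00:00:00Z",
}

def is_knowledge_session(payload: dict) -> bool:
    # A knowledge session is the same canonical model with a binding attached.
    return (
        payload["playground_kind"] == "knowledge"
        and bool(payload.get("knowledge_binding"))
    )
```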
Agent Projects¶
- `GET /v1/agent-projects` (authenticated)
- `POST /v1/agent-projects` (authenticated)
- `GET /v1/agent-projects/{id}` (authenticated)
- `PUT /v1/agent-projects/{id}` (authenticated)
- `POST /v1/agent-projects/{id}/validate` (authenticated)
- `POST /v1/agent-projects/{id}/publish` (authenticated)
Agent-project semantics:
- This is the builder-facing authoring surface for `workflow_definition`, `tool_policy`, validation, and publish flows.
- Runtime `catalog` entities remain the admin/runtime surface; publish compiles agent projects into catalog-managed artifacts.
- The frontend builder workspace now lives under `/agent-builder`, while superadmin catalog administration remains at `/control/catalog`.
Platform Control Plane¶
- `GET /v1/platform/capabilities` (authenticated)
- `GET /v1/platform/provider-families` (superadmin)
- `GET /v1/platform/providers` (superadmin)
- `POST /v1/platform/providers` (superadmin)
- `PUT /v1/platform/providers/{id}` (superadmin)
- `DELETE /v1/platform/providers/{id}` (superadmin)
- `POST /v1/platform/providers/{id}/loaded-model` (superadmin)
- `DELETE /v1/platform/providers/{id}/loaded-model` (superadmin)
- `GET /v1/platform/deployments` (superadmin)
- `GET /v1/platform/activation-audit` (superadmin)
- `POST /v1/platform/deployments` (superadmin)
- `PATCH /v1/platform/deployments/{id}` (superadmin)
- `PUT /v1/platform/deployments/{id}` (superadmin)
- `PUT /v1/platform/deployments/{id}/bindings/{capability}` (superadmin)
- `POST /v1/platform/deployments/{id}/clone` (superadmin)
- `DELETE /v1/platform/deployments/{id}` (superadmin)
- `POST /v1/platform/deployments/{id}/activate` (superadmin)
- `POST /v1/platform/embeddings` (superadmin)
- `POST /v1/platform/providers/{id}/validate` (superadmin)
- `POST /v1/platform/vector/indexes/ensure` (superadmin)
- `POST /v1/platform/vector/documents/upsert` (superadmin)
- `POST /v1/platform/vector/query` (superadmin)
- `POST /v1/platform/vector/documents/delete` (superadmin)
Platform-control request parsing and response shaping are now owned by the application-layer platform-control service behind the canonical `api/http` module.
Context Management¶
- `GET /v1/context/schema-profiles` (admin)
- `POST /v1/context/schema-profiles` (superadmin)
- `GET /v1/context/vectorization-options` (admin)
- `GET /v1/context/knowledge-bases` (admin)
- `POST /v1/context/knowledge-bases` (superadmin)
- `GET /v1/context/knowledge-bases/{id}` (admin)
- `PUT /v1/context/knowledge-bases/{id}` (superadmin)
- `DELETE /v1/context/knowledge-bases/{id}` (superadmin)
- `POST /v1/context/knowledge-bases/{id}/resync` (superadmin; returns `202` with a queued sync run)
- `POST /v1/context/knowledge-bases/{id}/query` (admin)
- `GET /v1/context/knowledge-bases/{id}/sources` (admin)
- `POST /v1/context/knowledge-bases/{id}/sources` (superadmin)
- `PUT /v1/context/knowledge-bases/{id}/sources/{source_id}` (superadmin)
- `DELETE /v1/context/knowledge-bases/{id}/sources/{source_id}` (superadmin)
- `POST /v1/context/knowledge-bases/{id}/sources/{source_id}/sync` (superadmin; returns `202` with a queued sync run)
- `GET /v1/context/knowledge-bases/{id}/sync-runs` (admin)
- `GET /v1/context/knowledge-bases/{id}/documents` (admin)
- `POST /v1/context/knowledge-bases/{id}/documents` (superadmin)
- `PUT /v1/context/knowledge-bases/{id}/documents/{document_id}` (superadmin)
- `DELETE /v1/context/knowledge-bases/{id}/documents/{document_id}` (superadmin)
- `POST /v1/context/knowledge-bases/{id}/uploads` (superadmin)
Context-management request parsing and response shaping are now owned by the application-layer context-management service behind the canonical `api/http` module.
Context-management semantics:
- Managed knowledge bases are global records, but deployments explicitly bind them as `vector_store` binding resources.
- Each managed knowledge base is created against one configured `vector_store` provider instance.
- Schema authoring is provider-aware: reusable schema profiles are keyed by provider family, with built-in Weaviate templates plus superadmin-created custom profiles.
- Knowledge-base creation also persists a `vectorization` strategy. In the current slice, KBs either use `vanessa_embeddings` with an explicit embeddings provider/model target or `self_provided` for externally supplied vectors.
- Knowledge-base creation also persists a create-time `chunking` strategy. The current supported shape is `chunking.strategy="fixed_length"` with token-based `chunk_length` plus `chunk_overlap`, and this configuration is immutable after KB creation.
- Documents are stored in Postgres, chunked synchronously by backend according to the persisted KB `chunking` config, and upserted into the backing vector provider.
- Token-based chunking resolves the tokenizer from the KB embeddings target: local/HuggingFace-backed providers load a tokenizer from the managed model filesystem path, while `openai_compatible_cloud_embeddings` providers use `tiktoken` with model-aware encoding resolution and a `cl100k_base` fallback.
- For Weaviate-backed KBs, the backend keeps collection creation on `vectorizer: none` and supplies vectors from VANESSA embeddings providers or external uploads rather than native Weaviate module vectorizers.
- Managed `local_directory` knowledge sources live under allowlisted backend-visible roots configured by `CONTEXT_SOURCE_ROOTS`.
- Source sync is deterministic: document identity is derived from `source_id + source_path + logical position`, so re-sync updates or deletes existing source-managed documents instead of duplicating them.
- Source sync runs are persisted with operation type, queued/running/ready/error state, progress fields, file/document counters, and error summaries for operator-facing history.
- Source include/exclude globs use simple Python `fnmatch`-style matching against source-relative paths and do not support brace expansion.
- Current managed ingestion supports `.txt`, `.md`, `.json`, `.jsonl`, and text-extractable `.pdf` files. PDF handling uses `pypdf`, creates one logical document per PDF, and fails clearly for encrypted or scanned/image-only PDFs because OCR is not part of this slice.
- Knowledge-base detail payloads now include sync diagnostics such as `last_sync_at`, `last_sync_error`, `last_sync_summary`, and `eligible_for_binding`.
- Operators can run an asynchronous resync that reconciles active local-directory sources first, then rebuilds one managed knowledge base from the final stored documents.
- Operators can also create/update/delete directory-backed sources and run `Sync now` against one source without rebuilding the whole knowledge base.
- Operators can also run a retrieval test against one managed knowledge base through the active deployment embeddings/vector runtime without going through full Knowledge Chat.
- Deployment editors expose only knowledge bases that are both `active` and `ready`.
- Deployment save semantics are now capability-local. Superadmins may save `embeddings`, `llm_inference`, and `vector_store` bindings independently through `PUT /v1/platform/deployments/{id}/bindings/{capability}`.
- Deployment identity (`slug`, `display_name`, `description`) can now be updated separately through `PATCH /v1/platform/deployments/{id}`.
- Required deployment capabilities still require a selected provider, but model and vector resources may be left empty until the capability is fully configured.
- Deployment binding validation now requires the knowledge base's backing provider instance to exactly match the selected deployment `vector_store` provider instance.
- Cross-capability KB/embeddings compatibility is now surfaced as deployment readiness metadata instead of blocking save. Runtime retrieval and ingestion paths still reject incomplete or mismatched configurations when the capability is actually used.
- Deployment binding and runtime retrieval validation now also require `vanessa_embeddings` KBs to match the deployment `embeddings` provider instance plus its default embeddings resource.
- `self_provided` KBs are intentionally excluded from the current text-ingestion and text-query runtime flows until explicit vector upload/query flows land.
- Knowledge Chat also filters runtime-selectable knowledge bases to `active` + `ready` records at request time, so archived or unhealthy bindings are not silently reused.
- Managed vector binding resources now use `ref_type="knowledge_base"` plus `knowledge_base_id`, while still preserving `provider_resource_id=index_name` for runtime enforcement.
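The `fixed_length` chunking config above can be sketched as follows. Tokenization here uses a plain token list as a stand-in; the real backend resolves a model-specific tokenizer (a HuggingFace tokenizer for local providers, `tiktoken` for `openai_compatible_cloud_embeddings` providers), and the helper name is illustrative.

```python
# Sketch of chunking.strategy="fixed_length": token-based chunk_length plus
# chunk_overlap, as persisted at KB creation time. Tokenizer resolution is
# elided; `tokens` is assumed to already be the tokenized document.
def chunk_fixed_length(
    tokens: list[str], chunk_length: int, chunk_overlap: int
) -> list[list[str]]:
    if chunk_overlap >= chunk_length:
        raise ValueError("chunk_overlap must be smaller than chunk_length")
    step = chunk_length - chunk_overlap  # tokens advanced per chunk
    return [
        tokens[i : i + chunk_length]
        for i in range(0, len(tokens), step)
        if tokens[i : i + chunk_length]
    ]
```

Because the config is immutable after KB creation, re-ingesting a document always reproduces the same chunk boundaries, which keeps the deterministic document identity described above stable across syncs.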
Key terms:
- `capability`: platform function such as `llm_inference`, `embeddings`, `vector_store`, `mcp_runtime`, or `sandbox_execution`
- `provider`: implementation family such as `vllm_local`, `llama_cpp_local`, `openai_compatible_cloud_llm`, `openai_compatible_cloud_embeddings`, `weaviate_local`, `qdrant_local`, `mcp_gateway_local`, or `sandbox_local`
- `provider_origin`: family-owned origin classification, `local` or `cloud`, inherited by provider instances and serialized in provider, deployment-binding, runtime, and active-provider payloads
- `deployment profile`: named set of active capability bindings
- `binding resource`: capability-scoped resource explicitly bound at the deployment-binding layer, such as a ModelOps-managed model or a vector-store index
- `adapter`: capability-specific backend client used by runtime paths
This layer stays separate from user-facing model/provider governance. Model governance decides which models users can access; the platform control plane decides which infrastructure implementation powers a capability.
For shared OpenAI-compatible cloud providers, endpoint/auth stay on the provider instance via `secret_refs`, while the deployment binding chooses the allowed managed-model resources plus one default. Provider secret refs may point at ModelOps saved credentials with `modelops://credential/<credential-id>`; backend resolves those encrypted credentials only for provider validation, deployment preflight, and internal runtime dispatch. Provider origin is not editable per instance; changing locality means choosing a different provider family.
Bootstrap defaults:
- `local-default` is always seeded from `LLM_URL`, `LLM_INFERENCE_RUNTIME_URL`, `LLM_EMBEDDINGS_RUNTIME_URL`, and `WEAVIATE_URL`.
- `local-llama-cpp` is seeded only when `LLAMA_CPP_URL` is configured.
- `local-qdrant` is seeded only when `QDRANT_URL` is configured.
- `sandbox_local` is seeded from `SANDBOX_URL` and bound as optional `sandbox_execution` into local deployment profiles when available.
- `mcp_gateway_local` is seeded from `MCP_GATEWAY_URL` and, in default local staging, is bound into local deployment profiles as `mcp_runtime`.
- OpenAI-compatible cloud provider families are also seeded so superadmins can create shared cloud-backed LLM or embeddings providers without changing backend code. Built-in families seed explicit `provider_origin`; only the OpenAI-compatible cloud LLM and embeddings families are `cloud`.
- The shared OpenAI-compatible LLM adapter now supports both the in-stack normalized LLM gateway and direct llama.cpp OpenAI chat-completions endpoints.
- Model-bearing deployment bindings now require a selected provider, but may be saved temporarily with zero resources and no default until the capability is fully configured.
- Deployment bindings may reference only ModelOps models that are already `active`, `is_validation_current=true`, and `last_validation_status=success`.
- The runtime snapshot now serializes generic binding `resources`, `default_resource_id`, `default_resource`, and `resource_policy` for every capability binding.
- Deployment list/detail responses now include `configuration_status` for both the deployment and each binding, so the UI can show partial or mismatched configuration without inventing its own readiness rules.
- Direct backend inference and agent-engine runtime selection both enforce active-binding membership: requested LLM model ids must be present in the active `llm_inference` binding, and omitted requests fall back to the binding default.
- Runtime-facing provider model ids are resolved per bound managed model. Cloud models resolve through `provider_model_id`; local models resolve by matching the provider `/models` inventory against managed model metadata.
- Local model-bearing providers now also expose one backend-owned loaded-model slot per provider instance. For local `llm_inference` and `embeddings` providers, downloading a model into ModelOps does not make it testable by itself; a superadmin must assign that managed model into the provider slot so the runtime can advertise it through `/v1/models`.
- `POST /v1/platform/providers/{id}/loaded-model` and `DELETE /v1/platform/providers/{id}/loaded-model` are the superadmin control-plane APIs for setting or clearing that local slot intent, and now immediately apply that change to the matching local runtime controller.
- Superadmin-only embeddings and vector proof routes exercise the real `embeddings` and `vector_store` data planes through the active provider bindings without exposing provider-specific payloads.
- Backend also resolves an execution-scoped `platform_runtime` snapshot from the active bindings and sends it to `agent_engine` for real model execution, while keeping the control plane itself backend-owned.
- Offline runtime enforcement is fail-closed for platform providers. When the effective runtime profile is not `online`, backend rejects cloud provider validation, deployment activation, runtime-profile switches to `offline` with active cloud bindings, active runtime resolution, and runtime adapter dispatch with `offline_provider_blocked` and conflict status `409`.
- Backend owns product/public retrieval request shaping, active KB selection, deployment-runtime resolution, and knowledge-chat/source projection. It forwards canonical `input.retrieval` payloads to `agent_engine`, which executes semantic / keyword / hybrid retrieval against the active runtime bindings.
- Canonical backend ↔ agent-engine retrieval semantics are documented in Retrieval Contract.
- Backend also forwards optional `platform_runtime.capabilities.mcp_runtime` and `platform_runtime.capabilities.sandbox_execution` snapshots to support agent tool dispatch without giving `agent_engine` direct platform-table ownership.
- `GET /v1/playgrounds/options` exposes runtime-allowed models, assistants, and deployment-bound knowledge bases for user-facing playground selection.
- `POST /v1/playgrounds/sessions/{id}/messages` resolves the session kind and routes chat or knowledge execution through the same backend-owned playground orchestration layer.
- Superadmins can now manage provider instances and deployment profiles directly from the control-plane API/UI, including clone/delete flows and activation history reads.
- Deployment bindings now serialize the full bound-resource list plus the default resource for UI rendering.
- Deployment activation now performs provider preflight validation before switching and returns a conflict if any bound provider is unreachable or incompatible, but incomplete resource/default configuration is reported through readiness metadata instead of blocking activation.
- Provider validation now includes dry-run execution checks for sandbox providers and invoke-readiness checks for MCP gateway providers.
- Tool definitions remain registry entities. Backend bootstraps `tool.web_search` and `tool.python_exec`, and registry validation constrains tool specs to `transport in {"mcp", "sandbox_http"}` with `connection_profile_ref == "default"` in this first convergence phase. `tool.web_search` stays online-only and reaches SearXNG only through the MCP gateway runtime provider.
- The typed catalog API is now the canonical superadmin management surface for agents and tools. Each catalog create/update writes a new registry version under the hood, so runtime consumers still resolve from the registry while operators work with typed DTOs instead of opaque spec blobs.
- Catalog agents are classified as `platform` or `user`. Platform agents, such as `agent.knowledge_chat`, can be edited or deactivated by publishing a draft version, but they cannot be deleted. User agents can be deleted by their owner or a superadmin.
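The fail-closed offline rule above can be sketched as a single guard on the dispatch path. The vocabulary (`provider_origin`, `offline_provider_blocked`, status `409`) mirrors this document; the exception class and function name are illustrative.

```python
# Hedged sketch: reject cloud-origin provider dispatch unless the effective
# runtime profile is "online". Fail-closed: anything not explicitly online
# blocks cloud providers.
class OfflineProviderBlocked(Exception):
    status = 409  # conflict, as documented
    code = "offline_provider_blocked"

def ensure_dispatch_allowed(runtime_profile: str, provider_origin: str) -> None:
    if provider_origin == "cloud" and runtime_profile != "online":
        raise OfflineProviderBlocked(
            f"cloud provider dispatch blocked in {runtime_profile!r} profile"
        )

# Local providers are never blocked by the runtime profile.
ensure_dispatch_allowed("offline", "local")
```

The same guard shape would apply at each enforcement point listed above (provider validation, activation, profile switches, runtime resolution, adapter dispatch).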
ModelOps Endpoints¶
- `GET /v1/modelops/models`
- `POST /v1/modelops/models`
- `GET /v1/modelops/models/{id}`
- `POST /v1/modelops/models/{id}/register`
- `POST /v1/modelops/models/{id}/validate`
- `GET /v1/modelops/models/{id}/tests`
- `GET /v1/modelops/models/{id}/test-runtimes`
- `POST /v1/modelops/models/{id}/test`
- `POST /v1/modelops/models/{id}/activate`
- `POST /v1/modelops/models/{id}/deactivate`
- `POST /v1/modelops/models/{id}/unregister`
- `DELETE /v1/modelops/models/{id}`
- `GET /v1/modelops/models/{id}/usage`
- `GET /v1/modelops/models/{id}/validations`
- `GET /v1/modelops/credentials`
- `POST /v1/modelops/credentials`
- `DELETE /v1/modelops/credentials/{id}`
- `GET /v1/modelops/catalog`
- `POST /v1/modelops/catalog`
- `GET /v1/modelops/sharing`
- `PUT /v1/modelops/sharing`
- `GET /v1/modelops/discovery/huggingface`
- `GET /v1/modelops/discovery/huggingface/{source_id}`
- `POST /v1/modelops/downloads`
- `GET /v1/modelops/downloads`
- `GET /v1/modelops/downloads/{id}`
- `POST /v1/models/inference`
- `POST /v1/models/generate`

- Superadmins can inspect compatible local runtime providers for a ModelOps test without changing the active deployment profile.
- `GET /v1/modelops/models/{id}/test-runtimes` now reports the provider slot state, the currently loaded managed model, the runtime model id, and structured advertised runtime entries. Local ModelOps tests execute only when the selected runtime is actually serving the chosen managed model.
ModelOps ownership notes:
- Canonical HTTP registration now resolves through `backend/app/api/http/modelops.py` plus focused submodules for models, credentials, access, and local/discovery/download flows.
- Request coercion and orchestration now flow through application services under `backend/app/application/modelops_*_service.py`.
- Legacy `backend/app/routes/modelops*.py` modules are import shims only and should not regain orchestration logic.
Agent Execution Proxy¶
- `POST /v1/agent-executions`
- `GET /v1/agent-executions/{id}`
- Backend forwards to the agent engine's internal contract: `POST /v1/internal/agent-executions` and `GET /v1/internal/agent-executions/{id}`.
- Internal calls include `X-Service-Token` and `X-Request-Id`.
- Config: `AGENT_ENGINE_URL`, `AGENT_ENGINE_SERVICE_TOKEN`, `AGENT_EXECUTION_VIA_ENGINE`, `AGENT_EXECUTION_FALLBACK`.
- `AGENT_EXECUTION_FALLBACK=true` applies only to engine transport failures and returns a deterministic `503 EXEC_UPSTREAM_UNAVAILABLE`; backend does not run local execution.
- Canonical HTTP registration now resolves through `backend/app/api/http/executions.py`.
- Request validation, upstream fallback mapping, and response shaping now flow through `backend/app/application/execution_management_service.py`.
Policy Rule Management¶
- `POST /v1/policy/rules` (superadmin)
- `GET /v1/policy/rules` (superadmin)
- Canonical HTTP registration now resolves through `backend/app/api/http/policy.py`.
- Payload validation and list/create orchestration now flow through `backend/app/application/policy_management_service.py`.
Quote and Content Endpoints¶
- `GET /v1/quotes/summary` (admin)
- `GET /v1/quotes` (admin)
- `GET /v1/quotes/{id}` (admin)
- `POST /v1/quotes` (admin)
- `PUT /v1/quotes/{id}` (admin)
- `GET /v1/content/quote-of-the-day` (public)
- Canonical HTTP registration now resolves through `backend/app/api/http/quotes.py` and `backend/app/api/http/content.py`.
- Quote request parsing, pagination/filter normalization, and error mapping now flow through `backend/app/application/quote_management_service_app.py`.
- `content` remains intentionally thin and continues to delegate quote-of-the-day resolution directly to the quote service.
Canonical service notes: `backend/README.md`.
Execution contract details: `docs/services/agent_execution_contract.md`.
Config Source of Truth¶
- Backend config module: `backend/app/config.py`
  - `get_auth_config()` for auth + DB + service-integration settings.
  - `get_backend_runtime_config()` for runtime-only settings used by health/voice/runtime checks.
- Agent engine config module: `agent_engine/app/config.py`
  - `get_config()` for engine DB/runtime/service-token settings.
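A minimal sketch of the split accessors above, assuming environment-variable-backed settings. The actual contents of `backend/app/config.py` differ; any env var names beyond those documented elsewhere in this page (e.g. `AGENT_ENGINE_URL`, `AGENT_ENGINE_SERVICE_TOKEN`, `LLM_URL`) are assumptions.

```python
# Hedged sketch of the two backend config accessors; settings are illustrative.
import os

def get_auth_config() -> dict:
    # Auth + DB + service-integration settings, grouped in one accessor.
    return {
        "database_url": os.environ.get("DATABASE_URL", ""),          # assumed name
        "agent_engine_url": os.environ.get("AGENT_ENGINE_URL", ""),
        "agent_engine_service_token": os.environ.get(
            "AGENT_ENGINE_SERVICE_TOKEN", ""
        ),
    }

def get_backend_runtime_config() -> dict:
    # Runtime-only settings consumed by health/voice/runtime checks.
    return {"llm_url": os.environ.get("LLM_URL", "")}
```

Keeping the two accessors separate means runtime checks never need to import auth/DB settings, which keeps health probes cheap and side-effect free.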
Owner: Backend maintainers. Update cadence: whenever API routes, contracts, or service integrations change.