AI & Intelligence

17 model providers, multi-agent Senate Committee, Jules auto-fix, and a runtime intelligence layer. Opt-in: nothing is enabled until an admin saves a key.

Overview

The AI subsystem has three layers:

  1. Providers: A uniform interface over 17 model providers. The platform auto-discovers which providers are configured: a single key means "solo" mode, and two or more keys enable the multi-agent modes (Senate Committee, or Code Review Mode when exactly two providers are configured).
  2. Intelligence: Periodic background tasks that scan services, summarize deployments, and emit remediation recommendations. See Intelligence (Runtime).
  3. Jules auto-fix: A specialized agent that opens Pull Requests on failed deployments. Opt-in via JULES_AUTO_DEPLOY_PR.

Supported Providers

The platform ships with adapters for 17 model providers. Each has a dedicated *_API_KEY env var and a default model.

| Provider | Default model |
| --- | --- |
| OpenAI | gpt-4o-mini |
| Grok (xAI) | grok-3-mini |
| Gemini | gemini-2.0-flash |
| Claude (Anthropic) | claude-sonnet-4-20250514 |
| DeepSeek | deepseek-coder |
| OpenRouter | openrouter/auto |
| Groq | llama-3.3-70b-versatile |
| Alibaba (Qwen) | qwen-max |
| Jules (Google) | jules-latest |
| Local LLM (Ollama / vLLM / LM Studio) | local-model |
| SMSLY Cloud | smsly-latest |
| FreeModel.dev | gpt-4o-mini |
| OpenCode API | opencode-latest |
| Mistral (La Plateforme) | mistral-small-latest |
| NVIDIA NIM | nvidia/llama-3.1-nemotron-70b-instruct |
| Cloudflare Workers AI | @cf/meta/llama-3.1-8b-instruct |
| Mock (fallback) | canned responses |

Provider Configuration

Provider configuration is a singleton row in AIProviderSettings (pk=1). It is created automatically on first access via AIProviderSettings.get_solo(). API keys are stored in EncryptedCharField columns (Fernet) and are never returned in API responses; only the configured / unconfigured status is exposed.

There are two ways to configure a provider:

  1. UI: Settings → AI → Providers. Save keys per provider. The UI never displays the saved key (only a "configured" badge).
  2. API: POST /api/v1/ai/providers/update/ (admin only). The body is a partial update of the singleton.

Either path calls _sync_db_to_env(), which writes the keys into the worker process's environment so that the next LLM call picks them up.

The _validate_https_allowlist Gate

The Jules provider's jules_base_url is validated against settings.JULES_ALLOWED_HOSTS (default ['api.jules.google.com']). The validator runs at clean() time and requires both the https:// scheme and a host in the allowlist; any other URL is rejected. This prevents an admin from accidentally pointing Jules at an attacker-controlled endpoint.
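The check described above can be sketched in a few lines. This is a minimal illustration, not the platform's _validate_https_allowlist: the function name, the module-level allowlist, and the use of ValueError (a Django validator would normally raise ValidationError) are assumptions made to keep the example self-contained.

```python
from urllib.parse import urlsplit

# Assumed default, copied from the text above; the real value lives in settings.
JULES_ALLOWED_HOSTS = ["api.jules.google.com"]


def validate_https_allowlist(url, allowed_hosts=JULES_ALLOWED_HOSTS):
    """Return the URL unchanged if it uses https and its host is allowlisted."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("jules_base_url must use https://")
    # hostname is lower-cased and excludes port and userinfo, so a URL such as
    # "https://api.jules.google.com@evil.example" resolves to "evil.example".
    host = parts.hostname or ""
    if host not in allowed_hosts:
        raise ValueError(f"host {host!r} is not in JULES_ALLOWED_HOSTS")
    return url
```

Comparing the parsed hostname, rather than searching the raw string for an allowed host, is what keeps look-alike URLs out.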

Rate Limits

The AI endpoints are throttled to prevent accidental cost overruns:

| Throttle | Rate | Scope |
| --- | --- | --- |
| AIChatRateThrottle | 30/minute | per user, chat endpoints |
| AIAnalysisRateThrottle | 10/minute | per user, analysis endpoints |
| UserAICap | daily cap | per user, all endpoints |

The UserAICap model holds the per-user daily cap. It defaults to:

  • daily_token_cap = 100000 (tokens/day)
  • daily_cost_cap_usd = 10.00 (USD/day)

When the cap is exceeded the API returns HTTP 429 with a reason (Daily token cap exceeded or Daily cost cap exceeded). The Senate Committee applies a 3× multiplier on the cap pre-flight check (SENATE_COMMITTEE_COST_MULTIPLIER).
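To make the cap arithmetic concrete, here is a hedged sketch of a pre-flight check. CapState, preflight, and the idea of passing in an estimated token count and cost are hypothetical; only the default caps, the two reason strings, and the multiplier value come from the text above. The sketch scales the estimate by the multiplier, and the platform may apply the factor differently.

```python
from dataclasses import dataclass

SENATE_COMMITTEE_COST_MULTIPLIER = 3  # value quoted in the text above


@dataclass
class CapState:
    """Hypothetical snapshot of one user's usage for the current day."""
    tokens_used: int = 0
    cost_used_usd: float = 0.0
    daily_token_cap: int = 100_000
    daily_cost_cap_usd: float = 10.00


def preflight(state, est_tokens, est_cost_usd, senate=False):
    """Return (allowed, reason); a False result corresponds to HTTP 429."""
    factor = SENATE_COMMITTEE_COST_MULTIPLIER if senate else 1
    if state.tokens_used + est_tokens * factor > state.daily_token_cap:
        return False, "Daily token cap exceeded"
    if state.cost_used_usd + est_cost_usd * factor > state.daily_cost_cap_usd:
        return False, "Daily cost cap exceeded"
    return True, ""
```

With 90,000 tokens already used, a 5,000-token request passes in solo mode but is refused in Senate mode, because it is counted as 15,000 tokens.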

Senate Committee

When two or more providers are configured and senate_enabled=True, the chat endpoints switch from solo mode to a three-phase deliberation. The committee is capped at senate_max_members (default 5) — only the first N configured providers participate.

Phase 1 — Propose

Every committee member answers the prompt independently and in parallel. Each call has a SENATE_TIMEOUT_SECONDS timeout (default 180s). The parallel pool uses ThreadPoolExecutor(max_workers=len(providers)); on timeout the pool is shut down with cancel_futures=True, which cancels any calls that have not started yet.
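A minimal sketch of the Propose phase, assuming each provider is exposed as a plain callable that takes the prompt and returns text. propose_all and the providers mapping are hypothetical names, and the sketch uses a single shared deadline, which behaves like a per-call timeout here because every call starts at the same moment. It needs Python 3.9 or later for cancel_futures.

```python
from concurrent.futures import ThreadPoolExecutor, wait

SENATE_TIMEOUT_SECONDS = 180  # default quoted in the text above


def propose_all(providers, prompt, timeout=SENATE_TIMEOUT_SECONDS):
    """Call every provider in parallel and return {name: answer}.

    `providers` is a hypothetical mapping of name -> callable(prompt) -> str.
    Providers that fail or miss the deadline are left out of the result.
    """
    pool = ThreadPoolExecutor(max_workers=len(providers))
    futures = {pool.submit(call, prompt): name for name, call in providers.items()}
    done, _not_done = wait(futures, timeout=timeout)
    # cancel_futures only cancels calls that have not started. Calls that are
    # already running finish in their threads, but nobody waits for them.
    pool.shutdown(wait=False, cancel_futures=True)
    results = {}
    for future in done:
        if future.exception() is None:
            results[futures[future]] = future.result()
    return results
```

A thread that is already blocked on a slow provider cannot be interrupted from outside, so the underlying HTTP client should carry its own timeout as well.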

Phase 2 — Review

Each provider receives all other proposals and is asked to review them and vote. Votes are structured statements of the form "I agree with member X because …" or "I disagree with member X because …". This phase is also parallelized.

Phase 3 — Chair

A chair (rotated: the second configured provider by default, falling back to the first) receives all proposals and reviews, then synthesizes a final resolution. If the chair fails (timeout, 5xx, bad JSON), the next configured provider in the list is rotated in as chair and the phase is retried.

The user-facing response is the chair's resolution. The audit log records the full deliberation as metadata.votes and metadata.resolution.
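The chair rotation can be pictured as a loop over the configured providers. In this sketch, run_chair and the synthesize callable are hypothetical, and two behaviors are assumptions rather than documented facts: the rotation wraps around to the start of the list, and each provider is tried at most once.

```python
def run_chair(providers, synthesize, chair_index=1):
    """Try chairs in order, starting at chair_index, until one succeeds.

    `providers` is an ordered list of provider names. `synthesize(name)` is a
    hypothetical callable that returns the resolution, or raises on timeout,
    5xx, or unparseable JSON.
    """
    if not providers:
        raise ValueError("no providers configured")
    # Second provider by default, falling back to the first if there is no second.
    start = chair_index if chair_index < len(providers) else 0
    failed = []
    for offset in range(len(providers)):
        name = providers[(start + offset) % len(providers)]
        try:
            return name, synthesize(name)
        except Exception:  # any chair failure triggers rotation
            failed.append(name)
    raise RuntimeError(f"all chairs failed: {failed}")
```

Returning the chair's name alongside the resolution makes it easy to record which provider actually produced the final answer.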

Code Review Mode

When exactly two providers are configured and mode=auto (the default), the platform uses a lighter two-agent code review instead of the full Senate. The two agents cross-review each other (4 API calls total) and the user receives both reviews. This is cheaper than the Senate and produces results in roughly half the time.
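Taken together, the solo, Code Review, and Senate rules amount to a small decision function. The sketch below is one reading of those rules, not the platform's code: select_mode is a hypothetical name, the behavior when senate_enabled is False is assumed to be solo, and any mode value other than "auto" is assumed to request the full Senate.

```python
def select_mode(configured, senate_enabled, mode="auto", senate_max_members=5):
    """Return (mode_name, participants) for an ordered list of provider names."""
    if not configured:
        raise ValueError("No AI providers configured")
    if len(configured) == 1 or not senate_enabled:
        return "solo", configured[:1]
    if len(configured) == 2 and mode == "auto":
        return "code_review", configured[:2]
    # Only the first N configured providers sit on the committee.
    return "senate", configured[:senate_max_members]
```

For example, seven configured providers with the defaults yield a Senate of the first five.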

Jules Auto-Fix

Jules is a specialized agent for fixing failed deployments. It is opt-in and gated by JULES_AUTO_DEPLOY_PR. The flow:

  1. A deployment fails (status=FAILED).
  2. _collect_failure_context() builds a prompt from the deployment ID, the last 10,000 characters of the build logs, and monitoring context (CPU / memory / OOM events / crash-loop detection) from the ScalingAnalyzer.
  3. The prompt is sent to Jules, which returns structured JSON: {fix_description, files_to_change, suggested_changes}.
  4. The agent clones the repository, creates a branch jules/auto-fix-<deployment-id>, applies the suggested changes, commits, and pushes.
  5. A Pull Request is opened on GitHub (or GitLab / Bitbucket).
  6. If JULES_AUTO_DEPLOY_PR=True, a new deployment is queued on the PR's branch.
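Steps 2 to 4 can be illustrated with three small helpers. All three function names and the prompt layout are hypothetical; only the 10,000-character log tail, the three JSON keys, and the branch naming scheme come from the flow above. The value types inside the JSON are not documented here, so the sketch checks only that the keys are present.

```python
import json

LOG_TAIL_CHARS = 10_000  # "last 10,000 characters of the build logs"
REQUIRED_KEYS = {"fix_description", "files_to_change", "suggested_changes"}


def build_prompt(deployment_id, build_logs, monitoring_context):
    """Assemble the failure context; the exact layout is illustrative."""
    return (
        f"Deployment: {deployment_id}\n"
        f"Monitoring: {monitoring_context}\n"
        f"Build logs (tail):\n{build_logs[-LOG_TAIL_CHARS:]}"
    )


def parse_fix(raw_response):
    """Parse the JSON reply and check that the three documented keys exist."""
    fix = json.loads(raw_response)
    if not isinstance(fix, dict):
        raise ValueError("Jules response is not a JSON object")
    missing = REQUIRED_KEYS - fix.keys()
    if missing:
        raise ValueError(f"Jules response is missing keys: {sorted(missing)}")
    return fix


def branch_name(deployment_id):
    return f"jules/auto-fix-{deployment_id}"
```

Validating the reply before cloning the repository means a malformed response fails early, before any branch is created.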

Caps

To prevent runaway auto-fixes, Jules enforces hard caps per PR:

  • MAX_FILES_PER_JULES_PR = 5 — at most 5 files per PR.
  • MAX_BYTES_PER_JULES_PR = 50_000 — at most 50 KB of diff per PR.

If the suggested fix exceeds either cap, the agent truncates the diff and writes a comment on the PR noting the truncation. The PR is still opened; the user can review and complete the fix manually.
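A sketch of how the two caps might be applied together. apply_caps and the (path, diff_text) shape are hypothetical, and truncating at whole-file granularity is an assumption; the platform could equally cut inside a file's diff.

```python
MAX_FILES_PER_JULES_PR = 5
MAX_BYTES_PER_JULES_PR = 50_000


def apply_caps(changes):
    """Keep changes in order until either cap would be exceeded.

    `changes` is a hypothetical list of (path, diff_text) pairs. Returns
    (kept, truncated), where `truncated` is True if anything was dropped.
    """
    kept, total_bytes = [], 0
    for path, diff in changes:
        size = len(diff.encode("utf-8"))
        too_many_files = len(kept) >= MAX_FILES_PER_JULES_PR
        too_many_bytes = total_bytes + size > MAX_BYTES_PER_JULES_PR
        if too_many_files or too_many_bytes:
            return kept, True
        kept.append((path, diff))
        total_bytes += size
    return kept, False
```

The truncated flag is what would drive the PR comment noting that the fix is incomplete.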

Failure Handling

Every external call (Jules API, GitHub API, git push) is wrapped in backoff.on_exception(backoff.expo, Exception, max_tries=5, factor=2). If the auto-fix fails at any step, the task logs the error and returns a structured FixResult(success=False, error=...) payload — it never crashes the Celery worker.
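The "retry, then report instead of raising" pattern can be sketched with the standard library alone. This is not the platform's code: it replaces the backoff decorator with a hand-written loop, omits the random jitter that backoff applies by default, and assumes FixResult has only the two fields shown in the text. with_retries and run_auto_fix are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixResult:
    success: bool
    error: Optional[str] = None


def with_retries(step, max_tries=5, factor=2, sleep=time.sleep):
    """Call step() up to max_tries times, waiting factor * 2**n seconds between tries."""
    for attempt in range(max_tries):
        try:
            return step()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller turn this into a FixResult
            sleep(factor * 2 ** attempt)


def run_auto_fix(steps, sleep=time.sleep):
    """Run each step with retries; report failure instead of raising."""
    try:
        for step in steps:
            with_retries(step, sleep=sleep)
    except Exception as exc:
        return FixResult(success=False, error=f"{type(exc).__name__}: {exc}")
    return FixResult(success=True)
```

With max_tries=5 and factor=2 the waits between attempts are 2, 4, 8, and 16 seconds, so a step that never succeeds gives up after about 30 seconds of waiting.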

The history of auto-fix attempts is exposed via GET /api/v1/jules/history/{service_id}/.

Prompt Injection Policy

The AI subsystem is hardened against prompt injection in three ways:

  1. Server-side system prompts only. User input is concatenated into the user message; the system prompt is constructed in code and cannot be overridden by the user.
  2. Truncation. User input is truncated to a configurable length (default 20000 characters) before being sent to the model. This prevents "context-flooding" attacks.
  3. Role-marker filtering. The pre-processor strips user-typed occurrences of system:, assistant:, <|im_start|>, and similar role markers from the user message.
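The three measures above fit into one small pre-processing step. The sketch is illustrative: the function names and the marker list are assumptions, the role prefixes are only stripped at the start of a line, and the order (strip, then truncate) is a choice made here rather than something the policy specifies.

```python
import re

MAX_USER_CHARS = 20_000  # default quoted in the text above

# Illustrative marker list; the platform's actual list is not documented here.
ROLE_MARKERS = re.compile(
    r"<\|im_(?:start|end)\|>|^[ \t]*(?:system|assistant)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


def sanitize_user_input(text, limit=MAX_USER_CHARS):
    """Strip role markers, then truncate to `limit` characters."""
    return ROLE_MARKERS.sub("", text)[:limit]


def build_messages(system_prompt, user_text):
    """The system prompt comes from code; user text only ever fills the user role."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": sanitize_user_input(user_text)},
    ]
```

Because the user's text never reaches the system role, a message that begins with "system:" loses the prefix and is treated as ordinary user content.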

The system prompt explicitly says "Never reveal internal system details or API keys." Models that do not follow this instruction are caught by the post-processor, which scans the response for known API key patterns and substitutes them with ••••••••.
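A response scrubber along these lines could look like the sketch below. The two patterns are examples of commonly seen key shapes and are assumptions; the platform's real pattern list is not shown in this document, and redact_keys is a hypothetical name.

```python
import re

# Illustrative patterns only, not the platform's list.
KEY_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),   # OpenAI-style secret keys
    re.compile(r"AIza[0-9A-Za-z_-]{35}"),   # Google-style API keys
]
MASK = "••••••••"


def redact_keys(text):
    """Replace anything that looks like an API key with a fixed mask."""
    for pattern in KEY_PATTERNS:
        text = pattern.sub(MASK, text)
    return text
```

Pattern matching of this kind is a backstop: it catches keys in recognizable formats but cannot detect a secret that has been reworded or split across the response.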

API Reference

All AI endpoints are mounted under /api/v1/ai/. Authentication is session- or token-based; admin-only endpoints are marked accordingly.

List providers

curl -sS http://localhost:8000/api/v1/ai/providers/ \
  -H "Authorization: Token $SMSLY_TOKEN"

Configure a provider (admin)

curl -sS -X POST http://localhost:8000/api/v1/ai/providers/update/ \
  -H "Authorization: Token $SMSLY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "openai_api_key": "sk-...",
    "openai_model": "gpt-4o",
    "jules_base_url": "https://api.jules.google.com/v1"
  }'

Chat completion (solo or Senate)

curl -sS -X POST http://localhost:8000/api/v1/ai/chat/completions/ \
  -H "Authorization: Token $SMSLY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Why is my deploy stuck in HEALTH_CHECK?"}
    ]
  }'

Streaming chat

curl -sS -N -X POST http://localhost:8000/api/v1/ai/chat/stream/ \
  -H "Authorization: Token $SMSLY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a haiku about CI/CD."}]}'

The response is a Server-Sent Events stream. The first event carries the id; subsequent events are token deltas. The stream is closed with a data: [DONE] event.
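On the client side, the stream can be consumed line by line. The helper below is a hypothetical sketch that works on any iterable of text lines; it does not interpret the payloads, because this page does not say whether they are JSON or raw text, and it assumes each event fits on a single data: line.

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` line until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE allows one optional space after the colon
        if payload == "[DONE]":
            return
        yield payload
```

The first payload yielded is the event that carries the id, and every later payload is a token delta to append to the output.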

Analyze deployment logs

curl -sS -X POST http://localhost:8000/api/v1/ai/analyze_logs/ \
  -H "Authorization: Token $SMSLY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"deployment_id": "2d3e4f5a-6b7c-8d9e-0f1a-2b3c4d5e6f7a", "mode": "hybrid"}'

Returns detected patterns (CRASH_LOOP, OOM_KILLED, DB_CONNECTION_TIMEOUT, etc.) and a confidence score per pattern.

Jules history for a service

curl -sS http://localhost:8000/api/v1/jules/history/9c8b4b1a-7d1c-4a2b-9a55-2e8c3d4f9b21/ \
  -H "Authorization: Token $SMSLY_TOKEN"

Full API reference

See docs/ai.md for every endpoint, request body, response field, and error code — including ai/test/, ai/cost-estimate/, and the include_balance=true query parameter.

Security

Encrypted API Keys at Rest

All 17 provider API keys are EncryptedCharField(max_length=500) columns in AIProviderSettings. The encryption is Fernet (symmetric, AES-128-CBC + HMAC-SHA256) using the platform's BACKUP_ENCRYPTION_KEY (or a separate AI_ENCRYPTION_KEY if set) as the master key.

Key rotation: there is no in-place re-key tool. The recommended path is to update the key in .env, restart, and then re-save each provider key through the UI so that the rows are re-encrypted with the new key.

Per-User Spend Caps

UserAICap is a one-to-one table on User. Defaults are daily_token_cap=100000 and daily_cost_cap_usd=10.00. To raise the cap for a specific user, edit the row directly or call UserAICap.objects.update_or_create(user=…, defaults={…}) in a Django shell.

The cap is recomputed on every LLM call. It is a daily limit, not a per-second or per-minute one; the only short-window throttling comes from the DRF throttle classes listed under Rate Limits.

Senate Committee Cost Multiplier

The Senate Committee pre-flight check divides the user's cap by 3 (SENATE_COMMITTEE_COST_MULTIPLIER). This is a conservative guard: a typical Senate call uses 3× the tokens of a solo call (one propose + one review + one chair), so checking the request against one third of the cap roughly accounts for what the call will consume.

Troubleshooting

"No AI providers configured. Add an API key in Settings > AI."

None of the 17 provider keys is set. Open Settings → AI → Providers and save at least one. In production the platform does not fall back to mock mode automatically.

"Provider X failed: 401 Unauthorized"

The API key is invalid or has been rotated. Re-save the key in Settings → AI. The platform's _sync_db_to_env() runs on every save and writes the new key into the worker environment.

"Provider X failed: 429 Too Many Requests"

The provider is rate-limiting the platform. The default retry is 3 attempts with exponential back-off (retry_429(max_retries=3, base_delay=2.0)). After 3 failures the call is recorded in LLMUsage with zero tokens and the user gets a 502.

"Daily cost cap exceeded"

The user has hit their UserAICap.daily_cost_cap_usd. Either wait until tomorrow or raise the cap in a Django shell.

"Jules auto-fix did not create a PR"

Inspect GET /api/v1/jules/history/{service_id}/ for the failure reason. The most common cause is that the suggested fix exceeded MAX_FILES_PER_JULES_PR (5) or MAX_BYTES_PER_JULES_PR (50000).

"Provider 'jules' not in JULES_ALLOWED_HOSTS"

The platform's JULES_ALLOWED_HOSTS setting is missing the host portion of jules_base_url. Default is ['api.jules.google.com']. If you self-host Jules, add the host to the allowlist in .env:

JULES_ALLOWED_HOSTS=api.jules.google.com,jules.internal.example.com

Streaming cuts off after the first chunk

The platform's reverse proxy (Traefik) has a 60s idle timeout by default. Long streams (Senate committees with 5 members) may exceed this. Raise the timeout in traefik_dynamic.yml (transport.respondingTimeouts.idleTimeout).