Overview
A deployment is one attempt to promote a new revision of a service. Each deployment has a single status that advances through a fixed set of states. Deployments are asynchronous: the API returns the new record immediately and a Celery worker drives it through the pipeline.
Common reasons to use deployments:
- Ship a new commit to a running service.
- Roll back a broken release to the last ACTIVE revision.
- Wire a Git provider to deploy on every push.
- Promote a tagged release to production.
- Re-run the pipeline after a settings change, env var update, or build-config tweak.
Deployments always run in the context of a Service. A service has a deploy_type (GIT, DOCKER, UPLOAD, TEMPLATE, or FUNCTION) that determines how the pipeline is wired.
Deployment Types
| deploy_type | Source of truth | When to use |
|---|---|---|
| GIT | A Git repository (GitHub, GitLab, Bitbucket) reachable from the build agent. | The common case: your application lives in a Git repo. |
| DOCKER | A pre-built image reference (e.g. ghcr.io/org/app:abc1234). | You build images elsewhere (CI, local Docker) and want Grid to host them. |
| UPLOAD | A source tarball uploaded through the API. | One-off deploys, prototypes, environments without a Git provider. |
| TEMPLATE | A one-click template from the Grid catalog. | Spinning up Postgres + Redis + app stacks with a few clicks. |
| FUNCTION | Inline source code stored on the Service row. | See Functions for the serverless workflow. |
Build Phases
A GIT deployment passes through seven observable phases. The phase name is the pipeline_stages entry, and the deployment's status reflects the dominant phase.
QUEUED → REVIEW → BUILDING → PUSH → DEPLOYING → HEALTH_CHECK → ACTIVE
│ │ │ │
└─ BUILD_FAILED PUSH_FAILED DEPLOY_FAILED HEALTH_FAILED → FAILED
A failure in BUILDING, PUSH, DEPLOYING, or HEALTH_CHECK moves the deployment into the matching BUILD_FAILED, PUSH_FAILED, DEPLOY_FAILED, or HEALTH_FAILED branch, each of which resolves to FAILED.
- Clone: shallow git fetch --depth=1 to the commit hash, into build_<deployment_id>_*.
- Analyze: reads package.json, pyproject.toml, requirements.txt, Dockerfile, and nixpacks.toml. The output is Deployment.review_summary. Fresh GIT deploys pause at REVIEW.
- Build: the chosen buildpack (NIXPACKS, DOCKER, or STATIC) produces a container image.
- Push: on multi-node fleets, the image is pushed to the local insecure registry on MASTER_MESH_IP:5000. On single-node installs, the image is loaded into the local Docker daemon.
- Deploy: a new container is started. The strategy (ROLLING, BLUE_GREEN, or CANARY) is set on the service.
- Health check: Traefik sends GET <health_check_path> every health_check_interval (default 30s).
- Active: the new container is now serving traffic. All other ACTIVE deployments for the same service are demoted to INACTIVE.
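The phase diagram can be read as a small state machine. The sketch below encodes it under stated assumptions. PHASES, FAILURE_BRANCH, next_phase, and final_status are hypothetical names, not Grid's internals. It also assumes that a phase without a dedicated failure branch (for example REVIEW) fails straight to FAILED.

```python
# Hypothetical sketch of the GIT phase progression; names are illustrative.
from typing import Dict, List

# Happy-path order of the phases in the diagram.
PHASES: List[str] = [
    "QUEUED", "REVIEW", "BUILDING", "PUSH", "DEPLOYING", "HEALTH_CHECK", "ACTIVE",
]

# The four phases that have their own failure branch in the diagram.
FAILURE_BRANCH: Dict[str, str] = {
    "BUILDING": "BUILD_FAILED",
    "PUSH": "PUSH_FAILED",
    "DEPLOYING": "DEPLOY_FAILED",
    "HEALTH_CHECK": "HEALTH_FAILED",
}


def next_phase(current: str, succeeded: bool) -> str:
    """Return the phase that follows `current` for a given step outcome."""
    if current not in PHASES or current == "ACTIVE":
        raise ValueError(f"no transition defined from {current!r}")
    if succeeded:
        return PHASES[PHASES.index(current) + 1]
    # Assumption: phases without a dedicated branch fail straight to FAILED.
    return FAILURE_BRANCH.get(current, "FAILED")


def final_status(phase: str) -> str:
    """Every *_FAILED branch resolves to FAILED; other phases pass through."""
    return "FAILED" if phase.endswith("_FAILED") else phase


if __name__ == "__main__":
    phase = "QUEUED"
    while phase != "ACTIVE":
        phase = next_phase(phase, succeeded=True)
        print(phase)
    print(final_status(next_phase("PUSH", succeeded=False)))
```

The failure table is kept separate from the happy-path list so that adding a new phase does not force a failure branch to exist for it.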
Status Reference
Every deployment carries a single status value. The table below lists every defined status.
| Status | Phase | Terminal? |
|---|---|---|
| QUEUED | initial | No |
| REVIEW | analyze | No |
| BUILDING | build | No |
| BUILD_FAILED | build | Yes |
| AWAITING_APPROVAL | review | No |
| BACKUP_RUNNING | pre-deploy | No |
| BACKUP_FAILED | pre-deploy | No |
| MIGRATION_PLANNING | pre-deploy | No |
| MIGRATION_RUNNING | pre-deploy | No |
| MIGRATION_FAILED | pre-deploy | No |
| DEPLOYING | deploy | No |
| HEALTH_CHECK | health | No |
| ACTIVE | success | Yes (lifecycle) |
| INACTIVE | post-success | Yes (lifecycle) |
| FAILED | any | Yes |
| CANCELLED | any | Yes |
| ROLLING_BACK | any | No |
| ROLLED_BACK | terminal | Yes |
BUILDING, DEPLOYING, HEALTH_CHECK, BACKUP_RUNNING, MIGRATION_RUNNING, and ROLLING_BACK are the in-progress ("active") statuses. Do not confuse these with the ACTIVE status. A service can have only one deployment in an in-progress status at a time. Creating a second one returns HTTP 409, and the existing deployment appears in the existing_deployment field of the response body.
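The one-in-flight rule can be sketched as a guard over a service's existing deployments. This is a hypothetical client-side illustration; Grid enforces the real check on the server. The guard uses exactly the six statuses named above. DeploymentRow, conflict_body, and the "detail" key are assumptions. The message text follows the wording shown under Troubleshooting.

```python
# Hypothetical guard mirroring the one-in-flight-deployment rule.
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# The in-progress statuses named in the Status Reference.
IN_PROGRESS = frozenset({
    "BUILDING", "DEPLOYING", "HEALTH_CHECK",
    "BACKUP_RUNNING", "MIGRATION_RUNNING", "ROLLING_BACK",
})


@dataclass
class DeploymentRow:
    id: str
    status: str


def conflict_body(existing: Iterable[DeploymentRow]) -> Optional[Dict[str, object]]:
    """Return a 409-style body if a deployment is in flight, else None."""
    for dep in existing:
        if dep.status in IN_PROGRESS:
            return {
                "detail": f"Deployment already in progress (status: {dep.status})",
                "existing_deployment": {"id": dep.id, "status": dep.status},
            }
    return None
```

An ACTIVE deployment does not block a new one, because ACTIVE is a lifecycle status rather than an in-progress one.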
API Reference
All endpoints are mounted under /api/v1/. Authentication is session- or token-based for user endpoints and HMAC-signed for node-to-node traffic.
Trigger a deployment
curl -sS -X POST http://localhost:8000/api/v1/deployments/trigger/ \
-H "Authorization: Token $SMSLY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"service_id": "9c8b4b1a-7d1c-4a2b-9a55-2e8c3d4f9b21",
"provider_id": "f1c2b0c1-1234-5678-9abc-def012345678",
"commit_hash": "abc1234"
}'
Returns HTTP 201 with the new deployment record and status=QUEUED.
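For scripts that cannot shell out to curl, the same trigger call can be built with the Python standard library. The sketch below only constructs the request, which keeps it testable offline; passing it to urllib.request.urlopen would send it. The helper name build_trigger_request is hypothetical.

```python
# Build (but do not send) the trigger request using only the stdlib.
import json
import os
import urllib.request


def build_trigger_request(base_url: str, token: str, service_id: str,
                          provider_id: str, commit_hash: str) -> urllib.request.Request:
    """Return a POST request equivalent to the curl example."""
    payload = {
        "service_id": service_id,
        "provider_id": provider_id,
        "commit_hash": commit_hash,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/deployments/trigger/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_trigger_request(
        "http://localhost:8000",
        os.environ.get("SMSLY_TOKEN", ""),
        "9c8b4b1a-7d1c-4a2b-9a55-2e8c3d4f9b21",
        "f1c2b0c1-1234-5678-9abc-def012345678",
        "abc1234",
    )
    # urllib.request.urlopen(req) would send it; expect HTTP 201 with status=QUEUED.
    print(req.get_method(), req.full_url)
```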
Cancel a deployment
curl -sS -X POST \
http://localhost:8000/api/v1/deployments/2d3e4f5a-6b7c-8d9e-0f1a-2b3c4d5e6f7a/cancel/ \
-H "Authorization: Token $SMSLY_TOKEN"
Allowed only when the deployment is in QUEUED, REVIEW, BUILDING, or AWAITING_APPROVAL.
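A client can skip a doomed round trip by checking the cancel boundary locally first. The server remains authoritative. The sketch below encodes the four cancellable statuses listed above. cancel_error is a hypothetical helper, and its message copies the wording shown under Troubleshooting.

```python
# Client-side pre-check mirroring the documented cancel boundary.
from typing import Optional

CANCELLABLE = frozenset({"QUEUED", "REVIEW", "BUILDING", "AWAITING_APPROVAL"})


def cancel_error(status: str) -> Optional[str]:
    """Return None when cancel is allowed, else an explanatory message."""
    if status in CANCELLABLE:
        return None
    return f"Cannot cancel deployment in {status} status"
```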
Approve a paused deployment
curl -sS -X POST \
http://localhost:8000/api/v1/deployments/2d3e4f5a-6b7c-8d9e-0f1a-2b3c4d5e6f7a/approve/ \
-H "Authorization: Token $SMSLY_TOKEN"
Roll back a deployment
curl -sS -X POST \
http://localhost:8000/api/v1/deployments/2d3e4f5a-6b7c-8d9e-0f1a-2b3c4d5e6f7a/rollback/ \
-H "Authorization: Token $SMSLY_TOKEN" \
-H "Content-Type: application/json" \
-d '{"confirm": "true"}'
The "confirm": "true" field guards against accidental rollbacks. The endpoint creates a new deployment row with is_rollback=True.
One-click rollback
curl -sS -X POST \
http://localhost:8000/api/v1/services/9c8b4b1a-7d1c-4a2b-9a55-2e8c3d4f9b21/instant-rollback/ \
-H "Authorization: Token $SMSLY_TOKEN" \
-H "Content-Type: application/json" \
-d '{"message": "5xx spike after deploy"}'
Looks up the most recent ACTIVE deployment and rolls back to it, so the caller does not need to know the deployment ID.
Full API reference
See docs/deployments.md in the repository for every endpoint, request body, response field, and error code — including /api/v1/deployments/{id}/rollback/, instant-rollback/, and the multi-server deploy/ / multi-deploy/ actions.
Webhook Setup
Grid accepts webhooks from GitHub, GitLab, and Bitbucket. Each delivery creates a deployment for the matching service, and the webhook handler is idempotent: a WebhookDelivery row is keyed on the provider's delivery_id, so duplicate deliveries are dropped.
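The idempotency rule (one processed delivery per provider delivery_id) can be sketched in memory. GitHub, for example, sends its delivery ID in the X-GitHub-Delivery header. WebhookInbox and the "processed" status are hypothetical. A real system would rely on a unique constraint on the WebhookDelivery table, so that two workers cannot both process the same delivery. The "ignored" status for duplicates follows the Troubleshooting section.

```python
# In-memory sketch of delivery_id deduplication (hypothetical names).
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class WebhookInbox:
    trigger_deploy: Callable[[dict], None]
    seen: Set[str] = field(default_factory=set)
    # Every delivery is logged, duplicates included, as (delivery_id, status).
    log: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, delivery_id: str, payload: dict) -> str:
        if delivery_id in self.seen:
            status = "ignored"  # duplicate: recorded but no new deployment
        else:
            self.seen.add(delivery_id)
            self.trigger_deploy(payload)
            status = "processed"
        self.log.append((delivery_id, status))
        return status
```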
GitHub
- In your repo, go to Settings → Webhooks → Add webhook.
- Set Payload URL to https://<your-grid-host>/api/v1/webhooks/github/.
- Set Content type to application/json.
- Set Secret to the same value as GITHUB_WEBHOOK_SECRET in the Grid .env.
- Choose "Let me select individual events" and enable Push and Pull request.
- Save. Push to the configured branch to fire a deployment.
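GitHub signs each delivery with the webhook secret. The X-Hub-Signature-256 header carries "sha256=" followed by the hex HMAC-SHA256 of the raw request body. The sketch below shows one way a receiver can verify that header. verify_github_signature is a hypothetical name, and this document does not show how Grid's handler performs the check.

```python
# Verify GitHub's X-Hub-Signature-256 header against the raw body.
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True only if the header matches HMAC-SHA256(secret, body)."""
    if not secret or not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}".encode("ascii")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, signature_header.encode("utf-8"))
```

Hash the raw bytes exactly as received. Re-serializing parsed JSON can change whitespace or key order and break the match.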
GitLab
- In the project, go to Settings → Webhooks.
- URL: https://<your-grid-host>/api/v1/webhooks/gitlab/.
- Trigger: Push events and Merge request events.
- Set the Secret token to the value of GITLAB_WEBHOOK_SECRET.
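Unlike GitHub, GitLab does not sign the body. It sends the configured secret token verbatim in the X-Gitlab-Token header, so verification is a constant-time string comparison. The helper below is a hypothetical sketch. It assumes a plain dict of headers, whereas web frameworks such as Django expose a case-insensitive header mapping.

```python
# Check GitLab's X-Gitlab-Token header (the secret is sent as-is).
import hmac
from typing import Mapping


def verify_gitlab_token(expected_secret: str, headers: Mapping[str, str]) -> bool:
    """Return True if the delivery carries the configured secret token."""
    if not expected_secret:
        return False  # never accept deliveries when no secret is configured
    received = headers.get("X-Gitlab-Token", "")
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))
```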
Bitbucket
- Repository settings → Webhooks → Add webhook.
- URL: https://<your-grid-host>/api/v1/webhooks/bitbucket/.
- Triggers: Repo: push and Pull request: created / updated.
Buildpacks
A service's buildpack field selects the build strategy. The default is NIXPACKS.
| Buildpack | Behavior |
|---|---|
| NIXPACKS | Detects the language and emits a multi-stage Dockerfile. Supports Node, Python, Go, Ruby, Rust, Java, PHP, Elixir, Deno, Bun. |
| DOCKER | Uses the Dockerfile at the service's root_directory (default /). |
| STATIC | Serves the directory as a static site. Traefik routes / to a small nginx container. |
Environment Variables
Service.env_vars is a list of (key, value, is_secret, is_locked, source) rows. Values are stored in an EncryptedCharField and decrypted at deploy time.
Precedence
The final env on the new container is the union of these sources, in this order (later overrides earlier):
- Platform defaults: PORT, SMSLY_API_KEY, SMSLY_PUBLIC_DOMAIN.
- Addon auto-injection: source=ADDON.
- Shortcode resolution: source=SHORTCODE, for example {{pg.MAIN.DATABASE_URL}}.
- System auto-injection: source=SYSTEM. Includes DEPLOYMENT_ID, COMMIT_HASH, BRANCH, SERVICE_NAME.
- User-defined: source=USER. Highest precedence.
If a user-defined row is marked is_locked=True, it cannot be overridden by any auto-injection step.
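The precedence list amounts to applying layers from lowest to highest, with later writes winning. The sketch below illustrates that order and the lock rule. Several parts are assumptions: the "PLATFORM" source label for defaults, the EnvRow/build_env names, and the rule that {{...}} tokens are expanded through a flat lookup table wherever they appear. Unknown tokens are left untouched.

```python
# Hypothetical sketch of the documented env-var precedence order.
import re
from dataclasses import dataclass
from typing import Dict, List

# Lowest to highest precedence; "PLATFORM" is an assumed label for defaults.
LAYER_ORDER = ["PLATFORM", "ADDON", "SHORTCODE", "SYSTEM", "USER"]

SHORTCODE_RE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


@dataclass
class EnvRow:
    key: str
    value: str
    source: str
    is_locked: bool = False


def resolve_shortcodes(value: str, lookup: Dict[str, str]) -> str:
    """Expand {{pg.MAIN.DATABASE_URL}}-style tokens; keep unknown ones as-is."""
    return SHORTCODE_RE.sub(lambda m: lookup.get(m.group(1), m.group(0)), value)


def build_env(rows: List[EnvRow], shortcodes: Dict[str, str]) -> Dict[str, str]:
    locked = {r.key for r in rows if r.source == "USER" and r.is_locked}
    env: Dict[str, str] = {}
    for layer in LAYER_ORDER:
        for row in rows:
            if row.source != layer:
                continue
            if layer != "USER" and row.key in locked:
                continue  # auto-injection never writes a user-locked key
            env[row.key] = resolve_shortcodes(row.value, shortcodes)
    return env
```

The lock check sits where auto-injection writes values, which is where the rule above applies.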
Health Checks and Auto-Restart
Each service has its own health check config:
- health_check_path (default /health)
- health_check_port (blank = auto-detect from the PORT env var)
- health_check_interval (default 30s)
- health_check_timeout (default 300s)
- health_check_retries (default 90)
- auto_restart (default True)
- restart_policy (always, unless-stopped, on-failure, or no)
Containers can also push their own health status via the Service Health Webhook:
curl -X POST https://<your-grid-host>/api/v1/services/<service-id>/health/webhook/ \
-H "X-Health-Webhook-Token: <service.health_webhook_token>" \
-H "Content-Type: application/json" \
-d '{"status": "healthy", "details": {"db": "ok", "cache": "ok"}}'
Accepted status values: healthy, unhealthy, starting, needs_manual_intervention.
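A container reporting its own health can validate the status locally before posting, so typos fail early instead of at the API. The sketch below only serializes the body. The request itself would still carry the X-Health-Webhook-Token header shown above. health_payload is a hypothetical helper, and omitting "details" when none are given is an assumption.

```python
# Serialize a Service Health Webhook body, rejecting unknown statuses.
import json
from typing import Dict, Optional

# Status values accepted by the Service Health Webhook.
ALLOWED_STATUSES = frozenset({"healthy", "unhealthy", "starting", "needs_manual_intervention"})


def health_payload(status: str, details: Optional[Dict[str, str]] = None) -> bytes:
    """Return the JSON body for a health report."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported health status: {status!r}")
    body: Dict[str, object] = {"status": status}
    if details is not None:
        body["details"] = details  # free-form per-dependency diagnostics
    return json.dumps(body).encode("utf-8")
```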
Autoscaler Interaction
The autoscaler can mutate Service.min_replicas while a deploy is in flight. To prevent the deploy's container plan from drifting, the platform snapshots min_replicas onto the deployment row at queue time as Deployment.queued_min_replicas. The deploy executor uses this snapshot to decide how many containers to bring up at deploy time, not the live min_replicas field.
This means:
- If a user triggers a deploy and the autoscaler is concurrently scaling up, the new deploy starts with the smaller count and the autoscaler brings the extra replicas online a few seconds later.
- If the autoscaler is concurrently scaling down, the new deploy starts with the larger count and the autoscaler schedules a scale-down after its cooldown elapses.
See Autoscaling for the full replica controller design.
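The snapshot rule comes down to copying min_replicas when the deployment is queued and reading only the copy when it runs. The model below is a hypothetical illustration. Only the field names min_replicas and queued_min_replicas come from the text above.

```python
# Hypothetical model of the queued_min_replicas snapshot.
from dataclasses import dataclass


@dataclass
class Service:
    min_replicas: int


@dataclass
class Deployment:
    service: Service
    queued_min_replicas: int


def queue_deployment(service: Service) -> Deployment:
    # Copy the value now: later autoscaler writes to service.min_replicas
    # do not change this deployment's container plan.
    return Deployment(service=service, queued_min_replicas=service.min_replicas)


def replicas_to_start(deployment: Deployment) -> int:
    # The executor reads the snapshot, never the live field.
    return deployment.queued_min_replicas
```

Any gap between the snapshot and the live value is left to the autoscaler, as described in the list above.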
Security
Deployment Throttles
The DeploymentViewSet is gated by two DRF throttles:
- BurstRateThrottle: 3/minute per user. Prevents rapid-fire re-triggers.
- DeploymentRateThrottle: 10/hour per user. Prevents resource exhaustion from excessive builds.
Both return HTTP 429 with a Retry-After header.
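Two stacked limits like these behave like per-user sliding windows. DRF's built-in throttles keep a similar per-user timestamp history in the configured cache. The sketch below is a standalone illustration, not DRF's API; SlidingWindowThrottle and admit are hypothetical names. It checks both windows before recording anything, so a request rejected by one limit does not consume the other's budget.

```python
# Standalone sliding-window sketch of the two per-user limits.
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional


class SlidingWindowThrottle:
    """Allow at most `limit` requests per user in any `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def retry_after(self, user_id: str) -> Optional[int]:
        """Seconds until the next request is allowed, or None if allowed now."""
        now = self.clock()
        hits = self.history[user_id]
        while hits and hits[0] <= now - self.window:
            hits.popleft()  # drop timestamps that have left the window
        if len(hits) < self.limit:
            return None
        return max(1, math.ceil(hits[0] + self.window - now))

    def record(self, user_id: str) -> None:
        self.history[user_id].append(self.clock())


def admit(user_id: str, throttles: List[SlidingWindowThrottle]) -> Optional[int]:
    """Return None to admit, else a Retry-After value for an HTTP 429."""
    waits = [w for w in (t.retry_after(user_id) for t in throttles) if w is not None]
    if waits:
        return max(waits)
    for t in throttles:
        t.record(user_id)
    return None
```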
Audit Log
Every state change on a deployment writes an AuditLog row. The chain is hash-linked — see the AuditLog.calculate_hash() and AuditLog.save() overrides in models_audit.py. Logs are immutable.
Common audit events emitted by the pipeline:
- DEPLOYMENT_TRIGGER: user triggered a new deployment.
- DEPLOYMENT_ROLLBACK: user requested a specific rollback.
- DEPLOYMENT_ROLLBACK_INSTANT: user clicked instant-rollback.
- DEPLOYMENT_APPROVE: user approved a paused deployment.
- DEPLOYMENT_CANCEL: user cancelled a deployment.
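A hash-linked log makes tampering detectable: each entry's digest covers its own content plus the previous entry's digest, so editing any row breaks every later link. The sketch below illustrates the technique in general. It is not the actual AuditLog.calculate_hash(). The field names, the all-zero genesis value, and the canonical-JSON choice are all assumptions.

```python
# Illustrative hash chain; not the actual AuditLog.calculate_hash().
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # assumed digest placeholder for the first entry


@dataclass(frozen=True)
class AuditEntry:
    event: str
    payload: dict
    prev_hash: str
    digest: str


def entry_hash(event: str, payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable.
    canonical = json.dumps({"event": event, "payload": payload, "prev": prev_hash},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain: List[AuditEntry], event: str, payload: dict) -> AuditEntry:
    prev = chain[-1].digest if chain else GENESIS
    entry = AuditEntry(event, payload, prev, entry_hash(event, payload, prev))
    chain.append(entry)
    return entry


def verify(chain: List[AuditEntry]) -> bool:
    prev = GENESIS
    for e in chain:
        if e.prev_hash != prev or e.digest != entry_hash(e.event, e.payload, e.prev_hash):
            return False
        prev = e.digest
    return True
```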
SSRF Protection
The deploy pipeline clones repositories over https:// or git://. URLs are validated by _validate_registry_url(), which:
- Rejects loopback, link-local, multicast, reserved, and unspecified ranges.
- Accepts private RFC 1918 ranges only when the host resolves to a registered CloudProvider.
- Rejects non-HTTPS URLs unless the host is in the platform's localhost / Docker service list.
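The three rules map naturally onto Python's ipaddress module. The sketch below is not _validate_registry_url(). It takes the already-resolved IPs as input so it runs offline. It also models "registered CloudProvider" as a set of provider IPs and the localhost / Docker list as a set of host names, and it applies the rules independently in the order listed; all of these are assumptions. The real function may order or exempt them differently.

```python
# Sketch of the three SSRF rules; ssrf_error is a hypothetical name.
import ipaddress
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def ssrf_error(url: str, resolved_ips: Iterable[str],
               provider_ips: Set[str], local_hosts: Set[str]) -> Optional[str]:
    """Return None if the URL passes, else the reason it is rejected."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" and host not in local_hosts:
        return f"non-HTTPS URL rejected: {url}"
    allowed_private = {ipaddress.ip_address(p) for p in provider_ips}
    for raw in resolved_ips:
        ip = ipaddress.ip_address(raw)
        if (ip.is_loopback or ip.is_link_local or ip.is_multicast
                or ip.is_reserved or ip.is_unspecified):
            return f"{host} resolves to a blocked address: {ip}"
        # IPv6 addresses are never "in" an IPv4 network, so this only hits RFC 1918.
        if any(ip in net for net in RFC1918) and ip not in allowed_private:
            return f"{host} resolves to private {ip} not owned by a registered CloudProvider"
    return None
```

Checking every resolved address, rather than only the first, avoids being fooled by a host with one public and one internal record.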
Troubleshooting
"Deployment already in progress (status: BUILDING)"
There is an active deployment for this service. Either wait for it to finish or POST /api/v1/deployments/{id}/cancel/. Creating a second active deployment returns HTTP 409 with the existing deployment in existing_deployment.
"Cannot cancel deployment in HEALTH_CHECK status"
HEALTH_CHECK is past the cancel boundary. Wait for the deployment to reach ACTIVE or FAILED, then trigger a rollback if needed.
Build hangs in BUILDING
The buildpack has stalled, usually because of a network failure (npm registry down, apt-get update timing out) or a runaway npm install cycle. Fetch the live log tail with GET /api/v1/deployments/{id}/build-logs/.
"BUILD_FAILED: exit 137"
OOM-killed during build. Reduce build memory pressure (move large assets out of the build, use .dockerignore) or raise the platform's per-task memory limit (see docker-compose.prod.yml).
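Exit codes above 128 follow the shell convention of 128 + N, meaning the process was killed by signal N. 137 is 128 + 9, which is SIGKILL, the signal the Linux OOM killer sends. A manual docker kill also sends SIGKILL by default, so confirm memory pressure before resizing. The helper below is a small hypothetical decoder built on the standard signal module.

```python
# Decode container exit statuses using the 128 + N signal convention.
import signal


def describe_exit(code: int) -> str:
    """Explain an exit status; codes above 128 mean 'killed by signal N'."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"  # signal not defined on this platform
        return f"killed by {name}"
    return f"exited with status {code}"
```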
"ENCRYPTION_KEY_MISMATCH" at restore time
A BACKUP_ENCRYPTION_KEY was rotated without restarting the backend, or the encrypted backup was made on a different installation. Set BACKUP_ENCRYPTION_KEY to the value used at backup time, restart the backend, and re-run the deploy.
Health checks pass on the dashboard but the public domain returns 502
The platform considers the container healthy, but the Traefik route is stale. Force a route re-check: POST /api/v1/services/{id}/recheck-health/ and then POST /api/v1/system/route-recheck/.
Webhook deliveries do not trigger deployments
Inspect the WebhookDelivery table — duplicate deliveries are recorded with status=ignored. The most common cause is a webhook signed with a secret that does not match the service owner's CloudProvider config.
"vulnerability_report is empty after build"
The Trivy scan was skipped. This happens when the image is on a registry that Trivy cannot reach. Configure TRIVY_REGISTRY_USERNAME / TRIVY_REGISTRY_PASSWORD in the platform .env and re-trigger.