API Usage Guide
Use this guide after you deploy your own Rafiki Worker and Modal backend. It is the main API reference for the Cloudflare routes your app calls. Do not call internal Modal URLs directly or send internal service headers from clients.
If you are still setting up, start with Getting Started, Deploy to Modal, and Deploy to Cloudflare.
Base URL
https://<your-worker>.workers.dev
Most Developers Start Here
| If you need to... | Use | Notes |
|---|---|---|
| send one request and wait for the full answer | POST /query | simplest request/response path |
| stream execution events live | GET /query_stream | WebSocket upgrade |
| run work in the background | POST /submit | poll with GET /jobs/{job_id} |
Auth
What your client sends
- All public endpoints except GET /health require Authorization: Bearer <session_token>.
- WebSocket routes also accept token=<session_token> as a query parameter.
- Session tokens are usually minted by your server. They authorize one session_id or a list of session IDs, and can also carry user_id and tenant_id scope.
- Session-scoped read routes that do not already carry session_id in the path or body must include session_id as a query parameter. Common examples are /schedules?... and some /jobs/{job_id}... reads.
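The two ways of presenting a session token described above can be sketched as small helpers. This is a minimal illustration, not part of Rafiki: the function names are hypothetical, and the `wss://` scheme swap assumes the Worker is served over HTTPS.

```python
from urllib.parse import urlencode


def auth_headers(session_token: str) -> dict:
    """Headers for authenticated HTTP routes (everything except GET /health)."""
    return {"Authorization": f"Bearer {session_token}"}


def websocket_url(base_url: str, path: str, session_token: str, **params: str) -> str:
    """Build a WebSocket URL that carries the token as a query parameter."""
    # Assumption: the Worker is reachable over HTTPS, so the socket uses wss://.
    ws_base = "wss://" + base_url.split("://", 1)[1].rstrip("/")
    query = urlencode({"token": session_token, **params})
    return f"{ws_base}{path}?{query}"
```

For example, `websocket_url(base, "/query_stream", token, session_id="sess_demo_001")` yields a URL that both authenticates and resumes a session. Prefer the header form where your WebSocket client supports it, since query strings tend to end up in logs.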
What Rafiki handles internally
- Rafiki signs and verifies its own service-to-service auth between the Worker and the execution backend.
- Clients should only integrate with the public Worker routes documented here.
Core Endpoints
GET /health
Health probe for the public Worker.
POST /query
Non-streaming query execution. This is the easiest first integration path.
Example:
curl -X POST https://<your-worker>.workers.dev/query \
  -H "Authorization: Bearer <session_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the capital of Canada?",
    "session_id": "sess_demo_001"
  }'
Request body:
{
  "question": "Your prompt",
  "agent_type": "default",
  "session_id": null,
  "session_key": null,
  "fork_session": false
}
Error responses:
- 400 when the request body is not valid JSON or does not include a string question
- 409 when a session is already executing
- 429 when session budget rails deny pre-flight execution
- 500 when execution fails after request validation
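As a rough Python counterpart to the curl example, the sketch below prepares a POST /query request and maps the documented error statuses to their meanings. The helper names are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

# Meanings copied from the documented POST /query error list.
QUERY_ERRORS = {
    400: "body is not valid JSON or lacks a string question",
    409: "session is already executing",
    429: "session budget rails denied pre-flight execution",
    500: "execution failed after request validation",
}


def build_query_request(base_url, session_token, question, session_id=None):
    """Prepare (but do not send) a POST /query request."""
    body = {"question": question}
    if session_id is not None:
        body["session_id"] = session_id
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {session_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def explain_query_error(status):
    """Map an HTTP status to its documented meaning, if any."""
    return QUERY_ERRORS.get(status, f"undocumented status {status}")
```

To send the request, pass it to `urllib.request.urlopen`. A non-2xx response raises `urllib.error.HTTPError`, whose `code` attribute can be fed to `explain_query_error`.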
Response envelope:
{
  "ok": true,
  "messages": [
    {
      "type": "assistant",
      "content": [{ "type": "text", "text": "..." }],
      "model": "gpt-4.1"
    },
    {
      "type": "result",
      "subtype": "success",
      "duration_ms": 1234,
      "session_id": "...",
      "usage": {
        "requests": 1,
        "input_tokens": 100,
        "output_tokens": 50,
        "total_tokens": 150
      },
      "result": "..."
    }
  ],
  "summary": {
    "text": "...",
    "is_complete": true,
    "subtype": "success",
    "duration_ms": 1234,
    "session_id": "..."
  },
  "session_id": "..."
}
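One way to consume this envelope is to pull the assistant text and token usage out of the messages array. The sketch below assumes only the fields shown in the sample response; the function names are hypothetical.

```python
def assistant_text(envelope: dict) -> str:
    """Join all text parts from assistant messages in a /query response."""
    parts = []
    for message in envelope.get("messages", []):
        if message.get("type") != "assistant":
            continue
        for block in message.get("content", []):
            if block.get("type") == "text":
                parts.append(block.get("text", ""))
    return "".join(parts)


def total_tokens(envelope: dict) -> int:
    """Sum usage.total_tokens across result messages (0 if absent)."""
    return sum(
        m.get("usage", {}).get("total_tokens", 0)
        for m in envelope.get("messages", [])
        if m.get("type") == "result"
    )
```

Using `.get` with defaults keeps the helpers tolerant of messages that omit optional fields.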
GET /query_stream
Streaming query execution via WebSocket upgrade.
Use this when your client needs live execution updates instead of one final response.
Expected public WebSocket message types include:
- connection_ack
- query_start
- assistant_message
- execution_state
- query_complete
- query_error
When resuming a session, pass session_id in the query string. WebSocket query
messages must not include session_id, session_key, user_id, or
tenant_id; Rafiki derives actor scope from the authenticated connection.
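The identity-field rule can be enforced client-side before a message is sent. The sketch below also shows one way to read frames until the run ends. It assumes each frame is a JSON object with a type field and that query_complete and query_error end a run, which is inferred from the type names rather than stated in this guide. The function names and the outgoing message shape are hypothetical.

```python
import json

FORBIDDEN_QUERY_FIELDS = {"session_id", "session_key", "user_id", "tenant_id"}
# Assumption: these two message types mark the end of a run.
TERMINAL_TYPES = {"query_complete", "query_error"}


def check_stream_query(message: dict) -> dict:
    """Reject outgoing query messages that carry identity fields."""
    bad = sorted(FORBIDDEN_QUERY_FIELDS.intersection(message))
    if bad:
        raise ValueError(f"remove fields from WebSocket query message: {bad}")
    return message


def consume_stream(raw_frames):
    """Collect parsed messages until a terminal type arrives."""
    seen = []
    for raw in raw_frames:
        message = json.loads(raw)
        seen.append(message)
        if message.get("type") in TERMINAL_TYPES:
            break
    return seen
```

`consume_stream` accepts any iterable of text frames, so it can wrap whichever WebSocket client library you use.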
GET /ws or GET /events
WebSocket event-bus subscriptions for multi-session updates.
Authentication:
- Authorization: Bearer <session_token> header, or
- token=<session_token> query parameter
Supported subscription query parameters:
- session_id
- session_ids (comma-separated)
- user_id
- tenant_id
Expected public event-bus message types include:
- connection_ack
- presence_update
- session_update
- job_submitted
- job_status
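A subscription URL for the event bus can be assembled from the documented query parameters. In this sketch the function name is hypothetical, the `wss://` scheme assumes an HTTPS Worker, and authentication is left to the Authorization header or a token parameter added separately.

```python
from urllib.parse import urlencode


def events_url(base_url, session_ids=(), user_id=None, tenant_id=None, path="/ws"):
    """Build an event-bus subscription URL from the documented filters."""
    params = {}
    if len(session_ids) == 1:
        params["session_id"] = session_ids[0]
    elif session_ids:
        # Multiple sessions go in one comma-separated parameter.
        params["session_ids"] = ",".join(session_ids)
    if user_id is not None:
        params["user_id"] = user_id
    if tenant_id is not None:
        params["tenant_id"] = tenant_id
    ws_base = "wss://" + base_url.split("://", 1)[1].rstrip("/")
    query = urlencode(params, safe=",")  # keep commas readable in the URL
    return f"{ws_base}{path}?{query}" if query else f"{ws_base}{path}"
```

Pass `path="/events"` to target the alternate route name.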
Session Resources
Only the documented /state, /messages, /queue, /queue/{prompt_id}, and
/stop session routes are public Worker contract surfaces.
- GET /session/{session_id} and /session/{session_id}/query are blocked at the Worker edge and return 404
- Methods outside the documented set return 405
GET /session/{session_id}/state
Returns current session metadata.
Response payload:
{
  "ok": true,
  "state": {
    "session_id": "...",
    "session_key": "...",
    "user_id": "...",
    "tenant_id": "...",
    "created_at": 1234,
    "last_active_at": 1234,
    "status": "idle"
  }
}
GET /session/{session_id}/messages
Returns persisted user and assistant message history for the session.
Response payload:
{
  "ok": true,
  "messages": [
    {
      "id": "...",
      "session_id": "...",
      "role": "user",
      "content": [{ "type": "text", "text": "..." }],
      "created_at": 1234
    }
  ]
}
POST /session/{session_id}/queue
Queue a prompt for sequential execution in the target session.
Actor scope is derived from the authenticated session context, not from client-controlled identity fields in the request body.
Request body:
{
  "question": "Your prompt",
  "agent_type": "default"
}
Error responses:
- 400 when the request body is not valid JSON or does not include a string question
- 429 when the session queue has reached its configured max size
- 429 when session budget rails deny queue preflight
GET /session/{session_id}/queue
Returns queue state, prompt positions, and expiry timestamps.
Response payload:
{
  "ok": true,
  "session_id": "...",
  "is_executing": false,
  "queue_size": 1,
  "max_queue_size": 10,
  "prompts": [
    {
      "prompt_id": "...",
      "question": "...",
      "user_id": "...",
      "queued_at": 1234,
      "expires_at": 5678,
      "position": 1
    }
  ]
}
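A client might inspect this payload before queuing more work, since a full queue leads to a 429 on POST. The helpers below are a sketch using only the fields in the sample response; their names are hypothetical.

```python
def queue_position(snapshot: dict, prompt_id: str):
    """Return the position of a queued prompt, or None if it is not queued."""
    for prompt in snapshot.get("prompts", []):
        if prompt.get("prompt_id") == prompt_id:
            return prompt.get("position")
    return None


def has_capacity(snapshot: dict) -> bool:
    """True when the queue is below its configured maximum size."""
    return snapshot["queue_size"] < snapshot["max_queue_size"]
```

The capacity check is advisory only: another client can fill the queue between the read and the next POST, so the 429 response still needs handling.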
DELETE /session/{session_id}/queue
Clears the session queue.
Response payload includes ok, session_id, cleared_count, and a
human-readable message.
DELETE /session/{session_id}/queue/{prompt_id}
Removes a single queued prompt.
Error responses:
- 404 when the prompt_id is not present in the queue
POST /session/{session_id}/stop
Stops an active run.
Request body:
{
  "mode": "graceful",
  "reason": "optional"
}
Modes:
- graceful: after-turn cancellation
- immediate: immediate cancellation
Response payload mirrors the stop contract, including status,
requested_at, expires_at, reason, and requested_by.
Additional behavior:
- requested_by is derived from the authenticated actor scope and is not accepted as a client-controlled field
- 400 when the request body is not valid JSON or fails the public stop schema
- 502 when the upstream stop response fails the public runtime contract
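A stop request body can be validated locally before it is sent. This sketch assumes reason may be omitted, as the sample value suggests, and deliberately never sets requested_by because the server derives it. The helper name is hypothetical.

```python
STOP_MODES = ("graceful", "immediate")


def build_stop_body(mode: str = "graceful", reason=None) -> dict:
    """Build a POST /session/{session_id}/stop body with a documented mode."""
    if mode not in STOP_MODES:
        raise ValueError(f"mode must be one of {STOP_MODES}, got {mode!r}")
    body = {"mode": mode}
    if reason is not None:
        body["reason"] = reason
    return body
```

Checking the mode client-side avoids a round trip that would otherwise end in a 400.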
GET /session/{session_id}/stop
Returns cancellation flag state.
Error responses:
- 502 when the upstream stop-status response fails the public runtime contract
Session Semantics
- A new run returns a stable session_id
- Reuse session_id to continue memory
- Set fork_session=true to branch into a new session_id with inherited history
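The difference between continuing and forking a session comes down to one field in the POST /query body. The helper names below are hypothetical; the field names come from the request body documented for POST /query.

```python
def continue_body(question: str, session_id: str) -> dict:
    """Body that continues an existing session's memory."""
    return {"question": question, "session_id": session_id}


def fork_body(question: str, session_id: str) -> dict:
    """Body that branches from session_id into a new session."""
    return {"question": question, "session_id": session_id, "fork_session": True}
```

After a fork, read the session_id from the response rather than reusing the one you sent, because the branch runs under a new identifier.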
Background Jobs
POST /submit
Queues a background job.
Error responses:
- 400 when the request body is not valid JSON or fails the public job schema
GET /jobs/{job_id}
Returns status plus result, summary, and metrics when complete.
Job reads enforce the authenticated session, user, and tenant scope. Rafiki
returns deterministic 403 responses when ownership checks fail.
Additional validation:
- 502 when the upstream job-status payload is malformed
- 502 when the upstream payload omits contract-required identity fields
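Polling GET /jobs/{job_id} can be wrapped in a small loop. This guide does not list the job status values, so the sketch takes the completion check as a caller-supplied predicate instead of hard-coding one. `fetch_status` is a hypothetical callable that performs the authenticated GET and returns the parsed JSON payload.

```python
import time


def poll_job(fetch_status, is_terminal, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until is_terminal(payload) is true or attempts run out."""
    for attempt in range(max_attempts):
        payload = fetch_status()
        if is_terminal(payload):
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError("job did not reach a terminal state in time")
```

A 403 from the Worker signals an ownership mismatch and a 502 a malformed upstream payload, so `fetch_status` should raise on those statuses rather than return, which ends the loop early.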
GET /jobs/{job_id}/artifacts
Returns the artifact manifest for a completed job.
Artifact reads also enforce authenticated ownership scope and return 403 on
mismatch.
GET /jobs/{job_id}/artifacts/{artifact_path}
Downloads a single artifact.
Additional behavior:
- Rafiki performs ownership checks before attempting the download
- Malformed URL-encoded artifact paths return deterministic 400 with Invalid artifact path encoding
- Rafiki handles internal artifact-access tokens on the backend; clients call the public Worker route only
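Because malformed encoding is rejected with a 400, it helps to percent-encode artifact paths with a library call rather than by hand. The sketch below assumes that slashes inside artifact_path are path separators and should stay unescaped. The function name is hypothetical.

```python
from urllib.parse import quote


def artifact_url(base_url: str, job_id: str, artifact_path: str) -> str:
    """Build a download URL with a safely percent-encoded artifact path."""
    # Assumption: "/" separates directories in the artifact path, so keep it.
    encoded_path = quote(artifact_path, safe="/")
    encoded_job = quote(job_id, safe="")
    return f"{base_url.rstrip('/')}/jobs/{encoded_job}/artifacts/{encoded_path}"
```

Literal percent signs and spaces in file names are the usual source of invalid encodings, and `quote` escapes both.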
Schedules
POST /schedules
Creates a schedule owned by the authenticated session and actor scope.
Error responses:
- 400 when the request body is not valid JSON or fails the public schedule-create schema
GET /schedules
Lists schedules visible to the authenticated actor scope.
Additional validation:
- 502 when the upstream payload does not match the public schedule-list contract
GET /schedules/{schedule_id}
Returns a single schedule resource.
Additional validation:
- 502 when the upstream payload does not match the public schedule contract
PATCH /schedules/{schedule_id}
Updates a schedule resource.
Error responses:
- 400 when the request body is not valid JSON or fails the public schedule-update schema
Browser preflight for this route explicitly allows PATCH.