Skip to main content

AI Models (core.models)

The core.models runtime module provides a unified GraphQL interface for AI model operations — text embeddings and LLM completions — across multiple providers.

Quick Start

1. Register a Model

mutation {
core {
insert_data_sources(data: {
name: "my_llm"
type: "llm-openai"
prefix: "my_llm"
path: "http://localhost:1234/v1/chat/completions?model=gemma-4&timeout=120s"
}) { name }
}
}

2. Generate Text

{
function {
core {
models {
completion(model: "my_llm", prompt: "Explain GraphQL in one sentence") {
content
latency_ms
}
}
}
}
}

Functions

embedding

Generate a single embedding vector.

embedding(model: String!, input: String!): embedding_result
FieldTypeDescription
vectorVectorEmbedding vector
token_countIntTokens consumed

embeddings

Generate multiple embedding vectors in batch.

embeddings(model: String!, input: String!): embeddings_result

Input is a JSON array of strings. Returns vectors and total token count.

completion

Simple text completion.

completion(
model: String!
prompt: String!
max_tokens: Int
temperature: Float
): llm_result

chat_completion

Multi-turn chat with optional tool calling.

chat_completion(
model: String!
messages: [String!]!
tools: [String!]
tool_choice: String
max_tokens: Int
temperature: Float
): llm_result

Messages: Each element is a JSON string with role and content fields:

messages: [
"{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"}",
"{\"role\":\"user\",\"content\":\"What is the weather?\"}"
]

Tools: Each element is a JSON string with name, description, and parameters (JSON Schema):

tools: [
"{\"name\":\"get_weather\",\"description\":\"Get weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}"
]

Tool choice: "auto" (model decides), "none" (no tools), or a specific tool name.

model_sources

List all registered AI model data sources.

model_sources: [model_source_info]

Returns name, type (llm/embedding), provider, and model for each registered source.

Response: llm_result

Normalized across all providers:

FieldTypeDescription
contentStringGenerated text
modelStringModel that responded
finish_reasonStringstop, tool_use, or length
prompt_tokensIntInput tokens
completion_tokensIntOutput tokens
total_tokensIntTotal tokens
providerStringopenai, anthropic, or gemini
latency_msIntRequest latency in milliseconds
tool_callsStringJSON: [{"id":"...","name":"...","arguments":{...}}]
thought_signatureStringGemini 2.5+: thought signature for tool call verification (see Tool Call Round-trip)

Supported Providers

TypeProviderCovers
llm-openaiOpenAI-compatibleOpenAI, Ollama, LM Studio, vLLM, Mistral, Qwen, Azure OpenAI
llm-anthropicAnthropicClaude models
llm-geminiGoogle GeminiGemini models
embeddingOpenAI-compatibleAny OpenAI-compatible embedding endpoint

All LLM providers support the same completion, chat_completion, and tool calling interface. Provider-specific API differences (auth headers, request format, response format) are handled transparently.

Tool Calling

Tool calls are normalized across all providers into a unified format:

[{"id": "call_123", "name": "get_weather", "arguments": {"city": "London"}}]
  • OpenAI: tool_calls[].function.arguments (string) → parsed to object
  • Anthropic: content[].input (object) → used as-is
  • Gemini: parts[].functionCall.args (object) → used as-is

Tool Call Round-trip

To execute tool calls and send results back to the model, build a multi-turn message history:

  1. Get tool calls from the initial chat_completion response (both tool_calls and thought_signature fields)
  2. Build the assistant message including tool_calls and thought_signature (Gemini requires this for verification)
  3. Add tool result messages with role: "tool" and tool_call_id matching the tool call id
  4. Send the full history back to chat_completion
# Step 1: Initial request → model returns tool_calls + thought_signature
{
function { core { models { chat_completion(
model: "gemini"
messages: ["{\"role\":\"user\",\"content\":\"What is the weather in Tokyo?\"}"]
tools: ["{\"name\":\"get_weather\",\"description\":\"Get weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}"]
max_tokens: 200
) {
content finish_reason tool_calls thought_signature
} } } }
}

# Step 2: Send tool results back (include thought_signature in the assistant message)
{
function { core { models { chat_completion(
model: "gemini"
messages: [
"{\"role\":\"user\",\"content\":\"What is the weather in Tokyo?\"}",
"{\"role\":\"assistant\",\"tool_calls\":[{\"id\":\"abc\",\"name\":\"get_weather\",\"arguments\":{\"city\":\"Tokyo\"}}],\"thought_signature\":\"...\"}",
"{\"role\":\"tool\",\"tool_call_id\":\"abc\",\"content\":\"{\\\"temperature\\\":22,\\\"condition\\\":\\\"sunny\\\"}\"}"
]
tools: ["{\"name\":\"get_weather\",\"description\":\"Get weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}"]
max_tokens: 200
) {
content finish_reason
} } } }
}
Gemini Thought Signatures

Gemini 2.5+ returns a thought_signature alongside tool calls. This signature must be included in the assistant message when sending tool results back — omitting it causes a 400 error. For OpenAI and Anthropic, thought_signature is empty and can be omitted.

Streaming Completions

Streaming completions deliver tokens incrementally via GraphQL subscriptions, allowing clients to display partial results as they arrive.

Subscription Query

subscription {
core {
models {
completion(
model: "my_llm"
prompt: "Explain GraphQL in detail"
max_tokens: 2048
thinking_budget: 1024
) {
type
content
model
finish_reason
tool_calls
prompt_tokens
completion_tokens
}
}
}
}

The chat_completion function also supports streaming with the same event structure, including tool calling:

subscription {
core {
models {
chat_completion(
model: "claude"
messages: [
"{\"role\":\"user\",\"content\":\"What is the weather in London?\"}"
]
tools: ["{\"name\":\"get_weather\",\"description\":\"Get weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}"]
max_tokens: 1024
) {
type
content
finish_reason
tool_calls
thought_signature
}
}
}
}

Event Types

Each streamed event has a type field indicating what it contains:

Event TypeDescription
content_deltaRegular generated tokens (the main text output)
reasoningThinking/chain-of-thought tokens (when thinking is enabled)
tool_useTool call request from the model
finishFinal event with usage statistics (prompt_tokens, completion_tokens, tool_calls, thought_signature)
errorError during generation

Thinking Budget

The thinking_budget parameter controls extended thinking (chain-of-thought) output:

  • Set in the data source URL as the maximum allowed value (e.g., thinking_budget=4096)
  • Set per-request via the thinking_budget argument (capped at the source-level maximum)
  • Set to 0 or omit to disable thinking

When enabled, the model emits reasoning events containing its chain-of-thought before producing content_delta events with the final answer.

Provider Support

ProviderThinking SupportDetails
OpenAIreasoning_content fieldReasoning tokens returned in reasoning_content; available on o-series models
Anthropicthinking_delta eventsRequires thinking_budget to be set; uses thinking content blocks
Geminithought fieldRequires thinking_budget and includeThoughts enabled; uses thought flag on parts

See Also