LLM Data Sources

LLM data sources connect hugr to AI model providers for text generation and chat completion. Three provider types are supported, each implementing the same unified GraphQL interface via the core.models runtime module.

Provider Types

llm-openai — OpenAI-Compatible

Covers: OpenAI, Azure OpenAI, Ollama, LM Studio, vLLM, Mistral, Qwen, LiteLLM, and any OpenAI-compatible endpoint.

mutation {
  core {
    insert_data_sources(data: {
      name: "my_llm"
      type: "llm-openai"
      prefix: "my_llm"
      path: "http://localhost:1234/v1/chat/completions?model=gemma-4&max_tokens=4096&timeout=120s"
    }) { name }
  }
}

llm-anthropic — Anthropic

mutation {
  core {
    insert_data_sources(data: {
      name: "claude"
      type: "llm-anthropic"
      prefix: "claude"
      path: "https://api.anthropic.com/v1/messages?model=claude-sonnet-4-20250514&api_key=${secret:ANTHROPIC_KEY}&max_tokens=4096"
    }) { name }
  }
}

llm-gemini — Google Gemini

mutation {
  core {
    insert_data_sources(data: {
      name: "gemini"
      type: "llm-gemini"
      prefix: "gemini"
      path: "https://generativelanguage.googleapis.com/v1beta?model=gemini-2.5-flash&api_key=${secret:GEMINI_KEY}"
    }) { name }
  }
}

Path Parameters

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| model | yes | | Model identifier (e.g., gpt-4o, claude-sonnet-4-20250514, gemini-2.5-flash) |
| api_key | no | | API key. Supports ${secret:ENV_VAR} syntax |
| api_key_header | no | Authorization: Bearer | Custom auth header (OpenAI only) |
| max_tokens | no | 4096 | Default max tokens per request |
| timeout | no | 60s | Request timeout |
| rpm | no | | Max requests per minute (0 = unlimited) |
| tpm | no | | Max tokens per minute (0 = unlimited) |
| rate_store | no | | Name of a Redis data source for shared rate-limit counters. If unset, counters are in-memory per node |
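Because the path is an ordinary URL with query parameters, it can be assembled programmatically. A minimal client-side sketch in Python (the helper name `build_llm_path` is illustrative, not part of hugr):

```python
from urllib.parse import urlencode

def build_llm_path(base_url: str, **params: str) -> str:
    """Assemble an LLM data source path from the parameters in the table above.

    Values are passed through verbatim; safe= keeps ${secret:ENV_VAR}
    placeholders unescaped so hugr can resolve them at runtime.
    """
    return f"{base_url}?{urlencode(params, safe='${}:')}"

path = build_llm_path(
    "https://api.openai.com/v1/chat/completions",
    model="gpt-4o",
    api_key="${secret:OPENAI_API_KEY}",
    max_tokens="4096",
    timeout="60s",
)
```

The resulting string can be used directly as the `path` argument of the `insert_data_sources` mutation.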

Configuration Examples

# OpenAI cloud
https://api.openai.com/v1/chat/completions?model=gpt-4o&api_key=${secret:OPENAI_API_KEY}

# Ollama local
http://localhost:11434/v1/chat/completions?model=mistral&timeout=120s

# LM Studio local
http://localhost:1234/v1/chat/completions?model=gemma-4&timeout=120s

# Azure OpenAI
https://myorg.openai.azure.com/openai/deployments/gpt4/chat/completions?model=gpt-4o&api_key=${secret:AZURE_KEY}&api_key_header=api-key

# vLLM / any OpenAI-compatible
http://gpu-server:8000/v1/chat/completions?model=llama3-70b&timeout=180s

Usage

LLM sources are accessed through the core.models module:

# Simple completion
{ function { core { models {
completion(model: "my_llm", prompt: "Explain GraphQL", max_tokens: 200) {
content finish_reason prompt_tokens completion_tokens latency_ms
}
} } } }

# Chat with tool calling
# Chat with tool calling
{ function { core { models {
  chat_completion(
    model: "claude",
    messages: [
      "{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"}",
      "{\"role\":\"user\",\"content\":\"What is the weather?\"}"
    ],
    tools: ["{\"name\":\"get_weather\",\"description\":\"Get weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}"],
    max_tokens: 200
  ) {
    content finish_reason tool_calls
  }
} } } }
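Since `messages` and `tools` are JSON-encoded strings, building them with a JSON serializer avoids hand-escaping quotes. A sketch in Python, mirroring the values in the query above:

```python
import json

# Each message and tool definition is serialized to a JSON string before
# being placed in the GraphQL arguments.
messages = [
    json.dumps({"role": "system", "content": "You are a helpful assistant."}),
    json.dumps({"role": "user", "content": "What is the weather?"}),
]
tools = [json.dumps({
    "name": "get_weather",
    "description": "Get weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
    },
})]
```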

Response Normalization

All providers return the same llm_result structure:

| Field | Description |
|-------|-------------|
| content | Generated text |
| model | Model that responded |
| finish_reason | stop, tool_use, or length |
| prompt_tokens | Input token count |
| completion_tokens | Output token count |
| total_tokens | Total tokens |
| provider | openai, anthropic, or gemini |
| latency_ms | Request latency |
| tool_calls | JSON string: [{"id":"...","name":"...","arguments":{...}}] |
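Because tool_calls is returned as a JSON string rather than a structured field, a client must decode it before dispatching tools. A sketch in Python (the id and argument values are made up for illustration):

```python
import json

# Example tool_calls payload, matching the shape in the table above.
raw_tool_calls = '[{"id":"call_1","name":"get_weather","arguments":{"city":"Berlin"}}]'

dispatched = []
for call in json.loads(raw_tool_calls):
    args = call["arguments"]
    # Some providers return arguments as a nested JSON string; decode defensively.
    if isinstance(args, str):
        args = json.loads(args)
    dispatched.append((call["name"], args))
```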

Rate Limiting

LLM sources support per-minute rate limits for both requests and tokens. Limits are enforced before making API calls.

In-Memory (Single Node)

# Limit to 100 requests/min and 100k tokens/min
https://api.openai.com/v1/chat/completions?model=gpt-4o&api_key=${secret:OPENAI_KEY}&rpm=100&tpm=100000

Counters are per-node only. Suitable for single-node deployments.

Shared (Redis)

For multi-node clusters, use a Redis data source for shared counters:

# First, register a Redis data source
# Then reference it in the LLM path:
https://api.openai.com/v1/chat/completions?model=gpt-4o&api_key=${secret:OPENAI_KEY}&rpm=100&tpm=100000&rate_store=redis

Redis keys auto-expire after 2 minutes. Key format: ratelimit:{source_name}:rpm:{minute_window}.

When rate limits are exceeded, requests return an error: rate limit exceeded: N requests per minute exceeded for source_name.

See Also

  • AI Models Module — full core.models API reference
  • Embeddings — vector embedding sources
  • Redis — Redis data source for key-value storage and shared counters