Hugr is an Open Source Data Mesh Platform and high-performance GraphQL Backend.
Designed for seamless access to distributed data sources, advanced analytics, and geospatial processing — Hugr powers rapid backend development for applications and BI tools. It provides a unified GraphQL API across all your data.
Key Benefits
Data Mesh–Ready
Build federated, domain-driven schemas without losing visibility or control.
Geospatial & Analytical Power
Perform spatial joins, aggregations, and OLAP queries — all in GraphQL.
Modern Data Stack Support
Natively integrates with Postgres, DuckDB, Parquet, Iceberg, Delta Lake, and REST APIs.
Cluster-Ready & Extensible
Scale with your workloads or embed the engine directly in your Go services.
Talk-to-data
Coming soon! Use natural-language queries to access and analyze your data.
Secure by Design
Enforce fine-grained access policies with built-in authentication and role-based permissions.
Understanding Data Mesh
Data Mesh is a modern approach where teams own and publish their data as a product — just like they do with APIs or microservices.
Hugr enables this by giving every domain a flexible, secure, and unified way to expose their data using GraphQL.

Use Cases
1. Data Access Backend for Applications
hugr acts as a universal GraphQL layer over data sources:
- Rapid API deployment over existing databases and files
- Centralized schema and access control
- Unified interfaces for apps and BI tools
- Minimal manual integration
- Ideal for data-first applications
2. Building Data Mesh Platforms
hugr is perfect for Data Mesh architecture:
- Modular schema definitions
- Federated access through a single API
- Decentralized data ownership
- Domain-specific modeling and scaling
- Easy onboarding of teams and data sources
3. Analytics, DataOps and MLOps Integration
hugr enables:
- Support for OLAP and spatial analytics
- Export to Arrow IPC and Python (pandas/GeoDataFrame)
- Server-side jq transformations
- Caching and scalability for heavy workloads
- Integration of ETL/ELT and ML pipeline results
4. Vibe/Agentic Analytics
hugr MCP powers vibe/agentic analytics through:
- Modular schema design for diverse data sources
- LLM-generated summaries of data objects and their fields, relationships, functions, modules, and data sources, giving models the business context behind the data
- Lazy schema introspection tools that let models generate GraphQL queries from user requests
- Complex queries across multiple data sources, chained as needed to fetch and aggregate data
- Model-built jq transformations that process and filter data server-side before results are returned to users
Quick Setup in 2 Minutes
1. Start Hugr in a Container
Make sure you have Docker installed on your machine. Read deployment guide →
Start hugr container:
docker run -d --name hugr -p 15000:15000 -v ./schemas:/schemas ghcr.io/hugr-lab/automigrate:latest
Access the admin UI:
http://localhost:15000/admin
Stop container:
docker stop hugr
View logs:
docker logs -f hugr
2. Connect a DuckDB Database
Use DuckDB as an embedded analytical database with an auto-generated schema.
mutation AddDuckDBSource {
  core {
    insert_data_sources(data: {
      name: "analytics"
      type: "duckdb"
      description: "DuckDB analytics database"
      path: "/data/analytics.db"
      as_module: true
      self_defined: true
    }) {
      name
      type
    }
  }
}
# Load the data source
mutation LoadDuckDBSource {
  function {
    core {
      load_data_source(name: "analytics") {
        success
        message
      }
    }
  }
}
Execute these mutations in the admin UI at http://localhost:15000/admin, running AddDuckDBSource before LoadDuckDBSource.
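The same two mutations can also be sent over HTTP from a script instead of the admin UI. The sketch below uses only the Python standard library and assumes that hugr serves GraphQL at http://localhost:15000/query and accepts a standard GraphQL-over-HTTP POST with a JSON body; the endpoint path is an assumption, so check the deployment guide for your setup. The helper names build_request and execute are hypothetical.

```python
import json
import urllib.request

# Assumption: GraphQL endpoint path for a default local hugr container.
HUGR_ENDPOINT = "http://localhost:15000/query"

ADD_SOURCE = """
mutation AddDuckDBSource {
  core {
    insert_data_sources(data: {
      name: "analytics"
      type: "duckdb"
      description: "DuckDB analytics database"
      path: "/data/analytics.db"
      as_module: true
      self_defined: true
    }) {
      name
      type
    }
  }
}
"""

LOAD_SOURCE = """
mutation LoadDuckDBSource {
  function {
    core {
      load_data_source(name: "analytics") {
        success
        message
      }
    }
  }
}
"""


def build_request(query: str, endpoint: str = HUGR_ENDPOINT) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(query: str, endpoint: str = HUGR_ENDPOINT) -> dict:
    """Send the request, decode the JSON reply, and raise if GraphQL reports errors."""
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

With the container running, call execute(ADD_SOURCE) first and then execute(LOAD_SOURCE), mirroring the order used in the admin UI.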
Frequently Asked Questions
hugr is an open-source Data Mesh platform and high-performance GraphQL backend for accessing distributed data sources. It provides a unified GraphQL API across diverse sources including databases (PostgreSQL, MySQL, DuckDB), file formats (Parquet, Iceberg, Delta Lake), and REST APIs.
hugr enables rapid API development, analytics & BI, geospatial processing, and serves as a universal data access layer for applications.
Data Mesh is a decentralized approach to data architecture that treats data as a product, with domain-specific ownership. hugr enables Data Mesh by providing:
- Modular schema definitions that can be reused across different sources
- Federated access through a single GraphQL API
- Domain-specific modeling and independent scaling
- Decentralized data ownership while maintaining unified access
hugr supports multiple data source types:
- Relational Databases: DuckDB, PostgreSQL (with PostGIS, TimescaleDB, pgvector), MySQL
- File Formats: Parquet, Apache Iceberg, Delta Lake, CSV, JSON
- Spatial Formats: GeoParquet, GeoJSON, Shapefiles (GDAL compatible)
- Services: REST APIs with authentication (HTTP Basic, ApiKey, OAuth2)
- Storage: Local files and cloud object storage (S3-compatible)
- Coming Soon: DuckLake - a data lake solution for managing large volumes of data with snapshot-based schema evolution
Data sources are described using GraphQL SDL (Schema Definition Language) with hugr-specific directives. Key directives include:
- @table: Define database tables
- @view: Define views with SQL expressions
- @field_references: Define relationships between tables
- @join: Define subquery fields in the schema for data selection
- @module: Organize the schema into logical modules
- @function: Define custom functions
Schema files are stored in catalogs and can be located in file systems, HTTP endpoints, or S3 buckets.
Queries:
- Basic CRUD operations with filtering, sorting, and pagination
- Complex aggregations (count, sum, avg, min, max) and bucket aggregations
- Cross-source queries and relationships
- Spatial joins and geospatial operations
- Vector search for semantic similarity
Mutations:
- Insert records with nested relations
- Update multiple records with filters
- Delete with conditional filters
- Full transaction support within a single request
hugr provides a two-level caching system:
- L1 Cache (In-Memory): Fast local cache for quick access
- L2 Cache (Distributed): Redis/Memcached for shared cache across cluster nodes
Caching is controlled via directives:
- @cache: Enable caching with configurable TTL and tags
- @no_cache: Disable caching for real-time data
- @invalidate_cache: Force a cache refresh
Automatic cache invalidation occurs on mutations based on tags.
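To make the L1/L2 lookup order and tag-based invalidation concrete, here is a simplified model. It is not hugr's implementation: the class and method names are hypothetical, a plain dict stands in for the shared Redis/Memcached store, TTLs are checked against an injectable clock, and the tag index lives in a single process.

```python
import time
from typing import Any, Callable, Dict, Optional, Set, Tuple

Entry = Tuple[float, Any]  # (absolute expiry time, cached value)


class TwoLevelCache:
    """Toy model: a per-node L1 dict in front of a shared L2 store.

    Hypothetical sketch; the `shared` dict stands in for Redis/Memcached.
    """

    def __init__(self, shared: Dict[str, Entry],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.l1: Dict[str, Entry] = {}
        self.l2 = shared
        self.tags: Dict[str, Set[str]] = {}  # tag -> keys carrying that tag
        self.clock = clock

    def set(self, key: str, value: Any, ttl: float,
            tags: Tuple[str, ...] = ()) -> None:
        entry = (self.clock() + ttl, value)
        self.l1[key] = entry
        self.l2[key] = entry
        for tag in tags:
            self.tags.setdefault(tag, set()).add(key)

    def get(self, key: str) -> Optional[Any]:
        now = self.clock()
        for level in (self.l1, self.l2):  # fast local cache first
            entry = level.get(key)
            if entry is None:
                continue
            expires, value = entry
            if expires <= now:
                level.pop(key, None)  # expired: evict from this level
                continue
            self.l1[key] = entry  # promote L2 hits into L1 (no-op for L1 hits)
            return value
        return None

    def invalidate(self, *tags: str) -> None:
        """Run after a mutation: evict every key carrying one of the tags."""
        for tag in tags:
            for key in self.tags.pop(tag, set()):
                self.l1.pop(key, None)
                self.l2.pop(key, None)
```

In a real cluster, invalidation would also have to reach the L1 cache of every node (for example through a broadcast message); the sketch leaves that out and only clears the local L1 and the shared L2.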
hugr supports multiple authentication methods:
- API Keys: Static keys for service-to-service communication, with support for managed keys stored in database
- OAuth2/JWT: Token-based authentication with standard JWT claims and custom claim mapping
- OIDC: OpenID Connect for enterprise identity providers (Google, Auth0, Keycloak, Azure AD)
- Anonymous: Unauthenticated access with limited permissions
Multiple methods can be enabled simultaneously, and hugr tries each in order.
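The "try each method in order" rule can be pictured as a chain of authenticators where the first one that recognizes the request wins and anonymous access is the last resort. Everything below is a hypothetical illustration, not hugr's code or configuration: the X-Api-Key header name, the role names, and the Identity type are made up, and the JWT step only maps claims without verifying the signature, which a real deployment must do.

```python
import base64
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Identity:
    user_id: str
    role: str
    method: str


# An authenticator returns None when the request does not carry its kind of
# credential, and raises PermissionError when the credential is present but bad.
Authenticator = Callable[[Dict[str, str]], Optional[Identity]]


def api_key_auth(keys: Dict[str, str], header: str = "X-Api-Key") -> Authenticator:
    """Static API keys mapped to roles (the header name is hypothetical)."""
    def auth(headers: Dict[str, str]) -> Optional[Identity]:
        key = headers.get(header)
        if key is None:
            return None
        if key not in keys:
            raise PermissionError("unknown API key")
        return Identity(user_id="service", role=keys[key], method="api_key")
    return auth


def jwt_claims_auth(role_claim: str = "role") -> Authenticator:
    """Map claims from a Bearer JWT to an identity.

    WARNING: the signature is NOT verified in this sketch.
    """
    def auth(headers: Dict[str, str]) -> Optional[Identity]:
        value = headers.get("Authorization", "")
        if not value.startswith("Bearer "):
            return None
        try:
            payload = value[len("Bearer "):].split(".")[1]
            padded = payload + "=" * (-len(payload) % 4)  # restore base64 padding
            claims = json.loads(base64.urlsafe_b64decode(padded))
            return Identity(user_id=claims["sub"],
                            role=claims.get(role_claim, "user"),  # custom claim mapping
                            method="jwt")
        except (IndexError, KeyError, TypeError, ValueError) as exc:
            raise PermissionError("malformed token") from exc
    return auth


def anonymous_auth(headers: Dict[str, str]) -> Optional[Identity]:
    """Fallback: accept any request with a restricted role."""
    return Identity(user_id="anonymous", role="anonymous", method="anonymous")


def authenticate(headers: Dict[str, str], chain: List[Authenticator]) -> Identity:
    """Try each authenticator in order; the first identity returned wins."""
    for authenticator in chain:
        identity = authenticator(headers)
        if identity is not None:
            return identity
    raise PermissionError("no authentication method accepted the request")
```

One design choice worth noting: a present but invalid credential raises instead of falling through, so a bad API key is rejected rather than silently downgraded to anonymous access.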
hugr uses role-based access control (RBAC) managed through the GraphQL API:
- Roles: Define user roles in the roles table
- Permissions: Configure field-level and type-level access in the role_permissions table
- Row-Level Security: Apply mandatory filters to restrict data access
- Default Values: Auto-inject values in mutations (e.g., user_id, tenant_id)
Permissions support wildcards (*) for broad rules with specific exceptions. Access is open by default; add permission entries to restrict.
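The combination of wildcards, specific exceptions, and open-by-default access can be illustrated with a small resolver. This is a hypothetical model, not the actual layout of the role_permissions table: each rule is a (role, type_name, field_name, disabled) tuple in which either name may be "*", the most specific matching rule wins, and a request with no matching rule is allowed.

```python
from typing import Iterable, NamedTuple


class Rule(NamedTuple):
    role: str
    type_name: str   # "*" matches any type
    field_name: str  # "*" matches any field
    disabled: bool   # True denies access


def _specificity(rule: Rule) -> int:
    # exact type + field (3) > exact type, any field (2) > any type, exact field (1) > "*"/"*" (0)
    return (rule.type_name != "*") * 2 + (rule.field_name != "*")


def is_allowed(rules: Iterable[Rule], role: str, type_name: str, field_name: str) -> bool:
    matching = [
        r for r in rules
        if r.role == role
        and r.type_name in ("*", type_name)
        and r.field_name in ("*", field_name)
    ]
    if not matching:
        return True  # open by default: no rule means no restriction
    best = max(matching, key=_specificity)
    return not best.disabled
```

A typical pattern is a broad deny for a role followed by narrower exceptions, for example denying everything to an analyst role, re-enabling one type, and hiding a single sensitive field inside it.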
DuckDB is a high-performance analytical database engine optimized for OLAP workloads. hugr uses DuckDB as its core query engine because:
- Optimized for analytical queries and aggregations
- Native support for multiple data formats (Parquet, CSV, JSON)
- In-process execution with efficient memory usage
- Excellent performance for large-scale data processing
- Can attach external databases (PostgreSQL, MySQL) and query them together
hugr has native support for geospatial operations:
- Native Geometry scalar type for spatial fields
- Support for PostGIS (PostgreSQL) and the DuckDB spatial extension
- Spatial file formats: GeoParquet, GeoJSON, Shapefiles
- Spatial joins and aggregations across data sources
- Distance-based queries and spatial relationships
- H3 clustering for hierarchical spatial indexing
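For intuition about what a distance-based filter computes, here is a plain great-circle (haversine) distance check on a spherical Earth. It is only an illustration with made-up points: the engines behind hugr (PostGIS and the DuckDB spatial extension) supply their own distance functions with their own handling of projections and the Earth's shape, so results can differ slightly from this spherical approximation.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical model)

Point = Tuple[str, float, float]  # (name, lon, lat) in degrees


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(points: List[Point], center: Tuple[float, float],
                    max_m: float) -> List[Point]:
    """Keep the points that lie within max_m metres of center = (lon, lat)."""
    lon0, lat0 = center
    return [p for p in points if haversine_m(lon0, lat0, p[1], p[2]) <= max_m]
```

At this scale one degree of latitude is about 111.2 km, which is a handy sanity check when choosing distance thresholds.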
Learn more about spatial queries → | Learn more about H3 clustering →
hugr is designed for enterprise-scale deployments:
- Horizontal Scaling: Stateless nodes that can be added/removed dynamically
- Cluster Mode: Multi-node operation with load balancing and fault tolerance
- Caching: Two-level cache (in-memory + Redis/Memcached) reduces database load
- Performance: Query optimization and pushdown to data sources
- Kubernetes Ready: Helm charts for easy K8s deployment
Learn more about cluster mode → | Learn more about container deployment →
hugr supports multiple output formats:
- GraphQL JSON: Standard GraphQL response format
- Arrow IPC: Efficient binary format for large datasets via Hugr multipart IPC protocol
- Python Integration: Direct export to pandas DataFrame and GeoDataFrame
- JQ Transformations: Server-side data transformation with custom JSON output
The Arrow IPC protocol enables efficient streaming of large datasets to analytics and ML pipelines.
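To show the kind of reshaping a server-side jq transformation performs, the sketch below computes in plain Python what the jq filter in its docstring would return for a GraphQL response. The orders, status, and amount fields are hypothetical, and how a jq expression is attached to a hugr request is not shown here. When the same reshaping runs on the server, only the small aggregated result needs to reach the client instead of every order row.

```python
from typing import Any, Dict, List


def totals_by_status(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Python equivalent of the (hypothetical) jq filter

        .data.orders | group_by(.status)
                     | map({status: .[0].status, total: (map(.amount) | add)})
    """
    totals: Dict[str, Any] = {}
    for order in response["data"]["orders"]:
        status = order["status"]
        totals[status] = totals.get(status, 0) + order["amount"]
    # jq's group_by returns groups sorted by key, so sort to match its output order.
    return [{"status": s, "total": totals[s]} for s in sorted(totals)]
```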
Learn more about Arrow IPC → | Learn more about Python client → | Learn more about JQ transformations →
Powered by DuckDB
hugr uses DuckDB, the fast in-process analytical database, as its core engine. This lets hugr run cross-source JOINs and aggregations in memory, combining data from PostgreSQL, Parquet files on S3, CSV, and geospatial formats in a single GraphQL query. Because DuckDB runs inside the hugr process, there is no network hop between the API layer and the query engine, and its OLAP-optimized execution makes hugr a strong fit for analytic workloads and data mesh architectures.