Hugr is an Open Source Data Mesh Platform and high-performance GraphQL Backend.
Designed for seamless access to distributed data sources, advanced analytics, and geospatial processing — Hugr powers rapid backend development for applications and BI tools. It provides a unified GraphQL API across all your data.
Key Benefits
Data Mesh–Ready
Build federated, domain-driven schemas without losing visibility or control.
Geospatial & Analytical Power
Perform spatial joins, aggregations, and OLAP queries — all in GraphQL.
Modern Data Stack Support
Natively integrates with Postgres, DuckDB, Parquet, Iceberg, Delta Lake, and REST APIs.
Cluster-Ready & Extensible
Scale with your workloads or embed the engine directly in your Go services.
Talk-to-data
Coming soon! Leverage natural language queries to access and analyze your data effortlessly.
Secure by Design
Enforce fine-grained access policies with built-in authentication and role-based permissions.
Understanding Data Mesh
Data Mesh is a modern approach where teams own and publish their data as a product — just like they do with APIs or microservices.
Hugr enables this by giving every domain a flexible, secure, and unified way to expose their data using GraphQL.

Use Cases
1. Data Access Backend for Applications
hugr acts as a universal GraphQL layer over data sources:
- Rapid API deployment over existing databases and files
- Centralized schema and access control
- Unified interfaces for apps and BI tools
- Minimal manual integration
- Ideal for data-first applications
2. Building Data Mesh Platforms
hugr is perfect for Data Mesh architecture:
- Modular schema definitions
- Federated access through a single API
- Decentralized data ownership
- Domain-specific modeling and scaling
- Easy onboarding of teams and data sources
3. Analytics, DataOps and MLOps Integration
hugr enables:
- Support for OLAP and spatial analytics
- Export to Arrow IPC and Python (pandas/GeoDataFrame)
- Server-side jq transformations
- Caching and scalability for heavy workloads
- Integration of ETL/ELT and ML pipeline results
4. Vibe/Agentic Analytics
hugr MCP powers Vibe's analytics platform by:
- Providing modular schema design for diverse data sources
- Summarizing data objects and their fields, relationships, functions, modules, and data source descriptions with LLMs to capture business context
- Offering lazy Hugr schema introspection tools so that GraphQL queries can be generated automatically from user requests
- Letting models build complex queries over multiple data sources and run chains of queries to fetch and aggregate data as needed
- Letting models build jq transformations that process and filter data server-side before results are returned to users
Quick Setup in 2 Minutes
1. Start Hugr in Container
Make sure you have Docker installed on your machine.
Start hugr container:
docker run -d --name hugr -p 15000:15000 -v ./schemas:/schemas ghcr.io/hugr-lab/automigrate:latest
Access the admin UI:
http://localhost:15000/admin
Stop container:
docker stop hugr
View logs:
docker logs -f hugr
Connect DuckDB Database
Use DuckDB as an embedded analytical database with auto-generated schema
mutation AddDuckDBSource {
  core {
    insert_data_sources(data: {
      name: "analytics"
      type: "duckdb"
      description: "DuckDB analytics database"
      path: "/data/analytics.db"
      as_module: true
      self_defined: true
    }) {
      name
      type
    }
  }
}

# Load the data source
mutation LoadDuckDBSource {
  function {
    core {
      load_data_source(name: "analytics") {
        success
        message
      }
    }
  }
}
Execute these mutations one at a time in the admin UI at http://localhost:15000/admin
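The same request can also be sent from a script instead of the admin UI. The sketch below wraps a GraphQL document in a standard JSON POST using only the Python standard library. The endpoint path `/query` is an assumption (the text above only names the admin UI URL), and `build_request` and `execute` are illustrative helper names, not part of hugr.

```python
import json
import urllib.request

# Assumption: the GraphQL endpoint path is not stated in this page;
# "/query" is a placeholder to adjust for your deployment.
HUGR_GRAPHQL_URL = "http://localhost:15000/query"

# The "load data source" mutation from the setup steps above.
LOAD_SOURCE = """
mutation LoadDuckDBSource {
  function {
    core {
      load_data_source(name: "analytics") {
        success
        message
      }
    }
  }
}
"""


def build_request(query: str, url: str = HUGR_GRAPHQL_URL) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(query: str, url: str = HUGR_GRAPHQL_URL) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `execute(LOAD_SOURCE)` would return the decoded GraphQL response. If authentication is enabled, the request would also need whatever credentials your deployment expects.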
Frequently Asked Questions
hugr is an open-source Data Mesh platform and high-performance GraphQL backend for accessing distributed data sources. It provides a unified GraphQL API across diverse sources including databases (PostgreSQL, MySQL, DuckDB, SQL Server), data lakes (DuckLake, Apache Iceberg), file formats (Parquet, Delta Lake), and REST APIs.
hugr enables rapid API development, analytics & BI, geospatial processing, and serves as a universal data access layer for applications.
Data Mesh is a decentralized approach to data architecture that treats data as a product, with domain-specific ownership. hugr enables Data Mesh by providing:
- Modular schema definitions that can be reused across different sources
- Federated access through a single GraphQL API
- Domain-specific modeling and independent scaling
- Decentralized data ownership while maintaining unified access
hugr supports multiple data source types:
- Relational Databases: DuckDB, PostgreSQL (with PostGIS, TimescaleDB, pgvector), MySQL, SQL Server / Azure SQL
- Data Lakes: DuckLake (snapshot-based time-travel, schema versioning), Apache Iceberg (REST catalogs, AWS Glue, S3 Tables — with time-travel and self-describing schema)
- File Formats: Parquet, Delta Lake, CSV, JSON
- Spatial Formats: GeoParquet, GeoJSON, Shapefiles (GDAL compatible)
- Services: REST APIs with authentication (HTTP Basic, ApiKey, OAuth2)
- Storage: Local files and cloud object storage (S3-compatible)
Data sources are described using GraphQL SDL (Schema Definition Language) with hugr-specific directives. Key directives include:
- @table: Define database tables
- @view: Define views with SQL expressions
- @field_references: Define relationships between tables
- @join: Define subquery fields in schema for data selection
- @module: Organize schema into logical modules
- @function: Define custom functions
Schema files are stored in catalogs and can be located in file systems, HTTP endpoints, or S3 buckets.
Queries:
- Basic CRUD operations with filtering, sorting, and pagination
- Complex aggregations (count, sum, avg, min, max) and bucket aggregations
- Cross-source queries and relationships
- Spatial joins and geospatial operations
- Vector search for semantic similarity
Mutations:
- Insert records with nested relations
- Update multiple records with filters
- Delete with conditional filters
- Full transaction support within single requests
Yes, hugr provides a comprehensive two-level caching system:
- L1 Cache (In-Memory): Fast local cache for quick access
- L2 Cache (Distributed): Redis/Memcached for shared cache across cluster nodes
Caching is controlled via directives:
- @cache: Enable caching with configurable TTL and tags
- @no_cache: Disable caching for real-time data
- @invalidate_cache: Force cache refresh
Automatic cache invalidation occurs on mutations based on tags.
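To make the two-level idea concrete, here is a minimal sketch of an in-memory L1 in front of a shared L2, with TTLs and tag-based invalidation. It is not hugr's implementation: the class and method names are hypothetical, and a plain dict stands in for Redis/Memcached.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional, Set, Tuple

# An entry is (expires_at, value, tags).
Entry = Tuple[float, Any, Set[str]]


class TwoLevelCache:
    """L1 is a private dict; L2 is a shared dict standing in for Redis/Memcached."""

    def __init__(self, shared: Dict[str, Entry],
                 clock: Callable[[], float] = time.time) -> None:
        self._l1: Dict[str, Entry] = {}
        self._l2 = shared
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None on a miss or an expired entry."""
        for level in (self._l1, self._l2):
            entry = level.get(key)
            if entry is None:
                continue
            if entry[0] > self._clock():
                self._l1[key] = entry  # promote an L2 hit into L1
                return entry[1]
            del level[key]  # expired: drop it and keep looking
        return None

    def set(self, key: str, value: Any, ttl: float,
            tags: Iterable[str] = ()) -> None:
        """Write through to both levels with the same expiry and tags."""
        entry = (self._clock() + ttl, value, set(tags))
        self._l1[key] = entry
        self._l2[key] = entry

    def invalidate_tags(self, tags: Iterable[str]) -> None:
        """Drop every entry carrying one of the tags, e.g. after a mutation."""
        doomed = set(tags)
        for level in (self._l1, self._l2):
            for key in [k for k, e in level.items() if e[2] & doomed]:
                del level[key]
```

In this sketch, invalidation only clears the calling node's L1 and the shared L2. Other nodes keep their L1 copies until the TTL expires, which is one reason short TTLs or a cross-node notification are usually paired with this design.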
hugr supports multiple authentication methods:
- API Keys: Static keys for service-to-service communication, with support for managed keys stored in database
- OAuth2/JWT: Token-based authentication with standard JWT claims and custom claim mapping
- OIDC: OpenID Connect for enterprise identity providers (Google, Auth0, Keycloak, Azure AD)
- Anonymous: Unauthenticated access with limited permissions
Multiple methods can be enabled simultaneously, and hugr tries each in order.
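The "try each method in order" behavior can be pictured as a chain of authenticators where the first one to recognize the request wins. The sketch below is illustrative only: the `Identity` type, the `x-api-key` header name, and the fallback to anonymous access when a key is unknown are assumptions, not hugr's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Identity:
    user: str
    role: str
    method: str


# An authenticator inspects request headers and returns an Identity or None.
Authenticator = Callable[[Mapping[str, str]], Optional[Identity]]


def api_key_auth(valid_keys: Mapping[str, str]) -> Authenticator:
    """Map static API keys to roles (the header name is hypothetical)."""
    def check(headers: Mapping[str, str]) -> Optional[Identity]:
        key = headers.get("x-api-key")
        role = valid_keys.get(key) if key else None
        return Identity("service", role, "api_key") if role else None
    return check


def anonymous_auth(role: str = "anonymous") -> Authenticator:
    """Accept every request with a limited role."""
    def check(headers: Mapping[str, str]) -> Optional[Identity]:
        return Identity("anonymous", role, "anonymous")
    return check


def authenticate(headers: Mapping[str, str],
                 chain: Sequence[Authenticator]) -> Optional[Identity]:
    """Try each enabled method in order; the first match wins."""
    for method in chain:
        identity = method(headers)
        if identity is not None:
            return identity
    return None
```

Placing `anonymous_auth()` last in the chain makes it a catch-all, so the order of the chain decides which credentials take precedence.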
hugr uses role-based access control (RBAC) managed through GraphQL API:
- Roles: Define user roles in the roles table
- Permissions: Configure field-level and type-level access in role_permissions table
- Row-Level Security: Apply mandatory filters to restrict data access
- Default Values: Auto-inject values in mutations (e.g., user_id, tenant_id)
Permissions support wildcards (*) for broad rules with specific exceptions. Access is open by default; add permission entries to restrict.
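One way to read "wildcards with specific exceptions" and "open by default" is a most-specific-match lookup. The sketch below assumes that rules are keyed by type and field name and that a more specific entry overrides a broader one. The rule shape and the `is_allowed` function are hypothetical and do not reflect the actual columns of the role_permissions table.

```python
from typing import Dict, Tuple

# (type_name, field_name) -> True to allow, False to deny. "*" matches anything.
Rules = Dict[Tuple[str, str], bool]


def is_allowed(rules: Rules, type_name: str, field_name: str) -> bool:
    """Resolve access from the most specific matching rule to the broadest."""
    candidates = (
        (type_name, field_name),  # exact exception
        (type_name, "*"),         # every field of one type
        ("*", "*"),               # everything
    )
    for key in candidates:
        if key in rules:
            return rules[key]
    return True  # open by default: no matching entry means no restriction


# Deny everything, re-open the orders type, but keep one field hidden.
example_rules: Rules = {
    ("*", "*"): False,
    ("orders", "*"): True,
    ("orders", "margin"): False,
}
```

With `example_rules`, a role could read `orders.id` but not `orders.margin` or anything on other types, while an empty rule set leaves every field accessible.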
DuckDB is a high-performance analytical database engine optimized for OLAP workloads. hugr uses DuckDB as its core query engine because:
- Optimized for analytical queries and aggregations
- Native support for multiple data formats (Parquet, CSV, JSON)
- In-process execution with efficient memory usage
- Excellent performance for large-scale data processing
- Can attach external databases (PostgreSQL, MySQL) and query them together
Yes, hugr has native support for geospatial operations:
- Native Geometry scalar type for spatial fields
- Support for PostGIS (PostgreSQL) and DuckDB spatial extension
- Spatial file formats: GeoParquet, GeoJSON, Shapefiles
- Spatial joins and aggregations across data sources
- Distance-based queries and spatial relationships
- H3 clustering for hierarchical spatial indexing
hugr is designed for enterprise-scale deployments:
- Horizontal Scaling: Stateless nodes that can be added/removed dynamically
- Cluster Mode: Multi-node operation with load balancing and fault tolerance
- Caching: Two-level cache (in-memory + Redis/Memcached) reduces database load
- Performance: Query optimization and pushdown to data sources
- Kubernetes Ready: Helm charts for easy K8s deployment
hugr supports multiple output formats:
- GraphQL JSON: Standard GraphQL response format
- Arrow IPC: Efficient binary format for large datasets via Hugr multipart IPC protocol
- Python Integration: Direct export to pandas DataFrame and GeoDataFrame
- JQ Transformations: Server-side data transformation with custom JSON output
The Arrow IPC protocol enables efficient streaming of large datasets to analytics and ML pipelines.
Powered by DuckDB
hugr uses DuckDB, the fast in-process analytical database, as its core engine. This enables cross-source JOINs and aggregations directly in memory, combining data from PostgreSQL, Parquet files on S3, CSV, and geospatial formats in a single GraphQL query. Because DuckDB runs in-process, there is no network hop between hugr and its query engine, and its OLAP-optimized execution makes hugr a strong choice for analytic workloads and data mesh architectures.