Embeddings and Semantic Search

An embeddings data source lets you integrate vector embedding models into your Hugr instance. With it, you can generate vector representations of text data and run semantic search queries based on vector similarity.

The embeddings model must be accessible through a REST API endpoint that accepts POST requests with a JSON payload containing the text to embed. The response must include the generated vector. The request and response structures must match those of the OpenAI embeddings API.
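
As a rough illustration of the OpenAI-style request and response shapes mentioned above, the following Python sketch builds a request body and extracts the vector from a response. The helper names and the sample values are hypothetical; only the `model`, `input`, and `data[0].embedding` fields follow the OpenAI embeddings format.

```python
import json


def build_embedding_request(text: str, model: str) -> bytes:
    """Serialize an OpenAI-style embeddings request body."""
    return json.dumps({"model": model, "input": text}).encode("utf-8")


def parse_embedding_response(body: bytes) -> list:
    """Extract the first embedding vector from an OpenAI-style response."""
    payload = json.loads(body)
    return payload["data"][0]["embedding"]


# Sample response in the OpenAI embeddings format (the numbers are made up).
sample_response = json.dumps({
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, -0.2, 0.3]}],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}).encode("utf-8")

request_body = build_embedding_request("Hello, world!", "text-embedding-3-small")
vector = parse_embedding_response(sample_response)
```

A custom model server that accepts and returns these shapes should be usable in place of the OpenAI endpoint.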

Defining an Embeddings Data Source

To define an embeddings data source, use the GraphQL API to add a record to the data_sources table. Set the type of the data source to embeddings. The path must contain the URL of the embeddings model endpoint and can carry the following additional parameters:

model - model name, e.g. text-embedding-3-small
api_key - API key for authentication, if required by the model provider
timeout - request timeout in seconds (default is 10)
api_key_header - custom header name for the API key (default is Authorization)
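
The path value can be assembled programmatically. The sketch below assumes that all of the parameters listed above are passed as URL query parameters, in the same way the example path on this page passes model and api_key; the helper name `build_embeddings_path` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_embeddings_path(endpoint: str, **params: Optional[str]) -> str:
    """Append data source parameters to the endpoint URL as a query string."""
    # Skip parameters that were not provided so the documented defaults apply.
    query = {key: value for key, value in params.items() if value is not None}
    return f"{endpoint}?{urlencode(query)}" if query else endpoint


path = build_embeddings_path(
    "https://api.openai.com/v1/embeddings",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    timeout="30",  # overrides the default of 10 seconds
)
```

Using `urlencode` keeps values with special characters (for example, an API key containing `+` or `&`) from breaking the URL.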

mutation addEmbeddingsDataSource($data: data_sources_mut_input_data! = {}) {
  core {
    insert_data_sources(data: $data) {
      name
      description
      as_module
      disabled
      path
      prefix
      read_only
      self_defined
      type
    }
  }
}

Variables:

{
  "data": {
    "name": "openai_embeddings",
    "type": "embeddings",
    "path": "https://api.openai.com/v1/embeddings?model=text-embedding-3-small&api_key=YOUR_API_KEY",
    "prefix": "",
    "description": "OpenAI Embeddings Model"
  }
}
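
The mutation and its variables can be sent as a standard GraphQL-over-HTTP POST request. The following is a minimal sketch using only the Python standard library. The endpoint address `http://localhost:15000/query` is an assumption and should be replaced with the address of your Hugr deployment, and the selection set is shortened for brevity.

```python
import json
import urllib.request

# Assumption: address of the Hugr GraphQL endpoint; adjust to your deployment.
HUGR_URL = "http://localhost:15000/query"

ADD_DATA_SOURCE = """
mutation addEmbeddingsDataSource($data: data_sources_mut_input_data! = {}) {
  core {
    insert_data_sources(data: $data) {
      name
      type
      path
    }
  }
}
"""


def build_graphql_request(url: str, query: str, variables: dict) -> urllib.request.Request:
    """Wrap a GraphQL operation and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(HUGR_URL, ADD_DATA_SOURCE, {
    "data": {
        "name": "openai_embeddings",
        "type": "embeddings",
        "path": "https://api.openai.com/v1/embeddings?model=text-embedding-3-small&api_key=YOUR_API_KEY",
        "prefix": "",
        "description": "OpenAI Embeddings Model",
    }
})

# To actually send the request against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```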

Once the data source is registered, you can generate embeddings for text through the unified GraphQL API:

query {
    function {
        create_embedding(
            model: "openai_embeddings"
            input: "Hello, world!"
        )
    }
}

You can also attach the model to a table or view that contains a vector field. Use the @embeddings(model: "openai_embeddings", vector: "vec", distance: Cosine) directive to specify the model for the vector field. The vector field must be of type Vector.

Once the directive is in place, you can pass the semantic argument in queries, as in the query below, to perform semantic search based on vector similarity.

type documents @table(name: "documents") 
    @embeddings(
        model: "openai_embeddings"
        vector: "vec"
        distance: Cosine
    ) {
  id: BigInt! @pk @default(sequence: "documents_id_seq")
  category: String!
  content: String!
  vec: Vector! @dim(len: 1536)
}

query {
  documents(
    filter: { category: { eq: "news" } }
    semantic: {
      query: "Hugr is an open-source data platform"
      limit: 5
    }
  ) {
    id
    content
    vec
  }
}
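
Conceptually, a search with distance: Cosine ranks rows by the cosine distance between the embedding of the query text and each stored vector, and returns the closest ones up to limit. The Python sketch below illustrates that ranking on toy two-dimensional vectors. It is only an illustration of the idea, not a description of how Hugr executes the search, and the function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def semantic_search(query_vec, rows, limit):
    """Sort rows by distance to the query vector and keep the closest ones."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row["vec"]))
    return ranked[:limit]


# Toy data: real embeddings have many more dimensions (1536 in the schema above).
rows = [
    {"id": 1, "vec": [1.0, 0.0]},
    {"id": 2, "vec": [0.0, 1.0]},
    {"id": 3, "vec": [0.8, 0.6]},
]

top = semantic_search([1.0, 0.0], rows, limit=2)
```

Here the row pointing in the same direction as the query ranks first, the partially aligned row second, and the orthogonal row is cut off by the limit.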

Mutations on tables with vector fields are also supported. The vector is generated automatically using the specified embeddings model.

mutation {
  insert_documents(
    data: {
        category: "news"
        content: "Hugr is an open-source data platform"
    }
    summary: "Hugr is an open-source data platform that allows you to create GraphQL APIs over your databases."
  ) {
    id
    vec
  }
  update_documents(
    filter: { id: { eq: 1 } }
    data: { content: "Updated content" }
    summary: "Updated summary"
  ) {
    success
    message
    affected_rows
  }
}
