H3 Spatial Example
Hugr can help you to create geoembeddings from your data based on the H3 DSG. H3 is a geospatial indexing system that allows you to represent geographic locations as hexagonal grids. This can be useful for various applications, such as spatial analysis, mapping, and data visualization.
Hugr provides a special type of data query called h3
that allows you to aggregate you data into h3 hexagons. The h3
query can be used to create geoembeddings from your data by aggregating it into H3 hexagons based on the spatial fields in your data tables and views.
Data source
We will use the OpenStreetMap (OSM) data for Baden-Württemberg, Germany, and Population data from the German Federal Statistical Office (Destatis) as our data sources.
Create a OSM data source
To create a data source for the OSM data you can use the existing OSM dbt project as a starting point or following by the Example to create a DuckDB database with the OSM data.
Additionally, we should extend the OSM data source to add calculated field for the county code (osm tag de:amtlicher_gemeindeschluesselz
) in osm_administrative_boundaries table. This will allow us to join the OSM data with the population data later.
We will add additional catalog file osm_extension.graphql
to the OSM data source with the following content:
extend type osm_administrative_boundaries {
de_code: String @sql(
exp: "(tags->>'de:amtlicher_gemeindeschluessel')"
)
}
And now we can add this file as a catalog source for the OSM BW:
mutation addOsmExtend{
core{
insert_catalog_sources(data:{
name: "osm_extend_de_code"
description: "Add de code to the osm admin boundaries"
type: "uriFile"
path: "~/workspace/examples/h3/osm_extend.graphql"
}){
name
path
type
}
insert_catalogs(data:{
data_source_name: "osm.bw"
catalog_name: "osm_extend_de_code"
}) {
success
affected_rows
}
}
}
Than you can reload the OSM BW data source to apply the changes:
mutation reloadOsmDataSource {
function{
reload_data_source(
name: "osm.bw"
) {
success
message
}
}
}
Create a Population data source
To create a data source for the population data, we will use the Destatis Zensus 2022 Excel file. We can convert this file to a DuckDB table using the hugr
engine.
To do this follow the steps:
- Download the Excel file and place it in your workspace, e.g.
~/workspace/examples/h3/population.xlsx
. - Save the worksheet
Regionaltabelle_Bevoelkerung
as a csv file, e.g.examples/h3/population.csv
. - To create a duckdb file with the population data, you can run following bash script:
#!/bin/bash
duckdb examples/h3/zensus.duckdb -c "CREATE TABLE population AS SELECT * FROM read_csv_auto('examples/h3/population.csv');"
- Now we can create a DuckDB data source for the population data:
mutation addZensusDataSource {
core {
insert_data_sources(data: {
name: "zensus",
type: "duckdb",
prefix: "zensus",
description: "Zensus 2022 population data",
path: "/workspace/examples/h3/zensus.duckdb"
as_module: true,
read_only: true,
self_defined: true,
}) {
success
affected_rows
}
}
}
- Load the data source:
mutation loadZensusDataSource {
core {
load_data_source(name: "zensus") {
success
message
}
}
}
Join OSM and Population data
Now you are able to join the OSM data with the population data using the de_code
field we added to the OSM data source. You can use the following query to get the population data for each OSM administrative boundary:
query debw{
osm{
bw{
osm_administrative_boundaries(filter: {admin_level: {eq: 6}}){
osm_id
name
de_code
_join(fields:["de_code"]){
zensus_population(fields: ["db_RS"]){
db_RS
Name
EWZ
}
}
}
}
}
}
This query will return the OSM administrative boundaries for admin level 7 (Landkreis) for Baden-Württemberg with the population data from the Zensus 2022. The de_code
field is used to join the OSM data with the population data.
Split population data into H3 hexagons distributed by sum of residential building areas
query deOSMByH3 {
h3(resolution: 6) @stats{
cell
resolution
data{
lk: osm_bw_osm_administrative_boundaries_aggregation(
field: "geom"
filter:{admin_level: {eq:6}},
divide_values:false
inner: true
){
pop: _join(fields: ["de_code"]) {
zensus_population(fields: ["db_RS"]){
EWZ{
sum
}
}
}
}
houses: osm_bw_osm_buildings_aggregation(
field: "geom"
filter: {building_class: {eq:"residential"}}
){
_rows_count
area_sqm{
sum
}
}
}
}
}
This query will return the H3 hexagons in Baden-Württemberg with the population data from the Zensus 2022 and the residential building areas from the OSM data. The h3
query is used to aggregate the OSM data into H3 hexagons based on the spatial fields in the OSM data tables.
Let's extend the query to calculate the population density in each H3 hexagon:
query deOSMByH3 {
h3(resolution: 6) @stats{
cell
resolution
data{
lk: osm_bw_osm_administrative_boundaries_aggregation(
field: "geom"
filter:{admin_level: {eq:6}},
divide_values:false
inner: true
){
pop: _join(fields: ["de_code"]) {
zensus_population(fields: ["db_RS"]){
EWZ{
sum
}
}
}
}
houses: osm_bw_osm_buildings_aggregation(
field: "geom"
filter: {building_class: {eq:"residential"}}
){
_rows_count
area_sqm{
sum
}
}
}
pop: distribution_by(
numerator: "data.lk.pop.zensus_population.EWZ.sum"
denominator: "data.houses.area_sqm.sum"
){
value
ratio
numerator
denominator
denominator_total
}
}
}
This query will return the H3 hexagons in Baden-Württemberg with the population density calculated as the ratio of the population to the residential building area. The distribution_by
field is used to calculate the population density in each H3 hexagon by the following formula:
value = numerator * denominator / sum(denominator)
Where numerator
is the population data and denominator
is the residential building area in each H3 hexagon. For the population data aggregation we have used the inner: true
, that means that the sum of denominator will be calculated only for the H3 hexagons that have population data.
As well you can calculate the population density by buckets, to do this you can use the distribution_by_bucket
field:
query deOSMByH3 {
h3(resolution: 6) @stats{
cell
resolution
data{
lk: osm_bw_osm_administrative_boundaries_aggregation(
field: "geom"
filter:{admin_level: {eq:6}},
divide_values:false
inner: true
){
pop: _join(fields: ["de_code"]) {
zensus_population(fields: ["db_RS"]){
EWZ{
sum
}
}
}
}
houses_bucket: osm_bw_osm_buildings_bucket_aggregation(
field: "geom"
filter: {building_class: {eq:"residential"}}
){
key{
building_type
}
aggregations{
_rows_count
area_sqm{
sum
}
}
}
}
pop_by_bucket: distribution_by_bucket(
numerator: "data.lk.pop.zensus_population.EWZ.sum"
denominator_key: "data.houses_bucket.key"
denominator: "data.houses_bucket.aggregations.area_sqm.sum"
){
denominator_key
value
ratio
}
}
}
This query will return the H3 hexagons in Baden-Württemberg with the population density calculated by buckets based on the residential building area. The distribution_by_bucket
field is used to calculate the population density in each H3 hexagon by the same formula, but denominator points to the bucket aggregation of the residential building area. The denominator_key
is used to select key field in the base aggregation.
As well numerator can point to the bucket aggregation, so you can calculate the distribution by bucket for the numerator as well.
Conclusion
In this example, we have shown how to use the hugr
engine to create geoembeddings from your data based on the H3 DSG. We have used the OSM data for Baden-Württemberg and the population data from the Zensus 2022 to create H3 hexagons with population density calculated by residential building areas. The h3
query allows you to aggregate your data into H3 hexagons based on the spatial fields in your data tables and views, making it easy to create geoembeddings for various applications.
You can apply this approach to any spatial data you have, as long as you can define the spatial fields in your data tables and views. The hugr
engine provides a powerful way to work with spatial data and create geoembeddings that can be used for various applications, such as spatial analysis, mapping, data visualization and Machine Learning.