Geospatial SQL & Collations
Skill: databricks-dbsql
What You Can Build
Section titled “What You Can Build”Databricks SQL includes two geospatial function families — H3 for hexagonal grid indexing and ST for standard spatial operations — plus collation support for case-insensitive and locale-aware string comparisons. Together they handle proximity searches, point-in-polygon queries, spatial joins, and consistent string matching across languages.
In Action
Section titled “In Action”“Write SQL that indexes taxi trips by H3 cell, runs a proximity search, and rolls up trip counts to a coarser resolution.”
-- Index taxi trips by pickup location at resolution 12CREATE TABLE trips_h3 ASSELECT h3_longlatash3(pickup_longitude, pickup_latitude, 12) AS pickup_cell, h3_longlatash3(dropoff_longitude, dropoff_latitude, 12) AS dropoff_cell, *FROM taxi_trips;
-- Proximity search: cells within 2 rings of a locationSELECT explode(h3_kring(h3_longlatash3(-73.985, 40.748, 9), 2)) AS nearby_cell;
-- Roll up to coarser resolution for visualizationSELECT h3_toparent(pickup_cell, 7) AS parent_cell, count(*) AS trip_countFROM trips_h3 GROUP BY h3_toparent(pickup_cell, 7);Key decisions:
- H3 divides the Earth into hexagonal cells at 16 resolutions (0-15) — resolution 12 gives ~300m cells, resolution 9 gives ~5km cells
h3_kringreturns all cells within a grid distance, creating a proximity search without calculating actual distancesh3_toparentaggregates to coarser resolutions for map visualizations- H3 functions are available since DBR 11.2 with no additional installation
More Patterns
Section titled “More Patterns”Run point-in-polygon queries with ST functions
Section titled “Run point-in-polygon queries with ST functions”“Write SQL that finds stores within a delivery zone using native spatial types and ST_Contains.”
SELECT s.name, z.zone_nameFROM retail_stores sJOIN delivery_zones z ON ST_Contains(z.boundary, s.location);ST functions operate on GEOMETRY and GEOGRAPHY types with 80+ available functions. Use GEOMETRY for projected coordinate systems (distances in meters) and GEOGRAPHY for longitude/latitude on WGS 84. Requires DBR 17.1+.
Combine H3 and ST for fast spatial joins
Section titled “Combine H3 and ST for fast spatial joins”“Write SQL that uses H3 for fast pre-filtering and ST for precise distance calculation to find customers within 2km of each store.”
SELECT s.name AS store, c.name AS customer, ST_DistanceSphere(s.location, c.location) AS distance_mFROM store_h3 sJOIN customer_h3 c ON c.h3_cell IN (SELECT explode(h3_kring(s.h3_cell, 2)))WHERE ST_DistanceSphere(s.location, c.location) < 2000;H3 pre-filtering narrows the candidate set to nearby cells (cheap integer comparisons), then ST_DistanceSphere calculates precise spherical distance on the remaining rows. Spatial joins on serverless SQL warehouses see up to 17x performance improvement with automatic spatial indexing.
Set up case-insensitive string matching across a catalog
Section titled “Set up case-insensitive string matching across a catalog”“Write SQL that creates a catalog where all string comparisons are case-insensitive by default.”
-- Catalog level (DBR 17.1+)CREATE CATALOG customer_cat DEFAULT COLLATION UNICODE_CI_AI;
-- Table level (DBR 16.3+)CREATE TABLE users ( id INT, username STRING, email STRING, password_hash STRING COLLATE UTF8_BINARY) DEFAULT COLLATION UNICODE_CI;
-- Query levelSELECT * FROM productsWHERE name COLLATE UNICODE_CI = 'laptop';Collation inheritance flows: Catalog > Schema > Table > Column > Expression. Override at any level. UNICODE_CI handles case-insensitive comparisons; UNICODE_CI_AI adds accent-insensitivity (so “Montreal” matches “Montreal”). STARTSWITH and ENDSWITH with UTF8_LCASE show up to 10x speedup compared to LOWER() workarounds.
Calculate delivery zone areas with projected coordinates
Section titled “Calculate delivery zone areas with projected coordinates”“Write SQL that computes the area of delivery zones in square meters using a projected coordinate system.”
SELECT zone_name, ST_Area(ST_Transform(boundary, 5070)) AS area_sq_metersFROM delivery_zones;Always transform to a projected coordinate system (like SRID 5070, NAD83/Albers) before calculating area or distance in meters. Geographic coordinates (WGS 84) use degrees, which produce meaningless area values without projection.
Watch Out For
Section titled “Watch Out For”- Calculating distances on GEOMETRY types without projection —
ST_Distanceon unprojected GEOMETRY returns values in degrees, not meters. UseST_DistanceSpherefor quick spherical approximations or transform to a projected CRS first. - Using GEOMETRY columns on non-serverless compute —
GEOMETRYandGEOGRAPHYcolumn types inCREATE TABLErequire serverless compute with DBR 17.1+. On older warehouses, store WKT strings and convert withST_GeomFromText()at query time. - Mixing two different explicit collations in the same expression — this raises an error. Use
COLLATEon one side to harmonize, or set a default collation at the table or catalog level. - Assuming
ALTER SCHEMA DEFAULT COLLATIONretroactively changes existing columns — it only affects newly created objects. Existing tables and columns keep their original collation.