H3: For indexing geographies into a hexagonal grid, by Uber

130 points by wiradikusuma 4 months ago

I've read all the comments here and still don't understand the rationale for hexagons as opposed to squares.

In every sense, squares seem to be much easier to reason about and easier to hierarchically partition than hexagons are.

There are certain advantages to hexagons in certain contexts, like six degrees of movement instead of four in board games, but I don't see how any of those advantages translate here for geographical indexing.

I'd love to understand why hexagons as opposed to squares in this context are a superior solution rather than unnecessary complexity?

korkoros 4 months ago

Squares have a contiguity problem: does a square in a grid have 4 neighbors (rook contiguity) or 8 (queen contiguity)? Hexagons do not have this problem. All neighbors of a hexagon share a full edge, not just a vertex. All neighbors of a hex also have their centers equidistant from that hex. A massive number of spatial problems turn on neighborhood definitions, and hexes are almost always a better representation of reality than squares.
- crazygringo 4 months ago
  
  But if you assume rook contiguity then it seems equivalent to hexagons. All neighbors share a full edge, all neighboring centers are equidistant.
  I get that you're saying hexes are almost always a better representation. I still don't see a concrete example of why, for geographical indexing specifically.
  [Edit: sibling reply explained that at the end of the day, it's not about indexing but rather route planning.]
  
  alexanderchr 4 months ago
  
  Let's say you are looking for the closest gas station. One that is in the close corner of a "diagonal neighbour" would be closer than most points in the "edge neighbours". So if you want to find something nearby, you'd usually want to look at all 8 neighbours. The hexagonal neighbours look more like a circle centered in the original hexagon, thus more convenient for that purpose.
  
  ruined 4 months ago
  
  and if you assume cows are spherical and frictionless it's quite convenient
  sometimes roads go diagonal
hankchinaski 4 months ago

From the docs
>The cell shape of that grid system is an important consideration. For simplicity, it should be a polygon that tiles regularly: the triangle, the square, or the hexagon. Of these, triangles and squares have neighbors with different distances. Triangles have three different distances, and squares have two different distances. For hexagons, all neighbors are equidistant
And
>This property allows for simpler analysis of movement. Hexagons have the property of expanding rings of neighbors approximating circles
>Hexagons are also optimally space-filling. On average, a polygon may be filled with hexagon tiles with a smaller margin of error than would be present with square tiles.
duncanfwalker 4 months ago

(closer to) equal distance between all the neighbouring cells is really helpful when you're modelling a problem as a network - eg analysing and planning routes and capacity.
Hierarchical partitioning with exact containment is useful when aggregating but that's not always the most critical property.
- crazygringo 4 months ago
  
  Ah OK thank you that is making more sense -- that since Uber is primarily about navigation and route planning, hexagons minimize the number of cells or total land coverage you need to retrieve/analyze between two points? I think that was the missing piece of information I was looking for!
poeticLLama 4 months ago

Key reasons for the choice of hexagons in h3 - hexagons have have low distortion as compared to squares when you attempt tessellation of a spherical shape. You just require 12 pentagons to complete this tessellation. - all six neighboring cells are equidistant from the center of a cell.
One of the use case Uber was trying to solve here, was to show which regions had surge pricing - on the uber app for the drivers. With h3 and hexagons you can argue it looks a teensy bit better with smoother gradients https://h3geo.org/docs/comparisons/s2.
I am not sure whether Uber uses h3 for other use cases like - finding the closest drivers next to a location.
See this presentation from Uber on this: https://www.youtube.com/watch?v=ay2uwtRO3QE
serjester 4 months ago

I created an h3 polars implementation and I think one of the big one's is flow modeling. As other's have mentioned, each hexagon has 6 equidistant neighbors. As a result, it lends itself to analyzing telematics data (hence Uber's backing of h3) - I attached a colab notebook where I do some of this. Trying to do this sort of analysis any other way would be incredibly painful.
[1] https://github.com/Filimoa/polars-h3
[2] https://drive.google.com/file/d/18jIVEbE_1QbwTbHdMqj0AVqguf2...
zamadatix 4 months ago

This will seem like a nit at first but it's really a key driver to why you'd look at other shapes: a hexagon is more comparable to a quadrangle, a square is more comparable to what is called "a regular hexagon". Regular meaning the sides and angles which make up the shape are all equal. In the 2D world, such as on board games, equilateral triangles, squares, and regular hexagons can all tile a plane perfectly. This is not the case for the surface of a sphere, there is no tiling regular polygon in that case.
From there you're just trying to optimize uniformity in distance to neighbors, how big the adjustments to the irregular polygons need to be to get them to tile on the surface are, how easy the polygon is to split up into smaller similarly shaped variants of itself as sub tiles, and trying to be somewhat close to a circle in shape as that means the average distance to the center of the area defined by the index is as close to as it can be.
If you chunk through those you'll find quadrangles aren't attractively simple anymore and hexagons tend to optimize the parameters very well. H3 actually uses both hexagons and the occasional pentagon (12 total, no matter the zoom level). It all comes down to "tiling isn't going to be perfect - what is the most optimal answer for the purpose of the tiling".
jandrewrogers 4 months ago

For the purposes of visualization, you want each cell to enclose approximately equal surface area. These are your "pixels". H3 is a way of rendering any subset of a sphere for display.
While squares have superior properties for analytical geospatial data processing that H3 doesn't have, such as congruency, they really only work for Euclidean spaces and the surface of a 2-sphere is non-Euclidean. Any system using squares will be a poor approximation of "equal area" relative to hexagons, which makes them poor for visualization. To use squares for indexing, you need an extra step that allows non-Euclidean space to be addressable from Euclidean space. There are two main ways of doing this.
First, one can project the surface of the sphere onto the surface of a Euclidean cube. Second, one can use an embedding, indexing the 2-sphere in Euclidean 3-space. Both of these can be trivially projected to a hexagonal system like H3 for visualization purposes and commonly are.
If you primarily need visualization and your data is small, using H3 eliminates the step where you need to figure out how to map non-Euclidean data models to Euclidean data models. If you are doing large-scale geospatial processing, it becomes worth the effort for both scalability and performance reasons.
feverzsj 4 months ago

In practice, there is little to no advantage. Any spacing filling index is trying to fit into an ordered index like B-tree, so existing horizontal scaling solution can be directly applied. The problem is it's much harder and inefficient to implement computational geometry algorithms on spacing filling index than a real spatial index like R-tree. On the other hand, scaling can be easily solved by table partition on different area.
doktorhladnjak 4 months ago

You can more smoothly interpolate a function for points on a surface using hexagons than with squares. For Uber, think about things like surge pricing. If you compute a surge % for each hexagon, you can interpolate it for each rider location.
gleenn 4 months ago

IIRC it has to do with tiling the globe. You can't correctly tild a sphere with swuares. On a local level, it seems easier to reason about squares on a flat surface, but the Earth is round and Uber is international.
- crazygringo 4 months ago
  
  Thanks! I'm still a bit confused because you can't tile the globe with hexagons either, if I understand correctly -- everything I can find online says you need a mix of hexagons and pentagons, and you can think of a soccer ball as an example.
  Has Uber figured out a way to do it with just hexagons?
  
  setr 4 months ago
  
  iirc it’s exactly 12 pentagons required to tile the sphere with hexagons, independent of hex-tiling size. So it’s almost fully consistent, and uber tries to put most of those pentagons in the ocean
  
  seanhunter 4 months ago
  
  That is correct. These are called Goldberg Polyhedra, and the most common example is a football/soccer ball.
  https://en.wikipedia.org/wiki/Goldberg_polyhedron
  
  noplacelikehome 4 months ago
  
  Since cars don't need to cross water, maybe this isn't actually relevant. Uber aren't trying to model the planet, just the routes within the geographies they support.
  
  tiagod 4 months ago
  
  h3 has the required pentagons in the ocean (dymaxion icosahedron vertices)
  
  Tistron 4 months ago
  
  At least one of the pentagons is on Norway. And another one contains Beijing.
  
  zamadatix 4 months ago
  
  I think what GP is saying is if you zoom to the max level 15 there remain only 12 cells which come out as pentagons and all 12 of those are in the ocean (albeit sometimes very near shore). If you zoom to the global scale those 12 get inherited into ever larger parent cells which do cover large areas of the Earth though, as you found.
KaiserPro 4 months ago

I'm not sure either. http://s2geometry.io/ works well and you only have to really worry about 4 points to draw the circle.

dplarson 4 months ago

I found this presentation helpful for an intro to H3's design/motivation: "Engineering Sub-City Geos for a Hyper-Local Marketplace with Uber": https://youtu.be/wDuKeUkNLkQ?si=-9JmxZQJ2LZo6Kh4

naiv 4 months ago

This online tool gives you great idea about what this means on different levels:

https://wolf-h3-viewer.glitch.me/

Traubenfuchs 4 months ago

Is the big thing here that those hexagons have the advantages of a circles (almost even max distance in all directions from center) with the advantages of squares (no overlap)?

ethbr1 4 months ago

Oh, mon frère... there are many reasons.
Hexagons. Are the bestagons. https://m.youtube.com/watch?v=thOifuHs6eY
cpa 4 months ago

Yes, but it's important to note that it's quite a specialized index. - It's an index that doesn't depend on the data, unlike traditional r-tree indices. This means non-uniform data won't be queried as efficiently as with rtree, but it may be faster to build.
- This data independence property is VERY important for distributed or streaming queries. For example, if you want to join datasets using Spark or other big data tools, each team can add a column for h3 cells independently and join somewhat efficiently. For large volumes of data, constructing the rtree is just not feasible, or more precisely, very disconnected from the rest of the "data ecosystem".
- It doesn't work with any coordinate reference system other than EPSG 4326 (which you may want if you only work on specific geographies to get more precision in your floats)
- It's clearly built with points in mind. Polygons, curves, or lines are an afterthought. For example, the polygonToCells function returns a set of cells that are entirely within the polygon. If you want to join, you'd need to also have the set of all cells that entirely contain the polygon. I've never found a reliable way to get that.
That being said, it's not bad at all, but if you don't have so much data that you can't compute rtree indices, just stick with PostGIS.
- riordan 4 months ago
  
  > Polygons, curves, or lines are an afterthought. For example, the polygonToCells function returns a set of cells that are entirely within the polygon. If you want to join, you'd need to also have the set of all cells that entirely contain the polygon. I've never found a reliable way to get that.
  With v4 of h3 they (finally) have a clean syntax for this with polygonToCellsExperimental[0].
  Now there’s options for
  - Cell center is contained in the shape (default) - Cell is fully contained in the shape - Overlapping (covering): Cell overlaps the shape at any point - BBOX: Cell bounding box overlaps shape
  Makes life a fair bit easier if you’ve gotta deal with H3 polys. And if you’re working locally, DuckDB Spatial’s r-tree indexing[1] can make for a nice stand-in for PostGIS as a quick point-in-polygon solution without the need to spin up a service.
  [0]: https://h3geo.org/docs/api/regions/#polygontocellsexperiment... [1]: https://duckdb.org/docs/stable/extensions/spatial/r-tree_ind...
  
  cpa 4 months ago
  
  Nice, I did not know about polygonToCellsExperimental.
  As for DuckDB & rtree, it's alas not a replacement of postgis yet and the indices cannot be used (yet) in joins. In fact, I even have workflows where I iterate over rows in python and run duckdb queries one after the other rather than joining in just one query because of this very issue.
Epa095 4 months ago

Sqaures have overlap if you put them on a sphere, no?
There are some nice things with this. It contains cells at different levels (sizes). So you can use very small cells if you care about small areas, or large cells if you care more about larger areas. Those are given here https://h3geo.org/docs/core-library/restable/#average-area-i...
From a coordinate it is fast to get the cell at any level, so it's fast to group coordinates at whatever level you want.
If is less about comparing two coordinates, it is more about bering able to group large numbers of coordinates such that they go into non-overlapping groups, where everything in the same group is 'close' to each other.
- jillesvangurp 4 months ago
  
  Most grid systems don't have overlapping squares because they are quad trees. Each point identifies a single rectangle they are in, and all the parent rectangles that fully contain that rectangle.
  H3 doesn't have that property between levels. Cells at a level don't overlap. But the the seven children of a cell might overlap with neighboring parent cells.
  Of course rectangles aren't rectangular when projected on a sphere. Because there are no straight lines. Rectangles and hexagons are 2d shapes that are applied to the 2d projections of spheres. They only look like rectangles or hexagons in 2D.
- at0mic22 4 months ago
  
  I would assume you wouldn't care a lot about overlapping squares as the representation on the screen is mostly square, you don't order uber from the globe.
  Thus most screen projections are derived from wgs84. The idea behind h3 (as I see it) that when moving the map to lets say top-right, you'd need to fetch 1 leaf comparing to 3 leafs with rtree

kemotep 4 months ago

This is like the advantage of using 6 mile hexes in a tabletop rpg map.[0]

Each hex can be divided into smaller hexes until you get to the level of feet/meters as opposed to miles/kilometers.

[0]: https://steamtunnel.blogspot.com/2009/12/in-praise-of-6-mile...

ahoka 4 months ago

How is this better than space filling curves? Or does this solve a different problem? It’s a bit hard to see what and how it solves. Why hexagons?

jandrewrogers 4 months ago

It depends on what you are trying to do.
Hexagonal tessellations are optimized for display and visualization because they approximate equal surface areas on a sphere. They have poor properties for scalable and efficient analytical data processing because they are not congruent.
Indexing for scalable analytical processing on a sphere requires congruent equal volume decomposition, even if you only care about the surface. You can trivially project it to the surface later if needed. Binary space decomposition, of which space-filling curves are a subset, are strongly preferable for this type of indexing.
In practice, a lot of data processing systems will render data as an H3 tiles only for visualization as a final step. That conversion is fast and trivial and it makes pretty pictures. It is not as commonly used to index the underlying data model because scalability and performance is prohibitive unless the data is small.
meowtimemania 4 months ago

If you used curves it'd get complicated deciding where one area starts and the other ends. With hexagons it's easier to divide the world such that (mostly) no hexagon overlaps.
The purpose is to be able to predictably map any coordinate to its associated hexagon.
In database applications this makes it easier to query all data associated with a hexagonal area.
- xhkkffbf 4 months ago
  
  Why not just squares or rectangles? Is there a simple explanation?
  
  meowtimemania 3 months ago
  
  Because it’s hard to cover a sphere in rectangles and have no rectangle overlap.
  It’s the same reason soccer balls aren’t covered in squares/rectangles but rather hexagons.
  
  jt_b 4 months ago
  
  You want a shape that can represent an approximately equal area, anywhere on a 3 dimensional surface.

LegNeato 4 months ago

There is a super high quality Rust lib that implements this: https://github.com/HydroniumLabs/h3o

tomtomistaken 4 months ago

Hexagons could be a new kind of political entity of the future, where, for example, rewards for climate progress (measured by satellites) could be put into practice.

wbercx 4 months ago

I'm still surprised What3Words isn't more popular outside of the UK.

artemonster 4 months ago

[flagged]

mablopoule 4 months ago

While I also agree about the general stupidness of what3words and the clearly predatory move of trying to move GPS location behind a proprietary database, the article is not about what3words at all.
The link is about a grid system with different mathematical properties which is used at Uber.
zimpenfish 4 months ago

What, 3 words?!