MAY 25TH, 2020

GeoZero, a zero-copy API for processing geospatial data

For a few years now I have been getting more and more obsessed with the performance of my daily work tools. It took me a long time to realize how slow today’s software is compared to similar software I worked with 20 or even 30 years ago. Software bloat that eats up every hardware performance improvement seems to be a law of nature.

There was a short period a few years ago, when SSDs appeared, that gave me the feeling that computers were getting faster again. In the meantime this effect has disappeared. One of the few areas in computing that is really decades ahead of 20-year-old applications is computer games. Computer graphics keeps getting more realistic, which gives the impression that GPU speedups actually show up as software improvements.

Personally, I mostly solved the performance problem by switching from IDEs and GUI tools to a fast text editor and the console. Doing most of my daily work with these two tools (plus a web browser and a mail client) makes working on a computer pleasant again. My work is mostly about processing geospatial data: creating maps, interacting with maps and preparing data for maps or applications like routing. Software bloat is a problem in this area, too. There are so many software layers involved that speed optimization gets very difficult.

The development of t-rex, a vector tile server, gave me the chance to start from scratch again. The programming language of my choice was Rust, which has fantastic tooling and great features for writing fast and robust software. A vector tile server basically reads geospatial data, processes it and writes tiles in Protobuf format. When I started, a PostGIS implementation and geospatial and Protobuf libraries were already there. So the plan was to read the data into an existing geospatial library and use the processing functions of this library for clipping, simplifying, reprojecting, etc. But it wasn’t that easy. The PostGIS driver didn’t support different spatial reference systems and the main native geospatial library was limited to X/Y coordinates. So I ended up basically rewriting the PostGIS driver, but with my limited Rust skills back then I did not find a way to avoid copying geometries around all the time. So a typical data flow looks like this:

Read input data into an in-memory representation
Copy geometries into the in-memory representation of a geospatial library
Process data
Copy data into an in-memory representation of the output driver
Write data to disk

This data flow is quite common in geospatial applications. In C/C++ land, GDAL has its in-memory representation, GEOS has one and an application like QGIS copies data between these representations plus its own geometry format all the time.
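
As a rough illustration of this copy-heavy flow, here is a minimal Rust sketch. The types `DriverLine`, `LibLine` and `TileLine` and the functions around them are hypothetical stand-ins for a driver, a geometry library and an output encoder. They are not taken from t-rex, GDAL or GEOS. The point is only that every layer boundary allocates a new copy of the same coordinates.

```rust
// Hypothetical types: each layer owns its own geometry representation.
#[derive(Debug, PartialEq)]
struct DriverLine(Vec<(f64, f64)>); // filled by the input driver

#[derive(Debug, PartialEq)]
struct LibLine {
    // layout the (imaginary) geometry library expects
    xs: Vec<f64>,
    ys: Vec<f64>,
}

#[derive(Debug, PartialEq)]
struct TileLine(Vec<[i32; 2]>); // integer coordinates for the output encoder

/// Copy 1: driver representation -> library representation.
fn to_lib(line: &DriverLine) -> LibLine {
    LibLine {
        xs: line.0.iter().map(|p| p.0).collect(),
        ys: line.0.iter().map(|p| p.1).collect(),
    }
}

/// Processing step (here only a scale), which allocates yet another geometry.
fn scale(line: &LibLine, factor: f64) -> LibLine {
    LibLine {
        xs: line.xs.iter().map(|x| x * factor).collect(),
        ys: line.ys.iter().map(|y| y * factor).collect(),
    }
}

/// Copy 2: library representation -> output representation.
fn to_tile(line: &LibLine) -> TileLine {
    TileLine(
        line.xs
            .iter()
            .zip(&line.ys)
            .map(|(x, y)| [x.round() as i32, y.round() as i32])
            .collect(),
    )
}
```

Chaining `to_tile(&scale(&to_lib(&input), 2.0))` touches each coordinate three times and allocates three intermediate geometries for a single line.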

Two months ago I started another attempt to make geospatial software faster. FlatGeobuf was getting enough features to make it worth considering as a new geospatial format. Apart from its clever storage layout, which makes it considerably faster than existing formats, its ability to provide fast access over HTTP makes it a game changer. But that’s worth another blog post. To bring FlatGeobuf’s raw speed to applications like t-rex, I decided to write a zero-copy driver for it. And this time I found a solution, which is now GeoZero, a zero-copy API for processing geospatial data.

GeoZero defines an API for low-level access to geospatial data, avoiding yet another in-memory geometry representation. You can think of it as a SAX parser for geometries. Geometry libraries supporting this API can read data from any GeoZero data format reader without an intermediate in-memory representation. Besides FlatGeobuf, it currently supports WKT and SVG output and has an initial GeoJSON reader and writer. GeoZero is not only for reading and writing geospatial formats. There are also examples of using a GeoZero reader to build a KD-tree for fast neighbour searches or to find a labelling point with Polylabel. With a multiplexer you can combine multiple processors without the need for intermediate data storage. Example of a GeoZero data flow:

                                                 +-----------------------------> Polygon renderer
                                                 |
Input data  --> data reader -->  Multiplexer  ---+
                                                 |
                                                 +--> Label point calculator --> Text renderer
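
To make the SAX analogy and the multiplexer more concrete, here is a minimal, self-contained Rust sketch. The trait `GeomEvents` and the types `SvgPath`, `Bbox` and `Tee` are hypothetical names invented for this illustration and are not the actual GeoZero API. The bounding box merely stands in for a real label point calculation.

```rust
/// Hypothetical event sink, loosely modelled on a SAX handler.
/// Default implementations let a processor ignore events it does not need.
trait GeomEvents {
    fn line_begin(&mut self, _num_points: usize) {}
    fn xy(&mut self, _x: f64, _y: f64) {}
    fn line_end(&mut self) {}
}

/// "Reader": walks a flat [x0, y0, x1, y1, ...] buffer and emits events
/// without building a geometry object in between.
fn read_line<P: GeomEvents>(coords: &[f64], sink: &mut P) {
    sink.line_begin(coords.len() / 2);
    for pair in coords.chunks_exact(2) {
        sink.xy(pair[0], pair[1]);
    }
    sink.line_end();
}

/// Processor 1: writes the coordinates as SVG path data.
#[derive(Default)]
struct SvgPath {
    d: String,
}

impl GeomEvents for SvgPath {
    fn xy(&mut self, x: f64, y: f64) {
        let cmd = if self.d.is_empty() { "M" } else { " L" };
        self.d.push_str(&format!("{}{} {}", cmd, x, y));
    }
}

/// Processor 2: tracks the bounding box as [min_x, min_y, max_x, max_y].
#[derive(Default)]
struct Bbox {
    bounds: Option<[f64; 4]>,
}

impl GeomEvents for Bbox {
    fn xy(&mut self, x: f64, y: f64) {
        self.bounds = Some(match self.bounds {
            None => [x, y, x, y],
            Some([x0, y0, x1, y1]) => [x0.min(x), y0.min(y), x1.max(x), y1.max(y)],
        });
    }
}

/// Multiplexer: forwards every event to two processors.
struct Tee<A, B> {
    a: A,
    b: B,
}

impl<A: GeomEvents, B: GeomEvents> GeomEvents for Tee<A, B> {
    fn line_begin(&mut self, n: usize) {
        self.a.line_begin(n);
        self.b.line_begin(n);
    }
    fn xy(&mut self, x: f64, y: f64) {
        self.a.xy(x, y);
        self.b.xy(x, y);
    }
    fn line_end(&mut self) {
        self.a.line_end();
        self.b.line_end();
    }
}
```

A caller would build `Tee { a: SvgPath::default(), b: Bbox::default() }` and pass it to `read_line`. Both processors then see each coordinate exactly once, and the reader never has to know how many consumers are attached.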

GeoZero is now ready to get feedback from developers of geospatial libraries and applications. After these discussions, the next step will be implementing more drivers for reading and writing data formats, with a priority on WKB for PostGIS and GeoPackage, as well as adapters for GDAL and GEOS. Since GeoZero is written in pure Rust, it also compiles to WebAssembly. So all processing functions that don’t require external libraries also run in a web browser, which opens up many new possibilities. So stay tuned!

Pirmin Kalberer (@implgeo)
