The Case For Open Data Sharing

Posted on December 22, 2011

There are several competing strategies at play, so let's compare Open Geospatial Consortium (OGC) Web Services with Google, for example.

OGC wants the community to generate and agree on a model (called a schema) that all shared data will be transformed to fit. It also wants the data in a format that supports sharing live data in real time across the web, and that may even be transactional. OGC is not invested in the ESRIs, Intergraphs and MapInfos of the world, as the structures are software-agnostic (although it does need these companies to support its ideas in order to produce and consume the data services). Load CityGML data from New York and it should match CityGML data from Wellington.

It's been estimated that not sharing data in NZ is costing the economy $500 million a year.

The OGC Web Services approach is careful and planned. It requires a screed of metadata to be captured according to one of the metadata standards and stored in a database where it can be harvested via a CSW (the OpenGIS Catalogue Service for the Web). Then you need to publish your spatial data via one of the other services (WMS or WFS), and if it is a snapshot you may also need to republish it on a regular basis, say nightly.

Of course, as they say, no man is an island, so chances are someone will be interested in someone else's data as well as yours and want to see them both together. Equally, it may be you who is interested in the other party's data. In this case, you are going to have to determine a common data model and publish it as a schema. A vocabulary can be developed which essentially translates the different data models to a common one. You might also have to develop a web portal so people can view all this data together in one place. If the community is a large one, you are probably going to have to get together to hammer out a consensus on a schema with a group of people with different priorities, motivations and ideas.
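To make the idea of a translating vocabulary a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two source organisations, their field names and the common field list are invented for illustration, and a real community schema would also have to deal with geometry, units and code lists.

```python
# Hypothetical common model that two data providers have agreed on.
COMMON_FIELDS = ("road_name", "surface", "speed_limit_kmh")

# Hypothetical vocabulary: for each source, map its own field names
# onto the field names of the common model.
VOCABULARY = {
    "council_a": {
        "RD_NAME": "road_name",
        "SURF_TYPE": "surface",
        "SPD_KMH": "speed_limit_kmh",
    },
    "council_b": {
        "name": "road_name",
        "seal": "surface",
        "limit": "speed_limit_kmh",
    },
}


def to_common(source, record):
    """Translate one source record into the common model.

    Fields the vocabulary does not know about are dropped, and common
    fields the source does not supply are set to None.
    """
    mapping = VOCABULARY[source]
    translated = {mapping[key]: value for key, value in record.items() if key in mapping}
    return {field: translated.get(field) for field in COMMON_FIELDS}


# Two differently shaped records end up with the same structure.
a = to_common("council_a", {"RD_NAME": "Lambton Quay", "SURF_TYPE": "asphalt", "SPD_KMH": 30})
b = to_common("council_b", {"name": "Queen Street", "seal": "asphalt"})
```

The mapping tables are the easy part. The hard part, as noted above, is getting everyone to agree on what goes in the common field list.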

And because each record is delivered via a schema, each record carries all of the metadata of the schema with it, so large datasets can be slow to load. Requesting only the features inside a bounding rectangle, or storing a copy of a large dataset offline, seem to be the quick fixes.
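As a rough illustration of the bounding rectangle fix, the sketch below builds a WFS 2.0.0 GetFeature request that asks only for features inside a box. The server address and the layer name are made up, and the order of the coordinates in the box depends on the coordinate reference system and on the server, so treat this as a starting point and check the service's capabilities document before relying on it.

```python
from urllib.parse import urlencode


def getfeature_url(base_url, type_name, bbox, count=None):
    """Build a WFS 2.0.0 GetFeature URL restricted to a bounding box.

    bbox is a sequence of four numbers. Their order (x/y or lat/lon)
    depends on the CRS and the server, so it is passed through as given.
    count optionally caps the number of features returned.
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "bbox": ",".join(str(value) for value in bbox),
    }
    if count is not None:
        params["count"] = str(count)
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer; a small box around central Wellington.
url = getfeature_url(
    "https://example.org/wfs",
    "demo:roads",
    (174.7, -41.3, 174.8, -41.2),
    count=500,
)
```

Asking for a small box and a capped number of features keeps the response manageable, at the cost of making several requests if you really do need the whole dataset.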

The other emerging force is the Google way. And it sounds so tempting. You just put your stuff out there. You don't have to worry about data structures. Google will throw some mega-grunt machine learning at the problem and sort it. And they are serious about the grunt: this is a video tour of their first container farm.

In comparison it is tempting, but there will always be limitations. In abdicating responsibility, you are trusting Google to find everything that you want or need. And Google is pretty good at this. Because we are very busy (or perhaps slightly lazy), we almost always click on a link near the top of page 1. When was the last time you scrolled down, let alone went looking on page 2 or 22? And it all becomes a bit self-referencing. The longer Google is around, the more Google will find the sites that Google put on page 1.

It doesn't end there. Google brings stuff together, but all the joining of the data still has to be done by the reader, between the ears. Unless we bite the bullet and head down the OpenGIS path, we will be committing our industry to the lowest common denominator and preventing the really exciting spatial stuff that we all know and love from gaining a foothold.

For many years now the wider geospatial industry has been promoting the importance of metadata, yet this is an area that has not really taken off. If OpenGIS is to gain a foothold, it will need widely available metadata to catalogue. The question is whether OpenGIS becomes the catalyst that finally makes significant progress here, or whether a lack of metadata will be a persistent barrier to uptake.
