Category Archives: Geospatial

The joys of extracting data from geoscientific papers

I am in the process of revising a discussion paper I recently published on the tectonic evolution of the South Atlantic rift system, so I started to collect information from various papers to back up and supplement an alternative plate tectonic scenario. Unfortunately there are only one or two key papers on this remote offshore region on the Argentine margin (not a prolific basin to drill for hydrocarbons), so what I usually do in that case is take a screenshot of the relevant maps in the paper and georeference them. Once this is done, I can add the extra information to the files I am using in GPlates for my reconstructions.

Geoscientists sometimes excel at making even published data hard to use. I mean, if you are mapping geospatial features, is there ANY reason to disguise your work as a low-resolution raster graphic depicting a map in a very odd projection, without any information about the projection or its location, so that no one can really USE it apart from reading the paper? Sometimes (oftentimes) there is, apparently, as I am about to find out. In this case, we have an overview map with a set of offshore seismic lines indicated. While this overview map has national and international boundaries and a coastline, it lacks a graticule, but it can still be georeferenced adequately using the coastline and the international boundaries. Here is the georeferenced image of the overview map:

Georeferenced overview map. Scale dimensions are not too far off: 128 km measured in the GIS vs. the 125 km long scale bar.

Even the latitudinal position looks ok (mouse position not visible, but the reading was taken at the right-hand margin at 48°S).

The map I am interested in covers the offshore seismic grid around the SJ.es-1 well and shows the tectonic inventory of the San Julian Basin offshore Argentina. It is in a different projection from the overview map (going by the map frame annotation) and has no geographical features which can be used for georeferencing apart from the well and the two seismic line locations SL2 and SL4. Easy, I hear you say, two beautifully straight lines and a point, what more do you need? Have a look at the map scale from the first image above. Based on that image, the lines are about 75 km long and about 70 km apart, measured on the seismic grid:

Seismic line dimensions - 75 km long. So far so good.

Seismic line spacing between SL4 and SL2 is about 70 km according to the georeferenced image.

Now we have a look at the structural map in a bit more detail: most likely a different projection, going by the frame annotation (no information is given in the figure caption), and no other georeferenceable features such as coastlines or boundaries. But what we also see is that the seismic lines are spaced about 40 km apart at their closest distance, are not really parallel and are latitudinally offset. There are a few reasons why this could be, two of which I can think of right away: the different projection compared to the overview map, or the lines shown could be a subset of the full lines. Here is the image:

The structural map (modified from original). Seismic line spacing is around 40-50 km (not 70 km like in the overview map) and the lines are slightly rotated relative to each other and not parallel (like in the overview map).

Now we’re going to georeference the structural map with the information contained in both maps, namely the two seismic lines and the well. Even though the projections might differ this should not be too hard:

Georeferenced structural map (40% transparent) based on the seismic lines and the well location as provided in the overview map placed on top of the overview map (non-transparent).

When the seismic line end points are used, the map is scaled and rotated. While seismic line 4 seems to match reasonably well and we also get a relatively good match with line 2, we can see that the map scale is still close to double the stated scale (49 km in the georeferenced version vs. 25 km stated on the map). Ok, next try:

Scaled and rotated structural map (transparent, on top of overview map) with a best fit to the overview map. Note the mismatch not only in the way the lines are rotated (could be due to the different projections) but also in the distance between the two seismic lines in the overview map (more than 70 km) and in the structural map.

This time, I scaled the map to match the length scale (25.9 km vs. 25 km stated on the map) and then rotated the image with the SJ.es-1 well as control point. So it seems that the overview map does not show the correct information: either the lines are wrongly indicated (i.e. not full length, or not the right lines), or the line locations on the structural map are wrong. Simply put, there is no (easy and straightforward) way to get the line locations in the overview map to match those in the structural map, which is a basic breakdown of scientific reproducibility… Sadly this means that the information in the structural map cannot be utilised by other people (like me) who try to use it. I can understand, to a degree, that geoscientists have a tendency to obscure their data by choosing map projections which make it harder to reverse-engineer the information contained in the maps. But there is a difference between publishing a “hard to reverse-engineer” map and a plainly wrong map.
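
The scale check used in both attempts can be written down in a few lines. The following is only a minimal Python sketch, not the procedure the GIS applies internally: it assumes a plain similarity transform (scale plus rotation) fitted to the two end points of one seismic line, and then asks how long the scale bar would be under that fit. The function names and all coordinates are hypothetical and are not measured from the figures; they are chosen only so that the fitted scale bar comes out at twice the stated length.

```python
import math


def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Scale and rotation of the 2D similarity transform that maps
    src_a -> dst_a and src_b -> dst_b. Points are (x, y) tuples."""
    src_vec = complex(*src_b) - complex(*src_a)
    dst_vec = complex(*dst_b) - complex(*dst_a)
    if src_vec == 0:
        raise ValueError("source control points must be distinct")
    # Complex ratio: its modulus is the scale, its argument the rotation.
    ratio = dst_vec / src_vec
    return abs(ratio), math.degrees(math.atan2(ratio.imag, ratio.real))


def implied_scale_bar_km(km_per_pixel, bar_length_px):
    """Ground length the scale bar would have under the fitted transform."""
    return km_per_pixel * bar_length_px


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the maps in this post:
    # one seismic line in image pixels and the same line in map kilometres.
    line_px = ((0.0, 0.0), (150.0, 0.0))
    line_km = ((10.0, 20.0), (10.0, 95.0))
    scale, rotation = similarity_from_two_points(*line_px, *line_km)
    bar_km = implied_scale_bar_km(scale, bar_length_px=100.0)
    print(f"scale: {scale:.3f} km/px, rotation: {rotation:.1f} deg")
    print(f"scale bar: {bar_km:.1f} km fitted vs. 25.0 km stated")
```

If the fitted scale bar length is far from the stated one, as it is here, the control points and the scale bar cannot both be right, which is the situation described above.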

GISLook – A quicklook plugin for Mac OS X

The Cartography group at Oregon State University offers a QuickLook plugin for Mac OS X which lets you preview multiple types of GIS files (ESRI shapefiles etc.) by just hitting the space bar. Very valuable if you just want to quickly check the spatial content of a file without throwing it into a GIS. Here’s an example (it even properly associates the auxiliary files *.dbf, *.shx etc. with the right spatial content):

Previewing a shapefile using the GISLook QuickLook plugin.

Here’s the link: http://cartography.oregonstate.edu/gislook/

GIS and p(l)ain text

So you are working with geospatial data. You are collaborating with several people on the same dataset. People in your team are on different operating systems (Mac, Windows or Linux) and want to use different geospatial tools, like QGIS, GPlates, GMT, OpenJUMP, ArcGIS or Matlab, as they all have different requirements and are used to different workflows in their geoscientific research. You would like to keep track of the changes made to that specific dataset and snapshot it at different stages, ideally through SVN, git or any other revisioning tool. You don’t have any money and probably even less time.

So (unless I am mistaken) there are some options right at hand:

  1. Who cares about money: stuff all that open source software (who uses that anyway…), convert everyone to M$ Windoze and force them to use the one and only mighty ESRI ArcGIS. Yeah…NOT really.
  2. Put a lot of effort into setting up PostGIS plus versioning and then spend the rest of your life on figuring out how to connect ArcGIS in a way that you can read and write to that DB.
  3. Put the good old shp file into a revisioning system. Hm, not too great for binary files, especially if you’d like to check differences on a single feature between two versions…
  4. Give up on the idea of revisioning and just make every user save snapshots manually and store them in a central location?
Strangely, in 2012 there seems to be not a single “open”, non-binary file format which can be edited and read across the whole FOSS and proprietary GIS world, at least to my knowledge. A potential candidate with little overhead is the GMT OGR format (see the documentation in the GMT5 cookbook, available as a PDF file), which is what ogr2ogr produces when converting shapefiles to GMT’s plain text format. So I guess this is what I will be doing:
  • Set up the common dataset as shapefile, add all attributes and geometries so far
  • Modify it in your GIS application of choice
  • Once you have made your changes, save the shapefile.
  • Convert the shapefile to GMT OGR using ogr2ogr
  • Put the GMT OGR file into the revisioning system.
  • Reverse the process when needing to modify the data.
One could potentially write a Python script to do this in Arc but running ogr2ogr on the command line once shouldn’t be too hard…
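
For what it’s worth, here is a minimal Python sketch of that round trip, wrapping ogr2ogr through the standard library rather than through Arc. It rests on a few assumptions: ogr2ogr is installed and on the PATH, and the GMT vector driver is called OGR_GMT, which is the short name in many GDAL releases (others call it GMT, so check the output of `ogr2ogr --formats` first). The function names and the file name in the usage example are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: driver short names vary between GDAL releases ("OGR_GMT" or
# "GMT"); verify with `ogr2ogr --formats` and adjust if necessary.
GMT_DRIVER = "OGR_GMT"
SHP_DRIVER = "ESRI Shapefile"


def ogr2ogr_command(src, dst, driver):
    """Build the ogr2ogr call; note that the destination comes first."""
    return ["ogr2ogr", "-f", driver, str(dst), str(src)]


def shp_to_gmt_command(shp_path, driver=GMT_DRIVER):
    """Command converting a shapefile to a plain text .gmt file next to it."""
    shp = Path(shp_path)
    return ogr2ogr_command(shp, shp.with_suffix(".gmt"), driver)


def gmt_to_shp_command(gmt_path):
    """Command converting the .gmt file back to a shapefile for editing."""
    gmt = Path(gmt_path)
    return ogr2ogr_command(gmt, gmt.with_suffix(".shp"), SHP_DRIVER)


def run(command):
    """Run the command, failing loudly if ogr2ogr is missing or errors out."""
    if shutil.which(command[0]) is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file name; only the .gmt output goes into SVN or git.
    run(shp_to_gmt_command("faults.shp"))
```

Keeping the command construction separate from the call to `run` makes it easy to print and inspect the command before anything is overwritten. ogr2ogr may refuse to write to an output that already exists, so the reverse step might need an existing shapefile to be moved out of the way first.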