Post 3 - Creating GeoBlacklight Metadata Records

This is the third in a series of posts on the development of GeoBlacklight at NYU. It was originally published in January 2016. For an outline of the posts, click here.

About Geospatial Metadata

The 2016 Geo4LibCamp un-conference at Stanford was filled with incredible discussions about the challenges of developing a spatial data infrastructure and wrangling metadata into a form that works with GeoBlacklight. More than any other issue, creating and managing geospatial metadata occupied people’s attention. How can we make GeoBlacklight metadata efficiently? How compliant with existing standards should it be? And are there easy ways to leverage the metadata generated by other institutions to bolster one’s own collection? In this post, we are going to talk a little bit about how we’ve been addressing these questions at NYU.

About GeoBlacklight Metadata

As they developed the GeoBlacklight project, Stanford’s librarians implemented a custom GeoBlacklight metadata schema to facilitate the discovery of geospatial data. The schema is closely related to Dublin Core and is a redaction of much longer and more granular geospatial metadata standards, most notably ISO 191xx and FGDC. In terms of building collections, there are several key challenges to producing metadata records that conform to these standards. To start, they are difficult to author from scratch and difficult to edit or transform. Worse, it’s unrealistic to expect non-experts or researchers who would like to contribute geospatial data to create or alter ISO or FGDC-compliant metadata records themselves. John Huck’s tweet from the conference sums it up.

The great innovation of GeoBlacklight metadata is that it implies a distinction between metadata for documentation or posterity and metadata for the sake of discovery. In short, GeoBlacklight metadata is a minimalist conflation of several geospatial metadata standards and technological interfaces, especially Apache Solr. You need a search index to make a GeoBlacklight instance “work.” Thus, even though it is a reduced standard, you still need to find efficient ways of generating GeoBlacklight metadata records if you want to develop a discovery interface for a collection of spatial data.
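
To make that requirement concrete, here is a minimal sketch (in Python, which is not part of the GeoBlacklight stack itself) of what “indexing a record” means in practice: posting a GeoBlacklight JSON document to Solr’s update handler. The Solr URL, the core name, and the file name are assumptions to adapt to your own installation.

import json
import requests

# Assumed local Solr core; substitute the URL your GeoBlacklight instance uses.
SOLR_UPDATE_URL = "http://localhost:8983/solr/blacklight-core/update?commit=true"

def index_record(path):
    # Read one GeoBlacklight JSON record from disk.
    with open(path) as f:
        record = json.load(f)
    # Solr's JSON update handler accepts a list of documents.
    response = requests.post(
        SOLR_UPDATE_URL,
        data=json.dumps([record]),
        headers={"Content-Type": "application/json"},
    )
    response.raise_for_status()

index_record("nyu_2451_34506.json")  # hypothetical file exported from a metadata workflow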

How does GeoBlacklight Metadata Work?

GeoBlacklight (the application) is inextricable from the metadata behind it. This may sound like a simplistic statement to those who are used to working with discovery systems in libraries, but for us, it was an epiphany and an important part of understanding how GeoBlacklight functions. For those who haven’t seen it, here’s a breakdown of the main elements in a GeoBlacklight metadata record:

The schema incorporates key elements needed for discovery, including subject, place name (dct:spatial), and file type. There are additional elements that pertain to spatial discovery and the Solr index, as seen in this sample GeoBlacklight record:

  "dc_identifier_s": "http://hdl.handle.net/2451/34506",
  "dc_title_s": "2012 New York City Train Stations",
  "dc_description_s": "This point layer is an extract from the Metropolitan Transportation Authority's (MTA) stops files for Metro North and the Long Island Railroad (LIRR) that have been combined to create one train station file for the entire city. The unique ID is rail_id, a field created by attaching a railroad prefix for either Metro North or the LIRR to numbers created by the MTA. This layer was created as part of the NYC Geodatabase (NYC GDB) project, a resource designed for basic geographic analysis and thematic mapping within the five boroughs of New York City.",
  "dc_rights_s": "Public",
  "dct_provenance_s": "Baruch CUNY",
  "dct_references_s": "{\"http://schema.org/url\":\"http://hdl.handle.net/2451/34506\",\"http://schema.org/downloadUrl\":\"https://archive.nyu.edu/retrieve/74705/nyu_2451_34506.zip\",\"http://www.opengis.net/def/serviceType/ogc/wfs\":\"https://maps-public.geo.nyu.edu/geoserver/sdr/wfs\",\"http://www.opengis.net/def/serviceType/ogc/wms\":\"https://maps-public.geo.nyu.edu/geoserver/sdr/wms\",\"http://www.isotc211.org/schemas/2005/gmd/\":\"http://metadata.geo.nyu.edu/records/edu.nyu/handle/2451/34506/iso19139.xml\",\"http://lccn.loc.gov/sh85035852\":\"https://archive.nyu.edu/retrieve/74759/nyu_2451_34506_doc.zip\"}",
  "layer_id_s": "sdr:nyu_2451_34506",
  "layer_slug_s": "nyu_2451_34506",
  "layer_geom_type_s": "Point",
  "layer_modified_dt": "2016-5-2T18:21:4Z",
  "dc_format_s": "Shapefile",
  "dc_language_s": "English",
  "dc_type_s": "Dataset",
  "dc_publisher_s": [
    "Newman Library (Bernard M. Baruch College)"
  ],
  "dc_creator_sm": "GIS Lab, Newman Library, Baruch CUNY",
  "dc_subject_sm": [
    "Transportation",
    "Railroads",
    "Railroad stations"
  ],
  "dct_isPartOf_sm": "NYC Geodatabase (version jan2016)",
  "dct_issued_s": "1/15/2016",
  "dct_temporal_sm": [
    "2012"
  ],
  "dct_spatial_sm": [
    "New York City, New York, United States",
    "Bronx County, New York, United States",
    "Kings County, New York, United States",
    "New York County, New York, United States",
    "Queens County, New York, United States",
    "Borough of Bronx, New York, United States",
    "Borough of Brooklyn, New York, United States",
    "Borough of Manhattan, New York, United States",
    "Borough of Queens, New York, United States"
  ],
  "solr_geom": "ENVELOPE(-73.99358, -73.72862, 40.9054239999998, 40.6091299999998)",
  "solr_year_i": 2012,
  "dct_source_sm": [
    "nyu_2451_34635",
    "nyu_2451_34636"
  ],
  "geoblacklight_version": "1.0"
}

Although Darren Hardy and Kim Durante explain what each element in the set means in their Code4Lib article, a few of them deserve extra commentary. The dct_references_s field follows a key-value schema that accounts for multiple elements exposed in the GeoBlacklight interface. For instance, the http://schema.org/url key links back to the archival copy of the data in NYU’s institutional repository (the FDA); simply put, whatever URL you place after that key is the value (in this case a link to a record) that will be displayed prominently on the item page within GeoBlacklight. Similarly, the http://www.opengis.net/def/serviceType/ogc/wfs key points to our deployment of GeoServer, which allows the layer to be previewed and downloaded in multiple formats within GeoBlacklight. See this document for a full outline of the external service references that have been implemented thus far.
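
For illustration, here is a small sketch of how that field can be assembled programmatically. The dictionary keys are the URI roles shown in the sample record above, the URLs are copied from that same record, and json.dumps collapses the whole mapping into the single escaped string that the schema expects.

import json

references = {
    # Link shown prominently on the item page (here, a handle to the archival copy).
    "http://schema.org/url": "http://hdl.handle.net/2451/34506",
    # Direct download of the zipped shapefile.
    "http://schema.org/downloadUrl": "https://archive.nyu.edu/retrieve/74705/nyu_2451_34506.zip",
    # GeoServer endpoints that drive preview and format conversion.
    "http://www.opengis.net/def/serviceType/ogc/wfs": "https://maps-public.geo.nyu.edu/geoserver/sdr/wfs",
    "http://www.opengis.net/def/serviceType/ogc/wms": "https://maps-public.geo.nyu.edu/geoserver/sdr/wms",
}

record = {"layer_slug_s": "nyu_2451_34506"}
# The whole mapping is stored as one escaped JSON string inside the record.
record["dct_references_s"] = json.dumps(references)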

The other important element in this set is the dct:spatial field. The value of this field is always a string that comes from the GeoNames ontology, but there are other items in each GeoNames entry that propagate elsewhere in the metadata record. Specifically, from a given entry you can take the dct:relation field values and the georss:box values if you want to base the extent coordinates on that information. We won’t continue to belabor the anatomy of GeoBlacklight metadata records here. Suffice it to say that while simpler and more compact than most geospatial metadata standards, GeoBlacklight metadata still requires some work to author.
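
As a small, hedged example: assuming the georss:box value is given in GeoNames’ usual “south west north east” ordering (latitude/longitude pairs), converting it to the solr_geom ENVELOPE syntax is a few lines of code. The coordinates below are illustrative values for New York City.

def georss_box_to_envelope(box):
    # GeoNames lists the box corners as "south west north east" (lat/lon pairs).
    south, west, north, east = (float(v) for v in box.split())
    # Solr's ENVELOPE syntax expects (minX, maxX, maxY, minY), i.e. (west, east, north, south).
    return "ENVELOPE({}, {}, {}, {})".format(west, east, north, south)

# Illustrative bounding box for New York City:
print(georss_box_to_envelope("40.4774 -74.2591 40.9176 -73.7004"))
# -> ENVELOPE(-74.2591, -73.7004, 40.9176, 40.4774)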

Ways of Authoring Metadata

There are multiple ways to author GeoBlacklight metadata, some of which were covered in detail during workshops at the 2017 Geo4LibCamp, the 2015 Digital Library Federation Forum, and elsewhere. Kim Durante uses a combination of editing in ESRI’s ArcCatalog and transforming existing ISO or FGDC metadata documents (in XML format) with a series of XSLT workflows. Other librarians build upon these transforms and patch them together with metadata alterations in ArcCatalog, while others, such as the Big Ten Academic Alliance, use GeoNetwork to generate metadata that becomes GeoBlacklight compliant. In short, there is no perfect way to create GeoBlacklight metadata from scratch, and it inevitably requires a lot of work.
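
To give a rough sense of the XSLT route, here is a sketch of applying a stylesheet with lxml. Both file names are hypothetical placeholders for your own ISO 19139 record and whichever community transform you adopt; this is not the actual Stanford workflow.

from lxml import etree

# Hypothetical input record and stylesheet; substitute your own files.
iso_doc = etree.parse("nyu_2451_34506_iso19139.xml")
stylesheet = etree.parse("iso2geoblacklight.xsl")
transform = etree.XSLT(stylesheet)

# Apply the transform and write the resulting GeoBlacklight document to disk.
result = transform(iso_doc)
with open("geoblacklight.json", "w") as out:
    out.write(str(result))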

At NYU, we’ve begun deploying a somewhat hacked version of Omeka to generate GeoBlacklight metadata from scratch. Omeka is a great tool because it allows us to mediate the metadata creation process in several different ways. Most people have encountered Omeka as an archival web publishing platform; we are not using it as such. It’s a means to an end, and that end is getting GeoBlacklight metadata to index into our instance of GeoBlacklight. We delivered a presentation to the OpenGeoPortal Metadata Working Group on January 12, 2016 about our process with Omeka. Slides and recording are available here.

What Omeka Does

We’re going to ease up on the blow-by-blow description of Omeka in this post and encourage you to listen to the recording and look at the slides. However, here are a few summary points. Chief among them: Omeka allows us to call on existing APIs to promote authority control as we catalog GIS data. In particular, the GeoNames ontology, which has a robust API, can populate other relevant parts of the record just by having a person select a unique place name. In cases where we want to encourage multiple values for enhanced discovery (e.g., Library of Congress Subject Headings), users can easily add fields to account for multiple values. Finally, and most importantly, Omeka exports records in the .json format we need to index into GeoBlacklight.
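
To give a sense of the kind of GeoNames lookup involved, here is a hedged sketch of such an API call; it is not the plug-in’s actual code. It assumes a registered GeoNames username, and the geonameId (5128581, New York City) is only an example.

import requests

def lookup_place(geoname_id, username):
    # Fetch the full GeoNames entry for a place the cataloger has selected.
    resp = requests.get(
        "http://api.geonames.org/getJSON",
        params={"geonameId": geoname_id, "username": username},
    )
    resp.raise_for_status()
    place = resp.json()
    # Build a "City, State, Country" style string for dct_spatial_sm.
    spatial = ", ".join(
        v for v in (place.get("name"), place.get("adminName1"), place.get("countryName")) if v
    )
    # Many (though not all) entries also carry a bounding box reusable for solr_geom.
    return spatial, place.get("bbox")

# spatial, bbox = lookup_place(5128581, "your_geonames_username")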

There are other benefits as well.

We’ve created a practice user account for Omeka, and we invite anyone in the community to give it a shot. Just send us an e-mail for the credentials. Further, the GeoBlacklight plug-in is housed on Stephen’s GitHub account and is available for anyone to download (although you will need to alter the code base to conform to your institution’s setup).

Alternative Tools for Authoring Metadata

Although Omeka has worked well for us so far, we don’t imagine it to be a long-term solution for creating GeoBlacklight metadata. There are other options that show promise, such as Elevator (now defunct?) and Catmandu. We hope that more streamlined tools will become available, and we are even working on a homegrown, long-term solution that we can incorporate into our infrastructure.

Depositing into OpenGeoMetadata

The final step of our metadata creation process is sharing. We have chosen to push all of our GeoBlacklight metadata records into OpenGeoMetadata, a consortium of metadata repositories managed by Jack Reed at Stanford University.

Essentially, OpenGeoMetadata is a series of GitHub repositories from which anyone can index records into a local instance of GeoBlacklight. The goal is to facilitate cross-institution collaboration and collectively grow the amount of geospatial data that can be discovered with a single search. From each repository, administrators can index a set of GeoBlacklight records into their Solr core, and those items will then be discoverable within the application. If you’re interested in contributing to OpenGeoMetadata, get in touch with Jack Reed.
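
As a sketch of what that indexing step can look like, one can clone a repository and post every record it contains to a local Solr core. The per-layer file name (geoblacklight.json), the Solr URL, and the repository name are assumptions to adapt to your own setup and to the layout of the repository you harvest.

import json
import pathlib
import requests

# Assumed local Solr core; substitute the URL your GeoBlacklight instance uses.
SOLR_UPDATE_URL = "http://localhost:8983/solr/blacklight-core/update?commit=true"

def harvest(repo_dir):
    # Collect every per-layer GeoBlacklight record in a cloned OpenGeoMetadata repository.
    docs = []
    for path in pathlib.Path(repo_dir).rglob("geoblacklight.json"):
        with open(path) as f:
            docs.append(json.load(f))
    # Post the whole batch to Solr in one request.
    resp = requests.post(
        SOLR_UPDATE_URL,
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()
    return len(docs)

# e.g. harvest("edu.nyu") after cloning https://github.com/OpenGeoMetadata/edu.nyu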

Next steps for the GeoBlacklight community

The 2016 Geo4LibCamp at Stanford was an unqualified success, not only because we learned so much about making geospatial metadata but also because we were able to crystallize some areas for improvement. Chief among them: we need to foster community best practices for generating GeoBlacklight metadata. Ultimately, the creation of GeoBlacklight metadata is a user-experience issue; unless metadata is created according to a consistent standard of completeness, the application will behave differently depending on which record is accessed. We are excited to see how the community continues to handle these challenges.