Post 3 - Creating GeoBlacklight Metadata Records
This is the third in a series of posts on the development of GeoBlacklight at NYU. It was originally published in January 2016. For an outline of the posts, click here.
About Geospatial Metadata
The 2016 Geo4LibCamp un-conference at Stanford was filled with incredible discussions about the challenges of developing a spatial data infrastructure and wrangling metadata into a form that works with GeoBlacklight. More than any other issue, creating and managing geospatial metadata occupied people’s attention. How can we make GeoBlacklight metadata efficiently? How compliant with existing standards should it be? And are there easy ways to leverage the metadata generated by other institutions to bolster one’s own collection? In this post, we are going to talk a little bit about how we’ve been addressing these questions at NYU.
About GeoBlacklight Metadata
As they developed the GeoBlacklight project, Stanford’s librarians implemented a custom GeoBlacklight metadata schema to facilitate the discovery of geospatial data. The schema is closely related to Dublin Core and is a redaction of much longer and more granular geospatial metadata standards, most notably ISO 191xx and FGDC. In terms of building collections, there are several key challenges to producing metadata records that conform to these standards. To start, they are difficult to author from scratch and difficult to edit or transform. Worse, it’s unrealistic to expect non-experts or researchers who would like to contribute geospatial data to create or alter ISO or FGDC-compliant metadata records themselves. John Huck’s tweet from the conference sums it up.
#Geo4LibCamp day 2: uncontroversial view: ISO is too heavy
— John Huck (@jhuck_AB) January 26, 2016
The great intervention of GeoBlacklight metadata is that it implies a distinction between metadata for documentation or posterity and metadata for the sake of discovery. In short, GeoBlacklight is a minimalist conflation of several geospatial metadata standards and technological interfaces, especially Apache Solr. You need a search index to make a GeoBlacklight instance “work.” Thus, even though it is a reduced standard, you still need to find efficient ways of generating GeoBlacklight metadata records if you want to develop a discovery interface for a collection of spatial data.
How does GeoBlacklight Metadata Work?
GeoBlacklight (the application) is inextricable from the metadata behind it. This may sound like a simplistic statement to those who are used to working with discovery systems in libraries, but for us, it was an epiphany and an important part of understanding how GeoBlacklight functions. For those who haven't seen it, here's a breakdown of the main elements in a GeoBlacklight metadata record:
The schema incorporates key elements needed for discovery, including subject, place name (dct:spatial), and file type. There are additional elements as well that pertain to spatial discovery and the Solr index, as seen in this sample GeoBlacklight record:
"dc_identifier_s": "http://hdl.handle.net/2451/34506",
"dc_title_s": "2012 New York City Train Stations",
"dc_description_s": "This point layer is an extract from the Metropolitan Transportation Authority's (MTA) stops files for Metro North and the Long Island Railroad (LIRR) that have been combined to create one train station file for the entire city. The unique ID is rail_id, a field created by attaching a railroad prefix for either Metro North or the LIRR to numbers created by the MTA. This layer was created as part of the NYC Geodatabase (NYC GDB) project, a resource designed for basic geographic analysis and thematic mapping within the five boroughs of New York City.",
"dc_rights_s": "Public",
"dct_provenance_s": "Baruch CUNY",
"dct_references_s": "{\"http://schema.org/url\":\"http://hdl.handle.net/2451/34506\",\"http://schema.org/downloadUrl\":\"https://archive.nyu.edu/retrieve/74705/nyu_2451_34506.zip\",\"http://www.opengis.net/def/serviceType/ogc/wfs\":\"https://maps-public.geo.nyu.edu/geoserver/sdr/wfs\",\"http://www.opengis.net/def/serviceType/ogc/wms\":\"https://maps-public.geo.nyu.edu/geoserver/sdr/wms\",\"http://www.isotc211.org/schemas/2005/gmd/\":\"http://metadata.geo.nyu.edu/records/edu.nyu/handle/2451/34506/iso19139.xml\",\"http://lccn.loc.gov/sh85035852\":\"https://archive.nyu.edu/retrieve/74759/nyu_2451_34506_doc.zip\"}",
"layer_id_s": "sdr:nyu_2451_34506",
"layer_slug_s": "nyu_2451_34506",
"layer_geom_type_s": "Point",
"layer_modified_dt": "2016-5-2T18:21:4Z",
"dc_format_s": "Shapefile",
"dc_language_s": "English",
"dc_type_s": "Dataset",
"dc_publisher_s": [
"Newman Library (Bernard M. Baruch College)"
],
"dc_creator_sm": "GIS Lab, Newman Library, Baruch CUNY",
"dc_subject_sm": [
"Transportation",
"Railroads",
"Railroad stations"
],
"dct_isPartOf_sm": "NYC Geodatabase (version jan2016)",
"dct_issued_s": "1/15/2016",
"dct_temporal_sm": [
"2012"
],
"dct_spatial_sm": [
"New York City, New York, United States",
"Bronx County, New York, United States",
"Kings County, New York, United States",
"New York County, New York, United States",
"Queens County, New York, United States",
"Borough of Bronx, New York, United States",
"Borough of Brooklyn, New York, United States",
"Borough of Manhattan, New York, United States",
"Borough of Queens, New York, United States"
],
"solr_geom": "ENVELOPE(-73.99358, -73.72862, 40.9054239999998, 40.6091299999998)",
"solr_year_i": 2012,
"dct_source_sm": [
"nyu_2451_34635",
"nyu_2451_34636"
],
"geoblacklight_version": "1.0"
}
Although Darren Hardy and Kim Durante explain what each element in the set means in their Code4Lib article, a few of them warrant more commentary. The dct:references field follows a key-value schema and accounts for several elements that get exposed in the GeoBlacklight interface. For instance, the http://schema.org/url key links back to the archival copy of the data in NYU's institutional repository (the FDA). Simply put, whatever URL you place in the record after the http://schema.org/url key is the value (in this case a link to a record) that will be displayed prominently on the item result within GeoBlacklight. Similarly, the http://www.opengis.net/def/serviceType/ogc/wfs key points to the URL of our GeoServer deployment, which allows the layer to be previewed and downloaded in multiple formats within GeoBlacklight. See this document for a full outline of the external service references that have been implemented thus far.
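To make the key-value structure concrete, here is a minimal Python sketch of how a dct_references_s value might be assembled. The keys and URLs are copied from the sample record above; json.dumps produces the serialized, escaped string you see in that record.

import json

# Keys are the reference URIs GeoBlacklight recognizes; values are the
# endpoints for this particular layer (copied from the sample record above).
references = {
    "http://schema.org/url": "http://hdl.handle.net/2451/34506",
    "http://schema.org/downloadUrl": "https://archive.nyu.edu/retrieve/74705/nyu_2451_34506.zip",
    "http://www.opengis.net/def/serviceType/ogc/wfs": "https://maps-public.geo.nyu.edu/geoserver/sdr/wfs",
    "http://www.opengis.net/def/serviceType/ogc/wms": "https://maps-public.geo.nyu.edu/geoserver/sdr/wms",
}

record = {"dc_identifier_s": "http://hdl.handle.net/2451/34506"}

# dct_references_s is a JSON object serialized into a single string, which is
# why the sample record shows escaped quotes inside the value.
record["dct_references_s"] = json.dumps(references)
print(record["dct_references_s"])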
The other important element in this set is the dct:spatial field. Its value is always a string drawn from the GeoNames ontology, but other items in each GeoNames entry can propagate elsewhere in the metadata record. Specifically, from a given entry you can take the dct:relation field values and the georss_box values if you want to base the extent coordinates on that information. We won't continue to belabor the anatomy of GeoBlacklight metadata records here. Suffice it to say that while simpler and more compact than most geospatial metadata standards, GeoBlacklight metadata still requires some work to author.
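To illustrate the GeoNames lookup described above, here is a rough Python sketch that pulls a place name and bounding box from the GeoNames API and formats them for dct_spatial and solr_geom. It assumes a free GeoNames account (the username parameter) and that the entry carries a bbox element; the helper function is ours for illustration and is not part of any GeoBlacklight tooling.

import json
import urllib.request

def geonames_lookup(geoname_id, username):
    # geoname_id 5128581 is New York City; a GeoNames username is required.
    url = ("http://api.geonames.org/getJSON?geonameId={0}&username={1}"
           .format(geoname_id, username))
    with urllib.request.urlopen(url) as response:
        entry = json.loads(response.read().decode("utf-8"))

    # The place name string for dct_spatial_sm.
    placename = "{0}, {1}, {2}".format(
        entry["name"], entry["adminName1"], entry["countryName"])

    # Many entries carry a bounding box; when present it can seed the
    # solr_geom value, which uses ENVELOPE(minX, maxX, maxY, minY).
    envelope = None
    if "bbox" in entry:
        envelope = ("ENVELOPE({west}, {east}, {north}, {south})"
                    .format(**entry["bbox"]))

    return placename, envelope

# Example call (replace "demo" with your own GeoNames username):
# placename, envelope = geonames_lookup(5128581, "demo")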
Ways of Authoring Metadata
There are multiple ways to author GeoBlacklight metadata, some of which were covered in detail during workshops at the 2017 Geo4LibCamp, the 2015 Digital Library Federation Forum, and elsewhere. Kim Durante uses a combination of editing with ESRI's ArcCatalog and transforming existing ISO or FGDC metadata documents (in XML format) with a series of XSLT workflows. Other librarians build upon these transforms and patch them together with metadata alterations in ArcCatalog, while others, such as the Big Ten Academic Alliance, use GeoNetwork to generate metadata that becomes GeoBlacklight compliant. In short, there is no perfect way to create GeoBlacklight metadata from scratch, and it inevitably requires a lot of work.
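For the XSLT route mentioned above, a workflow can be as small as the following Python sketch. It assumes you already have an ISO 19139-to-GeoBlacklight stylesheet on hand (for example, one adapted from the community's transforms) and that it emits GeoBlacklight JSON directly; the file names are placeholders.

from lxml import etree

# Load a (hypothetical) ISO 19139-to-GeoBlacklight stylesheet and compile it.
transform = etree.XSLT(etree.parse("iso2geoblacklight.xsl"))

# Apply it to an ISO metadata document exported from, e.g., ArcCatalog.
iso_doc = etree.parse("nyu_2451_34506_iso19139.xml")
result = transform(iso_doc)

# Write the transformed record; some community stylesheets emit an
# intermediate XML instead, in which case a further conversion step is needed.
with open("geoblacklight.json", "w") as out:
    out.write(str(result))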
At NYU, we've begun deploying a somewhat hacked version of Omeka to generate GeoBlacklight metadata from scratch. Omeka is a great tool because it allows us to mediate the metadata creation process in several different ways. Most people have encountered Omeka as an archival web publishing platform; we are not using it as such. For us, it's a means to an end, and that end is getting GeoBlacklight metadata to index into our instance of GeoBlacklight. We delivered a presentation to the OpenGeoPortal Metadata Working Group on January 12, 2016 about our process with Omeka. Slides and recording are available here.
What Omeka Does
We're going to ease up on the blow-by-blow description of Omeka in this post and encourage you to listen to the recording and look at the slides. However, here are a few summary points. The most important aspect of Omeka is that it allows us to call on existing APIs to promote authority control as we catalog GIS data. In particular, the GeoNames ontology, which has a robust API, can populate other relevant parts of the record when a person selects a unique place name. In cases where we want to encourage multiple values for enhanced discovery (e.g., Library of Congress Subject Headings), users can easily add fields to account for them. Finally, and most importantly, Omeka exports records in the JSON format we need to index into GeoBlacklight.
There are other benefits as well, including the following:
- Batch Uploads: Like other institutions, we tend to collect items in batches or sets, and in many cases much of the metadata is consistent and easily replicated. An example is India Census data from ML InfoMap. Using a CSV, we can populate all of the relevant fields in a spreadsheet and then use the Batch Import plug-in to load the items into Omeka (see the sketch after this list).
- Customizations of User Interface: We’ve altered the element prompts to provide people with clear instructions that help them fill out the record from scratch. Our goal here is to anticipate a self-deposit form function.
- Ability to Distribute Accounts: We often have student workers, of varying levels of engagement and commitment, who have extra time to help with GIS metadata creation. Omeka serves as an effective way to collaborate on larger-scale projects, and its ease of use allows student workers to log in and enrich bare-bones metadata, for instance. It's very easy to create accounts and passwords and give them to students, researchers, or anyone else; no other library systems are implicated.
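As referenced in the batch-uploads item above, here is a rough sketch of how such a spreadsheet might be prepared: the fields shared across a set are defined once and merged with per-layer values before the CSV is handed to the import plug-in. The column names, identifiers, and values here are invented for illustration and are not a fixed Omeka or GeoBlacklight requirement.

import csv

# Fields shared by every layer in a hypothetical batch.
shared = {
    "Provenance": "NYU",
    "Rights": "Restricted",
    "Publisher": "ML InfoMap (Firm)",
    "Subject": "Census",
}

# Per-layer values; in practice these would come from the set itself.
layers = [
    {"Title": "India Census: Andhra Pradesh", "Identifier": "layer_0001"},
    {"Title": "India Census: Assam", "Identifier": "layer_0002"},
]

fieldnames = ["Title", "Identifier"] + sorted(shared)
with open("batch_import.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for layer in layers:
        row = dict(shared)   # start from the shared values
        row.update(layer)    # add the per-layer values
        writer.writerow(row)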
We've created a practice user account for Omeka, and we invite anyone in the community to give it a shot. Just send us an e-mail for the credentials. Further, the GeoBlacklight plug-in is housed on Stephen's GitHub account and is available for anyone to download (although you will need to alter the code base to conform to your institution's setup).
Alternative Tools for Authoring Metadata
Although Omeka has worked well for us so far, we don't imagine it to be a long-term solution for creating GeoBlacklight metadata. There are other promising options, such as Elevator (now defunct?) and Catmandu. We hope that more streamlined tools will become available, and we are even working on a homegrown, long-term solution that we can incorporate into our infrastructure.
Depositing into OpenGeoMetadata
The final step of our metadata creation process is sharing. We have chosen to push all of our GeoBlacklight metadata records into OpenGeoMetadata, a consortial collection of metadata repositories managed by Jack Reed at Stanford University.
Essentially, OpenGeoMetadata is a series of GitHub repositories from which anyone can index records into a local instance of GeoBlacklight. The goal is to facilitate cross-institutional collaboration and collectively grow the amount of geospatial data that can be discovered with a single search. From each repository, administrators can index a set of GeoBlacklight records into their Solr core, and those items will then surface within the application. If you're interested in contributing to OpenGeoMetadata, get in touch with Jack Reed.
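To give a sense of what indexing a set of records can look like in practice, here is a rough Python sketch that walks a cloned OpenGeoMetadata repository and posts its records to a local Solr update endpoint. The repository path, the geoblacklight.json file-naming convention, and the Solr URL are assumptions you would adjust for your own environment.

import json
import pathlib
import urllib.request

# A cloned OpenGeoMetadata repository and a local GeoBlacklight Solr core.
repo = pathlib.Path("edu.nyu")
solr_update_url = "http://localhost:8983/solr/geoblacklight/update?commit=true"

# Collect every GeoBlacklight record in the repository.
docs = [json.loads(path.read_text(encoding="utf-8"))
        for path in repo.rglob("geoblacklight.json")]

# Post the whole batch to Solr as a JSON array of documents.
request = urllib.request.Request(
    solr_update_url,
    data=json.dumps(docs).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.reason)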
Next steps for the GeoBlacklight community
The 2016 Geo4LibCamp at Stanford was an unqualified success, not only because we learned so much about making geospatial metadata but also because we were able to crystallize some areas for improvement. The first is the need to foster community best practices for generating GeoBlacklight metadata. Ultimately, the creation of GeoBlacklight metadata is a user experience issue: unless metadata is created according to a consistent standard of completeness, the application will behave differently depending on which record is accessed. We are excited to see how the community continues to handle these challenges.