Many museums, archives, and libraries are exploring Linked Open Data to make their online collections more meaningful to researchers and to the public at large. From the Rijksmuseum in the Netherlands to the Cooper-Hewitt in New York, our cultural heritage can be explored in digital form with links to help us dive deeper. These aren’t just spiffed-up websites; they are living representations of the physical collections, with hooks for developers to build new applications that link back and let us look, listen, and learn in new ways.

What if we could use this emerging foundation to allow researchers to publish links from one institution to another, connecting a letter in one archive to an artifact in another museum? People have been talking about these ideas since Vannevar Bush imagined the Memex in the 1940s. Making it actually work requires a lot of disparate pieces: standard protocols, ontologies, digital representations of the physical works, and an audience with the digital tools for easy access.
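As a rough sketch of what such a cross-institution link might look like as machine-readable data, consider a subject–predicate–object triple in the style of Linked Open Data, with the researcher’s attribution attached. All of the URIs, predicate names, and identifiers below are hypothetical, invented only for illustration:

```python
# Illustrative sketch: representing a researcher's link between items held
# by two different institutions as a subject-predicate-object triple.
# The URIs and predicate names are hypothetical, not a real ontology.

def make_link(subject_uri, predicate, object_uri, asserted_by):
    """Package a cross-collection link together with its provenance."""
    return {
        "subject": subject_uri,      # e.g. a letter in one archive
        "predicate": predicate,      # the relationship being asserted
        "object": object_uri,        # e.g. an artifact in a museum
        "asserted_by": asserted_by,  # the researcher making the claim
    }

link = make_link(
    "https://archive.example.org/letters/1923-04-07",
    "mentions",
    "https://museum.example.org/artifacts/botany/4521",
    "researcher:dhblake",
)
print(link["predicate"])  # mentions
```

Keeping the `asserted_by` field alongside the triple matters: it lets future researchers evaluate the source of a claim rather than treating every link as equally authoritative.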

I’ve been investigating a very small part of this challenge: connecting people through their things. Through interviewing researchers, archivists and museum collections managers, I’ve learned about traditional research techniques that suggest how an online system could work. Instead of linking as a post-process, the act of linking data could be part of the experience of researching and exploring online collections.

An illustration of what a page might look like for Doris Cochran, a scientist who worked at the National Museum of Natural History. The page shows the specimens she collected, publications, papers, as well as "known associates" and "mentions" from other institutions.

A Social Network of Dead People?

What if we could pull all of this information together across collections in different organizations and present it in a unified way? Last year, working as a Presidential Innovation Fellow at the Smithsonian, I imagined what this might look like (illustrated above): a social network of historical figures linking archival documents to give us insights about history.

I quickly discovered the SNAC project (Social Networks and Archival Context), which started investigating this idea several years before me. SNAC takes structured data from EAC-CPF (Encoded Archival Context: Corporate Bodies, Persons, and Families) files and connects people through annotations made by archivists all over the world who contribute their data to the project.

This published biographical data, with links back to the source archives, provides a valuable resource for researchers. SNAC’s name matching uses automated techniques, leaving “maybeSameAs” connections where there is uncertainty:

While refining the computational techniques used continues, such techniques alone will always fall short. The most fundamental problem is identifying when similar names are for the same person or different persons. Even for human editors, identity resolution can be an exceptional challenge and sometimes cannot be reliably achieved due to insufficient or ambiguous evidence. — SNAC Research Use Notes
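To make the “maybeSameAs” idea concrete, here is a toy sketch of an identity-resolution pass. This is purely illustrative: SNAC’s actual matching is far more sophisticated, and the similarity measure and thresholds here are invented for the example.

```python
# Illustrative only: a toy identity-resolution pass in the spirit of
# SNAC's "maybeSameAs" links. The scoring and thresholds are invented;
# SNAC's real matching techniques are far more sophisticated.
from difflib import SequenceMatcher

def classify_match(name_a, name_b, sure=0.95, maybe=0.8):
    """Return a link type based on simple string similarity, or None."""
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    if score >= sure:
        return "sameAs"      # confident enough to assert identity
    if score >= maybe:
        return "maybeSameAs" # similar, but leave the question open
    return None              # too different to link at all

print(classify_match("Doris Holmes Blake", "Doris Holmes Blake"))  # sameAs
print(classify_match("Doris Holmes Blake", "Doris H. Blake"))      # maybeSameAs
```

The point of the middle category is exactly what the SNAC note describes: when the evidence is ambiguous, the system records uncertainty for a human researcher to resolve rather than guessing.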

Identity Often Requires Research

While there may be many people named Russell Hatch, these boots only belonged to one of them.

Mrs. Sydney Blake travels to South America on a scientific expedition, but her colleague’s field notebook talks only about Doris. Archival research reveals that Doris Holmes Blake, wife of Sydney, was the travel companion.

Establishing an identity based on a name is often an act of scholarship. Researchers explore the written record, piecing together history from different pieces of paper, photographs, and physical objects in museums.

Even with clear records, facts can be disputed. There are errors in the archives that do not besmirch the disciplined care of the archivists. A birth certificate may show one date and a newspaper article a different one. One historian may assume the birth certificate is correct, until another finds a diary entry telling the funny story of a town hall clerk who got the date wrong. One researcher told me of his challenges recording biographical data from Canadian Civil War soldiers because some thought it funny to write Feb. 29th as their birthday. We know an individual was actually born at a specific date and time, but once that time is in the past, facts can become subjective.

People have always researched history by looking at the artifacts left behind. Even today, researchers travel from archive to library, often across the world, to piece together stories from letters, diaries and even notes in guestbooks at historic homes. They build a picture of what a life was like from prized possessions or everyday objects now housed in museums, from news sources, and from the stories written by friends and colleagues, both positive and slanderous.

Thousands of researchers create these kinds of links every day. They are footnotes in scholarly articles, books, and research papers. They are the tabs open in a grad student’s browser. Interviews with researchers suggest that if we could allow them to leave a trail, they would contribute to the world-wide store of knowledge.

A New Model of Publishing

We’re starting to see new patterns emerge online with new interfaces that allow people to take part in linking historical artifacts. Volunteers transcribing field notebooks link scientists, subjects, and specimens using wiki-like markup at FromThePage. Jazz enthusiasts can read transcripts of conversations with musicians at LinkedJazz 52ndStreet to build a social network with links back to the oral histories.

Emerging on the web, there’s a new model of publishing. Instead of links being created with markup and behind-the-scenes tools gated by a webmaster, connections can be added with simple text annotations or the touch of a button.
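As a sketch of how lightweight that annotation step can be, here is one way wiki-style double-bracket links in a transcription could be extracted. The double-bracket syntax is a common wiki convention; this is not the actual markup or code used by FromThePage or the other projects mentioned:

```python
import re

# Illustrative sketch: pulling wiki-style [[Name]] annotations out of a
# transcribed passage. Supports an optional [[target|display text]] form.
# This is a generic wiki convention, not FromThePage's actual implementation.
LINK_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

def extract_links(transcription):
    """Return the link targets a volunteer annotated in transcribed text."""
    return LINK_PATTERN.findall(transcription)

text = "Collected two specimens with [[Dr. Charles Hendrickson]] near [[Balboa Park]]."
print(extract_links(text))  # ['Dr. Charles Hendrickson', 'Balboa Park']
```

Because the annotation lives inline with the transcription, the act of transcribing and the act of linking become a single gesture, which is exactly the shift this new publishing model enables.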

Search results for "Dr. Charles Hendrickson" in the 1966 Field Notes of Laurence M. Klauber show an image of the field notebook and text with links.
At BalboaParkOnline volunteers transcribe documents and add links.
Sarah Vaughan in the center with lines connecting to other jazz musicians.
Sarah Vaughan’s social graph at LinkedJazz created by volunteers annotating transcripts of oral histories.

The project I’m working on, Midas Innovation Toolkit, was developed in the open from day one. It started as a Presidential Innovation Fellows project, sponsored by the US Department of State.

Both the State Department and the Department of Health and Human Services (HHS) are actively working to pilot the software within each agency to foster collaboration within different target communities. Developers at each agency are leveraging each other’s efforts by submitting changes (via pull requests) to a common shared codebase hosted on GitHub.

It’s exciting to see this cross-agency collaboration through open source. The software is designed to help agency employees collaborate across team boundaries, and it’s wonderful that we’re doing that with the software itself using the entirely different mechanisms of open source.

I’m relatively new to the project and still learning about it myself, but would welcome volunteer contributions — or feedback on how to make the project more welcoming to people who want to help. It uses Node.js and Sails with Backbone on the front end, and we’ve just started writing some Chef recipes for automated deployment. There’s a lot of low-hanging fruit in the GitHub issues list.

Would love to hear what you think about this project specifically or government open source in general!

As a developer and a citizen, I am excited about open source in the US Government. I recently joined 18F, a new digital services delivery team within the federal government, part of the General Services Administration (GSA). Last week, we announced our open source policy, under which our source code is developed in the open from day one and dedicated to the public domain (CC0).

As a citizen, I believe open source makes best use of our tax dollars:

  • Leveraging open source tools & libraries is not just about saving licensing costs, it saves time. We can evaluate a library or tool by actually using it, without up-front analysis and a time-consuming procurement process.
  • New contractors can pick up a project easily, which will drive competition and reduce switching costs.
  • Different agencies in federal, state, and local governments can easily leverage each other’s code through coder social networks like GitHub. This happened recently with the 18F Answers platform, which is based on Honolulu Answers, developed by Code for America, and is now being leveraged to improve the immigration experience at USCIS.

As a developer, open source encourages me to apply best practices: effectively communicating the impact of the code I write, making choices that will yield high-quality, secure code, and embracing volunteer contributions that are aligned with the project’s mission.

On a personal level, it is an amazing professional development opportunity. A long time ago, a conversation with Rob Savoy forever changed how I thought about the personal impact of developing open source software. He said that, with rare exception, all the code he had written was available to him in any future project. Imagine if that were true for me… if the source code of After Effects, Flash video, and Shockwave (or their open source equivalents in a parallel universe) were available on my next project.

This is even more compelling for the now-defunct proprietary software I’ve created. Adobe ScreenReady turned any document into a high-quality image with anti-aliasing and an alpha channel (turning the “paper” into transparency). PACo/QuickPICS enabled long-format synchronized audio and video off CD-ROM (which at the time had bandwidth comparable to a 14.4 modem). Neither product made sense to continue from a business perspective, but both had passionate customers and could have evolved into powerful tools or libraries accelerating innovation in both the private and public sectors.

At 18F the software we develop is for the people and by the people. Open source gives us a firm foundation to make a lasting impact for our country and for the world.