The gojūon is a Japanese ordering of kana named for the 5×10 grid in which the characters are displayed. Each kana corresponds to one sound in the Japanese language. Today I learned about いろは (iroha), a different way to learn Hiragana than the gojūon (五十音) ordering I learned in my Japanese class, where the characters are displayed in a grid. It makes sense to teach that way, since it is easy to see which characters share the same beginning (consonant) sound or ending (vowel) sound.

However, I knew the characters once and wanted to make my study session more interesting. I had forgotten about half the characters since first studying Japanese four years ago and wanted to review using actual words. If I could learn the characters with the context of real language then I could learn vocabulary at the same time. I wondered if there were a “quick brown fox” (pangram) for Hiragana.

I quickly found いろは (iroha), an ancient Japanese poem:

いろはにほへと
ちりぬるを
わかよたれそ
つねならむ
うゐのおくやま
けふこえて
あさきゆめみし
ゑひもせす
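
To check the “quick brown fox” idea for myself, here is a minimal Python sketch that compares the poem against the kana of the gojūon grid. The names IROHA, GOJUON_47 and pangram_report are my own, made up for this example. The reference set is an assumption too: it includes the historical kana ゐ and ゑ, which appear in the poem but are no longer used in modern Japanese, and it leaves out ん, which is not in the poem.

```python
from collections import Counter

# The poem exactly as written above, one string per line of the poem.
IROHA = (
    "いろはにほへと"
    "ちりぬるを"
    "わかよたれそ"
    "つねならむ"
    "うゐのおくやま"
    "けふこえて"
    "あさきゆめみし"
    "ゑひもせす"
)

# Assumed reference set: the gojūon rows, with historical ゐ and ゑ, without ん.
GOJUON_47 = (
    "あいうえお"
    "かきくけこ"
    "さしすせそ"
    "たちつてと"
    "なにぬねの"
    "はひふへほ"
    "まみむめも"
    "やゆよ"
    "らりるれろ"
    "わゐゑを"
)


def pangram_report(poem: str, alphabet: str) -> dict:
    """Report kana that are missing from, repeated in, or extra to the poem."""
    counts = Counter(ch for ch in poem if not ch.isspace())
    return {
        "missing": sorted(set(alphabet) - set(counts)),
        "repeated": sorted(ch for ch, n in counts.items() if n > 1),
        "extra": sorted(set(counts) - set(alphabet)),
    }


report = pangram_report(IROHA, GOJUON_47)
print(len(IROHA), report)
```

All three lists come back empty and the poem is 47 characters long, so under these assumptions it is a perfect pangram: every kana in the reference set appears exactly once.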

This poem is not just an arcane bit of trivia, but a real ABCs of Japanese: the ordering from the poem is still used today. I found a wonderful video, What is “いろは iroha”?, that tells the story of this word, which means “basic” or “fundamental” in Japanese. I learned that the first 7 characters are used for musical notes (the way we use A-G, in Japanese they use いろはにほへと). I read elsewhere that theater seats are often ordered this way.

I realized that if I could learn this poem, I would also learn other useful aspects of the Japanese language and get a glimpse of the culture as well. I wanted to hear it while I studied, and found answers via my new Twitter friend Charelle Collett (@Charcol1900).

Here’s someone singing it in a child-like ABCs style. I have no idea what the words on the right are, but it is very easy to follow along and practice reading while hearing the characters pronounced:

and here’s Hatsune Miku (Vocal software) singing it:

This second one is really interesting since it also shows the evolution of early Japanese script into modern Hiragana and then shows some more variants — here’s some detail on the first three.

  1. Man’yōgana: an ancient writing system that employs Chinese characters to represent the Japanese language
  2. Chinese Cursive Script from which Hiragana evolved
  3. Modern Hiragana

Many museums, archives and libraries are exploring Linked Open Data to make their online collections more meaningful to researchers and to the public at large. From the Rijksmuseum in the Netherlands to the Cooper-Hewitt in New York, our cultural heritage can be explored in digital form with links to help us dive deeper. These aren’t just spiffed-up websites; they are living representations of the physical collections, with hooks for developers to build new applications that link back and let us look, listen and learn in new ways.

What if we could use this emerging foundation to allow researchers to publish links from one institution to another, connecting a letter in one archive to an artifact in another museum? People have been talking about these ideas since Vannevar Bush imagined the Memex in the 1940s. Making it actually work requires a lot of disparate pieces: standard protocols, ontologies, digital representations of the physical works, and an audience with the digital tools for easy access.

I’ve been investigating a very small part of this challenge: connecting people through their things. Through interviewing researchers, archivists and museum collections managers, I’ve learned about traditional research techniques that suggest how an online system could work. Instead of linking as a post-process, the act of linking data could be part of the experience of researching and exploring online collections.

An illustration of what a page might look like for Doris Cochran, a scientist who worked at the National Museum of Natural History.  The page shows the specimens she collected, publications, papers, as well as "known associates" and "mentions" from other institutions.

A Social Network of Dead People?

What if we could pull all of this information together across collections in different organizations and present it in a unified way? Last year, working as a Presidential Innovation Fellow at the Smithsonian, I imagined what this might look like (illustrated above): a social network of historical figures linking archival documents to give us insights about history.

I quickly discovered the SNAC project (Social Networks and Archival Context), which had started investigating this idea several years before I did. SNAC takes structured data from EAC-CPF files and connects people through annotations made by archivists all over the world who contribute their data to the project.

This published biographical data, with links back to the source archives, provides a valuable resource for researchers. SNAC name matching uses automated techniques, leaving “maybeSameAs” connections where there is uncertainty:

While refining the computational techniques used continues, such techniques alone will always fall short. The most fundamental problem is identifying when similar names are for the same person or different persons. Even for human editors, identity resolution can be an exceptional challenge and sometimes cannot be reliably achieved due to insufficient or ambiguous evidence. — SNAC Research Use Notes
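
To make the idea of a “maybeSameAs” link concrete, here is a minimal Python sketch. It is not SNAC's matching algorithm. The PersonRecord type, the normalize and compare functions, the matching rule, the source labels and the birth years in the usage lines are all assumptions invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical record: a name as written in one collection."""
    name: str
    birth_year: Optional[int] = None
    source: str = ""


def normalize(name: str) -> str:
    """Lowercase, turn 'Last, First' into 'first last', drop punctuation."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(re.sub(r"[^\w\s]", "", name.lower()).split())


def compare(a: PersonRecord, b: PersonRecord) -> str:
    """Return 'sameAs', 'maybeSameAs' or 'different' under a toy rule."""
    if normalize(a.name) != normalize(b.name):
        return "different"
    if a.birth_year is None or b.birth_year is None:
        # Same name, but not enough evidence: leave it for a human.
        return "maybeSameAs"
    return "sameAs" if a.birth_year == b.birth_year else "different"


# Invented example records; the years are placeholders, not real data.
archive = PersonRecord("Hatch, Russell", 1900, "archive A")
museum = PersonRecord("Russell Hatch", None, "museum B")
library = PersonRecord("Russell Hatch", 1900, "library C")

print(compare(archive, museum))   # maybeSameAs
print(compare(archive, library))  # sameAs
```

The point of the middle result is that the code does not guess. When the evidence is thin, the link is flagged as uncertain and waits for someone to do the research.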

Identity Often Requires Research

While there may be many people named Russell Hatch, these boots only belonged to one of them.

Mrs. Sydney Blake travels to South America on a scientific expedition, but her colleague’s field notebook talks only about Doris. Archival research reveals that Doris Holmes Blake, wife of Sydney, was the travel companion.

Establishing an identity based on a name is often an act of scholarship. Researchers explore the written record, piecing together history from different pieces of paper, photographs, or physical objects in museums.

Even with clear records, facts can be disputed. There are errors in the archives that do not besmirch the disciplined care of the archivists. A birth certificate may show one date and a newspaper article a different one. One historian may assume the birth certificate is correct, until another finds a diary entry telling the funny story of a town hall clerk who got the date wrong. One researcher told me of his challenges recording biographical data from Canadian Civil War soldiers because some thought it funny to write Feb. 29th as their birthday. We know an individual was actually born at a specific date and time, but once that time is in the past, facts can become subjective.

People have always researched history by looking at the artifacts left behind. Even today, researchers travel from archive to library, often across the world, to piece together stories from letters, diaries and even notes in guestbooks at historic homes. They build a picture of what a life was like from prized possessions or everyday objects now housed in museums, from news sources, and from the stories written by friends and colleagues, both positive and slanderous.

Thousands of researchers create these kinds of links every day. They are footnotes in scholarly articles, books, and research papers. They are the tabs open in a grad student’s browser. Interviews with researchers suggest that if we could allow them to leave a trail, they would contribute to the world-wide store of knowledge.

A New Model of Publishing

We’re starting to see new patterns emerge online, with new interfaces that allow people to take part in linking historical artifacts. Volunteers transcribing field notebooks link scientists, subjects and specimens using wiki-like markup at FromThePage. Jazz enthusiasts can read transcripts of conversations with musicians at LinkedJazz 52ndStreet to build a social network that links back to the oral histories.

Emerging on the web, there’s a new model of publishing. Instead of links being created with markup and behind-the-scenes tools gated by a webmaster, connections can be added with simple text annotations or the touch of a button.
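
As a rough sketch of how “simple text annotations” can turn into links, here is a small Python example. It assumes a double-square-bracket syntax like the one used in wikis, with an optional "subject|display text" form; I have not checked this against the exact markup rules of any of the tools mentioned here. The extract_links function and the sample sentence are made up for illustration.

```python
import re

# Assumed syntax: [[Subject]] or [[Subject|text shown in the transcript]].
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_links(text: str) -> list:
    """Return (subject, display_text) pairs for each annotation in text."""
    links = []
    for match in LINK.finditer(text):
        subject = match.group(1).strip()
        # With no explicit display text, the subject name is shown as written.
        display = (match.group(2) or subject).strip()
        links.append((subject, display))
    return links


# Invented sample transcription, not a quote from any real notebook.
sample = "Collected with [[Charles Hendrickson|Dr. Hendrickson]] near [[San Diego]]."
print(extract_links(sample))
```

Each pair could then become a link from the transcribed page to a page about the person or place, which is the kind of connection a volunteer can add without any help from a webmaster.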

Search results for "Dr. Charles Hendrickson" in the 1966 Field Notes of Laurence M. Klauber show an image of the field notebook and text with links.
At BalboaParkOnline volunteers transcribe documents and add links.
Sarah Vaughan in the center with lines connecting to other jazz musicians.
Sarah Vaughan’s social graph at LinkedJazz created by volunteers annotating transcripts of oral histories.

The project I’m working on, Midas Innovation Toolkit, was developed in the open from day one. It started as a Presidential Innovation Fellows project, sponsored by the US Department of State.

Both the State Department and Health and Human Services (HHS) are actively working to pilot the software within each agency to foster collaboration within different target communities. Developers at each agency are leveraging each other’s efforts by submitting changes (via pull requests) to a shared codebase (hosted on GitHub).

It’s exciting to see this cross-agency collaboration through open source. The software is designed to help agency employees collaborate across team boundaries, and it’s wonderful that we’re doing that with the software itself using the entirely different mechanisms of open source.

I’m relatively new to the project and still learning about it myself, but would welcome volunteer contributions, or feedback on how to make the project more welcoming to people who want to help. It uses Node.js and Sails, with Backbone on the front end, and we’ve just started writing some Chef recipes for automated deployment. There’s a lot of low-hanging fruit in the GitHub issues list.

Would love to hear what you think about this project specifically or government open source in general!