Nikhil Deshpande (@nikofthehill) is the Director of GeorgiaGov Interactive, a group within the Georgia state government that provides services to agencies and other government organizations. Today at CapitalCamp he talked about why they chose Drupal for georgia.gov and how they approached the transition.

Georgia.gov, the website for the Georgia state government, used to be simply a landing page with links. They wanted it to be a front door to wherever you wanted to go in the state government, but people were coming there and falling off. They had a fragmented experience and a fragmented brand, and they were running many different CMSes. They had both platform issues and a design challenge.

Their users don’t look like typical personas. They are trying to be a website for everyone, to serve the whole population. How do they do this? Here are my rough notes on this intriguing talk.

Platform Choice

Since they were running a lot of CMSes, they knew exactly what they needed:

  1. enterprise-grade: they were hosting 60-70 websites
  2. cost-effective (not necessarily cheap): worthwhile, but doesn’t cost as much as the alternatives
  3. simple and usable
  4. a strong presence in the public sector

Nikhil Deshpande stated that Drupal is the second largest CMS worldwide, 2.1% of the web. [note: w3techs reports Drupal as #3 at 2%, which is still pretty huge, but perhaps Joomla has grown in the past few years.]

Open Source!

Nikhil saw this as an opportunity to answer valid questions about open source solutions, since they were moving from mostly proprietary systems.

  • Security? It has a high threshold of security since so many eyes are on the code.
  • Free? How good can it be if it is free? It’s not free: it’s free as in free speech, not free beer.
  • Total cost of ownership? You still need to implement it and host it.

Who uses it? A huge number of government sites, including go.usa.gov, USDA, NASA, the Dept. of Justice, USAID, and many more.

You can either implement it yourself or get someone to implement it for you. They put out a bid and chose Phase 2 as a partner. Phase 2 then brought in Acquia (started by the founder of Drupal) and Mediacurrent. They decided to use the OpenPublic platform, which is built specifically for government. It’s a distribution of Drupal that you can customize or use out of the box.

  • tailored to the needs of govt
  • security
  • accessibility
  • workflow

Moving from Vignette to Drupal

  • Content: Oracle to MySQL
  • Look and feel: not just a migration, they decided to do a re-design
  • Single code-base, multiple databases
  • Cloud hosting: all public info, no sensitive data
  • Search
  • Mobile ready: 20% of traffic overall (and climbing), some websites up to 45%

Design

The internal team designed the main site, then Phase 2 did the rest of the websites. They did heat maps: search drew 31% of clicks, child support 13%, headlines 3.6%.

Made it very search-centric. They got a Google Search Appliance and indexed all of the Georgia government websites.

55 redesigns?

Nope…

  • template based
  • style tiles as a design methodology
  • demo websites for agencies based on the style tiles

4 main themes agencies could choose from: patriotic, friendly, official, classic

No one really likes change, but if you communicate well it can go smoothly.

  • 56 sites, 8 batches, 150 content managers
  • Oct 2011 – Sept 2012
  • on time & on budget
  • Best of Web — 2012 Innovation Award
  • 99.98% uptime
  • Savings of $4.7MM over 5 years

Takeaways

  • define success
  • communicate & involve (120 people were trained as content managers, plus decision-makers who signed off on the look of the website)
  • carefully select implementation partner, but also build a strong internal team

No one wants to be on the receiving end of a change. It is important to communicate “things are going to be hard for you, but this is what we are doing to make it work.”

The following is based on an interview with Ben Brumfield, after which I did a bit of research myself, adding links and some additional references.

There are six general areas in which people are doing crowd-sourced transcription.

  1. Investigative Journalism: crowdsourcing information for citizen investigation. The idea is to get a whole lot of people to flag and inspect information. People type up the information on scanned documents (e.g. receipts, tax returns), transcribing what they see, then write the receipt total on a separate page. This is basic transcription plus high-level, purpose-built extraction: gather the data to use it for something. Volunteers are politically motivated.
  2. Bioinformatics: The archetypical artifact is a plant specimen with a bunch of labels on it — structured data. They present users with a full image and ask them to extract information; there is no free-form text entry. They tend to be more sophisticated in how they represent documents: “Even if you see an error, type what you see,” or select from a menu. Their volunteers are a smaller, more informed group who want to help science and/or love the subject themselves, and they can participate even if they can’t be in the field physically. They are motivated by the immersive nature of transcription — living in that document, you are really “there.”
  3. Library and Archive world: scanned letters, papers and diaries. Much more immersive, with a narrative flow that keeps people coming back. Two goals: 1) improve databases (i.e. finding aids): a plain-text transcript can be put into a Solr index; 2) improve findability: the material will be crawled by Google and reach people who don’t know about the material or the institution. They are also looking at it from an outreach perspective, connecting with potential patrons/users. It’s not just labor, but a service: let your users engage more deeply with collections. Tighter connection = advocates.
    • The National Archives launched a transcription pilot project. All of the materials they currently have online are completed, but they point people to WikiSource, where a list of images from the National Archives is queued up to work on.
    • University of Nebraska Lincoln – launched campaign to transcribe alumni yearbooks.
    • DIYHistory
  4. Literature Scholars: TextGrid is a system in which all transcription is done offline in Eclipse, then full transcripts are contributed — this is also a way of contributing what scholars already have. Sounds good, but tapping into the scholarly workflow has challenges. For scholars, transcripts are an intermediate work; they don’t get an incentive for contributing this effort, and they typically want everything that has their name on it to be properly cited and looking good. Where that DOES work to some degree is genealogy (see below): they’ve been doing it for longer and are OK with sharing with a somewhat broader audience. They worry about plagiarism, non-commercial use, etc., but are not trying to write a book like other scholars.
  5. Genealogy: Generally volunteers will first focus on transcribing material about their own ancestors, then material about where their ancestors are from; next they move on to generally useful material like the 1940 census, which leads them to communities of like-minded folks. If you offered them the opportunity to transcribe material from their own families, they would leap on it. They are working with tabular records: ship records, shareholder lists.
  6. Literary and Historical: A lot of people at the conference seemed to be talking about how to add a rich set of markup to indicate things like strikeouts, changes in handwriting, personal names, and place names. Taking this approach generally yields small, but quite dedicated, communities of users. Their data model is XML, usually TEI XML: they embed the information in the document, and the majority of the tools that exist are for that community. FromThePage is in this camp — you can use a lighter wiki markup for proper names, creating an automatic index and cross-linking between pages.
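To make the lighter-wiki-markup idea concrete, here is a minimal sketch in Python of how double-bracket subject links in transcribed pages can be turned into an automatic index. The `[[Canonical Subject|text as written]]` syntax and the sample page data are assumptions for illustration, not FromThePage's actual implementation:

```python
import re
from collections import defaultdict

# MediaWiki-style subject links: [[Canonical Subject]] or
# [[Canonical Subject|text as written]]. The exact syntax FromThePage
# accepts may differ; this pattern is an assumption for the sketch.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def build_index(pages):
    """Map each linked subject to the list of pages that mention it."""
    index = defaultdict(list)
    for page_title, text in pages.items():
        for match in LINK.finditer(text):
            subject = match.group(1).strip()
            if page_title not in index[subject]:
                index[subject].append(page_title)
    return dict(index)

# Invented sample transcriptions, purely for illustration.
pages = {
    "Diary, p. 1": "Went to [[Benjamin Franklin Jefferies|Ben]]'s farm.",
    "Diary, p. 2": "[[Benjamin Franklin Jefferies]] called about the [[Mill]].",
}
index = build_index(pages)
# index["Benjamin Franklin Jefferies"] == ["Diary, p. 1", "Diary, p. 2"]
```

Because the link records a canonical name alongside the text as written, the same person can be indexed consistently no matter how the diarist spelled or abbreviated the name on a given page.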

I signed up to try out FromThePage as “ultrasaurus” a couple of weeks ago. Its creator, Ben Brumfield (@benwbrum), recognized me from RailsBridge and the NPR story and reached out via Twitter. Ben’s blog, Collaborative Manuscript Transcription, is a wealth of information about crowd-sourced transcription.

Jason Shen and I were able to connect with him on the phone while Ben was in Austin, TX at the Social Digital Scholarly Editing conference last week. Ben kindly gave us an overview of the landscape of crowd-sourced transcription projects and the open source software that is behind a few of them. We also got a glimpse of how he got started in this fascinating corner of next-generation web tech and why he quit his day job to work on crowd-sourced transcription solutions full-time.

FromThePage started as a family history hobby. As a software developer, Ben was able to create a web site, originally based on MediaWiki and later moved to Ruby on Rails, to allow other people to help with transcription. Working with his great-great-grandmother’s diaries, he saw how a bunch of people could really do research on a topic together. He was inspired by Wikipedia, by the idea of getting a community together — not just to comment, but to actually edit, beyond the abilities of a single person. With a wiki format, someone who was a good typist could type it all up, another with special knowledge could make corrections, etc.

Wikipedia used to feature “what links here” more prominently — this is an index. He wanted to use this idea to link portions of text with the other places those subjects are mentioned. He very quickly found that MediaWiki was not the right tool: the distinction between the text and articles about the text was not clear.

One of the challenges is deciding how to handle the material: what are the guidelines for transcription? How do you encode abbreviations, incorrect spellings, etc.? There are very detailed, technical solutions like TEI, an XML format, as well as less formal text markup. He started reviewing other systems, posting on his blog, and speaking on the topic over the last two years. A number of organizations were interested in using FromThePage. Some people pay, some can’t — it’s open source. After 15 months of doing this full-time, he has worked on enhancements to FromThePage, a new transcription tool for structured data, and other sites built with different tech, based on the needs of the content and the community.
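For a sense of what the more formal TEI approach encodes, here is a small sketch using only the Python standard library. The elements `<choice>`, `<sic>`/`<corr>`, and `<abbr>`/`<expan>` are real TEI constructs for pairing a misspelling with its correction and an abbreviation with its expansion, but the fragment itself is invented and greatly simplified (real TEI documents carry namespaces, a header, and much richer structure):

```python
import xml.etree.ElementTree as ET

# Build a simplified, invented TEI-flavored line. In TEI, <choice> pairs
# alternative readings: <sic>/<corr> for a misspelling and its
# correction, <abbr>/<expan> for an abbreviation and its expansion.
line = ET.Element("line")

choice1 = ET.SubElement(line, "choice")
ET.SubElement(choice1, "sic").text = "Recieved"     # what the page says
ET.SubElement(choice1, "corr").text = "Received"    # editorial correction
choice1.tail = " a letter from "

choice2 = ET.SubElement(line, "choice")
ET.SubElement(choice2, "abbr").text = "Wm."         # as written
ET.SubElement(choice2, "expan").text = "William"    # expanded form

xml = ET.tostring(line, encoding="unicode")
# xml == '<line><choice><sic>Recieved</sic><corr>Received</corr></choice>'
#        ' a letter from <choice><abbr>Wm.</abbr><expan>William</expan></choice></line>'
```

Encoding both readings means a search or display layer can choose either the faithful transcription or the normalized text, which is exactly the kind of decision transcription guidelines have to settle up front.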

Ben’s personal mission is to transform the landscape of what amateurs do with their own material. Right now, if you are someone who has a lot of old historic papers or diaries, what you do is sit down and write a book about it — and you probably write a crappy book. Ben would like these folks to provide their materials in a way that other people can use them.