Comments on: tesseract html page with text overlay
/2013/07/tesseract-html-page-with-text-overlay/
Sarah Allen's reflections on internet software and other topics
Mon, 15 Jul 2013 16:06:43 +0000

By: Matt Christy /2013/07/tesseract-html-page-with-text-overlay/#comment-986 Mon, 15 Jul 2013 16:06:43 +0000

If you made this into pop-ups for each word instead of an overlay, it could be useful for some difficult-to-read fonts like black letter. Of course, that assumes we can get good OCR results for black letter; eMOP is working on that.

I could imagine some DH projects that might find this a useful way to show page images and have transcriptions available in a convenient way on the same page. Proyecto Cervantes is one. Maybe if done by line or paragraph instead of by word.

Just thinking out loud.

By: Antoine /2013/07/tesseract-html-page-with-text-overlay/#comment-985 Mon, 15 Jul 2013 07:07:18 +0000

Most useful. I had to make two slight changes:
- The class for OCR words was named ocr_word, not ocrx_word, in my version of Tesseract for Ubuntu 12.04.
- I had to add the fragment “position: absolute;” to the generated CSS for each word. I didn’t have the bandwidth to do what you did and add a CSS style that sets it globally in head; I might try that.
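
As a rough illustration of the two changes above, here is a minimal Python sketch. It assumes each hOCR word span carries a title attribute of the form "bbox x0 y0 x1 y1", as the hOCR format describes. The helper names word_style and is_word_class are hypothetical, and this is not the generator used in the original post.

```python
import re

# Matches the bbox part of an hOCR title attribute,
# e.g. "bbox 36 92 96 116" or "bbox 36 92 96 116; x_wconf 95".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

# Accept both class names mentioned in the comment above.
WORD_CLASSES = ("ocr_word", "ocrx_word")


def word_style(title: str) -> str:
    """Build an inline CSS string from an hOCR title (hypothetical helper)."""
    match = BBOX_RE.search(title)
    if match is None:
        raise ValueError(f"no bbox in title: {title!r}")
    x0, y0, x1, y1 = (int(v) for v in match.groups())
    # "position: absolute;" is the fragment that had to be added per word.
    return (
        "position: absolute; "
        f"left: {x0}px; top: {y0}px; "
        f"width: {x1 - x0}px; height: {y1 - y0}px;"
    )


def is_word_class(class_attr: str) -> bool:
    """True if a class attribute contains either word class name."""
    return any(name in class_attr.split() for name in WORD_CLASSES)
```

Checking for both class names keeps the same code usable whichever name a given Tesseract build emits.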

I used opacity on the image to make the words show more clearly. A value of 0.4 worked well.
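
The global-stylesheet idea and the opacity setting could be combined roughly as follows. This is only a sketch: the function names overlay_css and inject_style are hypothetical, and the bare img selector assumes the page image is the only image in the document.

```python
def overlay_css(image_opacity: float = 0.4) -> str:
    """Return CSS that positions word spans and fades the page image."""
    if not 0.0 <= image_opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return (
        # Cover both class names so the rule works across Tesseract versions.
        ".ocr_word, .ocrx_word { position: absolute; }\n"
        # Assumption: the page image is the only <img> in the document.
        f"img {{ opacity: {image_opacity}; }}\n"
    )


def inject_style(html: str, css: str) -> str:
    """Insert a <style> block just before the closing </head> tag."""
    if "</head>" not in html:
        raise ValueError("document has no </head> to insert before")
    style = f"<style>\n{css}</style>\n"
    return html.replace("</head>", style + "</head>", 1)
```

Setting the rule once in head avoids repeating "position: absolute;" in every word's inline style.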

By: Matt Christy /2013/07/tesseract-html-page-with-text-overlay/#comment-984 Sun, 14 Jul 2013 18:27:24 +0000

Very nice tool. Great idea. Definitely going to try it out. Thanks.
