We all have old documents that we’d like to rescue, archive, and share. Letters that we or our parents or grandparents wrote before computers came along, or in the early days of computers when we were using some ancient word processing software. Grandma’s recipes with handwritten notes. Diaries and journals.
What does it take to rescue these documents? Mostly time. You don’t need to know a lot about computers.
Here are my notes about retrieving documents that are in old file formats, and scanning typewritten documents as text.
Recently I decided to dredge up some old letters I wrote to my mother during the 10 years I lived overseas. In those days a filename could be no more than eight characters followed by a three-letter extension (which could be anything you liked, so I used it to indicate a date). The first task was to locate the files. At least I’d had the foresight to keep moving them from floppy disks to CDs to an external drive, so I still have them. Next I opened them in WordPad, where I could see a lot of rubbish (scrambled characters) at the top of each file, interspersed with text. Some files were full of rubbish from start to finish. I was able to copy and paste the readable text into a clean file and then save it as a Rich Text Format (RTF) file.
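If you have a lot of files and are comfortable with a little scripting, the copy-and-paste step can be roughly automated. The following is only a sketch in Python, not the method I used: it assumes the readable text is stored as plain ASCII between the scrambled bytes, which is true of many old word processor formats but not all of them. The function names and the sample bytes are made up for illustration.

```python
import re
from pathlib import Path


def extract_text_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len characters long."""
    # \x20-\x7e covers space through tilde; tabs are allowed as well.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def rescue_file(src: str, dest: str, min_len: int = 4) -> int:
    """Write the readable runs from src to dest, one per line; return how many were found."""
    runs = extract_text_runs(Path(src).read_bytes(), min_len)
    Path(dest).write_text("\n".join(runs), encoding="utf-8")
    return len(runs)


# Made-up bytes standing in for an old word processor file.
sample = b"\x00\x01\xffDear Mum,\x02\x03ab\x00How are you?\x00"
print(extract_text_runs(sample))  # ['Dear Mum,', 'How are you?']
```

Short runs such as "ab" are dropped because they are usually formatting codes rather than words. Expect to tidy the result by hand afterwards, just as with copy and paste.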
An RTF file can be opened in most word processing software, so the text will not be locked into a proprietary format. I don’t want to have to extract plain text from a mess of scrambled characters again ten years from now.
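Part of the appeal of RTF is that the file itself is ordinary text with a few backslash codes, so it can even be written by a short script. Here is a minimal sketch in Python, assuming a single font and no formatting beyond paragraph breaks; the function names and the font choice are my own for illustration, and a word processor's "Save as RTF" will produce something far more complete.

```python
def rtf_escape(text: str) -> str:
    """Escape plain text so it can sit inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)       # RTF control characters need a backslash
        elif ch == "\n":
            out.append("\\par\n")       # paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: \uN takes a signed 16-bit number, followed by a
            # fallback character ('?') for readers that ignore \u.
            raw = ch.encode("utf-16-le")
            for i in range(0, len(raw), 2):
                unit = int.from_bytes(raw[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def to_rtf(text: str) -> str:
    """Wrap escaped text in a minimal RTF header (one font, 12 pt)."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    return header + rtf_escape(text) + "}"


print(to_rtf("Dear Mum,\nI don\u2019t have much news."))
```

Saving the printed output in a file with an .rtf extension should give a document that opens in WordPad or any other word processor.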
My next task was to convert a set of typewritten articles (written in the late 80s and early 90s) into electronic format. For some reason my ancient scanner (Canon’s CanoScan LiDE 200) now refuses to perform the Optical Character Recognition (OCR) function. I think it’s something to do with Windows 7. I used to be able to use this function, though the results were not satisfactory: in many cases it took longer to clean up and format the output than to retype the document.
First I tried scanning the articles and saving them as PDFs, then uploading them to a free online conversion service (http://www.free-ocr.com/). However, you have to solve a CAPTCHA to verify that you’re not a spammer, and I find that after a few uses it becomes too difficult for me to identify the letters. I don’t know how anyone recognises the letters in the audio version of CAPTCHA; I never can.
Fortunately, the site’s authors kindly recommend an alternative: ABBYY FineReader OCR software. I downloaded the trial version and began using it. This software is amazing! It detected my scanner and allowed me to scan directly to the desired output format (RTF, ePub, PDF and others). It also allowed me to convert the files I’d already scanned from images (the PDFs I had saved were actually images of the text) to any of these formats. In some cases I had a copy of the article as it appeared in a newspaper or magazine, and I saved the image file as well.
The output is very clean, and the editing functions are efficient and easy to learn. I found the software very intuitive. Needless to say, once my trial period expires I’ll be purchasing the software – the time it will save me easily justifies the cost.