Document Management System with OCR

These oldies can be made understandable for Folderit’s OCR content search after scanning

The Folderit document management system has added a new utility: OCR. OCR stands for optical character recognition and is a key part of converting pages of scanned text into documents that computers can read.

OCR (Optical Character Recognition) 

OCR has a broad variety of potential uses. These include scanning in old manuals or other printed text and updating them without retyping the entire document. This can save hours of human investment in updating handbooks, textbooks, and similar documents. With the right settings and some computer “training”, advanced OCR can even decipher handwritten script.

One of the original applications of OCR is to allow machine reading of text for the visually impaired. The idea was for text to be scanned into a document, then for a mechanical voice to read it aloud. It was further adapted to transcribe text documents into braille – and braille into text.

Uses for OCR

The use of OCR enables keyword searching within scanned documents, making locating information in them less dependent on tags, metadata, and labeling. It also makes it possible to locate minutiae that are not mentioned in tags or metadata.
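Once OCR has turned scanned pages into plain text, keyword search can be sketched in a few lines. The Python example below is a minimal illustration and not Folderit's implementation: the file names, the sample text, and the `tokenize`, `build_index` and `search` helpers are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of document names whose OCR text contains it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical OCR output, keyed by file name (assumed sample data).
ocr_text = {
    "invoice_0412.pdf": "Invoice 0412: two toner cartridges, net 30 days",
    "handbook.pdf": "Employee handbook: vacation policy and toner recycling",
}
index = build_index(ocr_text)
print(sorted(search(index, "toner")))
print(sorted(search(index, "toner invoice")))
```

Here a search for "toner" matches both files, while "toner invoice" matches only the invoice, even though neither word would need to appear in a tag or in the metadata.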

OCR can be used for the following:

  • Converting text to speech
  • Converting PDF (portable document format) files to word processor files for editing
  • Editing and filling out PDF forms
  • Adjusting the size of PDF files
  • Marking and annotating PDF files
  • Extracting, rotating and cutting PDF pages
  • Creating digital signatures
  • Adding bookmarks and hyperlinks to PDF documents

OCR Added to Folderit DMS

If you’ve used Folderit, the most user-friendly document management system in the world, you know what an asset this cloud document management system can be in terms of saving time and improving communication and cooperative project efficiency. Adding OCR-based content search further increases the efficiency of an already beneficial business DMS.

A Little History of OCR

Like many computer systems, OCR, or optical character recognition, is not precisely new. In fact, the first OCR machine was invented in 1928 by Gustav Tauschek of Vienna, and a similar machine was invented in 1931 by Paul Handel of GE. Both machines used photocell light recognition to “read” printed material. In 1949, RCA laboratories created a machine that could read printed text aloud – creating a new means of literacy for the visually impaired. David Shepherd used the technology to create machine-readable printed information for the U.S. Military. Lawrence Robert created the means for machine reading of multiple simple fonts. Reader’s Digest and RCA developed the first commercial OCR in 1960.

In 1974, the Kurzweil reading machine combined a flatbed scanner and a speech synthesizer to create a machine that would read printed pages aloud. The result was later purchased and marketed by Xerox as Scansoft. The next development was the PDA – personal digital assistant – a handheld device that could read handwritten letters on a touchable screen. Granted, the letters had to be block letters written in a specific way, but it was another step toward machine-readable handwriting.

OCR and the U.S. Postal Service

How is OCR used? Not just as a verbal reading machine. You might be old enough to remember when ZIP codes became standard on mailing addresses. School children learned to write addresses in a set format, and students in typing classes learned where and how to type addresses on envelopes. Numbered postal zones were started in 1943, and the five-digit ZIP code followed in 1963 and came into general use over the rest of that decade. The numbers help make the millions of pieces of mail that are sent through the mail system readable by machines, making them easier to sort, so that human workers are saved hours of painstaking work.

Five numbers at the end of an address are much easier for a machine to read than the complex combinations of letters that make up city names, street addresses and personal names, but even those can be read using computers – provided that they are written in block letters and carefully spaced. If you have taken a machine-gradable exam, you’ve probably filled out the name and address on an exam form, printing information in carefully spaced squares. If you’ve ever wondered why students are taught to print legibly, and why beautiful copperplate cursive – once known as longhand – has largely disappeared, this is part of the reason why. The flowing shapes of the connected letters that make up the cursive alphabet are far harder for machines to read.

Using OCR

When documents are involved – pages of handwritten or hand-printed notes, typewritten manuscripts or similar materials – the pages are usually scanned into a computer as a PDF (portable document format) file, then converted into a Word file that can be edited in a word processor. The two files are displayed side by side, and errors are then corrected by hand. Grammar and spell checkers can be helpful in flagging words and sentences that don’t make sense, but as most computer users know, these automatic programs are not perfect and can – without human intervention – sometimes make nonsense of a well-organized sentence.
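The flagging step described above can be sketched with Python's standard `difflib` module. This is a minimal illustration under stated assumptions: the tiny `KNOWN_WORDS` vocabulary and the `flag_suspect_words` helper are hypothetical, and a real checker would load a full word list. The code only flags and suggests; it leaves the decision to the human editor.

```python
import difflib
import re

# A tiny stand-in vocabulary (assumed); a real checker would load a full word list.
KNOWN_WORDS = {"the", "patient", "reports", "mild", "fever", "and", "headache"}

def flag_suspect_words(ocr_text, vocabulary=KNOWN_WORDS):
    """Return (word, suggestions) pairs for words not found in the vocabulary."""
    candidates = sorted(vocabulary)
    flagged = []
    for word in re.findall(r"[A-Za-z0-9]+", ocr_text):
        lowered = word.lower()
        if lowered not in vocabulary:
            # Suggest up to three vocabulary words that look similar.
            suggestions = difflib.get_close_matches(lowered, candidates, n=3, cutoff=0.6)
            flagged.append((word, suggestions))
    return flagged

# "patlent" stands in for a typical OCR misread of "patient".
for word, suggestions in flag_suspect_words("The patlent reports mild fever and headache"):
    print(word, "->", suggestions)
```

Because the suggestions are only similarity guesses, a word that is correct but missing from the vocabulary will also be flagged, which is one reason the side-by-side human review remains necessary.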

OCR and Hand-Written Material

When the material is written in longhand or cursive, it is much more difficult for a computer to separate out the letters. To solve this problem, there is ICR, intelligent character recognition, a newer generation of OCR. This is an important development, because while you can train some humans to print neatly in little boxes, the process is less successful with other humans. Large government organizations that receive applications for assistance frequently have problems with documents that are filled out incorrectly or incompletely. When handwritten entries are less than legible, it can cause applications that might otherwise be acceptable to be thrown out of the system. ICR can relieve the hundreds of human hours needed for processing by being able to decode a greater portion of entries.

Advances in OCR

The newest generation of this type of software is Intelligent Word Recognition (IWR). Rather than deciphering each individual letter, this software works by recognizing whole words. Like voice-to-text programs, OCR, ICR and IWR are trainable. This means that the longer you use a program on your computer to translate documents, the better job it will do. The computer “remembers” the fonts, your printing or your handwriting, and can even improve at picking up writing by clients or customers. The technology isn’t perfect, but it keeps improving with every generation.

There are a variety of OCR, ICR and IWR programs on the market. Some of them are available for free, while others can be quite pricey.

A Few Programs for OCR

OneNote: If you are already using a Microsoft Office suite that includes OneNote, you have an OCR program at your fingertips. To use OneNote’s OCR feature, you can use your phone or tablet to take a picture of the page in question. Import the picture into OneNote, and you will then be able to copy text from the picture and paste it onto a Word page – which will allow you to edit the text. OneNote, however, is far from perfect and will probably present the user with a text that needs a lot of editing.

Google Keep: This is another software program that does OCR. If you are a regular user, this program will also allow you to copy the words in a picture, paste them into a document, and edit the document. Reviews indicate that Google Keep does a better job than OneNote, but some editing will still be required.

Tesseract: An open source OCR program that supports more than 100 languages – straight out of the box. It has versions that run on Linux, Windows and Mac systems, and its development was sponsored by Google for a number of years. It can also be taught new languages. It reads not only text that runs from left to right, but also text that is written right to left, such as Arabic.
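As a rough sketch of how Tesseract can be driven from a script, the Python example below builds the command line `tesseract <image> stdout -l <language>`, which asks Tesseract to print the recognized text instead of writing a file. The helper names and the file name `scan_page1.png` are hypothetical, and the example assumes that the Tesseract program and the data for the requested language (here `ara`, Tesseract's code for Arabic) are installed.

```python
import shutil
import subprocess

def tesseract_command(image_path, lang="eng"):
    """Build a Tesseract command line that writes recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]

def ocr_image(image_path, lang="eng"):
    """Run Tesseract on an image and return its text, or None if Tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise an error if Tesseract reports a failure
    )
    return result.stdout

# Show the command that would be run for a hypothetical Arabic-language scan.
print(tesseract_command("scan_page1.png", lang="ara"))
```

Calling `ocr_image("scan_page1.png", lang="ara")` would then return the recognized text as a string, ready to be pasted into a document for editing.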

Back to Folderit and OCR

Chances are, however, that when you want character recognition software, you want something that will work with all your document software. That’s where Folderit comes in, because we can help you integrate OCR with your existing Folderit applications.

Where and how can Folderit OCR save you time and money?

Let’s look back over the applications for OCR, ICR and IWR.

Using OCR in Your Organization

One of the biggest time consumers for a large, busy office is deciphering handwritten notes. Human hands come in all sorts of sizes and levels of dexterity. This means that handwriting – even when it’s careful printing – comes in a variety of shapes and sizes. Even though elementary schools across the U.S. have endeavored to teach uniform character shapes, there are still variations in the way people write. Handwriting is influenced by eyesight and hand/eye coordination, as well. It is also affected by fatigue, mood, and the amount of handwriting practice that the writer has had. Receipts, forms, notes, and more often have handwritten entries.

An Example of OCR Use

For example, a busy medical office requires patients to fill out forms that include an entry that asks why the patient is in the office that day. While check boxes can cover many common answers, they cannot accommodate every entry.

Once written, these documents must be filed so that they become a part of the patient’s permanent record. With more and more medical facilities creating online access where patients can check their own records, entering the data from these records can become quite a chore. Even though the machine translation of the notes needs a human editor, a great deal of time can be saved if even a portion of the record is machine-readable. This is also true for college applications, applications for assistance and insurance forms – as well as many other sorts of social or business applications. The principle can even be applied to standardized skills testing.

Document Management and OCR

A DMS (document management system) that incorporates OCR offers many advantages, so we determined that adding OCR to Folderit would benefit our Folderit DMS clients. Since Folderit already assists users with the storage of documents and provides ways to easily locate data as needed, adding OCR to our DMS could only increase its efficiency and usability.

Here are some of the ways that might work:

  • Folderit provides tracking of sign-offs on documents. This process can be made easier with digital signature options – no need to rescan a document just to record the signature.
  • Folderit allows for easy sorting of your documents by placing them in dedicated folders.
    • Metadata and tags make locating information easier.
    • OCR can allow searching within a document, even if it’s a PDF.

By combining a cloud DMS with OCR, your business processes can be streamlined and your efficiency increased. You can scan in documents, such as receipts, applications, and invoices, making it easy to share them with associates in your organization.

Going Paperless with OCR

A cloud DMS combined with OCR also contributes to your ability to become a paperless office. As information is shared across your organization via email, shared folders and by using OCR, you diminish the need for paper and ink – two recurring expenses for many modern offices.

When paper documents must be used, they can be scanned, sorted and stored in folders in your DMS. When they are stored as PDFs, the chances for corruption of the files are diminished. However, by using OCR, you increase the usability of the files because they can more easily be converted into Word documents for editing. You also reduce the number of human hours that must be employed in re-typing to edit materials that are frequently reused, such as employee handbooks, instruction manuals, and even receipts that need to be updated periodically.

Although many business materials remain the same from year to year, technology has brought changes to many operations, from such basic institutions as daycares and preschools, through the higher academic settings and on into the business world. Keeping up with information and with information skills is a never-ending task that now moves like the wind as communications and the ability to share information steadily improve. OCR helps keep your business up to speed with world changes and enables you to take advantage of the amazing technology that’s now at our fingertips.

A Review of Document Management Systems

Document management systems have come a long way from the carefully stacked clay tablets employed by the Sumerians, or even the more portable papyrus, parchment or vellum used later. Digital media and cloud storage systems allow businesses to do away with the ubiquitous corner filing cabinet, and even to reduce the size and power of an access computer. In fact, some businesses have gone so far as to encourage such portable devices as laptops, tablets and mobile phones as their preferred workstations. This allows downsizing of physical office settings, and lets workers take part in their business tasks from their homes or even from vacation settings.

The Nature of Human Information

With that said, the nature of the information has not greatly changed. The clay tablets of Sumeria, while including the amazing story of Gilgamesh, focused primarily on tallies of grain and cattle. Later records from the Romans included how many measures of grain were given to citizens, and the rations that were accorded soldiers. Even Julius Caesar, who spent his winters writing books that would be read by the Roman citizens (encouraging them to support Caesar’s campaigns), was often engaged in recording the minutiae of the many aspects of keeping an army well fed and happy.

OCR Helps Users Find Information

OCR document management systems not only make keeping up with the minutiae easier, because many of the documents can be electronic; they also make it easier to collect, store, and access documents in a meaningful way. From simple receipts that are printed by a cash register and automatically fed into the system to scholarly journals and treatises that are essential to further technological development, digital document storage makes it easier to keep the information close at hand without requiring elephants or Percherons to move it, and without requiring several large rooms in which to store it.

Space, the Final Frontier

In many ways, as a society, we can all say, “Space, the final frontier” whether we are Star Trek fans or not, because square footage for human activities of all kinds is beginning to be at a premium. Cloud document management systems decentralize storage of your business records and other print materials, keeping them secure from local weather and disaster events. Fire and flood have long been the enemies of record keeping. Paper and parchment burn, clay can dissolve in water. Even tales etched in stone can be worn away by wind and weather.

Information Storage Security

That’s not to say that digital storage is always more permanent. It is subject to the vagaries of weather, local vandalism, and similar events, and history teaches us that it can be made inaccessible through updates. However, the modern cloud storage model includes – at least here at Folderit – triple backups of your information to minimize the chance of losing your material. PDF, a format that has been around for some time, helps keep your materials in an accessible format that can survive updates and data transfers far better than simple word processor or spreadsheet files. Unfortunately, PDFs are often difficult to edit, even just to add information.

OCR for Stability and Accessibility

That’s where OCR comes in. Because OCR can convert those PDF files into a form that can be edited with relative ease, your valuable information becomes even more valuable, as you are able both to keep it as it was and to generate new versions of it as you go along.

Document Version Management

Folderit is excellent at managing versions of documents – you might say it is one of our specialties. With our DMS, you can easily keep an original version of a document or form while using OCR to edit the material to produce a new object that will accept added information or that can be updated to accommodate progress that is being made by your business, school or organization.

In our fast-paced world with its ocean of information, one of the concerns is losing older versions of information, or rewriting history, as it were. By keeping your older version secure and working on a copy created using OCR, you can develop a record of how processes and procedures have changed and perhaps even evaluate which version is more efficient at driving your business toward success.
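The idea of keeping the original untouched while recording how later versions differ can be sketched in Python. This is an illustration of the principle only and does not describe how Folderit stores versions: the `VersionedDocument` class, its methods, and the handbook text are all hypothetical.

```python
import difflib
from dataclasses import dataclass, field

@dataclass
class VersionedDocument:
    """Keeps every saved version of a document's text; earlier versions are never overwritten."""
    name: str
    versions: list = field(default_factory=list)

    def save(self, text):
        """Append a new version and return its 1-based version number."""
        self.versions.append(text)
        return len(self.versions)

    def get(self, number):
        """Return the text of the given 1-based version."""
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number} of {self.name}")
        return self.versions[number - 1]

    def diff(self, old, new):
        """Return a unified diff between two versions as a list of lines."""
        return list(difflib.unified_diff(
            self.get(old).splitlines(),
            self.get(new).splitlines(),
            fromfile=f"{self.name} v{old}",
            tofile=f"{self.name} v{new}",
            lineterm="",
        ))

doc = VersionedDocument("handbook")
doc.save("Vacation: 10 days\nSick leave: 5 days")  # text recovered from the scan (assumed)
doc.save("Vacation: 12 days\nSick leave: 5 days")  # edited copy
print("\n".join(doc.diff(1, 2)))
```

The diff lists the old vacation line as removed and the new one as added, while version 1 stays available through `doc.get(1)` exactly as it was first saved.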

Folderit and OCR for You

That is why we added OCR to our document management system. As always, Folderit is looking out for our clients and exercising due diligence in increasing the options that we can offer to improve your efficiency. We believe that by adding OCR capability to our services, we will be better able to support what you do. We are deeply aware that when your business or organization benefits from our services, everyone benefits – your company, our company, and your customers and clients. While OCR is not yet perfect, it is improving every year, along with our understanding of how and when to apply it. By incorporating it into our system, we can be a part of that steady improvement in keeping up with necessary information quickly, efficiently and accurately.

There is scarcely an aspect of our modern lives that doesn’t involve record keeping and tracking information. Births, deaths, medical records, school transcripts, work records, and retirement funds – all are part of record keeping. It is our goal to make tracking records, no matter what you do, just a little bit easier. Perhaps our future generations will not have to resort to trying to decipher the lives of their ancestors from drawings on cave walls or lines scraped in clay.