Key points
- OCR stands for Optical Character Recognition, and it has been around in some form since the early 1900s.
- A scanned document must be processed before it can be searched.
- OCR processing takes time – it should be done before the document is needed, or at the time of scanning.
- Every document should be searchable (that is, OCR processed), so that you can keyword search anything from paragraph numbers to individual words, in a single document or across entire folders of PDFs.
OCR stands for Optical Character Recognition, and it goes hand in hand with another acronym: PDF, or Portable Document Format. Adobe developed PDF in the 1990s. PDFs are produced by applications such as Adobe Acrobat, or by exporting from other programs, for example saving a .doc file as a .pdf. The process involves many more acronyms: API, PCL, HPGL and DVI, to name a few.
OCR is the mechanical or electronic conversion of images of text into machine-readable text. As you may have guessed, it is a form of AI: it reads, or translates, images into text. It has been around for a long time, since the early 1900s. You can watch OCR operate in real time in the Google Translate app: point your phone's camera at a document (or a menu) and the app translates the text on screen.
In the legal profession, the primary use of OCR is to enhance scanned documents so that they are searchable. If a scan is just an image (a non-searchable PDF), a program is used to make the PDF searchable. This means adding a second layer to the document that holds the program's interpretation of the image, i.e. the text. That text layer overlays the PDF, allowing lawyers to CTRL + F their way through the material.
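The idea of a second layer can be pictured with a small Python sketch. This is a conceptual model only, not how the PDF format or any particular OCR product stores its data; the class names `Word` and `ScannedPage`, the file name and the coordinates are all hypothetical.

```python
# Conceptual model of a scanned page: the visible image plus a hidden
# layer of recognised words, each tied to a position on the image.
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    x: int  # left edge of the word on the page image, in pixels
    y: int  # top edge of the word on the page image, in pixels


@dataclass
class ScannedPage:
    image_file: str                                 # the scan itself
    text_layer: list = field(default_factory=list)  # empty until OCR is run

    def is_searchable(self):
        """A page is searchable only once it has a text layer."""
        return len(self.text_layer) > 0

    def find(self, term):
        """Return the words in the text layer that match term, ignoring case."""
        return [w for w in self.text_layer if w.text.casefold() == term.casefold()]


if __name__ == "__main__":
    page = ScannedPage("page1.png")          # hypothetical scan
    print(page.is_searchable())              # no text layer yet
    page.text_layer.append(Word("Lease", 120, 340))
    print(page.find("lease"))                # the word and where it sits
```

Before OCR the page is only a picture, so a search has nothing to look through. After OCR, each recognised word carries a position, which is what lets a PDF viewer highlight the match on top of the image.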
In most instances, a document saved or exported to .pdf from another program is searchable automatically. However, scanned images from a copier may not have been through OCR, which means they are not searchable. They are just an image, much like a photocopy. These documents need to be run through OCR software to make them searchable, for example ABBYY FineReader, Acrobat Pro, Nuance and OmniPage Ultimate, to name a few.
Converting documents to a searchable form is time-consuming and can take up to 30 minutes for 100 to 200 pages. The shifting sands of litigation mean lawyers should always be prepared, so avoid the situation where you need a text-enhanced document and it has not been processed. It is therefore preferable to process documents when they are uploaded to a system, whether that happens automatically or through a pre-set on your scanning device.
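To put that figure in context, the arithmetic can be sketched in a few lines of Python. This is a rough estimate only, assuming the rate quoted above (30 minutes per 100 to 200 pages, or about 9 to 18 seconds per page). The function name and the 2,000-page bundle are made up for illustration, and real speeds will vary with software, hardware and scan quality.

```python
# Rough estimate of OCR processing time for a bundle of scanned pages.
# Assumption: the rates come from the figure quoted above
# (up to 30 minutes for 100 to 200 pages).

SLOW_SECONDS_PER_PAGE = 30 * 60 / 100  # 18 seconds per page
FAST_SECONDS_PER_PAGE = 30 * 60 / 200  # 9 seconds per page


def estimate_ocr_minutes(pages):
    """Return a (low, high) estimate of processing time in minutes."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    low = pages * FAST_SECONDS_PER_PAGE / 60
    high = pages * SLOW_SECONDS_PER_PAGE / 60
    return low, high


if __name__ == "__main__":
    low, high = estimate_ocr_minutes(2000)  # hypothetical 2,000-page brief
    print(f"Roughly {low:.0f} to {high:.0f} minutes")
```

On these assumptions, a 2,000-page brief would take somewhere between five and ten hours, which is why processing is best done at the time of scanning rather than the night before a hearing.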
OCR is important to lawyers using digital documents because digital documentation is a form of process innovation. Paper has been a successful medium for a very long time. It allows readers to highlight, tab, organise and annotate. For a digital process to innovate, it should replicate what paper does and improve on it. Searching a document with keywords is better than flicking through pages. In fact, with Adobe, you can search the metadata and multiple PDFs for search terms. Digital documents are a convenient way to store information, but when used properly they can also help lawyers find information quickly and easily, as well as allowing annotation, highlighting and export of text.
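As an illustration of what searching across several documents involves once the text exists, here is a minimal Python sketch. It assumes the text of each page has already been extracted (by OCR or otherwise) into plain strings. The function name `search_documents`, the file names and the sample text are all hypothetical, and this is not a description of how Adobe implements its search.

```python
# Minimal keyword search across several documents, page by page.
# Assumption: each document's text has already been extracted into
# a list of strings, one string per page.


def search_documents(documents, term):
    """Return (document name, page number, page text) for each page containing term.

    The match is case-insensitive and page numbers start at 1.
    """
    needle = term.casefold()
    hits = []
    for name, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            if needle in text.casefold():
                hits.append((name, page_number, text.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical extracted text, for illustration only.
    bundle = {
        "affidavit.pdf": ["The lease was signed in March.", "No further payments were made."],
        "contract.pdf": ["This Lease commences on 1 April."],
    }
    for name, page, text in search_documents(bundle, "lease"):
        print(f"{name}, page {page}: {text}")
```

The point of the sketch is that the search itself is trivial once the text is available. The slow step is producing that text, which is what OCR does.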
OCR is one of the best forms of AI available, as it has already processed the document for you. You are free to search for the essential information your legal matter requires, rather than reading through the documents by eye to find what you are looking for. Ideally, all documents would be processed in this way, but if they are not, invest in a good PDF reader to enhance your scans, well ahead of time.