Keyword searching is still the legal profession's preferred method for locating electronically stored information (ESI) in an investigation. But while it remains the predominant methodology among attorneys, it has significant drawbacks.
One academic study conducted by Blair and Maron demonstrated that keyword searches can miss up to 80 per cent of relevant evidence in a case. The 2009 TREC Legal Track study demonstrated that keyword searches have a recall ratio of less than four per cent. In other words, the keyword searches in that study missed more than 96 per cent of relevant documents.
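To make the term "recall" concrete, here is a minimal sketch of how the ratio is calculated: the number of relevant documents a search retrieves, divided by the number of relevant documents that exist in the collection. The document IDs and counts below are invented purely for illustration and are not taken from either study.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that the search actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when no documents are relevant")
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical collection: documents 1-25 are relevant to the case.
relevant_ids = set(range(1, 26))

# Hypothetical keyword search: finds 5 relevant documents plus 2 irrelevant ones.
retrieved_ids = {1, 2, 3, 4, 5, 101, 102}

r = recall(retrieved_ids, relevant_ids)
print(f"recall = {r:.0%}, missed = {1 - r:.0%}")  # recall = 20%, missed = 80%
```

Note that recall says nothing about how many irrelevant documents the search also returned; it measures only how much of the relevant material was found.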
Legal professionals dealing with large volumes of ESI need to consider other tools to locate relevant evidence. The most promising tool available to locate ESI is predictive coding.
How Predictive Coding Works
Predictive coding involves a partnership between humans and technology. An expert on the case will code a set of documents as responsive or non-responsive. The technology will learn from the human expert and will develop an implicit set of rules to apply to the document set.
The technology uses words and phrases from the responsive and non-responsive documents to develop models that can be used to predict which documents in the next random set are likely to be classified by the expert as responsive. The expert will continue to interact with the technology until the technology learns to appropriately code documents as responsive or non-responsive.
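The train, predict and correct cycle described above can be sketched in a few lines of Python. This is an illustration only, not how any particular product works: it assumes a very simple word-frequency model (multinomial naive Bayes), and the class `ResponsivenessModel`, the function `expert_codes` and the sample documents are all hypothetical. In practice the "expert" is a person, not a rule.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class ResponsivenessModel:
    """Tiny naive Bayes text model (an assumed stand-in for real tools)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def learn(self, text, responsive):
        """Record the expert's coding decision for one document."""
        self.word_counts[responsive].update(tokenize(text))
        self.doc_counts[responsive] += 1

    def predict(self, text):
        """Return True if the document looks more responsive than not."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            total_words = sum(self.word_counts[label].values())
            # Add-one smoothing keeps unseen words from zeroing out a score.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return scores[True] > scores[False]


def expert_codes(text):
    """Hypothetical stand-in for the human expert's judgement."""
    return any(term in text.lower() for term in ("pricing", "rebate"))


batches = [
    ["Pricing agreement with the distributor", "Lunch menu for Friday",
     "Rebate schedule attached", "Parking garage closed on Monday"],
    ["Revised pricing and rebate terms", "Friday lunch moved to the garage",
     "Distributor rebate dispute", "Monday parking reminder"],
]

model = ResponsivenessModel()
agreement_by_round = []
for batch in batches:
    guesses = [model.predict(text) for text in batch]   # model codes first
    truths = [expert_codes(text) for text in batch]     # expert reviews
    agreement_by_round.append(sum(g == t for g, t in zip(guesses, truths)))
    for text, truth in zip(batch, truths):              # model learns
        model.learn(text, truth)

print(agreement_by_round)  # [2, 4]
```

In this toy run the untrained model agrees with the expert on only two of the first four documents, then on all four of the second batch once it has learned from the expert's coding. A real protocol would keep iterating, on far larger samples, until agreement reached a level the parties had settled on.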
Predictive coding allows millions of documents to be reviewed in far less time than keyword searching or purely human review of every document would take.
Who is Involved
Predictive coding is not a case of technology replacing humans. Rather, the armies of lawyers and paralegals used in the past to review ESI will be replaced by trained e-discovery professionals and subject matter experts.
- The subject matter expert will review documents and code them as responsive or non-responsive.
- E-discovery counsel will work to cull the ESI down to a manageable amount by identifying the relevant timeframe and the key custodians in the litigation.
- E-discovery counsel will also negotiate quality control measures with opposing counsel, educate opposing counsel regarding predictive coding if counsel is unfamiliar with the technology, and educate the court regarding the technology and methodology in the event there is a conflict regarding the predictive coding protocol.
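The culling step mentioned above, narrowing the collection by timeframe and key custodians before any coding begins, might look something like the following sketch. The `Document` record, the `cull` function, the custodian names and the dates are all hypothetical; real collections carry far richer metadata.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    sent: date


def cull(documents, start, end, custodians):
    """Keep only documents from key custodians dated within the timeframe."""
    keep = {name.lower() for name in custodians}
    return [
        doc for doc in documents
        if start <= doc.sent <= end and doc.custodian.lower() in keep
    ]


# Hypothetical collection of four documents.
collection = [
    Document("D1", "Alice", date(2011, 3, 14)),
    Document("D2", "Bob", date(2011, 6, 2)),      # not a key custodian
    Document("D3", "Alice", date(2009, 1, 20)),   # outside the timeframe
    Document("D4", "Carol", date(2011, 4, 9)),
]

review_set = cull(
    collection,
    start=date(2011, 1, 1),
    end=date(2011, 12, 31),
    custodians={"Alice", "Carol"},
)
print([doc.doc_id for doc in review_set])  # ['D1', 'D4']
```

Only the documents that survive this filter would go forward to the subject matter expert for coding.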
Key Advantages
Predictive coding has a number of key advantages over keyword searches or pure human review.
- Human review of every document in a data set can be incredibly expensive when there are millions of documents at issue in the case.
- Human reviewers are prone to error when reviewing thousands of documents a day.
- At best, human review may uncover 60 per cent of documents relevant to the litigation. Keyword searches may uncover only 20-25 per cent of the relevant documents. In contrast, a properly designed predictive coding protocol can uncover 70 per cent or more of the relevant documents at only a fraction of the cost of human review or keyword searches.
- Predictive coding allows counsel to learn the facts of the case at a much earlier stage. Rather than spending months culling documents to discover the relevant facts, predictive coding can empower counsel to learn the facts of the case in a matter of days or weeks. Thus, predictive coding can reduce litigation costs while allowing for an earlier case assessment.
Because of the limitations and drawbacks of keyword searches and human review, a lawyer dealing with an ESI-intensive case needs to consider employing predictive coding.