Legal Technology News - E-Discovery and Compliance Blog


January 15, 2008

EDD Showcase: Discovery Overload

Jason Baron's article, posted below, appeared in the January issue of Law Technology News.

Lawyers of a certain age, especially those of us who recall the joys of IBM Selectric typewriters, must confront the fact that litigation today is a different animal, particularly when it comes to the search and retrieval issues posed by massive document productions.

The accelerating pace of change and the explosive growth in the volume of electronically stored information (ESI) are forcing a re-examination of how lawyers approach their obligation to perform reasonable searches for relevant evidence. Simply put, how do you respond to a request for "any and all" relevant documents within the scope of a complaint when a client has 1 billion files?

Whether the evidence lies in e-mail and attachments, instant messages, web pages, databases, or backup tapes, the answer cannot simply be to throw more resources at manual searching, because manual review does not scale to the volumes involved in EDD.

In the past, legal professionals would develop a list of keywords. The resulting hits produced a manageable subset of documents, which legions of junior and contract attorneys then reviewed manually for responsiveness and privilege. Reviewing 1% of 20 million e-mails (200,000 messages) is doable, even if it takes 25 people six months to review every e-mail and attachment. But 1% of a billion e-mails presents an insurmountable problem. There simply isn't time in any litigation to manually review 10 million e-mails, even when current keyword-based automated methods are used to cull the larger universe down.
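The following Python sketch runs a quick back-of-the-envelope check of the arithmetic in the paragraph above. The figure of 21 working days per month is an assumption made for illustration. It cancels out of the final answer, which depends only on the ratio of the two review sets.

```python
# Back-of-the-envelope review workload for the scenario described above.
# Assumption: about 21 working days per month (it cancels out of the result).
WORKING_DAYS_PER_MONTH = 21

def docs_per_reviewer_day(docs: int, reviewers: int, months: float) -> float:
    """Pace implied by reviewing `docs` with `reviewers` people in `months`."""
    return docs / (reviewers * months * WORKING_DAYS_PER_MONTH)

def months_needed(docs: int, reviewers: int, rate: float) -> float:
    """Months needed to review `docs` at `rate` documents per reviewer per day."""
    return docs / (reviewers * rate * WORKING_DAYS_PER_MONTH)

small_set = 20_000_000 // 100      # 1% of 20 million e-mails = 200,000
large_set = 1_000_000_000 // 100   # 1% of a billion e-mails = 10,000,000

rate = docs_per_reviewer_day(small_set, reviewers=25, months=6)
months = months_needed(large_set, reviewers=25, rate=rate)

print(f"{rate:.1f} e-mails per reviewer per day")
print(f"{months:.0f} months, or about {months / 12:.0f} years, for 10 million e-mails")
```

The smaller review implies a pace of roughly 63 e-mails per reviewer per day. At that pace, the same 25-person team would need about 300 months, or 25 years, for the larger set. The workload grows linearly, so 50 times as many documents means 50 times as long.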

Indeed, the sheer volume of ESI changes the basic dynamics of civil discovery. It also forms the baseline assumption behind the recent publication of The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery. As the Sedona search commentary makes clear, traditional approaches to searching for relevant evidence are no longer practical or financially feasible. At its outset, the commentary notes three main problems: 1) the exponential growth in ESI; 2) the fact that ESI contains human language, with all of its inherent ambiguity; and 3) the known deficiencies of keyword searching, which produces too many false positives (noise, junk) while missing much that may be considered relevant evidence.
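The third problem corresponds to two standard information-retrieval measures, precision and recall. The text itself does not use these terms. The Python sketch below, built on hypothetical document IDs, shows how noise lowers precision and how missed evidence lowers recall.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of retrieved documents that are relevant (low = noise).
    Recall: share of relevant documents that were retrieved (low = missed evidence)."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: a keyword search returns 10 documents (doc0..doc9),
# while the collection actually holds 16 relevant documents (doc6..doc21).
retrieved = {f"doc{i}" for i in range(10)}
relevant = {f"doc{i}" for i in range(6, 22)}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.25
```

In this example, six of the ten hits are junk, and twelve of the sixteen relevant documents are never found.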

Craig Ball, in "Unlocking Keywords: How You Frame Your Search Words Will Shape Your Success," LTN January 2007, notes some of the problems involved in using simple keywords. He also notes the prospect of improving search results through techniques such as eliminating noise words like "law" and "legal." Depending on the type of dispute at issue, simple keywords may suffice to find relevant evidence. However, the potential limitations are profound:

• There may be dozens of ways to spell relevant terms. For example, there are more than 250 variations on the word "tobacco" in the Master Settlement Agreement in the multi-state litigation involving marketing by seven tobacco companies.

• Optical character recognition (OCR) can introduce errors, such as misread characters, that cause exact keyword matches to fail.

• Most English words are ambiguous or have multiple synonyms, as any reader of Roget's Thesaurus can tell you. No single person is smart enough to anticipate all the terms he or she would need to enter into a large data store to retrieve all, or substantially all, of the documents relevant to a given discovery request.
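Exact keyword matching misses misspellings and OCR errors. This Python sketch is an illustration only. It uses the standard library's `difflib.get_close_matches` to catch near-spellings of "tobacco." The vocabulary and the 0.8 similarity cutoff are assumptions. Fuzzy matching is also only one of several techniques (stemming, wildcards, phonetic matching) a real search tool might use.

```python
import difflib

# Hypothetical vocabulary from a collection, including misspellings and
# OCR-style errors (the digit 0 read in place of the letter o).
vocabulary = ["tobacco", "tobbaco", "tobacc0", "t0bacco", "tabacco", "tomato", "cigarette"]

def exact_hits(term: str, vocab: list[str]) -> list[str]:
    """Only identical strings match, as with a plain keyword search."""
    return [word for word in vocab if word == term]

def fuzzy_hits(term: str, vocab: list[str], cutoff: float = 0.8) -> list[str]:
    """Words whose difflib similarity ratio to `term` is at least `cutoff`."""
    return difflib.get_close_matches(term, vocab, n=len(vocab), cutoff=cutoff)

print(exact_hits("tobacco", vocabulary))
print(sorted(fuzzy_hits("tobacco", vocabulary)))
```

The fuzzy search recovers the four variant spellings and still rejects "tomato." It does nothing for the synonym problem, however. A term like "cigarette" still has to occur to someone (or to a concept-search tool).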

Recent research has confirmed that even relatively sophisticated Boolean searches, using keywords, "and," "or," and various types of proximity and wildcard operators, net a much smaller proportion of relevant documents than most lawyers think they are retrieving.
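To make these operators concrete, here is a small Python sketch of a Boolean query with AND, OR, a trailing-`*` wildcard, and a within-k-words proximity test. The query syntax in the comment, the sample documents, and the helper names are hypothetical. Commercial search tools each define their own operators and proximity rules.

```python
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(pattern: str, token: str) -> bool:
    """A trailing '*' is a wildcard: 'smok*' matches 'smoke', 'smoking', ..."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern

def has(tokens: list[str], pattern: str) -> bool:
    return any(matches(pattern, t) for t in tokens)

def near(tokens: list[str], p1: str, p2: str, k: int) -> bool:
    """True if a token matching p1 lies within k words of a token matching p2."""
    pos1 = [i for i, t in enumerate(tokens) if matches(p1, t)]
    pos2 = [i for i, t in enumerate(tokens) if matches(p2, t)]
    return any(abs(i - j) <= k for i in pos1 for j in pos2)

def query(text: str) -> bool:
    # Hypothetical query: (smok* w/5 teen*) OR (cigarette AND marketing)
    toks = tokenize(text)
    return near(toks, "smok*", "teen*", 5) or (has(toks, "cigarette") and has(toks, "marketing"))

docs = {
    "A": "Survey of teenagers who started smoking before age 16",
    "B": "Cigarette marketing plan for the northeast region",
    "C": "Ads aimed at young adults who light up after school",
}
hits = [name for name, text in docs.items() if query(text)]
print(hits)  # ['A', 'B']
```

Document C describes the same conduct in different words and shares none of the query's terms. A well-formed Boolean query therefore still misses it.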

The Text Retrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology and the U.S. Department of Defense, has established the Legal Track. This international research project evaluates the effectiveness of various search methodologies when applied in a simulated EDD environment. Now in its second year, the Legal Track's research protocol involves four elements:

1. Creation of hypothetical sets of FRCP 34 requests to produce based on hypothetical complaints.

2. A publicly available database of 7 million documents, used as the test collection.

3. Searches proposed by computer scientists in academic and other institutions.

4. Human assessment by volunteers (lawyers and law students) of the pooled results of the automated searches, to determine whether each hit found by the automated search methods is responsive to the designated request to produce.
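Step 4 relies on pooling. Results from many automated runs are merged, humans judge the merged pool, and each run is then measured against the relevant documents the assessors found. Below is a simplified Python sketch with hypothetical run names, document IDs, and judgments. The actual Legal Track methodology was more elaborate than this.

```python
def pool(runs: dict[str, list[str]], depth: int) -> set[str]:
    """Union of the top `depth` documents from every run; these go to assessors."""
    return {doc for ranked in runs.values() for doc in ranked[:depth]}

def relative_recall(ranked: list[str], judged_relevant: set[str], depth: int) -> float:
    """Fraction of all pooled relevant documents that one run found in its top `depth`."""
    if not judged_relevant:
        return 0.0
    return len(set(ranked[:depth]) & judged_relevant) / len(judged_relevant)

# Hypothetical runs: each is a ranked list of document IDs for one request.
runs = {
    "boolean": ["d1", "d2", "d3"],
    "concept": ["d2", "d4", "d5"],
    "ranked": ["d6", "d4", "d7"],
}
to_assess = pool(runs, depth=3)

# Hypothetical human judgments: the pooled documents assessors marked responsive.
judged_relevant = {"d2", "d4", "d5", "d6", "d7"}
assert judged_relevant <= to_assess  # only pooled documents are ever judged

for name, ranked in runs.items():
    print(f"{name}: {relative_recall(ranked, judged_relevant, depth=3):.0%}")

# Relevant documents that the Boolean run never retrieved.
only_others = judged_relevant - set(runs["boolean"][:3])
print(f"found only by other methods: {len(only_others) / len(judged_relevant):.0%}")
```

Recall measured this way is relative to the pool. Relevant documents that no run retrieved are never judged, so they are not counted.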

In the second year of the project, only 22% of the total number of relevant documents were found by the reference Boolean search! In other words, 78% of the relevant documents were found only by automated searches using other methodologies, such as concept searching and more sophisticated statistical, ranked-retrieval methods.

This result is in line with a landmark 1985 study by David Blair and M.E. Maron, "An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System." That study showed that while lawyers thought they were retrieving about 75% of the relevant documents for use in a particular litigation, they were actually retrieving closer to 20%.

The TREC Legal Track underscores the need for all lawyers to think about what kind of production they are obtaining when they use keywords alone, without incorporating other useful techniques.
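Ranked retrieval, one of the statistical methods mentioned above, scores every document against the request and sorts the collection, instead of returning an unordered yes/no set. The Python sketch below uses TF-IDF weighting and cosine similarity. This is one textbook ranked-retrieval technique, chosen here for illustration; it is not necessarily what any Legal Track participant used. The smoothed idf formula and the sample documents are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Weight each term by its count times a smoothed inverse document frequency."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query: str, docs: list[str]) -> list[tuple[float, int]]:
    """Return (score, document index) pairs, best match first."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vecs)), reverse=True)

docs = [
    "tobacco marketing to teenagers",
    "quarterly tobacco sales report",
    "cafeteria menu for next week",
]
ranking = rank("tobacco marketing teenagers", docs)
for score, i in ranking:
    print(f"{score:.2f}  {docs[i]}")
```

A strict Boolean AND of the three query terms would return only the first document. The ranked list still surfaces the partial match, scored lower, so reviewers can decide how far down the list to read.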

The Sedona commentary contains eight practice pointers, with commentary. Among its recommendations:

• Automated search methods should now be viewed as reasonable, valuable, and even necessary.

• A well-thought-out process is needed for using automated search methods.

• Due diligence is important when choosing particular information retrieval products or services from a vendor.

• Greater collaboration among parties is needed on search protocols, including the choice of keywords, concepts, or other search parameters.

The Sedona commentary also provides summary descriptions of alternative search methods, along with other information to help lawyers ask vendors hard questions about how their software actually performs.

There is no magic answer, but thinking about this problem in new and different ways will help you secure a just, speedy, and inexpensive determination of your lawsuit.

Jason Baron is director of litigation for the Office of the General Counsel, National Archives and Records Administration, based in College Park, Md. Views expressed by the author are his alone, and do not necessarily represent the views of NARA or other government entities.
