
Investigating Received Data: Creative Use of Data Analytics in eDiscovery (part 3 of 4)

Tuesday, June 6, 2017   (0 Comments)
Posted by: Mary Mack
By Ian Campbell and Michael Fischer

We have looked at collaboration for MDLs and at using a suite of analytics tools to investigate a large, disparate data set. We started with clustering, more-like-this searches with exemplar documents, near-duplicate identification, and email threading. The beauty of analytics is that the tools can be combined and used in many different ways, depending on the situation. In this chapter, we will start with keyword searches, the tried-and-true way of locating relevant data to prove the case.

In this example, the defense reviewed the documents before producing them, rather than delivering all non-privileged data under a clawback provision. Matthew, an associate overseeing discovery on this case, therefore has a good idea of what should be in the dataset: the working assumption is that the documents are responsive to the discovery request. At the 26(f) Meet & Confer, Matthew and his team came to an agreement with opposing counsel on keywords. He figures those terms are a good place to begin the investigation.

More than Just Keywords
While keywords are a good place to start, searching for those exact words only tells Matthew part of the story. Many tools can augment keyword searches, including dictionary lookups and known-abbreviation lists. Matthew begins by identifying all the variants and misspellings of the key terms. Dictionary lookup is a tool people might not think of as analytics, but it is: it identifies all the iterations of a word, and close variants (misspellings) of that word, present in the dataset. This is not a pre-defined dictionary; it is an inventory of terms built from the actual documents. Such a word list can be more helpful than simply adding a wildcard to the end of a truncated term. Matthew uses it to decide which terms are relevant to his investigation, selects them, and adds them to his search. Because the dataset combines documents from the US and the UK, he also runs a phonetic search, which will return leukaemia for leukemia or paediatric for pediatric. The same type of tool can be used to identify key abbreviations for relevant topics, studies, and datasets in the early-stage investigation of received productions.
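As an illustration, both ideas can be sketched in a few lines of Python. The sample documents, the `build_term_inventory` helper, and the simplified Soundex implementation below are all hypothetical, for illustration only, and are not features of any particular review platform:

```python
import difflib
import re

def build_term_inventory(documents):
    """Inventory of every distinct term that actually appears in the dataset."""
    inventory = set()
    for text in documents:
        inventory.update(re.findall(r"[a-z]+", text.lower()))
    return inventory

def dictionary_lookup(keyword, inventory, cutoff=0.8):
    """Close variants (misspellings, regional spellings) present in the data."""
    return difflib.get_close_matches(keyword, sorted(inventory), n=10, cutoff=cutoff)

def soundex(word):
    """Simplified Soundex code, so 'pediatric' and 'paediatric' compare equal."""
    codes = {c: d for d, letters in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in letters}
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0])
    for ch in word[1:]:
        code = codes.get(ch)
        if code and code != prev:
            out += str(code)
        prev = code  # vowels (no code) reset prev, splitting repeated codes
    return (out + "000")[:4]

# Invented sample documents for the demo.
docs = [
    "Patient enrolled in the leukemia study.",
    "Follow-up on the leukaemia trial results.",   # UK spelling
    "Typo in the lukemia intake form.",            # misspelling
]
inventory = build_term_inventory(docs)
print(sorted(dictionary_lookup("leukemia", inventory)))
# ['leukaemia', 'leukemia', 'lukemia']
print(soundex("pediatric") == soundex("paediatric"))
# True
```

The key point the sketch captures: the candidate list is built from the documents themselves, so the reviewer picks from variants that actually exist in the production rather than guessing wildcards.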

Once Matthew has identified the iterations of the target keywords and has a results list, he wants to understand the distribution of those documents. Metadata searches are a good tool for that: he can search on To, From, CC, Record Type, Document Size, or any other document information the system administrator has made available. The "old school" way of running a metadata search, such as an email To search, was to guess all the possible iterations of an email address for the target recipient's name and search for each of them. As you can imagine, this method left lots of room for omissions. Today, tools such as faceted search show Matthew all the available options in any field and allow him to select those he believes are applicable to a particular search. This filtering works much like the filters on shopping sites such as Amazon or travel sites such as Expedia, which group items by shared metadata values and dynamically update the results as other filters are applied. It also all but eliminates the need to train Matthew and his team on how to filter search results. This might get him to the point where he can begin batching documents for review. More likely, though, there are still loads of documents that simply are not going to help him understand or build his case.
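The mechanics of faceting are simple to sketch. The metadata records and field names below are invented for illustration; real load files will differ, but the pattern is the same: enumerate every value a field actually takes, with counts, then recompute the facets on each narrowed subset:

```python
from collections import Counter

# Hypothetical metadata records; the fields mirror a typical load file.
docs = [
    {"from": "j.doe@acme.com", "record_type": "email",      "custodian": "Doe"},
    {"from": "j.doe@acme.com", "record_type": "email",      "custodian": "Doe"},
    {"from": "a.roe@acme.com", "record_type": "attachment", "custodian": "Roe"},
    {"from": "a.roe@acme.com", "record_type": "email",      "custodian": "Roe"},
]

def facets(docs, field):
    """All available values for a field, with counts -- no guessed addresses."""
    return Counter(d[field] for d in docs)

def filter_by(docs, **criteria):
    """Narrow the result set; facets are then recomputed on the subset."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

print(facets(docs, "from"))            # every sender that exists in the data
subset = filter_by(docs, record_type="email")
print(facets(subset, "custodian"))     # facet counts update for the subset
```

Because the facet list is derived from the data, the reviewer can never "miss" an address variant the way the old guess-and-search approach could.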

Let the Machine do the Heavy Lifting
Plaintiff firms often spend more time reviewing unhelpful documents than helpful ones. Matthew needs assistance so his reviewers can focus first on the documents most likely to be of use to his case. He decides to let an intelligent Predictive Coding (PC) system help him issue-sort the data.

He begins by reviewing and coding a randomly generated set of documents. The PC system uses this set to understand what he is looking for, and as the control set against which it checks its understanding during each step of the iterative process.
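Drawing that random set is itself a small but important step: the sample must be unbiased and, ideally, reproducible so the control set stays fixed across training rounds. A minimal sketch (the function name and the fixed seed are assumptions for illustration):

```python
import random

def draw_control_set(doc_ids, size, seed=42):
    """Uniform random sample of document IDs; a fixed seed keeps the
    control set reproducible across training iterations."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, size)

population = list(range(1, 101))   # hypothetical document IDs
control = draw_control_set(population, 10)
print(control)
```

Sampling without replacement (as `random.sample` does) guarantees no document is coded twice in the control set.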

During the verification process, the system delivers documents that it considers non-responsive along with responsive ones for his review. This helps the artificial intelligence algorithm differentiate between the two.

Once he has completed the system training process, Matthew can assign documents to his reviewers based on the issue codes the system identifies. This puts the most relevant documents in front of his team first. After reviewing the relevant documents, he can QC those marked non-responsive and be confident that the documents the system considers non-responsive do not contain the types of information he is looking for. He has saved potentially months of review time and cost, and he gets to the useful information more quickly. His successful use of Predictive Coding means he was able to deliver the likely-relevant and useful information to the review team and get a jump on planning the case strategy.
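The ranking idea behind this workflow can be sketched with a toy naive-Bayes scorer standing in for a commercial PC engine. Everything below, including the class name, the seed coding decisions, and the document text, is invented for illustration; real systems use far more sophisticated models, but the train-score-rank loop is the same:

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesPC:
    """Toy predictive-coding model: rank unreviewed docs by responsiveness."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, text, responsive):
        """Learn from one reviewer coding decision (True = responsive)."""
        self.counts[responsive].update(tokens(text))
        self.docs[responsive] += 1

    def score(self, text):
        """Log-odds that the document is responsive (higher = review first).
        Assumes at least one seed document in each class."""
        vocab = len(set(self.counts[True]) | set(self.counts[False]))
        total = {c: sum(self.counts[c].values()) for c in (True, False)}
        log_odds = math.log(self.docs[True] / self.docs[False])
        for t in tokens(text):  # Laplace-smoothed per-term likelihood ratio
            p = (self.counts[True][t] + 1) / (total[True] + vocab)
            q = (self.counts[False][t] + 1) / (total[False] + vocab)
            log_odds += math.log(p / q)
        return log_odds

# Hypothetical seed set, coded by the reviewer.
model = NaiveBayesPC()
seed = [
    ("clinical trial adverse events in the leukemia study", True),
    ("adverse reaction reported in pediatric trial", True),
    ("cafeteria menu for the week", False),
    ("parking garage closed friday", False),
]
for text, responsive in seed:
    model.train(text, responsive)

unreviewed = ["adverse events reported in the trial", "holiday parking notice"]
ranked = sorted(unreviewed, key=model.score, reverse=True)
print(ranked[0])  # the likely-responsive document surfaces first
```

Review batches built from the top of this ranking put the most useful documents in front of the team first; the bottom of the ranking is what gets QC'd before being set aside as non-responsive.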

In the final installment of this four-part series, we will look at using technology to help prepare for the 26(f) Meet & Confer.

About the Authors

Ian Campbell is the President and CEO of iCONECT Development LLC, which has been developing innovative eDiscovery review software since 1999. He is responsible for sales operations, business development, product lifecycle development, and partner relations. With more than 16 years of strategic product development in the litigation support field, Campbell is a frequent industry spokesperson, sharing his experiences and expert commentary with audiences at the American Bar Association, LegalTech, ILTA, AIIM, IQPC, Marcus-Evans, and other legal and management groups around the world.

Mike Fischer is Director of Information Services at Schlichter, Bogard & Denton, LLP. Mike has worked for nearly 10 years in legal technology and manages eDiscovery review projects for many large-scale, multi-party complex matters. Schlichter, Bogard & Denton, LLP represents individuals harmed by corporate wrongdoing, and consistently prevails at trial. The firm's work has been repeatedly profiled in the media and recognized by judges; among other things, it has been called "pioneer[ing]," "tireless," and "historic."



©2018 Association of Certified E-Discovery Specialists
All Rights Reserved