Using extrapolated precision for performance measurement

This article originally appeared elsewhere and has been republished here with the permission of its author, Bill Dimm. This is a brief overview of my paper “Information Retrieval Performance Measurement Using Extrapolated Precision,” which I’ll be presenting on June 8th at the DESI VI workshop at ICAIL 2015. The paper provides a novel method […]

Robust metrics needed to counter impact of ‘data shift’ on predictive coding measures

Attorneys and judges often rely exclusively upon “precision” and “recall” thresholds for acceptance of dichotomous classification models in what is commonly referred to in the legal industry as “predictive coding.” Because these measures fail to provide a complete understanding of the proposed model’s characteristics and efficacy, this paper will argue that interested parties should go […]

The Single Seed Hypothesis

This article was originally published at Clustify Blog. It is republished here with the permission of its author, Bill Dimm.  This article shows that it is often possible to find the vast majority of the relevant documents in a collection by starting with a single relevant seed document and using continuous active learning (CAL). This […]

Major firm’s bid to reverse training video sanction airs deep-seated frustration over discovery gridlock

It has been said that a true test of one’s character is what he does when no one’s watching. That standard, according to some, applies with equal rigor to practitioners conducting discovery. “[T]he obligation of a lawyer is to act as if you are acting in court with a judicial officer present,” Christopher Duggan, representing the American […]

Can you really compete in TREC retroactively?

This post was originally published elsewhere. It is republished here with the permission of its author, Bill Dimm. I recently encountered a marketing piece where a vendor claimed that their tests showed their predictive coding software demonstrated favorable performance compared to the software tested in the 2009 TREC Legal Track for Topic 207 (finding […]

With rare exception, Hague Convention proves powerless to pierce Chinese data wall

Attorneys looking to the Hague Convention as a method of extracting evidence from China should consider three little words of advice from Dan Harris, whose practice focuses on organizations working in the Middle Kingdom: “Expect to fail.” As the world shrinks and litigation increasingly crosses borders, practitioners in the US are confronting strict foreign laws that […]

Why math matters: Random sampling for binomial classification of documents

A recent post by Craig Ball led me to an interesting classification opinion in an antitrust case. Essentially, the parties were working out classification protocols and asked for the court’s input on one issue — namely, what to do if the Boolean keyword search was unproductive or not productive enough. The plaintiffs wanted a “random” […]

Tension between client confidentiality, public disclosure stifling law firm cyber-breach reporting

As cyberattacks on law firms increasingly take on an air of inevitability, though their accounts are largely anecdotal, new questions center on how to respond to breaches of sensitive materials and how to responsibly disclose these incidents without jeopardizing client relationships or running afoul of professional codes. Now, amid the release of high-profile reports that take […]


Federal Judicial Center granting judges’ request for e-discovery and computer science education

With electronically stored information now prominent in almost every major civil lawsuit, federal judges are starting to receive government-sanctioned training on emerging e-discovery issues and advanced technologies. Such topics, which also include cybercrime and surveillance, are often challenging for non-specialists to understand — and are increasingly the subject of legal investigations and disputes. Officials at […]