A Considered Response from Gordon Cormack
Thursday, August 11, 2016
Posted by: Jason Krause
NOTE: The following letter is from Gordon Cormack, Professor with the School of Computer Science at the University of Waterloo in Ontario, Canada. Gordon is an information retrieval expert in the area of technology-assisted review in litigation, including influential works co-authored by Maura Grossman, a fellow researcher at the University of Waterloo. He wrote in response to yesterday's ACEDS webinar, How Automation is Revolutionizing E-Discovery, which you can watch here.
You can read Bill Speros' response here and Bill Dimm's reaction here.
Maura and I attended today’s ACEDS webinar entitled “Faster, Better, Cheaper: How Automation is Revolutionizing eDiscovery.” During the webinar presented by Bill Speros, Bill Dimm, and Doug Austin, we observed that the panel chose to rely on a selective reading of our five-year-old work in the Richmond Journal of Law and Technology (JOLT), which was incorrectly referred to as a “white paper.”
Overall, we believe that the webinar presented the false impression that we, and the courts, are resting on our laurels and that no legitimate empirical work has been done with respect to TAR. Maura and I have been—and continue to be—driven to advance the state of the art, to evaluate the technology, and to demonstrate its applicability to the real problems of eDiscovery and related fields.
In the webinar, Speros chose to elide the essence of TAR from both our 2011 JOLT definition and our 2013 Federal Courts Law Review definition: “the computer codes the remaining documents in the collection for responsiveness (or privilege)” [JOLT article, at page 4], and “extrapolates those judgments to the remaining Document Collection” [Grossman-Cormack FCLR Glossary, at page 32].
Dimm, on the other hand, failed to note that the “UW” method tested at TREC 2009 was in fact Continuous Active Learning (CAL), later dubbed “TAR 2.0” by Catalyst and others. It is my work with Maura [at SIGIR 2014; see http://dx.doi.org/10.1145/2600428.2609601] that demonstrated CAL to be superior to other TAR methods.
No research since our 2011 JOLT study has contradicted the results presented therein: that CAL can achieve comparable recall to exhaustive manual review, and vastly higher precision, with orders of magnitude less coding effort. To the contrary, studies by ourselves, and the work reported by Dimm, Webber, and others, are all consistent with our original findings.
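For readers unfamiliar with the mechanics behind these claims, the CAL protocol described above can be sketched in a few lines: train on the documents reviewed so far, score the rest, review the top-scoring batch, and repeat. The following is a minimal simulation on synthetic data, not the authors' implementation; the collection size, the centroid-similarity scorer, the seed document, and the "dry batch" stopping rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a review collection: 2000 "documents" as feature
# vectors, roughly 5% responsive and clustered apart from the rest.
n, dim = 2000, 20
responsive = rng.random(n) < 0.05
X = rng.normal(0.0, 1.0, (n, dim))
X[responsive] += 1.5                            # responsive docs occupy a region

labeled = np.zeros(n, dtype=bool)
labeled[np.flatnonzero(responsive)[0]] = True   # one seed found by, say, a keyword search

batch = 50
while True:
    # Re-score the collection against the centroid of known-responsive docs,
    # then route the top-scoring unreviewed batch to "review".
    centroid = X[labeled & responsive].mean(axis=0)
    scores = X @ centroid
    scores[labeled] = -np.inf                   # never re-review a document
    top = np.argsort(scores)[-batch:]
    labeled[top] = True
    if responsive[top].sum() == 0:              # crude stopping rule: a dry batch
        break

recall = (labeled & responsive).sum() / responsive.sum()
effort = labeled.sum() / n
print(f"recall={recall:.2f} after reviewing {effort:.0%} of the collection")
```

Because the learner continually re-ranks and the reviewer only sees the highest-scoring documents, high recall is reached after reviewing a small fraction of the collection, which is the effort savings the JOLT study reported.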
Contrary to what was stated on the webinar, the Enron dataset used for TREC 2009, along with the relevance assessments we used in our CAL study, and an implementation of CAL, are available here: http://cormack.uwaterloo.ca/tar-toolkit/. They were never withdrawn, and they can easily be downloaded to a commodity laptop computer.
Through the TREC Total Recall Track (trec-total-recall.org), founded by us in 2015, we have created many more datasets—and a new evaluation platform—that can be used by anybody to assess TAR methods, either as a TREC participant, or on their own.
Maura and I have written at length about the problem of using arbitrary recall thresholds as a measure of success [see, e.g., Grossman M.R. and Cormack G.V., Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review,” 2014 Fed. Cts. L. Rev. 8 (July 2014)]. We have published studies indicating that CAL achieves high recall on various facets or subcategories of relevance [see, e.g., Cormack G.V. and Grossman M.R., Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review, SIGIR 2015], and that CAL can reliably achieve very high recall, as well as high levels of completeness by other, more nuanced measures, without requiring seed sets or validation samples [see, e.g., Cormack G.V. and Grossman M.R., Engineering Quality and Reliability in Technology-Assisted Review, SIGIR 2016].
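The objection to arbitrary overall-recall thresholds is easy to illustrate with arithmetic. In the hypothetical below (the facet names and counts are invented for illustration, not drawn from any study), a review clears a 75% overall-recall bar while missing one subcategory of relevance entirely:

```python
# Hypothetical: 1000 responsive documents spread over three facets of
# relevance, and the number of each that a review actually found.
facets = {"contracts": 600, "emails": 300, "memos": 100}
found  = {"contracts": 590, "emails": 160, "memos": 0}

overall = sum(found.values()) / sum(facets.values())
print(f"overall recall: {overall:.0%}")         # clears a 75% threshold
for name in facets:
    print(f"  {name} recall: {found[name] / facets[name]:.0%}")
```

A single aggregate number of 75% says nothing about the 0% recall on the "memos" facet, which is why per-facet measures of completeness can be more informative than a fixed overall threshold.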
Our newest work (to be presented at CIKM 2016) shows how to extend CAL to the situation in which not all responsive documents need to be reviewed [see Cormack G.V. and Grossman M.R., Scalability of Continuous Active Learning for Reliable High-Recall Text Classification, CIKM 2016], work that provides an alternative solution to the problem addressed by Dimm in his discussion of “TAR 3.0.” Our results are derived from publicly available datasets; indeed, we invite Mr. Dimm to present comparative results for “TAR 3.0” derived from the same datasets.
Maura and I believe that it would be much more constructive for your membership if ACEDS were to concentrate on disseminating these new results, rather than re-litigating real or perceived limitations of our 2011 JOLT study, or dwelling on the fact that the term “TAR” has been so overused in the marketplace as to dilute its meaning considerably. While we do not disagree with the latter point, we question how dwelling on it moves the ball forward.
To be clear, we stand by the methodology and results of our JOLT study. It correctly used a Bonferroni correction to account for the fact that we tested only the best (two of eleven) methods at TREC. Our paper correctly pointed out that “not all technology-assisted reviews (and not all manual reviews) are created equal.” It correctly observed that continuous active learning, along with a rule-based method, yielded superior results, at vastly lower cost. It was silent on the efficacy of other methods that were subsequently branded as “TAR” in the marketplace.
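The role of the Bonferroni correction mentioned above can be sketched numerically. When the methods tested are selected from a larger pool of candidates, the per-test significance threshold is divided by the pool size so that the family-wise error rate stays at the chosen alpha. The p-values below are hypothetical, invented purely to show the arithmetic; they are not values from the JOLT study.

```python
# Bonferroni correction: with m candidate methods in the pool, each
# individual test must clear alpha / m rather than alpha.
alpha = 0.05
m = 11                        # candidate methods at TREC (two were tested)
adjusted = alpha / m          # per-test threshold, ~0.0045

hypothetical_p_values = {"method_A": 0.0031, "method_B": 0.041}
for name, p in hypothetical_p_values.items():
    verdict = "significant" if p < adjusted else "not significant"
    print(f"{name}: p={p} vs threshold {adjusted:.4f} -> {verdict}")
```

Note that a p-value of 0.041, which would pass an uncorrected 0.05 threshold, fails the corrected one; applying the correction is the conservative choice, which is the point of the sentence above.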
Remarkably absent from this polemical panel was any constructive suggestion as to how to proceed: How should one conduct a review? How should one evaluate methods and tools? What research could reasonably be conducted to advance the legal industry’s understanding of these issues?
Gordon V. Cormack