Reconsidering Dr. Cormack’s “Considered Response”
Wednesday, August 17, 2016
NOTE: The following letter is from Bill Speros who was a panelist on ACEDS' webinar How Automation is Revolutionizing E-Discovery. Bill is an influential and experienced e-discovery consultant and attorney who has logged tens of thousands of hours in high-stakes discovery disputes. He responds to Gordon Cormack's comments following our webinar, published here.
J. William Speros, Esq. - firstname.lastname@example.org
Last week ACEDS’ Executive Director, Mary Mack, moderated a webinar presented by Doug Austin, Bill Dimm and Bill Speros entitled, Faster, Better, Cheaper: How Automation is Revolutionizing eDiscovery.
The next day ACEDS published Dr. Gordon Cormack’s response asserting, among other things, that the 60 minute long webinar was based upon “selective reading” and “presented false impressions.”
Here I speak to those concerns and emphasize the webinar’s proposals as to how the profession can move forward because, however well informed and intelligent Dr. Cormack is, his perspective with respect to the webinar, at least, is different than mine.
All panelists were gratified to learn from the webinar host that nearly all 120+ attendees remained throughout our webinar. Happily, 95% of the attendees who rated the session scored it as either “Good” (24%), “Very Good” (38%), or “Excellent” (33%).
Generally, I stand behind the webinar’s substance with two exceptions. The webinar mistakenly referred to the Richmond Journal of Law and Technology (“JOLT”) law review article as a “white paper.” Instead, of course, it is an “article” or “report”.
In addition, in responding to an attendee’s question the webinar panelists were uncertain as to whether the original data set and relevant topical codes used for TREC 2009 remain available online. Regardless of whether those data were ever available to the public from TREC, Dr. Cormack noted they are available via his own site.
I appreciate Dr. Cormack’s noting these inadequacies and agree with his corrections.
Considering the webinar’s scope
Dr. Cormack listed several of his papers, including one yet to be published, and observed that it would have been “much more constructive [for the webinar]…to concentrate on disseminating these new results.”
For reasons explained below, the webinar focused only on Dr. Cormack and his collaborator’s 2011 JOLT paper.1
Happily, I understand that ACEDS has invited Dr. Cormack and his collaborator to publish their own webinar which we all should look forward to attending; after all, no one can describe authors’ work better than the authors themselves.
Considering the Webinar’s Accuracy and Tone
Dr. Cormack asserts that the webinar was based upon “selective reading” and “presented false impressions…” and that the webinar’s panelists were “polemic”.
I take Dr. Cormack’s comments seriously because they are indictments of the panelists’ intentions, intellect and integrity. While these accusations deserve more attention—I suspect that we all will continue to consider them and hope that Dr. Cormack clarifies his statements—here are my immediate thoughts about each of Dr. Cormack’s observations.
I recognize that the webinar challenged various key assumptions, one of which the court in Da Silva Moore v. Publicis Groupe quoted (in dicta) directly and without qualification from Dr. Cormack’s and his collaborators’ JOLT paper [emphasis added]:
“Technology-assisted review can (and does) yield more accurate results than exhaustive manual review.”
As the webinar sought to make clear, however, the JOLT study itself acknowledged various limitations (not to mention others identified by other commentators) that tempered the “(and does)” phrase. Even the formal JOLT article’s title put it in context [emphasis added]: “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review.”
Consequently, the webinar noted that:
The phrase “TAR can yield more accurate results” is reasonable.
The phrase “TAR can (and does) yield more accurate results,” which the court adopted without qualification, thereby giving the impression that all forms of TAR usually (or even always) give more accurate results, was not justified by the analysis performed in the JOLT article or by other preexisting, persuasive analysis.2
From my view, rather than attacking Dr. Cormack and his work, the webinar’s content spoke to the quality of the court’s interpretations in a manner entirely consistent with thoughtful and professional analysis.
Did the Webinar Present a False Impression?
Similarly Dr. Cormack observed that the webinar “chose to elide the essence of the definition of TAR from both our 2011 JOLT and 2013 Federal Courts Law Review definitions.”
Just as the webinar challenged the 2012 court’s adopting the “(and does)” phrase, the webinar challenged the 2012 court’s finding legal significance in the acronym “TAR” when the TAR definition provided within the 2011 JOLT article upon which the court relied was so inherently vague.
Consequently, the webinar’s panelists did not “selectively read” only Dr. Cormack’s pre-2012 articles. Nor did the panelists imply that key players (including, of course, Dr. Cormack) have since 2012 “rested on their laurels” or are doing “no legitimate empirical work.” We all acknowledge that meaningful progress has been made. But for purposes of the webinar what mattered is this: to what information did the court have access when it made its 2012 ruling?
Obviously, to that issue all post-2012 work, however legitimate, is irrelevant.
Was the Webinar Polemic?
Admittedly, the webinar was not designed to be a “go-along-to-get-along” infomercial. After all, by now we all have attended too many webinars in which evangelists masquerade as educators.
Certainly the webinar challenged—OK, perhaps to some, “attacked”—particular assumptions which the legal profession has accrued. The webinar did so because those assumptions are sufficiently significant that they are manifesting as generally accepted notions and also as emerging legal precedent.
Nevertheless, the webinar’s intent was not to attack Dr. Cormack or his work and—based upon what we heard so far from other attendees—that was not our result.
Offering a Way Forward
Dr. Cormack ended his response with this: “Remarkably absent…was any constructive suggestion as to how to proceed: How should one conduct review? How should one evaluate methods and tools? What research could reasonably be conducted to advance the legal industry’s understanding of the issues?”
While the webinar’s panelists all have experience with all those important questions and did not necessarily highlight them as such, the webinar identified several “constructive suggestions as to how to proceed” including, for example:
- Clarifying the (general lack of) judicial acceptance of TAR. Many in our profession remain confused about the level of judicial deference afforded TAR’s results. For example, a quick survey conducted during the webinar reflected what seems to be a common misperception: about half of those who responded thought courts have required parties to employ TAR even though that assertion is nearly perfectly false.
Nevertheless, the market’s confusion is a natural consequence of TAR vendors’ regularly asserting that courts “order,” “require” or “approve” TAR.
The natural implication, of course, is TAR vendors and other proponents should aim to give a more balanced presentation and should be called out when they fail.
- Differentiating alternative TAR techniques and technologies. The profession requires additional clarity regarding legally defensible TAR techniques and technologies, including understanding the relative strengths of various TAR technologies’ approaches and the point at which they deem the results to be defensibly “good enough.”
That is why, to help the profession move forward, the webinar presented animations depicting TAR 1.0, 2.0 and 3.0 and encouraged the profession to recognize that all TAR approaches are different and that no single approach is optimal in all situations.
- Developing independent and valid TAR assessments. As our profession refines TAR techniques and technologies, it continues to search for independent assessments—and, better yet, verification—as to TAR’s purported capabilities, its operating requirements and its limitations. Obviously, clarifying these attributes—essentially the technologies’ “flight envelope”—is critical to gauging the reliability of any technology including each TAR approach.
Unquestionably, the most well-known and positive set of claims about TAR appeared in Dr. Cormack’s and his collaborator’s own 2011 JOLT article whose conclusion has been cited in nearly all TAR-related judicial opinions: “Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.”5
Nevertheless, the TREC study and the JOLT paper from which it emerged employed performance measures that many of us, including perhaps Dr. Cormack, recognize offer limited sensitivity in the context of document review.6 For example:
- Recall is insensitive to probativeness (e.g., “redundantly relevant” vs “hot”) and completeness (e.g., producing high proportion of documents re uncontested issues can offset few documents re critical issues).
- F1 (which is derived in part from recall) is inappropriate in the document review context and insensitive to real-world document review risks.
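To make the two points above concrete, here is a minimal sketch, not drawn from the TREC study or the JOLT paper, using the standard set-based definitions of precision, recall and F1. The collection, the document identifiers and the designation of five documents as “hot” are all hypothetical and chosen only for illustration. It compares two productions that retrieve the same number of relevant documents but differ in whether they include the hot ones.

```python
def precision_recall_f1(relevant, produced):
    """Standard set-based precision, recall and F1 for a production."""
    relevant, produced = set(relevant), set(produced)
    true_positives = len(relevant & produced)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical collection: 100 relevant documents, 5 of them "hot".
relevant = {f"doc{i}" for i in range(100)}
hot = {f"doc{i}" for i in range(5)}

# Production A: 80 relevant documents, including all 5 hot ones.
production_a = {f"doc{i}" for i in range(80)}
# Production B: 80 relevant documents, none of them hot.
production_b = {f"doc{i}" for i in range(20, 100)}

scores_a = precision_recall_f1(relevant, production_a)
scores_b = precision_recall_f1(relevant, production_b)
hot_found_a = len(hot & production_a)
hot_found_b = len(hot & production_b)

print("A:", scores_a, "hot documents produced:", hot_found_a)
print("B:", scores_b, "hot documents produced:", hot_found_b)
```

Under these assumptions both productions receive identical precision, recall and F1 scores, even though one contains every hot document and the other contains none. That is the sense in which these measures are insensitive to probativeness.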
The webinar reviewed the implications of relying upon such performance measures to help TAR vendors and other proponents address known weaknesses and to help our profession more capably and independently assess vendors’ claims about what Dr. Cormack may call “methods and tools,” to design and negotiate document review practices, and to satisfy legal duties.
- Developing more convincing evidence that TAR is reliable. Nearly all TAR-related research asserts that technology “interplays” or “extrapolates” human decisions which are “harnessed,” etc.
As stated in the webinar, the legal profession needs additional clarity as to how this works reliably and, therefore, defensibly.7
That is why in the webinar I discussed how TAR vendors and other proponents could pursue the path taken by the National Academy of Sciences when it reviewed other forensic technologies (e.g., fingerprint, bite mark and bullet lead comparison) and largely debunked some of them: the Academy’s analysis started with clear and direct statements regarding each technology’s purported:
- Capabilities: which in the TAR context includes what is the algorithm’s sensitivity to human decisions which it “harnesses,” etc.;
- Operating requirements: TAR’s corpus’ minimum/maximum volume; minimum necessary reviewed documents and relevant topics, necessary reviewer training and management, etc.;
- Limitations: TAR’s (in)sensitivity with respect to very small and cryptic or very large and multi-topical documents, (in)sensitivity to handle multiple conceptually varying research topics, (in)sensitivity to what it “perceives” to be inconsistent user interactions, etc.
To be clear, perhaps in time TAR’s capabilities, operating requirements and limitations can be delineated.
And, perhaps in time, the underlying legal problems that TAR seeks to solve will indeed be solved.
But in the meantime, knowing what about TAR is ambiguous and, perhaps more to the point, what problems TAR seeks to solve that are currently beyond TAR’s capabilities is essential to optimizing TAR-related technology and techniques and to the profession’s defending its results.
To summarize, TAR vendors and other proponents cannot expect the legal profession to adopt technology the use of which imperils law licenses, professional reputations, legal claims and defenses and, yes, justice until the TAR proponents are clear about what TAR is and can do, what is necessary to make it do it, and what TAR cannot do.
Again, we appreciate Dr. Cormack’s feedback. As indicated in this response and the webinar which it discusses, we all want to help move the profession forward by proving TAR’s reliability, improving TAR’s capabilities and, thereby, extending TAR’s adoption.
Underlying differences of opinion are based upon our respective perspectives.
That isn’t a problem.
Instead, that’s the stuff from which solutions emerge.
1.↩ As discussed below, the webinar focused exclusively on the 2011 JOLT paper not because no other interesting and prospectively important analyses have been performed since then but because it:
- Remains the only available published significant comparison of (“exhaustive”) manual review relative to technology-assisted review; and
- Has been cited in nearly all TAR-related judicial opinions, so it remains significant and its findings remain relevant.
2.↩ By contrast, if Dr. Cormack believes that the JOLT study does support the “(and does)” claim, we hope to hear from him regarding that.
3.↩ Here is that TAR definition as it appeared on a webinar PowerPoint slide:
“A technology-assisted review process involves the interplay of humans and computers to identify the documents … [and] may involve, in whole or in part, the use of one or more approaches including, but not limited to, keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, and sampling.”
The text which Dr. Cormack seems to complain was improperly omitted (in his terms, “elided”), as represented by the ellipsis, is:
…in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege. [FN3] A human examines and codes only those documents the computer identifies – a tiny fraction of the entire collection. [FN4] Using the results of this human review, the computer codes the remaining documents in the collection for responsiveness (or privilege).[FN5] A technology-assisted review process…
I believe that the excerpted text employed on the PowerPoint slide captured the point: that in 2011 the phrase TAR “can (and does)” perform better than “exhaustive manual review” coincided with a definition of TAR that was essentially unbounded with respect to technology and its use and unspecified with respect to how that technology “involves the interplay of humans.”
It could be that Dr. Cormack sees within that definition—presumably in the text that the webinar omitted—language which is sufficient to differentiate TAR from other document review strategies. The point we made in the webinar, however, is that given the definition’s subsuming within “TAR” “in whole or in part, one or more approaches including, but not limited to” a long list of technologies—including notably Boolean search that has been employed for decades—I simply do not see necessary precision to prescribe “TAR” or, conversely, to sanction a party for failing to use “TAR.”
As to Dr. Cormack’s objection to the webinar’s mentioning a revised, 2013 definition of TAR (again with necessary ellipses), the webinar sought to make the point that the TAR definition itself has over time become no more precise. Beyond that, TAR’s evolving definitions impose additional difficulty in interpreting TAR-related court rulings: inasmuch as courts haven’t defined what they mean when they refer to “TAR,” given the alternative available definitions attorneys are left wondering this: when courts refer to “TAR,” to what specific capabilities are the courts referring?
4.↩ Any claim that JOLT’s definition of TAR was precise is undermined by the article itself, particularly in the sentence which followed “(and does)” which reads: “Of course, not all technology-assisted reviews (and not all manual reviews) are created equal.”
5.↩ As his reaction to the webinar notes, of the eleven TAR teams’ technologies, Dr. Cormack’s own University of Waterloo’s (“UW”) TAR engine was one of two that Dr. Cormack and his collaborators selected to be “most likely to demonstrate that TAR can improve upon exhaustive manual review.”
To be clear, the webinar mentioned that JOLT focused on two TAR systems “most likely to demonstrate” their point, not to question whether JOLT’s statistical significance claims were warranted. More specifically, the webinar pointed out that JOLT’s claims were based only on the TAR systems which had the best performance, not systems that were in any sense average or typical.
Should courts be relying on a study of the best systems to make rulings on the use of systems bearing little resemblance to the systems studied? This is not an issue that is resolved by a Bonferroni correction (which merely ensures that statistical significance claims about the systems studied are valid for those systems). It is important for litigants to know how strong the evidence and judicial support for TAR really is.
6.↩ Even though the TREC study and the JOLT paper relied upon those performance measures (which, in turn, were relied upon by various courts), to our knowledge there have been no relevant separate, independent studies and no Daubert / Kumho Tire / FRE 702 adversarial hearings with respect to any TAR technique or technology that address those weaknesses.
7.↩ More specifically, what “harnesses” or “interplays” or “extrapolates” means in the document review context is critical. After all, how can the profession appreciate whether any study’s results are dependent upon a particular set of reviewers working with a particular set of data pursuing a particular set of topics?
Conversely, how can the profession understand the extent to which any particular test’s results are true in all other (reviewer, data, topic, etc.) contexts?