Despite Early Success, Technology Assisted Review’s Acceptance Is Limited by Lack of Definition
Wednesday, August 31, 2016
Posted by: Jason Krause
NOTE: This is an expanded version of an article accepted for publication and presented at the 2016 ASU-Arkfeld eDiscovery and Digital Evidence Conference. This paper is not meant to disparage the work of any researchers or judges, but expands on the recent debate regarding the defensibility of TAR in litigation.
by Bill Speros
Predictive Coding and Technology Assisted Review (“TAR”) help ameliorate a problem that degrades civil litigation: extortion by discovery, which occurs when discovery costs “force settlements for reasons and on terms that related more to the costs of discovery than to the merits of the case.”
More specifically, when searching for some particular information, in some particular circumstances, as employed by some particular people who are pursuing some particular objectives, using some particular technologies, TAR works well in some particular ways.
This problem remains: while TAR has been patented, promoted and demonstrated, TAR lacks a definition of what it does, in what conditions it works, and in what circumstances it is ineffective.
Interestingly, even without those definitions, conditions and limitations, some courts express more confidence in TAR than TAR’s own proponents. In the highly influential Da Silva Moore Opinion and Order, for example, the court quotes a law review article [emphasis added]:
“Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.”
Nevertheless, in focusing on that “can (and does)” phrase, the court ignored the underlying article’s many significant constraints, not only those mentioned in the article but also those identified by various experts. In addition, the court ignored the article’s own conclusion that [emphasis added] “technology-assisted review can achieve at least as high recall as manual review…” and the court ignored the article’s title, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review.
Therefore, idiosyncratically, the court’s opinion that TAR technology “does yield more accurate results” was more absolute and more optimistic than the conclusion that TAR “can achieve” or “can be more effective” claimed by the law review article upon which the court itself relied.
To be clear, the Da Silva Moore court tempered its ruling by assuring the parties that whenever they identified discovery-related problems they could return to the court. In addition, the court recognized that TAR was emerging, not fully formed:
“[t]he technology exists and should be used where appropriate, but it is not a case of machine replacing humans: it is the process used and the interaction of man and machine that the courts need to examine.”
The Da Silva Moore court, however, did not examine the TAR “process used and the interaction of man and machine” in a “gatekeeper” role to exclude unreliable evidence, a role that is anticipated by many states’ Frye standard, by Federal Rule of Evidence 702, and under Daubert and Kumho Tire. Instead, the court ruled (and was subsequently affirmed by the district court) that:
Federal Rule of Evidence “702 and Daubert simply are not applicable to how documents are searched for and found in discovery…”
That opinion, however, is not universally adopted. For example, Magistrate Judge Waxse et al. summarize his and other commentators’ and judges’ views that:
“[Federal] Rule [of Evidence] 702 and the Daubert standard should be applied to experts with technical expertise or knowledge pertinent to a party’s ESI search and review methodologies and who provide the court with evidence on discovery disputes involving these methods.”
In time, perhaps, courts will address that legal issue.
Regardless of when they do so and what they decide, courts may be confused by the current state of the TAR art.
Some of that confusion was formed starting immediately after Da Silva Moore when proponents sought to reframe the court’s ruling.
Perhaps TAR proponents were confused by the court’s misperceiving the article upon which the court relied as being proof-of-capability rather than as proof-of-concept. Or TAR proponents could have been confused by the court’s disconnecting the article’s claims from its conclusions. Or TAR proponents could have been confused by the court’s proclaiming that the “Court has approved the use of computer-assisted review [which] now can be considered judicially-approved for use in appropriate cases.”
In any case, TAR proponents (not only vendors who sell TAR products or services but also those whose reputation and celebrity are enhanced by TAR’s promotion) celebrated the court’s ruling, oftentimes reporting that the court “ordered” or “required” TAR, using malformed analogies, filling law technology publications with advertisements, filling conference showrooms with exhibitors’ booths and salespeople, filling calendars with webinars, and prolifically pumping out law review articles, white papers and infomercials.
Naturally, except for a select set of thoughtful skeptics (not Luddites but people who think carefully, like those listed at footnote 8) who seek to define TAR’s capabilities, requirements and limitations, few people have an incentive to address publicly TAR’s underlying weaknesses or, conversely, methods to overcome them. After all, such details provide competitive advantage only to the extent that they are not commonly understood.
In light of that asymmetric marketing push, it is not surprising that the US Tax Court concluded that the “technology industry now considers predictive coding to be widely accepted for limiting e-discovery to relevant documents and effecting discovery of ESI without an undue burden.” Nor is it surprising that the absence of evidence has become evidence of absence. According to the High Court of Justice, Chancery Division, “There is no evidence to show that the use of predictive coding software leads to less accurate disclosure being given than, say, manual review alone or keyword searches and manual review combined.”
Regardless of the extent to which judicial opinions were influenced by marketing opinions, a common question remains for any court that examines TAR’s reliability, whether in its FRE 702 and Daubert gatekeeper role or in considering whether a particular producing party satisfied its discovery obligations:
What is TAR?
After all, a common definition of TAR—see Footnote 1—consists of humans interplaying with computers using one or more undefined approaches to pursue one or more vaguely defined objectives.
Obviously, that definition and other commonly employed ones do not designate TAR’s capabilities, operating requirements and constraints. Nor do those definitions delineate scientific mechanisms and methods. Nor do those definitions prescribe a set of particular technical components whose behavior to associate and valuate text in relevant circumstances can be modeled.
Instead, those definitions are essentially aspirations and un-testable puffery.
But even if those definitions are established, TAR’s reliability faces inherent challenges akin to fingerprint impression and bullet fragment analysis which:
“[H]ave been developed heuristically. That is, they are based on observation, experience, and reasoning without an underlying scientific theory, experiments designed to test the uncertainties and reliability of the method, or sufficient data that are collected and analyzed scientifically.”
More specifically, even after bullet lead compositional analysis was supported by 50 peer-reviewed scientific articles, the National Academy of Sciences (“NAS”) observed [emphasis added]:
“For more than 40 years, for example, analysts from the esteemed FBI Crime Lab testified that…they could match crime-scene bullet fragments… by their distinctive elemental makeup. An exhaustive analysis of the technique, however, again by the NAS, found no scientific basis for such claims.”
Even when there is a bona fide scientific basis—fingerprints are unique—subjectivity in the comparing process confounds results:
The NAS concluded that, absent set standards for declaring a fingerprint “match,” “[e]xaminers must make subjective assessments throughout… As a result, the outcome…is not necessarily repeatable from examiner to examiner [and some] experienced examiners do not necessarily agree with even their own past conclusions…”
That is why even with its compelling potential, its preeminent marketing, and an optimistic judiciary, until TAR consolidates definitions about what it is, its capabilities and its limitations, and specifies any underlying science and all necessary protocols, TAR will face meaningful criticism about its reliability.
And it should.
 “A technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege… A technology-assisted review process may involve, in whole or in part, the use of one or more approaches including, but not limited to, keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, and sampling.”
Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11 (2011), pp. 3-4. Similarly: Maura R. Grossman and Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Courts L. Rev. 1 (2013).
 Subrin, Fishing Expeditions Allowed, 29 Boston Coll. L. Rev. 691, 730 (1998).
 See, for example: A Visual Representation of Predictive Coding Case Law, September 23, 2015, http://logikcull.com/blog/a-visual-representation-of-predictive-coding-case-law/
 Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11 (2011), http://jolt.richmond.edu/v17i3/article11.pdf
 Id, Da Silva Moore, p. 19.
 For example:
· For 3 of 5 topics “no significant difference in recall” between TAR and manual reviews (p. 44)
· “[C]onsidered…only the two of eleven teams most likely to demonstrate that TAR can improve upon exhaustive manual review” (p. 48)
· The manual reviews were the “First-Pass Assessments” (p. 24) by loosely managed volunteers (p. 28).
 For example:
· William Webber, Re-examining the Effectiveness of Manual Review, http://www.williamwebber.com/research/papers/w11sire.pdf
· Gerard J. Britton, Courts must reassess assumptions underlying current predictive coding protocols, http://postmodern-ediscovery.blogspot.com/2014/07/courts-must-reassess-assumptions_7.html
· William C. Dimm, Predictive Coding: Theory & Practice: http://www.predictivecodingbook.com/
 Da Silva Moore, p. 17
 Frye v. United States, 293 F. 1013 (D.C. Cir. 1923)
 Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579
 Kumho Tire Co. v. Carmichael, 526 U.S. at 151
 Da Silva Moore, p. 15. See also, Maura R. Grossman & Gordon V. Cormack, Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review,” THE FEDERAL COURTS LAW REVIEW, Volume 7, Issue 1 (2014), p. 311 (“To be clear, we do not suggest that a formal Daubert hearing is necessary or appropriate in evaluating individual TAR efforts; the admissibility of expert testimony at trial and the adequacy of production present different objectives and standards.”)
 David J. Waxse and Brenda Yoakum-Kriz, Experts on Computer-Assisted Review: Why Federal Rule of Evidence 702 Should Apply to Their Use, 52 Washburn Law Journal 207, 223, http://washburnlaw.edu/publications/wlj/issues/52-2.html
 Some TAR implementations train or validate TAR by selecting a random sample of documents, a process sufficiently well defined and validated to rise to the level of a tested and reliable science. By contrast, at its core TAR employs machine learning to form artificial intelligence based models. Those models’ capabilities, requirements and limitations are insufficiently precise to accommodate rigorous assessment because, as one meme notes, “If you know what Artificial Intelligence does and how it works then it’s just software.”
 Da Silva Moore, pp. 25, 26
 TAR proponents analogize TAR to other technologies like Pandora even though its “musicology” project carefully analyzes each musical piece using dozens of standards and tracks reactions by hundreds of thousands of users. TAR proponents analogize TAR to Amazon’s “people-who-bought-this-also-bought-that” up-selling feature even though those product linkages are formed by analyzing continuously millions of customers’ actual buying habits. TAR proponents analogize TAR to Google’s capacity to find web pages even though the authors of those pages employ unambiguous terms of art as linguistic beacons to highlight the content and Google continuously rates the millions of users’ level of interest in the content.
 Dynamo Holdings Limited Partnership, 143 TC No. 9 (2014)
 Pyrrho Investments v. MWB Property, [2016] EWHC 256 (Ch)
 Because TAR does not consist of scientific mechanisms and methods beyond random searching, the courts’ gatekeeper role will be associated with Kumho Tire, pursuant to which courts gauge the reliability of technology and technical processes.
 National Academy of Sciences, Strengthening Forensic Science in the United States: A Path Forward (2009), p. 128.
 Keith A. Findley, Reforming the ‘Science’ in Forensic Science, Wisconsin Lawyer, November, 2015
 Id, “Reforming The ‘Science’” refers to I.E. Dror & D. Charlton, Why Experts Make Errors, 56 J. Forensic Identification 600 (2006)