Posted by Rational Enterprise | Wed, Jul 17, 2019 | Comments (0)
Rational Review’s (“RR”) predictive coding technology is built on one of the most advanced machine learning techniques available today: Convolutional Neural Networks (“CNN”). A CNN is a type of deep learning model that companies like Google use to automatically recognize and categorize images, behavior, emotions, and more. A human brain is wired to do all of these things instantly, yet they require incredibly complicated and intricate analyses of relationships and patterns. Rational has brought the power of this technology to eDiscovery review.
Foundations of Predictive Coding
Predictive coding is known by many names and covers a wide range of technologies, techniques, and workflows. The use of predictive coding in eDiscovery has been widely approved in case law, but decisions on methodology, which technologies to use, and which workflows to follow are still left to individual attorneys. On top of that, more than 36 states have adopted Duty of Technology Competence rules, which give lawyers an incentive to learn as much as possible about the best implementations of the technology.
All predictive coding implementations face the basic problem of making human language machine readable. Various methods can accomplish this task:
- Simple character-to-number or word-to-number exchanges.
- Exchanges overlaid with predetermined weights for certain words, accounting for semantic relationships inherent in language.
- Exchanges whose weights are determined by machine learning that discovers specific semantic relationships in the targeted character set.
- Exchanges that use a pre-trained machine learning algorithm whose weighting factors in semantic and syntactic relationships.
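To make the first two approaches concrete, here is a minimal Python sketch of a simple word-to-number exchange and of the same word counts overlaid with predetermined weights. The helper names and weight values are hypothetical, chosen only for illustration; this is not Rational Review’s encoding.

```python
from collections import Counter

# Illustrative sketch; all function names and weights here are hypothetical.

def build_vocabulary(documents):
    """Simple word-to-number exchange: give each distinct lowercase word its own integer."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(doc, vocab):
    """Replace each known word with its integer; words outside the vocabulary are dropped."""
    return [vocab[word] for word in doc.lower().split() if word in vocab]

def weighted_counts(doc, weights, default=1.0):
    """Word counts scaled by predetermined per-word weights (unlisted words get `default`)."""
    counts = Counter(doc.lower().split())
    return {word: n * weights.get(word, default) for word, n in counts.items()}

docs = ["The merger closed today", "The contract was signed"]
vocab = build_vocabulary(docs)
print(encode("the contract closed", vocab))  # [0, 4, 2]

# Hypothetical hand-picked weights: boost a legally loaded term, damp a filler word.
print(weighted_counts("the merger the", {"merger": 3.0, "the": 0.5}))  # {'the': 1.0, 'merger': 3.0}
```

Neither encoding knows that "contract" and "agreement" mean similar things; that gap is what the learned and pre-trained approaches in the list are meant to close.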
Rational Review uses a pre-trained implementation of the Global Vectors for Word Representation (“GloVe”) algorithm, which in its original evaluation outperformed other models on tasks that probe complex linguistic relationships, such as word analogies, word similarity, and named entity recognition. The numerical representations this process produces are known as word embeddings.
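The sketch below, a minimal illustration rather than Rational Review’s pipeline, shows how pre-trained word embeddings turn a document into numbers. It assumes vectors stored in GloVe’s plain-text layout (one word per line, followed by its space-separated components). The three-dimensional vectors are invented for the example, whereas released GloVe vectors have tens to hundreds of dimensions. Cosine similarity shows that related words end up close together.

```python
import io
import math

# Illustrative sketch; function names are hypothetical and the vectors are made up.

def load_glove(lines):
    """Parse GloVe's text format: each line is a word followed by its vector components."""
    table = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table

def embed(doc, table):
    """Turn a document into a sequence of word vectors, skipping out-of-vocabulary words."""
    return [table[w] for w in doc.lower().split() if w in table]

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 3-dimensional vectors standing in for a real GloVe file.
toy_file = io.StringIO(
    "contract 0.9 0.1 0.0\n"
    "agreement 0.8 0.2 0.1\n"
    "lunch 0.0 0.1 0.9\n"
)
table = load_glove(toy_file)
matrix = embed("Signed agreement and contract", table)
print(len(matrix), len(matrix[0]))  # 2 3
print(cosine(table["contract"], table["agreement"]) > cosine(table["contract"], table["lunch"]))  # True
```

The resulting matrix (one row per word, in document order) is the kind of input a convolutional network consumes.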
Once a document is converted to numerical data, machine learning technology is applied to uncover patterns in the number sets. On its own, the algorithm cannot distinguish important patterns from unimportant ones. Once a human trainer provides example documents, however, it begins to learn which patterns are important (i.e., most indicative of relevance). Using this training set, the technology builds a model to uncover similar patterns in new documents. This modeling is the essential function of predictive coding: learning the patterns that lawyers detect within a set of documents and identifying the same patterns in documents the model has never seen before.
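As a stand-in for the training step, the following sketch fits a model to a handful of attorney-coded examples and then scores documents it has never seen. Logistic regression over word counts is swapped in here only because it fits in a few lines; it is not the CNN the post describes, and the training documents are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch; function names and training data are hypothetical.

def features(doc):
    """Bag-of-words counts stand in for the embedding features described above."""
    return Counter(doc.lower().split())

def score(x, weights, bias):
    """Probability of relevance under a logistic model."""
    z = bias + sum(weights.get(word, 0.0) * n for word, n in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=50, lr=0.5):
    """Learn per-word weights from (text, label) pairs, where 1 = relevant and 0 = not."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            error = label - score(x, weights, bias)  # > 0 when relevance was under-predicted
            bias += lr * error
            for word, n in x.items():
                weights[word] = weights.get(word, 0.0) + lr * error * n
    return weights, bias

# Hypothetical training set coded by a reviewing attorney.
training_set = [
    ("pricing agreement with competitor", 1),
    ("discussed pricing with competitor", 1),
    ("team lunch on friday", 0),
    ("holiday party schedule", 0),
]
weights, bias = train(training_set)
print(score(features("call with competitor about pricing"), weights, bias) > 0.5)  # True
print(score(features("friday lunch plans"), weights, bias) < 0.5)  # True
```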
In a traditional applied knowledge approach, an attorney deeply familiar with the factual and legal issues of a matter will convey that knowledge to a review team with instructions to manually and laboriously identify the type of documents relevant to a matter. The knowledgeable attorney might only list a handful of features for the review team to evaluate when reviewing documents.
With machine learning, the knowledgeable attorney instead provides the machine examples of relevant documents, so every feature – even those that cannot be articulated – is available for the machine to consider in the formulation of its model. Essentially, the machine reviewer is able to absorb orders of magnitude more information in constructing its definition of relevancy than a team of inconsistent reviewers.
The Difference: Why Neural Networks?
First introduced in 1963, with its modern soft-margin formulation published in 1995, the Support Vector Machine (“SVM”) is one of the oldest and most popular machine learning techniques. Indeed, many eDiscovery companies still use SVM technology today. SVMs can evaluate many different features of a document and provide useful confidence levels for their predictions. Most studies comparing algorithms for predictive coding in eDiscovery point to SVM as the most capable; however, these studies have not compared SVM to the powerful combination of word embeddings and CNNs.
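For comparison, here is a textbook linear SVM trained by sub-gradient descent on the regularized hinge loss. The sign of the decision value gives the binary prediction, and its magnitude serves as a rough confidence level. The two-feature toy data and the parameter values (`lam`, `lr`, `epochs`) are arbitrary assumptions; production SVM libraries use more sophisticated solvers.

```python
# Illustrative sketch; function names, data, and parameters are hypothetical.

def decision(w, b, x):
    """Signed score w.x + b: the sign is the predicted class, the magnitude a confidence proxy."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=500):
    """Sub-gradient descent on the L2-regularized hinge loss; labels must be +1 or -1."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * decision(w, b, x) < 1:  # inside the margin or misclassified
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:  # correctly classified with room to spare: only apply regularization
                w = [wi - lr * lam * wi for wi in w]
    return w, b

# Hypothetical two-feature documents (e.g., two keyword scores); +1 = relevant.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(decision(w, b, (4, 4)) > 0, decision(w, b, (-4, -2)) < 0)  # True True
```

Points farther from the boundary, such as (4, 4) compared with (2, 2), receive larger decision values, which is where the SVM’s confidence levels come from.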
A neural network uncovers patterns in the numerical data; then, unlike earlier forms of machine learning, a second layer of neurons looks for patterns in those primary patterns, a third looks for patterns in those patterns, and so on. Some networks consist of dozens or even hundreds of layers. This layered structure is what makes neural networks so rich in their analysis of data. SVMs, by contrast, analyze the data in a single step, fitting one decision boundary directly over the input features, and are inherently binary classifiers.
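The layered idea can be sketched in plain Python: a 1-D convolutional layer slides small filters over a sequence of word vectors, and a second layer applies the same operation to the first layer’s output, finding patterns in patterns. The filter weights below are hand-picked placeholders (a real network learns them by backpropagation, typically with a deep learning framework), and nothing here reflects how many layers or filters Rational Review actually uses.

```python
# Illustrative sketch; function names, embeddings, and filter weights are hypothetical.

def conv_layer(seq, kernels, width):
    """One 1-D convolutional layer with ReLU.

    seq: list of feature vectors (one per word position).
    kernels: list of filters, each a flat list of width * len(seq[0]) weights.
    Returns one output vector per window position, one value per filter.
    """
    out = []
    for start in range(len(seq) - width + 1):
        window = [v for vec in seq[start:start + width] for v in vec]  # flatten the window
        out.append([max(0.0, sum(k * v for k, v in zip(kernel, window))) for kernel in kernels])
    return out

def max_pool(seq):
    """Keep each filter's strongest response across all positions."""
    return [max(column) for column in zip(*seq)]

# Hypothetical 2-dimensional embeddings for a 5-word document.
doc = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7], [0.9, 0.0]]
# Layer 1: two width-2 filters respond to patterns in adjacent word pairs.
layer1 = conv_layer(doc, kernels=[[1, 0, 1, 0], [0, 1, 0, 1]], width=2)
# Layer 2: one width-2 filter looks for patterns in layer 1's patterns.
layer2 = conv_layer(layer1, kernels=[[1, -1, -1, 1]], width=2)
print(len(layer1), len(layer2), len(max_pool(layer2)))  # 4 3 1
```

Each layer shortens the sequence and summarizes wider stretches of text; stacking more layers lets later filters respond to longer-range combinations of features.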
SVM is akin to a junior reviewer who arranges every document in a matter on a large conference table. The hundreds of things he observes about each document determine where on the table he places it. Then, when the senior attorney asks for more of a particular type of document (e.g., relevant documents), the junior reviewer identifies where that type of document is likely to be and hands her a pile of nearby documents from the table, in the hope that those documents are also relevant.
Now imagine the CNN as a junior reviewer. At first, he observes all the different document features but doesn’t make any decisions about the documents or place them into piles yet. Instead, he writes a post-it note for each feature he notices and arranges those notes on the conference room wall. He then notices various patterns in the post-it notes and records these observations on a set of cue cards, which he arranges on the conference room table. Next, he recognizes patterns in the cue cards and records them on scraps of paper laid out on the floor. He is then able to analyze all the relationships among the post-it notes, cue cards, and scraps of paper and, without knowing anything about the case, makes a guess as to whether a given document is relevant.
At this point, he checks with the senior attorney to see whether his guess is right. If he is correct, he knows that the patterns he identified were useful and should be used again in judging the next document; if his guess was wrong, he places less weight on those patterns when looking at the next document. The more often he sees the same pattern lead to a confirmed guess, the more confidence he places in his predictions. The junior reviewer also tracks his performance, so if the senior attorney finds that he was correct on the past 100 documents, she can be confident he is doing well.
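The performance tracking described above amounts to a rolling accuracy over the most recent reviewer decisions. A minimal sketch, assuming a hypothetical `RollingAccuracy` helper and a 100-document window:

```python
from collections import deque

# Illustrative sketch; RollingAccuracy is a hypothetical helper, not a Rational Review API.
class RollingAccuracy:
    """Share of correct guesses over the most recent `window` reviewed documents."""

    def __init__(self, window=100):
        self.results = deque(maxlen=window)  # older results drop off automatically

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    def accuracy(self):
        if not self.results:
            return None  # nothing reviewed yet
        return sum(self.results) / len(self.results)

tracker = RollingAccuracy(window=100)
for i in range(150):
    # Hypothetical history: the first 50 guesses were wrong, the next 100 were right.
    tracker.record("Relevant", "Relevant" if i >= 50 else "Not Relevant")
print(tracker.accuracy())  # 1.0
```

Because the window keeps only the last 100 outcomes, early mistakes stop counting once the model has improved, which matches the question the senior attorney actually cares about: how well is it doing now?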
If you are thinking that the CNN reviewer will need a lot more space (and office supplies) to do his work than the SVM reviewer, you’re right: this kind of analysis takes much more computing power. However, he still works just as fast, and we think he does a much better job. Furthermore, you can give him as many classification options as you want. His process still works if you want him to choose among Hot, Warm, Somewhat Relevant, Not Relevant, etc., instead of just Relevant/Not Relevant.
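Handling many categories is commonly done with a softmax over one raw score per category, which yields probabilities that sum to 1. The sketch below shows that step with hypothetical scores. The post does not say exactly how Rational Review computes its confidence values, so treat the scaling to a 0-100 number as an assumption.

```python
import math

# Illustrative sketch; the function name and category scores are hypothetical.
def softmax(scores):
    """Turn one raw score per category into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the max avoids overflow in exp()
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores from a network's final layer for one document.
scores = {"Hot": 2.0, "Warm": 1.0, "Somewhat Relevant": 0.5, "Not Relevant": -1.0}
probs = softmax(scores)
best = max(probs, key=probs.get)
confidence = round(probs[best] * 100)  # assumed 0-100 scaling
print(best, confidence)  # Hot 61
```

Adding a category only adds one more score to the dictionary; the rest of the computation is unchanged, which is why a network is not restricted to a binary choice.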
CNNs in Rational Review
The most significant impact of using this technology is a marked improvement in understanding complex ideas, but there are several additional benefits that manifest in Rational’s tool:
- Precision – The deep learning technology does not force users to sort documents into one of two categories. In other words, the classification need not be simply relevant or non-relevant; it can instead be hot, warm, relevant, non-relevant, etc. The model also provides a confidence score from 0 to 100 for each categorization. There is no limit to the number of categories the machine can learn; it needs just 25 examples from each category to get started.
- Context – RR’s implementation can incorporate a document’s metadata into the initial input of the neural network and the pattern analysis. Users can specify through the graphical user interface which metadata fields should be considered for a particular model.
- Auditability – RR’s implementation allows users to track a “Global” machine learning model as well as a separate model for each individual reviewer. Dashboards let users compare the predictions of these models side by side to understand how they differ. These audit features allow review managers not only to flag irregularities and pinpoint exactly where accuracy fluctuations are occurring, but also to hand-select a subset of reviewers to create a custom model and exclude subpar work product. If there is ever a question of whether the model’s predictions influenced the decisions of individual reviewers or case managers, administrators can view a full audit log of exactly who saw which prediction and when, maximizing defensibility.
- Accuracy – Rational built its machine learning application in-house based on best-of-breed tools and, as a result, understands which metrics are most useful to display in its visual, intuitive dashboard. Tracking accuracy and performance has never been easier: the platform calculates the per-document impact of the current model state, such as the expected rate of false positives or the average prediction accuracy for each confidence level. A continuous learning implementation keeps this information up to date and gives review managers all the information they need to decide how and when to use the model.
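The post does not describe how these dashboard metrics are computed. As a minimal sketch, assuming each reviewed document yields a (confidence, predicted label, reviewer label) triple, accuracy per confidence band and a false positive rate could be calculated as follows. Here the false positive rate is FP / (FP + TN); some teams instead report the share of predicted-relevant documents that turn out not to be relevant. All names and data are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch; function names and review history are hypothetical.

def accuracy_by_band(predictions, band_width=10):
    """Share of correct predictions within each confidence band.

    predictions: (confidence 0-100, predicted label, reviewer label) triples.
    Returns {band start: accuracy}; a confidence of 100 falls in the top band.
    """
    bands = defaultdict(lambda: [0, 0])  # band start -> [correct, total]
    for confidence, predicted, actual in predictions:
        start = min(int(confidence) // band_width * band_width, 100 - band_width)
        bands[start][0] += predicted == actual
        bands[start][1] += 1
    return {start: correct / total for start, (correct, total) in sorted(bands.items())}

def false_positive_rate(predictions, positive="Relevant"):
    """FP / (FP + TN): among documents the reviewer did not mark positive, the share the model did."""
    predicted_for_negatives = [p for _, p, actual in predictions if actual != positive]
    if not predicted_for_negatives:
        return 0.0
    return sum(p == positive for p in predicted_for_negatives) / len(predicted_for_negatives)

# Hypothetical reviewer decisions paired with model predictions.
history = [
    (95, "Relevant", "Relevant"),
    (92, "Relevant", "Not Relevant"),
    (100, "Relevant", "Relevant"),
    (55, "Relevant", "Not Relevant"),
    (58, "Not Relevant", "Not Relevant"),
]
print(accuracy_by_band(history))
print(false_positive_rate(history))
```

Re-running these functions as new reviewer decisions arrive is one simple way to keep such figures current under continuous learning.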
As exciting as this technology is, it is one more tool in a legal professional’s eDiscovery toolbox, albeit a powerful one. Rational Review is fully equipped with email threading, concept analysis, clustering, near-duplicate analysis, and the other analytic tools that complement machine learning in eDiscovery review. Perhaps RR’s greatest differentiator is a simple, approachable design focused on helping lawyers do their work, in which analytics and machine learning are integrated intuitively and do not require the assistance of expert data scientists. We look forward to giving you a closer look soon.
Posted by Rational Enterprise | Wed, Mar 9, 2016 | Comments (0)
Mark Greisiger of NetDiligence, a cyber risk assessment and data breach services company, recently interviewed Rational's Business Development Manager, Tom Preece, on ensuring proper data governance in the face of increased cyber threats.
The full interview, entitled "Data Governance: Managing and Safeguarding Important Information Assets," was published on the Junto eRiskHub blog.
Posted by Rational Enterprise | Fri, Feb 19, 2016 | Comments (0)
Tom Preece, Rational’s Business Development Manager, recently spoke with Kurt Wimmer about his thoughts on emerging data privacy and security trends and how organizations should begin to address these risks. Kurt Wimmer is the U.S. chair of Covington & Burling’s Data Privacy and Cybersecurity practice and is the immediate past chair of the Privacy and Information Security Committee of the American Bar Association’s Antitrust Section.
Tom Preece: You have written in the past about how companies manufacturing IoT devices will need to pay attention to consumers’ desire for privacy of personal data in order to be economically successful. However, there is an increasing sentiment that Generation Y values privacy far less than their preceding generations. As Gen Y purchasing power increases, will the IoT still boom without having to take this desire into account?
Kurt Wimmer: I think the privacy motivations of Generation Y are often misinterpreted. Of course, I agree that Gen Y members are willing to share more personal data than previous generations. But they will only do it on their own terms, and they are quite conscious of protecting their privacy. Gen Y consumers are also cautious about sharing information with institutions their parents trusted implicitly -- including banks and other financial institutions that didn’t fare well during the Great Recession. I don’t expect Gen Y consumers to trade privacy and data security for the simple convenience of IoT devices -- I expect them to be careful consumers.
Tom: What is your advice to Fortune 500 companies struggling to align their business goals and priorities with data security and privacy protection? Is compromise always necessary?
Kurt: I have seen a marked change in U.S. companies’ attitudes toward personal privacy over the course of my 20 years of practicing in the privacy area. Back in 2000, when the EU-US Safe Harbor came into force, I expected U.S. consumers to finally begin demanding the same privacy rights that U.S. companies were affording EU consumers under the Safe Harbor. That didn’t happen, but the rise of social media and the ubiquity of data breaches have caused the American public to finally focus on privacy and security as a product differentiator. Companies such as Microsoft and Apple now are taking tough privacy positions against government access to their customers’ data, and are finding that these policies resonate with consumers. In my view, Fortune 500 companies are moving toward more progressive and consumer-centric views of privacy, both as the marketplace has begun to demand it and as companies begin to understand that privacy can be a competitive advantage for their products and services in a competitive market.
Tom: Security and Privacy managers often fail to secure executive buy-in for comprehensive privacy or information security programs. What advice do you have for managers seeking such buy-in?
Kurt: Executive support for privacy and security programs is, of course, essential. Corporate structures take their cues from boards and CEOs, and leaders who understand and value privacy will create companies that protect privacy. I believe that executive buy-in for data privacy and cybersecurity programs is massively increasing in today’s market, both in light of cybersecurity issues arising out of high-profile data breaches such as Sony and in light of new penalties being adopted in Europe. The FTC, too, is being taken seriously as a tough regulator as it imposes 20-year consent orders on an increasing number of companies across industry, and state attorneys general are stepping up. These enforcement actions are gaining the attention of CEOs, CFOs, and boards of directors, and top-level corporate support for protecting privacy and security is increasing.
Tom: There seems to be a fundamental difference between the way that the US and EU view personal data, culminating most recently in the Safe Harbor agreement being struck down. Is one perspective more correct than the other, and why?
Kurt: Europe has long held that privacy is a fundamental human right, whether it is impinged by the government or a private company. The U.S. has had even longer historical protection for privacy, but U.S. privacy interests have focused on protecting the individual against the government. Both perspectives are valid, and arise from different cultural and historical experiences with privacy. In enacting the General Data Protection Regulation (GDPR), the European Union has gone a step further in attempting to restrain commercial use of personal data, and it will be interesting to see how much that effort may impact the EU’s ability to host cutting-edge digital companies as the GDPR is implemented over the next two years.
Tom: What role does Information Governance play in data privacy and security?
Kurt: Enlightened companies looking to truly secure their data against internal and external threats often begin with organizing their data. Data storage often is based more on history than logic, and idiosyncratic storage procedures that have built up over time can present a serious danger to establishing clear administrative, physical and technical controls over important personal data, IP and trade secrets. Establishing an information governance system is often an important first step toward creating a secure and effective system for collecting, storing and effectively using a company’s data.
Posted by Rational Enterprise | Mon, Feb 1, 2016 | Comments (0)
Tom Preece, Rational’s Business Development Manager, recently spoke with James Sherer about his work as a litigator, author, e-discovery practitioner, and Information Governance thought leader. James Sherer is Counsel at BakerHostetler, where he co-chairs the Information Governance practice team and serves as part of the E-Discovery and Management and Privacy and Data Protection groups. His work focuses on litigation; discovery management processes; enterprise risk management; records and information governance; data privacy, security, and bank secrecy; technology integration issues; and related merger and acquisition diligence. James holds CIPP/US, CIPP/E, CIPM, and CEDS credentials as well as an MBA in finance; is a member of The Sedona Conference® Working Groups One, Six, and Eleven; and writes and presents on e-discovery, information governance, privacy, investigation, and merger and acquisition issues.
Celebrating the 100th anniversary of its founding this year, BakerHostetler is a leading national law firm that helps clients around the world to address their most complex and critical business and regulatory issues. With five core national practice groups – Business, Employment, Intellectual Property, Litigation, and Tax – the firm has more than 940 lawyers located in 14 offices coast to coast. BakerHostetler is widely regarded as having one of the country’s top 10 tax practices, a nationally recognized litigation practice, an award-winning data privacy practice, and an industry-leading business practice. The firm is also recognized internationally for its groundbreaking work recovering more than $10 billion in the Madoff Recovery Initiative, representing the SIPA Trustee for the liquidation of Bernard L. Madoff Investment Securities LLC. Visit www.bakerlaw.com for more information.
Tom Preece: Your previous and current experience gives you a unique perspective into the shortcomings of the typical e-discovery process. As a litigator and as part of BakerHostetler’s E-Discovery and Management practice group, what is the most overlooked error? How is it best addressed?
James Sherer: I’m lucky that I’m able to bring my E-Discovery experience most directly to bear in the cases I’m involved in, and lucky as well to work at a firm with a team focused on E-Discovery and its application in litigation, regulatory response, and even merger and acquisition activity. BakerHostetler’s E-Discovery and Management (“E-DAM”) team is composed of practicing litigators and operates in close concert with, but still separate and apart from, BakerHostetler’s Litigation Support team. E-DAM practitioners provide strategic advice to case teams advocating intelligent discovery decisions. They are also expected to be well-versed in technological choices as well as the laws specific to discovery, including the recent changes to the Federal Rules. We’re therefore involved primarily in two phases of litigation: when we’re part of the case team generally, where we’re helping with that advocacy process not only on the discovery aspects of the case but also on witness selection and some points of overall strategy; and when there’s a problem specific to discovery, usually on the pleadings side.
Given all of that background, it’s my opinion that the most consistent issue that I and other people on my team face is the thought that the technological solution will be “push-button easy.” It’s just not. And the solution to that predicament is being able to work with the other attorneys on the case, who often include attorneys working for the opposing party or the government, and being straightforward about exactly what the technology will do, how it operates, what the timing looks like, and what the end result will be.
Tom: From a former in-house counsel at a Fortune 500 company to now a co-chair of BakerHostetler’s Information Governance team, your perspective on IG has surely evolved. Knowing what you know now, what would you have done differently and why?
James: I don’t know if I would be able to do it “differently,” but I think part of success in this area of practice is continual improvement of listening skills. The engineers who build the technologies are excited to talk about them; the people who build internal processes are just as excited to share. The key here is not telling people what you think they’ve said (or what they mean). Instead, active listening provides them with the opportunity to explain what they mean. The same holds true when interacting in peer groups within the E-Discovery, Data Privacy, and Information Governance spaces. Because active participation in these spaces means attending events and becoming involved in associations, a practitioner will see his or her peers over-and-over-and-over again. An elevator speech is unnecessary. Active listening is better.
Tom: You are a prolific published author. The BakerHostetler website lists 13 papers that you authored or co-authored in the past 2 years alone. What is your next topic and why are you choosing to write on it?
James: Thank you. It’s been a lot of fun, and most of the opportunities I’ve had to write have come from the people I work with or who I’ve met within those conferences and associations I mentioned earlier. I write either for the opportunity to write with people I respect and/or because I’m excited to learn from the topic. The best opportunities, of course, are those that involve both.
Most immediately, there is a cross-border Bring Your Own Device (“BYOD”) paper that will be published any day now in Bloomberg BNA that I co-authored with two other BakerHostetler attorneys. I am also one of five authors (including a law student) who contributed to an already-submitted second Merger & Acquisition Due Diligence paper that Richmond’s Journal of Law and Technology (“JOLT”) will publish this spring as a follow-up to our prior 2015 M&A article in JOLT. And I was one of three BakerHostetler attorneys contributing to an ABA book chapter on discovery-related search standards that was submitted earlier this year.
Finally, I’m currently writing another JOLT paper with two colleagues on the ethics of data privacy for JOLT’s February conference, and another drafting team I work with is submitting a paper for peer-review to the ECCWS conference in Berlin this summer on C-Suite recognition of insider threats.
Tom: As a member of The Sedona Conference Working Group Series 1, 6 & 11, you have contributed to their thought-leadership articles as well. What are the common threads that run between Information Management, e-Discovery, and Privacy/Security?
James: Within the Sedona Conference, and as a supporting member certainly speaking only for myself, I’ve noted an interesting practice when it comes to drafting the advice the Sedona Conference provides. That is this: the drafters work to define, examine, and explain the specifics of the issue, whether it is a specific technology or protocol or point of law. But much of that research and discussion—and it is very active discussion—doesn’t make it into the final piece. That is because the Sedona Conference wants its work-product to last. Its members are aiming for principles that won’t become outdated as soon as the guidance makes it out of committee(s). And that’s a great process to understand when you’re relying on the principles as a practitioner: that the people who drafted them weren’t just thinking about “principles” when drafting. Instead, they were applying true, open-ended, inductive reasoning. They start with the specifics of the real world, and divine and present their principles from a background of true understanding.
Tom: The article “M&A Due Diligence…,” which you co-authored, suggests that the synergies of a framework encompassing Data Privacy, Information Security, E-Discovery, and Information Governance are necessary to a successful M&A deal, an event meant to find synergies between companies. For those who can only muster the focus to read a blog post as opposed to a 76-page paper, would you mind recounting the highlights?
James: While I still encourage interested readers to read both the first paper and the soon-to-be-published second paper, I think the overall point of the paper is that the time has come to include E-Discovery, Information Governance, Data Privacy, and Information Security in M&A due diligence. Which is not that big a stretch; I and my colleagues are certainly involved in those types of due diligence projects on a more-or-less consistent basis. But the article suggests that it’s not enough to think about these issues when they are impossible to miss. Instead, even if these issues are ultimately unnecessary for a given deal, there is still value in their consideration. Secondarily, once participants have determined that one (or all) of these issues should be part of the process, we present a framework for that type of consideration—hopefully providing value for people who are relatively new to this kind of work or these kinds of issues.
Posted by Rational Enterprise | Thu, Nov 19, 2015 | Comments (0)
Tom Preece, Rational’s Business Development Manager, recently spoke with Ted Augustinos, Partner at Locke Lord LLP, about his thoughts on how organizations should best address emerging trends in cyber and information security. Ted is a member of Locke Lord’s Privacy and Cybersecurity Practice Group, which assists clients in developing and enforcing data privacy and security policies to ensure compliance with the standards and practices of the industries and legal frameworks in which they operate. The group also provides legal evaluation, guides forensic evaluations, prepares data breach response plans, oversees remediation, and helps respond to inquiries from governmental agencies.
Tom Preece: It seems that despite a growing number of state, federal, and international laws dealing with cyber security breaches and privacy protection, legislation regularly lags behind technological advancement. How is this gap best addressed?
Ted Augustinos: Laws and regulations always lag the market, and that’s clearly the case in privacy and cyber security, where both the threats and solutions change very rapidly. That’s why the best approach is to view the issue of privacy and cyber security for what it is – an enterprise-wide risk management issue. Legal and regulatory compliance is certainly a part of the response, but if this is viewed as a legal problem, the real challenge will not be met. Similarly, it’s not just an IT problem, and IT professionals can’t solve it. The best approach addresses privacy and security holistically across the enterprise by incorporating legal and compliance resources, IT solutions, business leaders, HR professionals, and marketing and PR talent.
Tom: The private sector seems to be responding to consumers demanding increased privacy, with companies like Apple making privacy protection a selling point. Will data privacy be a competitive advantage for companies?
Ted: Increasingly, we see that companies are looking for ways to trumpet their privacy and cyber security profile as a competitive advantage. Of course, this comes with risks, and woe to the company that says it’s better than its peers at this, but isn’t.
Tom: There has been an increased trend towards cloud computing in recent years, including a “cloud first” policy among many federal government departments. Are the advantages of moving to the cloud worth the purported risks, and what are the most important data privacy and security considerations a company should address before doing so?
Ted: The cloud solution is one of many options, and needs to be viewed as such. In considering a cloud solution, the question has to be asked, “compared to what?” Also, not all cloud solutions are created equal. Any assessment must include a thoughtful assessment of objectives, risks, and costs, all in the context of other available options, and all of that looks different for different companies.
Tom: There seems to be a large disparity in the information security maturation level of companies today, even amongst the Fortune 500. Without naming names, can you describe the worst or best information security program you have seen at large corporations?
Ted: The best of the best continually monitor the threat landscape, and reprioritize perceived threats. They follow developments in applicable laws and regulations, and in technology solutions, and update their policies, procedures, and technology to reflect these developments. Their business leaders work with privacy and security professionals in designing and implementing new products, services and initiatives, so that the related privacy and security issues are considered and addressed at the outset. Their governing boards are fully engaged and updated on these issues. The best conduct regular training of all personnel to avoid mistakes, and to identify and escalate incidents that may be of concern. They also conduct simulation exercises with their incident response teams, and use those experiences to improve.
It’s a rare company that hasn’t done any of these things, but the worst will face harsh consequences for not taking reasonable steps toward addressing their legal, regulatory and contractual obligations to protect the data with which they are entrusted, and toward mitigating these risks.
Tom: Many companies require the devastating impact of a security breach, regulatory investigation, or litigation to overcome the inertia of developing a modern information security or privacy protection program. For growing companies that have not yet felt this jolt, is there anything you might say to shock them into action?
Ted: No – if the daily news and the stories of their peers haven’t made an impression, there’s not much anyone can say! Seriously, to overcome the causes of inertia (typically, lack of budget, bandwidth, or a sense for where to start) I would suggest that they work with someone who can help identify and rank threats and vulnerabilities, and set a reasonable list of priorities. Then establish a reasonable budget that won’t bankrupt the company, and start working down the list. There are important things that can be done to improve the company’s risk profile with literally no budget, and there are others where a modest investment will make a significant improvement. There is no silver bullet, but focusing on these issues and establishing a culture of compliance and risk mitigation will better position the company to survive in the current and developing environment.
Locke Lord is a full-service, international law firm of 23 offices designed to meet clients’ needs around the world. With a combined history of more than 125 years and a wide domestic and global footprint, Locke Lord is a worldwide leader in the middle market sector. Locke Lord advises clients across a broad spectrum of industries including energy, insurance and reinsurance, private equity, telecommunications, technology, real estate, financial services, health care and life sciences, while providing a wealth of experience through its complex litigation, regulatory, intellectual property and fund formation teams. To learn more about Locke Lord, visit http://www.lockelord.com.