Legal Tech is not just about Robots and AI
One topic that often comes up when speaking to clients is what technology we use to deliver work as efficiently as possible. Inevitably, the phrase ‘Artificial Intelligence’ is mentioned, followed by a question on how Clifford Chance uses AI. Do we really use AI? More importantly, has it been useful?
The answers to the above are yes, and yes, but the crux lies in how. AI is extremely helpful in reducing the time taken to review and extract data from a large document set, but using an AI tool alone is not always the answer. Other types of technology, such as Regular Expressions (“RegEx”), can also be very effective in reviewing and extracting data.
Should you use a RegEx tool instead of AI?
A RegEx is a sequence of characters that specifies a search pattern (think of wildcard searches on steroids), which provides a flexible means of matching and identifying text. This works well when you know how a certain phrase is structured. For example, to identify the maturity date in a document, you could write the RegEx rule “(will).{1,20}(mature|expire) on”. This rule tells the tool to look for “will” followed, within 1 to 20 characters, by either “mature” or “expire” and then “on”. A further RegEx rule can then tell the machine to extract any digit or month that follows the identified phrase.
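As a minimal sketch of how the rule above could be applied, the Python snippet below runs the quoted pattern and then a second rule that picks up the date following the phrase. The follow-up date rule, the function name `find_maturity_date` and the assumed "day month year" date format are illustrative assumptions, not the rules used in practice.

```python
import re

# The rule quoted in the text: "will", then 1 to 20 characters,
# then "mature" or "expire", then " on".
PHRASE_RULE = re.compile(r"(will).{1,20}(mature|expire) on")

# Hypothetical follow-up rule (an assumption for illustration):
# a date written as "30 June 2025" directly after the phrase.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_RULE = re.compile(r"\s+(\d{1,2})\s+(" + MONTHS + r")\s+(\d{4})")


def find_maturity_date(text):
    """Return the date that follows the maturity phrase, or None."""
    phrase = PHRASE_RULE.search(text)
    if phrase is None:
        return None
    # Only look for a date immediately after the matched phrase.
    date = DATE_RULE.match(text, phrase.end())
    if date is None:
        return None
    return " ".join(date.groups())


print(find_maturity_date("The Loan will mature on 30 June 2025."))
print(find_maturity_date("The parties will meet on 30 June 2025."))
```

The first call finds the phrase and returns the date that follows it; the second returns `None` because neither "mature" nor "expire" appears after "will". Real documents write dates in many formats, so the date rule would need to be extended accordingly.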
A well-written RegEx can be more accurate than relying on machine learning models, as it allows you to specify exactly what to search for, and it delivers results at a lower cost. This allows us to ‘codify’ our lawyers’ expertise and knowledge of the possible terms one should look for, which can then be run on multiple documents. RegEx is very useful for extracting short and well-defined data points, such as the date of agreement, currencies, interest rates, definitions, etc.
RegEx rules can also be quickly adjusted to improve accuracy and accommodate new patterns, based on your review of the results. While we do use AI review tools, they are not always the most effective solution for the speed and agility our clients need. Supervised machine learning requires a domain expert to tag correct examples as training data (often in significant volumes), from which the machine learning algorithms build a predictive model that adapts as more training data is added. This also means that the accuracy of machine learning models can never be 100%, since it is highly contingent on the quality and volume of the training data.
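To illustrate how quickly a RegEx rule can be adjusted after reviewing results, the sketch below compares the maturity-date rule from earlier with a revised version. The revision (adding “shall” and “terminate”) and the sample sentences are hypothetical, chosen only to show the workflow.

```python
import re

# The original rule from the text.
ORIGINAL_RULE = re.compile(r"(will).{1,20}(mature|expire) on")

# Hypothetical revision: suppose a review of the results showed that some
# documents say "shall" instead of "will", or "terminate" instead of "mature".
REVISED_RULE = re.compile(r"(will|shall).{1,20}(mature|expire|terminate) on")

# Made-up sample sentences for illustration.
SAMPLES = [
    "The Notes will mature on 1 March 2030.",
    "This Agreement shall terminate on 1 March 2030.",
    "The Borrower shall repay the Loan on demand.",
]


def hits(rule, samples):
    """Return the samples in which the rule finds a match."""
    return [s for s in samples if rule.search(s)]


print(len(hits(ORIGINAL_RULE, SAMPLES)), "match(es) with the original rule")
print(len(hits(REVISED_RULE, SAMPLES)), "match(es) with the revised rule")
```

The change is a one-line edit to the pattern, with no retraining involved, and the third sample is still correctly ignored by both rules.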
This is not to say that you should always use a RegEx tool instead of AI. RegEx rules excel in extracting data points but not clauses and are of limited use where there is a high variance in drafting. A well-trained AI model can identify clauses or paragraphs even if there are variations in the way that they are drafted or without the presence of keywords or patterns.
How we use a combination of the tools
In fact, in many cases it makes sense to use a combination of both tools to optimise the accuracy of our searches. We have used Machine Learning and RegEx to augment each other in the work that we do for clients to understand their IBOR exposure.
In one matter, we were given a pool of 5000 documents consisting of a mix of contracts and non-contracts – and were required to identify the client’s potential IBOR exposure in these documents. Machine Learning was used to bring the documents to a level of predictability and structure to optimise the use of RegEx.
- First, unsupervised machine learning, where the machine tries to make sense of the basic structure of data in documents without training data, was used to cluster and classify documents (e.g. by key words and proximity).
- Next, pre-trained AI models were used to automatically identify documents by type, such as whether they were a contract, and eliminate those that were not.
- At the due diligence stage, a mix of AI and RegEx was used for extraction depending on where the technology would excel.
  - For legal concepts such as governing law, entity recognition, and interest rate clauses, highly accurate AI models could be used to identify and extract clauses more contextually, given the abundance of training data available and the work that has been put into these models to date.
  - For more uncommon concepts and data points such as IBOR exposure, RegEx rules were written with the expertise of our Clifford Chance lawyers to identify certain patterns in drafting (e.g. type of IBOR used, typical fallback mechanisms), allowing us to accurately extract the data points.
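As a rough illustration of the last step, a RegEx rule for spotting the type of IBOR used might look like the sketch below. The list of benchmark names is a short, non-exhaustive assumption and the function name `ibor_mentions` is hypothetical. The rules written for the matter itself are not given here and would also cover drafting patterns such as fallback mechanisms.

```python
import re
from collections import Counter

# Hypothetical rule: a short, non-exhaustive list of IBOR benchmark names.
# The word boundaries (\b) stop the rule matching inside longer words.
IBOR_RULE = re.compile(r"\b(LIBOR|EURIBOR|TIBOR|HIBOR|SIBOR)\b")


def ibor_mentions(text):
    """Count how often each listed benchmark is named in the text."""
    return Counter(IBOR_RULE.findall(text))


# Made-up sample clause for illustration.
clause = (
    "Interest accrues at LIBOR plus 2 per cent. "
    "If LIBOR is unavailable, EURIBOR shall apply."
)
print(dict(ibor_mentions(clause)))
```

A count per document gives a quick indication of which benchmarks a contract relies on, which the review team can then check against the clause itself.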
Combining both types of technology to fill in the gaps ensured a higher level of confidence in the output, significantly reducing the number of documents reviewed and the time spent on the review by our low-cost delivery centre. The matter team was also able to focus solely on analysing risk for the client based on the output.
When considering how technology can be used to increase efficiencies, it is important to understand what data in the document is found and extracted, and how. A single tool or technology may not always be best suited to the matter; instead, a mix of technologies should be used to optimise results. Therefore, when our clients ask us to help, we bring in our Legal Technology Advisors, who understand which tool is best to use in each circumstance.