When I was a tax lawyer, I often spent hours in a room with ten people discussing how to interpret a single phrase. "What if the comma were here and not there?" People call it nitpicking. Lawyers don't. For lawyers, precision is of great importance.
Generative AI is undoubtedly the most powerful technology to reach the legal sector since the introduction of digital tools. But, as with any innovation, there is a challenge. For the legal sector specifically, that challenge is precision.
The accuracy of AI models is directly linked to trust. After all, at the end of the legal production line there is a client who acts on the advice his lawyer has given him.
Recently at Lexpo, it was reiterated that lawyers are "people who want to know exactly how things stand, before they will trust a solution enough to adopt it." As a former lawyer, I recognise and understand this all too well, so below I will explain as precisely as possible what we are actually talking about when we discuss the "precision" of an AI model. What exactly do we measure?
Recall and Precision
Recall measures the ability of an AI model to identify all relevant cases within a dataset. For example, suppose you have a dataset with 25 contracts and 25 emails. If the AI model has placed all 25 contracts in the "contracts" bucket, the recall is 100% (25/25).
Precision measures the accuracy of the positive predictions made by the AI model. In the same dataset, if the "contracts" bucket contains 40 documents, of which 25 are contracts and 15 are emails, then the precision is 62.5% (25/40).
Improving recall can cause precision to drop, because the AI model casts a wider net to capture all positive cases and generates more false positives along the way. For example, the broader the criteria for classifying a document as an "email," the more documents will end up in the "email" bucket. All emails will be in the "email" bucket, but the risk of non-email documents ending up there too increases. The art is to find a balance that suits the specific needs of the application.
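To make both metrics concrete, here is a minimal sketch in Python that reproduces the contracts example above. The toy dataset and labels are illustrative assumptions; only the formulas follow from the definitions: recall is true positives divided by all actual positives, precision is true positives divided by all predicted positives.

```python
def recall_and_precision(predicted, actual, label):
    """Compute recall and precision for one class ("bucket")."""
    true_positives = sum(1 for p, a in zip(predicted, actual)
                         if p == label and a == label)
    predicted_positives = sum(1 for p in predicted if p == label)
    actual_positives = sum(1 for a in actual if a == label)
    return true_positives / actual_positives, true_positives / predicted_positives

# Hypothetical dataset: 25 contracts and 25 emails (true labels), and a
# model that puts 40 documents in the "contract" bucket: all 25 contracts
# plus 15 emails.
actual = ["contract"] * 25 + ["email"] * 25
predicted = ["contract"] * 40 + ["email"] * 10

recall, precision = recall_and_precision(predicted, actual, "contract")
print(f"recall = {recall:.0%}")        # 100%: every contract was found
print(f"precision = {precision:.1%}")  # 62.5%: 25 of 40 predictions correct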
Hallucination
Hallucination refers to situations where an AI model generates information that is not supported by its training data or by reality: the model presents incorrect or entirely fabricated statements as if they were factual.
There are several reasons why hallucinations can occur:
- Insufficient training: the AI model has not been trained on enough data, or on sufficiently varied data;
- Bias in the training data: the data used to train the model contains biases or inaccuracies; and
- Complexity of language: language can be ambiguous, and AI models can struggle to correctly interpret subtle nuances and context.
Hallucination is not something that "just exists." To reduce hallucinations in AI models, various improvements can be applied, both in the AI models themselves and in the software that uses them:
- Improved training protocols: using larger, more representative, and reliably annotated datasets can help reduce hallucination;
- More robust model architectures: developing models that better handle uncertainty and variability in language;
- Post-processing techniques: applying techniques after the initial output generation to correct errors before the user sees the information (see the sketch after this list); and
- User feedback: incorporating user feedback in the training process can help fine-tune and correct the models where necessary.
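As an illustration of the post-processing idea above, here is a minimal, hypothetical sketch of a grounding check: generated quotations that do not literally appear in the source document are flagged before the user sees them. The function names and the simple substring test are my own assumptions for illustration, not a description of any particular product; real systems use far more sophisticated verification.

```python
import re

def extract_quotes(answer: str) -> list[str]:
    """Pull out passages the model presents as literal quotations."""
    return re.findall(r'"([^"]+)"', answer)

def grounding_check(answer: str, source_text: str) -> list[str]:
    """Return quotations from the answer that cannot be found in the source."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    source = norm(source_text)
    return [q for q in extract_quotes(answer) if norm(q) not in source]

# Usage: a fabricated quotation is flagged before it reaches the user.
source = 'The lessee shall pay the rent "no later than the first day of each month".'
answer = 'The contract states the rent is due "within thirty days of invoicing".'
for quote in grounding_check(answer, source):
    print(f"Unsupported quotation, withhold or verify: {quote!r}")
```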
Consistency
Consistency refers to the ability of AI to produce stable outcomes that align with the context or with previous outputs. As the scope of an application grows, it becomes more challenging to ensure consistency everywhere. Take, for example, a summary of a 100-page document. If you ask an AI model to summarize the document twice, you will not get two identical summaries. Both summaries can still be correct; ideally, though, they contain as much of the same "correct" important information as possible.
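One simple way to make consistency measurable is to check how much two summaries of the same document overlap. The sketch below uses a crude Jaccard similarity over words; this metric is my own illustrative choice, not a standard the article prescribes, and production systems would compare meaning rather than exact wording.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Overlap between two texts: |intersection| / |union| of their word sets."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)

# Two runs of the same summarisation prompt (hypothetical outputs):
summary_1 = "The agreement runs for five years and may be terminated annually."
summary_2 = "The agreement has a five year term and can be terminated each year."

score = jaccard_similarity(summary_1, summary_2)
print(f"consistency score: {score:.2f}")  # 1.0 = identical wording, 0.0 = nothing shared
```

A crude word-overlap score like this only captures shared wording, but even that is enough to track whether successive runs of the same prompt are drifting apart.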
Examples of strategies to improve consistency in language models themselves and their applications:
- Context management: implementing techniques that help AI models retain a better understanding and memory of the context over multiple interactions; and
- Model architecture: using model architectures that can capture long-range dependencies within the data.
Hockey Stick
Hopefully, after reading this article, you understand exactly how the precision of AI models is measured. The good news for users is that, in a competitive market, thousands of smart developers are working daily to improve AI models and their applications, including their precision, in the ways described above. The "learning curve," as lawyers so aptly call it, is incredibly steep. With this knowledge in mind, you might dare to outsource a first simple task to AI.