
Finetuning an LLM to Automate Due Diligence


Due diligence (DD) is the investigation or exercise of care that a reasonable business or person is normally expected to take before entering into an agreement or committing to a transaction. It is widely used and often required for a variety of business practices, such as mergers and acquisitions, equity research, government project assessment and approval, and compliance investigations. DD tasks are labour-intensive, costly and often monotonous, requiring professionals to search, review and summarise large volumes of documents such as contracts, reports and policies.

Large Language Models (LLMs) bring efficiency, scalability, consistency and cost-effectiveness to manual tasks that require intensive searching, reading and writing of textual data, as demonstrated by ChatGPT. They hold the promise of saving professionals tremendous reading and writing effort while maintaining reliability and accuracy.

Here we present an internal case study in which we train an LLM to automate multiple due diligence tasks. Specifically, we finetune an open-source LLM, Llama-2, to accomplish six common due diligence tasks:

  • classification,

  • Q&A,

  • Q&A with explanation,

  • text generation,

  • summarisation and

  • auditing.
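Each of these tasks can be cast as an instruction prompt paired with a document excerpt. The sketch below shows one way such training examples might be formatted; the task instructions and the Llama-2-style `[INST]` template here are illustrative assumptions, not the actual prompts used in the study.

```python
# Hypothetical instruction-prompt builder for the six DD tasks.
# The instruction wording and template are illustrative, not the study's own.

DD_TASKS = {
    "classification": "Classify the following clause by risk category.",
    "qa": "Answer the question using only the document excerpt.",
    "qa_explained": "Answer the question and explain your reasoning.",
    "generation": "Draft a due diligence note for the excerpt below.",
    "summarisation": "Summarise the key obligations in the excerpt.",
    "auditing": "Flag any clauses that deviate from the policy below.",
}

def build_prompt(task: str, document: str, question: str = "") -> str:
    """Format one training example in a Llama-2-style instruction layout."""
    instruction = DD_TASKS[task]
    user_block = f"{instruction}\n\n{document}"
    if question:
        user_block += f"\n\nQuestion: {question}"
    return f"[INST] {user_block} [/INST]"

example = build_prompt("qa", "The lease term is 10 years.",
                       "How long is the lease?")
```

Keeping all six tasks in one shared template is what lets a single finetuned model switch between them based on the instruction alone.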

The objectives of our study are to:

  • study the feasibility of finetuning a single LLM to automate multiple DD tasks

  • establish a repeatable process for building business-specific LLM applications

The key findings of our case study are:

  1. Through the instruction finetuning approach, even with a small training dataset, the LLM does learn to perform multiple custom tasks. Our methodology of finetuning a large LLM is practical and effective, although it relies on powerful GPU servers and requires overcoming a number of technical hurdles.

  2. The finetuned model performs some DD tasks quite well, and generates answers in a consistent format and style commensurate with our instruction prompts. When building business applications, apart from accuracy, another important driver is risk management. Our methodology showcases how to control an LLM's output format and language through the finetuning process.

  3. The finetuned model does not always perform better than the base model. The quality of curated instruction prompts is the key to building successful custom LLM applications, and has more impact on model performance than the size of the training dataset. This is consistent with the findings of various academic and empirical studies.

If you are interested in learning more about our case study, you can download the full report below. It contains a detailed description of our approach and sample outputs of the DD tasks.

Download PDF • 386KB


