Despite the setback from Steven Schwartz’s botched use of a ChatGPT-crafted legal document filled with fake references, we’re still intrigued by how well large language models (LLMs) like ChatGPT can process and produce text. Will these AI powerhouses one day conjure personalized legal motions, briefs, or contracts out of nothing? Could they possibly outperform traditional document automation systems with their sometimes rigid templates and pre-set questions? I see at least 5 ways that LLMs will speed up and replace some aspects of document automation.

Document automation is usually thought of as a more advanced form of filling in the blanks. It takes templatized documents and adds an interactive question and answer layer. A lot of lawyer and engineering time can go into building document automation systems for any given document template, which is why large language models are an attractive alternative.

However, large language models, the advanced text prediction engines that have recently captured the public imagination, have a fundamental limit. GPT-4 can’t be counted on to produce the same results each time, and it is likely to “hallucinate,” or invent details, when tasked with generating large amounts of text. These dangers can be reduced by providing very specific source material, but they are inherent to the design of large language models. It’s a question of entropy: when the model is given less information than the text it is asked to generate must contain, it has to invent the difference, and the generated text cannot be relied on.

As Attorney Schwartz found out, there’s also a serious ethical concern with relying on generated content without the ability to supervise and edit it. What I have found works well is to flip the roles: instead of editing and supervising LLM-generated text, use the LLM as an editor and interrogator to make sure you include all of the information that you need.

Here are 5 ways that generative AI can safely help in the world of document automation:

  1. Eliminating unneeded questions in interview-style document automation programs by extracting the details from existing sources, just like lawyers refer to intake notes and primary materials when starting the drafting of a document.
  2. Identifying the missing information in a narrative and asking the follow-up questions that it takes to make a compelling argument.
  3. Drafting rules that can be verified and improved by human authors and then reused again and again.
  4. Drafting templates by adding variable markup to terms in sample documents.
  5. Constrained drafting of guided interviews from those templates.

These tasks put large language models in a supportive role, safe from fake citations and other hallucinations. Let’s consider the tasks one at a time.

Extracting and summarizing primary materials and notes

A frustrating reality of document automation tools is that they can turn the user into a data entry machine. But often the information needed to complete the template can be found in some other source, whether narrative intake notes, primary sources like leases, or a case management system.

I’ve found that GPT-4 can do a great job at what is sometimes called “entity extraction.” You can turn GPT-4 loose on those unstructured intake notes and ask it if the notes or primary sources contain answers to any of the questions needed to complete the template. If so, making those answers the default values can save the end user a lot of time.

When using an LLM in this way, it helps to give the model the list of information that you want to extract in an explicit format, like JSON. For example:

Below is a JSON data structure that shows a list of fields and a brief description of the meaning of each field. It is followed by a narrative from a user. Respond with a JSON data structure that lists a value extracted for each field that has an answer in the narrative, setting the value of any question that isn’t answered in the narrative to null. If you are not confident, leave the answer as null.

{"user_name": "The user's name", "op_name": "Name of the opposing party", "jurisdiction": "state, city or county where the user lives"}

“Joseph Conrad / Heart of Darkness LLC is trying to evict our client Jane Smith from her home in Maryland. I don’t think he should be able to evict me because he hasn’t made any repairs to Jane’s home in the last 10 years. She has a leaky faucet and a problem with bed bugs.”


And here is a sample response:

Based on the narrative given by the user, the appropriate JSON data structure would be:

```
{
    "user_name": "Jane Smith",
    "op_name": "Joseph Conrad / Heart of Darkness LLC",
    "jurisdiction": "Maryland"
}
```
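In a pipeline, the model’s JSON reply can be parsed and the non-null values used as interview defaults. Here is a minimal sketch of that step (the field names mirror the example above; the LLM call itself is omitted, and the code assumes the reply is bare JSON):

```python
import json

# Hypothetical field list mirroring the prompt above; in production this
# would be the same list of fields sent to the model.
FIELDS = ["user_name", "op_name", "jurisdiction"]

def merge_llm_defaults(llm_reply: str) -> dict:
    """Parse the model's JSON reply, keeping only non-null answers as
    interview defaults; anything null still gets asked the normal way."""
    extracted = json.loads(llm_reply)
    return {f: extracted[f] for f in FIELDS if extracted.get(f) is not None}

# The reply from the sample exchange above:
reply = ('{"user_name": "Jane Smith", '
         '"op_name": "Joseph Conrad / Heart of Darkness LLC", '
         '"jurisdiction": "Maryland"}')
defaults = merge_llm_defaults(reply)
```

Anything the model left null simply falls back to the normal interview question, so a wrong guess costs the user nothing more than correcting a pre-filled answer.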

If an answer to a question requires the summary of a source document, GPT-4 can help here, too. Ask it a question like “summarize the text of this lease and include the most important provisions related to this alleged breach” and it can create a good first draft.

Asking follow-up questions

Sometimes document templates contain sections where the user of the program needs to add a narrative or summary. Document automators may vacillate between asking a very rigid series of questions to construct these narratives and leaving a large open text area for the user of the tool to complete.

The problem with the structured question approach is that it is both rigid and very time consuming to create. There’s also a real risk of asking redundant questions or the user providing details in the wrong place. 

The problem with the open text area approach is that it requires the user of the tool to know what to say and how to say it. The open text area approach risks producing a final document that doesn’t include all of the details that it needs to.

LLMs allow the author of a document automation tool to chart a middle path. To do this, the document automation author should come up with a list of important details that the narrative needs to contain. The user can be asked a broad open-ended question, and the first draft is fed into the LLM with an instruction to ask follow-up questions until all of the important details have been provided.
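This loop can be sketched as follows. Here `ask_llm` is a stand-in for whatever chat-completion call you use, and the checklist of required details is hypothetical:

```python
import json

# Hypothetical checklist of details the narrative must contain.
REQUIRED_DETAILS = ["notification_to_landlord", "condition_duration"]

def ask_llm(narrative: str, missing: list) -> str:
    """Stand-in for a real chat-completion call. A real implementation would
    send the narrative plus the missing fields and get back JSON questions."""
    return json.dumps(
        {f: "Can you tell me about the " + f.replace("_", " ") + "?" for f in missing}
    )

def followup_questions(narrative: str, answers: dict) -> dict:
    """Ask for follow-up questions covering any detail not yet answered;
    an empty dict means the narrative is complete and the loop can stop."""
    missing = [f for f in REQUIRED_DETAILS if not answers.get(f)]
    return json.loads(ask_llm(narrative, missing)) if missing else {}

answers = {"notification_to_landlord": "Yes, by letter in March"}
questions = followup_questions("My landlord will not fix the leak.", answers)
```

The interview keeps looping until `followup_questions` comes back empty, which is the signal that every required detail has been captured.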

Let’s extend the earlier example to demonstrate this question and answer format:

What follow-up questions would be needed to determine whether this person has a claim for violation of the warranty of habitability in Maryland? Provide the list of variable names and questions in the same JSON format.

In order to determine whether there’s a potential claim for violation of the warranty of habitability in Maryland, we would need more details. These details might include the following fields, with associated questions:

```
{
    "notification_to_landlord": "Did Jane notify the landlord about the issues in her home and request repairs?",
    "response_from_landlord": "What was the landlord's response to Jane's request for repairs?",
    "condition_duration": "How long have these conditions (leaky faucet and bed bugs) persisted?",
    "living_conditions": "Are the living conditions unhealthy or unsafe due to these issues?",
    "rent_payment_status": "Has Jane continued to pay rent during this period?"
}
```

A document automation system could take the LLM’s output directly, ask the suggested follow-up questions, and repeat the process: extracting the responses and then either generating a draft narrative or checking additional boxes in the output document.

The human can review the responses and rewrite in their own voice as needed.

Drafting rules for human review

The kind of document automation that I specialize in, documents used in litigation, frequently revolves around rules: statutes and case law that tell us whether a claim or a defense can be included in an answer, complaint, or brief. But rules appear throughout all kinds of document automation projects; for example, to turn optional clauses in the generated document on or off. It’s important for these rules to be right.

Drafting these rules in a way that computers can consume is hard for most humans, but the knowledge required to generate them often lives in the heads of non-coder attorneys. My friend Jason Morris, a rules as code advocate, created a graphical tool, Blawx, that attempts to empower attorneys in this task, but it remains complex. Rules as code scholars also use another language, ErgoAI (formerly named Flora-2).

ErgoAI rule representations look a bit like this:

```
@!{violates_ROW}
?I[violatesRightOfWay -> ?Q] :-
    ?S : DrivingSituation[participant -> {?P,?Q}:Entity],
    ?I : IllegalDrive[agent -> ?P, follows -> ?S],
    ?P != ?Q.
```

These representations have many benefits to the automation of law and could have a robust place in document automation. But they are hard to author.

One interesting way to use LLMs is to write plain language versions of these rules and ask the model to convert them into the “rules as code” language. GPT-4 does a good job at this. It can also generate test cases. These test cases, the code, and graphical representations of the rules can all be reviewed carefully by humans. Once they have been created and validated, they can be used again and again. Finally, the LLM can be used to generate a plain language explanation of the rule to double-check the author’s understanding. This explanation will not be foolproof, but it is a useful check.
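To make the review workflow concrete, here is a toy translation of the right-of-way rule above into ordinary Python, paired with the kind of test cases an LLM could draft and a human could verify. This is an illustration, not ErgoAI, and not a statement of any actual traffic law:

```python
def violates_right_of_way(participants, illegal_driver):
    """An illegal drive by one participant violates the right of way of
    every other participant in the same driving situation, mirroring the
    shape of the ErgoAI rule shown earlier."""
    return [p for p in participants if p != illegal_driver]

# LLM-generated test cases, checked by a human before the rule is trusted:
assert violates_right_of_way(["Alice", "Bob"], "Alice") == ["Bob"]
assert violates_right_of_way(["Alice"], "Alice") == []
```

Once the tests pass and a human has signed off on both the code and the test cases, the rule becomes a reusable, verified building block.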

Once the rule has been generated, it can be used in a modular way in the document automation tool itself.

While rules as code authors are enthusiastic and the promise is undeniable, it’s only been applied to a small amount of legislation. LLMs open up the ability to draft these rules to a much wider audience.

Drafting templates for human review

LLMs’ ability to summarize and extract information can help with converting sample precedent documents into generalized templates. You can provide the LLM with the document and ask it to convert anything that should become a variable into the proper syntax.

Sample prompt: 

Below is the text of a contract. Assume I want to use this contract between new parties. Use the Jinja2 syntax to replace anything that should be a variable with an appropriate variable name in snake_case. Use Jinja2 syntax to mark anything that might be an optional clause with the appropriate syntax (jinja2 condition).

I tried this prompt on a real contract that I had entered into, and got back this plausibly formatted document:

INDEPENDENT CONTRACTOR AGREEMENT

This INDEPENDENT CONTRACTOR AGREEMENT (“Agreement”) is entered into as of this {{ day }} day of {{ month }}, {{ year }} (“Effective Date”) by and between {{ client_name }}, a {{ client_state }} not-for-profit corporation (“Client”) and {{ contractor_name }} d.b.a. {{ contractor_business_name }} (“Contractor”).

RECITALS

WHEREAS, Client is a non-profit organization with a mission to {{ client_mission }};

WHEREAS, Contractor is engaged in the business of providing {{ contractor_service }};

TERMS

  1. Delivery and Payment of Services. Client will engage Contractor to provide the services and deliverables (collectively, “Services”) described on the statement of work attached hereto as Schedule A and made a part hereof, or any supplemental Services agreed upon by the parties in writing in a supplemental statement of work (collectively referred to as “SOW”). Client and Contractor have agreed to a project budget and payment terms as reflected in Contractor’s estimated project budget provided to Client. Upon the written request of Client, Contractor shall provide project plans, progress reports, and appropriate documentation.

{% if optional_clause_2 %} 2. Term. …. {% endif %}

This template is unlikely to be usable at first. But the process of marking up a template is very time consuming. Using the LLM for a first draft of the template may save a lot of developer time.

Results might be improved by also priming the LLM with a list of suggested variable names for the most common fields, so that naming stays consistent across multiple templates.
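One cheap first check on the drafted markup is to render it with real Jinja2 and see whether it even compiles. The sketch below uses a short fragment of the template above with made-up test values (it assumes the third-party Jinja2 library is installed):

```python
from jinja2 import Environment, StrictUndefined  # third-party: pip install jinja2

# A fragment of the LLM-drafted template above, rendered with test values.
# StrictUndefined makes rendering fail loudly if the draft references a
# variable we forgot to supply, a cheap smoke test on the generated markup.
template_text = (
    "This Agreement is entered into by {{ client_name }}, a "
    "{{ client_state }} not-for-profit corporation, and {{ contractor_name }}."
)
env = Environment(undefined=StrictUndefined)
rendered = env.from_string(template_text).render(
    client_name="Example Aid Society",
    client_state="Massachusetts",
    contractor_name="Jane Smith",
)
```

A template that renders cleanly with sample values is not necessarily correct, but one that raises an `UndefinedError` or a syntax error definitely needs a human pass.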

Drafting document automation workflows

If your document automation system uses input files that are in plain text, the LLM can be used to generate some of the interview code, too. 

For example, your prompt may be as simple as “Create a Docassemble interview that asks questions to fill in these fields.”

The main challenge to this approach is that existing large language models might need fine-tuning to produce syntactically correct code in document automation platforms, which have limited public examples online. An approach that works around this limitation is to ask the LLM to produce a known intermediate language, such as JSON, that can be converted into the proper format in an automated way, or to describe the appropriate format with more detail.
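Here is one sketch of that intermediate step: the LLM emits the field list as JSON, and a deterministic transform turns it into Docassemble-style question blocks, so the model never has a chance to introduce syntax errors. The YAML shape below is simplified; real interviews need more blocks:

```python
import json

def fields_to_docassemble(llm_json: str) -> str:
    """Convert an LLM-produced {variable: question} JSON object into
    simplified Docassemble-style question blocks. Because the conversion
    is a plain deterministic transform, the model cannot introduce
    syntax errors into the generated interview."""
    fields = json.loads(llm_json)
    blocks = []
    for var, question in fields.items():
        blocks.append(
            "---\n"
            "question: |\n"
            f"  {question}\n"
            "fields:\n"
            f"  - no label: {var}"
        )
    return "\n".join(blocks)

interview = fields_to_docassemble('{"user_name": "What is your name?"}')
```

The same JSON could just as easily be compiled to another platform’s input format; only the transform function changes.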

A tool that I built at the LIT Lab, named the Weaver, already does a good job of producing a guided interview from a template that is appropriately marked up. Adding an LLM into this pipeline is something we are actively evaluating. The Weaver can continue to generate readable, syntactically correct code, and we may be able to use an LLM to make the choices that previously required a human, such as naming fields and assigning them to screens.

Other things to think about

Document automation fills an important need that is unlikely to be entirely replaced by LLMs. But as AI models continue to evolve, we can expect more sophisticated document automation solutions that could fundamentally change how businesses operate. We might also see the price of document automation drop, and more areas automated, as LLMs reduce the effort involved in the most time consuming steps. We must, however, remain aware of the potential ethical implications and ensure that adequate safeguards are in place, so that we use LLMs where they are safest.

My consulting company, Lemma Legal, is ready to help integrate AI responsibly into your document automation projects. Reach out to us if you would like to learn more.