So How Good is ChatGPT at Drafting Contracts?
TermScout's CTO Evan Harris puts ChatGPT to the test
Well, a good Tuesday to you too! Fancy some legal disruption?
Today I'm sharing an insider look at how a contract tech startup, TermScout (in which I am a very enthusiastic investor), ran a deep dive into ChatGPT's capabilities in light of its own technology. Evan Harris is the CTO of TermScout, and meeting him all the way back in 2021 was a significant factor in our decision to invest in TermScout’s seed round last year. TermScout currently works with some of the top tech companies in the world, like IBM, NetApp and Qualtrics. Evan is today’s guest author and the first guest author of Zach Abramowitz is Legally Disrupted. His analysis is a must-read for any lawyer who wants to better understand (and "feel") some of ChatGPT's weaknesses, as well as see where the future is headed.
We spend a lot of time with contracts at TermScout. We’re always looking for the best tools available to support our mission to make contracts simpler and more useful. When ChatGPT came online a few months ago, we immediately tried to get it to draft, compare, explain, rate and review contracts in the same ways that existing AI techniques do. The bottom line: we’re truly impressed. We can’t wait to see how this technology progresses and which use cases crop up as new products in the months ahead.
Imagine telling a contract lawyer a few years ago that soon you would be able to use an AI system to draft a contract, coach it to make revisions in real time, and then use another AI system to analyze that contract and rate its favorability, all in a few minutes and entirely for free. All of this is possible today, and this post will walk you through the exercise step by step. We’ll see if ChatGPT can draft a contract with specific instructions to make it more favorable to one of the two parties. Then we’ll use TermScout’s AI to review the contract and see how we did.
What is ChatGPT? What is TermScout?
Before we jump in, let’s give a brief overview of these two AI tools.
ChatGPT is a conversational language model developed by OpenAI, capable of generating human-like responses based on the input provided. In fact, ChatGPT generated the previous sentence. Put simply, it’s a chatbot. It can work in nearly any domain you can think of. Today, of course, we’re experimenting with contracts.
TermScout is a contract review AI platform that specializes in rating contract favorability. Its AI distills a contract down to the information that truly matters to the parties involved. It uses favorability ratings and market data to help them make decisions about signing and negotiation. Functionally, it’s a web application with a free version that people can use to create an account, upload a contract that they’re considering signing, and get results within a few minutes.
Contract Drafting with ChatGPT
The following screenshots are from a fresh ChatGPT session using OpenAI’s publicly available web application. There is no configuration or priming for the subject matter; you just give it tasks to complete and ask it questions. As a starting point, we’ll ask it to generate a contract with a simple prompt.
At first glance, this went pretty well. We’ve got a fairly standard software terms of service that looks like any publicly hosted software clickthrough agreement. It’s worth noting at the outset how cool this is. ChatGPT doesn’t keep boilerplate software contracts on hand. It generates this text on the fly without being explicitly trained to write contracts.
A seasoned software contract expert could find a lot to argue with as to whether or not this contract favors the customer, and they might have some nitpicks on style and phrasing. Let’s dig in a little more and see if ChatGPT can articulate some specifics about the previous task.
There is some coherent reasoning at play, but something is wrong. One of the three items isn’t particularly customer-favorable: the limitation of liability clause. This clause limits the vendor’s liability but says nothing about the customer’s liability. Additionally, according to TermScout’s market data, capping the vendor’s liability at 12 months’ fees is the most popular liability cap in both vendor forms and negotiated contracts. This clause isn’t going out of its way to favor the customer.
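To make the point about one-sided caps concrete, here is a minimal Python sketch of how a "12 months’ fees" cap might be computed. It also shows why a vendor-only cap leaves the customer’s liability uncapped. This code is not taken from the ChatGPT contract or from TermScout. It assumes one common formulation (fees paid for the 12 calendar months before the month of the claim) and a hypothetical data shape mapping (year, month) to fees. The function names are illustrative only, and real clause language varies.

```python
from datetime import date
from typing import Optional

# Hypothetical data shape: (year, month) -> fees the customer paid for that month.
Fees = dict[tuple[int, int], float]


def twelve_month_fee_cap(monthly_fees: Fees, claim_date: date) -> float:
    """Sum the fees for the 12 calendar months before the month of the claim."""
    total = 0.0
    year, month = claim_date.year, claim_date.month
    for _ in range(12):
        # Step back one calendar month, rolling over the year boundary.
        month -= 1
        if month == 0:
            month, year = 12, year - 1
        total += monthly_fees.get((year, month), 0.0)
    return total


def liability_cap(party: str, monthly_fees: Fees, claim_date: date, mutual: bool) -> Optional[float]:
    """Return the cap on `party`'s liability, or None if that liability is uncapped."""
    if party == "vendor" or mutual:
        return twelve_month_fee_cap(monthly_fees, claim_date)
    # One-sided clause: only the vendor's liability is limited.
    return None


fees_2022 = {(2022, m): 1000.0 for m in range(1, 13)}
claim = date(2023, 1, 15)
print(liability_cap("vendor", fees_2022, claim, mutual=False))    # 12000.0
print(liability_cap("customer", fees_2022, claim, mutual=False))  # None
print(liability_cap("customer", fees_2022, claim, mutual=True))   # 12000.0
```

With fees of 1,000 per month throughout 2022 and a claim on January 15, 2023, the vendor’s cap works out to 12,000. Under the one-sided clause, the customer’s liability stays uncapped unless the clause is made mutual.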
Will ChatGPT respond to some coaching?
Even though it didn’t initially think to offer a mutual limitation of liability, it responds well to this nudge.
Let’s see if we can get it to draft additional clauses that would be favorable to the customer.
Like the limitation of liability clause, the indemnification clause is one-sided and favors the vendor. The vendor can also change the terms without notice. Let’s address both of these.
This went okay. We achieved the nudge towards mutual indemnification. But this clause is a little off. It appears to be conflating the concepts of indemnification and limitation of liability. We don’t typically see caps on indemnification like this, while they are extremely common in limitations of liability.
Finally, we’ll see if it can organize all of these revisions into a final draft.
This was really interesting. Some of our revisions didn’t make it into the final draft. The assignment and compliance-with-laws clauses made it in, and so did our edits to the change-of-terms clause, but indemnification and limitation of liability reverted to being one-sided in favor of the vendor. We also see a number of changes unrelated to anything we discussed with ChatGPT.
Contract Review with TermScout
We now have a contract that we can upload to TermScout. Whatever our own opinion of the contract’s favorability, TermScout’s AI will offer a third-party review. We copied ChatGPT’s final draft and uploaded it to TermScout. Here are the results:
This tells us right away that:
Even after coaching ChatGPT through revisions, the contract is still more favorable to MyCo than it is to the customer.
The expected negotiating effort is high, meaning if you’re the customer and you plan to negotiate with MyCo, you’re likely to have many points of contention to work through.
14 red flags were found from the customer’s perspective.
A specific state or country was not specified for governing law or dispute resolution.
TermScout also gives us a “Term Sheet” that summarizes the key information in the contract:
Since TermScout sorts its summary by the topics that matter most in software contracts, we can immediately see the issues we raised with ChatGPT regarding limitation of liability and indemnification.
TermScout also lists the red flags detected by its AI. Note that TermScout flags not only problematic language that appears explicitly in the contract, but also clauses it would expect to find in a customer-favorable contract that are missing.
Again, we see the unlimited customer liability and the lack of mutual indemnification. TermScout also raises other issues, like the lack of protection for the customer’s confidential information (the contract protects the vendor’s confidential information, not the customer’s), the lack of any requirement to notify the customer of a security breach, and the lack of a vendor commitment to security standards (GDPR was in the original draft but didn’t make it into the final).
We can also drill down to see market data for a specific red flag, which shows how to bring the contract closer to market and make it more favorable to the customer. The chart below shows how often the vendor offers no warranties at all, broken down by the vendor’s starting position, the customer’s starting position, and negotiated contracts.
Bottom line: you can spend hours working through these types of scenarios with ChatGPT. In this exercise, we see an AI tool that gets into the arena of contract drafting with party favorability in mind but doesn’t quite nail it. After coaching it through revisions, we still ended up with a vendor-favorable contract.
If there is one thing we’ve learned in this field, it’s not to underestimate what the future will bring. Here are a few things worth mentioning in ChatGPT’s defense:
It hasn’t been trained explicitly on the task of contract drafting or on recognizing party favorability in software contracts. Fine-tuning GPT-3 on this task could yield significantly better results.
Prompt engineering goes a long way (carefully phrasing the questions, adding context, giving examples, etc.). We didn’t do much of that in this exercise.
OpenAI is constantly iterating. If the current pace of progress continues, new releases and enhancements could blow the performance we see here out of the water. Prompt engineering ought to become less important as the underlying models improve.
Contracts are long. ChatGPT would likely do a better job working on one clause at a time than drafting an entire contract in one pass, as it did here.
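The last two points can be combined. Here is a minimal Python sketch of preparing a contract for clause-by-clause work. It splits the text wherever a line starts with a numbered heading such as "2. Limitation of Liability", which is an assumption about formatting, since real contracts vary. It then wraps each clause in a prompt that adds context, an explicit goal and a short example. The function names and prompt wording are hypothetical. The sketch makes no calls to OpenAI; each prompt would simply be pasted into a ChatGPT session.

```python
import re

# Assumption: each clause starts on a line like "2. Limitation of Liability".
CLAUSE_HEADING = re.compile(r"^[ \t]*\d+\.[ \t]+\S", re.MULTILINE)


def split_into_clauses(contract: str) -> list[str]:
    """Split contract text at numbered headings; any preamble is kept as its own piece."""
    starts = [m.start() for m in CLAUSE_HEADING.finditer(contract)]
    bounds = sorted({0, *starts, len(contract)})
    pieces = (contract[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def build_clause_prompt(clause: str, party: str = "the customer") -> str:
    """Wrap one clause in a prompt with context, a clear goal and a short example."""
    return (
        "You are revising one clause of a software terms of service agreement.\n"
        f"Goal: make this clause more favorable to {party} while keeping it realistic.\n"
        "Example: a limitation of liability that caps only the vendor's liability "
        "can be made mutual so that it caps both parties' liability.\n\n"
        f"Clause:\n{clause}\n\n"
        "Return only the revised clause."
    )


contract = (
    "Terms of Service for MyCo\n"
    "1. Definitions\n"
    "Terms.\n"
    "2. Limitation of Liability\n"
    "The vendor's liability is capped.\n"
)
clauses = split_into_clauses(contract)
print(len(clauses))  # 3 (preamble plus two numbered clauses)
print(build_clause_prompt(clauses[2]))
```

Keeping each prompt to a single clause keeps the model focused. It also makes it easier to check that a revision, such as a mutual limitation of liability, actually survives into the assembled final draft.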
If you found this interesting, the best way to start exploring AI and contracts is to start using these tools. Both TermScout and ChatGPT have free versions. As you saw here, you don’t need any proprietary data, sensitive contracts or much expertise to get started.
Thanks again Evan, and thanks for reading! You have now been legally disrupted.