Enhancing Contract Analysis with AI in Shareflex
Study: Setting up an intelligent AI system capable of answering complex questions about company contract documents (PDF) stored in SharePoint Online with Shareflex Contract.
Examples of questions the system should address
- Identify contracts containing fine clauses and explain the costs in a worst-case scenario.
- Highlight contracts that could harm the company and explain the risks involved.
- Identify existing NDA agreements and summarize their contents.
Answering such questions manually can be time-consuming for a contract officer. This study explores whether AI can perform the task faster and more effectively.
Objective:
Focus exclusively on analyzing the contract PDF documents, not their metadata. Shareflex already provides an excellent interface for filtering and presenting contract metadata, so an AI system for metadata analysis is not required at this time.
To achieve this, I utilized Make.com to build automated workflows for an AI system that processes contract documents to answer these queries. While Make.com was chosen for its simplicity, similar results could be achieved using Python or other platforms.
Workflow Scenario 1: From Shareflex / SharePoint to Pinecone
In this workflow, contract PDF documents (along with some metadata) are summarized by OpenAI and stored as vectors in Pinecone. Vector representations of the OpenAI summaries are used because OpenAI’s GPT-4o model cannot process all contract data in a single prompt: its context window is limited to roughly 128,000 tokens (approximately 200–300 pages of text). Keeping prompts concise also improves accuracy.
Pinecone, a vector database, returns the stored records whose vectors are closest to the vector of a query, for example the top 10 matches. Learn more about vectors in this video.
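To make "closest vectors" concrete, here is a minimal sketch of a top-k similarity search in plain Python. It assumes cosine similarity as the distance measure and uses made-up 3-dimensional vectors and record names; it is an in-memory stand-in for illustration, not how Pinecone is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, records, k=10):
    """Return the ids of the k records most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), rec_id) for rec_id, vec in records.items()]
    scored.sort(reverse=True)
    return [rec_id for _, rec_id in scored[:k]]

# Toy vectors (hypothetical); real embeddings have hundreds or thousands of dimensions.
records = {
    "nda": [0.9, 0.1, 0.0],
    "support": [0.7, 0.3, 0.1],
    "lease": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, records, k=2))  # ['nda', 'support']
```

A vector database does the same ranking at scale, using indexes so that it does not have to compare the query against every stored vector.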
The complete Make.com workflow scenario is illustrated below:
Key Workflow Components
1. SharePoint
The starting point is SharePoint Online, where contract list items and metadata are stored. Contract documents are located in SharePoint document libraries and are linked to their metadata using contract IDs.
2. OneDrive
Contract documents may also be stored in OneDrive. Using the contract ID from the metadata, the corresponding PDF documents are located.
3. CloudConvert
CloudConvert is used to convert PDF documents into plain text. This step is essential because the OpenAI modules in this workflow expect plain text; the PDF files cannot be passed to them directly.
4. HTTP Module
The HTTP module retrieves the text content of the converted documents from CloudConvert.
5. OpenAI
OpenAI is used to generate meaningful summaries of the contract documents. The prompt is tailored to exclude irrelevant details and focus on key legal terms and clauses. Carefully refining this prompt is crucial for real-world applications. The prompt used in this study:
You are tasked with generating a detailed, concise, and meaningful representation of the provided contract text. This output will be used to create vector embeddings for a contract management system.
Guidelines:
- Core Information Only: Focus on extracting and summarizing the most important aspects of the contract, including its core intent, key clauses, obligations, penalties, fine clauses, and other critical legal terms.
- Detailed Summary: Ensure all relevant information, especially legal obligations and potential financial risks such as fines, is clearly represented in the output. Avoid excessive brevity.
- Plain Text Output: The output must be in plain text and free from unnecessary characters or formatting.
- Character Restrictions: Do NOT use the following characters: `+ - * / % = ** @ # $ ^ & ~ ! ' " () [] {} : ; , . \ < > | ``
- Formatting: Use only alphanumeric characters and spaces in the output.
- Comprehensive Representation: Ensure no critical details, such as fine clauses, termination penalties, or obligations, are omitted.
Additional Instruction: If any fines, penalties, or financial obligations are present in the contract, explicitly mention them and their conditions in your output.
Here is the contract text: {{38.newlinesspaces1}}
6. Tools
This step removes newline characters and replaces them with spaces to prevent interference with subsequent processing.
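Outside Make.com, the same cleanup could be done in a couple of lines of Python. This is a minimal sketch; the function name is hypothetical, and collapsing consecutive line breaks into a single space is an assumption about the desired behavior.

```python
import re

def newlines_to_spaces(text: str) -> str:
    """Replace every run of line breaks with one space and trim the ends."""
    return re.sub(r"[\r\n]+", " ", text).strip()

print(newlines_to_spaces("Clause 1\r\nClause 2\n\nClause 3\n"))
```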
7. OpenAI – Generate Vectors
OpenAI generates a vector representation of the summarized contract document using the following request:
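The original request is not reproduced here. As a rough sketch, a call to OpenAI's embeddings endpoint could be built as follows in Python. The model name `text-embedding-3-small`, the helper names, and the placeholder API key are assumptions; the study does not state which embedding model was used. The request is only constructed, not sent, so the sketch runs offline.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"

def build_embedding_request(text, api_key, model="text-embedding-3-small"):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_embedding(response_body):
    """Pull the vector out of the parsed JSON body returned by the endpoint."""
    return response_body["data"][0]["embedding"]

request = build_embedding_request("summary text", api_key="sk-placeholder")
# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     vector = extract_embedding(json.load(resp))
```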
8. Pinecone
The contract summary, along with relevant metadata, is stored in Pinecone. This enables efficient searching and retrieval of contract-related information.
Below is an example of a Pinecone database entry:
For each PDF contract document, a unique Pinecone record is created, containing the vector and the associated contract metadata. Pinecone databases can store thousands of such records. Vectors on their own are hard to turn back into the original text, which offers some obfuscation, but this should not be treated as a security guarantee: in this setup the raw contract text is also stored in the record metadata, where it is readable as plain text.
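As an illustration of what such a record might look like, the sketch below builds one entry in the `id` / `values` / `metadata` shape that Pinecone uses for upserts. The metadata field names are taken from the prompt in Scenario 2; the function name, the split size of 20,000 characters, and the sample values are assumptions.

```python
def build_pinecone_record(contract_id, vector, title, contract_type, raw_text,
                          part_size=20000):
    """Build one record with the raw text split into two metadata fields."""
    return {
        "id": str(contract_id),
        "values": list(vector),
        "metadata": {
            "contract title": title,
            "contract type": contract_type,
            # Text beyond two parts is dropped in this sketch.
            "contract data raw 1": raw_text[:part_size],
            "contract data raw 2": raw_text[part_size:2 * part_size],
        },
    }

record = build_pinecone_record(
    contract_id=42,                      # hypothetical contract ID
    vector=[0.12, -0.03, 0.88],          # toy vector, far shorter than a real one
    title="Non-Disclosure Agreement",
    contract_type="NDA",
    raw_text="Section 5 Penalties and Fines",
)
print(record["metadata"]["contract title"])
```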
Workflow Scenario 2: Finding Vectors Close to My Query
The goal of this scenario is to find vectors (contract PDF documents) in the Pinecone database that are closely related to a specific query:
“Identify contracts containing fine clauses and explain the costs in a worst-case scenario.”
The phrase “closely related” is key in this context: the search ranks contracts by similarity to the query rather than by exact keyword matches.
The complete Make.com scenario is illustrated below:
Key Workflow Components
1. Google Sheets
Clears the data that a previous run of this scenario stored in the Google Sheet.
2. OpenAI – API Call
OpenAI generates a vector representation of the query:
“Identify contracts containing fine clauses”
using the following request:
3. Pinecone
Using the OpenAI vector output, the Pinecone database is queried to return the top 10 vectors (contract documents) most closely related to the query vector. A limit of 10 is imposed to account for OpenAI’s prompt text limit.
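A minimal sketch of the corresponding query, assuming Pinecone's JSON query format (`vector`, `topK`, `includeMetadata`) and an index that uses a similarity metric where a higher score means a closer match, such as cosine. The helper names and the sample response are hypothetical.

```python
def build_query_body(query_vector, top_k=10):
    """JSON body for a Pinecone query: nearest top_k records, metadata included."""
    return {
        "vector": list(query_vector),
        "topK": top_k,
        "includeMetadata": True,
        "includeValues": False,
    }

def extract_matches(response_body):
    """Return (id, score, metadata) tuples from a query response, best match first."""
    matches = response_body.get("matches", [])
    ordered = sorted(matches, key=lambda m: m["score"], reverse=True)
    return [(m["id"], m["score"], m.get("metadata", {})) for m in ordered]

# Hypothetical response, shaped like the "matches" list Pinecone returns.
sample_response = {
    "matches": [
        {"id": "7", "score": 0.81, "metadata": {"contract title": "Support Services"}},
        {"id": "3", "score": 0.92, "metadata": {"contract title": "NDA"}},
    ]
}
body = build_query_body([0.1, 0.2, 0.3])
for rec_id, score, metadata in extract_matches(sample_response):
    print(rec_id, score, metadata["contract title"])
```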
4. OpenAI – Create a Completion Prompt
This is the most complex step. For each of the 10 Pinecone records retrieved in step 3 (each containing metadata such as raw contract text, titles, etc.), OpenAI is asked to judge the record's relevance to the query using the following prompt:
You are tasked with determining the relevance of the provided content to the query related to company contract documents. Your evaluation will refine the reranking of initial vector search results for this query.
Instructions:
1. Assess whether the content below aligns with the query.
2. Specifically, identify if the content includes or references fine clauses or penalties, even implicitly.
Output: Return two values, separated by a comma:
- Relevance: Yes or No
- Confidence percentage: An approximate measure of certainty (e.g., 84%)
Output Format examples:
Yes, 84%
No, 42%
Query: Identify contracts containing fine clauses.
Context: This analysis aims to find contract documents relevant to fine-related queries.
Contract document content:
Contract Document Content Part 1: {{13.metadata.`contract data raw 1`}}
Contract Document Content Part 2 (if applicable): {{13.metadata.`contract data raw 2`}}
Contract Document File Name: {{13.metadata.`contract title`}}
Contract Metadata:
Contract Record Title: {{13.metadata.`contract title`}}
Contract Record Type: {{13.metadata.`contract type`}}
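Because the prompt asks for a fixed “Yes, 84%” style reply, the answer can be parsed mechanically before it is stored. The following is a small sketch of such a parser; the function name is hypothetical, and returning `None` for replies that do not match the format is a design assumption, since a model may occasionally deviate from the requested format.

```python
import re

_VERDICT = re.compile(r"^\s*(yes|no)\s*,\s*(\d{1,3})\s*%?\s*$", re.IGNORECASE)

def parse_verdict(reply):
    """Parse 'Yes, 84%' into (True, 84); return None if the reply is off-format."""
    match = _VERDICT.match(reply)
    if match is None:
        return None
    confidence = int(match.group(2))
    if confidence > 100:
        return None
    return match.group(1).lower() == "yes", confidence

print(parse_verdict("Yes, 84%"))  # (True, 84)
print(parse_verdict("No, 42%"))   # (False, 42)
```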
5. Collect the Data in Google Sheets
This step stores the results of the vector re-ranking in a Google Sheet. At this point, the contracts marked “Yes” are the candidates considered relevant to the query.
Workflow Scenario 3: Getting the Query Answered
In this scenario, we focus exclusively on the contracts that scored “Yes” in Workflow Scenario 2. At this stage, we no longer need to process all the other contract documents, as the scope has been narrowed down to a few contracts relevant to the query.
The Make.com scenario used here is displayed in the image below:
Key Workflow Components
1. Google Sheets
A Google Sheet now contains data for only the contracts relevant to the initial query.
2. Tools – Text Aggregator
The text aggregator combines the rows from the Google Sheet into a single clean, organized block of text for further processing.
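In Python, a comparable aggregation step could look like the sketch below. The column names (`relevance`, `title`, `type`, `content`) and the output layout are assumptions for illustration; the study does not show the actual sheet structure. The filter on “Yes” is kept as a safeguard in case the sheet still contains other rows.

```python
def aggregate_rows(rows):
    """Join the sheet rows marked 'Yes' into one text block for the final prompt."""
    relevant = (row for row in rows if row["relevance"] == "Yes")
    blocks = []
    for index, row in enumerate(relevant, start=1):
        blocks.append(
            f"Result {index}\n"
            f"Title: {row['title']}\n"
            f"Type: {row['type']}\n"
            f"Content: {row['content']}"
        )
    return "\n\n".join(blocks)

# Hypothetical rows, as they might be read from the Google Sheet.
rows = [
    {"relevance": "Yes", "title": "NDA", "type": "NDA", "content": "Section 5 ..."},
    {"relevance": "No", "title": "Lease", "type": "Rental", "content": "..."},
]
print(aggregate_rows(rows))
```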
3. OpenAI – Create a Completion Prompt
The output of the text aggregator is used in the final OpenAI prompt to answer the query effectively.
You are an expert in legal contracts, specializing in detailed analysis and interpretation. Your reputation is built on providing precise, concise, and actionable insights into contract documents.
Key Objectives:
1. Accurately answer the user's query about the provided contracts.
2. Clearly explain any findings, especially those directly relevant to the query.
3. If unsure or unable to fully respond:
   - Explain why a definitive answer cannot be given.
   - Specify what additional information or context is needed to provide a complete response.
Your Role: Be helpful, transparent, and professional in your responses.
Context of Analysis: You are tasked with analyzing contract documents stored in a vector database. The database utilizes Pinecone to store embeddings representing these documents. The following steps were used to prepare the data:
1. Vector Search: A query was executed to retrieve the most relevant contract documents.
2. Reranking: The initial vector search results were reranked using GPT-4o to ensure relevance, with a subset of the top-scoring results included here to fit within the context token limit.
Table of Contents:
Query: Identify contracts containing fine clauses and explain the costs in a worst-case scenario.
Contract content: The raw contract text provided below contains details relevant to the query.
Top Scoring Reranked Vector Search Results: {{2.text}}
Action Plan:
1. Thoroughly review the query and reranked results.
2. Focus on fine clauses, worst-case costs, and any associated terms or triggers.
3. Respond clearly, referencing the search results where applicable.
4. Highlight limitations or any missing context needed for further accuracy.
OpenAI result
### Query: Identify Contracts Containing Fine Clauses and Explain the Costs in a Worst-Case Scenario
### Analysis of Contracts:
#### 1. **Contract: Non-Disclosure Agreement (NDA)**
- **Fine Clauses Identified**:
  Section 5 (“Penalties and Fines”) establishes significant penalties for breaches of confidentiality:
  - **Immediate Financial Penalty**: €10,000,000 for each instance of unauthorized disclosure or use.
  - **Ongoing Financial Penalty**: €1,000,000 per day until the breach is rectified to the satisfaction of the Disclosing Party.
  - **Additional Compensatory Damages**: Any actual damages resulting from the breach, including military operations disruptions or national security threats.
- **Worst-Case Costs**:
  In the event of multiple simultaneous breaches (e.g., unauthorized disclosures across several documents), the fines would compound. For example:
  - **Single Disclosure Over 10 Days**: €10,000,000 (initial penalty) + €10,000,000 (daily fines) = €20,000,000.
  - **Multiple Instances**: If three separate breaches occurred and remained unresolved for 10 days, the penalties would escalate to €60,000,000 (€30,000,000 for initial fines + €30,000,000 for ongoing penalties).
  - **Additional Damages**: Compensatory damages (potentially unlimited) could further increase costs, but these would depend on quantifiable harm caused by the breach.
#### 2. **Contract: SharePoint Support Services**
- **Fine Clauses Identified**:
  Section 5.1 introduces penalties for failure to meet Service Level Agreement (SLA) response/resolution times:
  - **High Priority Issues**: €2,000 per occurrence.
  - **Medium Priority Issues**: €1,000 per occurrence.
  - **Low Priority Issues**: €500 per occurrence.
  - **Cumulative Clause**: If SLA breaches exceed 10 occurrences within a calendar year, the Client may terminate the agreement.
- **Worst-Case Costs**:
  Assuming frequent SLA violations, the worst-case scenario would depend on the number of breaches, priority levels, and whether the breaches cause contract termination. For instance:
  - **10 High-Priority Breaches in One Year**: €20,000 in penalties (€2,000 x 10).
  - Extended SLA issues leading to termination would involve opportunity costs, unquantified value of unprovided services, and damage to reputation.
#### 3. **Contract: SharePoint Fixed-Price Application Development**
- **Fine Clauses Identified**:
  Section 4.1 specifies penalties for delays:
  - **Missed Deadlines for Milestones**: If completion is delayed by more than 2 months, the Client may request a full refund of payments made for the delayed milestone.
  - **Late Completion Penalty**: €10,000 penalty per delayed milestone exceeding 2 months.
  - **Termination Option**: The Client may terminate the agreement if delays exceed 3 months.
- **Worst-Case Costs**:
  - If **all four milestones** are delayed by over 2 months and the delays persist beyond 3 months:
    - Full refunds for all milestones (€200,000) + €40,000 penalties (€10,000 x 4 milestones) = €240,000 total.
  - Contract termination would result in no further payouts, but additional costs could include hiring another developer to complete the project.
---
### Summary of Worst-Case Scenarios Across Contracts:
1. **NDA**: Penalties for breaches could quickly escalate into tens or hundreds of millions of euros (€20,000,000+ for a single unresolved breach over 10 days). Compensatory damages are unbounded, limited only by the scale of damages proved.
2. **SharePoint Support Services**: SLA breaches are less costly (€20,000 for 10 SLA violations, for example), but repeated issues could lead to contract termination.
3. **SharePoint Fixed-Price Development**: Delays and penalties may result in €240,000 in worst-case financial loss (full milestone refunds + penalties), excluding additional costs for terminating and replacing the Developer.
### Missing Context:
The NDA penalties, though stark, rely on the assumption there’s no mitigation or dispute over breach severity or occurrence. Similarly, for both SharePoint contracts, real-world costs may include ancillary losses (e.g., reputational harm) that go beyond the contractual fines. If specific scenarios (e.g., number of breaches, their resolution, or dispute outcomes) are known, I could refine this cost analysis further. Let me know if you’d like deeper insights into one part.
Conclusion
The results are excellent: the three contract documents containing fine clauses were successfully identified, and the worst-case scenario I requested has been accurately estimated.
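The figures in the model's answer can be reproduced with a few lines of arithmetic. The sketch below takes the penalty amounts exactly as quoted in the answer above (including the €200,000 total milestone payments, which comes from the model's output and is not independently verified here); the function names are hypothetical.

```python
def nda_worst_case(breaches, days, initial=10_000_000, daily=1_000_000):
    """Initial penalty plus daily penalty for each breach left unresolved for `days`."""
    return breaches * (initial + daily * days)

def sla_penalties(occurrences, per_occurrence=2_000):
    """Total SLA penalties for a number of high-priority breaches."""
    return occurrences * per_occurrence

def development_worst_case(milestones=4, refunds=200_000, penalty=10_000):
    """Full milestone refunds plus a fixed penalty per delayed milestone."""
    return refunds + penalty * milestones

print(nda_worst_case(breaches=1, days=10))   # 20000000
print(nda_worst_case(breaches=3, days=10))   # 60000000
print(sla_penalties(10))                     # 20000
print(development_worst_case())              # 240000
```

All four values match the amounts stated in the model's answer.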
In the Shareflex Contract application, 16 contract PDF documents were stored across 10 contract records. I generated the document contents with ChatGPT to obtain varied and suitable test material. For 3 of these contract documents, I specifically asked ChatGPT to include fine clauses, and the Make.com scenarios identified exactly these 3 documents.
The workflow automations are not yet production-ready as user interfaces are still missing. However, creating these interfaces is not expected to be a significant challenge.
If OpenAI’s token limits increase in the future, as seems likely, much larger datasets could be processed in a single prompt.
The same setup can also be adapted for other purposes, such as analyzing application code to identify the parts responsible for sending data to other systems.