We began with a strategic assessment of the client’s business processes to understand what a successful solution would require. We interviewed business stakeholders and real estate experts to identify the key requirements.
Our data scientists used three models to address the contract intelligence challenges:
- First, we developed a machine vision model using convolutional neural networks (CNNs) to scan the documents and locate key areas in contracts. A machine learning classifier then determines whether those areas contain signatures, stamps, barcodes, or tables, among other elements. Identified areas are extracted and made available for search.
- Second, we implemented custom language models trained specifically on contract language to correct OCR errors introduced by commercial software. In the process, we identified cost savings by replacing the commercial OCR software with open source software. Our implementation achieved better results while saving 100% of the commercial software’s licensing cost.
- Third, we augmented pre-trained NLP models to focus on real estate contract language, so they extract the entities of interest and the relationships between them.
With these models in place, we were able to extract the full text and real-estate-specific metadata from every document.
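To make the OCR correction step more concrete, the sketch below shows the general idea of repairing misread words against domain vocabulary. It is a simplified stand-in, not the custom language models described above: it uses plain string similarity from Python's standard library, and the vocabulary, function names, and similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical domain vocabulary; a real system would derive this
# from a large corpus of contracts rather than a hand-written list.
CONTRACT_VOCAB = [
    "lessee", "lessor", "tenant", "landlord",
    "premises", "agreement", "property",
]


def correct_token(token, vocab=CONTRACT_VOCAB, cutoff=0.8):
    """Return the closest vocabulary word if it is similar enough,
    otherwise return the token unchanged."""
    lower = token.lower()
    if lower in vocab:
        return token
    matches = difflib.get_close_matches(lower, vocab, n=1, cutoff=cutoff)
    # Note: a corrected word is returned in lowercase.
    return matches[0] if matches else token


def correct_text(text):
    """Apply token-level correction to every alphabetic run in the text,
    leaving punctuation, digits, and whitespace untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_token(m.group(0)), text)


if __name__ == "__main__":
    print(correct_text("The tenanl shall vacate the prernises."))
```

A language model improves on this by using the surrounding words to choose between candidate corrections, which a similarity cutoff alone cannot do.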
Our platform architects designed and implemented a cloud-based environment that scales with the volume of documents. The environment:
- Dynamically scales up during peak times and scales back down during periods of low usage to minimize cloud costs.
- Extracts all data using the ML/AI models and indexes all documents, giving users of the platform the ability to search for any entity across all documents.
- Implements APIs to make all functionality available to an existing document management system.
- Creates alerts and notifications based on the status and content of contracts. If a contract is missing vital information, an alert is issued to the responsible real estate agent.
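The alerting behavior in the last point can be illustrated with a minimal sketch. The list of vital fields, the `Contract` structure, and the shape of the alert are hypothetical, since the platform's actual data model is not described here.

```python
from dataclasses import dataclass, field

# Hypothetical set of fields treated as vital; the real list would come
# from the requirements gathered with the business stakeholders.
REQUIRED_FIELDS = ("parties", "property_address", "effective_date", "signature")


@dataclass
class Contract:
    contract_id: str
    agent_email: str
    # Metadata as produced by the extraction models (assumed to be a dict).
    metadata: dict = field(default_factory=dict)


def missing_fields(contract, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty."""
    return [name for name in required if not contract.metadata.get(name)]


def build_alert(contract):
    """Return an alert for the responsible agent, or None if nothing is missing."""
    missing = missing_fields(contract)
    if not missing:
        return None
    return {
        "to": contract.agent_email,
        "contract_id": contract.contract_id,
        "missing": missing,
    }


if __name__ == "__main__":
    draft = Contract(
        "C-1",
        "agent@example.com",
        {"parties": ["A", "B"], "effective_date": "2020-01-01"},
    )
    print(build_alert(draft))
```

Keeping the check separate from the delivery channel means the same rule could feed email, in-app notifications, or the document management system's API.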