Automating data extraction from trade documents
Highlights
Challenge
Automate document processing and key data extraction from various scanned trade documents
Solution
In-house-built ML models for document classification and parsing to extract and interpret data from trade documents
Results
Mean accuracy of 92% and a 90% decrease in manual data entry
About the Project
CHALLENGE
Identify and extract key information from scanned trade documents
Transactions in international trade involve dozens of stakeholders, such as importers, insurance companies, shipping lines, exporters, and customs agencies, each generating and transferring documents.
To import goods, traders need to fill in a customs declaration form with information about the goods, transport details, and other relevant data drawn from supporting documents such as the invoice, bill of lading, air waybill, and packing list. Traders manually extract key information from scanned copies of these documents to fill out the import declaration form. This process is highly time-consuming and error-prone.
This industry problem entails a set of technical challenges:
- The same type of document can have various layouts depending on who created it (for example, different companies use different invoice layouts)
- Key information may appear in different places on different documents
- Documents may come in different languages
- Scanned documents contain significant noise: pages may be rotated, and poor-quality or low-resolution scans make the content less readable
- Multiple fonts and text sizes make text recognition challenging
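To make the layout and placement challenges concrete, here is a minimal sketch of why a field cannot be read from a fixed position and must instead be located by its label. This is not the delivered solution, which the case study describes as in-house-built ML models. The function name, the label patterns, and the two sample OCR texts are all hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical label variants for a single field. Real documents use many
# more variants and languages, which is one reason rules like these do not
# scale and a learned model is attractive.
INVOICE_NUMBER_PATTERN = re.compile(
    r"(?:invoice\s*(?:no\b\.?|number\b|#)|inv\b\.?\s*no\b\.?)"  # the label
    r"\s*[:\-]?\s*"                                             # optional separator
    r"([A-Z0-9][A-Z0-9\-/]*)",                                  # the value
    re.IGNORECASE,
)


def find_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the first value that follows an invoice-number label, or None."""
    match = INVOICE_NUMBER_PATTERN.search(ocr_text)
    return match.group(1) if match else None


# Two made-up OCR outputs: the same field sits on a different line,
# under a different label, in each layout.
layout_a = "ACME Exports Ltd.\nInvoice No: INV-2041\nDate: 2021-03-04\n"
layout_b = "Date 04/03/2021\nShip to: Port of Rotterdam\nINVOICE # 77/A-19\n"

print(find_invoice_number(layout_a))
print(find_invoice_number(layout_b))
print(find_invoice_number("Packing list, 12 cartons"))
```

Even this small example needs one hand-written pattern per label variant, and it assumes clean OCR text in a single language. Noise, rotation, and mixed fonts, as listed above, would break it quickly.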
SOLUTION
Based on the Client’s business needs, we delivered a product that does the following:
RESULTS