Automating data extraction from trade documents
Challenge: Automate document processing and key data extraction for a variety of scanned trade documents
Solution: ML models built in-house for document classification and parsing, which extract and interpret data from trade documents
Results: Mean accuracy of 92% and a 90% decrease in manual data entry
About the Project
Identify and extract key information from scanned trade documents
Transactions in international trade involve dozens of stakeholders, such as importers, insurance companies, shipping lines, exporters, and customs agencies, each generating and transferring documents.
To import goods, traders must fill in a customs declaration form with information about the goods, transport details, and other relevant data. This information comes from documents such as the invoice, bill of lading, air waybill, and packing list. Traders manually extract the key information from scans of these documents and enter it into the import declaration form. This process is highly time-consuming and error-prone.
This industry problem entails a set of technical challenges:
- The same type of document can have different layouts depending on who created it (for example, different companies use different invoice layouts)
- The placement of key information may differ from document to document
- Documents may come in different languages
- Scanned documents contain significant noise: pages may be rotated, or scans may be of poor quality or low resolution, which makes the content harder to read
- Multiple fonts and text sizes make text recognition challenging
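To make the layout and language challenges above concrete, here is a minimal sketch of a naive rule-based baseline. It is not the ML approach used in the project, only an illustration of why fixed rules struggle. The function name `extract_invoice_number`, the label pattern, and the sample texts are all hypothetical, and the input is assumed to be text already produced by an OCR step.

```python
import re
from typing import Optional

# Hypothetical set of label variants; real documents would need far more.
LABEL_PATTERN = re.compile(
    r"(?:invoice\s*(?:no\.?|number|#)|inv\.?\s*no\.?)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)",
    re.IGNORECASE,
)


def extract_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the first token that follows an invoice-number label, or None."""
    # Collapse whitespace so the label and its value may sit on different lines.
    normalized = re.sub(r"\s+", " ", ocr_text)
    match = LABEL_PATTERN.search(normalized)
    return match.group(1) if match else None


# Three made-up OCR outputs for the same field in different layouts.
layout_a = "ACME Ltd\nInvoice No: INV-2041\nDate: 2021-03-01"
layout_b = "Date 01/03/2021    INVOICE NUMBER\n    77/A-19\nBill to: ..."
layout_c = "Rechnung Nr. 5531"  # German label, not covered by the pattern

for sample in (layout_a, layout_b, layout_c):
    print(extract_invoice_number(sample))
```

The first two layouts are handled, but the German label in the third is missed. Each new layout, language, or OCR misread would need another hand-written rule, which is one reason a learned classification and parsing approach is attractive for this kind of problem.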
Based on the Client’s business needs, we delivered a product that does the following: