April 04, 2022
At Portmind, we deliver AI/ML solutions to our clients, and we also conduct scientific research into new, more efficient methods for solving ML problems.
The Portmind research team recently published a scientific paper titled “Tokengrid: Towards More Efficient Data Extraction from Unstructured Documents” in IEEE Access. The paper covers our new method for automatic information extraction from invoices. The underlying technology can be used for other document types as well.
We are proud of our researchers and the work they do! Follow the link below to access the paper and explore our new method.
Key information extraction from unstructured documents is a practical problem in many industries. Machine learning models aimed at solving this problem should efficiently utilize the textual, visual, and 2D spatial layout information of the document. Grid-based approaches achieve this by representing the document as a 2D grid and feeding it to a fully convolutional encoder-decoder network that solves a semantic instance segmentation problem. We propose a new method for the instance detection branch of that network for the task of automatic information extraction from invoices. Our approach reduces this problem to 1D region detection. The proposed network has fewer parameters and a shorter inference time. Additionally, we suggest a new metric for evaluating the results. Learn more.
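To give a feel for what "representing the document as a 2D grid" can mean in practice, here is a minimal sketch in Python. It is not the code from the paper: the token format (text plus a bounding box normalized to [0, 1]), the `build_grid` helper, and the grid size are all assumptions made for illustration. Each grid cell simply stores the ID of the token that covers it, and 0 marks empty space.

```python
from typing import List, Tuple

# Hypothetical token format (an assumption, not the paper's):
# (text, x0, y0, x1, y1) with coordinates normalized to [0, 1].
Token = Tuple[str, float, float, float, float]


def build_grid(tokens: List[Token], rows: int, cols: int) -> List[List[int]]:
    """Rasterize tokens onto a rows x cols grid; 0 marks an empty cell."""
    grid = [[0] * cols for _ in range(rows)]
    for token_id, (_, x0, y0, x1, y1) in enumerate(tokens, start=1):
        # Map normalized box corners to cell indices, clamped to the grid.
        r0 = min(int(y0 * rows), rows - 1)
        r1 = min(int(y1 * rows), rows - 1)
        c0 = min(int(x0 * cols), cols - 1)
        c1 = min(int(x1 * cols), cols - 1)
        # Fill every cell covered by the token's bounding box.
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] = token_id
    return grid


# Toy "invoice" with three tokens (made-up values).
tokens = [
    ("Invoice", 0.0, 0.0, 0.4, 0.2),
    ("Total", 0.0, 0.5, 0.3, 0.7),
    ("42.00", 0.6, 0.5, 0.9, 0.7),
]
grid = build_grid(tokens, rows=4, cols=8)
for row in grid:
    print(row)
```

In a real pipeline, the token IDs would typically be replaced by learned embeddings before the grid is passed to the encoder-decoder network, so the model sees both what each token says and where it sits on the page.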