BLU DELTA AI Learning Accelerator: Bounding Box
The BLU DELTA AI learns from training data. In this article, you’ll find out how this process works and why the position information of the training data, also referred to as the Bounding Box, plays a central role. Training data, also known as Ground Truth, can be provided via the Learn-API to train the AI.
Artificial Intelligence requires context, text, and layout information to understand documents. The context can come from external sources, such as the industry, the country where the invoice was issued, or directly from company data. Text and images are crucial for capturing the meaning of the information. The layout of the document provides implicit information about the position and grouping of texts and images, including tables, lists, and text blocks.
In most cases, AI needs all this information during the training process as well as later for quality measurement (benchmarking). Therefore, it’s essential to pass the position data of information to the AI during training.
Let’s consider a concrete example:
If the AI is to learn the order number in a particular document, the following training data (Ground Truth) is required:
"Value": "258934",
"X": 479,
"Y": 915,
"Width": 127,
"Height": 30
These values define the so-called Bounding Box: the rectangle that encloses the text characters and determines their position on the page.
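The four numbers above can be modelled as an axis-aligned rectangle. The following is a minimal sketch, not part of the Learn-API: the class name BoundingBox and the helper enclose are hypothetical, and it assumes that X and Y denote the top-left corner, which the example does not state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle; (x, y) is ASSUMED to be the top-left corner."""
    x: int
    y: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.x + self.width

    @property
    def bottom(self) -> int:
        return self.y + self.height


def enclose(boxes: list[BoundingBox]) -> BoundingBox:
    """Return the smallest box that contains every box in `boxes`."""
    if not boxes:
        raise ValueError("at least one box is required")
    left = min(b.x for b in boxes)
    top = min(b.y for b in boxes)
    right = max(b.right for b in boxes)
    bottom = max(b.bottom for b in boxes)
    return BoundingBox(left, top, right - left, bottom - top)


# The order number from the example above.
order_number = {"Value": "258934", "X": 479, "Y": 915, "Width": 127, "Height": 30}
box = BoundingBox(order_number["X"], order_number["Y"],
                  order_number["Width"], order_number["Height"])
```

With these values, box.right is 606 and box.bottom is 945. The enclose helper shows how the box of a whole value could be derived from the boxes of its individual characters or words.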
Integration of the BLU DELTA Learn-API
The BLU DELTA AI offers its own interface, the Learn-API. This allows corrected values to be fed directly into the AI training process, whether from a workflow or a business process. The Learn-API accepts the document, the essential document information, and the corresponding position data.
It’s crucial that position data is captured during correction in your own process or interface and then forwarded to the Learn-API. The interface must display the documents as images with a resolution of 300 dpi and allow the user to highlight text fields in the document.
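If the correction interface shows the page at a zoom level other than the 300 dpi image, the highlighted region has to be scaled back before it is forwarded. The sketch below illustrates this under the assumption that position data refers to pixels of the 300 dpi image; the function name to_image_pixels and its parameters are hypothetical.

```python
def to_image_pixels(selection, displayed_width, image_width):
    """Scale an (x, y, width, height) selection from viewer pixels
    to pixels of the underlying page image."""
    if displayed_width <= 0 or image_width <= 0:
        raise ValueError("widths must be positive")
    scale = image_width / displayed_width
    x, y, width, height = selection
    return (round(x * scale), round(y * scale),
            round(width * scale), round(height * scale))


# A viewer showing a 2480 px wide page image at 1240 px (50 % zoom).
print(to_image_pixels((239.5, 457.5, 63.5, 15), 1240, 2480))
# prints (479, 915, 127, 30)
```

An A4 page scanned at 300 dpi is roughly 2480 pixels wide, which is why that width is used in the example.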
In an optimal integration, only the documents or invoices that the BLU DELTA AI is uncertain about are routed to the workflow. An operator reviews them and, if necessary, corrects them by marking the relevant word. In the background, the workflow transfers the corrected value and its position directly to the Learn-API, so the correction is taken into account in the next training run.
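The review step can be sketched as follows, assuming the workflow receives a confidence score for each extracted field. The field name Confidence, the threshold of 0.9, and both function names are hypothetical illustrations and are not taken from the Learn-API.

```python
REVIEW_THRESHOLD = 0.9  # hypothetical cut-off, to be tuned per workflow


def needs_review(fields, threshold=REVIEW_THRESHOLD):
    """Return True if any extracted field is below the confidence threshold."""
    return any(field["Confidence"] < threshold for field in fields)


def make_ground_truth(value, box):
    """Build a training record from a corrected value and the marked region.

    `box` is an (x, y, width, height) tuple in image pixels.
    """
    x, y, width, height = box
    return {"Value": value, "X": x, "Y": y, "Width": width, "Height": height}


# Only the second document would be shown to an operator.
certain = [{"Confidence": 0.99}, {"Confidence": 0.95}]
uncertain = [{"Confidence": 0.99}, {"Confidence": 0.41}]
review_queue = [doc for doc in (certain, uncertain) if needs_review(doc)]

# The operator marks the order number; the record is ready to be forwarded.
record = make_ground_truth("258934", (479, 915, 127, 30))
```

The resulting record has the same shape as the order number example shown earlier in this article.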
Optionally, the Learn-API can also accept values without position information. In such cases, a smart algorithm combined with AI tries to determine the Ground Truth (i.e., the corresponding position data for the values). However, this is not always successful. If it fails, a BLU DELTA Data Labeler must add the missing information manually, which slows down learning.
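To illustrate why recovering a position from a value alone can fail, here is a deliberately naive sketch. It is not the algorithm BLU DELTA uses; it simply looks for exactly one OCR word that equals the value. The function name locate_value and the shape of the OCR word records are assumptions made for this example.

```python
def locate_value(value, ocr_words):
    """Return a ground-truth record for `value` if exactly one OCR word
    matches it; return None if it is missing or ambiguous."""
    wanted = value.strip().casefold()
    matches = [w for w in ocr_words if w["Text"].strip().casefold() == wanted]
    if len(matches) != 1:
        return None  # not found, or found several times on the page
    word = matches[0]
    return {"Value": value, "X": word["X"], "Y": word["Y"],
            "Width": word["Width"], "Height": word["Height"]}


words = [
    {"Text": "Order", "X": 300, "Y": 915, "Width": 90, "Height": 30},
    {"Text": "258934", "X": 479, "Y": 915, "Width": 127, "Height": 30},
]
found = locate_value("258934", words)          # unique match
missing = locate_value("999999", words)        # value not on the page
ambiguous = locate_value("258934", words + [   # same number printed twice
    {"Text": "258934", "X": 479, "Y": 2100, "Width": 127, "Height": 30},
])
```

Here found carries the position of the single match, while missing and ambiguous are both None. These are the kinds of cases in which position data would have to be added by hand.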
Of course, the Learn-API can also accept the documents alone. In this case, all training data must be checked manually. The information is prioritized by relevance (cluster size, active learning) and captured manually by the customer or a BLU DELTA Data Labeler before being included in the training.
For more technical details about the BLU DELTA Learn-API, visit www.bludelta.dev.
If you would like to find out more about data collection with BLU DELTA AI, we look forward to hearing from you.
BLU DELTA is a product for the automated capture of financial documents. Partners, as well as our customers’ finance departments, accounts payable clerks, and tax consultants, can use BLU DELTA AI and Cloud to immediately relieve their employees of the time-consuming and mostly manual entry of documents.
BLU DELTA is an Artificial Intelligence by Blumatix Intelligence GmbH.
Author: Christian Weiler is a former General Manager of a global IT company based in Seattle/US. Since 2016, Christian Weiler has been increasingly active in various roles in the field of artificial intelligence and has strengthened the management team of Blumatix Intelligence GmbH since 2018.
Contact: c.weiler@blumatix.com