Researchers have developed an AI-based solution that can automatically analyse digital documents and extract large amounts of information from them.
The team from Loughborough University and Xceptor – a ‘no-code data automation platform’ developer – has created a deep learning model for natural language processing (NLP) that can analyse the content and structure of invoices, tax forms and other digital documents and sort the information into categories.
The system is designed to streamline processes such as setting up bank accounts, approving mortgages, responding to customer queries and processing insurance claims by speeding up fraud checking and extracting details from identity documents.
Lead developer Chao Zhang, of Loughborough’s Department of Computer Science, said the technology was faster and cheaper than current systems that perform the same task and could be applied to similar tasks in the banking, financial services and insurance sectors.
He said: “Compared with the traditional rule-based or pattern matching approaches, the developed NLP can identify terms, learn language structures, extract contextual correlation and classify texts into semantic groups and clauses, such as invoice numbers, payee addresses, counterparty names as well as distinguishing due date with invoice date.”
The AI model, which is built on state-of-the-art deep learning technology, was trained to handle complex free-form content and to extract information robustly by linking it to its context, rather than relying on pre-defined templates.
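To illustrate why template-based extraction struggles with free-form content, here is a minimal sketch of the traditional pattern-matching approach the model is contrasted with. It is not taken from the project: the field names, the regular expressions and the sample texts are all hypothetical, chosen only to show how a fixed template breaks when the wording changes.

```python
import re

# Hypothetical template: one hand-written pattern per field (assumed layout).
TEMPLATE = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Invoice\s+Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "due_date": re.compile(r"Due\s+Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_with_template(text):
    """Return the first match for each field, or None if the pattern fails."""
    fields = {}
    for name, pattern in TEMPLATE.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# A document that happens to follow the template's layout.
matching_layout = "Invoice No: INV-001\nInvoice Date: 2024-01-05\nDue Date: 2024-02-04"

# The same facts written as free-form text; no pattern applies.
free_form = "Please settle invoice INV-001, issued on 5 January 2024, by 4 February 2024."

print(extract_with_template(matching_layout))
print(extract_with_template(free_form))
```

The first document yields all three fields, while the free-form sentence yields none, even though a human reader finds the same invoice number and both dates in it. A model that learns from context is meant to close that gap.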
Graph modelling was introduced into the learning process to improve the model’s performance on complex documents, which may include tables and blocks of text whose meaning depends on their spatial alignment. Such documents are more difficult to process than plain text in paragraphs.
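The article does not describe how the team builds its graph, so the following is only a sketch of one common way to represent spatial alignment: each text block becomes a node, and an edge links two blocks that share a row or a column on the page. The `Block` class, the overlap rule and the coordinates are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """A text block with its bounding box (hypothetical page coordinates)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def overlaps(a_start, a_len, b_start, b_len):
    """True if the intervals [a_start, a_start + a_len) and [b_start, b_start + b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def build_layout_graph(blocks):
    """Link blocks whose vertical extents overlap (same row) or,
    failing that, whose horizontal extents overlap (same column)."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(blocks), 2):
        if overlaps(a.y, a.h, b.y, b.h):
            edges.append((i, j, "same_row"))
        elif overlaps(a.x, a.w, b.x, b.w):
            edges.append((i, j, "same_column"))
    return edges


# A two-row, two-column fragment of an invoice.
blocks = [
    Block("Invoice date", 0, 0, 80, 10),
    Block("2024-01-05", 100, 0, 60, 10),
    Block("Due date", 0, 20, 80, 10),
    Block("2024-02-04", 100, 20, 60, 10),
]

for i, j, relation in build_layout_graph(blocks):
    print(f"{blocks[i].text!r} -- {relation} --> {blocks[j].text!r}")
```

In this sketch each label is connected to the value in its own row but not to the value in the other row, which is the kind of layout cue that helps tell a due date from an invoice date. A learning model can then use such edges alongside the text itself.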
The research was conducted as part of an 18-month Knowledge Transfer Partnership project, jointly funded by Xceptor and Innovate UK.
The academic lead Professor Baihua Li, from Loughborough’s School of Science, added: “Extracting required information from a large number of documents is currently a very time-consuming manual process. Developing AI solutions to learn contextual meaning and correlation presented in complexly structured documents is extremely challenging.
“We are pleased that Loughborough University’s specialists in NLP and machine learning are working with Xceptor on this game changing innovation and can successfully integrate the AI automation function into the company’s smart document analysis platform for improved speed and accuracy.”