I am evaluating the Document Understanding Key-Value Extraction for the Invoice pre-trained model, and find it lacking some basic features. Specifically:
- There is no Currency field.
- The model is poor at distinguishing between US and European number notation.
E.g. Field Text 1.817,70 becomes Field Value 1.8177.
- Similarly, the model is poor at distinguishing between US and European date notation.
Is there any way to influence the extraction?
The Language parameter alone seems very limited. I was able to get the correct number format by setting the language to DEU (which, by the way, is not the BCP 47 code for German), but we have invoices with all kinds of number and date notations and languages.
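
As a stopgap, the raw Field Text could be parsed on the client side instead of relying on Field Value. Below is a minimal sketch of that idea, assuming the raw text is available in the response. `parse_amount` is a hypothetical helper, not part of the service, and the rule for a single separator followed by exactly three digits is only a guess, since a string like 1.817 is ambiguous.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Heuristically parse an amount written in US or European notation."""
    # Drop spaces and apostrophes that some locales use for grouping.
    s = text.strip().replace(" ", "").replace("'", "")
    last_dot, last_comma = s.rfind("."), s.rfind(",")

    if last_dot != -1 and last_comma != -1:
        # Both separators present: whichever comes last is the decimal mark.
        dec, grp = (".", ",") if last_dot > last_comma else (",", ".")
        s = s.replace(grp, "").replace(dec, ".")
    elif last_dot != -1 or last_comma != -1:
        sep = "." if last_dot != -1 else ","
        digits_after = len(s) - s.rfind(sep) - 1
        if s.count(sep) > 1 or digits_after == 3:
            # Assumption: repeated separators, or exactly three trailing
            # digits, indicate a thousands separator.
            s = s.replace(sep, "")
        else:
            s = s.replace(sep, ".")
    return Decimal(s)


print(parse_amount("1.817,70"))  # European notation
print(parse_amount("1,817.70"))  # US notation
```

This does not help with dates such as 03/04/2023, where the text alone does not reveal the day/month order.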