language:
- ja
1,000 Images – Japanese Invoices Collection Data. The dataset includes 500 images for basic editing and 500 images for professional editing. It covers different invoice contents, different editing types, and multiple invoice formats. Company names, addresses, personal names, fax numbers, phone numbers, and other sensitive information on the invoices have been replaced with fictitious values and are not real. The data can be used for tasks such as Japanese invoice detection, recognition, and end-to-end OCR.
For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/1841?source=Github
Data size: 1,000 images, including 500 images for basic editing and 500 images for professional editing
Data diversity: different invoice contents, different editing types, and multiple invoice formats
Collection device: scanner
Data format: PDF and JPG (the JPG files are converted from the PDFs)
Anonymization: all sensitive fields, including company names, addresses, personal names, fax numbers, and telephone numbers, have been replaced with synthetic data; no real-world identifiers remain
Accuracy: the collection accuracy is at least 95%, as specified by the collection requirements
License: Commercial License
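
Because the data is stored both as PDF and as JPG converted from PDF, a quick consistency check after download can be useful. The following is a minimal sketch, not part of the dataset's tooling: it assumes that each JPG shares its file stem with the PDF it was converted from (for example `invoice_0001.pdf` and `invoice_0001.jpg`), which the dataset description does not confirm. The function names and the directory path are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: a PDF and its converted JPG share the same file stem.
PDF_EXTS = {".pdf"}
JPG_EXTS = {".jpg", ".jpeg"}


def group_by_stem(root):
    """Map each file stem to the set of lowercase extensions found under root."""
    groups = defaultdict(set)
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in PDF_EXTS | JPG_EXTS:
            groups[path.stem].add(suffix)
    return dict(groups)


def missing_counterparts(groups):
    """Return the sorted stems that lack either a PDF or a JPG version."""
    return sorted(
        stem
        for stem, exts in groups.items()
        if not (exts & PDF_EXTS) or not (exts & JPG_EXTS)
    )
```

With a hypothetical local copy in `japanese_invoices/`, `missing_counterparts(group_by_stem("japanese_invoices"))` would list any invoice that is present in only one of the two formats. If the actual file naming differs, the stem-matching rule needs to be adjusted.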