| 1 | +--- |
| 2 | +title: Custom classifier model - Form Recognizer |
| 3 | +titleSuffix: Azure Applied AI Services |
| 4 | +description: Use the custom classifier model to train a model to identify and split the documents you process within your application. |
| 5 | +author: vkurpad |
| 6 | +manager: nitinme |
| 7 | +ms.service: applied-ai-services |
| 8 | +ms.subservice: forms-recognizer |
| 9 | +ms.topic: conceptual |
| 10 | +ms.date: 03/08/2023 |
| 11 | +ms.author: lajanuar |
| 12 | +ms.custom: references_regions |
| 13 | +monikerRange: 'form-recog-3.0.0' |
| 14 | +recommendations: false |
| 15 | +--- |
| 16 | + |
| 17 | +# Custom classifier model |
| 18 | + |
| 19 | +**This article applies to:**  **Form Recognizer v3.0**. |
| 20 | + |
| 21 | +Custom classifier models are deep-learning-model types that combine layout and language features to accurately detect and identify documents you process within your application. Custom classifier models can classify each page in an input file to identify the document(s) within and can also identify multiple documents or multiple instances of a single document within an input file. |
| 22 | + |
| 23 | +## Model capabilities |
| 24 | + |
| 25 | +Custom classifier models can analyze single- or multi-file documents to identify whether any of the trained document types are contained within an input file. Here are the currently supported scenarios: |
| 26 | + |
| 27 | +* A single file containing one document. For instance, a loan application form. |
| 28 | + |
| 29 | +* A single file containing multiple documents. For instance, a loan application package containing a loan application form, payslip, and bank statement. |
| 30 | + |
| 31 | +* A single file containing multiple instances of the same document. For instance, a collection of scanned invoices. |
| 32 | + |
| 33 | +Training a custom classifier model requires at least two distinct classes and a minimum of five samples per class. |
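
The training minimums stated above can be checked locally before a training request is submitted. The following is a minimal sketch, not part of the service or its SDK: the function name `check_training_set` and the `(class_name, file_path)` input shape are hypothetical choices made for illustration.

```python
from collections import Counter

MIN_CLASSES = 2            # minimum number of distinct classes stated above
MIN_SAMPLES_PER_CLASS = 5  # minimum number of samples per class stated above


def check_training_set(labeled_files):
    """Return a list of problems; an empty list means the minimums are met.

    `labeled_files` is an iterable of (class_name, file_path) pairs
    (a hypothetical input shape chosen for this sketch).
    """
    counts = Counter(class_name for class_name, _ in labeled_files)
    problems = []
    if len(counts) < MIN_CLASSES:
        problems.append(f"need at least {MIN_CLASSES} classes, found {len(counts)}")
    for class_name, count in sorted(counts.items()):
        if count < MIN_SAMPLES_PER_CLASS:
            problems.append(
                f"class '{class_name}' has {count} samples, needs {MIN_SAMPLES_PER_CLASS}"
            )
    return problems
```

Running a check like this first avoids submitting a training set that the stated minimums would rule out.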
| 34 | + |
| 35 | +### Compare custom classifier and composed models |
| 36 | + |
| 37 | +A custom classifier model can replace [a composed model](concept-composed-models.md) in some scenarios, but there are a few differences to be aware of: |
| 38 | + |
| 39 | +| Capability | Custom classifier process | Composed model process | |
| 40 | +|--|--|--| |
| 41 | +|Analyze a single document of unknown type belonging to one of the types trained for extraction model processing.| ● Requires multiple calls. </br> ● Call the classifier model to identify the document class. This step allows for a confidence-based check before invoking the extraction model analysis.</br> ● Invoke the extraction model. | ● Requires a single call to a composed model containing the model corresponding to the input document type. | |
| 42 | +|Analyze a single document of unknown type belonging to several types trained for extraction model processing.| ● Requires multiple calls.</br> ● Make a call to the classifier that ignores documents not matching a designated type for extraction.</br> ● Invoke the extraction model. | ● Requires a single call to a composed model. The service always picks the custom model within the composed model with the highest match.</br> ● A composed model can't ignore documents.| |
| 43 | +|Analyze a file containing multiple documents of known or unknown type belonging to one of the types trained for extraction model processing.| ● Requires multiple calls. </br> ● Call the classifier model to identify each document in the input file.</br> ● Invoke the extraction model for each identified document. | ● Requires a single call to a composed model.</br> ● The composed model invokes the component model only once, on the first instance of the document. </br> ● The remaining documents are ignored. | |
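
The classify-then-extract process in the table can be sketched as a routing step between the two calls. This is an illustrative sketch only. It assumes each classified document is a dict with `docType` and `confidence` keys, and the function name `route_documents`, the `extraction_models` mapping, and the 0.8 threshold are all hypothetical.

```python
def route_documents(classified_docs, extraction_models, min_confidence=0.8):
    """Split classifier output into documents to extract and documents to skip.

    classified_docs: list of dicts with "docType" and "confidence" keys
        (assumed shape for this sketch).
    extraction_models: maps a document type to the ID of its extraction model.
    min_confidence: hypothetical threshold for the confidence-based check.
    """
    to_extract, skipped = [], []
    for doc in classified_docs:
        model_id = extraction_models.get(doc["docType"])
        # Skip documents with no matching extraction model or low confidence.
        if model_id is not None and doc["confidence"] >= min_confidence:
            to_extract.append((model_id, doc))
        else:
            skipped.append(doc)
    return to_extract, skipped
```

Each `(model_id, doc)` pair in the first list would then drive one call to the corresponding extraction model, while skipped documents are left out, which is the behavior a composed model can't provide.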
| 44 | + |
| 45 | +## Language support |
| 46 | + |
| 47 | +Classifier models currently support only English-language documents. |
| 48 | + |
| 49 | +## Best practices |
| 50 | + |
| 51 | +Custom classifier models require a minimum of five samples per class to train. If the classes are very similar, adding more training samples can improve model accuracy. |
| 52 | + |
| 53 | +## Training a model |
| 54 | + |
| 55 | +Custom classifier models are only available in the [v3.0 API](v3-migration-guide.md) starting with API version ```2023-02-28-preview```. [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier. |
| 56 | + |
| 57 | +When using the REST API, if your documents are organized by folders, you can use the ```azureBlobSource``` property of the request to train a classifier model. |
| 58 | + |
| 59 | +```rest |
| 60 | +POST https://{endpoint}/formrecognizer/documentClassifiers:build?api-version=2023-02-28-preview |
| 61 | + |
| 62 | +{ |
| 63 | +  "classifierId": "demo2.1", |
| 64 | +  "description": "", |
| 65 | +  "docTypes": { |
| 66 | +    "car-maint": { |
| 67 | +      "azureBlobSource": { |
| 68 | +        "containerUrl": "SAS URL to container", |
| 69 | +        "prefix": "sample1/car-maint/" |
| 70 | +      } |
| 71 | +    }, |
| 72 | +    "cc-auth": { |
| 73 | +      "azureBlobSource": { |
| 74 | +        "containerUrl": "SAS URL to container", |
| 75 | +        "prefix": "sample1/cc-auth/" |
| 76 | +      } |
| 77 | +    }, |
| 78 | +    "deed-of-trust": { |
| 79 | +      "azureBlobSource": { |
| 80 | +        "containerUrl": "SAS URL to container", |
| 81 | +        "prefix": "sample1/deed-of-trust/" |
| 82 | +      } |
| 83 | +    } |
| 84 | +  } |
| 85 | +} |
| 86 | + |
| 87 | +``` |
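
As a rough illustration, the folder-based request above could be assembled in Python with only the standard library. This is a sketch under assumptions rather than an official client: the helper names are hypothetical, and key-based authentication through the `Ocp-Apim-Subscription-Key` header is assumed. The request is prepared but not sent.

```python
import json
import urllib.request

API_VERSION = "2023-02-28-preview"


def build_classifier_body(classifier_id, container_url, prefixes, description=""):
    """Build the JSON body for a training set organized by folders.

    `prefixes` maps each document type to the folder prefix holding its samples.
    """
    return {
        "classifierId": classifier_id,
        "description": description,
        "docTypes": {
            doc_type: {
                "azureBlobSource": {"containerUrl": container_url, "prefix": prefix}
            }
            for doc_type, prefix in prefixes.items()
        },
    }


def make_build_request(endpoint, api_key, body):
    """Prepare (but do not send) the POST request for the build operation."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentClassifiers:build"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: key-based authentication with this header.
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would submit it. Keeping the body builder separate from the transport makes it easy to inspect the JSON before anything is sent.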
| 88 | + |
| 89 | +Alternatively, if you have a flat list of files or only plan to use a few select files within each folder to train the model, you can use the ```azureBlobFileListSource``` property to train the model. This option requires an additional ```file list``` in [JSON Lines](https://jsonlines.org/) format. For each class, add a file that lists the documents to submit for training. |
| 90 | + |
| 91 | +```rest |
| 92 | +{ |
| 93 | +  "classifierId": "demo2", |
| 94 | +  "description": "", |
| 95 | +  "docTypes": { |
| 96 | +    "car-maint": { |
| 97 | +      "azureBlobFileListSource": { |
| 98 | +        "containerUrl": "SAS URL to container", |
| 99 | +        "fileList": "sample1/car-maint.jsonl" |
| 100 | +      } |
| 101 | +    }, |
| 102 | +    "cc-auth": { |
| 103 | +      "azureBlobFileListSource": { |
| 104 | +        "containerUrl": "SAS URL to container", |
| 105 | +        "fileList": "sample1/cc-auth.jsonl" |
| 106 | +      } |
| 107 | +    }, |
| 108 | +    "deed-of-trust": { |
| 109 | +      "azureBlobFileListSource": { |
| 110 | +        "containerUrl": "SAS URL to container", |
| 111 | +        "fileList": "sample1/deed-of-trust.jsonl" |
| 112 | +      } |
| 113 | +    } |
| 114 | +  } |
| 115 | +} |
| 116 | + |
| 117 | +``` |
| 118 | + |
| 119 | +The file list `car-maint.jsonl` contains one entry per training file: |
| 120 | + |
| 121 | +```json |
| 122 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Adatum.pdf"} |
| 123 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Fincher.pdf"} |
| 124 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Lamna.pdf"} |
| 125 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Liberty.pdf"} |
| 126 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Trey.pdf"} |
| 127 | +``` |
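
A file list in this shape can be generated rather than written by hand. The sketch below assumes you already have the blob paths for one class. The function name `make_file_list` is hypothetical, and uploading the resulting text to the container is left out.

```python
import json


def make_file_list(blob_paths):
    """Return JSON Lines text with one {"file": ...} object per blob path."""
    lines = [
        # Compact separators reproduce the {"file":"..."} form shown above.
        json.dumps({"file": path}, separators=(",", ":"))
        for path in blob_paths
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file such as `car-maint.jsonl` and uploading it to the container gives the value to reference from the `fileList` property.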
| 128 | + |
| 129 | +## Next steps |
| 130 | + |
| 131 | +Learn to create custom classifier models: |
| 132 | + |
| 133 | +> [!div class="nextstepaction"] |
| 134 | +> [**Build a custom classifier model**](how-to-guides/build-a-custom-classifier.md) |
| 135 | +> [**Custom models overview**](concept-custom.md) |