| 1 | +--- |
| 2 | +title: Custom classifier model - Form Recognizer |
| 3 | +titleSuffix: Azure Applied AI Services |
| 4 | +description: Use the custom classifier model to train a model to identify and split the documents you process within your application. |
| 5 | +author: vkurpad |
| 6 | +manager: nitinme |
| 7 | +ms.service: applied-ai-services |
| 8 | +ms.subservice: forms-recognizer |
| 9 | +ms.topic: conceptual |
| 10 | +ms.date: 03/03/2023 |
| 11 | +ms.author: lajanuar |
| 12 | +ms.custom: references_regions |
| 13 | +monikerRange: 'form-recog-3.0.0' |
| 14 | +recommendations: false |
| 15 | +--- |
| 16 | + |
| 17 | +# Custom classifier model |
| 18 | + |
| 19 | +**This article applies to:**  **Form Recognizer v3.0**. |
| 20 | + |
| 21 | +Custom classifier models are deep-learning-model types that combine layout and language features to accurately detect and identify documents you process within your application. Custom classifier models can classify each page in an input file to identify the document(s) within and can also identify multiple documents or multiple instances of a single document within an input file. |
| 22 | + |
| 23 | +## Model capabilities |
| 24 | + |
| 25 | +Custom classifier models can analyze single- or multi-file documents to identify whether any of the trained document types are contained within an input file. Here are the currently supported scenarios: |
| 26 | + |
| 27 | +* A single file containing one document. For instance, a loan application form. |
| 28 | + |
| 29 | +* A single file containing multiple documents. For instance, a loan application package containing a loan application form, payslip, and bank statement. |
| 30 | + |
| 31 | +* A single file containing multiple instances of the same document. For instance, a collection of scanned invoices. |
| 32 | + |
| 33 | +Training a custom classifier model requires at least two distinct classes and a minimum of five samples per class. |
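The training minimums above can be checked locally before a build request is submitted. The following is a minimal sketch, assuming a hypothetical `samples_by_class` mapping from class name to a list of sample file paths; the helper name and the message wording are illustrative and not part of the service.

```python
# Minimums stated in this article: two distinct classes, five samples per class.
MIN_CLASSES = 2
MIN_SAMPLES_PER_CLASS = 5


def validate_training_set(samples_by_class):
    """Return a list of problems; an empty list means the minimums are met.

    samples_by_class is a hypothetical dict: class name -> list of file paths.
    """
    problems = []
    if len(samples_by_class) < MIN_CLASSES:
        problems.append(
            f"need at least {MIN_CLASSES} classes, found {len(samples_by_class)}"
        )
    for name, files in sorted(samples_by_class.items()):
        if len(files) < MIN_SAMPLES_PER_CLASS:
            problems.append(
                f"class '{name}' has {len(files)} samples, "
                f"needs at least {MIN_SAMPLES_PER_CLASS}"
            )
    return problems
```

An empty return value only means the counts are sufficient; it says nothing about whether the samples are representative of each class.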
| 34 | + |
| 35 | +### Compare custom classifier and composed models |
| 36 | + |
| 37 | +A custom classifier model can replace [a composed model](concept-composed-models.md) in some scenarios, but there are a few differences to be aware of: |
| 38 | + |
| 39 | +| Capability | Custom classifier process | Composed model process | |
| 40 | +|--|--|--| |
| 41 | +|Analyze a single document of unknown type belonging to one of the types trained for extraction model processing.| ● Requires multiple calls. </br> ● Call the classifier model to identify the document class. This step allows for a confidence-based check before invoking the extraction model analysis.</br> ● Invoke the extraction model. | ● Requires a single call to a composed model containing the model corresponding to the input document type. | |
| 42 | +|Analyze a single document of unknown type belonging to several types trained for extraction model processing.| ● Requires multiple calls.</br> ● Make a call to the classifier that ignores documents not matching a designated type for extraction.</br> ● Invoke the extraction model. | ● Requires a single call to a composed model. The service selects a custom model within the composed model with the highest match.</br> ● A composed model can't ignore documents.| |
| 43 | +|Analyze a file containing multiple documents of known or unknown type belonging to one of the types trained for extraction model processing.| ● Requires multiple calls. </br> ● Call the classifier to identify each document in the input file.</br> ● Invoke the extraction model for each identified document. | ● Requires a single call to a composed model.</br> ● The composed model invokes the component model once on the first instance of the document. </br> ● The remaining documents are ignored. | |
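The classify-then-extract flow in the custom classifier column can be sketched as a routing step between the two calls. This is a minimal sketch rather than service code: the `docType` and `confidence` keys, the `MODEL_FOR_DOC_TYPE` mapping, the extraction model IDs, and the `0.8` threshold are all assumptions made for illustration.

```python
# Hypothetical mapping from a classifier document type to an extraction model ID.
MODEL_FOR_DOC_TYPE = {
    "car-maint": "car-maint-extractor",
    "cc-auth": "cc-auth-extractor",
}


def route_documents(classified, threshold=0.8):
    """Decide which extraction model to invoke for each classified document.

    classified is assumed to be a list of dicts with "docType" and
    "confidence" keys. Returns (to_extract, ignored), where to_extract is a
    list of (model_id, document) pairs.
    """
    to_extract, ignored = [], []
    for doc in classified:
        model_id = MODEL_FOR_DOC_TYPE.get(doc["docType"])
        # Skip documents with no designated extraction model or low confidence.
        if model_id is None or doc["confidence"] < threshold:
            ignored.append(doc)
        else:
            to_extract.append((model_id, doc))
    return to_extract, ignored
```

The confidence check and the ability to skip a document are the two things this flow adds over a single composed-model call, at the cost of an extra request per file.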
| 44 | + |
| 45 | +## Language support |
| 46 | + |
| 47 | +Classifier models currently support only English-language documents. |
| 48 | + |
| 49 | +## Best practices |
| 50 | + |
| 51 | +Custom classifier models require a minimum of five samples per class to train. If the classes are similar, adding extra training samples improves model accuracy. |
| 52 | + |
| 53 | +## Training a model |
| 54 | + |
| 55 | +Custom classifier models are only available in the [v3.0 API](v3-migration-guide.md) starting with API version ```2023-02-28-preview```. [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier. |
| 56 | + |
| 57 | +When using the REST API, if you've organized your documents by folders, you can use the ```azureBlobSource``` property of the request to train a classifier model. |
| 58 | + |
| 59 | +```rest |
| 60 | +https://{endpoint}/formrecognizer/documentClassifiers:build?api-version=2023-02-28-preview |
| 61 | + |
| 62 | +{ |
| 63 | + "classifierId": "demo2.1", |
| 64 | + "description": "", |
| 65 | + "docTypes": { |
| 66 | + "car-maint": { |
| 67 | + "azureBlobSource": { |
| 68 | + "containerUrl": "SAS URL to container", |
| 69 | + "prefix": "sample1/car-maint/" |
| 70 | + } |
| 71 | + }, |
| 72 | + "cc-auth": { |
| 73 | + "azureBlobSource": { |
| 74 | + "containerUrl": "SAS URL to container", |
| 75 | + "prefix": "sample1/cc-auth/" |
| 76 | + } |
| 77 | + }, |
| 78 | + "deed-of-trust": { |
| 79 | + "azureBlobSource": { |
| 80 | + "containerUrl": "SAS URL to container", |
| 81 | + "prefix": "sample1/deed-of-trust/" |
| 82 | + } |
| 83 | + } |
| 84 | + } |
| 85 | +} |
| 86 | + |
| 87 | +``` |
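When documents are organized one folder per class, the request body above can be generated rather than written by hand. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the `POST` method and the `Ocp-Apim-Subscription-Key` header are assumptions that the request shown above doesn't state. The request object is built but not sent.

```python
import json
import urllib.request

API_VERSION = "2023-02-28-preview"


def build_classifier_body(classifier_id, container_url, prefixes, description=""):
    """Build the JSON body with one azureBlobSource entry per class.

    prefixes maps a class name to the folder prefix holding its samples.
    """
    doc_types = {
        name: {"azureBlobSource": {"containerUrl": container_url, "prefix": prefix}}
        for name, prefix in prefixes.items()
    }
    return {
        "classifierId": classifier_id,
        "description": description,
        "docTypes": doc_types,
    }


def build_classifier_request(endpoint, key, body):
    """Prepare (but do not send) the build request for the given endpoint host."""
    url = (
        f"https://{endpoint}/formrecognizer/documentClassifiers:build"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed key-based authentication header.
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would submit it; that step is left out here so the sketch can be inspected without a live resource.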
| 88 | + |
| 89 | +Alternatively, if you have a flat list of files or only plan to use a few select files within each folder to train the model, you can use the ```azureBlobFileListSource``` property to train the model. This step requires a ```file list``` in [JSON Lines](https://jsonlines.org/) format. For each class, add a new file with a list of files to be submitted for training. |
| 90 | + |
| 91 | +```rest |
| 92 | +{ |
| 93 | + "classifierId": "demo2", |
| 94 | + "description": "", |
| 95 | + "docTypes": { |
| 96 | + "car-maint": { |
| 97 | + "azureBlobFileListSource": { |
| 98 | + "containerUrl": "SAS URL to container", |
| 99 | + "fileList": "sample1/car-maint.jsonl" |
| 100 | + } |
| 101 | + }, |
| 102 | + "cc-auth": { |
| 103 | + "azureBlobFileListSource": { |
| 104 | + "containerUrl": "SAS URL to container", |
| 105 | + "fileList": "sample1/cc-auth.jsonl" |
| 106 | + } |
| 107 | + }, |
| 108 | + "deed-of-trust": { |
| 109 | + "azureBlobFileListSource": { |
| 110 | + "containerUrl": "SAS URL to container", |
| 111 | + "fileList": "sample1/deed-of-trust.jsonl" |
| 112 | + } |
| 113 | + } |
| 114 | + } |
| 115 | +} |
| 116 | + |
| 117 | +``` |
| 118 | + |
| 119 | +File list `car-maint.jsonl` contains the following files. |
| 120 | + |
| 121 | +```json |
| 122 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Adatum.pdf"} |
| 123 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Fincher.pdf"} |
| 124 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Lamna.pdf"} |
| 125 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Liberty.pdf"} |
| 126 | +{"file":"sample1/car-maint/Commercial Motor Vehicle - Trey.pdf"} |
| 127 | +``` |
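A file list like the one above can be generated from a list of blob paths. This is a minimal sketch; the `file_list_lines` helper name is hypothetical, and the only format detail it relies on is the one shown above: one JSON object with a `file` key per line.

```python
import json


def file_list_lines(paths):
    """Return JSON Lines text with one {"file": ...} object per path."""
    return "".join(
        # Compact separators reproduce the spacing used in the sample above.
        json.dumps({"file": path}, separators=(",", ":")) + "\n"
        for path in paths
    )
```

Writing the returned text to a file such as `car-maint.jsonl` with `open(..., "w", encoding="utf-8")` and uploading it to the container gives the `fileList` value referenced in the request body.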
| 128 | + |
| 129 | +## Next steps |
| 130 | + |
| 131 | +Learn to create custom classifier models: |
| 132 | + |
| 133 | +> [!div class="nextstepaction"] |
| 134 | +> [**Build a custom classifier model**](how-to-guides/build-a-custom-classifier.md) |
| 135 | +> [**Custom models overview**](concept-custom.md) |