Custom pipeline with an MS LMv3 model #142
Hello everyone, this is my first post here, and I am new to the library. First of all, I want to thank you, Janis, for creating this library. I also watched the video in which you talk about it. One thing I noticed is how humbly you spoke about the huge and amazing amount of work you have put out there for everyone to use. This is definitely an underrated library, and I am so happy I discovered it. I have been working on a custom pipeline and, as a newbie, I have faced a few challenges. One of them is using HFLayoutLmv3TokenClassifier with the example provided in the class docstring. The categories list in that example does not match the required structure: it passes a list instead of a Mapping[str, TypeOrStr]. It is true that categories is optional; however, if it is not provided, categories_bio and categories_semantics have to be provided. When I check the categories from the training data:
That is what I get, which is similar to the example provided; however, it is not the expected format. I have also checked an example from the getting started notebook.
I get something like this:
I am not entirely sure how to get this working, or at least how to create the categories data that is expected. May I kindly ask you to clarify what data I should provide to HFLayoutLmv3TokenClassifier so that I can use it in the pipeline? Thank you very much! Reference:
Replies: 1 comment
Hi, thank you very much for your feedback. I really appreciate that! Now for your question: have you seen the tutorial for training LayoutLMv1 on FUNSD? https://deepdoctection.readthedocs.io/en/latest/tutorials/layoutlm_for_token_classification/ This is not the model you want to try, but it gives you a starting point. Regarding your issues:
The advantage of the concept: Once you establish a dataflow like
All datapoints dp will already have the intrinsic data structure of the library. You can also retrieve the categories of a labelled dataset.
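To make the list-versus-mapping mismatch from the question concrete, here is a minimal sketch in plain Python. It assumes the expected Mapping[str, TypeOrStr] uses string indices starting at "1" as keys and label names as values; that key convention is an assumption, not something confirmed in this thread. The helper name categories_as_mapping and the label list are hypothetical, and the snippet does not call deepdoctection itself.

```python
from typing import Dict, Sequence


def categories_as_mapping(names: Sequence[str], start: int = 1) -> Dict[str, str]:
    """Turn an ordered list of label names into a string-keyed mapping.

    Assumption: keys are string indices counting up from `start`.
    """
    return {str(idx): name for idx, name in enumerate(names, start)}


# Hypothetical BIO token labels, for illustration only.
labels = ["B-answer", "B-header", "B-question",
          "I-answer", "I-header", "I-question", "O"]

categories = categories_as_mapping(labels)
print(categories)
```

If the labels come from a labelled dataset, the order of the list has to match the label ids the model was trained with; otherwise the predicted ids would map to the wrong names.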
Conceptually, however, nothing differs from the transformers library, because all deepdoctection model classes are wrappers around the model classes of the library that implements them. The only thing that is not used is