Add docs

go-inoue · go-inoue · commit f2ff4ac937bf · 2022-04-05T14:58:46.000+09:00
diff --git a/docs/source/api/disambig/bert.rst b/docs/source/api/disambig/bert.rst
@@ -0,0 +1,37 @@
+camel_tools.disambig.bert
+=========================
+
+.. automodule:: camel_tools.disambig.bert
+
+Classes
+-------
+
+.. autoclass:: camel_tools.disambig.bert.BERTUnfactoredDisambiguator
+   :members:
+
+
+Examples
+--------
+
+Below is an example of how to load and use the default pre-trained CAMeLBERT
+based model to disambiguate words in a sentence.
+
+.. code-block:: python
+
+   from camel_tools.disambig.bert import BERTUnfactoredDisambiguator
+
+   unfactored = BERTUnfactoredDisambiguator.pretrained()
+
+   # We expect a sentence to be whitespace/punctuation tokenized beforehand.
+   # We provide a simple whitespace and punctuation tokenizer as part of camel_tools.
+   # See camel_tools.tokenizers.word.simple_word_tokenize.
+   sentence = ['سوف', 'نقرأ', 'الكتب']
+
+   disambig = unfactored.disambiguate(sentence)
+
+   # Let's, for example, use the top disambiguations to generate a diacritized 
+   # version of the above sentence.
+   # Note that, in practice, you'll need to make sure that each word has a
+   # non-zero list of analyses.
+   diacritized = [d.analyses[0].analysis['diac'] for d in disambig]
+   print(' '.join(diacritized))