This repository was archived by the owner on Dec 9, 2022. It is now read-only.

Commit 3ed74a1

Merge branch 'master' of https://github.com/hamelsmu/mdtokenize
2 parents 6822a02 + 4e3cded commit 3ed74a1

1 file changed: 8 additions and 2 deletions

README.md

Lines changed: 8 additions & 2 deletions
@@ -36,8 +36,14 @@ This parser works by converting markdown to HTML then converting the HTML (along
 
 `pip install mdparse`
 
+### Caveats
+
+This library makes extremely opinionated choices on how to parse markdown and filter information. This library is for experimental purposes only, and may not be appropriate for every problem. Please use with caution.
+
+The primary use case of this parser has been to prepare a large corpus of GitHub Issue data for a language model. However, we envision this parser would be applicable to other machine learning tasks involving the extraction of features from the text of GitHub Issues, Readme files, or pull request comments.
+
 # Examples
 
-See [/notebooks/Demo.ipynb](/notebooks/Demo.ipynb) for an example of the transformations this parser does on a markdown file.
+See [/notebooks/Demo.ipynb](/notebooks/Demo.ipynb) for an example of the transformations this parser does on a markdown file:
 
-![](images/Demo.png)
+<img src="https://github.com/machine-learning-apps/mdparse/blob/master/images/demo.png" width="739" height="1479">
