This repository was archived by the owner on Dec 9, 2022. It is now read-only.

Commit c74f258

authored
Create README.md
1 parent f89f8d4 commit c74f258

1 file changed

+11
-0
lines changed

README.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# A Parser That Adds Special Identifiers To Markdown Files For Deep Learning

GitHub contains a large corpus of data that is amenable to NLP, in the form of issues, READMEs, pull request comments, and other items. However, this text is often accompanied by markdown, which lets the user specify styling (bold, underline, headings) and specialized formatting (code blocks, tables, block quotes, hyperlinks). This library has two goals:

## 1. Insert custom field indicators

Marker tokens are inserted so that the structural information carried by the markdown is not lost. For example, a list block is enclosed with `xxxlistB` and `xxxlistE`, and a code block is enclosed with `xxxcdb` and `xxxcde`.
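
The marker idea above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `add_markers`, the regular expression, and the choice to keep the bullet characters are all assumptions made for illustration. Only the four marker tokens come from this README.

```python
import re

# Marker tokens taken from the README; everything else is hypothetical.
CODE_BEGIN, CODE_END = "xxxcdb", "xxxcde"
LIST_BEGIN, LIST_END = "xxxlistB", "xxxlistE"

FENCE = "`" * 3  # the three-backtick code fence

# A list item: optional indent, then -, *, + or "1." followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+")


def add_markers(markdown: str) -> str:
    """Wrap fenced code blocks and list blocks in marker tokens."""
    out = []
    in_code = False
    in_list = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE):
            if in_list:  # a fence ends any open list block
                out.append(LIST_END)
                in_list = False
            # Replace the fence line itself with the begin or end marker.
            out.append(CODE_END if in_code else CODE_BEGIN)
            in_code = not in_code
            continue
        if in_code:
            out.append(line)  # code lines pass through untouched
            continue
        is_item = bool(LIST_ITEM.match(line))
        if is_item and not in_list:
            out.append(LIST_BEGIN)
            in_list = True
        elif not is_item and in_list:
            out.append(LIST_END)
            in_list = False
        out.append(line)  # assumption: bullets are kept as-is
    if in_list:
        out.append(LIST_END)
    if in_code:
        out.append(CODE_END)  # close an unterminated fence
    return "\n".join(out)


sample = "\n".join(
    ["Steps:", "- install", "- run", "", FENCE + "python", "print('hi')", FENCE]
)
print(add_markers(sample))
```

For the sample above, the list items end up between `xxxlistB` and `xxxlistE`, and the fence lines are replaced by `xxxcdb` and `xxxcde`. A real parser would more likely walk a markdown AST than scan lines, since a line scan like this one mishandles nested lists and indented code.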

## 2. Discard superfluous information

Documentation TBD
