Commit 45a7ae0

redact identity
1 parent 1f07cf0 commit 45a7ae0

1 file changed (+1, -1 lines)

eval_tokenizer_example.ipynb

Lines changed: 1 addition & 1 deletion
@@ -514,7 +514,7 @@
     "tokens, _, __ = read_cpp_res('wiki')\n",
     "gb = greedy_builder.build(tokens)\n",
     "\n",
-    "orig = process_wiki_xml(open(\"/data/jiapeng/wiki/cleaned/AA/wiki_00\"))\n",
+    "orig = process_wiki_xml(open(\"/data/wiki/cleaned/AA/wiki_00\"))\n",
     "print(\"Number of texts:\", len(orig))\n",
     "pat_str=r\"\"\"'s|'t|'re|'ve|'m|'ll|'d| ?[\\p{L}]+| ?[\\p{N}]+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+\"\"\"\n",
     "tokenized = gb.batch_tokenize([[re.sub(' ','Ġ',x) for x in regex.findall(pat_str, doc)] for doc in orig])\n",
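
Note on the unchanged context lines: the last line of the hunk splits each document with a GPT-2-style pre-tokenization pattern and marks spaces with Ġ before passing the pieces to gb.batch_tokenize. The sketch below reproduces only that splitting step, as an illustration. It rests on two assumptions. The notebook uses the third-party regex module for \p{L} and \p{N}, while this version substitutes approximate standard-library re classes, so results may differ on some Unicode characters. The helper name pretokenize is hypothetical and does not appear in the notebook.

```python
import re

# Approximation of the notebook's pat_str using only the stdlib `re` module.
# Assumption: [^\W\d_] stands in for \p{L}, \d for \p{N}, and
# (?:[^\s\w]|_) for [^\s\p{L}\p{N}]. These are close but not identical.
PAT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"   # common English contractions
    r"| ?[^\W\d_]+"              # optional space + run of letters
    r"| ?\d+"                    # optional space + run of digits
    r"| ?(?:[^\s\w]|_)+"         # optional space + run of other symbols
    r"|\s+(?!\S)"                # whitespace not followed by a non-space
    r"|\s+"                      # any remaining whitespace
)


def pretokenize(doc: str) -> list[str]:
    """Split `doc` into pieces and mark each space with 'Ġ' (hypothetical helper)."""
    return [piece.replace(" ", "Ġ") for piece in PAT.findall(doc)]


if __name__ == "__main__":
    print(pretokenize("Hello world, it's 2024!"))
    # ['Hello', 'Ġworld', ',', 'Ġit', "'s", 'Ġ2024', '!']
```

Because every character falls into one of the alternatives, joining the pieces and mapping Ġ back to a space should recover the input text, which makes for a quick sanity check before handing the pieces to a tokenizer.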
