Commit 156697b
Add display cell tokenizer
Fixes #663
Add a new function `display_cell_tokenize` to split Thai text into display cells without splitting tone marks.
* **New Functionality**
  - Add `display_cell_tokenize` function in `pythainlp/tokenize/core.py` to handle the splitting of Thai text into display cells.
  - Ensure the function does not split tone marks.
* **Initialization**
  - Update `pythainlp/tokenize/__init__.py` to include the new `display_cell_tokenize` function in the `__all__` list.
* **Testing**
  - Add tests for the `display_cell_tokenize` function in `tests/core/test_tokenize.py`.
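
The display-cell idea described above can be illustrated with a minimal sketch. This is not the implementation added by this commit: the name `split_display_cells` and the regular expression are assumptions made for illustration. It treats the Thai non-spacing marks U+0E31, U+0E34 to U+0E3A, and U+0E47 to U+0E4E (above/below vowels and tone marks) as attaching to the preceding character, so a tone mark is never separated from its base. Sara am (U+0E33) is left as its own cell in this sketch.

```python
import re
from typing import List

# Thai non-spacing marks: mai han-akat, above/below vowels, tone marks, etc.
# (assumed character set for this sketch, not taken from the commit)
_THAI_COMBINING = "\u0e31\u0e34-\u0e3a\u0e47-\u0e4e"

# One cell = one non-combining character followed by any combining marks.
# A run of combining marks with no base (e.g. at the start of the text)
# is kept together as its own cell instead of being dropped.
_CELL_RE = re.compile(
    f"[^{_THAI_COMBINING}][{_THAI_COMBINING}]*|[{_THAI_COMBINING}]+"
)


def split_display_cells(text: str) -> List[str]:
    """Hypothetical helper: split text into display cells."""
    if not text:
        return []
    return _CELL_RE.findall(text)


print(split_display_cells("ที่นี่"))  # ['ที่', 'นี่']
```

Joining the returned cells reproduces the input string, which is a convenient property to assert in tests.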
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/PyThaiNLP/pythainlp/issues/663?shareId=XXXX-XXXX-XXXX-XXXX).

1 parent 7332984, commit 156697b
3 files changed (+49, -0) in:
- pythainlp/tokenize
- tests/core
| File | Added lines (new numbering) | Lines added |
|---|---|---|
| `pythainlp/tokenize/__init__.py` | 19, 42 | 2 |
| `pythainlp/tokenize/core.py` | 736-774 | 39 |
| `tests/core/test_tokenize.py` | 22, 608-614 | 8 |