Currently, `chunk_by_title` in `unstructured.chunking.title` caps chunks with `max_characters`, but I need it to support token-based chunking (e.g., 512 tokens per chunk) using tiktoken.
Character counts are only a rough proxy for token counts, so for models with token-based context limits, like OpenAI's GPT family, a character-capped chunk can still overflow the model's token budget. How can we modify this to chunk by tokens instead of characters?
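As a workaround, I've been re-splitting each chunk's text by token count after `chunk_by_title` runs. Here's a minimal sketch of the idea — `split_text_by_tokens` is my own helper, not part of unstructured, and it accepts any encode/decode pair (e.g. from `tiktoken.encoding_for_model(...)`) so the splitting logic itself has no hard dependency on tiktoken:

```python
from typing import Callable, List


def split_text_by_tokens(
    text: str,
    max_tokens: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> List[str]:
    """Split `text` into pieces of at most `max_tokens` tokens each.

    `encode`/`decode` can be any tokenizer pair; with tiktoken, e.g.:
        enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
        split_text_by_tokens(chunk.text, 512, enc.encode, enc.decode)
    """
    tokens = encode(text)
    return [
        decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# Demo with a trivial byte-level "tokenizer" so the sketch runs
# without tiktoken installed (one byte == one token here):
parts = split_text_by_tokens(
    "hello world",
    4,
    lambda s: list(s.encode("utf-8")),
    lambda t: bytes(t).decode("utf-8"),
)
print(parts)  # ['hell', 'o wo', 'rld']
```

The obvious downside is that hard token splits ignore sentence and element boundaries, which defeats part of the point of title-aware chunking — which is why first-class support (say, a `max_tokens` option with a pluggable tokenizer, packing whole elements up to the token budget) would be much cleaner than post-processing like this.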
Any guidance or suggestions to solve this would be appreciated!