Toke is a small and simple tokenizer for text parsing. The entire tokenizer lives in the header file, which you can easily include in your project.
With Toke, you can tokenize special characters and strings by adding them to the context with the IncludeToken and IncludeFormatToken functions, which take the token's name and its character/string or, for format tokens, a validation function (more on FormatTokens in What's new with toke?). Toke automatically adds a TokenType called NULLTOK, with a token of '\0', that encompasses any remaining text or characters. Toke also consumes whitespace and newline characters automatically, while keeping track of which line each token belongs to.
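The behaviour described above can be pictured with a small standalone sketch. It does not use Toke's code or API: MiniRule, MiniToken, and mini_tokenize are hypothetical names made up for illustration, and details such as "first registered rule wins" and the 32-byte text buffer are assumptions, not statements about how Toke is implemented.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; NOT Toke's real structures. */
typedef struct {
    const char *name;   /* token type name, e.g. "OPEN TAG" */
    const char *match;  /* character or string to match, e.g. "<" */
} MiniRule;

typedef struct {
    const char *type;   /* rule name, or "NULLTOK" for leftover text */
    char text[32];      /* matched text, truncated if longer */
    int line;           /* 1-based line the token was found on */
} MiniToken;

/* Returns the first rule whose string matches at s, or NULL (assumed order). */
static const MiniRule *match_rule(const MiniRule *rules, size_t nrules,
                                  const char *s) {
    for (size_t i = 0; i < nrules; i++) {
        size_t len = strlen(rules[i].match);
        if (len > 0 && strncmp(s, rules[i].match, len) == 0) {
            return &rules[i];
        }
    }
    return NULL;
}

/* Splits src into tokens and returns how many were written to out. */
size_t mini_tokenize(const MiniRule *rules, size_t nrules, const char *src,
                     MiniToken *out, size_t max_out) {
    size_t count = 0;
    int line = 1;
    const char *p = src;

    while (*p != '\0' && count < max_out) {
        /* Whitespace is consumed, but newlines still advance the line count. */
        if (*p == '\n') { line++; p++; continue; }
        if (isspace((unsigned char)*p)) { p++; continue; }

        MiniToken *tok = &out[count++];
        tok->line = line;

        const MiniRule *rule = match_rule(rules, nrules, p);
        size_t len = 0;
        if (rule != NULL) {
            tok->type = rule->name;
            len = strlen(rule->match);
        } else {
            /* Leftover text runs until whitespace or the next known token. */
            tok->type = "NULLTOK";
            while (p[len] != '\0' && !isspace((unsigned char)p[len]) &&
                   match_rule(rules, nrules, p + len) == NULL) {
                len++;
            }
        }

        size_t copy = len < sizeof tok->text - 1 ? len : sizeof tok->text - 1;
        memcpy(tok->text, p, copy);
        tok->text[copy] = '\0';
        p += len;
    }
    return count;
}
```

With rules for "<", ">", and "h1", the input "<h1>\n  Hello" yields four tokens in this sketch: three typed by their rules on line 1, and a NULLTOK holding "Hello" on line 2.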
What's new with toke?
- Format based tokens!
  - With format based tokens you can do a post-processing typing operation on all remaining NULLTOK tokens
    - This means that the format matching happens AFTER parsing the whole file
  - To add a format based token, you have to provide a format validation function which takes in one T_string variable and returns a boolean value
- Format validation functions
  - Along with the format based tokens, there are a couple of pre-implemented validation functions like IsInteger, IsHex, IsFloat, and IsAlphabetic
    - These functions can be used directly for your custom tokens, or read to understand how format validation works
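To make the post-processing idea more concrete, here is a minimal standalone sketch of a custom validator and a retyping pass. The definition of T_string is not shown here, so the sketch uses plain const char* strings as a stand-in. The names is_binary, SketchToken, SketchFormat, and retype_nulltoks are hypothetical, and the "first matching format wins" rule is an assumption; Toke's own validators and internals may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in validator signature. Toke's validators take a T_string;
   a plain C string is assumed here because T_string's layout is not shown. */
typedef bool (*FormatCheck)(const char *text);

/* Example custom validator: accepts binary literals such as "0b1010". */
bool is_binary(const char *text) {
    if (text == NULL || text[0] != '0' || text[1] != 'b' || text[2] == '\0') {
        return false;
    }
    for (const char *p = text + 2; *p != '\0'; p++) {
        if (*p != '0' && *p != '1') {
            return false;
        }
    }
    return true;
}

/* Hypothetical token and format records, for illustration only. */
typedef struct {
    const char *type;  /* type name; "NULLTOK" means not yet typed */
    const char *text;  /* token text */
} SketchToken;

typedef struct {
    const char *name;   /* type name to assign on a match */
    FormatCheck check;  /* validation function */
} SketchFormat;

/* Post-processing pass: runs after the whole input has been tokenized and
   only looks at tokens still typed NULLTOK. Returns how many were retyped. */
size_t retype_nulltoks(SketchToken *toks, size_t ntoks,
                       const SketchFormat *formats, size_t nformats) {
    size_t retyped = 0;
    for (size_t i = 0; i < ntoks; i++) {
        if (strcmp(toks[i].type, "NULLTOK") != 0) {
            continue;  /* already typed during tokenization */
        }
        for (size_t f = 0; f < nformats; f++) {
            if (formats[f].check(toks[i].text)) {
                toks[i].type = formats[f].name;  /* first match wins (assumed) */
                retyped++;
                break;
            }
        }
    }
    return retyped;
}
```

In real Toke code, the equivalent of is_binary would take a T_string and be registered through IncludeFormatToken, in the same way as IsHex.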
Toke can be used in 5 simple steps:
- Create the context
    Context* CTX=CreateContext();
- Include your tokens
    IncludeToken(CTX,"OPEN TAG","<");
    IncludeToken(CTX,"CLOSE TAG",">");
    IncludeToken(CTX,"SLASH","/");
    IncludeToken(CTX,"ASSIGNMENT","=");
    IncludeToken(CTX,"QUOTATION","\"");
    IncludeToken(CTX,"SEMICOLON",";");
    // String match example
    IncludeToken(CTX,"HEADER 1","h1");
- Include any format based tokens
    // IsHex is a function with the following signature
    // bool IsHex(T_string str)
    IncludeFormatToken(CTX,"Hex",IsHex);
- Tokenize the file!
    TokenArray* ta=TokenizeFile(CTX,"./index.html");
- Free the context once you're done!
    FreeContext(CTX);

For a more "real life" application, check the example in example/example.c, where I try to parse an HTML file.
