NLP Token Visualizer

Token visualization for NLP tasks, inspired by Tiktokenizer. The idea is to let you visualize encoded text using your own vocabulary.

Import and required arguments

To use the process_text function, import it as follows:

import tokenviz
from tokenviz.visualization import process_text

Load your text and define your encode and decode functions. Both are passed as arguments to process_text:

text = "some text ..."

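# Takes text and converts it to a list of integers according to the encoding scheme.
def encode(text_to_encode):
    # Placeholder: a character-level scheme; replace with your own tokenizer.
    encoded_text = [ord(ch) for ch in text_to_encode]
    return encoded_text

# Takes a list of integers and decodes it back into text according to the decoding scheme.
def decode(text_to_decode):
    # Placeholder: the inverse of the character-level scheme above.
    decoded_text = "".join(chr(i) for i in text_to_decode)
    return decoded_text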

process_text(text, encode, decode)
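
Since the point of the tool is to plug in your own vocabulary, here is a minimal sketch of a vocabulary-based encode/decode pair. The vocabulary, the TOKEN_TO_ID lookup, and the regex splitting rule are all hypothetical and chosen only for illustration; a real tokenizer would supply its own.

```python
import re

# Hypothetical toy vocabulary; in practice this comes from your own tokenizer.
VOCAB = ["<unk>", " ", "!", "Hello", "world"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}

def encode(text_to_encode):
    """Split text into words, single whitespace characters, and punctuation,
    then map each piece to its vocabulary id (0 for unknown pieces)."""
    pieces = re.findall(r"\w+|\s|[^\w\s]", text_to_encode)
    return [TOKEN_TO_ID.get(piece, 0) for piece in pieces]

def decode(text_to_decode):
    """Map a list of vocabulary ids back to the concatenated token strings."""
    return "".join(VOCAB[i] for i in text_to_decode)

text = "Hello world!"
token_ids = encode(text)        # [3, 1, 4, 2]
round_trip = decode(token_ids)  # "Hello world!"

# The pair can then be passed on as in the call above:
# process_text(text, encode, decode)
```

Pieces missing from the toy vocabulary map to the `<unk>` id, so the round trip is only lossless for text the vocabulary covers.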

Examples

HTML example

Here is a simple example that applies the encode and decode functions defined above to a short string. Assuming encode maps each character to a number, the following...

text = 'Hello world!'
processed_text = process_text(text, encode, decode, markup='html')

generates...

<span style="background-color: Khaki;">H</span><span style="background-color: AliceBlue;">e</span><span style="background-color: Aquamarine;">l</span><span style="background-color: Coral;">l</span><span style="background-color: Lavender;">o</span><span style="background-color: Ivory;"> </span><span style="background-color: DarkSalmon;">w</span><span style="background-color: Khaki;">o</span><span style="background-color: AliceBlue;">r</span><span style="background-color: Aquamarine;">l</span><span style="background-color: Coral;">d</span><span style="background-color: Lavender;">!</span>
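
To make the structure of this output easier to follow, here is a rough sketch of how markup of this shape could be produced. It is not tokenviz's actual implementation: the colorize_html helper is hypothetical, the palette is inferred from the seven color names that repeat in the sample above, and decoding one token at a time is an assumption.

```python
from html import escape
from itertools import cycle

# Assumed palette: the seven CSS color names visible in the sample output, in order.
PALETTE = ["Khaki", "AliceBlue", "Aquamarine", "Coral", "Lavender", "Ivory", "DarkSalmon"]

def colorize_html(text, encode, decode):
    """Wrap each decoded token in a <span> whose background color cycles through PALETTE."""
    spans = []
    for token_id, color in zip(encode(text), cycle(PALETTE)):
        piece = escape(decode([token_id]))  # decode a single token and HTML-escape it
        spans.append(f'<span style="background-color: {color};">{piece}</span>')
    return "".join(spans)

# Character-level stand-ins for your own encode/decode functions.
def encode(text_to_encode):
    return [ord(ch) for ch in text_to_encode]

def decode(text_to_decode):
    return "".join(chr(i) for i in text_to_decode)

markup_out = colorize_html("Hello world!", encode, decode)
```

Cycling by token position rather than by token id means adjacent tokens always get different colors, which is what makes token boundaries visible.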

LaTeX example

Add the following packages and definitions to the preamble of your LaTeX document.

\usepackage{listings}
\usepackage{xcolor}

% Define a custom style for listings
\lstdefinestyle{custom}{
    basicstyle=\small\ttfamily, % Small font size and typewriter style
    escapeinside={(*@}{@*)},    % Escape for inline LaTeX
}

Then add your generated LaTeX code to the listing:

\begin{lstlisting}[caption=My title, label=mylabel, style=custom]
% Your LaTeX code goes here
\end{lstlisting}

Assuming encode simply maps each character to a number, the following...

text = 'Hello world!'
processed_text = process_text(text, encode, decode, markup='latex')

generates...

(*@\colorbox{yellow}{H}@*)(*@\colorbox{pink}{e}@*)(*@\colorbox{lightgray}{l}@*)(*@\colorbox{lime}{l}@*)(*@\colorbox{cyan}{o}@*)(*@\colorbox{magenta}{ }@*)(*@\colorbox{yellow}{w}@*)(*@\colorbox{pink}{o}@*)(*@\colorbox{lightgray}{r}@*)(*@\colorbox{lime}{l}@*)(*@\colorbox{cyan}{d}@*)(*@\colorbox{magenta}{!}@*)
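
If you generate many snippets, wrapping the output in the listing environment shown earlier can be scripted. The helper below is a hypothetical convenience, not part of tokenviz; it assumes the generated markup is a plain string and reuses the caption, label, and style names from the listing template above.

```python
def wrap_in_listing(latex_code, caption="My title", label="mylabel", style="custom"):
    """Place generated markup inside an lstlisting environment."""
    return (
        f"\\begin{{lstlisting}}[caption={caption}, label={label}, style={style}]\n"
        f"{latex_code}\n"
        "\\end{lstlisting}\n"
    )

# Stand-in for a string returned with markup='latex'.
generated = r"(*@\colorbox{yellow}{H}@*)(*@\colorbox{pink}{i}@*)"
listing = wrap_in_listing(generated)
print(listing)
```

The result can be pasted into the document body or written to a .tex file and pulled in with \input.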