This folder explains the usage of LogitProcessor, demonstrating how it can be used to
manipulate the raw logits before sampling the token (e.g. setting certain tokens to inf or -inf).
We demonstrate how to use it with and without a web worker, which can be toggled with USE_WEB_WORKER
in logit_processor.ts (see worker.ts on how LogitProcessor plays a role there).
We also demonstrate the usage of a low-level API forwardTokenAndSample(), which, unlike chat.completions.create()
that assumes the usage is for autoregressive chatting, here we have more fine-grained control.
See my_logit_processor.ts on how to customize your own logit processor. Here we make the logit
of token 0 100.0 manually, large enough that we should expect to always sample token 0, which
is indeed the case if we observe the console log. We also demonstarte that a LogitProcessor can be
stateful, and the state can also be cleaned with LogitProcessor.resetState().
To try it out, you can do the following steps under this folder
npm install
npm startNote if you would like to hack WebLLM core package, you can change web-llm dependencies as "file:../..", and follow the build from source instruction in the project to build webllm locally. This option is only recommended if you would like to hack WebLLM core package.