
Quantum trajectory plot #1

@jeremymanning


Let's take a book with long, engaging chapters (e.g., hyperion_djvu.txt). Divide it into chapters/sections (Prologue, Chapter 1, Chapter 2, ..., Chapter 6, Epilogue).
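A minimal sketch of the chapter split, assuming each section begins with a heading on its own line ("Prologue", "Chapter 3", "Epilogue"). The actual layout of hyperion_djvu.txt may differ, so the `HEADING` pattern and the `split_chapters` name are placeholders to adapt, not a description of the file.

```python
import re

# Assumption: section headings sit alone on a line. Adjust the pattern
# to whatever the real text file uses.
HEADING = re.compile(
    r"^(Prologue|Chapter[ \t]+\d+|Epilogue)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chapters(text):
    """Return a list of (heading, body) pairs in document order."""
    matches = list(HEADING.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(1), text[match.end():end].strip()))
    return sections
```

Anything before the first heading (front matter, scan headers) is dropped by this sketch.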

For each chapter (can be parallelized in different threads):

  • Use TinyLlama-1.1b to embed each token. Create a number-of-tokens by number-of-embedding-dimensions matrix (for this chapter).
  • For each of 100 "particles":
    • Project it forward from the end of the chapter by iteratively predicting next tokens (until we get to a stop token). If we run out of context, slide the window forward to include only the last <length-of-context-window - 1> tokens.
    • Store the embeddings of the predicted token sequences (as a list of length number-of-particles, where each entry is a number-of-tokens-for-that-particle by number-of-embedding-dimensions matrix)
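The particle loop above could look roughly like this. It is a sketch of the sliding-window logic only: `next_token_fn` is a hypothetical callable standing in for a sampled forward pass of TinyLlama, and `max_new_tokens` is a safety cap I added in case a particle never emits the stop token (not part of the plan above).

```python
def rollout(prompt_ids, next_token_fn, stop_id, context_len, max_new_tokens=512):
    """Generate token ids after prompt_ids until stop_id or the cap is hit.

    next_token_fn(window) is a hypothetical callable that returns one
    token id given the current list of context token ids.
    """
    if context_len < 2:
        raise ValueError("context_len must be at least 2")
    keep = context_len - 1
    # Keep only the last (context_len - 1) tokens, leaving room for one more.
    window = list(prompt_ids)[-keep:]
    generated = []
    for _ in range(max_new_tokens):
        token = next_token_fn(window)
        generated.append(token)
        if token == stop_id:
            break
        # Slide the window forward once the context is full.
        window = (window + [token])[-keep:]
    return generated


def run_particles(prompt_ids, next_token_fn, stop_id, context_len,
                  n_particles=100, max_new_tokens=512):
    """Run independent rollouts; returns one list of token ids per particle."""
    return [
        rollout(prompt_ids, next_token_fn, stop_id, context_len, max_new_tokens)
        for _ in range(n_particles)
    ]
```

For the particles to diverge, `next_token_fn` has to sample rather than take the argmax. Each returned id sequence would then be embedded the same way as the chapter tokens.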

Save everything as a pkl file (one per chapter).
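One possible shape for the per-chapter save, run in a thread pool as suggested above. The file naming, the dict keys, and the `embed_fn` callable (chapter text in, chapter matrix plus list of particle matrices out) are all assumptions for illustration.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def save_chapter(out_dir, name, chapter_embeddings, particle_embeddings):
    """Write one chapter's embeddings to <out_dir>/<name>.pkl."""
    path = Path(out_dir) / f"{name}.pkl"
    with open(path, "wb") as f:
        pickle.dump(
            {"chapter": chapter_embeddings, "particles": particle_embeddings}, f
        )
    return path


def process_all(chapters, embed_fn, out_dir, max_workers=4):
    """chapters: list of (name, text) pairs.

    embed_fn(text) is a hypothetical callable returning
    (chapter_embeddings, particle_embeddings) for one chapter.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    def work(item):
        name, text = item
        chapter_emb, particle_embs = embed_fn(text)
        return save_chapter(out_dir, name, chapter_emb, particle_embs)

    # pool.map returns results in the same order as the input chapters.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, chapters))
```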

Then, once all chapters' pkl files are saved out:

  • Concatenate all of the embedded tokens into one enormous total-number-of-tokens by number-of-embedding-dimensions matrix (across all chapters and particles)
  • Project into 2D using UMAP
  • Split the concatenated matrix back out into separate chapters/particles
  • Save out a pkl file with the 2D projections
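A sketch of the concatenate / project / split round trip, assuming the matrices are NumPy arrays kept in a fixed order (for example chapter 1, its particles, chapter 2, its particles, and so on). `project_and_split` and its `reduce_fn` parameter are names made up here; by default it falls back to `umap.UMAP` from the umap-learn package.

```python
import numpy as np


def project_and_split(matrices, reduce_fn=None):
    """Project a list of (n_tokens_i, n_dims) arrays into 2D together.

    Returns a list of (n_tokens_i, 2) arrays in the same order.
    """
    if reduce_fn is None:
        import umap  # provided by the umap-learn package

        reduce_fn = umap.UMAP(n_components=2).fit_transform
    lengths = [m.shape[0] for m in matrices]
    # One big (total_tokens, n_dims) matrix so every point shares one 2D space.
    stacked = np.concatenate(matrices, axis=0)
    projected = reduce_fn(stacked)
    # Row offsets where each original matrix ends (the last one is implied).
    boundaries = np.cumsum(lengths)[:-1]
    return np.split(projected, boundaries, axis=0)
```

Fitting once on the stacked matrix is what makes the panels comparable; fitting UMAP per chapter would give each panel its own unrelated coordinates. The returned list can be regrouped by chapter and pickled as in the earlier step.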

Then make a plot like this (one panel per chapter):
[Image: example plot]

The blue lines are chapter trajectories. The red lines (projecting forward from the end of each chapter) are particles' predictions. The blue dots are the starts/ends of each chapter.
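A rough matplotlib sketch of the figure described above, assuming the 2D projections are already split per chapter. The single-row layout, line widths, and the `plot_trajectories` name are my guesses, not taken from the example image.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_trajectories(chapters, titles=None):
    """chapters: list of (chapter_xy, particle_xys) pairs.

    chapter_xy is an (n_tokens, 2) array; particle_xys is a list of
    (n_tokens_p, 2) arrays, one per particle.
    """
    n = len(chapters)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), squeeze=False)
    for i, (ax, (chapter_xy, particle_xys)) in enumerate(zip(axes[0], chapters)):
        for particle_xy in particle_xys:
            # Prepend the chapter's last point so each red line starts there.
            path = np.vstack([chapter_xy[-1:], particle_xy])
            ax.plot(path[:, 0], path[:, 1], color="red", alpha=0.2, linewidth=0.5)
        # Blue line: the chapter's own trajectory.
        ax.plot(chapter_xy[:, 0], chapter_xy[:, 1], color="blue")
        # Blue dots: start and end of the chapter.
        ends = chapter_xy[[0, -1]]
        ax.scatter(ends[:, 0], ends[:, 1], color="blue", zorder=3)
        if titles is not None:
            ax.set_title(titles[i])
    return fig
```

Calling `fig.savefig("trajectories.png")` on the result writes the figure to disk.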
