Experiment: k-mer tokenization for promoters

## Description

Instead of character-level tokenization, try the k-mer tokenizers https://huggingface.co/bolinas-dna/tokenizer-4-mer and https://huggingface.co/bolinas-dna/tokenizer-8-mer. Train on the promoter dataset and evaluate on zero-shot variant effect prediction (VEP).

## Hypothesis or Goal

Tokenization can influence both downstream task performance and training/inference speed. Character-level tokenization is a good default, but other options are worth exploring.
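
To make the speed trade-off concrete, here is a minimal sketch of k-mer tokenization in plain Python. It assumes non-overlapping windows over the alphabet `ACGT` and drops any trailing remainder shorter than `k`; whether the linked `bolinas-dna` tokenizers behave this way (windowing, handling of `N`, special tokens) has not been checked here. The helper names `build_kmer_vocab` and `kmer_tokenize` are hypothetical and exist only for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumption: no N or other ambiguity codes


def build_kmer_vocab(k):
    """Map every k-mer over ACGT to an integer id (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


def kmer_tokenize(seq, k):
    """Split seq into non-overlapping k-mers.

    Assumption: a trailing remainder shorter than k is dropped.
    """
    usable = len(seq) - len(seq) % k
    return [seq[i:i + k] for i in range(0, usable, k)]


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGT"  # toy 16 bp sequence, not real promoter data
    for k in (1, 4, 8):
        vocab = build_kmer_vocab(k)
        tokens = kmer_tokenize(seq, k)
        ids = [vocab[t] for t in tokens]
        print(f"k={k}: vocab size={len(vocab)}, tokens={len(tokens)}, ids={ids}")
```

Under these assumptions the token count shrinks by a factor of `k` while the vocabulary grows as `4**k` (4, 256, and 65,536 entries for k = 1, 4, and 8). Shorter sequences are the likely source of any training/inference speedup. The cost is a larger embedding table, and a single-nucleotide variant now changes a whole k-mer token, which may matter for zero-shot VEP.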

Experiment: k-mer tokenization for promoters #64