Commit 9b28614

Work on regionset and tokenizers
1 parent 1c607c8 commit 9b28614

4 files changed: 39 additions and 4 deletions
docs/gtars/models.md

Lines changed: 28 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Models and RegionSet objects in Gtars
+# Models and Region Set objects in Gtars

 Gtars has multiple objects (structs/models) for representation of genomic regions and other related data.

@@ -37,13 +37,39 @@ Region is Python representation of a genomic region. e.g. `chr1:100-200` + addit

 ```

+=== "TypeScript"
+
+❗ Note: This is test example and may require additional setup to run.
+
+```typescript
+import init from '@databio/gtars';
+import { RegionSet } from '@databio/gtars';
+
+init();
+
+export type BedEntry1 = [string, number, number, string];
+
+// Define entries (regions)
+export const entries1: BedEntry1[] = [
+  ['chr1', 100, 200, 'peak1'],
+  ['chr2', 150, 250, 'peak2'],
+  ['chr3', 300, 400, 'peak3'],
+];
+
+// Create a Region
+const rs = new RegionSet(entries1);
+
+console.log(rs);
+
+```
+

 ### 🟢 RegionSet

 RegionSet is Python representation of a genomic region set, commonly named as BED file.


-#### 🧪 Quick example
+#### Quick example
 Open BED file from URL and get its identifier.

 === "Python"

File renamed without changes.

docs/gtars/tokenizers.md

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,15 @@
11
# gtars-tokenizers
22

3-
Genomic region tokenizers for machine learning applications.
3+
The gtars package contains genomic tokenizers module.
4+
These are used to convert genomic interval data from disparate sources into a consistent universe or consensus set.
5+
Primarily, the tokenizers are used to standardize input into our machine learning models.
6+
7+
<p align="center">
8+
<img align="center" src="../img/tokenization.svg" width="600" />
9+
</p>
10+
11+
A minimal tokenizer requires a bedfile.
12+
Once instantiated, this tokenizer can be used to tokenize new genomic interval data into the model's vocabulary.
413

514
## Features
615

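The tokenization described in the new tokenizers.md text (mapping arbitrary intervals onto a fixed universe that acts as the vocabulary) can be sketched in a few lines of TypeScript. This is only an illustration of the idea under stated assumptions, not the gtars API: the `Region` type and the `universe`, `overlaps`, and `tokenize` names are hypothetical, intervals are treated as half-open `[start, end)` as in BED files, and a query that overlaps nothing simply yields no token, which a real tokenizer may handle differently.

```typescript
// Hypothetical types and names for illustration only; this is not the gtars API.
type Region = { chr: string; start: number; end: number };

// The universe plays the role of the vocabulary: a region's index is its token ID.
const universe: Region[] = [
  { chr: 'chr1', start: 100, end: 200 },
  { chr: 'chr1', start: 300, end: 400 },
  { chr: 'chr2', start: 150, end: 250 },
];

// Two half-open intervals [start, end) overlap when each starts before the other ends.
function overlaps(a: Region, b: Region): boolean {
  return a.chr === b.chr && a.start < b.end && b.start < a.end;
}

// Map each query region to the IDs of every universe region it overlaps.
function tokenize(queries: Region[], vocab: Region[]): number[] {
  const tokens: number[] = [];
  for (const q of queries) {
    vocab.forEach((u, id) => {
      if (overlaps(q, u)) tokens.push(id);
    });
  }
  return tokens;
}

const tokens = tokenize(
  [
    { chr: 'chr1', start: 150, end: 320 }, // overlaps universe regions 0 and 1
    { chr: 'chr2', start: 10, end: 50 },   // overlaps nothing, so it yields no token
    { chr: 'chr2', start: 240, end: 260 }, // overlaps universe region 2
  ],
  universe,
);
console.log(tokens);
```

The linear scan keeps the sketch short; with a large universe, an interval tree or a sorted index per chromosome would be the usual choice.
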
mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ nav:
 - gtars-bbcache: gtars/bbcache.md
 - Bindings:
 - Python:
-- Python: gtars/python.md
+- Overview: gtars/python-overview.md
 - Digests: gtars/python/digests.md
 - RefgetStore: gtars/python/refgetstore.md
 - Tokenizers: gtars/python/tokenizers.md