BiocPy
diff --git a/.github/workflows/run-tests.yml b/.github/workflows/run-tests.yml
Lines changed: 1 addition & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 7 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 13 additions & 12 deletions
diff --git a/docs/conf.py b/docs/conf.py
Lines changed: 1 addition & 0 deletions
diff --git a/docs/tutorial.md b/docs/tutorial.md
Lines changed: 13 additions & 38 deletions
diff --git a/setup.cfg b/setup.cfg
Lines changed: 4 additions & 3 deletions
@@ -28,7 +28,7 @@ jobs:
   test:
     strategy:
       matrix:
-        python: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+        python: ["3.10", "3.11", "3.12", "3.13"]
         platform:
           - ubuntu-latest
           - macos-latest
 
@@ -1,5 +1,12 @@
 # Changelog
 
+## Version 0.8.0
+
+- Rename module files to follow PEP guidelines
+- Rename `GenomicRangesList` to `CompressedGenomicRangesList` and now extends compressed-lists
+- Classes extend `BiocObject` from biocutils, provides a default metadata attribute and helper functions.
+- rename `validate` to `_validate` for consistency with the rest of the packages and classes.
+
 ## Version 0.7.0 - 0.7.3
 
 - Changes to switch to LTLA/nclist-cpp in the iranges package for overlap and search operations.
@@ -62,7 +69,6 @@ An rewrite of the package to use the new and improve IRanges packages (>= 0.4.2)
 - Coerce `GenomicRangesList` to `GenomicRanges`.
 - Add tests and documentation.
 
-
 ## Version 0.4.21 - 0.4.24
 
 - Optimize `intersect` operation on large number of genomic regions
 
@@ -48,6 +48,7 @@ print(len(gg), len(df))
 
     ## output
     ## 77 77> [!NOTE]
+
 > `ends` are expected to be inclusive to be consistent with Bioconductor representations. If they are not, we recommend subtracting 1 from the `ends`.
 
 #### UCSC or GTF file
@@ -212,16 +213,16 @@ print(hits)
     [1]                1          1677082
     [2]                2          1003411
 
-## `GenomicRangesList`
+## `CompressedGenomicRangesList`
 
-Just as it sounds, a `GenomicRangesList` is a named-list like object. If you are wondering why you need this class, a `GenomicRanges` object lets us specify multiple genomic elements, usually where the genes start and end. Genes are themselves made of many sub-regions, e.g. exons. `GenomicRangesList` allows us to represent this nested structure.
+Just as it sounds, a `CompressedGenomicRangesList` is a named-list like object. If you are wondering why you need this class, a `GenomicRanges` object lets us specify multiple genomic elements, usually where the genes start and end. Genes are themselves made of many sub-regions, e.g. exons. `CompressedGenomicRangesList` allows us to represent this nested structure.
 
 **Currently, this class is limited in functionality.**
 
-To construct a GenomicRangesList
+To construct a CompressedGenomicRangesList
 
 ```python
-from genomicranges import GenomicRanges, GenomicRangesList
+from genomicranges import GenomicRanges, CompressedGenomicRangesList
 from iranges import IRanges
 from biocframe import BiocFrame
 
@@ -238,12 +239,12 @@ gr2 = GenomicRanges(
     strand=["-", "+", "*"],
     mcols=BiocFrame({"score": [2, 3, 4]}),
 )
-grl = GenomicRangesList(ranges=[gr1, gr2], names=["gene1", "gene2"])
+grl = CompressedGenomicRangesList.from_list(lst=[gr1, gr2], names=["gene1", "gene2"])
 print(grl)
 ```
 
     ## output
-    GenomicRangesList with 2 ranges and 2 metadata columns
+    CompressedGenomicRangesList with 2 ranges and 2 metadata columns
 
     Name: gene1
     GenomicRanges with 4 ranges and 4 metadata columns
@@ -270,12 +271,12 @@ print(grl)
 
 Performance comparison between Python and R GenomicRanges implementations. The query dataset contains approximately 564,000 intervals, while the subject dataset contains approximately 71 million intervals.
 
-| Operation | Python/GenomicRanges | Python/GenomicRanges (5 threads) | R/GenomicRanges |
-|-----------|---------------------|-----------------------------------|-----------------|
-| Overlap | 2.80s | 2.06s | 4.40s |
-| Overlap (single chromosome) | 6.73s | 5.19s | 10.06s |
-| Nearest | 2.27s | 1.5s | 42.16s |
-| Nearest (single chromosome) | 4.7s | 4.67s | 11.01s |
+| Operation                   | Python/GenomicRanges | Python/GenomicRanges (5 threads) | R/GenomicRanges |
+| --------------------------- | -------------------- | -------------------------------- | --------------- |
+| Overlap                     | 2.80s                | 2.06s                            | 4.40s           |
+| Overlap (single chromosome) | 6.73s                | 5.19s                            | 10.06s          |
+| Nearest                     | 2.27s                | 1.5s                             | 42.16s          |
+| Nearest (single chromosome) | 4.7s                 | 4.67s                            | 11.01s          |
 
 > [!NOTE]
 > The single chromosome benchmark ignores chromosome/sequence information and performs overlap operations solely on intervals.
 
@@ -315,6 +315,7 @@
     "biocutils": ("https://biocpy.github.io/BiocUtils", None),
     "iranges": ("https://biocpy.github.io/IRanges", None),
     "polars": ("https://docs.pola.rs/api/python/stable/", None),
+    "compressed-lists": ("https://biocpy.github.io/compressed-lists", None),
 }
 
 print(f"loading configurations for {project} {version} ...", file=sys.stderr)
@@ -10,7 +10,7 @@ kernelspec:
 
 An `IRanges` holds a **start** position and a **width**, and is typically used to represent coordinates along a genomic sequence. The interpretation of the **start** position depends on the application; for sequences, the **start** is usually a 1-based position, but other use cases may allow zero or even negative values, e.g., circular genomes. Ends are considered inclusive. `IRanges` uses [LTLa/nclist-cpp](https://github.com/LTLA/nclist-cpp) under the hood to perform fast overlap and search-based operations.
 
-The package provides a `GenomicRanges` class to specify multiple genomic elements, typically where genes start and end. Genes are themselves made of many subregions, such as exons, and a `GenomicRangesList` enables the representation of this nested structure.
+The package provides a `GenomicRanges` class to specify multiple genomic elements, typically where genes start and end. Genes are themselves made of many subregions, such as exons, and a `CompressedGenomicRangesList` enables the representation of this nested structure.
 
 Moreover, the package also provides a `SeqInfo` class to update or modify sequence information stored in the object. Learn more about this in the [GenomeInfoDb package](https://bioconductor.org/packages/release/bioc/html/GenomeInfoDb.html).
 
@@ -68,10 +68,9 @@ human_gr = genomicranges.read_ucsc(genome="hg19")
 print(human_gr)
 ```
 
-
 ## Preferred way
 
-To construct a `GenomicRanges` object, we need to provide sequence information and genomic coordinates. This is achieved through the combination of the `seqnames` and `ranges` parameters. Additionally, you have the option to specify the `strand`, represented as a list of "+" (or 1) for the forward strand, "-" (or -1) for the reverse strand, or "*" (or 0) if the strand is unknown. You can also provide a NumPy vector that utilizes either the string or numeric representation to specify the `strand`. Optionally, you can use the `mcols` parameter to provide additional metadata about each genomic region.
+To construct a `GenomicRanges` object, we need to provide sequence information and genomic coordinates. This is achieved through the combination of the `seqnames` and `ranges` parameters. Additionally, you have the option to specify the `strand`, represented as a list of "+" (or 1) for the forward strand, "-" (or -1) for the reverse strand, or "\*" (or 0) if the strand is unknown. You can also provide a NumPy vector that utilizes either the string or numeric representation to specify the `strand`. Optionally, you can use the `mcols` parameter to provide additional metadata about each genomic region.
 
 ```{code-cell}
 from genomicranges import GenomicRanges
@@ -427,7 +426,7 @@ print(binned_avg_gr)
 ```
 
 ::: {tip}
-Now you might wonder how can I generate these ***bins***?
+Now you might wonder how can I generate these **_bins_**?
 :::
 
 # Generate tiles or bins
@@ -469,7 +468,7 @@ print(tiles)
 ```{code-cell}
 seqlengths = {"chr1": 100, "chr2": 75, "chr3": 200}
 
-tiles = GenomicRanges.tile_genome(seqlengths=seqlengths, n=10)
+tiles = GenomicRanges.tile_genome(seqlengths=seqlengths, ntile=10)
 print(tiles)
 ```
 
@@ -547,8 +546,6 @@ query_hits = gr.nearest(find_regions)
 
 query_hits = gr.precede(find_regions)
 
-query_hits = gr.follow(find_regions)
-
 print(query_hits)
 ```
 
@@ -609,7 +606,7 @@ print(combined)
 # Misc operations
 
 - **invert_strand**: flip the strand for each interval
-- **sample**: randomly choose ***k*** intervals
+- **sample**: randomly choose **_k_** intervals
 
 ```{code-cell}
 # invert strand
@@ -619,20 +616,22 @@ inv_gr = gr.invert_strand()
 samp_gr = gr.sample(k=4)
 ```
 
-# `GenomicRangesList` class
+# `CompressedGenomicRangesList` class
 
-Just as it sounds, a `GenomicRangesList` is a named-list like object.
+Just as it sounds, a `CompressedGenomicRangesList` is a named-list like object.
 
 If you are wondering why you need this class, a `GenomicRanges` object enables the
 specification of multiple genomic elements, usually where genes start and end.
 Genes, in turn, consist of various subregions, such as exons.
-The `GenomicRangesList` allows us to represent this nested structure.
+The `CompressedGenomicRangesList` allows us to represent this nested structure.
 
 As of now, this class has limited functionality, serving as a read-only class with basic accessors.
 
 ```{code-cell}
+from genomicranges import CompressedGenomicRangesList, GenomicRanges
+from iranges import IRanges
+from biocframe import BiocFrame
 
-from genomicranges import GenomicRangesList
 a = GenomicRanges(
     seqnames=["chr1", "chr2", "chr1", "chr3"],
     ranges=IRanges([1, 3, 2, 4], [10, 30, 50, 60]),
@@ -647,33 +646,17 @@ b = GenomicRanges(
     mcols=BiocFrame({"score": [2, 3, 4]}),
 )
 
-grl = GenomicRangesList(ranges=[a,b], names=["gene1", "gene2"])
+grl = CompressedGenomicRangesList.from_list(lst=[a,b], names=["gene1", "gene2"])
 print(grl)
 ```
 
-
 ## Properties
 
 ```{code-cell}
 grl.start
 grl.width
 ```
 
-## Combine `GenomicRangeslist` object
-
-Similar to the combine function from `GenomicRanges`,
-
-```{code-cell}
-grla = GenomicRangesList(ranges=[a], names=["a"])
-grlb = GenomicRangesList(ranges=[b, a], names=["b", "c"])
-
-# or use the combine generic
-from biocutils.combine import combine
-cgrl = combine(grla, grlb)
-```
-
-The functionality in `GenomicRangesLlist` is limited to read-only and a few methods. Updates are expected to be made as more features become available.
-
 ## Empty ranges
 
 Both of these classes can also contain no range information, and they tend to be useful when incorporates into larger data structures but do not contain any data themselves.
@@ -686,15 +669,7 @@ empty_gr = GenomicRanges.empty()
 print(empty_gr)
 ```
 
-Similarly, an empty `GenomicRangesList` can be created:
-
-```{code-cell}
-empty_grl = GenomicRangesList.empty(n=100)
-
-print(empty_grl)
-```
-
-----
+---
 
 ## Futher reading
 
 
@@ -49,10 +49,11 @@ python_requires = >=3.9
 # For more information, check out https://semver.org/.
 install_requires =
     importlib-metadata; python_version<"3.8"
-    biocframe>=0.6.2
-    iranges>=0.5.4
-    biocutils>=0.2.1
+    biocframe>=0.7.1
+    iranges>=0.7.0
+    biocutils>=0.3.1
     numpy
+    compressed_lists>=0.4.0
 
 [options.packages.find]
 where = src
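
Note on the README hunk: it states that `ends` are expected to be inclusive and recommends subtracting 1 when they are not. The tutorial hunk adds that an `IRanges` holds a start and a width, so the width also has to be derived consistently. The following is a minimal sketch of that arithmetic in plain Python, assuming the starts are already in the expected coordinate system and only the ends are exclusive. The helper name `half_open_to_inclusive` is hypothetical and is not part of `genomicranges` or `iranges`.

```python
from typing import List, Sequence, Tuple


def half_open_to_inclusive(
    starts: Sequence[int], ends: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Convert exclusive ends to inclusive ends and compute widths.

    Hypothetical helper for illustration only; not a genomicranges API.
    """
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")

    inclusive_ends = []
    widths = []
    for start, end in zip(starts, ends):
        if end < start:
            raise ValueError(f"end ({end}) is smaller than start ({start})")
        # Subtract 1 so the end position is part of the interval.
        inclusive_ends.append(end - 1)
        # With an inclusive end, width = (end - 1) - start + 1 = end - start.
        widths.append(end - start)
    return inclusive_ends, widths


starts = [101, 201, 301]
exclusive_ends = [112, 203, 302]
ends, widths = half_open_to_inclusive(starts, exclusive_ends)
print(ends)    # [111, 202, 301]
print(widths)  # [11, 2, 1]
```

The resulting starts and widths are the pair of values that the `IRanges([...], [...])` calls in the README and tutorial examples take.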