
Commit d4e56a6

Docs oriented cleanup and some minor refactoring (#2432)
The "major" pieces are:

1. Pull in the README as the main docs for the `vortex` crate.
2. Remove some things from the top-level namespace of `vortex-array`, which is still too busy IMO, and make a few others private.
3. Remove some unused code, and consolidate modules made up of many small files into fewer, nicely sized files.

You can see the current state/changes by running:

```bash
cargo doc --no-deps --open -p vortex
```
1 parent 27dd1f2 commit d4e56a6

File tree

20 files changed

+143
-238
lines changed


README.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -5,8 +5,8 @@
55
[![Documentation](https://docs.rs/vortex-array/badge.svg)](https://docs.rs/vortex-array)
66
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/vortex-array)](https://pypi.org/project/vortex-array/)
77

8-
> [!TIP]
9-
> Check out the [Docs](https://docs.vortex.dev/)
8+
> \[!TIP\]
9+
> Check out our [Docs](https://docs.vortex.dev/)
1010
1111
Vortex is an extensible, state-of-the-art columnar file format, with associated tools for working with compressed Apache
1212
Arrow arrays
@@ -21,7 +21,7 @@ decompression on GPUs.
2121
Vortex is intended to be to columnar file formats what Apache DataFusion is to query engines: highly extensible,
2222
extremely fast, & batteries-included.
2323

24-
> [!CAUTION]
24+
> \[!CAUTION\]
2525
> This library is still under rapid development and is a work in progress!
2626
>
2727
> Some key features are not yet implemented, both the API and the serialized format are likely to change in breaking

vortex-array/src/aliases/mod.rs

Lines changed: 10 additions & 0 deletions
@@ -1,4 +1,14 @@
1+
//! Re-exports of third-party crates we use in the API.
2+
//!
3+
//! The HashMap/Set should be preferred over the standard library variants or other alternatives.
4+
//! Currently defers to the excellent [hashbrown](https://docs.rs/hashbrown/latest/hashbrown/) crate.
5+
16
pub mod hash_map;
27
pub mod hash_set;
38

49
pub use hashbrown::DefaultHashBuilder;
10+
11+
pub mod paste {
12+
//! Re-export of [`paste`](https://docs.rs/paste/latest/paste/).
13+
pub use paste::paste;
14+
}
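
The point of an alias module like this is that call sites import collection types from one place, so the backing crate can be swapped without touching them. The following is a minimal sketch of that pattern, not Vortex code: the local `aliases` module is a stand-in that re-exports `std::collections` types instead of `hashbrown` so the example has no dependencies, and `word_counts` and `distinct` are hypothetical helpers.

```rust
/// Stand-in for an alias module. It re-exports std types only so this sketch
/// needs no external crates; the module in the diff re-exports hashbrown instead.
mod aliases {
    pub mod hash_map {
        pub use std::collections::HashMap;
    }
    pub mod hash_set {
        pub use std::collections::HashSet;
    }
}

// Call sites import through the alias module, never from the backing crate directly.
use aliases::hash_map::HashMap;
use aliases::hash_set::HashSet;

/// Hypothetical helper: counts how often each word occurs.
fn word_counts(words: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in words {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical helper: collects the distinct words.
fn distinct(words: &[&str]) -> HashSet<String> {
    words.iter().map(|w| w.to_string()).collect()
}
```

Changing the two `pub use` lines inside `aliases` is then enough to switch every caller to a different map or set implementation with a compatible API.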

vortex-array/src/arrays/chunked/compute/min_max.rs

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ use vortex_scalar::Scalar;
33

44
use crate::arrays::{ChunkedArray, ChunkedEncoding};
55
use crate::compute::{min_max, MinMaxFn, MinMaxResult};
6-
use crate::{partial_max, partial_min};
6+
use crate::partial_ord::{partial_max, partial_min};
77

88
impl MinMaxFn<ChunkedArray> for ChunkedEncoding {
99
fn min_max(&self, array: &ChunkedArray) -> VortexResult<Option<MinMaxResult>> {

vortex-array/src/arrays/constant/variants.rs

Lines changed: 1 addition & 13 deletions
@@ -1,10 +1,8 @@
11
use vortex_dtype::FieldName;
2-
use vortex_error::{VortexError, VortexExpect as _, VortexResult};
3-
use vortex_scalar::Scalar;
2+
use vortex_error::VortexResult;
43

54
use crate::arrays::constant::ConstantArray;
65
use crate::arrays::ConstantEncoding;
7-
use crate::iter::Accessor;
86
use crate::variants::{
97
BinaryArrayTrait, BoolArrayTrait, ExtensionArrayTrait, ListArrayTrait, NullArrayTrait,
108
PrimitiveArrayTrait, StructArrayTrait, Utf8ArrayTrait,
@@ -57,16 +55,6 @@ impl NullArrayTrait for ConstantArray {}
5755

5856
impl BoolArrayTrait for ConstantArray {}
5957

60-
impl<T> Accessor<T> for ConstantArray
61-
where
62-
T: Clone,
63-
T: TryFrom<Scalar, Error = VortexError>,
64-
{
65-
fn value_unchecked(&self, _index: usize) -> T {
66-
T::try_from(self.scalar()).vortex_expect("Failed to convert scalar to value")
67-
}
68-
}
69-
7058
impl PrimitiveArrayTrait for ConstantArray {}
7159

7260
impl Utf8ArrayTrait for ConstantArray {}

vortex-array/src/arrays/primitive/mod.rs

Lines changed: 1 addition & 27 deletions
@@ -1,5 +1,5 @@
11
use std::fmt::{Debug, Display};
2-
use std::{iter, ptr};
2+
use std::iter;
33
mod accessor;
44

55
use arrow_buffer::BooleanBufferBuilder;
@@ -11,7 +11,6 @@ use vortex_mask::Mask;
1111

1212
use crate::builders::ArrayBuilder;
1313
use crate::encoding::encoding_ids;
14-
use crate::iter::Accessor;
1514
use crate::stats::StatsSet;
1615
use crate::validity::{Validity, ValidityMetadata};
1716
use crate::variants::PrimitiveArrayTrait;
@@ -302,31 +301,6 @@ impl VariantsVTable<PrimitiveArray> for PrimitiveEncoding {
302301
}
303302
}
304303

305-
impl<T: NativePType> Accessor<T> for PrimitiveArray {
306-
#[inline]
307-
fn value_unchecked(&self, index: usize) -> T {
308-
self.as_slice::<T>()[index]
309-
}
310-
311-
#[inline]
312-
fn decode_batch(&self, start_idx: usize) -> Vec<T> {
313-
let batch_size = <Self as Accessor<T>>::batch_size(self, start_idx);
314-
let mut v = Vec::<T>::with_capacity(batch_size);
315-
let null_slice = self.as_slice::<T>();
316-
317-
unsafe {
318-
v.set_len(batch_size);
319-
ptr::copy_nonoverlapping(
320-
null_slice.as_ptr().add(start_idx),
321-
v.as_mut_ptr(),
322-
batch_size,
323-
);
324-
}
325-
326-
v
327-
}
328-
}
329-
330304
impl PrimitiveArrayTrait for PrimitiveArray {}
331305

332306
impl<T: NativePType> FromIterator<T> for PrimitiveArray {

vortex-array/src/children.rs

Lines changed: 0 additions & 40 deletions
This file was deleted.

vortex-array/src/compress.rs

Lines changed: 6 additions & 3 deletions
@@ -5,13 +5,16 @@ use crate::aliases::hash_set::HashSet;
55
use crate::stats::PRUNING_STATS;
66
use crate::{Array, EncodingId};
77

8+
/// Extendable compression interface, allowing implementations to explore different choices.
89
pub trait CompressionStrategy {
10+
/// Compress input array.
911
fn compress(&self, array: &Array) -> VortexResult<Array>;
1012

13+
/// A set of the IDs of the encodings the compressor can choose from.
1114
fn used_encodings(&self) -> HashSet<EncodingId>;
1215
}
1316

14-
/// Check that compression did not alter the length of the validity array.
17+
/// Verify that compression did not alter the length of the validity array.
1518
pub fn check_validity_unchanged(arr: &Array, compressed: &Array) {
1619
let _ = arr;
1720
let _ = compressed;
@@ -37,7 +40,7 @@ pub fn check_validity_unchanged(arr: &Array, compressed: &Array) {
3740
}
3841
}
3942

40-
/// Check that compression did not alter the dtype
43+
/// Verify that compression did not alter the dtype.
4144
pub fn check_dtype_unchanged(arr: &Array, compressed: &Array) {
4245
let _ = arr;
4346
let _ = compressed;
@@ -54,7 +57,7 @@ pub fn check_dtype_unchanged(arr: &Array, compressed: &Array) {
5457
}
5558
}
5659

57-
// Check that compression preserved the statistics.
60+
/// Verify that compression preserved the statistics.
5861
pub fn check_statistics_unchanged(arr: &Array, compressed: &Array) {
5962
let _ = arr;
6063
let _ = compressed;
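
To make the shape of the `CompressionStrategy` trait concrete, here is a self-contained sketch of one possible implementation. Everything in it is an assumption made for illustration: `Column` stands in for `Array`, `Result<_, String>` for `VortexResult`, a string slice for `EncodingId`, and `ConstantOnly` and `check_len_unchanged` are hypothetical names. It shows the two trait methods and a post-compression check in the spirit of the `check_*_unchanged` helpers, not the real compressor.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `EncodingId`.
type EncodingId = &'static str;

/// Hypothetical stand-in for `Array`: either plain values or a constant run.
#[derive(Clone, Debug, PartialEq)]
enum Column {
    Plain(Vec<i64>),
    Constant { value: i64, len: usize },
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Plain(values) => values.len(),
            Column::Constant { len, .. } => *len,
        }
    }
}

/// Same shape as the trait in the diff, with stand-in types.
trait CompressionStrategy {
    /// Compress the input array.
    fn compress(&self, array: &Column) -> Result<Column, String>;

    /// The IDs of the encodings the compressor can choose from.
    fn used_encodings(&self) -> HashSet<EncodingId>;
}

/// A strategy that only knows how to collapse all-equal columns.
struct ConstantOnly;

impl CompressionStrategy for ConstantOnly {
    fn compress(&self, array: &Column) -> Result<Column, String> {
        if let Column::Plain(values) = array {
            if let Some(&first) = values.first() {
                if values.iter().all(|&v| v == first) {
                    return Ok(Column::Constant { value: first, len: values.len() });
                }
            }
        }
        // Nothing to gain: return the input unchanged.
        Ok(array.clone())
    }

    fn used_encodings(&self) -> HashSet<EncodingId> {
        HashSet::from(["constant"])
    }
}

/// Verify that compression did not alter the length (panics otherwise).
fn check_len_unchanged(arr: &Column, compressed: &Column) {
    assert_eq!(arr.len(), compressed.len(), "compression altered the length");
}
```

A caller would run `ConstantOnly.compress(&input)` and then pass both columns to `check_len_unchanged` before using the result.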

vortex-array/src/data/mod.rs

Lines changed: 32 additions & 8 deletions
@@ -18,8 +18,9 @@ use crate::encoding::{Encoding, EncodingId};
1818
use crate::iter::{ArrayIterator, ArrayIteratorAdapter};
1919
use crate::stats::{Precision, Stat, StatsSet};
2020
use crate::stream::{ArrayStream, ArrayStreamAdapter};
21+
use crate::visitor::{ChildrenVisitor, NamedChildrenVisitor};
2122
use crate::vtable::{EncodingVTable, VTableRef};
22-
use crate::{ArrayChildrenIterator, ChildrenCollector, ContextRef, NamedChildrenCollector};
23+
use crate::ContextRef;
2324

2425
mod owned;
2526
mod statistics;
@@ -274,22 +275,22 @@ impl Array {
274275
match &self.0 {
275276
InnerArray::Owned(d) => d.children.to_vec(),
276277
InnerArray::Viewed(_) => {
277-
let mut collector = ChildrenCollector::default();
278+
let mut visitor = ChildrenVisitor::default();
278279
self.vtable()
279-
.accept(self, &mut collector)
280+
.accept(self, &mut visitor)
280281
.vortex_expect("Failed to get children");
281-
collector.children()
282+
visitor.children
282283
}
283284
}
284285
}
285286

286287
/// Returns a Vec of Arrays with all the array's child arrays.
287288
pub fn named_children(&self) -> Vec<(String, Array)> {
288-
let mut collector = NamedChildrenCollector::default();
289+
let mut visitor = NamedChildrenVisitor::default();
289290
self.vtable()
290-
.accept(&self.clone(), &mut collector)
291+
.accept(&self.clone(), &mut visitor)
291292
.vortex_expect("Failed to get children");
292-
collector.children()
293+
visitor.children
293294
}
294295

295296
/// Returns the number of child arrays
@@ -300,7 +301,7 @@ impl Array {
300301
}
301302
}
302303

303-
pub fn depth_first_traversal(&self) -> ArrayChildrenIterator {
304+
pub fn depth_first_traversal(&self) -> impl Iterator<Item = Array> {
304305
ArrayChildrenIterator::new(self.clone())
305306
}
306307

@@ -461,3 +462,26 @@ impl Iterator for ArrayChunkIterator {
461462
}
462463
}
463464
}
465+
466+
/// A depth-first pre-order iterator over an Array.
467+
struct ArrayChildrenIterator {
468+
stack: Vec<Array>,
469+
}
470+
471+
impl ArrayChildrenIterator {
472+
pub fn new(array: Array) -> Self {
473+
Self { stack: vec![array] }
474+
}
475+
}
476+
477+
impl Iterator for ArrayChildrenIterator {
478+
type Item = Array;
479+
480+
fn next(&mut self) -> Option<Self::Item> {
481+
let next = self.stack.pop()?;
482+
for child in next.children().into_iter().rev() {
483+
self.stack.push(child);
484+
}
485+
Some(next)
486+
}
487+
}
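
The stack-based traversal can be tried in isolation. The sketch below assumes a hypothetical `Node` type in place of `Array` and a hypothetical `PreOrder` iterator in place of `ArrayChildrenIterator`; it mirrors the approach in the diff (a private iterator behind an `impl Iterator` return type, children pushed in reverse) but is not Vortex code.

```rust
/// Hypothetical stand-in for `Array`: a named node with child nodes.
#[derive(Clone, Debug)]
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &'static str, children: Vec<Node>) -> Self {
        Self { name, children }
    }

    /// Depth-first pre-order traversal; the concrete iterator type stays hidden.
    fn depth_first_traversal(&self) -> impl Iterator<Item = Node> {
        PreOrder { stack: vec![self.clone()] }
    }
}

/// Private iterator that keeps the nodes still to be visited on a stack.
struct PreOrder {
    stack: Vec<Node>,
}

impl Iterator for PreOrder {
    type Item = Node;

    fn next(&mut self) -> Option<Node> {
        let next = self.stack.pop()?;
        // Push children in reverse so the first child is popped (visited) first.
        for child in next.children.iter().rev() {
            self.stack.push(child.clone());
        }
        Some(next)
    }
}

/// Builds a small tree and returns the names in visit order.
fn visit_order() -> Vec<&'static str> {
    let tree = Node::new(
        "root",
        vec![
            Node::new("a", vec![Node::new("a1", vec![]), Node::new("a2", vec![])]),
            Node::new("b", vec![]),
        ],
    );
    tree.depth_first_traversal().map(|n| n.name).collect()
}
```

For this tree, `visit_order()` yields `root, a, a1, a2, b`: each parent is emitted before its children, and siblings keep their original order.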

vortex-array/src/encoding/opaque.rs

Lines changed: 6 additions & 6 deletions
@@ -1,3 +1,9 @@
1+
//! An encoding of an array that we cannot interpret.
2+
//!
3+
//! Vortex allows for pluggable encodings. This can lead to issues when one process produces a file
4+
//! using a custom encoding, and then another process without knowledge of the encoding attempts
5+
//! to read it.
6+
17
use std::any::Any;
28
use std::fmt::{Debug, Formatter};
39

@@ -12,12 +18,6 @@ use crate::vtable::{
1218
};
1319
use crate::{Array, Canonical};
1420

15-
/// An encoding of an array that we cannot interpret.
16-
///
17-
/// Vortex allows for pluggable encodings. This can lead to issues when one process produces a file
18-
/// using a custom encoding, and then another process without knowledge of the encoding attempts
19-
/// to read it.
20-
///
2121
/// `OpaqueEncoding` allows deserializing these arrays. Many common operations will fail, but it
2222
/// allows deserialization and introspection in a type-erased manner on the children and metadata.
2323
///
Lines changed: 41 additions & 1 deletion
@@ -1,11 +1,51 @@
1+
//! Iterator over slices of an array, and related utilities.
2+
13
use itertools::Itertools;
4+
use vortex_dtype::DType;
25
use vortex_error::VortexResult;
36

47
use crate::arrays::ChunkedArray;
5-
use crate::iter::ArrayIterator;
68
use crate::stream::{ArrayStream, ArrayStreamAdapter};
79
use crate::{Array, IntoArray};
810

11+
/// Iterator of arrays with a known [`DType`].
12+
///
13+
/// It's up to implementations to guarantee all arrays have the same [`DType`].
14+
pub trait ArrayIterator: Iterator<Item = VortexResult<Array>> {
15+
fn dtype(&self) -> &DType;
16+
}
17+
18+
pub struct ArrayIteratorAdapter<I> {
19+
dtype: DType,
20+
inner: I,
21+
}
22+
23+
impl<I> ArrayIteratorAdapter<I> {
24+
pub fn new(dtype: DType, inner: I) -> Self {
25+
Self { dtype, inner }
26+
}
27+
}
28+
29+
impl<I> Iterator for ArrayIteratorAdapter<I>
30+
where
31+
I: Iterator<Item = VortexResult<Array>>,
32+
{
33+
type Item = VortexResult<Array>;
34+
35+
fn next(&mut self) -> Option<Self::Item> {
36+
self.inner.next()
37+
}
38+
}
39+
40+
impl<I> ArrayIterator for ArrayIteratorAdapter<I>
41+
where
42+
I: Iterator<Item = VortexResult<Array>>,
43+
{
44+
fn dtype(&self) -> &DType {
45+
&self.dtype
46+
}
47+
}
48+
949
pub trait ArrayIteratorExt: ArrayIterator {
1050
fn into_stream(self) -> impl ArrayStream
1151
where
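
The adapter pattern introduced in this file (wrap any iterator and attach a known type to it) can be sketched without the Vortex types. In the example below, `DType`, `Chunk`, `ChunkResult`, `ChunkIterator`, and `ChunkIteratorAdapter` are hypothetical stand-ins for `DType`, `Array`, `VortexResult<Array>`, `ArrayIterator`, and `ArrayIteratorAdapter`, and `total` and `example` are illustrative helpers.

```rust
/// Hypothetical stand-in for `DType`.
#[derive(Clone, Debug, PartialEq)]
enum DType {
    Int32,
}

/// Hypothetical stand-ins for `Array` and `VortexResult<Array>`.
type Chunk = Vec<i32>;
type ChunkResult = Result<Chunk, String>;

/// An iterator of chunks that also reports the type shared by all chunks.
trait ChunkIterator: Iterator<Item = ChunkResult> {
    fn dtype(&self) -> &DType;
}

/// Wraps a plain iterator and attaches a `DType` to it.
struct ChunkIteratorAdapter<I> {
    dtype: DType,
    inner: I,
}

impl<I> ChunkIteratorAdapter<I> {
    fn new(dtype: DType, inner: I) -> Self {
        Self { dtype, inner }
    }
}

impl<I: Iterator<Item = ChunkResult>> Iterator for ChunkIteratorAdapter<I> {
    type Item = ChunkResult;

    fn next(&mut self) -> Option<Self::Item> {
        // Pure delegation; the adapter adds only the dtype.
        self.inner.next()
    }
}

impl<I: Iterator<Item = ChunkResult>> ChunkIterator for ChunkIteratorAdapter<I> {
    fn dtype(&self) -> &DType {
        &self.dtype
    }
}

/// Sums all values, stopping at the first error.
fn total(iter: impl ChunkIterator) -> Result<i32, String> {
    let mut sum = 0;
    for chunk in iter {
        sum += chunk?.iter().sum::<i32>();
    }
    Ok(sum)
}

/// Wraps two in-memory chunks and returns the reported dtype and the total.
fn example() -> (DType, Result<i32, String>) {
    let chunks: Vec<ChunkResult> = vec![Ok(vec![1, 2]), Ok(vec![3])];
    let adapter = ChunkIteratorAdapter::new(DType::Int32, chunks.into_iter());
    (adapter.dtype().clone(), total(adapter))
}
```

As in the trait's doc comment, the adapter does not check that every chunk matches the declared type; that guarantee stays with whoever constructs it.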
