Commit afaa3a3

feat(enum): add Enum type (engine+python)
1 parent 3870daa commit afaa3a3

14 files changed: +132 -20 lines changed

docs/docs/core/data_types.mdx

Lines changed: 4 additions & 0 deletions
@@ -46,6 +46,7 @@ This is the list of all primitive types supported by CocoIndex:
 | *Bytes* | `bytes` | | |
 | *Str* | `str` | | |
 | *Bool* | `bool` | | |
+| *Enum* | `str`, `cocoindex.typing.Enum()` | | |
 | *Int64* | `cocoindex.Int64`, `int`, `numpy.int64` | | |
 | *Float32* | `cocoindex.Float32`, `numpy.float32` | *Float64* | |
 | *Float64* | `cocoindex.Float64`, `float`, `numpy.float64` | | |
@@ -84,6 +85,9 @@ Notes:
 In Python, it's represented by `cocoindex.Json`.
 It's useful to hold data without fixed schema known at flow definition time.
 
+#### Enum Type
+
+*Enum* represents a string-like enumerated type. In Python, use the helper from `cocoindex.typing`.
 
 #### Vector Types
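
As a rough illustration of the new docs entry, the sketch below declares a field with the Enum annotation and shows that values stay ordinary strings. `TypeKind` and `TypeAttr` are local stand-ins (assumed shapes, not imported from cocoindex) so the snippet runs without the package, and `Shirt` is a hypothetical class.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Optional, Sequence


class TypeKind:
    """Stand-in for cocoindex's TypeKind marker (assumed shape)."""

    def __init__(self, kind: str):
        self.kind = kind


class TypeAttr:
    """Stand-in for cocoindex's TypeAttr key/value attribute (assumed shape)."""

    def __init__(self, key: str, value: Any):
        self.key = key
        self.value = value


def Enum(*, variants: Optional[Sequence[str]] = None) -> Any:
    # Same body as the helper added in this commit, built on the stand-ins.
    if variants is not None:
        return Annotated[str, TypeKind("Enum"), TypeAttr("variants", list(variants))]
    return Annotated[str, TypeKind("Enum")]


@dataclass
class Shirt:  # hypothetical example class
    name: str
    color: Enum(variants=["red", "green", "blue"])


shirt = Shirt(name="tee", color="red")
print(type(shirt.color).__name__)  # str

# The annotation is metadata only: plain Python does not reject other strings.
odd = Shirt(name="tee", color="purple")
```

In this sketch nothing on the Python side enforces the variants; they only travel with the type as a schema attribute.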

docs/docs/examples/examples/docs_to_knowledge_graph.md

Lines changed: 1 addition & 1 deletion
@@ -373,4 +373,4 @@ You can open it at [http://localhost:7474](http://localhost:7474), and run the f
 MATCH p=()-->() RETURN p
 ```
 
-![Neo4j Browser](/img/examples/docs_to_knowledge_graph/neo4j_browser.png)
+![Neo4j Browser](/img/examples/docs_to_knowledge_graph/neo4j_browser.png)

docs/docs/sources/index.md

Lines changed: 2 additions & 2 deletions
@@ -17,6 +17,6 @@ In CocoIndex, a source is the data origin you import from (e.g., files, database
 | [Postgres](/docs/sources/postgres) | Relational database (Postgres) |
 
 Related:
-- [Life cycle of a indexing flow](/docs/core/basics#life-cycle-of-an-indexing-flow)
-- [Live Update Tutorial](/docs/tutorials/live_updates)
+- [Life cycle of a indexing flow](/docs/core/basics#life-cycle-of-an-indexing-flow)
+- [Live Update Tutorial](/docs/tutorials/live_updates)
 for change capture mechanisms.

docs/docs/targets/index.md

Lines changed: 0 additions & 3 deletions
@@ -334,6 +334,3 @@ You can find end-to-end examples fitting into any of supported property graphs i
 * <ExampleButton href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph" text="Docs to Knowledge Graph" margin="0 0 16px 0" />
 
 * <ExampleButton href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/product_recommendation" text="Product Recommendation" margin="0 0 16px 0" />
-
-
-

docs/docs/targets/kuzu.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ Exports data to a [Kuzu](https://kuzu.com/) graph database.
 
 ## Get Started
 
-Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.
+Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.
 
 ## Spec
 
@@ -59,4 +59,4 @@ You can then access the explorer at [http://localhost:8124](http://localhost:812
 href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph"
 text="Docs to Knowledge Graph"
 margin="16px 0 24px 0"
-/>
+/>

docs/docs/targets/neo4j.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ import { ExampleButton } from '../../src/components/GitHubButton';
 
 
 ## Get Started
-Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.
+Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.
 
 
 ## Spec
@@ -59,4 +59,4 @@ If you are building multiple CocoIndex flows from different projects to neo4j, w
 
 This way, you can clean up the data for each flow independently.
 
-In case you need to clean up the data in the same database, you can do it manually by running `cocoindex drop <APP_TARGET>` from the project you want to clean up.
+In case you need to clean up the data in the same database, you can do it manually by running `cocoindex drop <APP_TARGET>` from the project you want to clean up.

examples/product_recommendation/README.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Please drop [CocoIndex on Github](https://github.com/cocoindex-io/cocoindex) a s
 
 
 ## Prerequisite
-* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres)
+* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres)
 * Install [Neo4j](https://cocoindex.io/docs/targets/neo4j)
 * [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).

python/cocoindex/typing.py

Lines changed: 16 additions & 0 deletions
@@ -13,6 +13,8 @@
     Literal,
     NamedTuple,
     Protocol,
+    Optional,
+    Sequence,
     TypeVar,
     overload,
     Self,
@@ -64,6 +66,19 @@ def __init__(self, key: str, value: Any):
 LocalDateTime = Annotated[datetime.datetime, TypeKind("LocalDateTime")]
 OffsetDateTime = Annotated[datetime.datetime, TypeKind("OffsetDateTime")]
 
+
+def Enum(*, variants: Optional[Sequence[str]] = None) -> Any:
+    """
+    String-like enumerated type. Use `variants` to hint allowed values.
+    Example:
+        color: Enum(variants=["red", "green", "blue"])
+    At runtime this is a plain `str`; `variants` are emitted as schema attrs.
+    """
+    if variants is not None:
+        return Annotated[str, TypeKind("Enum"), TypeAttr("variants", list(variants))]
+    return Annotated[str, TypeKind("Enum")]
+
+
 if TYPE_CHECKING:
     T_co = TypeVar("T_co", covariant=True)
     Dim_co = TypeVar("Dim_co", bound=int | None, covariant=True, default=None)
@@ -587,6 +602,7 @@ class BasicValueType:
         "OffsetDateTime",
         "TimeDelta",
         "Json",
+        "Enum",
         "Vector",
         "Union",
     ]
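
To make the "emitted as schema attrs" remark concrete, here is a minimal sketch of how the metadata carried by such an `Annotated` type could be read back with the standard `typing` helpers. `TypeKind` and `TypeAttr` are again local stand-ins with assumed attribute names, and `describe` is a hypothetical helper, not a cocoindex function.

```python
from typing import Annotated, Any, Dict, Optional, Tuple, get_args, get_origin


class TypeKind:
    """Stand-in for cocoindex's TypeKind marker (assumed shape)."""

    def __init__(self, kind: str):
        self.kind = kind


class TypeAttr:
    """Stand-in for cocoindex's TypeAttr key/value attribute (assumed shape)."""

    def __init__(self, key: str, value: Any):
        self.key = key
        self.value = value


def describe(annotation: Any) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: recover (kind, attrs) from an Annotated[...] type."""
    kind: Optional[str] = None
    attrs: Dict[str, Any] = {}
    if get_origin(annotation) is Annotated:
        # get_args yields the base type first, then the metadata items.
        _base, *metadata = get_args(annotation)
        for item in metadata:
            if isinstance(item, TypeKind):
                kind = item.kind
            elif isinstance(item, TypeAttr):
                attrs[item.key] = item.value
    return kind, attrs


# The shape Enum(variants=[...]) returns in the hunk above.
color_type = Annotated[str, TypeKind("Enum"), TypeAttr("variants", ["red", "green", "blue"])]
print(describe(color_type))  # ('Enum', {'variants': ['red', 'green', 'blue']})
print(describe(str))         # (None, {})
```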

src/base/json_schema.rs

Lines changed: 95 additions & 9 deletions
@@ -1,6 +1,6 @@
 use crate::prelude::*;
-
 use crate::utils::immutable::RefList;
+use indexmap::IndexMap;
 use schemars::schema::{
     ArrayValidation, InstanceType, ObjectValidation, Schema, SchemaObject, SingleOrVec,
     SubschemaValidation,
@@ -74,6 +74,9 @@ impl JsonSchemaBuilder {
             schema::BasicValueType::Str => {
                 schema.instance_type = Some(SingleOrVec::Single(Box::new(InstanceType::String)));
             }
+            schema::BasicValueType::Enum => {
+                schema.instance_type = Some(SingleOrVec::Single(Box::new(InstanceType::String)));
+            }
             schema::BasicValueType::Bytes => {
                 schema.instance_type = Some(SingleOrVec::Single(Box::new(InstanceType::String)));
             }
@@ -245,15 +248,34 @@ impl JsonSchemaBuilder {
                             field_path.prepend(&f.name),
                         );
                         if self.options.fields_always_required && f.value_type.nullable {
-                            if let Some(instance_type) = &mut field_schema.instance_type {
-                                let mut types = match instance_type {
-                                    SingleOrVec::Single(t) => vec![**t],
-                                    SingleOrVec::Vec(t) => std::mem::take(t),
+                            if field_schema.enum_values.is_some() {
+                                // Keep the enum as-is and support null via oneOf
+                                let non_null = Schema::Object(field_schema);
+                                let null_branch = Schema::Object(SchemaObject {
+                                    instance_type: Some(SingleOrVec::Single(Box::new(
+                                        InstanceType::Null,
+                                    ))),
+                                    ..Default::default()
+                                });
+                                field_schema = SchemaObject {
+                                    subschemas: Some(Box::new(SubschemaValidation {
+                                        one_of: Some(vec![non_null, null_branch]),
+                                        ..Default::default()
+                                    })),
+                                    ..Default::default()
                                 };
-                                types.push(InstanceType::Null);
-                                *instance_type = SingleOrVec::Vec(types);
+                            } else {
+                                if let Some(instance_type) = &mut field_schema.instance_type {
+                                    let mut types = match instance_type {
+                                        SingleOrVec::Single(t) => vec![**t],
+                                        SingleOrVec::Vec(t) => std::mem::take(t),
+                                    };
+                                    types.push(InstanceType::Null);
+                                    *instance_type = SingleOrVec::Vec(types);
+                                }
                             }
                         }
+
                         (f.name.to_string(), field_schema.into())
                     })
                     .collect(),
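
The nullable handling in this hunk can be restated on plain JSON-schema dictionaries. The sketch below is an approximation in Python rather than the Rust implementation, and `make_nullable` is a hypothetical name. A plausible reason for the `oneOf` branch: in JSON Schema, `enum` applies to every instance regardless of `type`, so adding `"null"` to the type list alone would still reject `null` when an `enum` is present.

```python
from typing import Any, Dict


def make_nullable(field_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mirror of the nullable branch, on JSON-schema dicts."""
    if "enum" in field_schema:
        # Keep the enum untouched and allow null through a separate branch.
        return {"oneOf": [field_schema, {"type": "null"}]}
    if "type" in field_schema:
        # No enum: widen the type list with "null", as the pre-existing code did.
        types = field_schema["type"]
        types = list(types) if isinstance(types, list) else [types]
        return {**field_schema, "type": types + ["null"]}
    # No instance type recorded: leave the schema unchanged.
    return field_schema


print(make_nullable({"type": "string", "enum": ["red", "green"]}))
print(make_nullable({"type": "string"}))
```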
@@ -298,9 +320,26 @@ impl JsonSchemaBuilder {
         enriched_value_type: &schema::EnrichedValueType,
         field_path: RefList<'_, &'_ spec::FieldName>,
     ) -> SchemaObject {
-        self.for_value_type(schema_base, &enriched_value_type.typ, field_path)
-    }
+        let mut out = self.for_value_type(schema_base, &enriched_value_type.typ, field_path);
+
+        if let schema::ValueType::Basic(schema::BasicValueType::Enum) = &enriched_value_type.typ {
+            if let Some(variants) = enriched_value_type.attrs.get("variants") {
+                if let Some(arr) = variants.as_array() {
+                    let enum_values: Vec<serde_json::Value> = arr
+                        .iter()
+                        .filter_map(|v| {
+                            v.as_str().map(|s| serde_json::Value::String(s.to_string()))
+                        })
+                        .collect();
+                    if !enum_values.is_empty() {
+                        out.enum_values = Some(enum_values);
+                    }
+                }
+            }
+        }
 
+        out
+    }
     fn build_extra_instructions(&self) -> Result<Option<String>> {
         if self.extra_instructions_per_field.is_empty() {
             return Ok(None);
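
For readers less familiar with the Rust side, the variant handling above boils down to roughly the following. This is a Python approximation under the assumption that `attrs` is a plain dict of JSON values; `enum_schema` is a hypothetical name.

```python
from typing import Any, Dict


def enum_schema(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical restatement of the Enum branch: attrs -> JSON schema dict."""
    out: Dict[str, Any] = {"type": "string"}
    variants = attrs.get("variants")
    if isinstance(variants, list):
        # Non-string entries are dropped silently, like filter_map + as_str.
        values = [v for v in variants if isinstance(v, str)]
        if values:
            # An empty list never reaches the schema, so "enum" is omitted.
            out["enum"] = values
    return out


print(enum_schema({}))
print(enum_schema({"variants": ["red", "green", "blue"]}))
print(enum_schema({"variants": ["red", 3]}))
```

The first two calls correspond to the two test cases added in this commit (with and without `variants`); the third shows the silent filtering of non-string entries.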
@@ -458,6 +497,53 @@ mod tests {
         .assert_eq(&serde_json::to_string_pretty(&json_schema).unwrap());
     }
 
+    #[test]
+    fn test_basic_types_enum_without_variants() {
+        let value_type = EnrichedValueType {
+            typ: ValueType::Basic(BasicValueType::Enum),
+            nullable: false,
+            attrs: Arc::new(BTreeMap::new()),
+        };
+        let options = create_test_options();
+        let result = build_json_schema(value_type, options).unwrap();
+        let json_schema = schema_to_json(&result.schema);
+
+        expect![[r#"
+            {
+              "type": "string"
+            }"#]]
+        .assert_eq(&serde_json::to_string_pretty(&json_schema).unwrap());
+    }
+
+    #[test]
+    fn test_basic_types_enum_with_variants() {
+        let mut attrs = BTreeMap::new();
+        attrs.insert(
+            "variants".to_string(),
+            serde_json::json!(["red", "green", "blue"]),
+        );
+
+        let value_type = EnrichedValueType {
+            typ: ValueType::Basic(BasicValueType::Enum),
+            nullable: false,
+            attrs: Arc::new(attrs),
+        };
+        let options = create_test_options();
+        let result = build_json_schema(value_type, options).unwrap();
+        let json_schema = schema_to_json(&result.schema);
+
+        expect![[r#"
+            {
+              "enum": [
+                "red",
+                "green",
+                "blue"
+              ],
+              "type": "string"
+            }"#]]
+        .assert_eq(&serde_json::to_string_pretty(&json_schema).unwrap());
+    }
+
     #[test]
     fn test_basic_types_bool() {
         let value_type = EnrichedValueType {

src/base/schema.rs

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,9 @@ pub enum BasicValueType {
     /// String encoded in UTF-8.
     Str,
 
+    /// Enumerated symbolic value.
+    Enum,
+
     /// A boolean value.
     Bool,
 
@@ -71,6 +74,7 @@ impl std::fmt::Display for BasicValueType {
         match self {
             BasicValueType::Bytes => write!(f, "Bytes"),
             BasicValueType::Str => write!(f, "Str"),
+            BasicValueType::Enum => write!(f, "Enum"),
             BasicValueType::Bool => write!(f, "Bool"),
             BasicValueType::Int64 => write!(f, "Int64"),
             BasicValueType::Float32 => write!(f, "Float32"),
