Commit 8733bc1

Merge branch 'main' into qdrant
2 parents d814974 + c3a7e50 commit 8733bc1

52 files changed, with 1372 additions and 549 deletions

Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -87,4 +87,7 @@ hyper-rustls = { version = "0.27.5" }
 yup-oauth2 = "12.1.0"
 rustls = { version = "0.23.25" }
 http-body-util = "0.1.3"
+yaml-rust2 = "0.10.0"
+urlencoding = "2.1.3"
 qdrant-client = "1.13.0"
+

docs/docs/about/community.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ description: Join the CocoIndex community
 
 Welcome with a huge coconut hug 🥥⋆。˚🤗.
 
-We are super excited for community contributions of all kinds - whether it's code improvements, documentation updates, issue reports, feature requests on [GitHub](https://github.com/cocoIndex/cocoindex), and discussions in our [Discord](https://discord.com/invite/zpA9S2DR7s).
+We are super excited for community contributions of all kinds - whether it's code improvements, documentation updates, issue reports, feature requests on [GitHub](https://github.com/cocoindex-io/cocoindex), and discussions in our [Discord](https://discord.com/invite/zpA9S2DR7s).
 
 We would love to fostering an inclusive, welcoming, and supportive environment. Contributing to CocoIndex should feel collaborative, friendly and enjoyable for everyone. Together, we can build better AI applications through robust data infrastructure.
 

docs/docs/about/contributing.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ We love contributions from our community! This guide explains how to get involve
 
 To submit your code:
 
-1. Fork the [CocoIndex repository](https://github.com/cocoIndex/cocoindex)
+1. Fork the [CocoIndex repository](https://github.com/cocoindex-io/cocoindex)
 2. [Create a new branch](https://docs.github.com/en/desktop/making-changes-in-a-branch/managing-branches-in-github-desktop) on your fork
 3. Make your changes
 4. [Open a Pull Request (PR)](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) when your work is ready for review

docs/docs/core/cli.mdx

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ The following subcommands are available:
 | `setup` | Check and apply setup changes for flows, including the internal and target storage (to export). |
 | `show` | Show the spec for a specific flow. |
 | `update` | Update the index defined by the flow. |
+| `evaluate` | Evaluate the flow and dump flow outputs to files. Instead of updating the index, it dumps what should be indexed to files. Mainly used for evaluation purpose. |
 
 Use `--help` to see the full list of subcommands, and `subcommand --help` to see the usage of a specific one.
 

docs/docs/core/flow_methods.mdx

Lines changed: 21 additions & 1 deletion
@@ -12,7 +12,7 @@ After a flow is defined as discussed in [Flow Definition](/docs/core/flow_def),
 
 ## update
 
-The `update()` method will update will update the index defined by the flow.
+The `update()` method will update the index defined by the flow.
 
 Once the function returns, the indice is fresh up to the moment when the function is called.
 
@@ -23,5 +23,25 @@ Once the function returns, the indice is fresh up to the moment when the functio
 flow.update()
 ```
 
+</TabItem>
+</Tabs>
+
+## evaluate_and_dump
+
+The `evaluate_and_dump()` method evaluates the flow and dump flow outputs to files.
+
+It takes a `EvaluateAndDumpOptions` dataclass as input to configure, with the following fields:
+
+* `output_dir` (type: `str`, required): The directory to dump the result to.
+* `use_cache` (type: `bool`, default: `True`): Use already-cached intermediate data if available.
+Note that we only reuse existing cached data without updating the cache even if it's turned on.
+
+<Tabs>
+<TabItem value="python" label="Python" default>
+
+```python
+flow.evaluate_and_dump(EvaluateAndDumpOptions(output_dir="./eval_output"))
+```
+
 </TabItem>
 </Tabs>

docs/docs/getting_started/quickstart.md

Lines changed: 1 addition & 1 deletion
@@ -217,6 +217,6 @@ It will ask you to enter a query and it will return the top 10 results.
 Next, you may want to:
 
 * Learn about [CocoIndex Basics](../core/basics.md).
-* Learn about other examples in the [examples](https://github.com/cocoIndex/cocoindex/tree/main/examples) directory.
+* Learn about other examples in the [examples](https://github.com/cocoindex-io/cocoindex/tree/main/examples) directory.
 * The `text_embedding` example is this quickstart with some polishing (loading environment variables from `.env` file, extract pieces shared by the indexing flow and query handler into a function).
 * Pick other examples to learn upon your interest.

docs/docs/ops/functions.md

Lines changed: 11 additions & 0 deletions
@@ -49,6 +49,17 @@ Return type: `vector[float32; N]`, where `N` is determined by the model
 * `output_type` (type: `type`, required): The type of the output. e.g. a dataclass type name. See [Data Types](/docs/core/data_types) for all supported data types. The LLM will output values that match the schema of the type.
 * `instruction` (type: `str`, optional): Additional instruction for the LLM.
 
+:::tip Clear type definitions
+
+Definitions of the `output_type` is fed into LLM as guidance to generate the output.
+To improve the quality of the extracted information, giving clear definitions for your dataclasses is especially important, e.g.
+
+* Provide readable field names for your dataclasses.
+* Provide reasonable docstrings for your dataclasses.
+* For any optional fields, clearly annotate that they are optional, by `SomeType | None` or `typing.Optional[SomeType]`.
+
+:::
+
 Input data:
 
 * `text` (type: `str`, required): The text to extract information from.
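
Note on the hunk above: the tip it adds can be illustrated with a minimal sketch of a clearly defined `output_type`. This is not part of the commit. The class and field names (`ArgInfo`, `MethodInfo`, `return_description`) are hypothetical, and the sketch uses only the Python standard library, so it shows what a well-annotated dataclass might look like rather than how CocoIndex consumes it.

```python
import dataclasses
import typing


@dataclasses.dataclass
class ArgInfo:
    """Information about one argument of a method."""

    name: str
    description: str


@dataclasses.dataclass
class MethodInfo:
    """Information about a method described in the input text."""

    name: str
    args: list[ArgInfo]
    # Optional field: annotated explicitly so the type definition shows
    # that the value may be absent (hypothetical field, for illustration).
    return_description: typing.Optional[str] = None


# Inspect the resolved annotations that make up the type definition.
hints = typing.get_type_hints(MethodInfo)
```

On Python 3.10 and later, `str | None` could be written in place of `typing.Optional[str]`, which is the other spelling the tip mentions.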

examples/code_embedding/README.md

Lines changed: 11 additions & 1 deletion
@@ -1,4 +1,14 @@
-Simple example for cocoindex: build embedding index based on local files.
+# Build embedding index for codebase
+
+![Build embedding index for codebase](https://cocoindex.io/blogs/assets/images/cover-9bf0a7cff69b66a40918ab2fc1cea0c7.png)
+
+In this example, we will build an embedding index for a codebase using CocoIndex. CocoIndex provides built-in support for code base chunking, with native Tree-sitter support. [Tree-sitter](https://en.wikipedia.org/wiki/Tree-sitter_%28parser_generator%29) is a parser generator tool and an incremental parsing library, it is available in Rust 🦀 - [GitHub](https://github.com/tree-sitter/tree-sitter). CocoIndex has built-in Rust integration with Tree-sitter to efficiently parse code and extract syntax trees for various programming languages.
+
+
+Please give [Cocoindex on Github](https://github.com/cocoindex-io/cocoindex) a star to support us if you like our work. Thank you so much with a warm coconut hug 🥥🤗. [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
+
+You can find a detailed blog post with step by step tutorial and explanations [here](https://cocoindex.io/blogs/index-code-base-for-rag).
+
 
 ## Prerequisite
 [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
33 KB
Binary file not shown.
48.4 KB
Binary file not shown.
