diff --git a/docs/docs/examples/examples/custom_targets.md b/docs/docs/examples/examples/custom_targets.md
index e4292b20a..c0470b3aa 100644
--- a/docs/docs/examples/examples/custom_targets.md
+++ b/docs/docs/examples/examples/custom_targets.md
@@ -9,19 +9,13 @@ sidebar_custom_props:
   tags: [custom-building-blocks]
 tags: [custom-building-blocks]
 ---
-import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
+import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
 
 
 ## Overview
 
 
-Let’s walk through a simple example—exporting `.md` files as `.html` using a custom file-based target. This project monitors folder changes and continuously converts markdown to HTML incrementally.
-Check out the full [source code](https://github.com/cocoindex-io/cocoindex/tree/main/examples/custom_output_files).
-
-The overall flow is simple:
-This example focuses on
-- how to configure your custom target
-- the flow effortless picks up the changes in the source, recomputes only what's changed and export to the target
+Let’s walk through a simple example—exporting `.md` files as `.html` using a custom file-based target. This project monitors folder changes and continuously converts markdown to HTML incrementally. The overall flow is simple and primarily focuses on how to configure your custom target.
 
 ## Ingest files
 
@@ -33,16 +27,13 @@ Ingest a list of markdown files:
 def custom_output_files(
     flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
 ) -> None:
-    """
-    Define an example flow that exports markdown files to HTML files.
-    """
     data_scope["documents"] = flow_builder.add_source(
         cocoindex.sources.LocalFile(path="data", included_patterns=["*.md"]),
         refresh_interval=timedelta(seconds=5),
     )
 ```
 This ingestion creates a table with `filename` and `content` fields.
-    
+
 
 ## Process each file and collect
 
@@ -50,11 +41,12 @@ Define custom function that converts markdown to HTML
 
 ```python
 @cocoindex.op.function()
-
 def markdown_to_html(text: str) -> str:
     return _markdown_it.render(text)
 ```
 
+
+
 Define data collector and transform each document to html.
 
 ```python
@@ -63,12 +55,15 @@ with data_scope["documents"].row() as doc:
     doc["html"] = doc["content"].transform(markdown_to_html)
     output_html.collect(filename=doc["filename"], html=doc["html"])
 ```
+![Convert markdown to html](/img/examples/custom_targets/convert.png)
 
 ## Define the custom target
 
 ### Define the target spec
 
 
+
+
 The target spec contains a directory for output files:
 
 ```python
@@ -76,8 +71,11 @@ class LocalFileTarget(cocoindex.op.TargetSpec):
     directory: str
 ```
 
+
 ### Implement the connector
 
+
+
 `get_persistent_key()` defines the persistent key, which uniquely identifies the target for change tracking and incremental updates.
 
 Here, we simply use the target directory as the key (e.g., `./data/output`).
@@ -180,17 +178,15 @@ def mutate(
 ### Use it in the Flow
 
 ```python
-    output_html.export(
-        "OutputHtml",
-        LocalFileTarget(directory="output_html"),
-        primary_key_fields=["filename"],
-    )
+output_html.export(
+    "OutputHtml",
+    LocalFileTarget(directory="output_html"),
+    primary_key_fields=["filename"],
+)
 ```
 
 ## Run the example
 
-Once your pipeline is set up, keeping your knowledge graph updated is simple:
-
 ```bash
 pip install -e .
 cocoindex update --setup main.py
diff --git a/docs/static/img/examples/codebase_index/chunk.png b/docs/static/img/examples/codebase_index/chunk.png
index c731a96f8..e5e7cd42b 100644
Binary files a/docs/static/img/examples/codebase_index/chunk.png and b/docs/static/img/examples/codebase_index/chunk.png differ
diff --git a/docs/static/img/examples/custom_targets/convert.png b/docs/static/img/examples/custom_targets/convert.png
new file mode 100644
index 000000000..11ce0e679
Binary files /dev/null and b/docs/static/img/examples/custom_targets/convert.png differ
diff --git a/examples/custom_output_files/README.md b/examples/custom_output_files/README.md
index 7d1df94fd..a747cc11c 100644
--- a/examples/custom_output_files/README.md
+++ b/examples/custom_output_files/README.md
@@ -1,5 +1,4 @@
-# Build text embedding and semantic search 🔍
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cocoindex-io/cocoindex/blob/main/examples/text_embedding/Text_Embedding.ipynb)
+# Export markdown files to local Html with Custom Targets
 [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
 
 In this example, we will build index flow to load data from a local directory, convert them to HTML, and save the data to another local directory powered by [CocoIndex Custom Targets](https://cocoindex.io/docs/custom_ops/custom_targets).
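The hunk header `def mutate(` shows that the connector has a `mutate` function, but its body falls outside this diff. Assume it applies changes keyed by the `filename` primary key: it writes HTML for new or changed rows and deletes the output file for removed rows. Under that assumption, the same bookkeeping can be sketched in plain Python. `apply_html_mutations`, its `dict[str, str | None]` input, and the `.md` to `.html` file naming are hypothetical. They are not CocoIndex's connector API.

```python
from __future__ import annotations

import os
import tempfile


def apply_html_mutations(directory: str, mutations: dict[str, str | None]) -> None:
    """Apply per-key upserts and deletes to HTML files under `directory`.

    Hypothetical helper, not part of CocoIndex: each key is a source
    filename (the `filename` primary key, e.g. "intro.md"); a str value is
    rendered HTML to write, and None means the source row was deleted.
    """
    for filename, html in mutations.items():
        # Map the primary key to an output path: "intro.md" -> "<directory>/intro.html".
        stem, _ = os.path.splitext(filename)
        path = os.path.join(directory, stem + ".html")
        if html is None:
            # Deletion: remove the stale output file if it is still there.
            if os.path.exists(path):
                os.remove(path)
        else:
            # Upsert: create parent directories as needed, then (over)write the file.
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            with open(path, "w", encoding="utf-8") as f:
                f.write(html)


# Usage: two rows are exported, then "b.md" disappears from the source.
out_dir = tempfile.mkdtemp()
apply_html_mutations(out_dir, {"a.md": "<h1>A</h1>", "b.md": "<p>B</p>"})
apply_html_mutations(out_dir, {"b.md": None})
print(sorted(os.listdir(out_dir)))  # ['a.html']
```

Each call touches only the rows it names, which matches the incremental behavior described in the Overview. A real connector would receive these changes from CocoIndex rather than build the dictionary itself.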