Commit b53e2fa

[doc]Update hackernews_trending_topics.md link examples to documentation (#1347)
Update hackernews_trending_topics.md
1 parent 7bad4ab commit b53e2fa

File tree

1 file changed: +29 -1 lines changed

docs/docs/examples/examples/hackernews_trending_topics.md

Lines changed: 29 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -12,6 +12,10 @@ tags: [custom-building-blocks, structured-data-extraction]
1212
authors: [linghua]
1313
---
1414

15+
import { DocumentationButton, GitHubButton } from '../../../src/components/GitHubButton';
16+
17+
<GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/hn_trending_topics" margin="0 0 24px 0" />
18+
1519
![Building a Real-Time HackerNews Trending Topics Detector with CocoIndex: A Deep Dive into Custom Sources and AI](/img/examples/hackernews-trending-topics/cover.png)
1620

1721

@@ -102,6 +106,8 @@ CocoInsight UI / API Clients
102106
103107
## Custom Source
104108
109+
<DocumentationButton url="https://cocoindex.io/docs/custom_ops/custom_sources" text="Custom Sources" margin="0 0 16px 0" />
110+
105111
### Defining the Data Model
106112
![HackerNews Data Model](/img/examples/hackernews-trending-topics/hackernews.png)
107113
@@ -168,6 +174,8 @@ A `SourceSpec` holds config for the source:
168174
169175
When the flow is created, these parameters feed into the connector.
170176
177+
<DocumentationButton url="https://cocoindex.io/docs/custom_ops/custom_sources#source-spec" text="Source Spec" margin="0 0 16px 0" />
178+
171179
#### Defining the Connector
172180
173181
This is the core of the custom source.
@@ -207,6 +215,8 @@ class HackerNewsConnector:
207215
208216
CocoIndex calls `create` once when building the flow.
209217
218+
<DocumentationButton url="https://cocoindex.io/docs/custom_ops/custom_sources#source-connector" text="Source Connector" margin="0 0 16px 0" />
219+
210220
#### Listing Available Threads
211221
212222
The `list()` method in `HackerNewsConnector` is responsible for **discovering all available HackerNews threads** that match the given criteria (tag, max results) and returning metadata about them. CocoIndex uses this to **know which threads exist** and which may have changed.
@@ -255,6 +265,8 @@ This enables incremental refresh:
255265
- CocoIndex remembers ordinals
256266
- Only fetches full items when ordinals change
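
The ordinal-based refresh described in these bullets can be sketched in plain Python. This is only an illustration of the idea, not CocoIndex's internal implementation: `plan_refresh`, the `seen_ordinals` dictionary, and the sample keys and ordinals are all hypothetical names and values chosen for this example.

```python
from typing import Dict, Iterable, List, Tuple


def plan_refresh(
    listed: Iterable[Tuple[str, int]],
    seen_ordinals: Dict[str, int],
) -> List[str]:
    """Return the keys that are new or whose ordinal advanced since the last sync.

    `listed` is a (key, ordinal) listing; `seen_ordinals` is the remembered
    state from earlier syncs and is updated in place. Both are hypothetical
    stand-ins for whatever the engine actually tracks.
    """
    to_fetch: List[str] = []
    for key, ordinal in listed:
        previous = seen_ordinals.get(key)
        if previous is None or ordinal > previous:
            to_fetch.append(key)          # full fetch needed for this key
            seen_ordinals[key] = ordinal  # remember the newest ordinal
    return to_fetch


seen: Dict[str, int] = {}
# First sync: nothing is remembered yet, so every listed thread is fetched.
first_sync = plan_refresh([("thread-1", 100), ("thread-2", 105)], seen)
# Second sync: only thread-2 reports a newer ordinal.
second_sync = plan_refresh([("thread-1", 100), ("thread-2", 130)], seen)
print(first_sync)   # ['thread-1', 'thread-2']
print(second_sync)  # ['thread-2']
```

The listing step stays cheap because it only carries keys and ordinals; the expensive full fetch happens only for the keys this function returns.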
257267
268+
<DocumentationButton url="https://cocoindex.io/docs/custom_ops/custom_sources#async-def-listoptions-required" text="list() method" margin="8px 0 16px 0" />
269+
258270
#### Fetching Full Thread Content
259271
260272
This async method fetches a **single HackerNews thread** (including its comments) from the **API**, and wraps the result in a `PartialSourceRowData` object — the structure CocoIndex uses for row-level ingestion.
@@ -287,6 +299,9 @@ async def get_value(
287299
- Parses the raw JSON into structured Python objects (`_HackerNewsThread` + `_HackerNewsComment`).
288300
- Returns a `PartialSourceRowData` containing the full thread.
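
As a rough sketch of the parsing step in the bullets above, the snippet below turns a nested JSON payload into flat dataclass instances with a depth-first walk. The class names `ExampleThread` and `ExampleComment`, the helper names, and the field names (`id`, `title`, `author`, `text`, `children`) are assumptions made for illustration. They are not the example's actual `_HackerNewsThread` and `_HackerNewsComment` definitions, and the real API payload may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExampleComment:
    id: str
    author: Optional[str]
    text: Optional[str]


@dataclass
class ExampleThread:
    id: str
    title: Optional[str]
    comments: List[ExampleComment] = field(default_factory=list)


def flatten_comments(children: List[Dict[str, Any]]) -> List[ExampleComment]:
    """Walk the comment tree depth-first and return a flat list."""
    flat: List[ExampleComment] = []
    for child in children:
        flat.append(
            ExampleComment(
                id=str(child["id"]),
                author=child.get("author"),
                text=child.get("text"),
            )
        )
        # Replies are emitted right after their parent comment.
        flat.extend(flatten_comments(child.get("children", [])))
    return flat


def parse_thread(raw: Dict[str, Any]) -> ExampleThread:
    """Build a thread object from a raw (assumed) JSON payload."""
    return ExampleThread(
        id=str(raw["id"]),
        title=raw.get("title"),
        comments=flatten_comments(raw.get("children", [])),
    )


raw_thread = {
    "id": 1,
    "title": "Show HN: Example",
    "children": [
        {"id": 2, "author": "alice", "text": "First", "children": [
            {"id": 3, "author": "bob", "text": "Reply", "children": []},
        ]},
        {"id": 4, "author": "carol", "text": "Second", "children": []},
    ],
}
thread = parse_thread(raw_thread)
print([c.id for c in thread.comments])  # ['2', '3', '4']
```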
289301
302+
<DocumentationButton url="https://cocoindex.io/docs/custom_ops/custom_sources#async-def-get_valuekey-options-required" text="get_value() method" margin="8px 0 16px 0" />
303+
304+
290305
#### Ordinal Support
291306
292307
Tells CocoIndex that this source provides ordinals. You can use any property that increases monotonically on change as an ordinal, e.g., a timestamp or a version number. We use a timestamp here.
@@ -314,6 +329,8 @@ Sync 2 (30s later):
314329
315330
This is why ordinals (timestamps) matter. Without them, you'd fetch everything every time.
316331

332+
<DocumentationButton url="https://cocoindex.io/docs/custom_ops/custom_sources#def-provides_ordinal-optional" text="provides_ordinal() method" margin="8px 0 16px 0" />
333+
317334
#### Parsing JSON into Structured Data
318335

319336
HackerNews returns comments in a tree structure:
@@ -403,8 +420,12 @@ def hackernews_trending_topics_flow(
403420
404421
This block sets up a CocoIndex flow that fetches HackerNews stories and prepares them for indexing. It registers a flow called **HackerNewsTrendingTopics**, then adds a `HackerNewsSource` that retrieves up to 200 stories and refreshes every 30 seconds, storing the result in `data_scope["threads"]` for downstream steps.
405422
423+
<DocumentationButton url="https://cocoindex.io/docs/core/flow_def" text="Flow Definition Docs" margin="0 0 16px 0" />
424+
406425
Finally, it creates two collectors—one for storing indexed messages and another for extracted topics—providing the core storage layers the rest of the pipeline will build on.
407426
427+
<DocumentationButton url="https://cocoindex.io/docs/core/flow_def#data-collector" text="Data Collector" margin="0 0 16px 0" />
428+
408429
![Ingesting Data](/img/examples/hackernews-trending-topics/ingest.png)
409430
410431
### Process Each Thread
@@ -442,7 +463,8 @@ class Topic:
442463
topic: str
443464
```
444465
445-
This dataclass defines a **Topic**, representing a single normalized concept extracted from text—such as a product, technology, company, person, or domain. It provides a prompt for the LLM to extract topics into structured information. Here we used a simple string. You could also generate [knowledge graphs](https://cocoindex.io/docs/examples/knowledge-graph-for-docs), or use it to extract other information too.
466+
This dataclass defines a **Topic**, representing a single normalized concept extracted from text—such as a product, technology, company, person, or domain. It provides a prompt for the LLM to extract topics into structured information. Here we used a simple string. You could also generate [knowledge graphs](https://cocoindex.io/docs/examples/knowledge-graph-for-docs), or use it to extract other information too.
467+
446468
447469
#### Process Each Thread and Use LLM for Extraction
448470
@@ -486,6 +508,8 @@ This block processes each HackerNews thread as it flows through the pipeline. In
486508
- We use `message_index` to collect relevant metadata for this thread.
487509
- We use `topic_index` to collect extracted topics and their relationships with threads.
488510
511+
<DocumentationButton url="https://cocoindex.io/docs/ops/functions#extractbyllm" text="ExtractByLlm" margin="0 0 16px 0" />
512+
489513
![Extract topic](/img/examples/hackernews-trending-topics/topic.png)
490514
491515
### Index Individual Comments
@@ -552,6 +576,8 @@ In short, this block enriches every comment with LLM-derived topics, indexes the
552576
)
553577
```
554578
579+
<DocumentationButton url="https://cocoindex.io/docs/targets/postgres" text="Postgres Target" margin="0 0 16px 0" />
580+
555581
## Query Handlers
556582
557583
### search_by_topic(query) → Find discussions about X
@@ -610,6 +636,8 @@ The `@hackernews_trending_topics_flow.query_handler()` decorator registers `sear
610636
611637
When a topic string is provided, the function determines the actual database table names for the topics and messages collectors, then connects to the database and runs a SQL query that finds all topic records matching the search term (case-insensitive) and joins them with their corresponding message entries.
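
The query described above (a case-insensitive topic match joined to the corresponding messages) can be sketched in a self-contained way. The example below uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere. The table names, column names, sample rows, and the function name `example_search_by_topic` are all hypothetical. In Postgres, `ILIKE` would be the usual way to express the case-insensitive match.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id TEXT PRIMARY KEY, thread_id TEXT, text TEXT);
    CREATE TABLE topics (message_id TEXT, topic TEXT);
    INSERT INTO messages VALUES
        ('m1', 't1', 'Trying out Rust for a CLI tool'),
        ('m2', 't1', 'I would pick Go for this');
    INSERT INTO topics VALUES
        ('m1', 'Rust'),
        ('m1', 'CLI'),
        ('m2', 'Go');
    """
)


def example_search_by_topic(db: sqlite3.Connection, query: str) -> List[Dict[str, str]]:
    """Find topic rows matching `query` (case-insensitive) with their messages."""
    rows = db.execute(
        """
        SELECT t.topic, m.id, m.thread_id, m.text
        FROM topics AS t
        JOIN messages AS m ON m.id = t.message_id
        WHERE LOWER(t.topic) LIKE LOWER(?)
        ORDER BY m.id
        """,
        (f"%{query}%",),  # parameter binding avoids SQL injection
    ).fetchall()
    return [
        {"topic": topic, "message_id": mid, "thread_id": tid, "text": text}
        for topic, mid, tid, text in rows
    ]


results = example_search_by_topic(conn, "rust")
print(results)
```

Passing the search term as a bound parameter rather than formatting it into the SQL string keeps the handler safe when the query comes from an untrusted client.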
612638
639+
<DocumentationButton url="https://cocoindex.io/docs/query#query-handler" text="Query Handler" margin="0 0 16px 0" />
640+
613641
### get_threads_for_topic(topic) → Threads discussing X
614642
615643
```python
