
The `list()` method in `HackerNewsConnector` is responsible for **discovering all available HackerNews threads** that match the given criteria (tag, max results) and returning metadata about them. CocoIndex uses this to **know which threads exist** and which may have changed.
@@ -255,6 +265,8 @@ This enables incremental refresh:
This async method fetches a **single HackerNews thread** (including its comments) from the **API**, and wraps the result in a `PartialSourceRowData` object — the structure CocoIndex uses for row-level ingestion.
@@ -287,6 +299,9 @@ async def get_value(
- Parses the raw JSON into structured Python objects (`_HackerNewsThread` + `_HackerNewsComment`).
- Returns a `PartialSourceRowData` containing the full thread.
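
The parsing step above can be illustrated with a minimal, self-contained sketch. It assumes the raw JSON nests replies under a `children` key and carries `id`, `title`, `author`, and `text` fields; those field names, and the `ThreadSketch` / `CommentSketch` classes, are hypothetical stand-ins for `_HackerNewsThread` and `_HackerNewsComment`, not the connector's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CommentSketch:
    """Hypothetical stand-in for `_HackerNewsComment`."""
    id: str
    author: Optional[str]
    text: Optional[str]


@dataclass
class ThreadSketch:
    """Hypothetical stand-in for `_HackerNewsThread`."""
    id: str
    title: Optional[str]
    comments: List[CommentSketch] = field(default_factory=list)


def flatten_comments(children: List[Dict[str, Any]]) -> List[CommentSketch]:
    """Walk the nested reply tree depth-first and return a flat list."""
    out: List[CommentSketch] = []
    for child in children:
        out.append(
            CommentSketch(
                id=str(child["id"]),
                author=child.get("author"),
                text=child.get("text"),
            )
        )
        out.extend(flatten_comments(child.get("children", [])))
    return out


def parse_thread(raw: Dict[str, Any]) -> ThreadSketch:
    """Turn one raw JSON payload (assumed shape) into structured objects."""
    return ThreadSketch(
        id=str(raw["id"]),
        title=raw.get("title"),
        comments=flatten_comments(raw.get("children", [])),
    )


# Made-up payload for illustration only.
sample = {
    "id": 1,
    "title": "Show HN: example",
    "children": [
        {"id": 2, "author": "a", "text": "hi",
         "children": [{"id": 3, "author": "b", "text": "reply", "children": []}]},
        {"id": 4, "author": "c", "text": "hello", "children": []},
    ],
}
thread = parse_thread(sample)
```

In the real connector, the parsed thread would then be wrapped in a `PartialSourceRowData` before being returned.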
Tells CocoIndex that this source provides ordinals. You can use any property that increases monotonically on change as an ordinal, e.g., a timestamp or a version number. We use a timestamp here.
@@ -314,6 +329,8 @@ Sync 2 (30s later):
This is why ordinals (timestamps) matter. Without them, you'd fetch everything every time.
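
To make the idea concrete, here is a rough sketch of how ordinals let a sync skip unchanged rows. It is an illustration of the principle only, not CocoIndex's internal implementation; `plan_refresh`, the thread keys, and the timestamp values are all made up for the example.

```python
from typing import Dict, List


def plan_refresh(listed: Dict[str, int], seen: Dict[str, int]) -> List[str]:
    """Return keys that are new or whose ordinal grew since the last sync."""
    return [key for key, ordinal in listed.items() if ordinal > seen.get(key, -1)]


seen: Dict[str, int] = {}

# Sync 1: nothing has been seen yet, so every listed thread is fetched.
sync1 = {"thread-1": 100, "thread-2": 100}
to_fetch_1 = plan_refresh(sync1, seen)
seen.update(sync1)

# Sync 2: thread-1 is unchanged, thread-2 has a newer ordinal, thread-3 is new.
sync2 = {"thread-1": 100, "thread-2": 130, "thread-3": 125}
to_fetch_2 = plan_refresh(sync2, seen)
seen.update(sync2)
```

Only the threads in `to_fetch_2` need a full `get_value()` call on the second sync; the unchanged thread is skipped.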
This block sets up a CocoIndex flow that fetches HackerNews stories and prepares them for indexing. It registers a flow called **HackerNewsTrendingTopics**, then adds a `HackerNewsSource` that retrieves up to 200 stories and refreshes every 30 seconds, storing the result in `data_scope["threads"]` for downstream steps.
Finally, it creates two collectors—one for storing indexed messages and another for extracted topics—providing the core storage layers the rest of the pipeline will build on.
This dataclass defines a **Topic**, representing a single normalized concept extracted from text—such as a product, technology, company, person, or domain. It provides a prompt for the LLM to extract topics into structured information. Here we used a simple string. You could also generate [knowledge graphs](https://cocoindex.io/docs/examples/knowledge-graph-for-docs), or use it to extract other information too.
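
For reference, a dataclass of this kind might look like the sketch below. The field name `topic` and the docstring wording are assumptions for illustration; the point is that the docstring doubles as the instruction the LLM sees, and the single string field is the structured output.

```python
from dataclasses import dataclass, fields


@dataclass
class Topic:
    """
    A single normalized topic mentioned in the text, such as a product,
    technology, company, person, or domain. (Illustrative wording; this
    docstring is what would act as the extraction prompt.)
    """

    # Assumed field name: one plain string per extracted topic.
    topic: str


example = Topic(topic="PostgreSQL")
field_names = [f.name for f in fields(Topic)]
```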
#### Process Each Thread and Use LLM for Extraction
@@ -486,6 +508,8 @@ This block processes each HackerNews thread as it flows through the pipeline. In
- We use `message_index` to collect relevant metadata for this thread.
- We use `topic_index` to collect extracted topics and their relationships with threads.
### search_by_topic(query) → Find discussions about X
@@ -610,6 +636,8 @@ The `@hackernews_trending_topics_flow.query_handler()` decorator registers `sear
When a topic string is provided, the function determines the actual database table names for the topics and messages collectors, then connects to the database and runs a SQL query that finds all topic records matching the search term (case-insensitive) and joins them with their corresponding message entries.
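
The shape of that query can be sketched with an in-memory SQLite database so it runs anywhere. This is a stand-in, not the flow's actual handler: the table names (`topics`, `messages`), the column names, and the sample rows are hypothetical, and the real handler resolves its table names from the collectors rather than hard-coding them.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id TEXT PRIMARY KEY, thread_id TEXT, text TEXT);
    CREATE TABLE topics (topic TEXT, message_id TEXT);
    INSERT INTO messages VALUES
        ('m1', 't1', 'Rust is fast'),
        ('m2', 't1', 'Postgres tips'),
        ('m3', 't2', 'rustc release notes');
    INSERT INTO topics VALUES
        ('Rust', 'm1'),
        ('PostgreSQL', 'm2'),
        ('rust', 'm3');
    """
)


def search_by_topic_sketch(db: sqlite3.Connection, topic: str) -> List[Dict[str, str]]:
    """Case-insensitive topic match, joined to the matching message rows."""
    rows = db.execute(
        """
        SELECT t.topic, m.id, m.thread_id, m.text
        FROM topics AS t
        JOIN messages AS m ON m.id = t.message_id
        WHERE LOWER(t.topic) LIKE LOWER(?)
        ORDER BY m.id
        """,
        (f"%{topic}%",),  # parameterized to avoid SQL injection
    ).fetchall()
    return [
        {"topic": r[0], "message_id": r[1], "thread_id": r[2], "text": r[3]}
        for r in rows
    ]


results = search_by_topic_sketch(conn, "RUST")
```

Lower-casing both sides keeps the match case-insensitive regardless of the database's default collation; on PostgreSQL, `ILIKE` is a common alternative.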