`docs/how-to/document_search/ingest-documents.md`: 6 additions & 64 deletions
```diff
@@ -2,9 +2,9 @@
 
 The Ragbits document ingest pipeline consists of four main steps: loading, parsing, enrichment, and indexing. All of these steps can be orchestrated using different strategies, depending on the expected load.
 
-## Loading sources
+## Loading dataset
 
-Before a document can be processed, it must be defined and downloaded. In Ragbits, there are a few ways to do this: you can specify the source URI, the source instance, the document metadata or the document itself.
+Before processing a document in Ragbits, it must first be defined and downloaded. This can be done in several ways: by specifying a source URI or using an instance of [`Source`][ragbits.core.sources.base.Source], [`DocumentMeta`][ragbits.document_search.documents.document.DocumentMeta] or [`Document`][ragbits.document_search.documents.document.Document].
 
 === "URI"
 
```
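The reworded paragraph lists four ways to point Ragbits at a document. The first, a source URI, packs the source type and its location into a single string. The sketch below illustrates how such a string could be routed to a source class by its protocol prefix. The `<protocol>://<location>` shape, the `FileSource`/`WebSource` stand-in dataclasses, and the `resolve_uri` helper are hypothetical illustrations, not Ragbits' own implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


# Hypothetical stand-ins for source classes; these are NOT the Ragbits classes.
@dataclass
class FileSource:
    path: Path


@dataclass
class WebSource:
    url: str


def resolve_uri(uri: str) -> FileSource | WebSource:
    """Route an assumed "<protocol>://<location>" URI to a source object."""
    # Split on the FIRST "://" only, so nested URLs such as
    # "web://https://..." keep their own scheme in `location`.
    protocol, sep, location = uri.partition("://")
    if not sep:
        raise ValueError(f"Missing protocol in URI: {uri!r}")
    if protocol == "file":
        return FileSource(path=Path(location))
    if protocol == "web":
        return WebSource(url=location)
    raise ValueError(f"Unsupported protocol: {protocol!r}")


print(resolve_uri("web://https://example.com/paper.pdf"))
print(resolve_uri("file:///tmp/report.md"))
```

Dispatching on a prefix keeps the URI form compact; passing a `Source`, `DocumentMeta`, or `Document` instance instead skips this parsing step entirely.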
````diff
@@ -19,7 +19,7 @@ Before a document can be processed, it must be defined and downloaded. In Ragbit
 === "Source"
 
     ```python
-    from ragbits.document_search.documents.sources import WebSource
+    from ragbits.core.sources import WebSource
     from ragbits.document_search import DocumentSearch
 
     document_search = DocumentSearch(...)
````
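This hunk, like the one at old line 300, moves the source classes from `ragbits.document_search.documents.sources` to `ragbits.core.sources`. For existing code written against the old path, a plain text rewrite could update the imports. `migrate_source_imports` is a hypothetical helper that covers only this module-path rename; it does not check that every name still exists under the new package.

```python
import re

OLD_MODULE = "ragbits.document_search.documents.sources"
NEW_MODULE = "ragbits.core.sources"

# Match the old module as a whole dotted path; the trailing \b stops it from
# matching a longer identifier such as "...sources_extra".
_OLD_MODULE_RE = re.compile(rf"\b{re.escape(OLD_MODULE)}\b")


def migrate_source_imports(code: str) -> str:
    """Rewrite references to the old sources module to the new location."""
    return _OLD_MODULE_RE.sub(NEW_MODULE, code)


old = "from ragbits.document_search.documents.sources import WebSource\n"
print(migrate_source_imports(old), end="")  # prints: from ragbits.core.sources import WebSource
```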
```diff
@@ -49,65 +49,7 @@ Before a document can be processed, it must be defined and downloaded. In Ragbit
 There are also other ways to submit jobs to the Ray cluster. For more information, please refer to the [Ray documentation](https://docs.ray.io/en/latest/ray-overview/index.html).
```
````diff
@@ -300,7 +242,7 @@ To define a new ingest strategy, extend the [`IngestStrategy`][ragbits.document_
 ```python
 from ragbits.core.vector_stores import VectorStore
 from ragbits.document_search.documents.document import Document, DocumentMeta
-from ragbits.document_search.documents.sources import Source
+from ragbits.core.sources import Source
 from ragbits.document_search.ingestion.enrichers import ElementEnricherRouter
 from ragbits.document_search.ingestion.parsers import DocumentParserRouter
 from ragbits.document_search.ingestion.strategies import (
````
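This hunk sits in the example for defining a custom ingest strategy, and the page opens by noting that the four steps (loading, parsing, enrichment, indexing) can be orchestrated with different strategies depending on load. The toy sketch below illustrates that idea in plain Python: the same per-document steps run either one at a time or in thread-pool batches. The stage functions, `InMemoryIndex`, `IngestStrategyBase`, `SequentialStrategy`, and `BatchedStrategy` are all hypothetical and do not follow the actual signature of Ragbits' `IngestStrategy`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


# Hypothetical per-document stages; real ones would download, parse files, call models, etc.
def load(uri: str) -> str:
    return f"raw:{uri}"


def parse(raw: str) -> list[str]:
    return [raw.upper()]


def enrich(elements: list[str]) -> list[str]:
    return [f"{e}+meta" for e in elements]


@dataclass
class InMemoryIndex:
    """Stand-in for a vector store: it just collects indexed elements."""

    items: list[str] = field(default_factory=list)

    def add(self, elements: list[str]) -> None:
        self.items.extend(elements)


def process(uri: str) -> list[str]:
    """Run load -> parse -> enrich for one document."""
    return enrich(parse(load(uri)))


class IngestStrategyBase(ABC):
    @abstractmethod
    def __call__(self, uris: list[str], index: InMemoryIndex) -> None: ...


class SequentialStrategy(IngestStrategyBase):
    """Process and index documents one at a time; simple, fine for small loads."""

    def __call__(self, uris: list[str], index: InMemoryIndex) -> None:
        for uri in uris:
            index.add(process(uri))


class BatchedStrategy(IngestStrategyBase):
    """Process each batch in a thread pool, then index the batch together."""

    def __init__(self, batch_size: int = 4, max_workers: int = 4) -> None:
        self.batch_size = batch_size
        self.max_workers = max_workers

    def __call__(self, uris: list[str], index: InMemoryIndex) -> None:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for start in range(0, len(uris), self.batch_size):
                batch = uris[start : start + self.batch_size]
                # map() preserves input order, so results line up with `batch`.
                results = list(pool.map(process, batch))
                index.add([el for elements in results for el in elements])


uris = ["a.md", "b.md", "c.md"]
seq, batched = InMemoryIndex(), InMemoryIndex()
SequentialStrategy()(uris, seq)
BatchedStrategy(batch_size=2)(uris, batched)
print(seq.items == batched.items)  # prints: True
```

Keeping the per-document work in one function and varying only the orchestration means a strategy can be swapped without touching the stages themselves.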
A second file in this commit (its path is not shown) adds the following:

```diff
 Ragbits provides an abstraction for handling datasets. The [`Source`][ragbits.core.sources.Source] component is designed to define interactions with any data source, such as downloading and querying.
+
+## Supported sources
+
+The following sources are currently supported by Ragbits.
```
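The new page describes `Source` as the component that defines interactions with a data source, such as downloading and querying. The sketch below shows that shape for a local directory: an instance method that downloads one document, and a class method that queries a location for all matching documents. `DataSource`, `LocalDirSource`, `download`, and `query` are hypothetical names chosen for illustration; the actual interface is documented under `ragbits.core.sources`.

```python
from __future__ import annotations

import shutil
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


class DataSource(ABC):
    """Hypothetical base class: one object per document in some external store."""

    @abstractmethod
    def download(self, dest_dir: Path) -> Path:
        """Copy the document into `dest_dir` and return the local path."""

    @classmethod
    @abstractmethod
    def query(cls, location: str, pattern: str) -> list[DataSource]:
        """Return one source per document in `location` matching `pattern`."""


@dataclass
class LocalDirSource(DataSource):
    path: Path

    def download(self, dest_dir: Path) -> Path:
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / self.path.name
        shutil.copy2(self.path, target)
        return target

    @classmethod
    def query(cls, location: str, pattern: str) -> list[LocalDirSource]:
        # sorted() makes the result order deterministic across platforms.
        return [cls(path=p) for p in sorted(Path(location).glob(pattern)) if p.is_file()]


with tempfile.TemporaryDirectory() as root:
    docs = Path(root, "docs")
    docs.mkdir()
    (docs / "a.md").write_text("# A")
    (docs / "b.txt").write_text("B")
    found = LocalDirSource.query(str(docs), "*.md")
    local_copy = found[0].download(Path(root, "cache"))
    summary = (len(found), local_copy.name, local_copy.read_text())

print(summary)  # prints: (1, 'a.md', '# A')
```

Separating "which documents exist" (`query`) from "get me this one" (`download`) lets an ingest step list a whole dataset first and fetch documents lazily afterwards.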