Commit df16b58

feat: Add staging brick for Datasaur token-based tasks (#50)
* feat: Add staging brick for Datasaur token-based tasks
* Added doc string and formatting with flake8, mypy and black
* docs: Added documentation for stage_for_datasaur
* fix: version sync correction
* fix: Corrections to docs for stage_for_datasaur
* fix: changes in naming of example variables
* Update docs/source/bricks.rst

Co-authored-by: Matt Robinson <[email protected]>
1 parent d5bd44b commit df16b58

5 files changed, +46 -1 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## 0.2.2-dev0
+
+* Add staging brick for Datasaur
+
 ## 0.2.1
 
 * Added brick to convert an ISD dictionary to a list of elements

docs/source/bricks.rst

Lines changed: 15 additions & 0 deletions
@@ -750,3 +750,18 @@ files to an S3 bucket.
 
   upload_staged_files()
 
+``stage_for_datasaur``
+--------------------------
+Formats a list of ``Text`` elements as input to token based tasks in Datasaur.
+
+Example:
+
+.. code:: python
+
+  from unstructured.staging.datasaur import stage_for_datasaur
+  elements = [Text("Text1"),Text("Text2")]
+  datasaur_data = stage_for_datasaur(elements)
+
+The output is a list of dictionaries, each one with two keys:
+"text" with the content of the element and
+"entities" with an empty list.
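
The documentation example above uses `Text` without importing it; in the test file of this commit it comes from `unstructured.documents.elements`. As a minimal sketch of the output shape described above, the block below runs without the library installed by substituting a stand-in `Text` dataclass (an assumption, not the library's class) and a local copy of the loop from `unstructured/staging/datasaur.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Text:
    """Stand-in for unstructured.documents.elements.Text (assumption:
    only the `.text` attribute matters for staging)."""

    text: str


def stage_for_datasaur(elements: List[Text]) -> List[Dict[str, Any]]:
    """Local copy of the function added in this commit, for illustration."""
    result: List[Dict[str, Any]] = list()
    for item in elements:
        # One dictionary per element: its text plus an empty entity list.
        data = dict(text=item.text, entities=[])
        result.append(data)
    return result


elements = [Text("Text1"), Text("Text2")]
datasaur_data = stage_for_datasaur(elements)
print(datasaur_data)
# [{'text': 'Text1', 'entities': []}, {'text': 'Text2', 'entities': []}]
```

With the real package installed, the stand-in class and the local function would presumably be replaced by `from unstructured.documents.elements import Text` and `from unstructured.staging.datasaur import stage_for_datasaur`.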
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+import unstructured.staging.datasaur as datasaur
+
+from unstructured.documents.elements import Text
+
+
+def test_stage_for_datasaur():
+    elements = [Text("Text 1"), Text("Text 2"), Text("Text 3")]
+    result = datasaur.stage_for_datasaur(elements)
+    assert result[0]["text"] == "Text 1"
+    assert result[0]["entities"] == []
+    assert result[1]["text"] == "Text 2"
+    assert result[1]["entities"] == []
+    assert result[2]["text"] == "Text 3"
+    assert result[2]["entities"] == []

unstructured/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.2.1" # pragma: no cover
+__version__ = "0.2.2-dev0" # pragma: no cover

unstructured/staging/datasaur.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+from typing import Dict, List, Any
+from unstructured.documents.elements import Text
+
+
+def stage_for_datasaur(elements: List[Text]) -> List[Dict[str, Any]]:
+    """Convert a list of elements into a list of dictionaries for use in Datasaur"""
+    result: List[Dict[str, Any]] = list()
+    for item in elements:
+        data = dict(text=item.text, entities=[])
+        result.append(data)
+
+    return result
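
One detail of the loop above is that `entities=[]` is evaluated on every iteration, so each record gets its own list. The sketch below illustrates this with a comprehension form of the same logic; it is not the code in this commit. The `Text` class is a hypothetical stand-in exposing only `.text`, and the entity dictionary appended at the end is purely illustrative, not Datasaur's documented schema.

```python
from typing import Any, Dict, List, NamedTuple


class Text(NamedTuple):
    """Hypothetical stand-in exposing only the `.text` attribute."""

    text: str


def stage_for_datasaur(elements: List[Text]) -> List[Dict[str, Any]]:
    """Comprehension form of the loop in the diff.

    The `[]` literal is evaluated once per element, so no two records
    share the same `entities` list.
    """
    return [{"text": item.text, "entities": []} for item in elements]


records = stage_for_datasaur([Text("Text 1"), Text("Text 2")])

# Hypothetical annotation added to the first record only.
records[0]["entities"].append({"label": "EXAMPLE", "start": 0, "end": 4})

print(records[1]["entities"])  # [] (the second record is untouched)
```

Because the lists are independent, downstream code could fill in entities record by record without side effects on the others.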

0 commit comments
