Simplify seed dataset creation from DataFrames & Datasets

**Is your feature request related to a problem? Please describe.**
Nope

**Describe the solution you'd like**

Presently, using a seed dataset that already exists in memory as a DataFrame requires jumping through a few hoops that could be simplified for local execution. The example below follows a common pattern: loading a dataset from Hugging Face (though the data could come from anywhere).

```python
"""Loading a large dataset

In this example, I want to load records from the Wikipedia dataset, which
is quite large, and I don't want to load it all into RAM. So I'm using
streaming=True.
"""
import pandas as pd
from datasets import load_dataset

doc_iterator = load_dataset(
    "wikimedia/wikipedia",
    "20231101.en",
    split="train",
    streaming=True
)

"""Cast to a DataFrame...

Now, to use this with DD today, I need to cast to a fully materialized DataFrame.
This means that I must materialize all the data I require now, and I cannot,
for instance, progressively generate records from an iterator, like `datasets.IterableDataset`.
"""
num_samples = 1_000  # placeholder value for illustration
df_documents = pd.DataFrame.from_records(list(doc_iterator.take(num_samples)))

"""Load into config

Next, I've got to put this into the config builder. However, this requires me to
make a separate call to a dd.DataDesigner classmethod (??), pass it the DataFrame,
and also provide a filename, even though I'm not interested in writing this
to disk.
"""
config_builder.with_seed_dataset(
    dataset_reference=dd.DataDesigner.make_seed_reference_from_dataframe(
        df_documents,
        "wiki.csv"
    )
)

# ... continue with config generation
```

Instead, the north star would be something like the following, and it should work for `DataFrame`s, `Dataset`s, `IterableDataset`s, or generic iterators that yield dictionaries.

```python
doc_iterator = load_dataset(
    "wikimedia/wikipedia",
    "20231101.en",
    split="train",
    streaming=True
)

config_builder.with_seed_dataset(doc_iterator)
```
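
To make the ask a bit more concrete, here is a rough sketch of how the different input types might be normalized into a stream of plain dict records before they reach the seed dataset. The names `iter_seed_records` and `take_seed_records` are hypothetical and not part of Data Designer; the DataFrame check is duck-typed on `iterrows`/`to_dict`, which is an assumption about what a pandas-like input looks like. The point is only that an iterator can be consumed lazily, so no more records are pulled than are needed.

```python
from collections.abc import Iterable, Iterator, Mapping
from itertools import islice
from typing import Any


def iter_seed_records(source: Any) -> Iterator[dict[str, Any]]:
    """Yield seed records as plain dicts (hypothetical helper, not a DD API)."""
    # Iterating a pandas DataFrame yields column names, not rows, so
    # pandas-like objects are detected by duck typing and converted explicitly.
    if hasattr(source, "iterrows") and hasattr(source, "to_dict"):
        yield from source.to_dict(orient="records")
        return

    if not isinstance(source, Iterable):
        raise TypeError(f"Unsupported seed source: {type(source).__name__}")

    # Datasets, streaming datasets, generators, and lists of dicts all
    # yield one mapping per record when iterated.
    for record in source:
        if not isinstance(record, Mapping):
            raise TypeError("Seed records must be dict-like mappings")
        yield dict(record)


def take_seed_records(source: Any, num_records: int) -> list[dict[str, Any]]:
    """Materialize at most `num_records` records without exhausting the source."""
    return list(islice(iter_seed_records(source), num_records))
```

With something like this, `take_seed_records(doc_iterator, num_samples)` would pull only `num_samples` records from the streaming dataset, and the same call would accept an in-memory DataFrame or a plain generator of dictionaries.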

