Commit 7cc9c5d

Merge pull request #72 from FocoosAI/feat/add-dataset
feat: manage remote datasets
2 parents c4efb45 + 52fa901 commit 7cc9c5d

31 files changed: +2050 -889 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -90,3 +90,5 @@ notebooks/.data
 .venv
 /data
 tests/junit.xml
+notebooks/datasets
+site/

Makefile

Lines changed: 9 additions & 3 deletions
@@ -1,4 +1,4 @@
-.PHONY: venv test install install-gpu run-pre-commit .uv .pre-commit tox
+.PHONY: venv test install install-gpu run-pre-commit .uv .pre-commit tox docs
 
 .uv: ## Check that uv is installed
 	@uv --version || echo 'Please install uv: https://docs.astral.sh/uv/getting-started/installation/'
@@ -10,13 +10,19 @@ venv:
 	@uv venv --python=python3.12
 
 install: .uv .pre-commit
-	@uv pip install -e ".[dev]" --no-cache-dir
+	@uv pip install -e ".[dev,docs]" --no-cache-dir
 	@pre-commit install
 
 install-gpu: .uv .pre-commit
-	@uv pip install -e ".[dev,cuda,tensorrt,torch]" --no-cache-dir
+	@uv pip install -e ".[dev,cuda,tensorrt,torch,docs]" --no-cache-dir
 	@pre-commit install
 
+docs:
+	@mkdocs build --clean
+
+serve-docs:
+	@mkdocs serve
+
 lint:
 	@ruff check ./focoos ./tests ./notebooks --fix
 	@ruff format ./focoos ./tests ./notebooks

docs/api/config.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
::: focoos.config

docs/api/remote_dataset.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
::: focoos.remote_dataset.RemoteDataset

docs/datasets.md

Lines changed: 0 additions & 28 deletions
This file was deleted.

docs/how_to/inference.md

Lines changed: 0 additions & 91 deletions
This file was deleted.

docs/how_to/user.md

Lines changed: 0 additions & 47 deletions
This file was deleted.

docs/howto/create_dataset.md

Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
# Dataset Management

This section covers how to create, upload, and manage datasets on Focoos using the SDK.
The `focoos` library supports several dataset formats, so it can be used for different computer vision tasks.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FocoosAI/focoos/blob/main/notebooks/dataset.ipynb)


This guide covers the following steps:

1. [🧬 Dataset format](#1-dataset-format)
2. [📸 Create dataset](#2-create-dataset)
3. [📤 Upload data](#3-upload-data)
4. [📥 Download your own dataset from Focoos](#4-download-your-own-dataset-from-focoos-platform)
5. [🌍 Download dataset from external sources](#5-download-dataset-from-external-sources)
6. [🗑️ Delete data](#6-delete-data)
7. [🚮 Delete dataset](#7-delete-dataset)


## 1. Dataset format
The `focoos` library currently supports three dataset layouts. The supported formats and their folder structures are listed below:

- **ROBOFLOW_COCO** (Detection, Instance Segmentation):
```text
root/
    train/
        - _annotations.coco.json
        - img_1.jpg
        - img_2.jpg
    valid/
        - _annotations.coco.json
        - img_3.jpg
        - img_4.jpg
```
- **ROBOFLOW_SEG** (Semantic Segmentation):
```text
root/
    train/
        - _classes.csv (comma separated csv)
        - img_1.jpg
        - img_2.jpg
    valid/
        - _classes.csv (comma separated csv)
        - img_3_mask.png
        - img_4_mask.png
```
- **SUPERVISELY** (Semantic Segmentation):
```text
root/
    train/
        meta.json
        img/
        ann/
        mask/
    valid/
        meta.json
        img/
        ann/
        mask/
```

!!! Note
    More dataset formats will be added soon. If you need support for a specific format, feel free to reach out via email at [support@focoos.ai](mailto:support@focoos.ai).

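Before packaging a dataset, it can help to check the folder against the expected layout. The sketch below covers only the ROBOFLOW_COCO structure listed above. `check_roboflow_coco` is a hypothetical helper, not part of the `focoos` SDK, and it assumes the annotation files follow the standard COCO schema, where each entry in `images` has a `file_name`.

```python
import json
from pathlib import Path


def check_roboflow_coco(root):
    """Return a list of problems found in a ROBOFLOW_COCO-style folder.

    Hypothetical helper for illustration; not part of the focoos SDK.
    """
    problems = []
    for split in ("train", "valid"):
        split_dir = Path(root) / split
        ann_file = split_dir / "_annotations.coco.json"
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}/")
            continue
        if not ann_file.is_file():
            problems.append(f"missing file: {split}/_annotations.coco.json")
            continue
        # Every image listed in the annotation file should exist on disk.
        coco = json.loads(ann_file.read_text())
        for image in coco.get("images", []):
            if not (split_dir / image["file_name"]).is_file():
                problems.append(f"missing image: {split}/{image['file_name']}")
    return problems
```

An empty list means both splits and all referenced images were found, for example `check_roboflow_coco("./datasets/my_dataset")`.
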
## 2. Create dataset
The `focoos` library lets you create datasets for specific computer vision tasks, such as object detection and semantic segmentation. The available tasks are defined in the [`FocoosTask` enum](../api/ports.md/#focoos.ports.FocoosTask). Each dataset must follow one of the supported layouts described in [Dataset Format](#1-dataset-format) to be compatible with the Focoos platform.

Use the following code to create a new dataset:

```python
from focoos import DatasetLayout, Focoos, FocoosTask

focoos = Focoos(api_key="<YOUR-API-KEY>")

# Create a new remote dataset
dataset = focoos.add_remote_dataset(
    name="my-dataset",
    description="My custom dataset for object detection",
    layout=DatasetLayout.ROBOFLOW_COCO,  # Choose dataset format
    task=FocoosTask.DETECTION,  # Specify the task type
)
```


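The layout and the task have to match: section 1 pairs each layout with the tasks it supports. A minimal sketch of that pairing as a lookup table follows. The string labels and the `is_compatible` helper are local stand-ins for illustration, not SDK identifiers; the SDK exposes the real values through `DatasetLayout` and `FocoosTask`.

```python
# Layout -> supported tasks, copied from the list in section 1.
# The strings are illustrative labels, not focoos enum members.
SUPPORTED_TASKS = {
    "ROBOFLOW_COCO": {"detection", "instance segmentation"},
    "ROBOFLOW_SEG": {"semantic segmentation"},
    "SUPERVISELY": {"semantic segmentation"},
}


def is_compatible(layout, task):
    """Return True if `task` is listed for `layout` in the table above."""
    return task in SUPPORTED_TASKS.get(layout, set())
```

A check like this can run before `add_remote_dataset`, so a mismatched pair is caught locally rather than after the data has been uploaded.
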
## 3. Upload data
Once you've created a dataset, you can upload your data as a ZIP archive from your local folder:

```python
dataset.upload_data("./datasets/my_dataset.zip")
```
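
If the dataset is still an unpacked folder, the archive can be built with the Python standard library. This is a minimal sketch: `zip_dataset` is a hypothetical helper, and it assumes the platform expects `train/` and `valid/` at the top level of the archive, which the guide does not state. If a wrapping `root/` folder is required, the archive should be built from the parent directory instead.

```python
import shutil
from pathlib import Path


def zip_dataset(folder, archive_stem):
    """Zip the contents of `folder` and return the path of the created archive.

    Hypothetical helper for illustration; not part of the focoos SDK.
    """
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"dataset folder not found: {folder}")
    # make_archive appends ".zip" to archive_stem and returns the archive path.
    # root_dir=folder places the contents of `folder` (train/, valid/) at the
    # top level of the archive: an assumption about what the platform expects.
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(folder))
```

For example, `zip_dataset("./datasets/my_dataset", "./datasets/my_dataset")` writes `./datasets/my_dataset.zip`, which can then be passed to `dataset.upload_data`.
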

After the upload, you can inspect the dataset [preview](../api/ports.md/#focoos.ports.DatasetPreview):

```python
dataset_info = dataset.get_info()
print(dataset_info)
```

Alternatively, you can list all available datasets (both personal and shared):

```python
datasets = focoos.list_datasets()
for ds in datasets:  # `ds` avoids overwriting the `dataset` object created above
    print(f"Name: {ds.name}")
    print(f"Reference: {ds.ref}")
    print(f"Task: {ds.task}")
    print(f"Description: {ds.description}")
    print(f"Spec: {ds.spec}")
    print("-" * 50)
```


## 4. Download your own dataset from Focoos platform
If you have previously uploaded a dataset to the Focoos platform, you can retrieve it as follows.
First, list your datasets to find the dataset reference:


```python
datasets = focoos.list_datasets()

for ds in datasets:
    print(f"Name: {ds.name}")
    print(f"Reference: {ds.ref}")
```

Once you have the dataset reference, use the following code to download the associated data to a local folder of your choice:

```python
dataset_ref = "<YOUR-DATASET-REFERENCE>"
dataset = focoos.get_remote_dataset(dataset_ref)

dataset.download_data("./<YOUR-DATA-FOLDER>/")
```



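After a download, a quick sanity check is to count the images in each split. The sketch below assumes the data is available as an unpacked folder with the split subfolders from section 1 (the guide does not say whether `download_data` delivers an archive or a folder, so extract it first if needed). `count_images` and `IMAGE_SUFFIXES` are hypothetical names, not part of the SDK.

```python
from pathlib import Path

# File extensions treated as images; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files in each folder found directly under `root`.

    Hypothetical helper for illustration. Files are counted recursively, so
    PNG masks stored next to (or below) the images are counted as well.
    """
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if split_dir.is_dir():
            counts[split_dir.name] = sum(
                1
                for f in split_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```
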
## 5. Download dataset from external sources
You can also download datasets from external sources such as Dataset Ninja (Supervisely) and Roboflow Universe, then upload them to the Focoos platform for use in your projects.

=== "pip"
    ```bash linenums="0"
    pip install dataset-tools roboflow
    pip install setuptools
    ```

- **Dataset Ninja**:
```python
import dataset_tools as dtools

dtools.download(dataset="dacl10k", dst_dir="./datasets/dataset-ninja/")
```

- **Roboflow**:
```python
import os

from roboflow import Roboflow

rf = Roboflow(api_key=os.getenv("ROBOFLOW_API_KEY"))
project = rf.workspace("roboflow-58fyf").project("rock-paper-scissors-sxsw")
version = project.version(14)
# Named `roboflow_dataset` so it does not overwrite a Focoos `dataset` object
roboflow_dataset = version.download("coco")
```


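The Roboflow snippet reads its key with `os.getenv`, which returns `None` when the variable is not set. A small guard makes a missing key fail early with a clear message. `require_env` is a hypothetical helper shown as a sketch, not part of either library.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

It can replace the direct lookup, for example `Roboflow(api_key=require_env("ROBOFLOW_API_KEY"))`.
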
## 6. Delete data
If you need to remove the data uploaded to an existing dataset without deleting the dataset itself, call `delete_data()`. As shown below, the call takes no filename argument. This is useful when you want to replace or refine the uploaded data.

Use the following command:
```python
dataset_ref = "<YOUR-DATASET-REFERENCE>"
dataset = focoos.get_remote_dataset(dataset_ref)
dataset.delete_data()
```
!!! warning
    This permanently removes the data from your dataset on the Focoos platform. Double-check the dataset reference before running the command, as deleted data cannot be recovered.



## 7. Delete dataset
If you want to remove an entire dataset from the Focoos platform, use the following command:

```python
dataset_ref = "<YOUR-DATASET-REFERENCE>"
dataset = focoos.get_remote_dataset(dataset_ref)
dataset.delete()
```
!!! warning
    Deleting a dataset is irreversible. Once deleted, all data associated with the dataset is permanently lost and cannot be recovered.
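
Because deletion cannot be undone, scripts may want an explicit confirmation step first. A minimal sketch, assuming an interactive script: `confirm_deletion` is a hypothetical helper, not part of the SDK, and it only compares the reference the user retypes with the one about to be deleted.

```python
def confirm_deletion(dataset_ref, typed_ref):
    """Return True only if `typed_ref` matches the non-empty `dataset_ref` exactly.

    Hypothetical helper for illustration; surrounding whitespace in the
    typed value is ignored.
    """
    return bool(dataset_ref) and typed_ref.strip() == dataset_ref


# Example wiring (commented out because it needs the SDK objects from the guide):
# if confirm_deletion(dataset_ref, input(f"Type {dataset_ref} to confirm: ")):
#     dataset.delete()
```
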
