Commit 076156e

Add cli command to get the list of the jobs (#1088)
* Add cli command to get the list of the jobs

  This provides a way to get the list of jobs from the studio. Currently, I selected the following field as table headers. We can modify it as we want.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fix lint
* Add limit params
* Add documentation
* Add to sidebar

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 5386def commit 076156e

5 files changed: +160 -3 lines changed

docs/commands/job/ls.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# job ls
+
+List jobs in Studio.
+
+## Synopsis
+
+```usage
+usage: datachain job ls [-h] [-v] [-q] [--status STATUS] [--team TEAM] [--limit LIMIT]
+```
+
+## Description
+
+This command lists jobs in Studio. You can filter jobs by their status, specify a team, and limit the number of jobs returned. By default, it shows the 20 most recent jobs.
+
+
+## Options
+
+* `--status STATUS` - Status to filter jobs by
+* `--team TEAM` - Team to list jobs for (default: from config)
+* `--limit LIMIT` - Limit the number of jobs returned (default: 20)
+* `-h`, `--help` - Show the help message and exit
+* `-v`, `--verbose` - Be verbose
+* `-q`, `--quiet` - Be quiet
+
+## Status options
+
+You will be able to filter the job with following status:
+
+* `CREATED` - Job has been created but not yet scheduled
+* `SCHEDULED` - Job is scheduled to run at a future time
+* `QUEUED` - Job is in the queue waiting to be executed
+* `INIT` - Job is initializing and preparing to run
+* `RUNNING` - Job is currently executing
+* `COMPLETE` - Job has finished successfully
+* `FAILED` - Job has failed during execution
+* `CANCELING_SCHEDULED` - A scheduled job is being canceled
+* `CANCELING` - A running job is being canceled
+* `CANCELED` - Job has been canceled
+* `ACTIVE` - Job is in active state.
+* `INACTIVE` - Job is in inactive state.
+
+Note: The following statuses are considered active jobs:
+
+* `CREATED`
+* `SCHEDULED`
+* `QUEUED`
+* `INIT`
+* `RUNNING`
+* `CANCELING_SCHEDULED`
+* `CANCELING`
+
+
+## Examples
+
+1. List all jobs (default limit of 20):
+```bash
+datachain job ls
+```
+
+2. List jobs for a specific team:
+```bash
+datachain job ls --team my-team
+```
+
+3. List jobs with a specific status:
+```bash
+datachain job ls --status complete
+```
+
+4. List more jobs by increasing the limit:
+```bash
+datachain job ls --limit 50
+```
+
+5. List jobs with verbose output:
+```bash
+datachain job ls -v
+```
+
+## Notes
+
+* The default limit of 20 jobs helps manage the output size and performance
+* Jobs are typically listed in reverse chronological order (newest first)
+* Use the `--status` filter to find jobs in specific states (e.g., running, completed, failed)
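
The note in `ls.md` about which statuses count as active can be read as a simple set membership check. The sketch below only illustrates that grouping: the status names are copied from the documentation above, while `ACTIVE_STATUSES` and `is_active` are hypothetical names that do not appear in this commit. The case-insensitive comparison is also an assumption, prompted by the docs listing statuses in upper case while the example passes `--status complete`.

```python
# Status names copied from the "Status options" list in docs/commands/job/ls.md.
# ACTIVE_STATUSES and is_active are hypothetical names, not part of the commit.
ACTIVE_STATUSES = frozenset(
    {
        "CREATED",
        "SCHEDULED",
        "QUEUED",
        "INIT",
        "RUNNING",
        "CANCELING_SCHEDULED",
        "CANCELING",
    }
)


def is_active(status: str) -> bool:
    """Return True if the status is one the docs list as active.

    Upper-casing the input is an assumption made for this sketch; the commit
    does not show how Studio normalizes status values.
    """
    return status.upper() in ACTIVE_STATUSES


print(is_active("running"))
print(is_active("COMPLETE"))
```

How Studio itself resolves the `ACTIVE` and `INACTIVE` filter values is not shown in this diff, so the sketch makes no claim about the server-side behavior.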

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ nav:
 - run: commands/job/run.md
 - logs: commands/job/logs.md
 - cancel: commands/job/cancel.md
+- ls: commands/job/ls.md
 - 📡 Interacting with remote storage: references/remotes.md
 - 🤝 Contributing: contributing.md

src/datachain/cli/parser/job.py

Lines changed: 30 additions & 0 deletions
@@ -83,6 +83,36 @@ def add_jobs_parser(subparsers, parent_parser) -> None:
         help="Python package requirements",
     )

+    studio_ls_help = "List jobs in Studio"
+    studio_ls_description = "List jobs in Studio."
+
+    studio_ls_parser = jobs_subparser.add_parser(
+        "ls",
+        parents=[parent_parser],
+        description=studio_ls_description,
+        help=studio_ls_help,
+        formatter_class=CustomHelpFormatter,
+    )
+
+    studio_ls_parser.add_argument(
+        "--status",
+        action="store",
+        help="Status to filter jobs by",
+    )
+
+    studio_ls_parser.add_argument(
+        "--team",
+        action="store",
+        default=None,
+        help="Team to list jobs for (default: from config)",
+    )
+    studio_ls_parser.add_argument(
+        "--limit",
+        type=int,
+        default=20,
+        help="Limit the number of jobs returned (default: 20)",
+    )
+
     studio_cancel_help = "Cancel a job in Studio"
     studio_cancel_description = "Cancel a running job in Studio."
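
To see how the three new arguments behave when parsed, here is a minimal standalone sketch built on the standard `argparse` module. It mirrors the option names, types, and defaults from the hunk above, but it is not the project's parser: `build_parser` is a hypothetical helper, and the project-specific `parent_parser` and `CustomHelpFormatter` are left out, so `-v` and `-q` are not available here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `datachain job ls` parser in this commit."""
    parser = argparse.ArgumentParser(prog="datachain")
    subparsers = parser.add_subparsers(dest="command")

    job_parser = subparsers.add_parser("job")
    jobs_subparser = job_parser.add_subparsers(dest="cmd")

    ls_parser = jobs_subparser.add_parser(
        "ls",
        description="List jobs in Studio.",
        help="List jobs in Studio",
    )
    # Same names, types, and defaults as the arguments added in the diff.
    ls_parser.add_argument("--status", action="store", help="Status to filter jobs by")
    ls_parser.add_argument(
        "--team",
        action="store",
        default=None,
        help="Team to list jobs for (default: from config)",
    )
    ls_parser.add_argument(
        "--limit",
        type=int,
        default=20,
        help="Limit the number of jobs returned (default: 20)",
    )
    return parser


args = build_parser().parse_args(["job", "ls", "--status", "complete", "--limit", "50"])
print(args.cmd, args.status, args.team, args.limit)  # prints: ls complete None 50
```

Because `--limit` is declared with `type=int`, a non-numeric value is rejected by `argparse` before any request is sent; `--status`, by contrast, is passed through as an unvalidated string.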

src/datachain/remote/studio.py

Lines changed: 12 additions & 1 deletion
@@ -29,7 +29,7 @@
 DatasetExportSignedUrls = Optional[list[str]]
 FileUploadData = Optional[dict[str, Any]]
 JobData = Optional[dict[str, Any]]
-
+JobListData = dict[str, Any]
 logger = logging.getLogger("datachain")

 DATASET_ROWS_CHUNK_SIZE = 8192
@@ -402,6 +402,17 @@ def create_job(
         }
         return self._send_request("datachain/job", data)

+    def get_jobs(
+        self,
+        status: Optional[str] = None,
+        limit: int = 20,
+    ) -> Response[JobListData]:
+        return self._send_request(
+            "datachain/jobs",
+            {"status": status, "limit": limit} if status else {"limit": limit},
+            method="GET",
+        )
+
     def cancel_job(
         self,
         job_id: str,
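
The conditional expression passed to `_send_request` in `get_jobs` decides whether a `status` key is sent at all. The sketch below isolates that expression so its behavior is easy to check; `build_job_list_params` is a hypothetical name introduced only for illustration and is not part of `StudioClient`.

```python
from typing import Any, Optional


def build_job_list_params(status: Optional[str] = None, limit: int = 20) -> dict[str, Any]:
    """Hypothetical helper reproducing the payload expression from get_jobs."""
    # The check is on truthiness, so both None and "" omit the status key.
    return {"status": status, "limit": limit} if status else {"limit": limit}


print(build_job_list_params())
print(build_job_list_params("complete", 50))
print(build_job_list_params(""))
```

One consequence of the truthiness check is that an empty `--status ""` behaves the same as not passing `--status`, rather than being sent as an empty filter.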

src/datachain/studio.py

Lines changed: 33 additions & 2 deletions
@@ -3,6 +3,8 @@
 import sys
 from typing import TYPE_CHECKING, Optional

+import tabulate
+
 from datachain.config import Config, ConfigLevel
 from datachain.dataset import QUERY_DATASET_PREFIX
 from datachain.error import DataChainError
@@ -44,6 +46,10 @@ def process_jobs_args(args: "Namespace"):
         return cancel_job(args.id, args.team)
     if args.cmd == "logs":
         return show_job_logs(args.id, args.team)
+
+    if args.cmd == "ls":
+        return list_jobs(args.status, args.team, args.limit)
+
     raise DataChainError(f"Unknown command '{args.cmd}'.")


@@ -240,13 +246,13 @@ async def _run():
         raise DataChainError(response.message)

     response_data = response.data
-    if response_data:
+    if response_data and response_data.get("dataset_versions"):
         dataset_versions = response_data.get("dataset_versions", [])
         print("\n\n>>>> Dataset versions created during the job:")
         for version in dataset_versions:
             print(f" - {version.get('dataset_name')}@v{version.get('version')}")
     else:
-        print("No dataset versions created during the job.")
+        print("\n\nNo dataset versions created during the job.")


 def create_job(
@@ -337,6 +343,31 @@ def cancel_job(job_id: str, team_name: Optional[str]):
     print(f"Job {job_id} canceled")


+def list_jobs(status: Optional[str], team_name: Optional[str], limit: int):
+    client = StudioClient(team=team_name)
+    response = client.get_jobs(status, limit)
+    if not response.ok:
+        raise DataChainError(response.message)
+
+    jobs = response.data.get("jobs", [])
+    if not jobs:
+        print("No jobs found")
+        return
+
+    rows = [
+        {
+            "ID": job.get("id"),
+            "Name": job.get("name"),
+            "Status": job.get("status"),
+            "Created at": job.get("created_at"),
+            "Created by": job.get("created_by"),
+        }
+        for job in jobs
+    ]
+
+    print(tabulate.tabulate(rows, headers="keys", tablefmt="grid"))
+
+
 def show_job_logs(job_id: str, team_name: Optional[str]):
     token = Config().read().get("studio", {}).get("token")
     if not token:
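
The row-building step in `list_jobs` can be tried without a Studio connection. The sketch below copies the dictionary comprehension from the hunk above into a hypothetical `build_rows` helper and feeds it a made-up payload; the sample job values are invented for illustration and do not come from a real Studio response. It stays within the standard library, so the final `tabulate.tabulate(rows, headers="keys", tablefmt="grid")` call from the commit is not reproduced here.

```python
from typing import Any


def build_rows(jobs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: same column mapping as list_jobs in this commit."""
    return [
        {
            "ID": job.get("id"),
            "Name": job.get("name"),
            "Status": job.get("status"),
            "Created at": job.get("created_at"),
            "Created by": job.get("created_by"),
        }
        for job in jobs
    ]


# Invented sample payload; the second job deliberately lacks most fields.
sample_jobs = [
    {
        "id": "job-1",
        "name": "demo.py",
        "status": "COMPLETE",
        "created_at": "2025-01-01T00:00:00Z",
        "created_by": "alice",
    },
    {"id": "job-2", "status": "RUNNING"},
]

rows = build_rows(sample_jobs)
for row in rows:
    print(row)
```

Because every field is read with `dict.get`, a job that lacks a key still produces a full row, with `None` in the missing columns; the dictionary keys double as the table headers when `headers="keys"` is used.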
