Commit d78f129

Merge pull request #12 from dbpedia/download-capabilities
Download capabilities and docker image
2 parents 82de07f + faf7f65 commit d78f129

8 files changed: +474 -129 lines changed

Dockerfile

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+FROM python:3.10-slim
+
+WORKDIR /data
+
+COPY . .
+
+# Install dependencies
+RUN pip install .
+
+# Use ENTRYPOINT for the CLI
+ENTRYPOINT ["databusclient"]

README.md

Lines changed: 101 additions & 11 deletions
@@ -23,15 +23,18 @@ Options:
 
 Commands:
 deploy
-downoad
+download
 ```
+
+## Docker Image Usage
+
+A docker image is available at [dbpedia/databus-python-client](https://hub.docker.com/r/dbpedia/databus-python-client). See [download section](#usage-of-docker-image) for details.
+
 ### Deploy command
 ```
 databusclient deploy --help
 ```
 ```
-
-
 Usage: databusclient deploy [OPTIONS] DISTRIBUTIONS...
 
 Arguments:
@@ -40,23 +43,23 @@ Arguments:
 content variants of a distribution, fileExt and Compression can be set, if not they are inferred from the path [required]
 
 Options:
---versionid TEXT target databus version/dataset identifier of the form <h
+--version-id TEXT Target databus version/dataset identifier of the form <h
 ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
 RSION> [required]
---title TEXT dataset title [required]
---abstract TEXT dataset abstract max 200 chars [required]
---description TEXT dataset description [required]
---license TEXT license (see dalicc.net) [required]
---apikey TEXT apikey [required]
+--title TEXT Dataset title [required]
+--abstract TEXT Dataset abstract max 200 chars [required]
+--description TEXT Dataset description [required]
+--license TEXT License (see dalicc.net) [required]
+--apikey TEXT API key [required]
 --help Show this message and exit.
 ```
 Examples of using deploy command
 ```
-databusclient deploy --versionid https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
+databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
 ```
 
 ```
-databusclient deploy --versionid https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
+databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
 ```
 
 A few more notes for CLI usage:
@@ -65,6 +68,93 @@ A few more notes for CLI usage:
 * For complete inferred: Just use the URL with `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml`
 * If other parameters are used, you need to leave them empty like `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116`
 
+### Download command
+```
+databusclient download --help
+```
+
+```
+Usage: databusclient download [OPTIONS] DATABUSURIS...
+
+Arguments:
+DATABUSURIS... databus uris to download from https://databus.dbpedia.org,
+or a query statement that returns databus uris from https://databus.dbpedia.org/sparql
+to be downloaded [required]
+
+Download datasets from databus, optionally using vault access if vault
+options are provided.
+
+Options:
+--localdir TEXT Local databus folder (if not given, databus folder
+structure is created in current working directory)
+--databus TEXT Databus URL (if not given, inferred from databusuri, e.g.
+https://databus.dbpedia.org/sparql)
+--token TEXT Path to Vault refresh token file
+--authurl TEXT Keycloak token endpoint URL [default:
+https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
+connect/token]
+--clientid TEXT Client ID for token exchange [default: vault-token-
+exchange]
+--help Show this message and exit.
+```
+
+Examples of using download command
+
+**File**: download of a single file
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
+```
+
+**Version**: download of all files of a specific version
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
+```
+
+**Artifact**: download of all files with latest version of an artifact
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
+```
+
+**Group**: download of all files with latest version of all artifacts of a group
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings
+```
+
+If no `--localdir` is provided, the current working directory is used as base directory. The downloaded files will be stored in the working directory in a folder structure according to the databus structure, i.e. `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`.
+
+**Collection**: download of all files within a collection
+```
+databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
+```
+
+**Query**: download of all files returned by a query (sparql endpoint must be provided with `--databus`)
+```
+databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
+```
+
+#### Authentication with vault
+
+For downloading files from the vault, you need to provide a vault token. See [getting-the-access-refresh-token](https://github.com/dbpedia/databus-vault-access?tab=readme-ov-file#step-1-getting-the-access-refresh-token) for details. You can come back here once you have a `vault-token.dat` file. To use it, just provide the path to the file with `--token /path/to/vault-token.dat`.
+
+Example:
+```
+databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23 --token vault-token.dat
+```
+
+If vault authentication is required for downloading a file, the client will use the token. If no vault authentication is required, the token will not be used.
+
+#### Usage of docker image
+
+A docker image is available at [dbpedia/databus-python-client](https://hub.docker.com/r/dbpedia/databus-python-client). You can use it like this:
+
+```
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
+```
+If using vault authentication, make sure the token file is available in the container, e.g. by placing it in the current working directory.
+```
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23/fusion_props=all_subjectns=commons-wikimedia-org_vocab=all.ttl.gz --token vault-token.dat
+```
+
 ## Module Usage
 
 ### Step 1: Create lists of distributions for the dataset
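
Review note (not part of the commit): the README change states that, without `--localdir`, files are stored under `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`. The sketch below illustrates that mapping for a version-level or file-level URI. It assumes the path segments after the host are account, group, artifact, version and an optional file name, in that order. `default_target_dir` is a hypothetical helper written for this note, not a function of `databusclient`, and it only models the default case described in the README.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def default_target_dir(databus_uri: str) -> PurePosixPath:
    """Hypothetical helper: folder (relative to the working directory) for a URI."""
    # Assumed segment order: account / group / artifact / version [/ file]
    segments = [s for s in urlparse(databus_uri).path.split("/") if s]
    if len(segments) < 4:
        # Artifact- and group-level URIs carry no version; the client resolves
        # the latest one at run time, which this sketch does not model.
        raise ValueError("expected at least account/group/artifact/version")
    return PurePosixPath(*segments[:4])


target = default_target_dir(
    "https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01"
    "/mappingbased-literals_lang=az.ttl.bz2"
)
print(target)  # dbpedia/mappings/mappingbased-literals/2022.12.01
```

With the Docker image, the working directory is `/data`, so under the same assumption the files would appear in the host directory mounted with `-v $(pwd):/data`.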

databusclient/cli.py

Lines changed: 50 additions & 32 deletions
@@ -1,43 +1,61 @@
 #!/usr/bin/env python3
-import typer
+import click
 from typing import List
 from databusclient import client
 
-app = typer.Typer()
+
+@click.group()
+def app():
+    """Databus Client CLI"""
+    pass
 
 
 @app.command()
-def deploy(
-    version_id: str = typer.Option(
-        ...,
-        help="target databus version/dataset identifier of the form "
-        "<https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>",
-    ),
-    title: str = typer.Option(..., help="dataset title"),
-    abstract: str = typer.Option(..., help="dataset abstract max 200 chars"),
-    description: str = typer.Option(..., help="dataset description"),
-    license_uri: str = typer.Option(..., help="license (see dalicc.net)"),
-    apikey: str = typer.Option(..., help="apikey"),
-    distributions: List[str] = typer.Argument(
-        ...,
-        help="distributions in the form of List[URL|CV|fileext|compression|sha256sum:contentlength] where URL is the "
-        "download URL and CV the "
-        "key=value pairs (_ separated) content variants of a distribution. filext and compression are optional "
-        "and if left out inferred from the path. If the sha256sum:contentlength part is left out it will be "
-        "calcuted by downloading the file.",
-    ),
-):
-    typer.echo(version_id)
-    dataid = client.create_dataset(
-        version_id, title, abstract, description, license_uri, distributions
-    )
+@click.option(
+    "--version-id", "version_id",
+    required=True,
+    help="Target databus version/dataset identifier of the form "
+    "<https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>",
+)
+@click.option("--title", required=True, help="Dataset title")
+@click.option("--abstract", required=True, help="Dataset abstract max 200 chars")
+@click.option("--description", required=True, help="Dataset description")
+@click.option("--license", "license_url", required=True, help="License (see dalicc.net)")
+@click.option("--apikey", required=True, help="API key")
+@click.argument(
+    "distributions",
+    nargs=-1,
+    required=True,
+)
+def deploy(version_id, title, abstract, description, license_url, apikey, distributions: List[str]):
+    """
+    Deploy a dataset version with the provided metadata and distributions.
+    """
+    click.echo(f"Deploying dataset version: {version_id}")
+    dataid = client.create_dataset(version_id, title, abstract, description, license_url, distributions)
     client.deploy(dataid=dataid, api_key=apikey)
 
 
 @app.command()
-def download(
-    localDir: str = typer.Option(..., help="local databus folder"),
-    databus: str = typer.Option(..., help="databus URL"),
-    databusuris: List[str] = typer.Argument(...,help="any kind of these: databus identifier, databus collection identifier, query file")
-):
-    client.download(localDir=localDir,endpoint=databus,databusURIs=databusuris)
+@click.argument("databusuris", nargs=-1, required=True)
+@click.option("--localdir", help="Local databus folder (if not given, databus folder structure is created in current working directory)")
+@click.option("--databus", help="Databus URL (if not given, inferred from databusuri, e.g. https://databus.dbpedia.org/sparql)")
+@click.option("--token", help="Path to Vault refresh token file")
+@click.option("--authurl", default="https://auth.dbpedia.org/realms/dbpedia/protocol/openid-connect/token", show_default=True, help="Keycloak token endpoint URL")
+@click.option("--clientid", default="vault-token-exchange", show_default=True, help="Client ID for token exchange")
+def download(databusuris: List[str], localdir, databus, token, authurl, clientid):
+    """
+    Download datasets from databus, optionally using vault access if vault options are provided.
+    """
+    client.download(
+        localDir=localdir,
+        endpoint=databus,
+        databusURIs=databusuris,
+        token=token,
+        auth_url=authurl,
+        client_id=clientid,
+    )
+
+
+if __name__ == "__main__":
+    app()
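
Review note (not part of the commit): the move from typer to click changes two things callers can observe. The option is now spelled `--version-id`, and a variadic argument declared with `nargs=-1` arrives as a tuple even though the diff annotates it as `List[str]`. The sketch below reproduces both with a stripped-down stand-in for the `deploy` command, assuming `click` is installed. The command body is illustrative only and does not call `databusclient`.

```python
import click
from click.testing import CliRunner


@click.group()
def app():
    """Stand-in for the CLI group introduced in the diff."""


@app.command()
@click.option("--version-id", "version_id", required=True)
@click.argument("distributions", nargs=-1, required=True)
def deploy(version_id, distributions):
    # click hands variadic arguments over as a tuple, not a list
    click.echo(f"{version_id} {type(distributions).__name__} {len(distributions)}")


runner = CliRunner()

# New spelling: accepted, two positional distributions collected
ok = runner.invoke(app, ["deploy", "--version-id", "v1", "a", "b"])

# Old typer-era spelling: rejected by click as an unknown option
old_flag = runner.invoke(app, ["deploy", "--versionid", "v1", "a"])

print(ok.exit_code, ok.output.strip())
print(old_flag.exit_code)
```

If downstream code in `client.create_dataset` relies on list-only behavior, converting with `list(distributions)` at the call site would be one way to keep the annotation truthful. Whether that is needed depends on code not shown in this diff.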
