Commit ace93c5

cli: delete datasets (#38)
* feat: databus api key for downloading
* refactored README.md
* feat: cli delete to delete datasets from databus
1 parent b70f59a commit ace93c5

5 files changed: +357 -41 lines changed

README.md

Lines changed: 81 additions & 4 deletions
@@ -17,6 +17,7 @@ Command-line and Python client for downloading and deploying datasets on DBpedia
 - [CLI Usage](#cli-usage)
   - [Download](#cli-download)
   - [Deploy](#cli-deploy)
+  - [Delete](#cli-delete)
 - [Module Usage](#module-usage)
   - [Deploy](#module-deploy)

@@ -66,8 +67,8 @@ Commands to download the [DBpedia Knowledge Graphs](#dbpedia-knowledge-graphs) g

 To download BUSL 1.1 licensed datasets, you need to register and get an access token.

-1. If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
-2. Log in at https://account.dbpedia.org and create your token.
+1. If you do not have a DBpedia Account yet (Forum/Databus), please register at [https://account.dbpedia.org](https://account.dbpedia.org)
+2. Log in at [https://account.dbpedia.org](https://account.dbpedia.org) and create your token.
 3. Save the token to a file, e.g. `vault-token.dat`.

 ### DBpedia Knowledge Graphs
@@ -181,7 +182,7 @@ Options:
   --databus TEXT      Databus URL (if not given, inferred from databusuri,
                       e.g. https://databus.dbpedia.org/sparql)
   --vault-token TEXT  Path to Vault refresh token file
-  --databus-key TEXT  Databus API key to donwload from protected databus
+  --databus-key TEXT  Databus API key to download from protected databus
   --authurl TEXT      Keycloak token endpoint URL  [default:
                       https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
                       connect/token]
@@ -190,7 +191,7 @@ Options:
   --help              Show this message and exit.
 ```

-### Examples of using the download command
+#### Examples of using the download command

 **Download File**: download of a single file
 ```bash
@@ -396,6 +397,82 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
   ./data_folder
 ```

+<a id="cli-delete"></a>
+### Delete
+
+With the delete command you can delete collections, groups, artifacts, and versions from the Databus. Deleting files is not supported via API.
+
+**Note**: Deleting datasets will recursively delete all data associated with the dataset below the specified level. Please use this command with caution. As a security measure, the delete command will prompt you for confirmation before proceeding with any deletion.
+
+```bash
+# Python
+databusclient delete [OPTIONS] DATABUSURIS...
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete [OPTIONS] DATABUSURIS...
+```
+
+**Help and further information on delete command:**
+```bash
+# Python
+databusclient delete --help
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete --help
+
+# Output:
+Usage: databusclient delete [OPTIONS] DATABUSURIS...
+
+  Delete a dataset from the databus.
+
+  Delete a group, artifact, or version identified by the given databus URI.
+  Will recursively delete all data associated with the dataset.
+
+Options:
+  --databus-key TEXT  Databus API key to access protected databus  [required]
+  --dry-run           Perform a dry run without actual deletion
+  --force             Force deletion without confirmation prompt
+  --help              Show this message and exit.
+```
+
+To authenticate the delete request, you need to provide an API key with `--databus-key YOUR_API_KEY`.
+
+If you want to perform a dry run without actual deletion, use the `--dry-run` option. This will show you what would be deleted without making any changes.
+
+As a security measure, the delete command will prompt you for confirmation before proceeding with the deletion. If you want to skip this prompt, you can use the `--force` option.
+
+#### Examples of using the delete command
+
+**Delete Version**: delete a specific version
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
+```
+
+**Delete Artifact**: delete an artifact and all its versions
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
+```
+
+**Delete Group**: delete a group and all its artifacts and versions
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
+```
+
+**Delete Collection**: delete a collection
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
+```
+
 ## Module Usage

 <a id="module-deploy"></a>

databusclient/api/delete.py

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
+import json
+import requests
+from typing import List
+
+from databusclient.api.utils import get_databus_id_parts_from_uri, get_json_ld_from_databus
+
+def _confirm_delete(databusURI: str) -> str:
+    """
+    Confirm deletion of a Databus resource with the user.
+
+    Parameters:
+    - databusURI: The full databus URI of the resource to delete
+
+    Returns:
+    - "confirm" if the user confirms deletion
+    - "skip" if the user chooses to skip deletion
+    - "cancel" if the user chooses to cancel the entire deletion process
+    """
+    print(f"Are you sure you want to delete: {databusURI}?")
+    print("\nThis action is irreversible and will permanently remove the resource and all its data.")
+    while True:
+        choice = input("Type 'yes'/'y' to confirm, 'skip'/'s' to skip this resource, or 'cancel'/'c' to abort: ").strip().lower()
+        if choice in ("yes", "y"):
+            return "confirm"
+        elif choice in ("skip", "s"):
+            return "skip"
+        elif choice in ("cancel", "c"):
+            return "cancel"
+        else:
+            print("Invalid input. Please type 'yes'/'y', 'skip'/'s', or 'cancel'/'c'.")
+
+
+def _delete_resource(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete a single Databus resource (version, artifact, group).
+
+    Equivalent to:
+    curl -X DELETE "<databusURI>" -H "accept: */*" -H "X-API-KEY: <key>"
+
+    Parameters:
+    - databusURI: The full databus URI of the resource to delete
+    - databus_key: Databus API key to authenticate the deletion request
+    - dry_run: If True, do not perform the deletion but only print what would be deleted
+    - force: If True, skip confirmation prompt and proceed with deletion
+    """
+
+    # Confirm the deletion request, skip the request or cancel deletion process
+    if not (dry_run or force):
+        action = _confirm_delete(databusURI)
+        if action == "skip":
+            print(f"Skipping: {databusURI}\n")
+            return
+        if action == "cancel":
+            raise KeyboardInterrupt("Deletion cancelled by user.")
+
+    if databus_key is None:
+        raise ValueError("Databus API key must be provided for deletion")
+
+    headers = {
+        "accept": "*/*",
+        "X-API-KEY": databus_key
+    }
+
+    if dry_run:
+        print(f"[DRY RUN] Would delete: {databusURI}")
+        return
+
+    response = requests.delete(databusURI, headers=headers, timeout=30)
+
+    if response.status_code in (200, 204):
+        print(f"Successfully deleted: {databusURI}")
+    else:
+        raise Exception(f"Failed to delete {databusURI}: {response.status_code} - {response.text}")
+
+
+def _delete_list(databusURIs: List[str], databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete a list of Databus resources.
+
+    Parameters:
+    - databusURIs: List of full databus URIs of the resources to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    """
+    for databusURI in databusURIs:
+        _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+
+
+def _delete_artifact(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete an artifact and all its versions.
+
+    This function first retrieves all versions of the artifact and then deletes them one by one.
+    Finally, it deletes the artifact itself.
+
+    Parameters:
+    - databusURI: The full databus URI of the artifact to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    - dry_run: If True, do not perform the deletion but only print what would be deleted
+    """
+    artifact_body = get_json_ld_from_databus(databusURI, databus_key)
+
+    json_dict = json.loads(artifact_body)
+    versions = json_dict.get("databus:hasVersion")
+
+    # Single version case {}
+    if isinstance(versions, dict):
+        versions = [versions]
+    # Multiple versions case [{}, {}]
+
+    # If versions is None or empty skip
+    if versions is None:
+        print(f"No versions found for artifact: {databusURI}")
+    else:
+        version_uris = [v["@id"] for v in versions if "@id" in v]
+        if not version_uris:
+            print(f"No version URIs found in artifact JSON-LD for: {databusURI}")
+        else:
+            # Delete all versions
+            _delete_list(version_uris, databus_key, dry_run=dry_run, force=force)
+
+    # Finally, delete the artifact itself
+    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+
+def _delete_group(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete a group and all its artifacts and versions.
+
+    This function first retrieves all artifacts of the group, then deletes each artifact (which in turn deletes its versions).
+    Finally, it deletes the group itself.
+
+    Parameters:
+    - databusURI: The full databus URI of the group to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    - dry_run: If True, do not perform the deletion but only print what would be deleted
+    """
+    group_body = get_json_ld_from_databus(databusURI, databus_key)
+
+    json_dict = json.loads(group_body)
+    artifacts = json_dict.get("databus:hasArtifact") or []
+    artifacts = [artifacts] if isinstance(artifacts, dict) else artifacts  # single artifact case {}, as for versions
+    artifact_uris = []
+    for item in artifacts:
+        uri = item.get("@id")
+        if not uri:
+            continue
+        _, _, _, _, version, _ = get_databus_id_parts_from_uri(uri)
+        if version is None:
+            artifact_uris.append(uri)
+
+    # Delete all artifacts (which deletes their versions)
+    for artifact_uri in artifact_uris:
+        _delete_artifact(artifact_uri, databus_key, dry_run=dry_run, force=force)
+
+    # Finally, delete the group itself
+    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+
+def delete(databusURIs: List[str], databus_key: str, dry_run: bool, force: bool):
+    """
+    Delete a dataset from the databus.
+
+    Delete a group, artifact, or version identified by the given databus URI.
+    Will recursively delete all data associated with the dataset.
+
+    Parameters:
+    - databusURIs: List of full databus URIs of the resources to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    - dry_run: If True, will only print what would be deleted without performing actual deletions
+    - force: If True, skip confirmation prompt and proceed with deletion
+    """
+
+    for databusURI in databusURIs:
+        _host, _account, group, artifact, version, file = get_databus_id_parts_from_uri(databusURI)
+
+        if group == "collections" and artifact is not None:
+            print(f"Deleting collection: {databusURI}")
+            _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+        elif file is not None:
+            print(f"Deleting file is not supported via API: {databusURI}")
+            continue  # skip file deletions
+        elif version is not None:
+            print(f"Deleting version: {databusURI}")
+            _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+        elif artifact is not None:
+            print(f"Deleting artifact and all its versions: {databusURI}")
+            _delete_artifact(databusURI, databus_key, dry_run=dry_run, force=force)
+        elif group is not None and group != "collections":
+            print(f"Deleting group and all its artifacts and versions: {databusURI}")
+            _delete_group(databusURI, databus_key, dry_run=dry_run, force=force)
+        else:
+            print(f"Deleting {databusURI} is not supported.")
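
The order of the branches in `delete` matters: the collection check has to come before the artifact check, because a collection URI fills the same path slot as an artifact. The following is a minimal sketch of that dispatch as a pure function, so the routing can be checked without any network access. The names `split_databus_uri` and `classify` are hypothetical and not part of this commit; the split only mirrors what `get_databus_id_parts_from_uri` is described as doing.

```python
from typing import List, Optional, Tuple


def split_databus_uri(uri: str) -> Tuple[Optional[str], ...]:
    """Hypothetical stand-in for the six-part split:
    (host, account, group, artifact, version, file)."""
    for scheme in ("https://", "http://"):
        if uri.startswith(scheme):
            uri = uri[len(scheme):]
            break
    parts: List[Optional[str]] = list(uri.strip("/").split("/"))
    parts += [None] * (6 - len(parts))  # pad missing levels with None
    return tuple(parts[:6])


def classify(uri: str) -> str:
    """Return the kind of resource a URI points at, in the same
    branch order as the delete() dispatch in this diff."""
    _host, _account, group, artifact, version, file = split_databus_uri(uri)
    if group == "collections" and artifact is not None:
        return "collection"
    if file is not None:
        return "file"  # file deletion is not supported via the API
    if version is not None:
        return "version"
    if artifact is not None:
        return "artifact"
    if group is not None and group != "collections":
        return "group"
    return "unsupported"


base = "https://databus.dbpedia.org/dbpedia"
for path in ("/mappings", "/mappings/mappingbased-literals",
             "/mappings/mappingbased-literals/2022.12.01",
             "/collections/dbpedia-snapshot-2022-12", ""):
    print(f"{base + path} -> {classify(base + path)}")
```

Keeping the classification separate from the side effects is one possible design choice; it would let the routing be unit-tested while the HTTP calls stay in `_delete_resource`.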

databusclient/api/utils.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+import requests
+from typing import Tuple, Optional
+
+def get_databus_id_parts_from_uri(uri: str) -> Tuple[Optional[str], Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]]:
+    """
+    Extract databus ID parts from a given databus URI.
+
+    Parameters:
+    - uri: The full databus URI
+
+    Returns:
+    A tuple containing (host, accountId, groupId, artifactId, versionId, fileId).
+    Each element is a string or None if not present.
+    """
+    uri = uri.removeprefix("https://").removeprefix("http://")
+    parts = uri.strip("/").split("/")
+    parts += [None] * (6 - len(parts))  # pad with None if less than 6 parts
+    return tuple(parts[:6])  # return only the first 6 parts
+
+def get_json_ld_from_databus(uri: str, databus_key: Optional[str] = None) -> str:
+    """
+    Retrieve JSON-LD representation of a databus resource.
+
+    Parameters:
+    - uri: The full databus URI
+    - databus_key: Optional Databus API key for authentication on protected resources
+
+    Returns:
+    JSON-LD string representation of the databus resource.
+    """
+    headers = {"Accept": "application/ld+json"}
+    if databus_key is not None:
+        headers["X-API-KEY"] = databus_key
+    response = requests.get(uri, headers=headers, timeout=30)
+    response.raise_for_status()
+
+    return response.text
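
`get_json_ld_from_databus` returns raw JSON-LD text, so each caller has to parse it and cope with a linked property that holds either a single object or a list of objects (the "single version case" noted in `delete.py`). A minimal sketch of one way to normalize this is shown below; `as_list` and `linked_ids` are hypothetical helpers and not part of this commit, and the sample documents are made up for illustration.

```python
import json
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Wrap a single JSON-LD value in a list; map None to an empty list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def linked_ids(json_ld_text: str, prop: str) -> List[str]:
    """Collect the '@id' of every object stored under `prop`."""
    doc = json.loads(json_ld_text)
    return [
        item["@id"]
        for item in as_list(doc.get(prop))
        if isinstance(item, dict) and "@id" in item
    ]


# Illustrative documents (assumed shapes, not real Databus responses).
single = '{"databus:hasVersion": {"@id": "https://example.org/a/g/art/1"}}'
many = ('{"databus:hasVersion": [{"@id": "https://example.org/a/g/art/1"},'
        ' {"@id": "https://example.org/a/g/art/2"}]}')

print(linked_ids(single, "databus:hasVersion"))
print(linked_ids(many, "databus:hasVersion"))
print(linked_ids("{}", "databus:hasVersion"))
```

A shared helper of this kind could serve both the version and the artifact lookups, so the single-object case is handled the same way in each.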

databusclient/cli.py

Lines changed: 22 additions & 1 deletion
@@ -7,6 +7,7 @@
 from databusclient import client

 from databusclient.rclone_wrapper import upload
+from databusclient.api.delete import delete as api_delete

 @click.group()
 def app():
@@ -95,7 +96,7 @@ def deploy(version_id, title, abstract, description, license_url, apikey,
 @click.option("--localdir", help="Local databus folder (if not given, databus folder structure is created in current working directory)")
 @click.option("--databus", help="Databus URL (if not given, inferred from databusuri, e.g. https://databus.dbpedia.org/sparql)")
 @click.option("--vault-token", help="Path to Vault refresh token file")
-@click.option("--databus-key", help="Databus API key to donwload from protected databus")
+@click.option("--databus-key", help="Databus API key to download from protected databus")
 @click.option("--authurl", default="https://auth.dbpedia.org/realms/dbpedia/protocol/openid-connect/token", show_default=True, help="Keycloak token endpoint URL")
 @click.option("--clientid", default="vault-token-exchange", show_default=True, help="Client ID for token exchange")
 def download(databusuris: List[str], localdir, databus, vault_token, databus_key, authurl, clientid):
@@ -112,6 +113,26 @@ def download(databusuris: List[str], localdir, databus, vault_token, databus_key
         client_id=clientid,
     )

+@app.command()
+@click.argument("databusuris", nargs=-1, required=True)
+@click.option("--databus-key", help="Databus API key to access protected databus", required=True)
+@click.option("--dry-run", is_flag=True, help="Perform a dry run without actual deletion")
+@click.option("--force", is_flag=True, help="Force deletion without confirmation prompt")
+def delete(databusuris: List[str], databus_key: str, dry_run: bool, force: bool):
+    """
+    Delete a dataset from the databus.
+
+    Delete a group, artifact, or version identified by the given databus URI.
+    Will recursively delete all data associated with the dataset.
+    """
+
+    api_delete(
+        databusURIs=databusuris,
+        databus_key=databus_key,
+        dry_run=dry_run,
+        force=force,
+    )
+

 if __name__ == "__main__":
     app()
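
The two flags interact in a way the help text does not spell out: judging from `_delete_resource` in this diff, the confirmation prompt is shown only when neither `--dry-run` nor `--force` is given, and a dry run never sends a request even when combined with `--force`. The sketch below restates that decision as a small pure function. `decide` and its `ask` callback are hypothetical names used only for illustration, and the real code raises `KeyboardInterrupt` on cancel instead of returning a value.

```python
from typing import Callable


def decide(dry_run: bool, force: bool, ask: Callable[[], str]) -> str:
    """Return what would happen to one resource:
    'dry-run', 'delete', 'skip' or 'cancel'."""
    # Prompt only when the user asked for neither a dry run nor --force.
    if not (dry_run or force):
        answer = ask()  # expected to return 'confirm', 'skip' or 'cancel'
        if answer in ("skip", "cancel"):
            return answer
    # A dry run wins over --force: nothing is actually deleted.
    return "dry-run" if dry_run else "delete"


def never_called() -> str:
    raise AssertionError("prompt should not be shown")


print(decide(dry_run=True, force=False, ask=never_called))
print(decide(dry_run=False, force=True, ask=never_called))
print(decide(dry_run=False, force=False, ask=lambda: "skip"))
```

Passing the prompt in as a callback is an assumption made here to keep the example testable; the commit itself reads from `input()` directly.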
