Commit cb91775

feat: cli delete to delete datasets from databus
1 parent 833872d commit cb91775

5 files changed: +338 -25 lines changed

README.md

Lines changed: 79 additions & 0 deletions
@@ -17,6 +17,7 @@ Command-line and Python client for downloading and deploying datasets on DBpedia
 - [CLI Usage](#cli-usage)
   - [Download](#cli-download)
   - [Deploy](#cli-deploy)
+  - [Delete](#cli-delete)
 - [Module Usage](#module-usage)
   - [Deploy](#module-deploy)

@@ -396,6 +397,84 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
 ./data_folder
 ```

+<a id="cli-delete"></a>
+### Delete
+
+With the delete command you can delete collections, groups, artifacts, and versions from the Databus. Deleting individual files is not supported via the API.
+
+**Note**: Deleting a dataset recursively deletes all data associated with it below the specified level. Please use this command with caution. As a security measure, the delete command prompts for confirmation before proceeding with any deletion.
+
+```bash
+# Python
+databusclient delete [OPTIONS] DATABUSURIS...
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete [OPTIONS] DATABUSURIS...
+```
+
+**Help and further information on the delete command:**
+```bash
+# Python
+databusclient delete --help
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete --help
+
+# Output:
+Usage: databusclient delete [OPTIONS] DATABUSURIS...
+
+  Delete a dataset from the databus.
+
+  Delete a group, artifact, or version identified by the given databus URI.
+  Will recursively delete all data associated with the dataset.
+
+Options:
+  --databus-key TEXT  Databus API key to access protected databus  [required]
+  --dry-run           Perform a dry run without actual deletion
+  --force             Force deletion without confirmation prompt
+  --help              Show this message and exit.
+```
+
+To authenticate the delete request, you need to provide an API key with `--databus-key YOUR_API_KEY`.
+
+If you want to perform a dry run without actual deletion, use the `--dry-run` option. This shows what would be deleted without making any changes.
+
+As a security measure, the delete command prompts for confirmation before proceeding with each deletion. To skip this prompt, use the `--force` option.
+
+**Example of using the delete command**
+
+### Examples of using the delete command
+
+**Delete Version**: delete a specific version
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
+```
+
+**Delete Artifact**: delete an artifact and all its versions
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
+```
+
+**Delete Group**: delete a group and all its artifacts and versions
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
+```
+
+**Delete Collection**: delete a collection
+```bash
+# Python
+databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
+# Docker
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
+```
+
 ## Module Usage

 <a id="module-deploy"></a>
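Review note: the four README examples differ only in how many path segments the URI carries. The sketch below is a minimal, self-contained illustration of how a URI could be mapped to the level that gets deleted, checking the cases in the same order as the `delete()` function added in this commit. The names `split_databus_uri` and `classify_target` are hypothetical and are not part of `databusclient`.

```python
from typing import List, Optional


def split_databus_uri(uri: str) -> List[Optional[str]]:
    """Hypothetical helper: split a Databus URI into six positional parts
    (host, account, group, artifact, version, file), padding with None."""
    for scheme in ("https://", "http://"):
        if uri.startswith(scheme):
            uri = uri[len(scheme):]
            break
    parts: List[Optional[str]] = list(uri.strip("/").split("/"))
    parts += [None] * (6 - len(parts))
    return parts[:6]


def classify_target(uri: str) -> str:
    """Return the level a URI points at, using the branch order of delete()."""
    _host, _account, group, artifact, version, file = split_databus_uri(uri)
    if group == "collections" and artifact is not None:
        return "collection"
    if file is not None:
        return "file"  # the commit skips these: not deletable via the API
    if version is not None:
        return "version"
    if artifact is not None:
        return "artifact"
    if group is not None and group != "collections":
        return "group"
    return "unsupported"


if __name__ == "__main__":
    base = "https://databus.dbpedia.org/dbpedia"
    for uri in (
        f"{base}/mappings/mappingbased-literals/2022.12.01",
        f"{base}/mappings/mappingbased-literals",
        f"{base}/mappings",
        f"{base}/collections/dbpedia-snapshot-2022-12",
    ):
        print(f"{classify_target(uri):<10} {uri}")
```

The collection check comes first because a collection URI has the same number of segments as an artifact URI; only the literal `collections` segment distinguishes the two.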

databusclient/api/delete.py

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+import json
+import requests
+from typing import List
+
+from databusclient.api.utils import get_databus_id_parts_from_uri, get_json_ld_from_databus
+
+def _confirm_delete(databusURI: str) -> str:
+    """
+    Confirm deletion of a Databus resource with the user.
+
+    Parameters:
+    - databusURI: The full databus URI of the resource to delete
+
+    Returns:
+    - "confirm" if the user confirms deletion
+    - "skip" if the user chooses to skip deletion
+    - "cancel" if the user chooses to cancel the entire deletion process
+    """
+    print(f"Are you sure you want to delete: {databusURI}?")
+    print("\nThis action is irreversible and will permanently remove the resource and all its data.")
+    while True:
+        choice = input("Type 'yes'/'y' to confirm, 'skip'/'s' to skip this resource, or 'cancel'/'c' to abort: ").strip().lower()
+        if choice == "yes" or choice == "y":
+            return "confirm"
+        elif choice == "skip" or choice == "s":
+            return "skip"
+        elif choice == "cancel" or choice == "c":
+            return "cancel"
+        else:
+            print("Invalid input. Please type 'yes'/'y', 'skip'/'s', or 'cancel'/'c'.")
+
+
+def _delete_resource(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete a single Databus resource (version, artifact, group).
+
+    Equivalent to:
+    curl -X DELETE "<databusURI>" -H "accept: */*" -H "X-API-KEY: <key>"
+
+    Parameters:
+    - databusURI: The full databus URI of the resource to delete
+    - databus_key: Databus API key to authenticate the deletion request
+    - dry_run: If True, do not perform the deletion but only print what would be deleted
+    - force: If True, skip confirmation prompt and proceed with deletion
+    """
+
+    # Confirm the deletion request, skip the request or cancel deletion process
+    if not (dry_run or force):
+        action = _confirm_delete(databusURI)
+        if action == "skip":
+            print(f"Skipping: {databusURI}\n")
+            return
+        if action == "cancel":
+            raise KeyboardInterrupt("Deletion cancelled by user.")
+
+    if databus_key is None:
+        raise ValueError("Databus API key must be provided for deletion")
+
+    headers = {
+        "accept": "*/*",
+        "X-API-KEY": databus_key
+    }
+
+    if dry_run:
+        print(f"[DRY RUN] Would delete: {databusURI}")
+        return
+
+    response = requests.delete(databusURI, headers=headers)
+
+    if response.status_code in (200, 204):
+        print(f"Successfully deleted: {databusURI}")
+    else:
+        raise Exception(f"Failed to delete {databusURI}: {response.status_code} - {response.text}")
+
+
+def _delete_list(databusURIs: List[str], databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete a list of Databus resources.
+
+    Parameters:
+    - databusURIs: List of full databus URIs of the resources to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    """
+    for databusURI in databusURIs:
+        _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+
+
+def _delete_artifact(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete an artifact and all its versions.
+
+    This function first retrieves all versions of the artifact and then deletes them one by one.
+    Finally, it deletes the artifact itself.
+
+    Parameters:
+    - databusURI: The full databus URI of the artifact to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    - dry_run: If True, do not perform the deletion but only print what would be deleted
+    """
+    artifact_body = get_json_ld_from_databus(databusURI, databus_key)
+
+    json_dict = json.loads(artifact_body)
+    versions = json_dict.get("databus:hasVersion") or []  # a missing key yields an empty list instead of None
+
+    # Single version case {}
+    if isinstance(versions, dict):
+        versions = [versions]
+    # Multiple versions case [{}, {}]
+
+    version_uris = [v["@id"] for v in versions if "@id" in v]
+    if not version_uris:
+        raise ValueError("No versions found in artifact JSON-LD")
+
+    # Delete all versions (forward force so that --force also applies to nested deletions)
+    _delete_list(version_uris, databus_key, dry_run=dry_run, force=force)
+
+    # Finally, delete the artifact itself
+    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+
+
+def _delete_group(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
+    """
+    Delete a group and all its artifacts and versions.
+
+    This function first retrieves all artifacts of the group, then deletes each artifact (which in turn deletes its versions).
+    Finally, it deletes the group itself.
+
+    Parameters:
+    - databusURI: The full databus URI of the group to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    - dry_run: If True, do not perform the deletion but only print what would be deleted
+    """
+    group_body = get_json_ld_from_databus(databusURI, databus_key)
+
+    json_dict = json.loads(group_body)
+    artifacts = json_dict.get("databus:hasArtifact", [])
+
+    artifact_uris = []
+    for item in artifacts:
+        uri = item.get("@id")
+        if not uri:
+            continue
+        _, _, _, _, version, _ = get_databus_id_parts_from_uri(uri)
+        if version is None:
+            artifact_uris.append(uri)
+
+    # Delete all artifacts (which deletes their versions)
+    for artifact_uri in artifact_uris:
+        _delete_artifact(artifact_uri, databus_key, dry_run=dry_run, force=force)
+
+    # Finally, delete the group itself
+    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+
+# TODO: add to README.md
+def delete(databusURIs: List[str], databus_key: str, dry_run: bool, force: bool):
+    """
+    Delete a dataset from the databus.
+
+    Delete a group, artifact, or version identified by the given databus URI.
+    Will recursively delete all data associated with the dataset.
+
+    Parameters:
+    - databusURIs: List of full databus URIs of the resources to delete
+    - databus_key: Databus API key to authenticate the deletion requests
+    - dry_run: If True, will only print what would be deleted without performing actual deletions
+    - force: If True, skip confirmation prompt and proceed with deletion
+    """
+
+    for databusURI in databusURIs:
+        host, account, group, artifact, version, file = get_databus_id_parts_from_uri(databusURI)
+
+        if group == "collections" and artifact is not None:
+            print(f"Deleting collection: {databusURI}")
+            _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+        elif file is not None:
+            print(f"Deleting file is not supported via API: {databusURI}")
+            continue  # skip file deletions
+        elif version is not None:
+            print(f"Deleting version: {databusURI}")
+            _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
+        elif artifact is not None:
+            print(f"Deleting artifact and all its versions: {databusURI}")
+            _delete_artifact(databusURI, databus_key, dry_run=dry_run, force=force)
+        elif group is not None and group != "collections":
+            print(f"Deleting group and all its artifacts and versions: {databusURI}")
+            _delete_group(databusURI, databus_key, dry_run=dry_run, force=force)
+        else:
+            print(f"Deleting {databusURI} is not supported.")
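Review note: `_delete_artifact` has to cope with `databus:hasVersion` being a single object, a list of objects, or absent. The sketch below isolates that normalization step as a pure function so it can be exercised without network access. The function name `extract_version_uris` and the sample JSON-LD documents are assumptions made for illustration; they are not taken from the commit or from real Databus responses.

```python
import json
from typing import List


def extract_version_uris(artifact_json_ld: str) -> List[str]:
    """Hypothetical helper: collect the "@id" of every version in an artifact document."""
    doc = json.loads(artifact_json_ld)
    versions = doc.get("databus:hasVersion") or []  # missing key or null -> empty list
    if isinstance(versions, dict):  # single version serialized as one object
        versions = [versions]
    return [v["@id"] for v in versions if isinstance(v, dict) and "@id" in v]


if __name__ == "__main__":
    # Illustrative documents only; real responses carry more fields.
    single = '{"databus:hasVersion": {"@id": "https://example.org/a/g/art/1.0"}}'
    many = (
        '{"databus:hasVersion": ['
        '{"@id": "https://example.org/a/g/art/1.0"}, '
        '{"@id": "https://example.org/a/g/art/2.0"}]}'
    )
    print(extract_version_uris(single))
    print(extract_version_uris(many))
    print(extract_version_uris("{}"))
```

Returning an empty list for a missing key leaves the decision about whether "no versions" is an error to the caller, which in the commit raises `ValueError`.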

databusclient/api/utils.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+import requests
+from typing import Tuple, Optional
+
+def get_databus_id_parts_from_uri(uri: str) -> Tuple[Optional[str], Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]]:
+    """
+    Extract databus ID parts from a given databus URI.
+
+    Parameters:
+    - uri: The full databus URI
+
+    Returns:
+    A tuple containing (host, accountId, groupId, artifactId, versionId, fileId).
+    Each element is a string or None if not present.
+    """
+    uri = uri.removeprefix("https://").removeprefix("http://")
+    parts = uri.strip("/").split("/")
+    parts += [None] * (6 - len(parts))  # pad with None if less than 6 parts
+    return tuple(parts[:6])  # return only the first 6 parts
+
+def get_json_ld_from_databus(uri: str, databus_key: Optional[str] = None) -> str:
+    """
+    Retrieve JSON-LD representation of a databus resource.
+
+    Parameters:
+    - uri: The full databus URI
+    - databus_key: Optional Databus API key for authentication on protected resources
+
+    Returns:
+    JSON-LD string representation of the databus resource.
+    """
+    headers = {"Accept": "application/ld+json"}
+    if databus_key is not None:
+        headers["X-API-KEY"] = databus_key
+    response = requests.get(uri, headers=headers)
+    response.raise_for_status()
+
+    return response.text
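Review note: a few worked inputs make the padding and truncation behaviour of `get_databus_id_parts_from_uri` easier to see. The function body below is copied from the diff above (it relies on `str.removeprefix`, so Python 3.9 or newer is assumed); the `localhost` URI is an invented example, not a real Databus instance.

```python
from typing import Optional, Tuple


def get_databus_id_parts_from_uri(uri: str) -> Tuple[Optional[str], ...]:
    """Copy of the helper from this commit, kept local so the sketch runs standalone."""
    uri = uri.removeprefix("https://").removeprefix("http://")
    parts = uri.strip("/").split("/")
    parts += [None] * (6 - len(parts))  # a negative count yields an empty list
    return tuple(parts[:6])


if __name__ == "__main__":
    examples = [
        # version URI: five parts, the file slot is padded with None
        "https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01",
        # a trailing slash is stripped before splitting
        "https://databus.dbpedia.org/dbpedia/mappings/",
        # more than six segments: everything after the file slot is dropped
        "http://localhost:3000/acct/group/artifact/1.0/file.ttl/extra",
    ]
    for uri in examples:
        print(get_databus_id_parts_from_uri(uri))
```

Because the split is purely positional, query strings or fragments in a URI would end up inside the last segment; callers that accept arbitrary user input may want to validate URIs first.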

databusclient/cli.py

Lines changed: 21 additions & 0 deletions
@@ -7,6 +7,7 @@
 from databusclient import client

 from databusclient.rclone_wrapper import upload
+from databusclient.api.delete import delete as api_delete

 @click.group()
 def app():
@@ -112,6 +113,26 @@ def download(databusuris: List[str], localdir, databus, vault_token, databus_key
         client_id=clientid,
     )

+@app.command()
+@click.argument("databusuris", nargs=-1, required=True)
+@click.option("--databus-key", help="Databus API key to access protected databus", required=True)
+@click.option("--dry-run", is_flag=True, help="Perform a dry run without actual deletion")
+@click.option("--force", is_flag=True, help="Force deletion without confirmation prompt")
+def delete(databusuris: List[str], databus_key: str, dry_run: bool, force: bool):
+    """
+    Delete a dataset from the databus.
+
+    Delete a group, artifact, or version identified by the given databus URI.
+    Will recursively delete all data associated with the dataset.
+    """
+
+    api_delete(
+        databusURIs=databusuris,
+        databus_key=databus_key,
+        dry_run=dry_run,
+        force=force,
+    )
+

 if __name__ == "__main__":
     app()
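Review note: the interaction between `--dry-run` and `--force` follows from the order of checks in `_delete_resource`. The sketch below restates that order as a small decision function; `plan_action` and its return labels are hypothetical names used only for this illustration.

```python
def plan_action(dry_run: bool, force: bool) -> str:
    """Hypothetical summary of what _delete_resource does for each flag combination."""
    if not (dry_run or force):
        # Neither flag set: the user is asked yes / skip / cancel first.
        return "prompt-then-delete"
    if dry_run:
        # A dry run only prints the target, even when --force is also given.
        return "print-only"
    return "delete-without-prompt"


if __name__ == "__main__":
    for dry_run in (False, True):
        for force in (False, True):
            print(f"dry_run={dry_run!s:<5} force={force!s:<5} -> {plan_action(dry_run, force)}")
```

Read this way, combining `--dry-run` with `--force` never sends a DELETE request, since the dry-run check returns before the request is issued.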
