docker/README.md
Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@ DataHub Docker Images:
 Do not use `latest` or `debug` tags for any of the images as those are not supported and present only due to legacy reasons. Please use `head` or tags specific for versions like `v0.8.40`. For production we recommend using version specific tags not `head`.

-* [linkedin/datahub-ingestion](https://hub.docker.com/r/linkedin/datahub-ingestion/) - This contains the Python CLI. If you are looking for docker image for every minor CLI release you can find them under [acryldata/datahub-ingestion](https://hub.docker.com/r/acryldata/datahub-ingestion/).
docs/cli.md
Lines changed: 5 additions & 10 deletions
@@ -138,14 +138,9 @@ The `check` command allows you to check if all plugins are loaded correctly as w
 ### delete

-The `delete` command allows you to delete metadata from DataHub. Read this [guide](./how/delete-metadata.md) to understand how you can delete metadata from DataHub.
-:::info
-Deleting metadata using DataHub's CLI and GraphQL API is a simple, systems-level action. If you attempt to delete an Entity with children, such as a Container, it will not automatically delete the children, you will instead need to delete each child by URN in addition to deleting the parent.
-:::
+The `delete` command allows you to delete metadata from DataHub.
 If you don't want to install locally, you can alternatively run metadata ingestion within a Docker container.
-We have prebuilt images available on [Docker hub](https://hub.docker.com/r/linkedin/datahub-ingestion). All plugins will be installed and enabled automatically.
+We have prebuilt images available on [Docker hub](https://hub.docker.com/r/acryldata/datahub-ingestion). All plugins will be installed and enabled automatically.

 You can use the `datahub-ingestion` docker image as explained in [Docker Images](../docker/README.md). In case you are using Kubernetes you can start a pod with the `datahub-ingestion` docker image, log onto a shell on the pod and you should have the access to datahub CLI in your kubernetes cluster.
To follow this guide, you'll need the [DataHub CLI](../cli.md).
+:::
+
 There are two ways to delete metadata from DataHub:

-1. Delete metadata attached to entities by providing a specific urn or filters that identify a set of entities
-2. Delete metadata created by a single ingestion run
+1. Delete metadata attached to entities by providing a specific urn or filters that identify a set of urns (delete CLI).
+2. Delete metadata created by a single ingestion run (rollback).

-To follow this guide you need to use [DataHub CLI](../cli.md).
+:::caution Be careful when deleting metadata

-Read on to find out how to perform these kinds of deletes.
+- Always use `--dry-run` to test your delete command before executing it.
+- Prefer reversible soft deletes (`--soft`) over irreversible hard deletes (`--hard`).

-_Note: Deleting metadata should only be done with care. Always use `--dry-run` to understand what will be deleted before proceeding. Prefer soft-deletes (`--soft`) unless you really want to nuke metadata rows. Hard deletes will actually delete rows in the primary store and recovering them will require using backups of the primary metadata store. Make sure you understand the implications of issuing soft-deletes versus hard-deletes before proceeding._
+:::

+## Delete CLI Usage

 :::info
-Deleting metadata using DataHub's CLI and GraphQL API is a simple, systems-level action. If you attempt to delete an Entity with children, such as a Domain, it will not delete those children, you will instead need to delete each child by URN in addition to deleting the parent.
+
+Deleting metadata using DataHub's CLI is a simple, systems-level action. If you attempt to delete an entity with children, such as a container, it will not delete those children. Instead, you will need to delete each child by URN in addition to deleting the parent.
+
 :::
-## Delete By Urn

-To delete all the data related to a single entity, run
+All the commands below support the following options:

-### Soft Delete (the default)
+- `-n/--dry-run`: Execute a dry run instead of the actual delete.
+- `--force`: Skip confirmation prompts.
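
One way the two options above might be combined is to preview with `--dry-run` first and rerun with `--force` only once the preview looks right. This is a minimal sketch, not the documented workflow: the `run` helper and the `EXECUTE` variable are hypothetical conveniences that are not part of the DataHub CLI, and the urn is a placeholder. By default the helper only prints each command.

```shell
# Hypothetical helper (not part of DataHub): print the command by default,
# and execute it only when EXECUTE=1 is set in the environment.
# The printed form drops shell quoting, so it is for review, not copy-paste.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Placeholder urn; substitute a real one from your instance.
URN="<my urn>"

# 1. Preview what would be deleted without changing anything.
run datahub delete --urn "$URN" --dry-run

# 2. Once the preview looks right, delete and skip the confirmation prompt.
run datahub delete --urn "$URN" --force
```

Running the script with `EXECUTE=1` set would issue the real commands, so it is worth reading the printed output first.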

-This sets the `Status` aspect of the entity to `Removed`, which hides the entity and all its aspects from being returned by the UI.
-```
+### Selecting entities to delete
+
+You can either provide a single urn to delete, or use filters to select a set of entities to delete.
+
+```shell
+# Soft delete a single urn.
 datahub delete --urn "<my urn>"
+
+# Soft delete using a filter.
+datahub delete --platform snowflake
+
+# Filters can be combined, which will select entities that match all filters.
When performing hard deletes, you can optionally add the `--only-soft-deleted` flag to only hard delete entities that were previously soft deleted.
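
As a rough illustration of the sentence above, the sketch below hard deletes only the Snowflake entities that were already soft deleted, previewing the result first. The `run` helper and `EXECUTE` variable are hypothetical and only print the commands by default; the flags themselves are the ones named in this guide.

```shell
# Hypothetical helper (not part of DataHub): print the command by default,
# and execute it only when EXECUTE=1 is set in the environment.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Preview which previously soft-deleted snowflake entities would be removed.
run datahub delete --platform snowflake --hard --only-soft-deleted --dry-run

# Hard delete them for real. This cannot be undone.
run datahub delete --platform snowflake --hard --only-soft-deleted
```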
+
+### Performing the delete
+
+#### Soft delete an entity (default)
+
+By default, the delete command will perform a soft delete.

-This physically deletes all rows for all aspects of the entity. This action cannot be undone, so execute this only after you are sure you want to delete all data associated with this entity.
+This will set the `status` aspect's `removed` field to `true`, which will hide the entity from the UI. However, you'll still be able to view the entity's metadata in the UI with a direct link.

+```shell
+# The `--soft` flag is redundant since it's the default.
+datahub delete --urn "<urn>" --soft
+# or using a filter
+datahub delete --platform snowflake --soft
 ```
+
+#### Hard delete an entity
+
+This will physically delete all rows for all aspects of the entity. This action cannot be undone, so execute this only after you are sure you want to delete all data associated with this entity.
+
+```shell
 datahub delete --urn "<my urn>" --hard
+# or using a filter
+datahub delete --platform snowflake --hard
 ```

-As of datahub v0.8.35 doing a hard delete by urn will also provide you with a way to remove references to the urn being deleted across the metadata graph. This is important to use if you don't want to have ghost references in your metadata model and want to save space in the graph database.
-For now, this behaviour must be opted into by a prompt that will appear for you to manually accept or deny.
+As of datahub v0.10.2.3, hard deleting tags, glossary terms, users, and groups will also remove references to those entities across the metadata graph.

-You can optionally add `-n` or `--dry-run` to execute a dry run before issuing the final delete command.
-You can optionally add `-f` or `--force` to skip confirmations
-You can optionally add `--only-soft-deleted` flag to remove soft-deleted items only.
+#### Hard delete a timeseries aspect

-:::note
+It's also possible to delete a range of timeseries aspect data for an entity without deleting the entire entity.

-Make sure you surround your urn with quotes! If you do not include the quotes, your terminal may misinterpret the command._
+For these deletes, the aspect and time ranges are required. You can delete all data for a timeseries aspect by providing `--start-time min --end-time max`.
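
A timeseries delete over the full time range might look like the sketch below. The `--aspect` flag name and the `datasetProfile` aspect are assumptions that do not appear in this excerpt, so check `datahub delete --help` before relying on them. The `run` helper and `EXECUTE` variable are likewise hypothetical and only print the command by default.

```shell
# Hypothetical helper (not part of DataHub): print the command by default,
# and execute it only when EXECUTE=1 is set in the environment.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Placeholder urn; substitute a real one from your instance.
URN="<my urn>"

# Assumption: the aspect is selected with --aspect, and datasetProfile is
# used here only as an example of a timeseries aspect.
# `min` and `max` select the whole time range, as described above.
run datahub delete --urn "$URN" --aspect datasetProfile \
  --start-time min --end-time max --hard --dry-run
```

Dropping `--dry-run` would perform the actual hard delete of that aspect's data while leaving the rest of the entity in place.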
Finally, once you are sure you want to delete this data forever, run

-```
+```shell
 datahub ingest rollback --run-id <run-id>
 ```

@@ -133,10 +239,9 @@ This deletes both the versioned and the timeseries aspects associated with these

 ### Unsafe Entities and Rollback

-> **_NOTE:_** Preservation of unsafe entities has been added in datahub `0.8.32`. Read on to understand what it means and how it works.
-
 In some cases, entities that were initially ingested by a run might have had further modifications to their metadata (e.g. adding terms, tags, or documentation) through the UI or other means. During a roll back of the ingestion that initially created these entities (technically, if the key aspect for these entities are being rolled back), the ingestion process will analyse the metadata graph for aspects that will be left "dangling" and will:
-1. Leave these aspects untouched in the database, and soft-delete the entity. A re-ingestion of these entities will result in this additional metadata becoming visible again in the UI, so you don't lose any of your work.
+
+1. Leave these aspects untouched in the database, and soft delete the entity. A re-ingestion of these entities will result in this additional metadata becoming visible again in the UI, so you don't lose any of your work.
 2. The datahub cli will save information about these unsafe entities as a CSV for operators to later review and decide on next steps (keep or remove).

 The rollback command will report how many entities have such aspects and save as a CSV the urns of these entities under a rollback reports directory, which defaults to `rollback_reports` under the current directory where the cli is run, and can be configured further using the `--reports-dir` command line arg.
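
To make the reports directory concrete, here is a minimal sketch of a rollback that writes its unsafe-entity CSVs somewhere other than the default `rollback_reports`. The run id and the directory name are placeholders, and the `run` helper and `EXECUTE` variable are hypothetical conveniences that only print the command by default.

```shell
# Hypothetical helper (not part of DataHub): print the command by default,
# and execute it only when EXECUTE=1 is set in the environment.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Placeholders: use the run id of the ingestion you want to roll back,
# and any directory where the CSV reports should be written.
RUN_ID="<run-id>"
REPORTS_DIR="./my_rollback_reports"

# Roll back the run and save the unsafe-entity reports under REPORTS_DIR.
run datahub ingest rollback --run-id "$RUN_ID" --reports-dir "$REPORTS_DIR"
```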
docs/how/updating-datahub.md
Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
 ### Breaking Changes

 - #7900: The `catalog_pattern` and `schema_pattern` options of the Unity Catalog source now match against the fully qualified name of the catalog/schema instead of just the name. Unless you're using regex `^` in your patterns, this should not affect you.
+- #8068: In the `datahub delete` CLI, if an `--entity-type` filter is not specified, we automatically delete across all entity types. The previous behavior was to use a default entity type of dataset.
+- #8068: In the `datahub delete` CLI, the `--start-time` and `--end-time` parameters are now required for timeseries aspect hard deletes. To recover the previous behavior of deleting all data, use `--start-time min --end-time max`.
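
For scripts that relied on the old dataset default described in the first #8068 entry, passing the filter explicitly should restore the narrower scope. This is a sketch under that reading of the note: the platform is an example value, and the `run` helper and `EXECUTE` variable are hypothetical and only print the command by default.

```shell
# Hypothetical helper (not part of DataHub): print the command by default,
# and execute it only when EXECUTE=1 is set in the environment.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Without --entity-type, the delete now spans every entity type.
# Naming the type explicitly limits it to datasets, as the old default did.
run datahub delete --platform snowflake --entity-type dataset --dry-run
```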