README.md: 6 additions & 7 deletions
@@ -48,8 +48,8 @@ As an illustration, consider the scenario where you need to retrieve a single ro

 ```
 dx.from_tables("dev_*.*.*sample*")\
-.apply_sql("SELECT to_json(struct(*)) AS row FROM {full_table_name} LIMIT 1")\
-.execute()
+.with_sql("SELECT to_json(struct(*)) AS row FROM {full_table_name} LIMIT 1")\
+.apply()
 ```

 ## Available functionality
@@ -59,7 +59,7 @@ The available `dx` functions are
 * `from_tables("<catalog>.<schema>.<table>")` selects tables based on the specified pattern (use `*` as a wildcard). Returns a `DataExplorer` object with methods
 * `having_columns` restricts the selection to tables that have the specified columns
 * `with_concurrency` defines how many queries are executed concurrently (10 by default)
-* `apply_sql` applies a SQL template to all tables. After this command you can apply an [action](#from_tables-actions). See in-depth documentation [here](docs/Arbitrary_multi-table_SQL.md).
+* `with_sql` applies a SQL template to all tables. After this command you can apply an [action](#from_tables-actions). See in-depth documentation [here](docs/Arbitrary_multi-table_SQL.md).
 * `unpivot_string_columns` returns a melted (unpivoted) dataframe with all string columns from the selected tables. After this command you can apply an [action](#from_tables-actions)
 * `scan` (experimental) scans the lakehouse with regular expressions defined by the rules to power the semantic classification.
 * `intro` gives an introduction to the library
@@ -72,12 +72,11 @@ The available `dx` functions are

 ### from_tables Actions

-After a `apply_sql` or `unpivot_string_columns` command, you can apply the following actions:
+After a `with_sql` or `unpivot_string_columns` command, you can apply the following actions:

 * `explain` explains the queries that would be executed
-* `execute` executes the queries and shows the result in a unioned dataframe
-* `to_union_dataframe` unions all the dataframes that result from the queries
-
+* `display` executes the queries and shows the first 1000 rows of the result in a unioned dataframe
+* `apply` returns a unioned dataframe with the result from the queries
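
Reviewer note: the renamed `with_sql` step takes a SQL template containing a `{full_table_name}` placeholder and applies it to every selected table. The sketch below shows one way such a template could expand into one query per table, using only Python's `str.format`. The `render_sql` helper and the table names are hypothetical illustrations, not DiscoverX's actual implementation.

```python
from typing import List, Tuple


def render_sql(template: str, tables: List[Tuple[str, str, str]]) -> List[str]:
    """Expand a SQL template once per (catalog, schema, table) triple.

    Hypothetical helper: it only illustrates the placeholder substitution.
    """
    queries = []
    for catalog, schema, table in tables:
        # Build the three-level name that fills the {full_table_name} placeholder
        full_table_name = f"{catalog}.{schema}.{table}"
        queries.append(template.format(full_table_name=full_table_name))
    return queries


# Made-up table names, for illustration only
tables = [
    ("dev_1", "sales", "sample_orders"),
    ("dev_2", "hr", "sample_staff"),
]
template = "SELECT to_json(struct(*)) AS row FROM {full_table_name} LIMIT 1"

queries = render_sql(template, tables)
for query in queries:
    print(query)
```

In this sketch an action such as `explain` would correspond to inspecting `queries` without running them, while `apply` and `display` would execute them and union the results.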
docs/Vacuum.md: 2 additions & 2 deletions
@@ -8,8 +8,8 @@ With DiscoverX you can vacuum all the tables at once with the command:

 ```
 dx.from_tables("*.*.*")\
-.apply_sql("VACUUM {full_table_name}")\
-.execute()
+.with_sql("VACUUM {full_table_name}")\
+.display()
 ```

 You can schedule [this example notebook](https://raw.githubusercontent.com/databrickslabs/discoverx/master/examples/vacuum_multiple_tables.py) in your Databricks workflows to run vacuum periodically.
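
Reviewer note: the `*.*.*` pattern passed to `from_tables` uses `*` as a wildcard in each of the catalog, schema, and table positions. A minimal sketch of that matching rule, assuming each dot-separated part is matched independently, could look like the following. The `matches` helper is hypothetical and uses Python's `fnmatch` module as a stand-in; DiscoverX may resolve patterns differently, and `fnmatch` also gives `?` and `[...]` special meaning, which the README does not mention.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, full_table_name: str) -> bool:
    """Return True if a three-part table name fits a three-part wildcard pattern.

    Hypothetical helper: each of catalog, schema, and table is matched on its own,
    so a `*` never spans across a dot.
    """
    pattern_parts = pattern.split(".")
    name_parts = full_table_name.split(".")
    if len(pattern_parts) != 3 or len(name_parts) != 3:
        return False
    return all(
        fnmatchcase(name, part) for part, name in zip(pattern_parts, name_parts)
    )


# Made-up table names, for illustration only
print(matches("*.*.*", "main.default.events"))
print(matches("dev_*.*.*sample*", "dev_eu.sales.sample_orders"))
print(matches("dev_*.*.*sample*", "prod.sales.sample_orders"))
```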
examples/detect_small_files.py: 9 additions & 12 deletions
@@ -7,7 +7,7 @@
 # MAGIC As a rule of thumb, if a table has more than `100` files and average file size smaller than `10 MB`, then we can consider it having too many small files.
 # MAGIC
 # MAGIC Some common causes of too many small files are:
-# MAGIC * Overpartitioning: the cardinality of the partition columns is too high
+# MAGIC * Overpartitioning: the cardinality of the partition columns is too high
 # MAGIC * Lack of scheduled maintenance operations like `OPTIMIZE`
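
Reviewer note: the rule of thumb in the notebook text (more than `100` files and an average file size below `10 MB`) can be written as a small predicate. This is a sketch only: the function name, its parameters, and the choice of 1 MB = 1024 * 1024 bytes are assumptions, not code from the notebook.

```python
MAX_FILES = 100                       # threshold quoted in the notebook text
MIN_AVG_FILE_BYTES = 10 * 1024 * 1024  # 10 MB, assuming 1 MB = 1024 * 1024 bytes


def has_too_many_small_files(num_files: int, total_size_bytes: int) -> bool:
    """Apply the rule of thumb: many files AND a small average file size."""
    if num_files <= MAX_FILES:
        # Few files: not flagged, regardless of their size
        return False
    average_file_bytes = total_size_bytes / num_files
    return average_file_bytes < MIN_AVG_FILE_BYTES


# 500 files of about 1 MB each: flagged
print(has_too_many_small_files(500, 500 * 1024 * 1024))
# 500 files of about 64 MB each: not flagged
print(has_too_many_small_files(500, 500 * 64 * 1024 * 1024))
# 50 tiny files: not flagged because the file count is low
print(has_too_many_small_files(50, 50 * 1024))
```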