
Commit 66a0bc1

Merge pull request #560 from databrickslabs/bundle_fix_0.4.2

Candidate v0.4.2 [DBR 13.3 LTS]

2 parents d17916f + 49a7366

29 files changed: +444 −480 lines changed

.github/workflows/build_main.yml

Lines changed: 2 additions & 3 deletions

@@ -28,8 +28,7 @@ jobs:
         uses: ./.github/actions/scala_build
       - name: build python
         uses: ./.github/actions/python_build
-      # CRAN FLAKY (502 'Bad Gateway' ERRORS)
-      # - name: build R
-      #   uses: ./.github/actions/r_build
+      - name: build R
+        uses: ./.github/actions/r_build
       - name: upload artefacts
         uses: ./.github/actions/upload_artefacts

CHANGELOG.md

Lines changed: 10 additions & 0 deletions

@@ -1,3 +1,13 @@
+## v0.4.2 [DBR 13.3 LTS]
+- Geopandas now fixed to "<0.14.4,>=0.14" due to conflict with minimum numpy version in geopandas 0.14.4.
+- H3 python changed from "==3.7.0" to "<4.0,>=3.7" to pick up patches.
+- Fixed an issue with fallback logic when deserializing subdatasets from a zip.
+- Adjusted data used to speed up a long-running test.
+- Streamlines setup_gdal and setup_fuse_install:
+  - init script and resource copy logic fixed to repo "main" (.so) / "latest" (.jar).
+  - added apt-get lock handling for native installs.
+  - removed support for UbuntuGIS PPA as GDAL version no longer compatible with jammy default (3.4.x).
+
 ## v0.4.1 [DBR 13.3 LTS]
 - Fixed python bindings for MosaicAnalyzer functions.
 - Added tiller functions, ST_AsGeoJSONTile and ST_AsMVTTile, for creating GeoJSON and MVT tiles as aggregations of geometries.
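
The "apt-get lock handling" item in the changelog refers to native installs having to wait out the system package manager's lock before installing GDAL dependencies. A minimal sketch of that idea follows; it is illustrative only (the function name, lock path, and timing are assumptions, not Mosaic's actual init-script code):

```python
import os
import time


def wait_for_lock_release(lock_path: str, timeout_s: float = 120.0,
                          poll_s: float = 0.1) -> bool:
    """Poll until lock_path (e.g. /var/lib/dpkg/lock-frontend) no longer
    exists, then return True; return False if timeout_s elapses first.

    A native install step would call this before running apt-get, so it
    does not fail when another process holds the dpkg/apt lock.
    """
    waited = 0.0
    while os.path.exists(lock_path):
        if waited >= timeout_s:
            return False
        time.sleep(poll_s)
        waited += poll_s
    return True
```

The real logic runs in a cluster init script and retries `apt-get` itself; this sketch only shows the wait-for-lock pattern.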

CONTRIBUTING.md

Lines changed: 5 additions & 0 deletions

@@ -83,6 +83,9 @@ The repository is structured as follows:
 
 ## Test & build Mosaic
 
+Given that DBR 13.3 is Ubuntu 22.04, we recommend using docker,
+see [mosaic-docker.sh](https://github.com/databrickslabs/mosaic/blob/main/scripts/mosaic-docker.sh).
+
 ### Scala JAR
 
 We use the [Maven](https://maven.apache.org/install.html) build tool to manage and build the Mosaic scala project.
@@ -115,6 +118,8 @@ To build the docs:
 - Install the pandoc library (follow the instructions for your platform [here](https://pandoc.org/installing.html)).
 - Install the python requirements from `docs/docs-requirements.txt`.
 - Build the HTML documentation by running `make html` from `docs/`.
+- For nbconvert you may have to symlink your jupyter share folder,
+  e.g. `sudo ln -s /opt/homebrew/share/jupyter /usr/local/share`.
 - You can locally host the docs by running the `reload.py` script in the `docs/source/` directory.
 
 ## Style

R/sparkR-mosaic/sparkrMosaic/DESCRIPTION

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 Package: sparkrMosaic
 Title: SparkR bindings for Databricks Mosaic
-Version: 0.4.1
+Version: 0.4.2
 Authors@R:
     person("Robert", "Whiffin", , "robert.whiffin@databricks.com", role = c("aut", "cre")
     )

R/sparklyr-mosaic/sparklyrMosaic/DESCRIPTION

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 Package: sparklyrMosaic
 Title: sparklyr bindings for Databricks Mosaic
-Version: 0.4.1
+Version: 0.4.2
 Authors@R:
     person("Robert", "Whiffin", , "robert.whiffin@databricks.com", role = c("aut", "cre")
     )

R/sparklyr-mosaic/tests.R

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ library(sparklyr.nested)
 spark_home <- Sys.getenv("SPARK_HOME")
 spark_home_set(spark_home)
 
-install.packages("sparklyrMosaic_0.4.1.tar.gz", repos = NULL)
+install.packages("sparklyrMosaic_0.4.2.tar.gz", repos = NULL)
 library(sparklyrMosaic)
 
 # find the mosaic jar in staging

README.md

Lines changed: 18 additions & 8 deletions

@@ -8,7 +8,6 @@ An extension to the [Apache Spark](https://spark.apache.org/) framework that all
 [![codecov](https://codecov.io/gh/databrickslabs/mosaic/branch/main/graph/badge.svg?token=aEzZ8ITxdg)](https://codecov.io/gh/databrickslabs/mosaic)
 [![build](https://github.com/databrickslabs/mosaic/actions/workflows/build_main.yml/badge.svg)](https://github.com/databrickslabs/mosaic/actions?query=workflow%3A%22build+main%22)
 [![docs](https://github.com/databrickslabs/mosaic/actions/workflows/docs.yml/badge.svg)](https://github.com/databrickslabs/mosaic/actions/workflows/docs.yml)
-[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/databrickslabs/mosaic.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/databrickslabs/mosaic/context:python)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![lines of code](https://tokei.rs/b1/github/databrickslabs/mosaic)]([https://codecov.io/github/databrickslabs/mosaic](https://github.com/databrickslabs/mosaic))
 
@@ -33,7 +32,8 @@ The supported languages are Scala, Python, R, and SQL.
 
 ## How does it work?
 
-The Mosaic library is written in Scala (JVM) to guarantee maximum performance with Spark and when possible, it uses code generation to give an extra performance boost.
+The Mosaic library is written in Scala (JVM) to guarantee maximum performance with Spark and when possible,
+it uses code generation to give an extra performance boost.
 
 __The other supported languages (Python, R and SQL) are thin wrappers around the Scala (JVM) code.__
 
@@ -42,6 +42,13 @@ Image1: Mosaic logical design.
 
 ## Getting started
 
+:warning: **geopandas 0.14.4 not supported**
+
+For Mosaic <= 0.4.1 `%pip install databricks-mosaic` will no longer install "as-is" in DBRs due to the fact that Mosaic
+left geopandas unpinned in those versions. With geopandas 0.14.4, numpy dependency conflicts with the limits of
+scikit-learn in DBRs. The workaround is `%pip install geopandas==0.14.3 databricks-mosaic`.
+Mosaic 0.4.2+ limits the geopandas version.
+
 ### Mosaic 0.4.x Series [Latest]
 
 We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled.
@@ -56,18 +63,21 @@ We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled.
 
 __Language Bindings__
 
-As of Mosaic 0.4.0 (subject to change in follow-on releases)...
+As of Mosaic 0.4.0 / DBR 13.3 LTS (subject to change in follow-on releases)...
 
-* [Assigned Clusters](https://docs.databricks.com/en/compute/configure.html#access-modes): Mosaic Python, SQL, R, and Scala APIs.
-* [Shared Access Clusters](https://docs.databricks.com/en/compute/configure.html#access-modes): Mosaic Scala API (JVM) with Admin [allowlisting](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html); _Python bindings to Mosaic Scala APIs are blocked by Py4J Security on Shared Access Clusters._
+* [Assigned Clusters](https://docs.databricks.com/en/compute/configure.html#access-modes)
+  * Mosaic Python, SQL, R, and Scala APIs.
+* [Shared Access Clusters](https://docs.databricks.com/en/compute/configure.html#access-modes)
+  * Mosaic Scala API (JVM) with Admin [allowlisting](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html).
+  * Mosaic Python bindings (to Mosaic Scala APIs) are blocked by Py4J Security on Shared Access Clusters.
 * Mosaic SQL expressions cannot yet be registered with [Unity Catalog](https://www.databricks.com/product/unity-catalog) due to API changes affecting DBRs >= 13, more [here](https://docs.databricks.com/en/udf/index.html).
 
 __Additional Notes:__
 
-As of Mosaic 0.4.0 (subject to change in follow-on releases)...
+Mosaic is a custom JVM library that extends spark, which has the following implications in DBR 13.3 LTS:
 
 1. [Unity Catalog](https://www.databricks.com/product/unity-catalog): Enforces process isolation which is difficult to accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from other supported languages in Shared Access Clusters.
-2. [Volumes](https://docs.databricks.com/en/connect/unity-catalog/volumes.html): Along the same principle of isolation, clusters (both assigned and shared access) can read Volumes via relevant built-in readers and writers or via custom python calls which do not involve any custom JVM code.
+2. [Volumes](https://docs.databricks.com/en/connect/unity-catalog/volumes.html): Along the same principle of isolation, clusters can read Volumes via relevant built-in (aka platform provided) readers and writers or via custom python calls which do not involve any custom JVM code.
 
 ### Mosaic 0.3.x Series
 
@@ -142,7 +152,7 @@ import com.databricks.labs.mosaic.JTS
 val mosaicContext = MosaicContext.build(H3, JTS)
 mosaicContext.register(spark)
 ```
-__Note: Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Hive UDFs), but not Shared Access due to API changes, more [here](https://docs.databricks.com/en/udf/index.html).__
+__Note: Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Spark Expressions), but not Shared Access due to API changes, more [here](https://docs.databricks.com/en/udf/index.html).__
 
 ## Examples
 
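
The geopandas pin introduced by this release (">=0.14,<0.14.4") can be checked with a plain version-tuple comparison. This is a hedged sketch of the idea only; the helper names are hypothetical and not part of Mosaic, and it handles only dotted numeric versions (no pre-release tags):

```python
def _parse(version: str) -> tuple:
    # Compare dotted numeric versions as integer tuples, e.g. "0.14.3" -> (0, 14, 3).
    return tuple(int(part) for part in version.split("."))


def satisfies_geopandas_pin(installed: str, lower: str = "0.14",
                            upper: str = "0.14.4") -> bool:
    # Mirrors the pin ">=0.14,<0.14.4" from this release: the lower bound is
    # inclusive, the upper bound exclusive, so 0.14.4 itself is rejected.
    return _parse(lower) <= _parse(installed) < _parse(upper)
```

For real dependency checks a packaging-aware library would be preferable; the tuple comparison is just the shortest way to show why 0.14.3 passes the pin while 0.14.4 does not.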

docs/source/conf.py

Lines changed: 3 additions & 3 deletions

@@ -18,11 +18,11 @@
 # -- Project information -----------------------------------------------------
 
 project = 'Mosaic'
-copyright = '2022, Databricks Inc'
-author = 'Stuart Lynn, Milos Colic, Erni Durdevic, Robert Whiffin, Timo Roest'
+copyright = '2024, Databricks Inc'
+author = 'Milos Colic, Stuart Lynn, Michael Johns, Robert Whiffin'
 
 # The full version, including alpha/beta/rc tags
-release = "v0.4.1"
+release = "v0.4.2"
 
 
 # -- General configuration ---------------------------------------------------

docs/source/index.rst

Lines changed: 37 additions & 45 deletions

@@ -29,84 +29,73 @@
     :target: https://github.com/databrickslabs/mosaic/actions/workflows/docs.yml
     :alt: Mosaic sphinx docs
 
-.. image:: https://img.shields.io/lgtm/grade/python/g/databrickslabs/mosaic.svg?logo=lgtm&logoWidth=18
-    :target: https://lgtm.com/projects/g/databrickslabs/mosaic/context:python
-    :alt: Language grade: Python
-
 .. image:: https://img.shields.io/badge/code%20style-black-000000.svg
     :target: https://github.com/psf/black
     :alt: Code style: black
 
-
-
-Mosaic is an extension to the `Apache Spark <https://spark.apache.org/>`_ framework that allows easy and fast processing of very large geospatial datasets.
-
-We currently recommend using Databricks Runtime with Photon enabled;
-this will leverage the Databricks H3 expressions when using H3 grid system.
-
-Mosaic provides:
-
-* easy conversion between common spatial data encodings (WKT, WKB and GeoJSON);
-* constructors to easily generate new geometries from Spark native data types;
-* many of the OGC SQL standard :code:`ST_` functions implemented as Spark Expressions for transforming, aggregating and joining spatial datasets;
-* high performance through implementation of Spark code generation within the core Mosaic functions;
-* optimisations for performing point-in-polygon joins using an approach we co-developed with Ordnance Survey (`blog post <https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html>`_); and
-* the choice of a Scala, SQL and Python API.
+| Mosaic is an extension to the `Apache Spark <https://spark.apache.org/>`_ framework for fast + easy processing
+  of very large geospatial datasets. It provides:
+|
+| [1] The choice of a Scala, SQL and Python language bindings (written in Scala).
+| [2] Raster and Vector APIs.
+| [3] Easy conversion between common spatial data encodings (WKT, WKB and GeoJSON).
+| [4] Constructors to easily generate new geometries from Spark native data types.
+| [5] Many of the OGC SQL standard :code:`ST_` functions implemented as Spark Expressions for transforming,
+|     aggregating and joining spatial datasets.
+| [6] High performance through implementation of Spark code generation within the core Mosaic functions.
+| [7] Performing point-in-polygon joins using an approach we co-developed with Ordnance Survey
+  (`blog post <https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html>`_).
 
 .. note::
-    For Mosaic versions < 0.4 please use the `0.3 docs <https://databrickslabs.github.io/mosaic/v0.3.x/index.html>`_.
-
-.. warning::
-    At times, it is useful to "hard refresh" pages to ensure your cached local version matches the latest live,
-    more `here <https://www.howtogeek.com/672607/how-to-hard-refresh-your-web-browser-to-bypass-your-cache/>`_.
+    We recommend using Databricks Runtime with Photon enabled to leverage the Databricks H3 expressions.
 
 Version 0.4.x Series
 ====================
 
-We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled.
+.. warning::
+    For Mosaic <= 0.4.1 :code:`%pip install databricks-mosaic` will no longer install "as-is" in DBRs due to the fact that Mosaic
+    left geopandas unpinned in those versions. With geopandas 0.14.4, numpy dependency conflicts with the limits of
+    scikit-learn in DBRs. The workaround is :code:`%pip install geopandas==0.14.3 databricks-mosaic`.
+    Mosaic 0.4.2+ limits the geopandas version.
 
 Mosaic 0.4.x series only supports DBR 13.x DBRs. If running on a different DBR it will throw an exception:
 
-    DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13.
-    You can specify `%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.
+    DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13.
+    You can specify :code:`%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.
 
 Mosaic 0.4.x series issues an ERROR on standard, non-Photon clusters `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |
 `AWS <https://docs.databricks.com/runtime/index.html/>`_ |
 `GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_:
 
-    DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for
-    spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.
-
-As of Mosaic 0.4.0 (subject to change in follow-on releases)
+    DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for
+    spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.
 
-* `Assigned Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_: Mosaic Python, SQL, R, and Scala APIs.
-* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_: Mosaic Scala API (JVM) with
-  Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_;
-  Python bindings to Mosaic Scala APIs are blocked by Py4J Security on Shared Access Clusters.
+As of Mosaic 0.4.0 / DBR 13.3 LTS (subject to change in follow-on releases):
 
-.. warning::
-    Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Hive UDFs), but not Shared Access due
-    to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ API changes, more `here <https://docs.databricks.com/en/udf/index.html>`_.
+* `Assigned Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_
+  * Mosaic Python, SQL, R, and Scala APIs.
+* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_
+  * Mosaic Scala API (JVM) with Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_.
+  * Mosaic Python bindings (to Mosaic Scala APIs) are blocked by Py4J Security on Shared Access Clusters.
+* Mosaic SQL expressions cannot yet be registered due to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_
+  API changes, more `here <https://docs.databricks.com/en/udf/index.html>`_.
 
 .. note::
-    As of Mosaic 0.4.0 (subject to change in follow-on releases)
+    Mosaic is a custom JVM library that extends spark, which has the following implications in DBR 13.3 LTS:
 
     * `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ enforces process isolation which is difficult
       to accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from
      other supported languages in Shared Access Clusters.
-    * Along the same principle of isolation, clusters (both Assigned and Shared Access) can read
-      `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ via relevant built-in readers and
-      writers or via custom python calls which do not involve any custom JVM code.
+    * Clusters can read `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ via relevant
+      built-in (aka platform provided) readers and writers or via custom python calls which do not involve any custom JVM code.
+
 
 Version 0.3.x Series
 ====================
 
 We recommend using Databricks Runtime versions 12.2 LTS with Photon enabled.
 For Mosaic versions < 0.4.0 please use the `0.3.x docs <https://databrickslabs.github.io/mosaic/v0.3.x/index.html>`_.
 
-.. warning::
-    Mosaic 0.3.x series does not support DBR 13.x DBRs.
-
 As of the 0.3.11 release, Mosaic issues the following WARNING when initialized on a cluster that is neither Photon Runtime
 nor Databricks Runtime ML `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |
 `AWS <https://docs.databricks.com/runtime/index.html/>`_ |
@@ -120,6 +109,9 @@ making this change is that we are streamlining Mosaic internals to be more align
 powered by Photon. Along this direction of change, Mosaic has standardized to JTS as its default and supported Vector
 Geometry Provider.
 
+.. note::
+    For Mosaic versions < 0.4 please use the `0.3 docs <https://databrickslabs.github.io/mosaic/v0.3.x/index.html>`_.
+
 
 Documentation
 =============

docs/source/usage/automatic-sql-registration.rst

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ to your Spark / Databricks cluster to perform spatial queries or integrating Spa
 with a geospatial middleware component such as [Geoserver](https://geoserver.org/).
 
 .. warning::
-    Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Hive UDFs), but not Shared Access due
+    Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Spark Expressions), but not Shared Access due
     to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ API changes, more `here <https://docs.databricks.com/en/udf/index.html>`_.
 
 Pre-requisites
