The supported languages are Scala, Python, R, and SQL.

## How does it work?

The Mosaic library is written in Scala (JVM) to guarantee maximum performance with Spark and, when possible, it uses code generation to give an extra performance boost.

__The other supported languages (Python, R and SQL) are thin wrappers around the Scala (JVM) code.__
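Since the wrapper languages contain no geometry logic of their own, each call simply forwards to the Scala (JVM) implementation. The pattern can be sketched with a toy, stdlib-only example (the class and function names below are hypothetical; Mosaic's real Python bindings delegate to the JVM via Py4J):

```python
# Toy illustration of the thin-wrapper pattern (hypothetical names):
# the Python "binding" holds no geometry logic and only delegates to a
# JVM-side implementation. Mosaic's real bindings forward calls via Py4J.

class FakeJvmMosaic:
    """Stand-in for the JVM-side Mosaic function registry."""

    def st_area(self, wkt: str) -> float:
        # Pretend the JVM computed the geometry's area; a unit square here.
        return 1.0 if wkt.startswith("POLYGON") else 0.0


_jvm = FakeJvmMosaic()


def st_area(wkt: str) -> float:
    # The wrapper only forwards the call; all real work happens JVM-side.
    return _jvm.st_area(wkt)


print(st_area("POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"))  # 1.0
```

Keeping the wrappers this thin is what lets all bindings share the same code-generated JVM implementation.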
Image1: Mosaic logical design.

## Getting started
:warning: **geopandas 0.14.4 not supported**

For Mosaic <= 0.4.1, `%pip install databricks-mosaic` will no longer install "as-is" in DBRs because Mosaic left geopandas unpinned in those versions. With geopandas 0.14.4, its numpy dependency conflicts with the scikit-learn limits in DBRs. The workaround is `%pip install geopandas==0.14.3 databricks-mosaic`. Mosaic 0.4.2+ limits the geopandas version.
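To check whether an environment already has a geopandas release affected by this conflict, a small stdlib-only version check can help. This helper is illustrative only (it is not part of Mosaic), and it assumes plain `x.y.z` version strings, with 0.14.3 as the last known-good release per the warning above:

```python
# Illustrative helper (not part of Mosaic): check whether an installed
# geopandas version is at or below the last known-good release, 0.14.3.
from importlib.metadata import PackageNotFoundError, version


def version_ok(installed: str, max_ok: str = "0.14.3") -> bool:
    # Plain x.y.z version strings compare correctly as integer tuples.
    return tuple(map(int, installed.split("."))) <= tuple(map(int, max_ok.split(".")))


def geopandas_needs_pin() -> bool:
    try:
        return not version_ok(version("geopandas"))
    except PackageNotFoundError:
        # Not installed yet: install the pinned version up front.
        return True


print(version_ok("0.14.3"))  # True
print(version_ok("0.14.4"))  # False
```

If the check reports a problem, the workaround above (`%pip install geopandas==0.14.3 databricks-mosaic`) applies.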
### Mosaic 0.4.x Series [Latest]

We recommend using Databricks Runtime version 13.3 LTS with Photon enabled.
__Language Bindings__
As of Mosaic 0.4.0 / DBR 13.3 LTS (subject to change in follow-on releases)...

* [Assigned Clusters](https://docs.databricks.com/en/compute/configure.html#access-modes): Mosaic Python, SQL, R, and Scala APIs.
* [Shared Access Clusters](https://docs.databricks.com/en/compute/configure.html#access-modes):
  * Mosaic Scala API (JVM) with Admin [allowlisting](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html).
  * Mosaic Python bindings (to Mosaic Scala APIs) are blocked by Py4J Security on Shared Access Clusters.
* Mosaic SQL expressions cannot yet be registered with [Unity Catalog](https://www.databricks.com/product/unity-catalog) due to API changes affecting DBRs >= 13; more [here](https://docs.databricks.com/en/udf/index.html).
__Additional Notes:__
Mosaic is a custom JVM library that extends Spark, which has the following implications in DBR 13.3 LTS:

1. [Unity Catalog](https://www.databricks.com/product/unity-catalog): Enforces process isolation, which is difficult to accomplish with custom JVM libraries; as such, only built-in (aka platform-provided) JVM APIs can be invoked from other supported languages in Shared Access Clusters.
2. [Volumes](https://docs.databricks.com/en/connect/unity-catalog/volumes.html): Along the same principle of isolation, clusters can read Volumes via relevant built-in (aka platform-provided) readers and writers, or via custom Python calls which do not involve any custom JVM code.

__Note: Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Spark Expressions), but not Shared Access, due to API changes; more [here](https://docs.databricks.com/en/udf/index.html).__
| Mosaic is an extension to the `Apache Spark <https://spark.apache.org/>`_ framework for fast + easy processing
  of very large geospatial datasets. It provides:
|
| [1] The choice of Scala, SQL and Python language bindings (written in Scala).
| [2] Raster and Vector APIs.
| [3] Easy conversion between common spatial data encodings (WKT, WKB and GeoJSON).
| [4] Constructors to easily generate new geometries from Spark native data types.
| [5] Many of the OGC SQL standard :code:`ST_` functions implemented as Spark Expressions for transforming,
  aggregating and joining spatial datasets.
| [6] High performance through implementation of Spark code generation within the core Mosaic functions.
| [7] Performing point-in-polygon joins using an approach we co-developed with Ordnance Survey
  (`blog post <https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html>`_).
.. note::
   We recommend using Databricks Runtime with Photon enabled to leverage the Databricks H3 expressions.
Version 0.4.x Series
====================
.. warning::
   For Mosaic <= 0.4.1, :code:`%pip install databricks-mosaic` will no longer install "as-is" in DBRs because Mosaic
   left geopandas unpinned in those versions. With geopandas 0.14.4, its numpy dependency conflicts with the
   scikit-learn limits in DBRs. The workaround is :code:`%pip install geopandas==0.14.3 databricks-mosaic`.
   Mosaic 0.4.2+ limits the geopandas version.
The Mosaic 0.4.x series only supports DBR 13.x. If run on a different DBR, it will throw an exception:

   DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13.
   You can specify :code:`%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.
Mosaic 0.4.x series issues an ERROR on standard, non-Photon clusters `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |