docs/en/install.md
### Scala 2.13
**NOTE**: PySpark from PyPI is based on Scala 2.12 by default, and you can use our Scala 2.12 version with it. If you need to start a Scala 2.13 instance, you can set the `SPARK_HOME` environment variable to a Spark Scala 2.13 installation, or install PySpark from the official Spark archives.
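As a sketch, pointing `SPARK_HOME` at a Scala 2.13 build might look like this (the installation path is a placeholder; the archive names follow the official Spark distribution naming):

```shell
# Point PySpark at a Spark distribution built for Scala 2.13 (path is a placeholder)
export SPARK_HOME=/opt/spark-3.5.1-bin-hadoop3-scala2.13

# Start PySpark with the Scala 2.13 Spark NLP artifact from Maven Central
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.13:{{ site.sparknlp_version }}
```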
If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your own projects: [Spark NLP SBT Starter](https://github.com/maziyarpanahi/spark-nlp-starter).
### Scala 2.13 Support

**NOTE**: PySpark installed from PyPI only runs on Scala 2.12, so the following section does not apply to it. If you need to start a Scala 2.13 instance, you can set the `SPARK_HOME` environment variable to a Spark Scala 2.13 installation, or install PySpark from the official Spark archives.
If you are using `DependencyParserModel` or `TextMatcherModel` in your pipelines and wish to import from the Scala 2.12 version to 2.13, then you will need to export them manually. For this, please see the example notebook [Converting Spark NLP Scala 2.12 models to Scala 2.13](https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/scala213/converting_models_from_212.ipynb).
`spark-nlp` with Scala 2.13 support has been published to [Maven Central](https://central.sonatype.com/artifact/com.johnsnowlabs.nlp/spark-nlp_2.13). You can use these coordinates to set up your Spark instance with the `--packages` option, or download the jar directly. For example:
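A minimal sketch of the `--packages` approach, assuming a Spark installation built for Scala 2.13 is on your `PATH` (the version is filled in from the site template):

```shell
# Start a Scala 2.13 Spark shell with Spark NLP resolved from Maven Central
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.13:{{ site.sparknlp_version }}
```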
See our [cheat sheet](#spark-nlp-cheatsheet) for more examples.
To use `spark-nlp` with Scala 2.13 as a dependency, change the `2.12` string in our dependencies to `2.13`.
**spark-nlp:**
```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.13</artifactId>
    <version>{{ site.sparknlp_version }}</version>
</dependency>
```
If you are running an sbt project in Scala 2.13, you don't require any changes, as the sbt syntax handles it automatically.
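The reason no change is needed is sbt's `%%` operator, which appends the project's Scala binary version to the artifact name. A sketch of the dependency line, assuming the version comes from the site template:

```scala
// build.sbt: %% resolves spark-nlp_2.12 or spark-nlp_2.13 to match scalaVersion
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "{{ site.sparknlp_version }}"
```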
Note: You can import these notebooks by using their URLs.
Microsoft Fabric notebooks run on managed Spark 3.4 clusters, so you need to provide the Spark NLP fat JARs through OneLake/ABFSS and wire them into the runtime via Spark properties.
### Spark NLP on Microsoft Fabric
1. Inside Fabric, go to a workspace, click the `+New Item` button, type `lake` in the search bar, choose `Lakehouse`, and give it a name.
3. Choose **Fabric Runtime 1.2** (Spark 3.4 + Delta 2.4), then go to `Spark properties` and set `spark.jars`
5. Create a Notebook and attach it to the environment you created before.
### Spark NLP ONNX compatibility on Microsoft Fabric
Follow the steps above to set up Spark NLP, then add the following additional steps to enable ONNX inference support:
1. In `Spark properties`, point `spark.executor.extraClassPath` and `spark.driver.extraClassPath` to the jar on ABFSS so the ONNX classes are visible, e.g. `abfss://workspace@storage.dfs.core.windows.net/jars/spark-nlp-assembly-{{ site.sparknlp_version }}.jar`.
2. In `Spark properties`, enable `spark.executor.userClassPathFirst=true` and `spark.driver.userClassPathFirst=true` so the Spark NLP/ONNX classes take precedence over the Fabric runtime defaults.
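Putting the pieces together, the environment's `Spark properties` might look like the following sketch (the workspace, storage account, and jar path are placeholders from the example above):

```properties
spark.jars                         abfss://workspace@storage.dfs.core.windows.net/jars/spark-nlp-assembly-{{ site.sparknlp_version }}.jar
spark.executor.extraClassPath      abfss://workspace@storage.dfs.core.windows.net/jars/spark-nlp-assembly-{{ site.sparknlp_version }}.jar
spark.driver.extraClassPath        abfss://workspace@storage.dfs.core.windows.net/jars/spark-nlp-assembly-{{ site.sparknlp_version }}.jar
spark.executor.userClassPathFirst  true
spark.driver.userClassPathFirst    true
```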
1. On an existing cluster, you need to install the `spark-nlp` and `spark-nlp-display` packages from PyPI.
2. Now you can attach your notebook to the cluster and use Spark NLP!
## Apache Spark Support
Spark NLP *{{ site.sparknlp_version }}* has been built on top of Apache Spark 3.4 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.