Merge pull request #239406 from ruixinxu/master

PMEds28 · web-flow · commit 868bb7883aa8 · 2023-05-26T11:25:59.000+01:00
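Update text analytics tutorial: remove the V2 samples and version subheadings, and read results from the `document` field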
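
For reviewers, the accessor change in the hunks below can be pictured with plain Python values. This is only a rough sketch: the dictionaries are hypothetical mocks of a single row's output column, with shapes inferred solely from the old and new accessor paths in this diff, and the entity and sentiment values are placeholders rather than real service output. The helper names are also made up for illustration.

```python
# Hypothetical plain-Python mocks of one row's output column. Shapes are
# inferred only from the accessor paths in this diff; real values are Spark
# structs and arrays, and the entity strings below are placeholders.

# Shape implied by the old accessor: col("replies")[0].getItem("entities")
old_replies = [{"entities": ["Seattle", "last week"]}]

# Shape implied by the new accessor:
# col("replies").getItem("document").getItem("entities")
new_replies = {"document": {"entities": ["Seattle", "last week"]}}

# Shape implied by the new sentiment accessor:
# col("sentiment").getItem("document").getItem("sentences")[0].getItem("sentiment")
new_sentiment = {"document": {"sentences": [{"sentiment": "positive"}]}}


def entities_old(replies):
    """Old path: first array element, then the 'entities' field."""
    return replies[0]["entities"]


def entities_new(replies):
    """New path: the 'document' struct, then the 'entities' field."""
    return replies["document"]["entities"]


def first_sentence_sentiment(sentiment):
    """New sentiment path: label of the first sentence in the document."""
    return sentiment["document"]["sentences"][0]["sentiment"]


# Both paths reach the same payload; only the wrapper around it differs.
assert entities_old(old_replies) == entities_new(new_replies)
print(first_sentence_sentiment(new_sentiment))
```

Note that the updated sentiment sample reads the label of the first sentence only, so for multi-sentence input it may not reflect every sentence in the text.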
diff --git a/articles/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark.md b/articles/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark.md
@@ -54,39 +54,6 @@ cognitive_service_name = "<Your linked service for text analytics>"
 ## Text Sentiment
 The Text Sentiment Analysis provides a way for detecting the sentiment labels (such as "negative", "neutral" and "positive") and confidence scores at the sentence and document-level. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=sentiment-analysis) for the list of enabled languages.
 
-### V2
-
-```python
-
-# Create a dataframe that's tied to it's column names
-df = spark.createDataFrame([
-  ("I am so happy today, its sunny!", "en-US"),
-  ("I am frustrated by this rush hour traffic", "en-US"),
-  ("The cognitive services on spark aint bad", "en-US"),
-], ["text", "language"])
-
-# Run the Text Analytics service with options
-sentimentv2 = (TextSentimentV2()
-    .setLinkedService(linked_service_name)
-    .setTextCol("text")
-    .setOutputCol("sentiment")
-    .setErrorCol("error")
-    .setLanguageCol("language"))
-
-# Show the results of your text query in a table format
-display(sentimentv2.transform(df).select("text", col("sentiment")[0].getItem("score").alias("positive score")))
-
-```
-### Expected results
-
-|text|positive score|
-|---|---|
-|I am so happy today, its sunny!|0.99511755|
-|I am frustrated by this rush hour traffic|0.007274598|
-|The cognitive services on spark aint bad|0.9144157|
-
-### V3.1
-
 ```python
 
 # Create a dataframe that's tied to it's column names
@@ -105,7 +72,11 @@ sentiment = (TextSentiment()
     .setLanguageCol("language"))
 
 # Show the results of your text query in a table format
-display(sentiment.transform(df).select("text", col("sentiment")[0].getItem("sentiment").alias("sentiment")))
+results = sentiment.transform(df)
+
+display(results
+    .withColumn("sentiment", col("sentiment").getItem("document").getItem("sentences")[0].getItem("sentiment"))
+    .select("text", "sentiment"))
 
 ```
 ### Expected results
@@ -122,32 +93,6 @@ display(sentiment.transform(df).select("text", col("sentiment")[0].getItem("sent
 
 The Language Detector evaluates text input for each document and returns language identifiers with a score that indicates the strength of the analysis. This capability is useful for content stores that collect arbitrary text, where language is unknown. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=language-detection) for the list of enabled languages.
 
-### V2
-```python
-# Create a dataframe that's tied to it's column names
-df = spark.createDataFrame([
-  ("Hello World",),
-  ("Bonjour tout le monde",),
-  ("La carretera estaba atascada. Había mucho tráfico el día de ayer.",),
-  ("你好",),
-  ("こんにちは",),
-  (":) :( :D",)
-], ["text",])
-
-# Run the Text Analytics service with options
-languagev2 = (LanguageDetectorV2()
-    .setLinkedService(linked_service_name)
-    .setTextCol("text")
-    .setOutputCol("language")
-    .setErrorCol("error"))
-
-# Show the results of your text query in a table format
-display(languagev2.transform(df))
-```
-### Expected results
-![Expected results for language detector v2](./media/tutorial-text-analytics-use-mmlspark/expected-output-language-detector-v-2.png)
-
-### V3.1
 ```python
 # Create a dataframe that's tied to it's column names
 df = spark.createDataFrame([
@@ -176,28 +121,6 @@ display(language.transform(df))
 ## Entity Detector
 The Entity Detector returns a list of recognized entities with links to a well-known knowledge base. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=entity-linking) for the list of enabled languages.
 
-### V2
-
-```python
-df = spark.createDataFrame([
-    ("1", "Microsoft released Windows 10"),
-    ("2", "In 1975, Bill Gates III and Paul Allen founded the company.")
-], ["if", "text"])
-
-entityv2 = (EntityDetectorV2()
-    .setLinkedService(linked_service_name)
-    .setLanguage("en")
-    .setOutputCol("replies")
-    .setErrorCol("error"))
-
-display(entityv2.transform(df).select("if", "text", col("replies")[0].getItem("entities").alias("entities")))
-```
-### Expected results
-![Expected results for entity detector v2](./media/tutorial-text-analytics-use-mmlspark/expected-output-entity-detector-v-2.png)
-
-
-### V3.1
-
 ```python
 df = spark.createDataFrame([
     ("1", "Microsoft released Windows 10"),
@@ -210,7 +133,7 @@ entity = (EntityDetector()
     .setOutputCol("replies")
     .setErrorCol("error"))
 
-display(entity.transform(df).select("if", "text", col("replies")[0].getItem("entities").alias("entities")))
+display(entity.transform(df).select("if", "text", col("replies").getItem("document").getItem("entities").alias("entities")))
 ```
 ### Expected results
 ![Expected results for entity detector v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-entity-detector-v-31.png)
@@ -221,34 +144,6 @@ display(entity.transform(df).select("if", "text", col("replies")[0].getItem("ent
 
 The Key Phrase Extraction evaluates unstructured text and returns a list of key phrases. This capability is useful if you need to quickly identify the main points in a collection of documents. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=key-phrase-extraction) for the list of enabled languages.
 
-### V2
-```python
-df = spark.createDataFrame([
-    ("en", "Hello world. This is some input text that I love."),
-    ("fr", "Bonjour tout le monde"),
-    ("es", "La carretera estaba atascada. Había mucho tráfico el día de ayer.")
-], ["lang", "text"])
-
-keyPhrasesv2 = (KeyPhraseExtractorV2()
-    .setLinkedService(linked_service_name)
-    .setLanguageCol("lang")
-    .setOutputCol("replies")
-    .setErrorCol("error"))
-
-display(keyPhrasesv2.transform(df).select("text", col("replies")[0].getItem("keyPhrases").alias("keyPhrases")))
-```
-
-### Expected results
-
-|text|keyPhrases|
-|---|---|
-|Hello world. This is some input text that I love.|"["input text","world"]"|
-|Bonjour tout le monde|"["monde"]"|
-|La carretera estaba atascada. Había mucho tráfico el día de ayer.|"["carretera","tráfico","día"]"|
-
-
-### V3.1
-
 ```python
 df = spark.createDataFrame([
     ("en", "Hello world. This is some input text that I love."),
@@ -262,7 +157,7 @@ keyPhrase = (KeyPhraseExtractor()
     .setOutputCol("replies")
     .setErrorCol("error"))
 
-display(keyPhrase.transform(df).select("text", col("replies")[0].getItem("keyPhrases").alias("keyPhrases")))
+display(keyPhrase.transform(df).select("text", col("replies").getItem("document").getItem("keyPhrases").alias("keyPhrases")))
 ```
 
 ### Expected results
@@ -279,26 +174,6 @@ display(keyPhrase.transform(df).select("text", col("replies")[0].getItem("keyPhr
 
 Named Entity Recognition (NER) is the ability to identify different entities in text and categorize them into pre-defined classes or types such as: person, location, event, product, and organization. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=named-entity-recognition) for the list of enabled languages.
 
-### V2
-```python
-df = spark.createDataFrame([
-    ("1", "en", "I had a wonderful trip to Seattle last week."),
-    ("2", "en", "I visited Space Needle 2 times.")
-], ["id", "language", "text"])
-
-nerv2 = (NERV2()
-    .setLinkedService(linked_service_name)
-    .setLanguageCol("language")
-    .setOutputCol("replies")
-    .setErrorCol("error"))
-
-display(nerv2.transform(df).select("text", col("replies")[0].getItem("entities").alias("entities")))
-```
-### Expected results
-![Expected results for named entity recognition v2](./media/tutorial-text-analytics-use-mmlspark/expected-output-ner-v-2.png)
-
-### V3.1
-
 ```python
 df = spark.createDataFrame([
     ("1", "en", "I had a wonderful trip to Seattle last week."),
@@ -311,7 +186,7 @@ ner = (NER()
     .setOutputCol("replies")
     .setErrorCol("error"))
 
-display(ner.transform(df).select("text", col("replies")[0].getItem("entities").alias("entities")))
+display(ner.transform(df).select("text", col("replies").getItem("document").getItem("entities").alias("entities")))
 ```
 ### Expected results
 ![Expected results for named entity recognition v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-ner-v-31.png)
@@ -321,8 +196,6 @@ display(ner.transform(df).select("text", col("replies")[0].getItem("entities").a
 ## Personally Identifiable Information (PII) V3.1
 The PII feature is part of NER and it can identify and redact sensitive entities in text that are associated with an individual person such as: phone number, email address, mailing address, passport number. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=pii) for the list of enabled languages.
 
-### V3.1
-
 ```python
 df = spark.createDataFrame([
     ("1", "en", "My SSN is 859-98-0987"),
@@ -336,7 +209,7 @@ pii = (PII()
     .setOutputCol("replies")
     .setErrorCol("error"))
 
-display(pii.transform(df).select("text", col("replies")[0].getItem("entities").alias("entities")))
+display(pii.transform(df).select("text", col("replies").getItem("document").getItem("entities").alias("entities")))
 ```
 ### Expected results
 ![Expected results for personal identifiable information v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-pii-v-31.png)