Commit 0c47b17

update text analytics
1 parent 13cc429 commit 0c47b17

1 file changed

+9
-136
lines changed

articles/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark.md

Lines changed: 9 additions & 136 deletions
@@ -54,39 +54,6 @@ cognitive_service_name = "<Your linked service for text analytics>"
5454
## Text Sentiment
5555
The Text Sentiment Analysis provides a way for detecting the sentiment labels (such as "negative", "neutral" and "positive") and confidence scores at the sentence and document-level. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=sentiment-analysis) for the list of enabled languages.
5656

57-
### V2
58-
59-
```python
60-
61-
# Create a dataframe that's tied to it's column names
62-
df = spark.createDataFrame([
63-
("I am so happy today, its sunny!", "en-US"),
64-
("I am frustrated by this rush hour traffic", "en-US"),
65-
("The cognitive services on spark aint bad", "en-US"),
66-
], ["text", "language"])
67-
68-
# Run the Text Analytics service with options
69-
sentimentv2 = (TextSentimentV2()
70-
.setLinkedService(linked_service_name)
71-
.setTextCol("text")
72-
.setOutputCol("sentiment")
73-
.setErrorCol("error")
74-
.setLanguageCol("language"))
75-
76-
# Show the results of your text query in a table format
77-
display(sentimentv2.transform(df).select("text", col("sentiment")[0].getItem("score").alias("positive score")))
78-
79-
```
80-
### Expected results
81-
82-
|text|positive score|
83-
|---|---|
84-
|I am so happy today, its sunny!|0.99511755|
85-
|I am frustrated by this rush hour traffic|0.007274598|
86-
|The cognitive services on spark aint bad|0.9144157|
87-
88-
### V3.1
89-
9057
```python
9158

9259
# Create a dataframe that's tied to it's column names
@@ -105,7 +72,11 @@ sentiment = (TextSentiment()
10572
.setLanguageCol("language"))
10673

10774
# Show the results of your text query in a table format
108-
display(sentiment.transform(df).select("text", col("sentiment")[0].getItem("sentiment").alias("sentiment")))
75+
results = sentiment.transform(df)
76+
77+
display(results
78+
.withColumn("sentiment", col("sentiment").getItem("document").getItem("sentences")[0].getItem("sentiment"))
79+
.select("text", "sentiment"))
10980

11081
```
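
Reviewer note: the new selection suggests that the `sentiment` output column is no longer indexed directly with `[0]`, but is a struct with a `document` field whose `sentences` array carries the per-sentence labels. The following plain-Python sketch walks the same access path on a hypothetical row. Only the key names (`document`, `sentences`, `sentiment`) are taken from this diff. The row contents and the helper name `first_sentence_sentiment` are assumptions made for illustration, not service output.

```python
# Hypothetical row mimicking the nested shape implied by the new column expression.
# Key names come from the diff; the values are invented for illustration only.
row = {
    "text": "example input",
    "sentiment": {
        "document": {
            "sentences": [
                {"sentiment": "positive"},
            ],
        },
    },
}


def first_sentence_sentiment(row):
    """Follow sentiment -> document -> sentences[0] -> sentiment."""
    return row["sentiment"]["document"]["sentences"][0]["sentiment"]


label = first_sentence_sentiment(row)
print(label)
```

Note that the path reads the label of the first sentence only, so for multi-sentence input it may differ from a document-level label.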
11182
### Expected results
@@ -122,32 +93,6 @@ display(sentiment.transform(df).select("text", col("sentiment")[0].getItem("sent
12293

12394
The Language Detector evaluates text input for each document and returns language identifiers with a score that indicates the strength of the analysis. This capability is useful for content stores that collect arbitrary text, where language is unknown. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=language-detection) for the list of enabled languages.
12495

125-
### V2
126-
```python
127-
# Create a dataframe that's tied to it's column names
128-
df = spark.createDataFrame([
129-
("Hello World",),
130-
("Bonjour tout le monde",),
131-
("La carretera estaba atascada. Había mucho tráfico el día de ayer.",),
132-
("你好",),
133-
("こんにちは",),
134-
(":) :( :D",)
135-
], ["text",])
136-
137-
# Run the Text Analytics service with options
138-
languagev2 = (LanguageDetectorV2()
139-
.setLinkedService(linked_service_name)
140-
.setTextCol("text")
141-
.setOutputCol("language")
142-
.setErrorCol("error"))
143-
144-
# Show the results of your text query in a table format
145-
display(languagev2.transform(df))
146-
```
147-
### Expected results
148-
![Expected results for language detector v2](./media/tutorial-text-analytics-use-mmlspark/expected-output-language-detector-v-2.png)
149-
150-
### V3.1
15196
```python
15297
# Create a dataframe that's tied to it's column names
15398
df = spark.createDataFrame([
@@ -176,28 +121,6 @@ display(language.transform(df))
176121
## Entity Detector
177122
The Entity Detector returns a list of recognized entities with links to a well-known knowledge base. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=entity-linking) for the list of enabled languages.
178123

179-
### V2
180-
181-
```python
182-
df = spark.createDataFrame([
183-
("1", "Microsoft released Windows 10"),
184-
("2", "In 1975, Bill Gates III and Paul Allen founded the company.")
185-
], ["if", "text"])
186-
187-
entityv2 = (EntityDetectorV2()
188-
.setLinkedService(linked_service_name)
189-
.setLanguage("en")
190-
.setOutputCol("replies")
191-
.setErrorCol("error"))
192-
193-
display(entityv2.transform(df).select("if", "text", col("replies")[0].getItem("entities").alias("entities")))
194-
```
195-
### Expected results
196-
![Expected results for entity detector v2](./media/tutorial-text-analytics-use-mmlspark/expected-output-entity-detector-v-2.png)
197-
198-
199-
### V3.1
200-
201124
```python
202125
df = spark.createDataFrame([
203126
("1", "Microsoft released Windows 10"),
@@ -210,7 +133,7 @@ entity = (EntityDetector()
210133
.setOutputCol("replies")
211134
.setErrorCol("error"))
212135

213-
display(entity.transform(df).select("if", "text", col("replies")[0].getItem("entities").alias("entities")))
136+
display(entity.transform(df).select("if", "text", col("replies").getItem("document").getItem("entities").alias("entities")))
214137
```
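
Reviewer note: the same change, from `col("replies")[0]` to `col("replies").getItem("document")`, recurs in the entity, key phrase, NER, and PII hunks of this commit. Assuming the output column is a struct with a `document` field, as those expressions imply, the lookup can be sketched in plain Python. The helper name `document_field` and the sample payload are hypothetical.

```python
def document_field(replies, field):
    """Return replies["document"][field], or None if either level is missing."""
    document = (replies or {}).get("document") or {}
    return document.get(field)


# Hypothetical payload: only the key names "document" and "entities" come from the diff.
replies = {"document": {"entities": ["placeholder entity"]}}

print(document_field(replies, "entities"))    # field present
print(document_field(replies, "keyPhrases"))  # field absent
print(document_field(None, "entities"))       # whole reply missing, e.g. a failed row
```

Returning `None` for a missing level keeps failed rows from raising, which is convenient when the error details are reported in a separate column such as `error`.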
215138
### Expected results
216139
![Expected results for entity detector v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-entity-detector-v-31.png)
@@ -221,34 +144,6 @@ display(entity.transform(df).select("if", "text", col("replies")[0].getItem("ent
221144

222145
The Key Phrase Extraction evaluates unstructured text and returns a list of key phrases. This capability is useful if you need to quickly identify the main points in a collection of documents. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=key-phrase-extraction) for the list of enabled languages.
223146

224-
### V2
225-
```python
226-
df = spark.createDataFrame([
227-
("en", "Hello world. This is some input text that I love."),
228-
("fr", "Bonjour tout le monde"),
229-
("es", "La carretera estaba atascada. Había mucho tráfico el día de ayer.")
230-
], ["lang", "text"])
231-
232-
keyPhrasesv2 = (KeyPhraseExtractorV2()
233-
.setLinkedService(linked_service_name)
234-
.setLanguageCol("lang")
235-
.setOutputCol("replies")
236-
.setErrorCol("error"))
237-
238-
display(keyPhrasesv2.transform(df).select("text", col("replies")[0].getItem("keyPhrases").alias("keyPhrases")))
239-
```
240-
241-
### Expected results
242-
243-
|text|keyPhrases|
244-
|---|---|
245-
|Hello world. This is some input text that I love.|"["input text","world"]"|
246-
|Bonjour tout le monde|"["monde"]"|
247-
|La carretera estaba atascada. Había mucho tráfico el día de ayer.|"["carretera","tráfico","día"]"|
248-
249-
250-
### V3.1
251-
252147
```python
253148
df = spark.createDataFrame([
254149
("en", "Hello world. This is some input text that I love."),
@@ -262,7 +157,7 @@ keyPhrase = (KeyPhraseExtractor()
262157
.setOutputCol("replies")
263158
.setErrorCol("error"))
264159

265-
display(keyPhrase.transform(df).select("text", col("replies")[0].getItem("keyPhrases").alias("keyPhrases")))
160+
display(keyPhrase.transform(df).select("text", col("replies").getItem("document").getItem("keyPhrases").alias("keyPhrases")))
266161
```
267162

268163
### Expected results
@@ -279,26 +174,6 @@ display(keyPhrase.transform(df).select("text", col("replies")[0].getItem("keyPhr
279174

280175
Named Entity Recognition (NER) is the ability to identify different entities in text and categorize them into pre-defined classes or types such as: person, location, event, product, and organization. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=named-entity-recognition) for the list of enabled languages.
281176

282-
### V2
283-
```python
284-
df = spark.createDataFrame([
285-
("1", "en", "I had a wonderful trip to Seattle last week."),
286-
("2", "en", "I visited Space Needle 2 times.")
287-
], ["id", "language", "text"])
288-
289-
nerv2 = (NERV2()
290-
.setLinkedService(linked_service_name)
291-
.setLanguageCol("language")
292-
.setOutputCol("replies")
293-
.setErrorCol("error"))
294-
295-
display(nerv2.transform(df).select("text", col("replies")[0].getItem("entities").alias("entities")))
296-
```
297-
### Expected results
298-
![Expected results for named entity recognition v2](./media/tutorial-text-analytics-use-mmlspark/expected-output-ner-v-2.png)
299-
300-
### V3.1
301-
302177
```python
303178
df = spark.createDataFrame([
304179
("1", "en", "I had a wonderful trip to Seattle last week."),
@@ -311,7 +186,7 @@ ner = (NER()
311186
.setOutputCol("replies")
312187
.setErrorCol("error"))
313188

314-
display(ner.transform(df).select("text", col("replies")[0].getItem("entities").alias("entities")))
189+
display(ner.transform(df).select("text", col("replies").getItem("document").getItem("entities").alias("entities")))
315190
```
316191
### Expected results
317192
![Expected results for named entity recognition v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-ner-v-31.png)
@@ -321,8 +196,6 @@ display(ner.transform(df).select("text", col("replies")[0].getItem("entities").a
321196
## Personally Identifiable Information (PII) V3.1
322197
The PII feature is part of NER and it can identify and redact sensitive entities in text that are associated with an individual person such as: phone number, email address, mailing address, passport number. See the [Supported languages in Text Analytics API](../../cognitive-services/text-analytics/language-support.md?tabs=pii) for the list of enabled languages.
323198

324-
### V3.1
325-
326199
```python
327200
df = spark.createDataFrame([
328201
("1", "en", "My SSN is 859-98-0987"),
@@ -336,7 +209,7 @@ pii = (PII()
336209
.setOutputCol("replies")
337210
.setErrorCol("error"))
338211

339-
display(pii.transform(df).select("text", col("replies")[0].getItem("entities").alias("entities")))
212+
display(pii.transform(df).select("text", col("replies").getItem("document").getItem("entities").alias("entities")))
340213
```
341214
### Expected results
342215
![Expected results for personal identifiable information v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-pii-v-31.png)
