
Commit 3df66aa

Merge branch 'master' of https://github.com/MicrosoftDocs/azure-docs-pr into ds-issue46663
2 parents 6613a93 + 143021f commit 3df66aa


11 files changed

+218
-32
lines changed


articles/hdinsight/TOC.yml

Lines changed: 4 additions & 0 deletions
@@ -372,6 +372,8 @@
372372
href: ./spark/apache-spark-run-machine-learning-automl.md
373373
- name: Troubleshoot
374374
items:
375+
- name: Can't create Jupyter notebook
376+
href: ./spark/troubleshoot-jupyter-notebook-convert.md
375377
- name: OutOfMemoryError exception
376378
href: ./spark/apache-spark-troubleshoot-outofmemory.md
377379
- name: Apache Spark job fails - NoClassDefFoundError
@@ -396,6 +398,8 @@
396398
href: ./spark/apache-spark-troubleshoot-event-log-requestbodytoolarge.md
397399
- name: Debug Apache Spark jobs
398400
href: ./spark/apache-spark-job-debugging.md
401+
- name: Debug WASB file operations
402+
href: ./spark/troubleshoot-debug-wasb.md
399403
- name: Use IntelliJ to debug Apache Spark job
400404
href: ./spark/apache-spark-intellij-tool-debug-remotely-through-ssh.md
401405
- name: Apache Spark troubleshooting

articles/hdinsight/hdinsight-hadoop-architecture.md

Lines changed: 23 additions & 2 deletions
@@ -5,9 +5,9 @@ author: ashishthaps
55
ms.author: ashishth
66
ms.reviewer: jasonh
77
ms.service: hdinsight
8-
ms.custom: hdinsightactive
98
ms.topic: conceptual
10-
ms.date: 10/28/2019
9+
ms.custom: hdinsightactive
10+
ms.date: 02/07/2020
1111
---
1212

1313
# Apache Hadoop architecture in HDInsight
@@ -42,6 +42,27 @@ All HDInsight cluster types deploy YARN. The ResourceManager is deployed for hig
4242

4343
![Apache YARN on Azure HDInsight](./media/hdinsight-hadoop-architecture/apache-yarn-on-hdinsight.png)
4444

45+
## Soft delete
46+
47+
To undelete a file from your storage account, see:
48+
49+
### Azure Storage
50+
51+
* [Soft delete for Azure Storage blobs](../storage/blobs/storage-blob-soft-delete.md)
52+
* [Undelete Blob](https://docs.microsoft.com/rest/api/storageservices/undelete-blob)
53+
54+
### Azure Data Lake Storage Gen1
55+
56+
[Restore-AzDataLakeStoreDeletedItem](https://docs.microsoft.com/powershell/module/az.datalakestore/restore-azdatalakestoredeleteditem)
57+
58+
### Azure Data Lake Storage Gen2
59+
60+
[Known issues with Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-known-issues.md)
61+
62+
## Trash purging
63+
64+
The `fs.trash.interval` property under **HDFS** > **Advanced core-site** should remain at its default value of `0`, because you shouldn't store any data on the local file system. This value doesn't affect remote storage accounts (WASB, ADLS Gen1, ABFS).
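As a minimal sketch, the following Python reads `fs.trash.interval` from a Hadoop `core-site.xml` document and reports whether trash is still disabled. The XML fragment and the `trash_interval` helper are hypothetical; on a cluster node you'd read the deployed file instead (commonly `/etc/hadoop/conf/core-site.xml`, an assumption to verify on your cluster), and Ambari remains the place to change the value.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml fragment; on a real node you would read the file instead.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>0</value>
  </property>
</configuration>"""


def trash_interval(core_site_xml: str) -> int:
    """Return fs.trash.interval in minutes; a missing property is treated as 0 (trash disabled)."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.trash.interval":
            # An empty <value/> is also treated as 0.
            return int((prop.findtext("value") or "0").strip())
    return 0


if __name__ == "__main__":
    minutes = trash_interval(SAMPLE_CORE_SITE)
    print("trash disabled" if minutes == 0 else f"trash kept for {minutes} minutes")
```

Hadoop interprets the value as minutes, so `0` means deleted files bypass the local trash entirely.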
65+
4566
## Next steps
4667

4768
* [Use MapReduce in Apache Hadoop on HDInsight](hadoop/hdinsight-use-mapreduce.md)

articles/hdinsight/spark/apache-spark-shell.md

Lines changed: 54 additions & 9 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.topic: conceptual
99
ms.custom: hdinsightactive
10-
ms.date: 12/12/2019
10+
ms.date: 02/10/2020
1111
---
1212

1313
# Run Apache Spark from the Spark Shell
@@ -22,31 +22,76 @@ An interactive [Apache Spark](https://spark.apache.org/) Shell provides a REPL (
2222
2323
```
2424
25-
1. Spark provides shells for Scala (spark-shell), and Python (pyspark). In your SSH session, enter one of the following commands:
25+
1. Spark provides shells for Scala (spark-shell) and Python (pyspark). In your SSH session, enter *one* of the following commands:
2626
2727
```bash
2828
spark-shell
29+
30+
# Optional configurations
31+
# spark-shell --num-executors 4 --executor-memory 4g --executor-cores 2 --driver-memory 8g --driver-cores 4
32+
```
33+
34+
```bash
2935
pyspark
36+
37+
# Optional configurations
38+
# pyspark --num-executors 4 --executor-memory 4g --executor-cores 2 --driver-memory 8g --driver-cores 4
3039
```
3140
32-
Now you can enter Spark commands in the appropriate language.
41+
If you intend to use any optional configuration, ensure you first review [OutOfMemoryError exception for Apache Spark](./apache-spark-troubleshoot-outofmemory.md).
42+
43+
1. A few basic example commands. Choose the relevant language:
3344
34-
1. A few basic example commands:
45+
```scala
46+
val textFile = spark.read.textFile("/example/data/fruits.txt")
47+
textFile.first()
48+
textFile.filter(line => line.contains("apple")).show()
49+
```
50+
51+
```python
52+
textFile = spark.read.text("/example/data/fruits.txt")
53+
textFile.first()
54+
textFile.filter(textFile.value.contains("apple")).show()
55+
```
56+
57+
1. Query a CSV file. The following statement works in both `spark-shell` and `pyspark`:
3558
3659
```scala
37-
// Load data
60+
spark.read.csv("/HdiSamples/HdiSamples/SensorSampleData/building/building.csv").show()
61+
```
62+
63+
1. Query a CSV file and store the results in a variable:
64+
65+
```scala
3866
val data = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/HdiSamples/HdiSamples/SensorSampleData/building/building.csv")
67+
```
3968
40-
// Show data
41-
data.show()
69+
```python
70+
data = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/HdiSamples/HdiSamples/SensorSampleData/building/building.csv")
71+
```
4272
43-
// Select certain columns
73+
1. Display results:
74+
75+
```scala
76+
data.show()
4477
data.select($"BuildingID", $"Country").show(10)
78+
```
4579
46-
// exit shell
80+
```python
81+
data.show()
82+
data.select("BuildingID", "Country").show(10)
83+
```
84+
85+
1. Exit the shell:
86+
87+
```scala
4788
:q
4889
```
4990
91+
```python
92+
exit()
93+
```
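To see what the optional configuration in step 1 asks for, the following Python sketch totals the executor and driver heap memory and cores named by `--num-executors 4 --executor-memory 4g --executor-cores 2 --driver-memory 8g --driver-cores 4`. It deliberately ignores per-container memory overhead and any YARN rounding, so treat the numbers as a lower bound; `parse_mem` and `requested_resources` are illustrative helpers, not part of Spark.

```python
import re

# Flags copied from the optional spark-shell/pyspark configuration in step 1.
OPTIONAL_FLAGS = (
    "--num-executors 4 --executor-memory 4g --executor-cores 2 "
    "--driver-memory 8g --driver-cores 4"
)


def parse_mem(value: str) -> int:
    """Convert a size such as '4g' or '512m' to MiB (handles only m and g suffixes)."""
    match = re.fullmatch(r"(\d+)([mg])", value.strip().lower())
    if not match:
        raise ValueError(f"unsupported size: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 1024 if unit == "g" else amount


def requested_resources(flags: str) -> dict:
    """Total executor and driver heap/cores named by the flags (memory overhead excluded)."""
    tokens = flags.split()
    opts = dict(zip(tokens[::2], tokens[1::2]))  # e.g. {'--num-executors': '4', ...}
    executors = int(opts["--num-executors"])
    return {
        "executor_heap_mib": executors * parse_mem(opts["--executor-memory"]),
        "executor_cores": executors * int(opts["--executor-cores"]),
        "driver_heap_mib": parse_mem(opts["--driver-memory"]),
        "driver_cores": int(opts["--driver-cores"]),
    }


if __name__ == "__main__":
    print(requested_resources(OPTIONAL_FLAGS))
```

For these flags the executors account for 16,384 MiB of heap across 8 cores, plus an 8,192 MiB driver. Spark documents `spark.driver.cores` as applying only in cluster mode, and the interactive shells run their driver in client mode, so the driver core count is mostly informational here.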
94+
5095
## SparkSession and SparkContext instances
5196
5297
By default when you run the Spark Shell, instances of SparkSession and SparkContext are automatically instantiated for you.

articles/hdinsight/spark/troubleshoot-debug-wasb.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
title: Debug WASB file operations in Azure HDInsight
3+
description: Describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.
4+
author: hrasheed-msft
5+
ms.author: hrasheed
6+
ms.reviewer: jasonh
7+
ms.service: hdinsight
8+
ms.topic: troubleshooting
9+
ms.date: 02/07/2020
10+
---
11+
12+
# Debug WASB file operations in Azure HDInsight
13+
14+
At times, you may want to understand which operations the WASB driver started with Azure Storage. On the client side, the WASB driver logs each file system operation at **DEBUG** level. The driver uses log4j to control the logging level, and the default is **INFO**. For Azure Storage server-side analytics logs, see [Azure Storage analytics logging](../../storage/common/storage-analytics-logging.md).
15+
16+
The resulting log entry looks similar to:
17+
18+
```log
19+
18/05/13 04:15:55 DEBUG NativeAzureFileSystem: Moving wasb://[email protected]/user/livy/ulysses.txt/_temporary/0/_temporary/attempt_20180513041552_0000_m_000000_0/part-00000 to wasb://[email protected]/user/livy/ulysses.txt/part-00000
20+
```
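When you search these logs, a small filter helps. The Python sketch below pulls the source and destination out of `NativeAzureFileSystem: Moving ... to ...` lines like the one above. The regular expression assumes the `yy/MM/dd HH:mm:ss LEVEL Logger: message` layout shown in the sample, `wasb_moves` is a hypothetical helper, and the container and account names in the sample are placeholders.

```python
import re

# Matches lines shaped like the sample above: date, time, level, logger, then "Moving <src> to <dst>".
MOVE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) DEBUG NativeAzureFileSystem: "
    r"Moving (?P<src>wasbs?://\S+) to (?P<dst>wasbs?://\S+)$"
)


def wasb_moves(lines):
    """Yield (source, destination) pairs for each WASB rename logged at DEBUG level."""
    for line in lines:
        match = MOVE_RE.match(line.strip())
        if match:
            yield match.group("src"), match.group("dst")


if __name__ == "__main__":
    sample = [
        # Placeholder container/account names; your log shows your own.
        "18/05/13 04:15:55 DEBUG NativeAzureFileSystem: Moving "
        "wasb://container@account.blob.core.windows.net/user/livy/out/_temporary/part-00000 "
        "to wasb://container@account.blob.core.windows.net/user/livy/out/part-00000",
        "18/05/13 04:15:56 INFO SparkContext: Running job",
    ]
    for src, dst in wasb_moves(sample):
        print(src, "->", dst)
```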
21+
22+
## Turn on WASB debug log for file operations
23+
24+
1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your Spark cluster.
25+
26+
1. Navigate to **Spark2** > **Configs** > **Advanced spark2-log4j-properties**.
27+
28+
1. Modify `log4j.appender.console.Threshold=INFO` to `log4j.appender.console.Threshold=DEBUG`.
29+
30+
1. Navigate to **Advanced livy2-log4j-properties**.
31+
32+
1. Add the following property:
33+
34+
```
35+
log4j.logger.org.apache.hadoop.fs.azure.NativeAzureFileSystem=DEBUG
36+
```
37+
38+
1. Save changes.
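If you keep exported copies of these log4j properties, the same two edits can be scripted. This Python sketch takes the properties text as a string, switches the console threshold from `INFO` to `DEBUG` (as done for spark2-log4j-properties), and appends the `NativeAzureFileSystem` logger line once (as done for livy2-log4j-properties). The helper names are hypothetical, and on HDInsight Ambari is still where the change is saved and pushed to the nodes.

```python
THRESHOLD_INFO = "log4j.appender.console.Threshold=INFO"
THRESHOLD_DEBUG = "log4j.appender.console.Threshold=DEBUG"
WASB_LOGGER = "log4j.logger.org.apache.hadoop.fs.azure.NativeAzureFileSystem=DEBUG"


def raise_console_threshold(props: str) -> str:
    """Replace the INFO console threshold line with its DEBUG equivalent; other lines are kept."""
    lines = [THRESHOLD_DEBUG if line.strip() == THRESHOLD_INFO else line for line in props.splitlines()]
    return "\n".join(lines) + "\n"


def add_wasb_logger(props: str) -> str:
    """Append the NativeAzureFileSystem DEBUG logger unless it is already present."""
    lines = props.splitlines()
    if WASB_LOGGER not in (line.strip() for line in lines):
        lines.append(WASB_LOGGER)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spark_props = "log4j.rootCategory=INFO, console\nlog4j.appender.console.Threshold=INFO\n"
    print(raise_console_threshold(spark_props))
    print(add_wasb_logger("log4j.rootCategory=INFO, console\n"))
```

Running `add_wasb_logger` twice leaves a single logger line, so the edit is safe to repeat.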
39+
40+
## Additional logging
41+
42+
These logs should give you a high-level understanding of the file system operations. If they still don't provide useful information, or if you want to investigate Blob storage API calls, add `fs.azure.storage.client.logging=true` to `core-site`. This setting enables the Java SDK logs for the WASB storage driver, which print each call to the Blob storage server. Remove the setting when you finish investigating, because it can fill up the disk quickly and slow down processing.
43+
44+
If the backend is Azure Data Lake Storage, use the following log4j settings for the relevant component (for example, Spark, Tez, or HDFS):
45+
46+
```
47+
log4j.logger.com.microsoft.azure.datalake.store=ALL,adlsFile
48+
log4j.additivity.com.microsoft.azure.datalake.store=true
49+
log4j.appender.adlsFile=org.apache.log4j.FileAppender
50+
log4j.appender.adlsFile.File=/var/log/adl/adl.log
51+
log4j.appender.adlsFile.layout=org.apache.log4j.PatternLayout
52+
log4j.appender.adlsFile.layout.ConversionPattern=%p\t%d{ISO8601}\t%r\t%c\t[%t]\t%m%n
53+
```
54+
55+
The logs are written to `/var/log/adl/adl.log`.
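To filter that file, the following Python sketch splits each line according to the `ConversionPattern` above: level, ISO8601 timestamp, elapsed milliseconds, logger, thread, and message. It assumes tab-separated fields exactly as configured; `parse_adl_line` is a hypothetical helper, and lines that don't match, such as stack-trace continuations, are skipped.

```python
from typing import NamedTuple, Optional


class AdlLogRecord(NamedTuple):
    level: str
    timestamp: str
    elapsed_ms: int
    logger: str
    thread: str
    message: str


def parse_adl_line(line: str) -> Optional[AdlLogRecord]:
    """Split one line written with %p\\t%d{ISO8601}\\t%r\\t%c\\t[%t]\\t%m%n into fields."""
    parts = line.rstrip("\n").split("\t", 5)  # maxsplit keeps tabs inside the message intact
    if len(parts) != 6:
        return None  # e.g. continuation lines of a stack trace
    level, timestamp, elapsed, logger, thread, message = parts
    if not elapsed.isdigit():
        return None  # %r is always a whole number of milliseconds
    return AdlLogRecord(level, timestamp, int(elapsed), logger, thread.strip("[]"), message)
```

For example, `[r for r in map(parse_adl_line, open("/var/log/adl/adl.log")) if r and r.level == "ERROR"]` keeps only error records.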
56+
57+
## Next steps
58+
59+
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
60+
61+
* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
62+
63+
* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
64+
65+
* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).

articles/hdinsight/spark/troubleshoot-jupyter-notebook-convert.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
---
2+
title: Unable to create Jupyter notebook in Azure HDInsight
3+
description: Describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.
4+
author: hrasheed-msft
5+
ms.author: hrasheed
6+
ms.reviewer: jasonh
7+
ms.service: hdinsight
8+
ms.topic: troubleshooting
9+
ms.date: 02/11/2020
10+
---
11+
12+
# Unable to create Jupyter notebook in Azure HDInsight
13+
14+
This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.
15+
16+
## Issue
17+
18+
When starting a Jupyter notebook, you receive an error message that contains:
19+
20+
```error
21+
Cannot convert notebook to v5 because that version doesn't exist
22+
```
23+
24+
## Cause
25+
26+
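A version mismatch: the installed `nbformat` package reports major version 5 in its `_version.py`, but conversion to a v5 notebook isn't available, so the notebook can't be created.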
27+
28+
## Resolution
29+
30+
1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
31+
32+
```cmd
33+
34+
```
35+
36+
1. Open `_version.py` by executing the following command:
37+
38+
```bash
39+
sudo nano /usr/bin/anaconda/lib/python2.7/site-packages/nbformat/_version.py
40+
```
41+
42+
1. Change **5** to **4** so the modified line appears as follows:
43+
44+
```python
45+
version_info = (4, 0, 3)
46+
```
47+
48+
Save changes by entering **Ctrl + X**, **Y**, **Enter**.
49+
50+
1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.
51+
52+
1. Select **Jupyter** and then restart the service.
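If you'd rather script step 3 than edit the file in nano, the following Python 3 sketch rewrites the major version in the `version_info` tuple and leaves the rest of the file untouched. It assumes the line has the `version_info = (5, 0, 3)` shape implied by the step above; `pin_nbformat_major` is a hypothetical helper, and you'd still need root permissions to write the file and a Jupyter restart afterward.

```python
import re

# Matches the start of e.g. "version_info = (5, 0, 3)" and captures the major version.
VERSION_RE = re.compile(r"^(version_info\s*=\s*\(\s*)(\d+)", re.MULTILINE)


def pin_nbformat_major(source: str, old: int = 5, new: int = 4) -> str:
    """Return source with version_info's major number changed from old to new."""
    def swap(match):
        if int(match.group(2)) != old:
            return match.group(0)  # leave unexpected versions untouched
        return match.group(1) + str(new)

    return VERSION_RE.sub(swap, source)


if __name__ == "__main__":
    print(pin_nbformat_major("version_info = (5, 0, 3)\n"), end="")
```

Back up `_version.py` before writing the result over it.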
53+
54+
## Next steps
55+
56+
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
57+
58+
* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
59+
60+
* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
61+
62+
* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).

articles/virtual-machine-scale-sets/use-spot.md

Lines changed: 1 addition & 2 deletions
@@ -6,7 +6,7 @@ tags: azure-resource-manager
66
ms.service: virtual-machine-scale-sets
77
ms.workload: infrastructure-services
88
ms.topic: conceptual
9-
ms.date: 10/23/2019
9+
ms.date: 02/11/2020
1010
ms.author: cynthn
1111
---
1212

@@ -21,7 +21,6 @@ The amount of available capacity can vary based on size, region, time of day, an
2121
> This preview version is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
2222
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
2323
>
24-
> For the early part of the public preview, Spot instances will have a fixed price, so there will not be any price-based evictions.
2524
2625
## Pricing
2726

articles/virtual-machines/linux/spot-cli.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ ms.workload: infrastructure-services
1313
ms.tgt_pltfrm: na
1414
ms.devlang: na
1515
ms.topic: article
16-
ms.date: 11/20/2019
16+
ms.date: 02/11/2020
1717
ms.author: cynthn
1818
---
1919

@@ -32,7 +32,7 @@ The process to create a VM with Spot using the Azure CLI is the same as detailed
3232
> This preview version is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
3333
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
3434
>
35-
> For the early part of the public preview, Spot instances will have a fixed price, so there will not be any price-based evictions.
35+
3636

3737

3838
## Install Azure CLI

articles/virtual-machines/linux/spot-template.md

Lines changed: 2 additions & 5 deletions
@@ -13,7 +13,7 @@ ms.workload: infrastructure-services
1313
ms.tgt_pltfrm: na
1414
ms.devlang: na
1515
ms.topic: article
16-
ms.date: 10/14/2019
16+
ms.date: 02/11/2020
1717
ms.author: cynthn
1818
---
1919

@@ -30,7 +30,7 @@ You have option to set a max price you are willing to pay, per hour, for the VM.
3030
> This preview version is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
3131
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
3232
>
33-
> For the early part of the public preview, Spot instances will have a fixed price, so there will not be any price-based evictions.
33+
3434

3535

3636
## Use a template
@@ -46,9 +46,6 @@ For Spot template deployments, use`"apiVersion": "2019-03-01"` or later. Add the
4646
```
4747

4848

49-
> [!IMPORTANT]
50-
> For the early part of the public preview, you can set a max price, but it will be ignored. Spot VMs will have a fixed price, so there will not be any price-based evictions.
51-
5249

5350
Here is a sample template with the added properties for a Spot VM. Replace the resource names with your own and `<password>` with a password for the local administrator account on the VM.
5451

articles/virtual-machines/windows/spot-portal.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: gwallace
88
ms.service: virtual-machines-windows
99
ms.workload: infrastructure-services
1010
ms.topic: article
11-
ms.date: 11/20/2019
11+
ms.date: 02/11/2020
1212
ms.author: cynthn
1313
---
1414

@@ -25,7 +25,7 @@ You have option to set a max price you are willing to pay, per hour, for the VM.
2525
> This preview version is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
2626
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
2727
>
28-
> For the early part of the public preview, Spot instances will have a fixed price, so there will not be any price-based evictions.
28+
2929

3030
## Create the VM
3131

articles/virtual-machines/windows/spot-powershell.md

Lines changed: 2 additions & 5 deletions
@@ -8,7 +8,7 @@ manager: gwallace
88
ms.service: virtual-machines-windows
99
ms.workload: infrastructure-services
1010
ms.topic: article
11-
ms.date: 10/14/2019
11+
ms.date: 02/11/2020
1212
ms.author: cynthn
1313
---
1414

@@ -26,7 +26,7 @@ You have option to set a max price you are willing to pay, per hour, for the VM.
2626
> This preview version is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
2727
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
2828
>
29-
> For the early part of the public preview, Spot instances will have a fixed price, so there will not be any price-based evictions.
29+
3030

3131

3232
## Create the VM
@@ -35,9 +35,6 @@ Create a spotVM using [New-AzVmConfig](/powershell/module/az.compute/new-azvmcon
3535
- `-1` so the VM is not evicted based on price.
3636
- a dollar amount, up to 5 digits. For example, `-MaxPrice .98765` means that the VM will be deallocated once the price for a spotVM goes above $.98765 per hour.
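To make the `-MaxPrice` rules concrete, here's a Python sketch of the eviction decision as this article describes it: `-1` disables price-based eviction, and any other value deallocates the VM once the Spot price rises above it. Reading "up to 5 digits" as at most five decimal places is an assumption, the helper names are hypothetical, and capacity-based eviction (mentioned below) isn't modeled.

```python
from decimal import Decimal

NO_PRICE_EVICTION = Decimal("-1")


def validate_max_price(max_price: str) -> Decimal:
    """Accept -1 or a positive dollar amount with at most 5 decimal places (assumed rule)."""
    price = Decimal(max_price)
    if price == NO_PRICE_EVICTION:
        return price
    if price <= 0 or price.as_tuple().exponent < -5:
        raise ValueError(f"invalid max price: {max_price}")
    return price


def evicted_for_price(current_price: Decimal, max_price: Decimal) -> bool:
    """True when the Spot price rises above max_price; -1 means never evicted on price."""
    if max_price == NO_PRICE_EVICTION:
        return False
    return current_price > max_price
```

`Decimal` avoids the rounding surprises that binary floats would introduce when comparing prices such as `.98765`.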
3737

38-
> [!IMPORTANT]
39-
> For the early part of the public preview, you can set a max price, but it will be ignored. Spot VMs will have a fixed price, so there will not be any price-based evictions.
40-
4138

4239
This example creates a spotVM that will not be deallocated based on pricing (only when Azure needs the capacity back).
4340
