diff --git a/1-Introduction/01-defining-data-science/notebook.ipynb b/1-Introduction/01-defining-data-science/notebook.ipynb
index cf3988e85..c7740cb8a 100644
--- a/1-Introduction/01-defining-data-science/notebook.ipynb
+++ b/1-Introduction/01-defining-data-science/notebook.ipynb
@@ -2,55 +2,50 @@
 "cells": [
 {
 "cell_type": "markdown",
+ "metadata": {},
 "source": [
- "# Challenge: Analyzing Text about Data Science\r\n",
- "\r\n",
- "In this example, let's do a simple exercise that covers all steps of a traditional data science process. You do not have to write any code, you can just click on the cells below to execute them and observe the result. As a challenge, you are encouraged to try this code out with different data. \r\n",
- "\r\n",
- "## Goal\r\n",
- "\r\n",
- "In this lesson, we have been discussing different concepts related to Data Science. Let's try to discover more related concepts by doing some **text mining**. We will start with a text about Data Science, extract keywords from it, and then try to visualize the result.\r\n",
- "\r\n",
+ "# Challenge: Analyzing Text about Data Science\n",
+ "\n",
+ "In this example, let's do a simple exercise that covers all steps of a traditional data science process. You do not have to write any code, you can just click on the cells below to execute them and observe the result. As a challenge, you are encouraged to try this code out with different data. \n",
+ "\n",
+ "## Goal\n",
+ "\n",
+ "In this lesson, we have been discussing different concepts related to Data Science. Let's try to discover more related concepts by doing some **text mining**. We will start with a text about Data Science, extract keywords from it, and then try to visualize the result.\n",
+ "\n",
 "As a text, I will use the page on Data Science from Wikipedia:"
- ],
- "metadata": {}
+ ]
 },
 {
 "cell_type": "markdown",
- "source": [],
- "metadata": {}
+ "metadata": {},
+ "source": []
 },
 {
 "cell_type": "code",
 "execution_count": 62,
+ "metadata": {},
+ "outputs": [],
 "source": [
 "url = 'https://en.wikipedia.org/wiki/Data_science'"
- ],
- "outputs": [],
- "metadata": {}
+ ]
 },
 {
 "cell_type": "markdown",
+ "metadata": {},
 "source": [
- "## Step 1: Getting the Data\r\n",
- "\r\n",
+ "## Step 1: Getting the Data\n",
+ "\n",
 "First step in every data science process is getting the data. We will use `requests` library to do that:"
- ],
- "metadata": {}
+ ]
 },
 {
 "cell_type": "code",
 "execution_count": 63,
- "source": [
- "import requests\r\n",
- "\r\n",
- "text = requests.get(url).content.decode('utf-8')\r\n",
- "print(text[:1000])"
- ],
+ "metadata": {},
 "outputs": [
 {
- "output_type": "stream",
 "name": "stdout",
+ "output_type": "stream",
 "text": [
 "\n",
 "\n",
@@ -61,77 +56,79 @@
 ]
 }
 ],
- "metadata": {}
+ "source": [
+ "import requests\n",
+ "\n",
+ "text = requests.get(url).content.decode('utf-8')\n",
+ "print(text[:1000])"
+ ]
 },
 {
 "cell_type": "markdown",
+ "metadata": {},
 "source": [
- "## Step 2: Transforming the Data\r\n",
- "\r\n",
- "The next step is to convert the data into the form suitable for processing. In our case, we have downloaded HTML source code from the page, and we need to convert it into plain text.\r\n",
- "\r\n",
+ "## Step 2: Transforming the Data\n",
+ "\n",
+ "The next step is to convert the data into the form suitable for processing. In our case, we have downloaded HTML source code from the page, and we need to convert it into plain text.\n",
+ "\n",
 "There are many ways this can be done. We will use the simplest built-in [HTMLParser](https://docs.python.org/3/library/html.parser.html) object from Python. We need to subclass the `HTMLParser` class and define the code that will collect all text inside HTML tags, except `