|
19 | 19 | },
|
20 | 20 | {
|
21 | 21 | "cell_type": "markdown",
|
22 |
| - "metadata": {}, |
| 22 | + "metadata": { |
| 23 | + "editable": true, |
| 24 | + "slideshow": { |
| 25 | + "slide_type": "" |
| 26 | + }, |
| 27 | + "tags": [] |
| 28 | + }, |
23 | 29 | "source": [
|
24 | 30 | "## Step 1: Project set up\n",
|
25 | 31 | "\n",
|
26 |
| - "* Create a new folder on your local machine in which your new data analysis project will be stored\n", |
| 32 | + "* Create a new folder on your local machine in which your new data analysis project will be stored.\n", |
27 | 33 | "* Create a virtual environment for the project so that we can install packages into it. Note that you will need to use a shell, or terminal window for this.\n",
|
28 |
| - "* Install Pandas into your virtual environment (if not already present), again using the shell\n", |
29 |
| - "* Create a new notebook for your analysis (feel free to copy this one)\n", |
30 |
| - "* Import Pandas to ensure you have installed it correctly" |
| 34 | + "* Install Pandas into your virtual environment (if not already present), again using the shell.\n", |
| 35 | + "* Create a new notebook for your analysis (feel free to copy this one).\n", |
| 36 | + "* Import Pandas to ensure you have installed it correctly." |
31 | 37 | ]
|
32 | 38 | },
|
33 | 39 | {
|
|
57 | 63 | "source": [
|
58 | 64 | "## Step 2: Get and load a data set\n",
|
59 | 65 | "\n",
|
60 |
| - "* Import Pandas into your analysis notebook\n", |
61 |
| - "* Find a .CSV file online, and save it into your project folder. Ideally, save this in a new folder called `data` within your project folder to stay organised. You are welcome to use the hills data set above if desired.\n", |
62 |
| - "* Load the data into a DataFrame using Pandas" |
| 66 | + "* Import Pandas into your analysis notebook.\n", |
| 67 | + "* Find a .CSV file online and save it into your project folder. Ideally, save this in a new folder called 'data' within your project folder to stay organised. You are welcome to use the hills data set above if desired.\n", |
| 68 | + "* Load the data into a DataFrame using Pandas." |
63 | 69 | ]
|
64 | 70 | },
|
65 | 71 | {
|
|
271 | 277 | },
|
272 | 278 | {
|
273 | 279 | "cell_type": "markdown",
|
274 |
| - "metadata": {}, |
| 280 | + "metadata": { |
| 281 | + "editable": true, |
| 282 | + "slideshow": { |
| 283 | + "slide_type": "" |
| 284 | + }, |
| 285 | + "tags": [] |
| 286 | + }, |
275 | 287 | "source": [
|
276 |
| - "* This is strange: why are there 7 countries in the `Country` column? Lets investigate this later." |
| 288 | + "* This is strange: why are there 7 countries in the `Country` column? Let's investigate this later." |
277 | 289 | ]
|
278 | 290 | },
|
279 | 291 | {
|
|
395 | 407 | },
|
396 | 408 | {
|
397 | 409 | "cell_type": "markdown",
|
398 |
| - "metadata": {}, |
| 410 | + "metadata": { |
| 411 | + "editable": true, |
| 412 | + "slideshow": { |
| 413 | + "slide_type": "" |
| 414 | + }, |
| 415 | + "tags": [] |
| 416 | + }, |
399 | 417 | "source": [
|
400 |
| - "* Lets answer another question:\n", |
| 418 | + "* Let's answer another question:\n", |
401 | 419 | "\n",
|
402 | 420 | "* `\"Which country has the highest mean hill height?\"`"
|
403 | 421 | ]
|
|
466 | 484 | },
|
467 | 485 | {
|
468 | 486 | "cell_type": "markdown",
|
469 |
| - "metadata": {}, |
| 487 | + "metadata": { |
| 488 | + "editable": true, |
| 489 | + "slideshow": { |
| 490 | + "slide_type": "" |
| 491 | + }, |
| 492 | + "tags": [] |
| 493 | + }, |
470 | 494 | "source": [
|
471 |
| - "* Most of the difficulty is knowing which Pandas functions to use! With small data sets aspects like computational speed do not matter as much as with large data sets. For larger data sets, using Pandas built in methods as much as possible is an easy way to ensure your code runs quickly." |
| 495 | + "* Most of the difficulty is knowing which Pandas functions to use! With small data sets, aspects like computational speed do not matter as much as with large data sets. For larger data sets, using Pandas' built-in methods as much as possible is an easy way to ensure your code runs quickly." |
472 | 496 | ]
|
473 | 497 | },
|
474 | 498 | {
|
475 | 499 | "cell_type": "markdown",
|
476 |
| - "metadata": {}, |
| 500 | + "metadata": { |
| 501 | + "editable": true, |
| 502 | + "slideshow": { |
| 503 | + "slide_type": "" |
| 504 | + }, |
| 505 | + "tags": [] |
| 506 | + }, |
477 | 507 | "source": [
|
478 |
| - "## Step 6: Plotting with Pandas matplotlib\n", |
| 508 | + "## Step 6: Plotting with Pandas Matplotlib\n", |
479 | 509 | "\n",
|
480 |
| - "* Use the built in Pandas plotting functions to plot an aspect of your data\n", |
481 |
| - "* Pandas plotting functions are based on the matplotlib API. These are mostly a 1:1 mapping, but there are some differences.\n", |
482 |
| - "* In general, Pandas plotting functions can be useful for quickly creating plots. For detailed customisation, plotting with matplotlib directly might save time.\n", |
| 510 | + "* Use the built in Pandas plotting functions to plot an aspect of your data.\n", |
| 511 | + "* Pandas plotting functions are based on the Matplotlib API. These are mostly a 1:1 mapping, but there are some differences.\n", |
| 512 | + "* In general, Pandas plotting functions can be useful for quickly creating plots. For detailed customisation, plotting with Matplotlib directly might save time.\n", |
483 | 513 | "\n",
|
484 | 514 | "Tips:\n",
|
485 |
| - "* First install matplotlib in your virtual environment: Pandas needs access to this.\n", |
486 |
| - "* In the hills example, lets plot the number of hills in our data set.\n" |
| 515 | + "* First install Matplotlib in your virtual environment: Pandas needs access to this.\n", |
| 516 | + "* In the hills example, let's plot the number of hills in our data set.\n" |
487 | 517 | ]
|
488 | 518 | },
|
489 | 519 | {
|
|
511 | 541 | "tags": []
|
512 | 542 | },
|
513 | 543 | "source": [
|
514 |
| - "* In the hills example, lets plot the number of hills above or equal to a threshold height" |
| 544 | + "* In the hills example, let's plot the number of hills above or equal to a threshold height." |
515 | 545 | ]
|
516 | 546 | },
|
517 | 547 | {
|
|
531 | 561 | },
|
532 | 562 | {
|
533 | 563 | "cell_type": "markdown",
|
534 |
| - "metadata": {}, |
| 564 | + "metadata": { |
| 565 | + "editable": true, |
| 566 | + "slideshow": { |
| 567 | + "slide_type": "" |
| 568 | + }, |
| 569 | + "tags": [] |
| 570 | + }, |
535 | 571 | "source": [
|
536 |
| - "* In the hills example, we have some lat, lon data. Lets plot this using a scatter plot." |
| 572 | + "* In the hills example, we have some lat, lon data. Let's plot this using a scatter plot." |
537 | 573 | ]
|
538 | 574 | },
|
539 | 575 | {
|
|
553 | 589 | },
|
554 | 590 | {
|
555 | 591 | "cell_type": "markdown",
|
556 |
| - "metadata": {}, |
| 592 | + "metadata": { |
| 593 | + "editable": true, |
| 594 | + "slideshow": { |
| 595 | + "slide_type": "" |
| 596 | + }, |
| 597 | + "tags": [] |
| 598 | + }, |
557 | 599 | "source": [
|
558 |
| - "* In the hills example, lets colour the points by country." |
| 600 | + "* In the hills example, let's colour the points by country." |
559 | 601 | ]
|
560 | 602 | },
|
561 | 603 | {
|
|
575 | 617 | },
|
576 | 618 | {
|
577 | 619 | "cell_type": "markdown",
|
578 |
| - "metadata": {}, |
| 620 | + "metadata": { |
| 621 | + "editable": true, |
| 622 | + "slideshow": { |
| 623 | + "slide_type": "" |
| 624 | + }, |
| 625 | + "tags": [] |
| 626 | + }, |
579 | 627 | "source": [
|
580 |
| - "* To add a legend, things are getting sufficiently complicated that moving to a more verbose matplotlib plotting structure is helpful.\n", |
| 628 | + "* To add a legend, things are getting sufficiently complicated that moving to a more verbose Matplotlib plotting structure is helpful.\n", |
581 | 629 | "* In the hills example, we will plot in a loop."
|
582 | 630 | ]
|
583 | 631 | },
|
|
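For readers following the notebook text changed in this diff, the two hills questions it mentions (the country with the highest mean hill height, and the number of hills at or above a threshold height) can be sketched with built-in Pandas methods roughly as follows. This is only an illustration under stated assumptions: the data is a small made-up stand-in for the hills CSV, and the `Name` and `Height` column names and the threshold value are hypothetical (only `Country` appears in the notebook text).

```python
import io

import pandas as pd

# Hypothetical stand-in for the hills CSV file. Only the `Country` column
# name comes from the notebook text; `Name`, `Height` and all values are
# made up for illustration.
csv_text = """Name,Country,Height
Hill A,Scotland,1345
Hill B,Scotland,1309
Hill C,Wales,1085
Hill D,England,978
Hill E,Wales,1065
"""

# Load the data into a DataFrame (a real project would pass a file path,
# for example one inside the project's data folder).
df = pd.read_csv(io.StringIO(csv_text))

# "Which country has the highest mean hill height?"
# groupby + mean are built-in Pandas methods, so no explicit loop is needed.
mean_height = df.groupby("Country")["Height"].mean()
highest_country = mean_height.idxmax()

# Number of hills at or above a threshold height, per country.
threshold = 1000  # assumed value, in the same units as `Height`
hills_above = df[df["Height"] >= threshold].groupby("Country").size()

print(highest_country)
print(hills_above)
```

With Matplotlib installed, a result such as `hills_above` could then be passed to the Pandas plotting functions (for example `hills_above.plot(kind="bar")`) to produce a quick plot.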