|
8 | 8 | ] |
9 | 9 | }, |
10 | 10 | { |
| 11 | + "attachments": {}, |
11 | 12 | "cell_type": "markdown", |
12 | 13 | "metadata": {}, |
13 | 14 | "source": [ |
14 | | - "Before running this code, make sure you have downloaded the data CSV file from https://www.kaggle.com/austinreese/craigslist-carstrucks-data.\n", |
| 15 | + "Before running this code, make sure you've downloaded the data CSV file from https://www.kaggle.com/austinreese/craigslist-carstrucks-data.\n", |
15 | 16 | "\n", |
16 | 17 | "You may have to create a Kaggle account to download the data.\n", |
17 | 18 | "\n", |
18 | | - "After downloading it, extract the zip file and make sure you have a file named `vehicles.csv` in the current directory" |
| 19 | + "After downloading it, extract the ZIP file and make sure you have a file named `vehicles.csv` in the current directory." |
19 | 20 | ] |
20 | 21 | }, |
21 | 22 | { |
|
271 | 272 | ] |
272 | 273 | }, |
273 | 274 | { |
| 275 | + "attachments": {}, |
274 | 276 | "cell_type": "markdown", |
275 | 277 | "metadata": {}, |
276 | 278 | "source": [ |
277 | | - "It's difficult to build a model that generalizes to a wide range of types of car. So, you'll consider the only the samples with more usual features:" |
| 279 | + "It's difficult to build a model that generalizes to a wide range of types of car. So, you'll consider only the samples with more usual features:" |
278 | 280 | ] |
279 | 281 | }, |
280 | 282 | { |
|
310 | 312 | ] |
311 | 313 | }, |
312 | 314 | { |
| 315 | + "attachments": {}, |
313 | 316 | "cell_type": "markdown", |
314 | 317 | "metadata": {}, |
315 | 318 | "source": [ |
316 | | - "Another problem is that the price is set to zero for some rows. Also, there are rows with very high prices, some of these due to input errors. Here, you'll just filter out these rows, considering just the ones in which 0<price<40000:" |
| 319 | + "Another problem is that the price is set to zero for some rows. Also, there are rows with very high prices, some of these due to input errors. Here, you'll just filter out these rows, considering just the ones in which 0 < price < 40000:" |
317 | 320 | ] |
318 | 321 | }, |
319 | 322 | { |
|
347 | 350 | ] |
348 | 351 | }, |
349 | 352 | { |
| 353 | + "attachments": {}, |
350 | 354 | "cell_type": "markdown", |
351 | 355 | "metadata": {}, |
352 | 356 | "source": [ |
353 | | - "The `odometer` columns also includes several rows with very high values. For this model, you'll consider only rows with `odometer`<100000" |
| 357 | + "The `odometer` column also includes several rows with very high values. For this model, you'll consider only rows with `odometer` < 100000:" |
354 | 358 | ] |
355 | 359 | }, |
356 | 360 | { |
|
462 | 466 | ] |
463 | 467 | }, |
464 | 468 | { |
| 469 | + "attachments": {}, |
465 | 470 | "cell_type": "markdown", |
466 | 471 | "metadata": {}, |
467 | 472 | "source": [ |
468 | | - "As you can notice, `year` and `odometer` are set as `float64` columns, which is not ideal. The data type of these columns can be converted to `int` with the following:" |
| 473 | + "As you can see, `year` and `odometer` are set as `float64` columns, which isn't ideal. The data type of these columns can be converted to `int` with the following:" |
469 | 474 | ] |
470 | 475 | }, |
471 | 476 | { |
|
479 | 484 | ] |
480 | 485 | }, |
481 | 486 | { |
| 487 | + "attachments": {}, |
482 | 488 | "cell_type": "markdown", |
483 | 489 | "metadata": {}, |
484 | 490 | "source": [ |
485 | | - "Finally, save the filtered dataset to a csv:" |
| 491 | + "Finally, save the filtered dataset to a CSV file:" |
486 | 492 | ] |
487 | 493 | }, |
488 | 494 | { |
|
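The code cells that these markdown cells introduce are not part of this diff. As a rough illustration only, the price filter, odometer filter, and integer conversion described above could look like the following pandas sketch. The function name `filter_vehicles`, the sample data, and the choice to drop rows with a missing `year` or `odometer` are assumptions, not taken from the notebook. The "more usual features" filter is left out because its criteria are not shown here.

```python
import pandas as pd


def filter_vehicles(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the price and odometer filters, then cast to int."""
    # Keep only rows with 0 < price < 40000 (drops zero and extreme prices).
    df = df[(df["price"] > 0) & (df["price"] < 40000)]
    # Keep only rows with odometer < 100000.
    df = df[df["odometer"] < 100000]
    # Assumption: drop rows with missing values so the int cast cannot fail.
    df = df.dropna(subset=["year", "odometer"])
    # Convert the float64 columns to integers.
    return df.astype({"year": int, "odometer": int})


# Tiny made-up sample standing in for vehicles.csv.
sample = pd.DataFrame({
    "price": [0, 15000, 50000, 8000, 12000],
    "year": [2010.0, 2015.0, 2018.0, 2012.0, None],
    "odometer": [50000.0, 42000.0, 10000.0, 150000.0, 30000.0],
})
filtered = filter_vehicles(sample)
```

With real data, the input would come from `pd.read_csv("vehicles.csv")`, and the result could be written out with `filtered.to_csv(..., index=False)` under whatever output file name the notebook uses.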