10 changes: 6 additions & 4 deletions _sources/DataEthics/introduction.rst
Original file line number Diff line number Diff line change
@@ -24,12 +24,14 @@ merging data from multiple sources. They should ask questions such as:
- Will the ethics of the project stay the same after the share and/or merge?
- Does the share and/or merger reveal information that could be harmful to the privacy of people involved in the data?

Learning Goals
--------------
Learning Goals and Objectives
-----------------------------
Learning goals:

- Learn about ethical practices when using data.

Learning Objectives
-------------------
Learning objectives:

- Learn to identify data that are free to use.
- Learn to check if screen scraping is legal on different websites.
- Understand the ethical wrongs of misrepresenting data.
9 changes: 5 additions & 4 deletions _sources/Instacart/introduction.rst
Original file line number Diff line number Diff line change
@@ -8,15 +8,16 @@ to purchase. We will also explore tools such as recommender systems, which are a
that predict consumer’s preferences. We will learn how to use Python libraries to find common purchasing
combinations and consumer’s purchasing preferences in a large data set.

Learning Goals
---------------
Learning Goals and Objectives
-----------------------------

Learning goals:

- Look for common patterns in a large data set
- Analyze data and determine if it is sparse
- Visualize associations between different items in a large data set

Learning Objectives
---------------------
Learning objectives:

- Find relationships between a particular group of items in an extensive data set using the Market basket analysis technique
- Construct, analyze, and retrieve information from an item-item matrix
2 changes: 1 addition & 1 deletion _sources/Instacart/market_basket.rst
Original file line number Diff line number Diff line change
@@ -1200,7 +1200,7 @@ used in industry.
From 0 to 200 (on the x-axis) there is one bar that goes to 5,000 and from 200 to
400 the bar goes up to less than 200.

Experimenting with Item-Item Recommendations
Experimenting With Item-Item Recommendations
--------------------------------------------

- The histogram above shows that the vast majority of the items are in the
28 changes: 21 additions & 7 deletions _sources/Introduction/introduction.rst
Original file line number Diff line number Diff line change
@@ -11,17 +11,31 @@ It will explore the history and current state of the discipline, explaining
how data science began and where it will be going in the future. We will also
explore how data science leverages data analysis and data visualization.

Learning Goals
--------------
Learning Goals and Objectives
-----------------------------

Goals and objectives refer to the desired outcomes that an entity, person or business,
wants to achieve. However, there is a significant difference between them.

A **goal** is a broad, often non-measurable, and intangible statement focusing on the desired
outcome. A goal seeks to achieve the larger, conceptual mission of the individual or the
business, and it does not define methods for attaining that intended outcome. Generally,
goals span over a long time frame.

An **objective**, on the other hand, is a specific, actionable, and tangible target that can be
achieved within a shorter specified time frame. While goals target the larger mission,
objectives are involved in achieving the goals.

Learning goals:

- Understand the importance of data collection and its implementation.
- Gain awareness of how broad data collection is in all subjects.
- Gain an introduction to what a Data Scientist does.
- Understand and know the purpose of all the steps in the **Data Science Pipeline**.
- Understand what it takes to gain and better skills: **Learning Zone** vs **Performance Zone**.
- Learn the difference between Data Science and Data Analytics.

Learning Objectives
---------------------
Learning objectives:

- Be able to identify different steps along different data science pipelines, recognizing all the previous and future steps.
- Be able to identify if you are in a performance zone or learning zone and transition between them as necessary.
@@ -48,7 +62,7 @@ you: what you buy, what you read, where you eat, where you stay, how and when
you travel, and so much more. By 2025, it is estimated that 463 exabytes of data will be created each day globally, and the entire digital universe is expected to reach 44 zettabytes by 2020. `This would mean there would be 40 times more bytes than there are stars in the observable universe. <https://www.visualcapitalist.com/how-much-data-is-generated-each-day/>`_


What does it all mean?
What Does It All Mean?
----------------------

Often, this data is collected and stored with little idea about how to use it,
@@ -79,7 +93,7 @@ efficient and comfortable footwear
`understanding this data <https://www.tekscan.com/product-group/medical/in-shoe>`_.


What does a data scientist do?
What Does a Data Scientist Do?
------------------------------

.. youtube:: 0tuEEnL61HM
@@ -284,7 +298,7 @@ ability to join these solutions together to solve increasingly challenging
problems with real-world applications.


Datasets in this Book
Datasets in This Book
---------------------

Every chapter in this book uses data. The data that we use is real world data
4 changes: 2 additions & 2 deletions _sources/MovieData/indexing.rst
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
http://creativecommons.org/licenses/by-sa/4.0/.


Numbers as Indices
Numbers As Indices
==================

Enough about movie budgets, it's time to budget my time instead. Because I
@@ -167,4 +167,4 @@ But what is the 155th shortest movie in this collection?
:option_2: Within reach if I try my hardest
:option_3: Out of reach no matter how hard I try

For me to master the things taught in this lesson feels...
For me to master the things taught in this lesson feels...
11 changes: 7 additions & 4 deletions _sources/MovieData/introduction.rst
Original file line number Diff line number Diff line change
@@ -11,13 +11,16 @@ and Excel are great ways to visualize and manipulate data, they are not as versa
as we might want. That is where **data frames** come in. In this chapter, you will be introduced
to data frames and learn to use them to obtain information from data.

Learning Goals
---------------
Learning Goals and Objectives
-----------------------------

Learning goals:

- Learn how to create and manipulate a ``DataFrame``.
- Learn how to use data from multiple ``DataFrames``.

Learning Objectives
--------------------
Learning objectives:

- Be able to use Jupyter Notebooks and Pandas.
- Be able to import data into a ``DataFrame``.
- Be able to manipulate ``DataFrames`` to gain specific information.
4 changes: 2 additions & 2 deletions _sources/MovieData/multiple_df.rst
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
http://creativecommons.org/licenses/by-sa/4.0/.


Dealing with Multiple DataFrames
Dealing With Multiple DataFrames
================================

Forget about budget or runtimes as criteria for selecting a movie, let's take a
@@ -76,7 +76,7 @@ difference between the ``vote_average`` and ``my_vote`` and divide it by

.. fillintheblank:: mov_star_wars_difference

What's the percentage difference between the popular rating for Star Wars and my vote
What's the percentage difference between the popular rating for Star Wars and my vote
for it? |blank|

- :-10: Is the correct answer
2 changes: 1 addition & 1 deletion _sources/MovieData/toctree.rst
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
http://creativecommons.org/licenses/by-sa/4.0/.


Learning Pandas with Movie Data
Learning Pandas With Movie Data
===============================

.. toctree::
4 changes: 2 additions & 2 deletions _sources/PredictiveAnalytics/bike_data_starter.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Getting Started with the Bike Data
Getting Started With the Bike Data
==================================

In this Lesson, we will be hands on and try out SQL with the Capital
@@ -7,7 +7,7 @@ bike sharing dataset, hosted on a SQLLite database. You don't have to do anythin



Verify access to the dataset
Verify Access to The Dataset
----------------------------

Let’s verify that you have access to the dataset by running a simple SQL
11 changes: 7 additions & 4 deletions _sources/PredictiveAnalytics/introduction.rst
Original file line number Diff line number Diff line change
@@ -12,14 +12,17 @@ common ways of storing data is in a database. In this chapter, we will use SQLli
send queries to different databases, and import data from those databases into a pandas
DataFrame. Then we will use the data to model different situations and predict outcomes.

Learning Goals
--------------
Learning Goals and Objectives
-----------------------------

Learning goals:

- Manipulate data from a database using Structured Query Language.
- Use linear regression to model the relationship between predicted data and actual data.
- Create models that we can use to predict outcomes.

Learning Objectives
-------------------
Learning objectives:

- Import a SQL database into a Pandas DataFrame.
- Retrieve, sort, and aggregate data from a database.
- Join and extract data from multiple databases.
2 changes: 1 addition & 1 deletion _sources/PredictiveAnalytics/introduction_to_SQL.rst
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/4.0/.

Exploring Bike Rental Data with SQL
Exploring Bike Rental Data With SQL
===================================

.. figure:: https://imgs.xkcd.com/comics/exploits_of_a_mom.png
8 changes: 4 additions & 4 deletions _sources/PredictiveAnalytics/predicting_rentals.rst
Original file line number Diff line number Diff line change
@@ -208,7 +208,7 @@ Compare your graph to this one after you have made it.
:modaltitle: Predicted Versus Actual Daily Rentals V1

.. image:: Figures/regression_compare_1.png
:alt: Linear Regression model with ride_count as the y axis and daynum as the x axis.
:alt: Linear Regression model with ride_count as the y axis and daynum as the x axis.


What do you think of the model so far? You are probably a bit disappointed, both
@@ -237,7 +237,7 @@ look at the time series of daily rentals.


.. figure:: Figures/year_one_ts.png
:alt: Line graph of bike rentals with duration (0 to 6,000) as the y axis and start_date (by months of first year) as the x axis.
:alt: Line graph of bike rentals with duration (0 to 6,000) as the y axis and start_date (by months of first year) as the x axis.


The representation of the date we chose is simple, but you know from
@@ -357,7 +357,7 @@ matches this one.
.. reveal:: modelv25_viz

.. image:: Figures/modelv25_compare.png
:alt: Scatter plot with y-axis set as actual (shown in blue) and preds (shown in red), and x-axis as the number of days.
:alt: Scatter plot with y-axis set as actual (shown in blue) and preds (shown in red), and x-axis as the number of days.


Version 3.0
@@ -509,7 +509,7 @@ One really common method for transforming the data is to use min-max scaling.
This will ensure that all of your values are between 0 and 1.
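
As a rough illustration of min-max scaling, the sketch below rescales a list so that its smallest value maps to 0 and its largest to 1. The helper name ``min_max_scale`` and the sample ``ride_counts`` values are hypothetical and chosen only for this example; they do not come from the bike data set. Returning zeros when every value is identical is one possible convention, not the only one.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # No spread to scale; returning zeros is an assumption made for this sketch.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical daily ride counts, for illustration only.
ride_counts = [120, 300, 480, 840]
scaled = min_max_scale(ride_counts)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Each scaled value is the original value's position between the minimum and maximum, so the ordering of the data is preserved while the range becomes 0 to 1.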


Where to go from here?
Where To Go From Here?
----------------------

In the introduction to this textbook, we showed you this diagram. Take a look at
4 changes: 2 additions & 2 deletions _sources/PredictiveAnalytics/time_series_bikes.rst
Original file line number Diff line number Diff line change
@@ -218,7 +218,7 @@ of the week with just the business days.
d+b


Indexing with a DatetimeIndex
Indexing With a DatetimeIndex
-----------------------------

Using a timestamp as an index gives you some additional power. For example, you
@@ -260,7 +260,7 @@ common models to start with for making predictions is "linear regression". But
first, let's take a break for some pizza.


Working with ZIP Files (Optional)
Working With ZIP Files (Optional)
---------------------------------

In many cases, large data files are available in compressed format. Usually, this
11 changes: 7 additions & 4 deletions _sources/PythonReview/introduction.rst
Original file line number Diff line number Diff line change
@@ -17,13 +17,16 @@ languages in the data science field. You will also set up the tools you will be
using throughout the entire course including a type of scientific notebook that
allows for a mix of text and code.

Learning Goals
--------------
Learning Goals and Objectives
-----------------------------

Learning goals:

- Review the fundamental constructs of programming in Python.
- Learn to use a type of programming notebook that mixes text and code.

Learning Objectives
-------------------
Learning objectives:

- Recall the fundamentals of programming in Python.
- Learn to use the Markdown language.
- Learn to set up a Jupyter Notebook or a Google Colaboratory Notebook.
10 changes: 6 additions & 4 deletions _sources/Solver/introduction.rst
Original file line number Diff line number Diff line change
@@ -26,13 +26,15 @@ We will use a tool for linear programming called Solver in Google Sheets. Solver
can also be used with Microsoft Excel and many other spreadsheet programs.


Learning Goals
--------------
Learning Goals and Objectives
-----------------------------

Learning goals:

- Explore the ideas and techniques of **optimization**.
- Learn how to optimize an **objective function** under specific **constraints**.

Learning Objectives
-------------------
Learning objectives:

- Be able to recognize an **objective function** and any **constraints** in a specific problem.
- Learn to apply optimization concepts to maximize or minimize an **objective function** or to set it to a specified value.
4 changes: 2 additions & 2 deletions _sources/Statistics/cs1_more_happiness.rst
Original file line number Diff line number Diff line change
@@ -131,8 +131,8 @@ mean by dividing our two columns.
- :Taiwan.*: Is the correct answer
:x: Keep checking


Joining Data from Other Sources
S
Joining Data From Other Sources
-------------------------------

So far, we have limited our analysis to the data provided for us in the original
2 changes: 1 addition & 1 deletion _sources/Statistics/cs1_yearly_happiness.rst
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@

.. _CSHappinessComparingYears:

Case Study 1: Comparing Happiness Data across Years
Case Study 1: Comparing Happiness Data Across Years
===================================================

We have two files of happiness data, one for 2017 which you have been using, and
10 changes: 6 additions & 4 deletions _sources/Statistics/introduction.rst
Original file line number Diff line number Diff line change
@@ -22,17 +22,19 @@ So, if you are familiar with Microsoft Excel, you will find Google Sheets very e
Lastly, we will discuss the difference between **correlation** and **causation** and explore why correlation does not imply causation. Understanding
this concept is crucially important for making correct assumptions and decisions when analyzing data.

Learning Goals
--------------
Learning Goals and Objectives
-----------------------------
Learning goals:

- Explore the concepts of descriptive statistics and data visualization.
- Distinguish between descriptive and inferential statistics.
- Learn to apply the various measures of central tendency and the measures of variability.
- Become familiar with several descriptive statistics and data visualization spreadsheet operations.
- Addressing cells: relative versus absolute, on the same sheet versus across sheets.
- Use a spreadsheet to explore data.

Learning Objectives
-------------------
Learning objectives:

- Become adept at importing, organizing, and visualizing data using Google Sheets.
- Extrapolate measures of central tendencies and measures of variability of a given data set.
- Combine different datasets and use them to extract new data.
4 changes: 2 additions & 2 deletions _sources/TextualAnalysis/graph_relations.rst
Original file line number Diff line number Diff line change
@@ -397,7 +397,7 @@ code.
:x: This is the same as before, but with more years


Visualizing the Relationships with a Heatmap
Visualizing the Relationships With a Heatmap
--------------------------------------------

We will now look at a way to get a better visual representation of the table we
@@ -499,7 +499,7 @@ very satisfying part of programming and data analysis! You have to enjoy your
victories while you can.


Visualizing the Relationships with a Graph
Visualizing the Relationships With a Graph
------------------------------------------

The good news is that we have already done most of the hard work in the last