Commit 2e431b6

rebuild
1 parent 8074e61 commit 2e431b6

11 files changed: +822 -796 lines changed

docs/classification-continued.html

Lines changed: 138 additions & 138 deletions
Large diffs are not rendered by default.

docs/classification.html

Lines changed: 127 additions & 127 deletions
Large diffs are not rendered by default.

docs/clustering.html

Lines changed: 49 additions & 49 deletions
Large diffs are not rendered by default.

docs/index.html

Lines changed: 4 additions & 4 deletions
@@ -437,10 +437,10 @@ <h2><span class="header-section-number">1.1</span> Chapter learning objectives</
437437
<li>use a Jupyter notebook to execute provided R code</li>
438438
<li>edit code and markdown cells in a Jupyter notebook</li>
439439
<li>create new code and markdown cells in a Jupyter notebook</li>
440-
<li>load the <code>tidyverse</code> library into R</li>
440+
<li>load the <code>tidyverse</code> package into R</li>
441441
<li>create new variables and objects in R using the assignment symbol</li>
442442
<li>use the help and documentation tools in R</li>
443-
<li>match the names of the following functions from the <code>tidyverse</code> library to their documentation descriptions:
443+
<li>match the names of the following functions from the <code>tidyverse</code> package to their documentation descriptions:
444444
<ul>
445445
<li><code>read_csv</code></li>
446446
<li><code>select</code></li>
@@ -502,8 +502,8 @@ <h2><span class="header-section-number">1.3</span> Loading a spreadsheet-like da
502502
<li>does not have row names.</li>
503503
</ul>
504504
<p>Below you’ll see the code used to load the data into R using the <code>read_csv</code> function. But there is one extra step we need to do first. Since <code>read_csv</code> is not included in the base installation of R,
505-
to be able to use it we have to load it from somewhere else: a collection of useful functions known as a <em>library</em>. The <code>read_csv</code> function in particular
506-
is in the <code>tidyverse</code> library (more on this later), which we load using the <code>library</code> function.</p>
505+
to be able to use it we have to load it from somewhere else: a collection of useful functions known as a <em>package</em>. The <code>read_csv</code> function in particular
506+
is in the <code>tidyverse</code> package (more on this later), which we load using the <code>library</code> function.</p>
507507
<p>Next, we call the <code>read_csv</code> function and pass it a single argument: the name of the file, <code>"can_lang.csv"</code>. We have to put quotes around filenames and other letters and words that we
508508
use in our code to distinguish it from the special words that make up the R programming language. This is the only argument we need to provide for this file, because our file satisfies everything else
509509
the <code>read_csv</code> function expects in the default use-case (which we just discussed). Later in the course, we’ll learn more about how to deal with more complicated files where the default arguments are not

docs/inference.html

Lines changed: 111 additions & 111 deletions
Large diffs are not rendered by default.

docs/reading.html

Lines changed: 16 additions & 15 deletions
@@ -397,11 +397,11 @@ <h2><span class="header-section-number">2.2</span> Chapter learning objectives</
397397
<li><code>skip</code></li>
398398
</ul></li>
399399
<li><p>choose the appropriate <code>tidyverse</code> <code>read_*</code> function and function arguments to load a given plain text tabular data set into R</p></li>
400-
<li><p>use <code>readxl</code> library’s <code>read_excel</code> function and arguments to load a sheet from an excel file into R</p></li>
401-
<li><p>connect to a database using the <code>DBI</code> library’s <code>dbConnect</code> function</p></li>
402-
<li><p>list the tables in a database using the <code>DBI</code> library’s <code>dbListTables</code> function</p></li>
403-
<li><p>create a reference to a database table that is queriable using the <code>tbl</code> from the <code>dbplyr</code> library</p></li>
404-
<li><p>retrieve data from a database query and bring it into R using the <code>collect</code> function from the <code>dbplyr</code> library</p></li>
400+
<li><p>use <code>readxl</code> package’s <code>read_excel</code> function and arguments to load a sheet from an excel file into R</p></li>
401+
<li><p>connect to a database using the <code>DBI</code> package’s <code>dbConnect</code> function</p></li>
402+
<li><p>list the tables in a database using the <code>DBI</code> package’s <code>dbListTables</code> function</p></li>
403+
<li><p>create a reference to a database table that is queriable using the <code>tbl</code> from the <code>dbplyr</code> package</p></li>
404+
<li><p>retrieve data from a database query and bring it into R using the <code>collect</code> function from the <code>dbplyr</code> package</p></li>
405405
<li><p>use <code>write_csv</code> to save a data frame to a <code>.csv</code> file</p></li>
406406
<li><p>(<em>optional</em>) scrape data from the web</p>
407407
<ul>
@@ -471,19 +471,19 @@ <h2><span class="header-section-number">2.4</span> Reading tabular data from a p
471471
Non-Official &amp; Non-Aboriginal languages,American Sign Language,2685,3020,1145,21930
472472
Non-Official &amp; Non-Aboriginal languages,Amharic,22465,12785,200,33670</code></pre>
473473
<p>And here is a review of how we can use <code>read_csv</code> to load it into R. First we
474-
load the <code>tidyverse</code> library to gain access to useful functions for reading the
474+
load the <code>tidyverse</code> package to gain access to useful functions for reading the
475475
data.</p>
476476
<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="reading.html#cb34-1"></a><span class="kw">library</span>(tidyverse)</span></code></pre></div>
477477
<blockquote>
478478
<p>Note: it is normal and expected that a message is printed out after
479-
loading the <code>tidyverse</code> and some libraries. Generally, this message lets you
480-
know if functions from the different libraries were loaded share the same name
479+
loading the <code>tidyverse</code> and some packages. Generally, this message lets you
480+
know if functions from the different packages were loaded share the same name
481481
(which is confusing to R), and if so, which one you can access using just its
482-
name (and which one you need to refer the library name and the function name to
482+
name (and which one you need to refer the package name and the function name to
483483
refer to it, this is called masking). Additionally, the <code>tidyverse</code> is a special
484-
R library - it is a meta-library or meta-package that bundles together several
485-
related and commonly used packages. Because of this it lists the libraries it
486-
does the job of loading. In future when we load this library in this book we
484+
R package - it is a meta-package that bundles together several
485+
related and commonly used packages. Because of this it lists the packages it
486+
does the job of loading. In future when we load this package in this book we
487487
will silence these messages to help with readability of the book.</p>
488488
</blockquote>
489489
<p>Next we use <code>read_csv</code> to load the data into R, and in that call we specify the
@@ -769,7 +769,7 @@ <h4><span class="header-section-number">2.6.1.1</span> Reading data from a SQLit
769769
<p>Although it looks like we just got a data frame from the database, we didn’t! It’s a <em>reference</em>, showing us data that is still in the SQLite database (note the first two lines of the output).
770770
It does this because databases are often more efficient at selecting, filtering and joining large data sets than R. And typically, the database will not even be
771771
stored on your computer, but rather a more powerful machine somewhere on the web. So R is lazy and waits to bring this data into memory until you explicitly tell
772-
it to do so using the <code>collect</code> function from the <code>dbplyr</code> library.</p>
772+
it to do so using the <code>collect</code> function from the <code>dbplyr</code> package.</p>
773773
<p>Here we will filter for only rows in the Aboriginal languages category according to the 2016 Canada Census, and then use <code>collect</code> to finally bring this data into R as a data frame.</p>
774774
<div class="sourceCode" id="cb59"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb59-1"><a href="reading.html#cb59-1"></a>aboriginal_lang_db &lt;-<span class="st"> </span><span class="kw">filter</span>(lang_db, category <span class="op">==</span><span class="st"> &quot;Aboriginal languages&quot;</span>)</span>
775775
<span id="cb59-2"><a href="reading.html#cb59-2"></a>aboriginal_lang_db</span></code></pre></div>
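The lazy-reference behaviour described in this hunk can be sketched with a throwaway in-memory SQLite database. This is a minimal illustration rather than the chapter's own code: the table name `lang` and its two rows are made up, and it assumes the `DBI`, `RSQLite`, `dplyr` and `dbplyr` packages are installed.

```r
library(DBI)
library(dplyr)  # tbl(), filter(), collect(); the database backend needs dbplyr installed

# Assumption: an in-memory SQLite database stands in for the real one.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up rows for illustration only (not the Canadian census data).
dbWriteTable(con, "lang", data.frame(
  category = c("Aboriginal languages", "Official languages"),
  language = c("Language A", "Language B"),
  stringsAsFactors = FALSE
))

# A reference to the table: no rows are pulled into R yet.
lang_db <- tbl(con, "lang")

# Still lazy: this only builds up a query against the database.
aboriginal_lang_db <- filter(lang_db, category == "Aboriginal languages")

# collect() runs the query and returns the result as a data frame in R.
aboriginal_lang_data <- collect(aboriginal_lang_db)

dbDisconnect(con)
```

Calling `class(aboriginal_lang_db)` before `collect` and `class(aboriginal_lang_data)` after it is one way to see the difference between the reference and the in-memory data frame.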
@@ -831,7 +831,7 @@ <h4><span class="header-section-number">2.6.1.2</span> Reading data from a Postg
831831
<li><code>user</code> - the username for accessing the database</li>
832832
<li><code>password</code> - the password for accessing the database</li>
833833
</ul>
834-
<p>Additionally, we must use the <code>RPostgres</code> library instead of <code>RSQLite</code> in the <code>dbConnect</code> function call.
834+
<p>Additionally, we must use the <code>RPostgres</code> package instead of <code>RSQLite</code> in the <code>dbConnect</code> function call.
835835
Below we demonstrate how to connect to a version of the <code>can_mov_db</code> database, which contains information about Canadian movies (<em>note - this is a synthetic, or artificial, database</em>).</p>
836836
<pre><code>library(RPostgres)
837837
can_mov_db_con &lt;- dbConnect(RPostgres::Postgres(), dbname = &quot;can_mov_db&quot;,
@@ -905,7 +905,7 @@ <h3><span class="header-section-number">2.6.2</span> Interacting with a database
905905
<div id="writing-data-from-r-to-a-.csv-file" class="section level2">
906906
<h2><span class="header-section-number">2.7</span> Writing data from R to a <code>.csv</code> file</h2>
907907
<p>At the middle and end of a data analysis, we often want to write a data frame that has changed (either through filtering, selecting, mutating or summarizing) to a file
908-
to share it with others or use it for another step in the analysis. The most straightforward way to do this is to use the <code>write_csv</code> function from the <code>tidyverse</code> library.
908+
to share it with others or use it for another step in the analysis. The most straightforward way to do this is to use the <code>write_csv</code> function from the <code>tidyverse</code> package.
909909
The default arguments for this function are to use a comma (<code>,</code>) as the delimiter and include column names. Below we demonstrate creating a new version of the Canadian languages data set without the official languages category according to the Canadian 2016 Census, and then writing this to a <code>.csv</code> file:</p>
910910
<pre><code>no_official_lang_data &lt;- filter(can_lang, category != &quot;Official languages&quot;)
911911
write_csv(no_official_lang_data, &quot;data/no_official_languages.csv&quot;)</code></pre>
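The defaults mentioned in this hunk (comma delimiter, column names included) can be checked with a small round trip. The sketch below is only an illustration under stated assumptions: the data frame `toy_lang` and its values are made up, and the file goes to a temporary path rather than `data/`.

```r
library(readr)  # provides write_csv() and read_csv(); also loaded by library(tidyverse)

# Made-up data for illustration only.
toy_lang <- data.frame(
  language = c("A", "B"),
  speakers = c(10, 20),
  stringsAsFactors = FALSE
)

# Assumption: write to a temporary file so nothing in the project is touched.
path <- tempfile(fileext = ".csv")
write_csv(toy_lang, path)

# Inspect the raw text: the first line should be the comma-separated column names.
raw_lines <- readLines(path)

# Read the file back with the default read_csv() arguments.
toy_lang_again <- read_csv(path, show_col_types = FALSE)
```

Printing `raw_lines` shows the header row followed by one line per data row, which is the layout `read_csv` expects by default.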
@@ -991,6 +991,7 @@ <h3><span class="header-section-number">2.8.3</span> Using <code>rvest</code></h
991991
<p>Next, we tell R what page we want to scrape by providing the webpage’s URL in quotations to the function <code>read_html</code>:</p>
992992
<div class="sourceCode" id="cb83"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb83-1"><a href="reading.html#cb83-1"></a>page &lt;-<span class="st"> </span><span class="kw">read_html</span>(<span class="st">&quot;https://en.wikipedia.org/wiki/Canada&quot;</span>)</span></code></pre></div>
993993
<p>Then we send the page object to the <code>html_nodes</code> function. We also provide that function with the CSS selectors we obtained from the selectorgadget tool. These should be surrounded by quotation marks. The <code>html_nodes</code> function selects nodes from the HTML document using CSS selectors. Nodes are the HTML tag pairs as well as the content between the tags. For our CSS selector <code>td:nth-child(5)</code>, an example node that would be selected is: <code>&lt;td style="text-align:left;background:#f0f0f0;"&gt;&lt;a href="/wiki/London,_Ontario" title="London, Ontario"&gt;London&lt;/a&gt;&lt;/td&gt;</code></p>
994+
<p>We will use <code>head()</code> here to limit the print output of these vectors to 6 lines.</p>
994995
<div class="sourceCode" id="cb84"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb84-1"><a href="reading.html#cb84-1"></a>population_nodes &lt;-<span class="st"> </span><span class="kw">html_nodes</span>(page, <span class="st">&quot;td:nth-child(5) , td:nth-child(7) , .infobox:nth-child(122) td:nth-child(1) , .infobox td:nth-child(3)&quot;</span>)</span>
995996
<span id="cb84-2"><a href="reading.html#cb84-2"></a><span class="kw">head</span>(population_nodes)</span></code></pre></div>
996997
<pre><code>## {xml_nodeset (6)}
