libjohn
diff --git a/01_scrape_case-study_exercise.Rmd b/01_scrape_case-study_exercise.Rmd
Lines changed: 33 additions & 0 deletions
diff --git a/delme_DESCRIPTION_old b/delme_DESCRIPTION_old
Lines changed: 0 additions & 28 deletions
@@ -458,10 +458,43 @@ children_name <- emanuel %>%
   html_text()
 children_name
 ```
+
 #### Iterate
 
 There now. I just scraped and parsed data for one target, one person in my list of target URLs.  Now use purrr to iterate over each target URL in the list. **Do not forget to pause, `Sys.sleep(2)`,** between each iteration of the `read_html()` function.
 
+
+## Refined Code
+
+O.K., so I didn't really explain iteration with the {purrr} package and the `map()` family of functions. If you want to learn more about that, check out this [workshop on iteration](https://github.com/libjohn/workshop_rfun_iterate). In the meantime, note that as of {rvest} 1.0.0, `html_nodes()` has been superseded by `html_elements()`; the old name still works, but new code should use the new one. Anyway, if you want to see how this scraping operation can be done with less code, starting from the last manipulation of the `nav_df` tibble, here's some updated and refined code.
+
+```{r}
+get_name_html <- function(url) {
+  Sys.sleep(2)  # pause two seconds before every request
+  url |>
+    read_html()  # fetch and parse the page
+}
+
+name_urls_df <- nav_df |>
+  slice(1:3) |>  # keep only the first three rows of nav_df
+  mutate(html_results = map(url, get_name_html)) |>  # one parsed page per row
+  mutate(name_url = map(html_results, ~ .x |> html_elements("#setwidth li a") |> html_attr("href"))) |> 
+  mutate(name = map(html_results, ~ .x |> html_elements("#setwidth li a") |> html_text())) |> 
+  unnest(cols = c(name_url, name)) |> 
+  mutate(name_url = str_replace(name_url, "\\.\\.", "ecartico")) |> 
+  mutate(name_full_url = str_glue("http://www.vondel.humanities.uva.nl/{name_url}"))
+name_urls_df
+```
+
+```{r}
+name_urls_df |>
+  filter(str_detect(name, regex("Boudewijn ", ignore_case = TRUE))) |>  # case-insensitive match on the name
+  mutate(children_names_html = map(name_full_url, get_name_html)) |>  # fetch each matching person's page
+  mutate(children_names = map(children_names_html, ~ .x |> html_elements("ul~ h2+ ul li > a") |> html_text())) |> 
+  unnest(children_names)
+```
+
+
 ## Resources
 
 - https://rvest.tidyverse.org