Commit 2cae851

v0.5 Release.
Merge pull request #20 from dev branch. Separate append and summarize workflows into 2 scripts.
2 parents d16bbe0 + 2580745 commit 2cae851

16 files changed, with 992 additions and 512 deletions.

CHANGELOG.md

Lines changed: 11 additions & 2 deletions
@@ -5,13 +5,22 @@
 
 ### Unreleased <To become the release notes for the next version>
 [//]: # (tagging git releases https://stackoverflow.com/questions/18216991/create-a-tag-in-a-github-repository)
-* Create a full summary statistics Excel file from scratch.
+* *No future enhancements are planned. Successful and consistent execution of `v0.5` will yield the release of `v1.0`.*
+
+## 0.5
+**5 Jan 2024**
+* Separates out the [append](append.py) and [summarize](summarize.py) workflows into two separate scripts
+that produce two separate output files in the output directory.
+This distinguishes the distinct workflows and nullifies multi-tab output summary XLSX files.
+* Adds a `full_append_and_summary_run()` function in the [summarize](summarize.py) script
+to create a full summary statistics Excel file from scratch.
+* Enhance and fix bugs in the [2024 paper data collection process](late2022_datapulls.py).
+* Upgrade the [virtual environment creation process](py_venv).
 
 ## 0.4
 **8 Nov 2023**
 * Add ability to mathematically summarize pulled data.
 * Adds a `dma_id_name_converter()` function to convert DMA ID ~ Name data.
-* Move `metadata.xlsx` out of this repository and into the Research Team sharepoint.
 * Sort testing and extra files into the `./extras` subdirectory.
 
 ## 0.3

README.md

Lines changed: 15 additions & 5 deletions
@@ -1,7 +1,7 @@
 # Google Trends of Child Care
 
 [//]: # (Embedding badges: https://naereen.github.io/badges/)
-![Generic badge](https://img.shields.io/badge/version-0.0.4-blue.svg)
+![Generic badge](https://img.shields.io/badge/version-0.5.0-blue.svg)
 <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
 
 This is a repository that utilizes the [unofficial python API for Google Trends](https://github.com/GeneralMills/pytrends)
@@ -41,17 +41,25 @@ This file allows for easy and custom pulls of Google Trends data for different t
 * **`extract_data_try`** - Extract any Google Trends data you want: any time, any place.
 Returns a pandas DataFrame.
 * Note: You MUST provide a payload into this function.
+
 ### [`store_data.py`](store_data.py)
 This file allows for easy storage of already pulled Google Trends data.
 #### Key functions:
 * **`store_data`** - Store trends data in a particular location with specific location and time naming parameters.
 Ensure the output file name includes metadata on the time period of the data, date of data pull, and the dataset's name.
 Data can be stored as `.xlsx` or `.csv` (default).
+
+### [`append.py`](append.py)
+This file produces a single dataset that contains all the previously pulled Google Trends data records per area and time of interest unit.
+#### Key functions:
+* **`append_raw_data_from_files`** - Appends a list of Google trends XLSX datasets into a compiled all-data XLSX
+in preparation for the calculation of summary statistics.
+* **`append_all_raw_files`** - Wrapper for `append_raw_data_from_files()` that appends all Google trends XLSX datasets
+in a passed directory into a compiled all-data XLSX in preparation for the calculation of summary statistics.
+
 ### [`summarize.py`](summarize.py)
 This file, using `trend_calculations.py` formulas, produces summary files that calculate statistics from previously pulled Google Trends data.
 #### Key functions:
-* **`append_raw_files_from_list`** - Appends a list of trends XLSX datasets into the summary XLSX
-in preparation for the calculation of summary statistics.
 * **`calc_sumstats`** - Calculates the following statistics for the already-stored xlsx GTrends data:
 * Average Interest Score (GTIS)
 * Std Dev
@@ -63,13 +71,15 @@ in preparation for the calculation of summary statistics.
 * Rebased GTIS
 * **`summarize_collected_data`** - Wrapper for `calc_sumstats()` that summarizes appended data
 and writes the results to the summary XLSX.
+* **`summarize_collected_data`** - Wrapper for `summarize_collected_data()` that summarizes all already appended data
+in a passed directory and writes the results to the summary XLSX.
 
 ### [`late2022_datapulls.py`](late2022_datapulls.py)
 This file pulls specific pieces of Google Trends ECE data for 23 AOIs in preparation for a publication on this method.
 #### Key functions:
-* **`append_raw_files_from_list`** - Appends a list of trends XLSX datasets into the summary XLSX
-in preparation for the calculation of summary statistics.
+* **`full_gtrends_pull`** - Runs only the data pull (without summarization) for the 23 specified datasets.
 * **`full_run_gtrends`** - Runs the data pull and summarization for the 23 specified datasets.
+
 #### Other helping files:
 * [`datapulls22.bat`](datapulls22.bat) - Windows Batch file wrapper for executing [`late2022_datapulls.py`](late2022_datapulls.py).
 * [`schedule_gtrends_daily_run.bat`](schedule_gtrends_daily_run.bat) - Schedules a daily execution of [`datapulls22.bat`](datapulls22.bat) for automatic data pulls on Windos OS.
