Commit b726d80

Merge branch 'main' into wikipedia
2 parents 43540e8 + 9aa5a8f commit b726d80

15 files changed (+4476, -2250 lines)

Pipfile

Lines changed: 3 additions & 2 deletions
@@ -8,18 +8,19 @@ flickrapi = "*"
 GitPython = "*"
 google-api-python-client = "*"
 h11 = ">=0.16.0" # Ensure dependency is secure
-internetarchive = "*"
+internetarchive = ">=5.5.1"
 jupyterlab = ">=3.6.7"
 matplotlib = "*"
 numpy = "*"
 pandas = "*"
 plotly = "*"
+pillow = ">=11.3.0" # Ensure dependency is secure
 Pyarrow = "*"
 Pygments = "*"
 python-dotenv = "*"
 requests = ">=2.31.0"
 seaborn = "*"
-urllib3 = ">=1.26.18"
+urllib3 = ">=2.5.0"
 wordcloud = "*"
 
 [dev-packages]
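
The Pipfile hunk above raises three version floors (`internetarchive`, `pillow`, `urllib3`). As a rough illustration of what a `>=` floor means, the following Python sketch compares dotted numeric versions. `MINIMUMS`, `version_tuple`, and `meets_minimum` are hypothetical names that are not part of the repository, and the comparison assumes purely numeric versions; it is not a full PEP 440 implementation, which pipenv itself handles.

```python
# Minimum versions copied from the added lines of the Pipfile diff.
MINIMUMS = {
    "internetarchive": "5.5.1",
    "pillow": "11.3.0",
    "urllib3": "2.5.0",
}


def version_tuple(version: str) -> tuple:
    """Split a dotted numeric version such as '2.5.0' into (2, 5, 0)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`.

    Assumes both strings contain only dot-separated integers.
    Shorter versions are padded with zeros so '11.3' equals '11.3.0'.
    """
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For instance, `meets_minimum("1.26.18", MINIMUMS["urllib3"])` evaluates to `False`, which is why an environment pinned to the previous floor would need to be re-locked after this change.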

Pipfile.lock

Lines changed: 1563 additions & 1285 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 36 additions & 5 deletions
@@ -9,6 +9,23 @@ This project seeks to quantify the size and diversity of the commons--the
 collection of works that are openly licensed or in the public domain.
 
 
+### Meaningful
+
+The reports generated by this project (and the data fetched and processed to
+support it) seeks to be meaningful. We hope this project will provide data and
+analysis that helps inform discussions about the commons--the collection of
+works that are openly licensed or in the public domain.
+
+The goal of this project is to help answer questions like:
+- How has the world's use of the commons changed over time?
+- How is the knowledge and culture of the commons distributed?
+- Who has access (and how much) to the commons?
+- What significant trends can be observed in the commons?
+- Which public domain dedication or licenses are the most popular?
+- What are the correlations between public domain dedication or licenses and
+region, language, domain/endeavor, etc.?
+
+
 ## Code of conduct
 
 [`CODE_OF_CONDUCT.md`][org-coc]:
@@ -57,6 +74,7 @@ Quantifying/
 │ ├── 1-fetch/
 │ ├── 2-process/
 │ ├── 3-report/
+│ ├── plot.py # Data visualizations with matplotlib
 │ └── shared.py
 ├── .cc-metadata.yml
 ├── .flake8 # Python tool configuration
@@ -105,10 +123,9 @@ modules:
 [homebrew]: https://brew.sh/
 
 
-### Running scripts that require client credentials
+### Managing client credentials
 
-To successfully run scripts that require client credentials, you will need to
-follow these steps:
+Client credentials should be stored in an environment file:
 1. Copy the contents of the `env.example` file in the script's directory to
 `.env`:
 ```shell
@@ -121,8 +138,22 @@ follow these steps:
 GCS_CX = your_pse_id
 ```
 3. Save the changes to the `.env` file.
-4. You should now be able to run scripts that require client credentials
-without any issues.
+
+You should now be able to run scripts that require client credentials without
+any issues. The `.env` file is ignored by git to help ensure sensitive data is
+not distributed.
+
+
+### Running the scripts
+
+All of the scripts should be run from the root of the repository using pipenv. For example:
+```bash
+pipenv run ./scripts/1-fetch/github_fetch.py -h
+```
+
+When run this way, the shared library (`scripts/shared.py`) provides easy access
+to all of the necessary paths and all of the modules managed by pipenv are
+available.
 
 
 ### Static analysis
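
The credential steps in the README diff above amount to placing `KEY = value` pairs in a `.env` file. As a rough illustration only, here is a stdlib-only Python sketch of reading text in that format. `parse_env` is a hypothetical helper, not something taken from the repository; the project lists `python-dotenv` in its Pipfile, which would normally do this job, so treat this as a simplified stand-in under that assumption.

```python
def parse_env(text: str) -> dict:
    """Parse simple `KEY = value` lines into a dictionary.

    Blank lines, comment lines starting with '#', and lines without
    an '=' are skipped. Surrounding quotes on values are removed.
    This is a simplified stand-in, not a full dotenv parser.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Example input mirroring the placeholder shown in the README diff.
example = "# credentials\nGCS_CX = your_pse_id\n"
settings = parse_env(example)
```

Keeping the parser tolerant of spaces around `=` matches the `GCS_CX = your_pse_id` style shown in the diff; real credentials would come from the untracked `.env` file rather than an inline string.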
