Commit a848c21

Refactor to be object-oriented; df.ids instead of ids(df).
- Refactor Scala implementation.
- Refactor Python implementation.
- Remove legacy items from pom.xml
- Update documentation.
- Update README
- Resolves #2
1 parent 1d9751d commit a848c21

8 files changed: +542 -518 lines changed

README.md

Lines changed: 17 additions & 18 deletions
@@ -1,11 +1,10 @@
 # Tweet Archives Unleashed Toolkit (twut)
 
-[![codecov](https://codecov.io/gh/archivesunleashed/twut/branch/main/graph/badge.svg)](https://codecov.io/gh/archivesunleashed/twut)
 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut/badge.svg)](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut)
 [![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Contribution Guidelines](http://img.shields.io/badge/CONTRIBUTING-Guidelines-blue.svg)](./CONTRIBUTING.md)
 
-An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
+An open-source toolkit for analyzing line-oriented JSON data from the Twitter v1.1 API using Apache Spark.
 
 ## Dependencies
 
@@ -15,41 +14,41 @@ An open-source toolkit for analyzing line-oriented JSON Twitter archives with Ap
 
 ## Getting Started
 
-### Packages
+To get started with `twut`, you can either use it directly from Maven or download the JAR and ZIP files for Spark or PySpark.
 
-#### Spark Shell
+### Using the Spark Shell
+
+To use `twut` with Apache Spark, you can use the following command to include the package:
 
 ```
-$ spark-shell --packages "io.archivesunleashed:twut:0.0.4"
+$ spark-shell --packages "io.archivesunleashed:twut:1.0.0"
 ```
 
-### Jars
-
-You can download the [latest release files here](https://github.com/archivesunleashed/twut/releases) and include it like so:
-
-#### Spark Shell
+Alternatively, you can download the JAR file from the [latest release](https://github.com/archivesunleashed/twut/releases) and include it manually:
 
 ```
-$ spark-shell --jars /path/to/twut-0.0.4-fatjar.jar
+$ spark-shell --jars /path/to/twut-1.0.0-fatjar.jar
 ```
 
-#### PySpark
+### Using PySpark
+
+For Python users, download the ZIP file from the [latest release](https://github.com/archivesunleashed/twut/releases) and include it in your PySpark environment:
 
 ```
-$ pyspark --py-files /path/to/twut-0.0.4.zip
+$ pyspark --py-files /path/to/twut-1.0.0.zip
 ```
 
-You will need the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables set.
+You will also need to set the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables.
 
-## Documentation! Or, how do I use this?
+## Documentation and Tutorials
 
-Once built or downloaded, you can follow the basic set of recipes and tutorials [here](https://github.com/archivesunleashed/twut/tree/main/docs/usage.md).
+After you have `twut` built or downloaded, you can follow the basic set of recipes and tutorials [here](https://github.com/archivesunleashed/twut/tree/main/docs/usage.md).
 
-# License
+## License
 
 Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
 
-# Acknowledgments
+## Acknowledgments
 
 This work is primarily supported by the [Andrew W. Mellon Foundation](https://mellon.org/). Other financial and in-kind support comes from the [Social Sciences and Humanities Research Council](http://www.sshrc-crsh.gc.ca/), [Compute Canada](https://www.computecanada.ca/), the [Ontario Ministry of Research, Innovation, and Science](https://www.ontario.ca/page/ministry-research-innovation-and-science), [York University Libraries](https://www.library.yorku.ca/web/), [Start Smart Labs](http://www.startsmartlabs.com/), and the [Faculty of Arts](https://uwaterloo.ca/arts/) and [David R. Cheriton School of Computer Science](https://cs.uwaterloo.ca/) at the [University of Waterloo](https://uwaterloo.ca/).

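Note on the PySpark section of the README: it asks for `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to be set but does not show how. A minimal sketch for a POSIX shell, assuming `python3` is the interpreter wanted on both the driver and the workers (that value is an assumption, not something this commit specifies):

```shell
# Assumption: python3 is on PATH and is the interpreter you want Spark to use.
# PYSPARK_PYTHON selects the Python binary for the driver and the workers.
export PYSPARK_PYTHON=python3
# PYSPARK_DRIVER_PYTHON overrides the binary for the driver only.
export PYSPARK_DRIVER_PYTHON=python3

# With both variables exported, launch PySpark as the README describes:
# pyspark --py-files /path/to/twut-1.0.0.zip
```

Exporting the variables in the same shell session that starts `pyspark` is enough for local use; on a cluster the worker nodes would need a matching interpreter available.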