For now there is very little to see. This is a work-in-progress, unstable, pre-alpha implementation of osrank in Rust.

Sourcing the data

This project provides a bunch of binaries to source the data necessary to compute things like an adjacency matrix locally, bypassing the Jupyter notebook. In particular:

osrank-source-dependencies produces a CSV file, in the same format as the one produced by the Jupyter notebook, listing all the projects and their dependencies for a given ecosystem. It can be parameterised by platform to generate multiple CSV files.
osrank-source-contributions produces a CSV file listing maintainers alongside the projects they maintain and their number of contributions. It can be parameterised by platform to generate multiple CSV files.
osrank-adjacency-matrix calculates the adjacency matrix for a whole network using the formula of the basic model.
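
To make the idea of an adjacency matrix concrete, here is a minimal sketch of how one could be built from (project, dependency) pairs such as those in the dependencies CSV. It is not the code used by osrank-adjacency-matrix and does not apply the basic model's formula: the function name `adjacency_matrix`, the plain 0/1 weights and the dense `Vec<Vec<f64>>` representation are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds a dense 0/1 adjacency matrix from
/// (project, dependency) pairs. Entry [i][j] is 1.0 when project i
/// depends on project j. Returns the sorted names and the matrix.
fn adjacency_matrix(edges: &[(&str, &str)]) -> (Vec<String>, Vec<Vec<f64>>) {
    // Collect every distinct name; BTreeMap keeps them in sorted order.
    let mut index: BTreeMap<&str, usize> = BTreeMap::new();
    for (from, to) in edges {
        index.entry(*from).or_insert(0);
        index.entry(*to).or_insert(0);
    }
    // Assign each name its position in sorted order as row/column index.
    for (i, slot) in index.values_mut().enumerate() {
        *slot = i;
    }

    let n = index.len();
    let mut matrix = vec![vec![0.0_f64; n]; n];
    for (from, to) in edges {
        matrix[index[from]][index[to]] = 1.0;
    }

    let names: Vec<String> = index.keys().map(|s| s.to_string()).collect();
    (names, matrix)
}
```

Calling `adjacency_matrix(&[("serde_json", "serde"), ("toml", "serde")])` yields a 3x3 matrix whose only non-zero entries sit in the `serde` column. A real ecosystem has many thousands of projects, so a sparse representation would likely be a better fit than this dense one.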

Before starting

To avoid committing big objects into git, we do not store these .csv files in the git history (apart from rare exceptions). Instead, they need to be generated by the binaries (or uploaded to a place like S3 for quicker retrieval). Before doing so, a user must complete a few preliminary steps:

Download the (fairly big) dataset from libraries.io, which includes the datasets we need to operate on;
Set up a GitHub authentication token if you want to run osrank-source-contributions. The token does not need any permissions (i.e. you don't need to check any checkbox in the menu when creating it).

osrank-source-dependencies

It is strongly recommended to compile the binary in release mode by typing:

cargo build --release --features build-binary --bin osrank-source-dependencies

The --features build-binary flag minimises the dependency footprint of the project: certain libraries are downloaded and compiled only for these binaries, not for the library code.
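
For readers unfamiliar with how a Cargo feature can gate dependencies, the fragment below sketches one common way to set this up. It is not copied from this project's Cargo.toml: the dependency names, versions and the binary path are hypothetical, and only the `build-binary` feature name and the binary name come from the text above.

```toml
# Hypothetical Cargo.toml fragment; dependency names, versions and the
# binary path are illustrative assumptions, not this project's real ones.

[dependencies]
# Optional dependencies are only pulled in when a feature enables them.
clap = { version = "2", optional = true }
reqwest = { version = "0.9", optional = true }

[features]
# Enabling `build-binary` activates the optional dependencies above.
build-binary = ["clap", "reqwest"]

[[bin]]
name = "osrank-source-dependencies"
path = "bin/osrank-source-dependencies.rs"  # hypothetical path
# Cargo skips this binary unless the feature is enabled.
required-features = ["build-binary"]
```

With a layout like this, a plain `cargo build` compiles only the library and never downloads the binary-only crates.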

Once compilation has finished, run the binary like so (for example):

./target/release/osrank-source-dependencies \
~/Downloads/libraries-1.4.0-2018-12-22/dependencies-1.4.0-2018-12-22.csv Cargo

This will produce two CSV files on the local filesystem: data/cargo_dependencies.csv and data/cargo_dependencies_meta.csv.
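
The output names in this example appear to follow the platform argument in lower case. The sketch below shows that naming pattern under that assumption: `dependency_output_paths` is a hypothetical helper, not a function from this project, and the real binary may derive its paths differently.

```rust
use std::path::PathBuf;

/// Hypothetical helper: derives the two output paths for a platform,
/// assuming the pattern `data/<lowercased platform>_dependencies*.csv`
/// seen in the Cargo example.
fn dependency_output_paths(platform: &str) -> (PathBuf, PathBuf) {
    let prefix = platform.to_lowercase();
    (
        PathBuf::from(format!("data/{}_dependencies.csv", prefix)),
        PathBuf::from(format!("data/{}_dependencies_meta.csv", prefix)),
    )
}
```

Under this assumption, passing `"Cargo"` gives back the two paths named in the paragraph above.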

osrank-source-contributions

The same process applies to this binary, except that a valid GitHub API token needs to be supplied via the OSRANK_GITHUB_TOKEN environment variable. For example:

OSRANK_GITHUB_TOKEN=<VALID_TOKEN> \
./target/release/osrank-source-contributions \
~/Downloads/libraries-1.4.0-2018-12-22/projects_with_repository_fields-1.4.0-2018-12-22.csv Cargo

This script will take a while to run, as it is throttled to ensure we do not hit GitHub's rate limit: authenticated users may perform only 5000 requests per hour. At the end of the process, it will produce a data/cargo_contributions.csv file on disk.
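
To give a feel for what the throttling implies, the sketch below computes the minimum spacing between requests for a given hourly quota and applies it in a simple loop. It is an illustration only: `min_request_interval` and `for_each_throttled` are hypothetical names, and the actual binary may throttle differently (for instance by reading rate-limit headers from the API).

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: the minimum spacing between requests so that
/// no more than `max_per_hour` requests are sent per hour.
/// Panics if `max_per_hour` is zero.
fn min_request_interval(max_per_hour: u32) -> Duration {
    Duration::from_secs(3600) / max_per_hour
}

/// Hypothetical helper: calls `f` on each item, sleeping for `interval`
/// between consecutive calls (but not before the first one).
fn for_each_throttled<T>(items: &[T], interval: Duration, mut f: impl FnMut(&T)) {
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            thread::sleep(interval);
        }
        f(item);
    }
}
```

With a quota of 5000 requests per hour the interval is 720 ms, so a run that needs one request per project takes at least 0.72 seconds per project, or roughly two hours for 10,000 projects.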

osrank-adjacency-matrix

In order to compute the PageRank (the naive version) we rely on matrix inversion, which is provided by ndarray-linalg and BLAS. This means that the user is required to install gfortran on their system before running the executable. For example, running the tests via cargo watch is done as follows (the -L path points at a Homebrew gcc 8.3.0 installation and will likely need adjusting for your system):

RUSTFLAGS='-L/usr/local/Cellar/gcc/8.3.0/lib/gcc/8' cargo watch -x \
'test --features build-binary --bin osrank-adjacency-matrix'
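
For context on why matrix inversion comes into play: this README does not spell out the formula, so the following is a sketch assuming the textbook PageRank formulation. The damping factor $d$ (with $0 < d < 1$), the number of nodes $N$, the all-ones vector $\mathbf{1}$ and the column-stochastic matrix $M$ derived from the adjacency matrix are symbols introduced here for illustration. The rank vector $\mathbf{r}$ satisfies a linear fixed-point equation, which can be solved in closed form:

```latex
\mathbf{r} = d\,M\,\mathbf{r} + \frac{1-d}{N}\,\mathbf{1}
\quad\Longrightarrow\quad
\mathbf{r} = \frac{1-d}{N}\,(I - d\,M)^{-1}\,\mathbf{1}
```

The inverse exists because the spectral radius of $d\,M$ is below 1. Computing it directly costs on the order of $N^3$ operations, which is why this is only practical as a naive version on small networks.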
