Skip to content

adinapoli-mndc/osrank-rs

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Buildkite

For now there is very little to see. This is a work-in-progress, unstable, pre-alpha implementation of osrank in Rust.

Sourcing the data

This project provides a bunch of binaries to source the data necessary to compute things like an adjacency matrix locally, bypassing the Jupyter notebook. In particular:

  • osrank-source-dependencies can be used to produce a CSV file in the same format of the one produced by the Jupyter notebook of all the projects and its dependencies for a given ecosystem, and can be parameterised by platform to generate multiple CSV files.

  • osrank-source-contributions can be used to produce a CSV file of a list of maintainers, alongside the projects they maintain and the number of contributions. It can be parameterised by platform to generate multiple CSV files.

  • osrank-adjacency-matrix can be used to calculate the adjancency matrix for a whole network using the formula of the basic model.

Before starting

For the sake of not committing bit objects into git, we do not store these .csv files into the git history (apart from rare exceptions), but they rather need to be generated from the binaries (or uploaded to a place like S3 for quicker retrieval). In order to do so, there are a bunch of preliminary operations a user must do:

  • Download the (fairly big) dataset from libraries.io which includes a bunch of interesting datasets we need to operate on;

  • Setup a Github authentication token if one desires to run osrank-source-contributions. You don't need to set any permission for this one (i.e. you don't need to check any checkbox in the menu, when creating one).

osrank-source-dependencies

It's warmly recommended to compile the binary in release mode by typing:

cargo build --release --features build-binary --bin osrank-source-dependencies

The --features build-binary is a compilation flag used to minimise the dependency footprint of the project, making sure certain libraries are compiled and downloaded only for these binaries, but not for library code.

Once the compilation finished, one can proceed running the script like so (for example):

./target/release/osrank-source-dependencies \
~/Downloads/libraries-1.4.0-2018-12-22/dependencies-1.4.0-2018-12-22.csv Cargo

This will produce a data/cargo_dependencies.csv and a data/cargo_dependencies_meta.csv csv files on the local filesystem.

osrank-source-contributions

Same process applies for this binary, with the exception that a valid Github API token needs to be supplied as a valid env-var. For example:

OSRANK_GITHUB_TOKEN=<VALID_TOKEN> \
./target/release/osrank-source-contributions \
~/Downloads/libraries-1.4.0-2018-12-22/projects_with_repository_fields-1.4.0-2018-12-22.csv Cargo

This script will take a while to run as it is throttled to ensure we do not hit Github's Quota Limit, as authenticated users are allowed to only perform 5000 requests per hour. At the end of the process, this will produce a data/cargo_contributions.csv file on disk.

osrank-adjacency-matrix

In order to compute the pagerank (the naive version) we rely on matrix inversion, which is provided by ndarray-linalg and BLAS. This means that the user is required to install gfortran on this system before running the executable. For example, running the tests via cargo watch is done by:

RUSTFLAGS='-L/usr/local/Cellar/gcc/8.3.0/lib/gcc/8' cargo watch -x \
'test --features build-binary --bin osrank-adjacency-matrix'

About

Nothing to see here yet - just a dumping ground to play with Rust and Osrank

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Rust 98.9%
  • Other 1.1%