spinebil: Package to provide diagnostics for projection pursuit

Dianne Cook edited this page Mar 5, 2025 · 7 revisions

Background

This project will enhance the spinebil package to equip it with new methods for diagnosing projection pursuit (PP) indexes.

Related work

The package ferrn provides diagnostics for the projection pursuit guided tour, which is available in the tourr package. The paper by Laa et al. describes the methods currently available in the spinebil package, and includes numerous references to the projection pursuit literature.

Details of your coding project

The project involves

  • preparing the package for acceptance on CRAN
  • adding routines that assess PP index behaviour, including
    • the practical scale observed
    • the change in the index function as the projection goes from pure noise to pure structure
    • the effect of sample size on the index's expected value and selected quantiles
  • testing these routines on existing indexes
  • providing examples of usage of the new functionality
  • developing revised scagnostic indexes that have better behaviour
  • documenting the code
  • writing a vignette with example usage
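
To make two of the diagnostics listed above concrete, here is a minimal, language-neutral sketch in Python (the package itself is written in R). It is not spinebil code: toy_index, noise_summary and noise_to_structure_trace are hypothetical names, and the absolute Pearson correlation is only an assumed stand-in for a real PP index. The sketch estimates the mean and an empirical 95% quantile of the index on pure-noise samples of different sizes, then traces the index as one projection axis rotates from a noise variable to a structured one.

```python
import math
import random
import statistics


def toy_index(xs, ys):
    """Toy stand-in for a PP index: absolute Pearson correlation of a 2-D projection."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / math.sqrt(sxx * syy)


def noise_summary(n, reps=200, seed=1):
    """Mean and empirical 95% quantile of the index on pure-noise samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        values.append(toy_index(xs, ys))
    values.sort()
    q95 = values[int(0.95 * (reps - 1))]  # simple order-statistic quantile
    return statistics.fmean(values), q95


def noise_to_structure_trace(n=500, steps=5, seed=2):
    """Index values as the second axis rotates from a noise variable to a structured one."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + 0.1 * rng.gauss(0, 1) for a in x1]  # structured: nearly linear in x1
    x3 = [rng.gauss(0, 1) for _ in range(n)]      # pure noise
    trace = []
    for k in range(steps + 1):
        angle = (math.pi / 2) * k / steps  # 0 = pure noise, pi/2 = pure structure
        second = [math.cos(angle) * c + math.sin(angle) * b for b, c in zip(x2, x3)]
        trace.append(toy_index(x1, second))
    return trace


if __name__ == "__main__":
    for n in (30, 100, 300):
        mean, q95 = noise_summary(n)
        print(f"n={n:4d}  mean={mean:.3f}  q95={q95:.3f}")
    print([round(v, 3) for v in noise_to_structure_trace()])
```

For this toy index the noise-level mean and quantile shrink as the sample size grows, which is the kind of sample-size dependence the new routines are meant to quantify for real indexes.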

Expected impact

The availability of this package will enable better development and testing of new projection pursuit indexes. Projection pursuit is widely used to reduce the dimension of high-dimensional data sets and to capture structure and associations that cannot be seen with principal component analysis. Because it is a linear dimension reduction method, it does not suffer from the hallucinations that can arise from non-linear dimension reduction methods like t-SNE and UMAP.

Mentors

Contributors, please contact mentors below after completing at least one of the tests below.

  • EVALUATING MENTOR: Di Cook [email protected] is the author of numerous R packages, including tourr, nullabor and GGally, and has extensive GSOC experience dating back to 2012.
  • Co-mentor: Jess Leung [email protected] works in optimisation.
  • Co-mentor: Ursula Laa [email protected] is the current maintainer of the spinebil package, and has two years of GSOC experience.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Download the occurrence data of platypus in Australia from the Atlas of Living Australia, and make a map showing the spatial locations of sightings for 2024.
  • Medium: Use the GSODR package to download one year of daily weather data, including temperature and precipitation, for a station in Victoria, Australia, near where large numbers of platypus are spotted.
  • Hard: Take the names of the tourism locations given in the variable Stopover state/region/SA2 in data-raw/domestic_trips_2023-10-08.csv, and write code to geocode the locations with latitude and longitude.

Solutions of tests

Contributors, please post a link to your test results here.

  • EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.