Claudia Beleites edited this page Mar 20, 2020 · 16 revisions

Background

Package hyperSpec provides "infrastructure" for working with spectroscopic data in R, e.g.

  • import functions for various proprietary file formats
  • plotting functions
  • functions that allow seamless or almost seamless use of hyperSpec objects with modeling functions such as pls::plsr(), MASS::lda(), etc.
  • arithmetic functions for typical spectral preprocessing, such as intensity normalization.
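
As an illustration of the kind of preprocessing meant in the last item, intensity normalization can be sketched in base R on a plain matrix with one spectrum per row. This is not the hyperSpec API; the matrix `spc` and the function `normalize_mean()` are made up for this example.

```r
# Toy data: 3 spectra (rows) measured at 4 wavelengths (columns).
spc <- matrix(c(1, 2, 3, 2,
                2, 4, 6, 4,
                5, 5, 5, 5),
              nrow = 3, byrow = TRUE)

# Hypothetical helper: divide each spectrum by its own mean intensity.
normalize_mean <- function(m) {
  sweep(m, MARGIN = 1, STATS = rowMeans(m), FUN = "/")
}

spc_norm <- normalize_mean(spc)
rowMeans(spc_norm)  # every spectrum now has mean intensity 1
```

After normalization the first two spectra coincide, because they differ only by a constant intensity factor.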

Over the years, some parts of hyperSpec have grown steadily, in particular the file import functions. Unfortunately, this has led to hyperSpec having many dependencies as well as a large base of test data (i.e. spectra files in a wide variety of proprietary, often binary, file formats), making hyperSpec harder to maintain than it could and should be:

  • We had to switch to git-lfs for the test files because the git repo became too large. This creates a steep learning curve for potential contributors.
  • File import tests and related vignettes are built offline, and not checked on CRAN.
  • Having a large number of dependencies makes hyperSpec vulnerable:
    We recently had a situation where a test on CRAN failed. While we were fixing this and getting ready to submit an update to CRAN, a dependency for some file import function became orphaned. We could not submit a fix for the first issue as long as the second was not fixed. Checking with our users that the proposed change in the dependency would not break their code took a while, so hyperSpec was archived on CRAN (thrown out) for some time. Each of the two issues could easily have been dealt with separately within the time frame granted by CRAN, but the combination caused a "lock down".

The aim of this GSoC proposal is three-fold:

  1. making hyperSpec easier to maintain by outsourcing e.g. file import functionality into specialized small packages
  2. shielding hyperSpec against breaking due to changes in a dependency
  3. providing better integration with other relevant existing packages through "bridging" packages.

Related work

These packages are not alternatives to hyperSpec; we would like to use them together with hyperSpec:

  • Packages providing preprocessing for spectra: baseline and EMSC
    Claudia is in contact with their creator/maintainer, Kristian.

  • ggplot2 and tidyverse: hyperSpec has rudimentary qplot() functionality, and we recently started a hyperSpec.tidyverse package to fortify hyperSpec for use with dplyr and magrittr functions.

  • File import: readJDX (maintained by Bryan)

  • As for further file import functionality, e.g.

There are a few packages that one may use instead of hyperSpec, but they are less extensive and specialize in particular applications or particular types of spectroscopy. Bryan maintains a long list of FOSS packages for spectroscopy.

Details of your coding project

What exactly do you want your student to code in the 3-month deadline? What functions? What do they do? Docs? Tests? Vignettes?

Expected impact

This project should produce several small packages which provide two enhancements to the community:

  • The small packages are easier to maintain than one big hyperSpec with lots of dependencies: they "shield" hyperSpec from dependencies.
  • Enhanced functionality for packages that "bridge" hyperSpec with other packages such as baseline or EMSC.

Mentors

MENTORS: fill in this part. each project needs 2 mentors. One should be an expert R programmer with previous package development experience, and the other can be a domain expert in some other field or application area (optimization, bioinformatics, machine learning, data viz, etc). Ideally one of the two mentors should have previous experience with GSOC (either as a student or mentor). Please provide contact info for each mentor, along with qualifications.

IMPORTANT: you MUST write "EVALUATING" for one mentor, who will be required to do the three evaluations of the student during the summer. In previous years we have had issues with mentors who do not fill in evaluations, and when this happens R project is penalized (money is taken away), although students are not penalized (students are passed by default if no mentor eval is submitted). Therefore one mentor must take responsibility for doing the evaluations, and you must indicate that here, and your student must indicate that as well in the application. If it is not clear which mentor will be the EVALUATING mentor then your project will not be accepted. Example:

Students, please contact mentors below after completing at least one of the tests below.

  • EVALUATING MENTOR: Toby Hocking [email protected] is the author of R packages X and Y.
  • Other Dev [email protected] is an expert in machine learning, and has previous GSOC experience with NAME_OF_OPEN_SOURCE_ORGANIZATION in 2015-2016.

Tests

easy

medium

  • Fork hyperSpec on GitHub, use lintr to standardize the code formatting, and submit a pull request with the improved code.

hard

  • hyperSpec.tidyverse has some issues marked as good first issue. Note in the issue thread that you will tackle it, fork the repo, write code, documentation, a unit test, and a brief explanation of how to use the new feature in the vignette, and submit a pull request.

  • Fork hyperSpec on GitHub, use covr to find functions that do not yet have unit tests, and write a unit test for one of them. Both hyperSpec and hyperSpec.tidyverse keep their unit tests in the .R files after the respective function definition, using a custom function test<- to attach them to the function in question. Unit tests for file import functions count as very hard (see below), but for other functions the testing can be done without the need to set up make etc.
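
The convention of attaching a unit test to its function can be pictured with a minimal base-R sketch. The `test<-` definition below is an assumption for illustration (it stores the test as a function attribute) and may differ from the actual hyperSpec helper; `normalize_area()` and `run_attached_test()` are hypothetical names.

```r
# Hypothetical replacement function: attaches a test to a function as an attribute.
# Assumption: the real hyperSpec helper may be implemented differently.
`test<-` <- function(f, value) {
  attr(f, "test") <- value
  f
}

# Hypothetical function under test: scale a spectrum to unit total intensity.
normalize_area <- function(x) {
  x / sum(x)
}

# The unit test sits directly after the function definition.
test(normalize_area) <- function() {
  res <- normalize_area(c(1, 2, 3, 4))
  stopifnot(isTRUE(all.equal(sum(res), 1)))
  stopifnot(isTRUE(all.equal(res[4], 0.4)))
  invisible(TRUE)
}

# Hypothetical runner: retrieve and execute the attached test.
run_attached_test <- function(f) {
  attr(f, "test")()
}

run_attached_test(normalize_area)
```

Keeping the test next to the definition means a reader sees the expected behaviour right where the function is written; the test body here uses plain `stopifnot()` only to keep the sketch free of dependencies.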

very hard

  • Set up a GitHub repo and a package skeleton, and copy a hyperSpec qplot function into the new package.
  • Set up a GitHub repo and a package skeleton, and copy the file import code, test files and, if available, unit test code for one particular file format into the new package. Write one (additional) unit test.

Students, please do one or more of the following tests before contacting the mentors above.

MENTORS: write several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project! Please modify the suggestions below to make them specific for your project.

  • Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
  • Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
  • Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?

Solutions of tests

Students, please post a link to your test results here.

  • EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.