R GSoC 2020 hyperSpec

Student: Erick Oduniyi; Project: hyperSpec

hyperSpec maintainers organized an entrance test for potential GSoC 2020 students. These are my notes and solutions for the R GSoC 2020 hyperSpec entrance test.

1. Easy

Task: Install hyperSpec, covr and lintr from CRAN and hyperSpec.tidyverse from github.

# Install hyperSpec, covr, and lintr packages from CRAN
install.packages("hyperSpec")
install.packages("lintr")
install.packages("covr")

# Install hyperSpec.tidyverse package from github
remotes::install_github("cbeleites/hyperSpec.tidyverse")

The downloaded binary packages are in
	/var/folders/4_/gg3mjn693bz4v5w1sxh71rtc0000gp/T//RtmpltRTPs/downloaded_packages

also installing the dependencies ‘cyclocomp’, ‘xmlparsedata’

The downloaded binary packages are in
	/var/folders/4_/gg3mjn693bz4v5w1sxh71rtc0000gp/T//RtmpltRTPs/downloaded_packages

The downloaded binary packages are in
	/var/folders/4_/gg3mjn693bz4v5w1sxh71rtc0000gp/T//RtmpltRTPs/downloaded_packages

Downloading GitHub repo cbeleites/hyperSpec.tidyverse@master

[32m✔[39m  [38;5;247mchecking for file ‘/private/var/folders/4_/gg3mjn693bz4v5w1sxh71rtc0000gp/T/RtmpltRTPs/remotes81a11e602b6a/cbeleites-hyperSpec.tidyverse-d2122a6/DESCRIPTION’[39m[36m[39m
[38;5;247m─[39m[38;5;247m  [39m[38;5;247mpreparing ‘hyperSpec.tidyverse’:[39m[36m[39m
[32m✔[39m  [38;5;247mchecking DESCRIPTION meta-information[39m[36m[39m
[38;5;247m─[39m[38;5;247m  [39m[38;5;247mchecking for LF line-endings in source and make files and shell scripts[39m[36m[39m
[38;5;247m─[39m[38;5;247m  [39m[38;5;247mchecking for empty or unneeded directories[39m[36m[39m
[38;5;247m─[39m[38;5;247m  [39m[38;5;247mbuilding ‘hyperSpec.tidyverse_0.1.0.tar.gz’[39m[36m[39m

2. Medium

Task: Fork hyperSpec from github, use lintr to standardize code formatting, and submit a pull request with the improved code.

2.1. The nature of linting

In general, linting is a way to automatically debug code (i.e., error checking and code standardization) without executing the program. Linters aren't intelligent, so it only tries to adhere to a grammar and thus can only check for potential errors. There are two categories of linting: code style linting and syntax linting. There is no right or wrong way to lint, so it's important to choose a linting style (i.e., style convention) that fits the project.

So, linting is super helpful because it helps programmers adhere to style conventions. In the context of working on open source projects, this can add a bit of confidence to a programmers contribution, which of course is related to unit testing and continuous integration.

2.2. Applying `lintr` to hyperSpec

The lintr package is a very nice way to checks adherence to a given style, syntax errors and possible semantic issues.

# Load lintr package
library(lintr)

# Go to the directory of hyperSpec source files
setwd("/path/to/directory/R")

# Check directory
list.files()

<ol class=list-inline><li>‘aggregate.R’</li><li>‘all.equal.R’</li><li>‘apply.R’</li><li>‘Arith.R’</li><li>‘as.data.frame.R’</li><li>‘barbiturates.R’</li><li>‘bind.R’</li><li>‘call.list.R’</li><li>‘chk.hy.R’</li><li>‘chondro.R’</li><li>‘collapse.R’</li><li>‘colMeans.R’</li><li>‘Compare.R’</li><li>‘count_lines.R’</li><li>‘cov.R’</li><li>‘decomposition.R’</li><li>‘deprecated.R’</li><li>‘dim.R’</li><li>‘dimnames.R’</li><li>‘DollarNames.R’</li><li>‘droplevels.R’</li><li>‘empty.R’</li><li>‘extract.R’</li><li>‘factor2num.R’</li><li>‘fileio.optional.R’</li><li>‘fix_spc_colnames.R’</li><li>‘flu.R’</li><li>‘getbynames.R’</li><li>‘guesswavelength.R’</li><li>‘hyperspec-class.R’</li><li>‘hyperspec-package.R’</li><li>‘initialize.R’</li><li>‘labels.R’</li><li>‘laser.R’</li><li>‘levelplot.R’</li><li>‘makeraster.R’</li><li>‘map.identify.R’</li><li>‘map.sel.poly.R’</li><li>‘mark.dendrogram.R’</li><li>‘mark.peak.R’</li><li>‘Math.R’</li><li>‘matlab.palette.R’</li><li>‘mean_sd.R’</li><li>‘merge.R’</li><li>‘mvtnorm.R’</li><li>‘normalize01.R’</li><li>‘options.R’</li><li>‘orderwl.R’</li><li>‘paracetamol.R’</li><li>‘paste.row.R’</li><li>‘pearson.dist.R’</li><li>‘plot.R’</li><li>‘plotc.R’</li><li>‘plotmap.R’</li><li>‘plotmat.R’</li><li>‘plotspc.R’</li><li>‘plotvoronoi.R’</li><li>‘qplot.R’</li><li>‘qplotmixmap.R’</li><li>‘quantile.R’</li><li>‘rbind.fill.R’</li><li>‘read.asc.Andor.R’</li><li>‘read.asc.PerkinElmer.R’</li><li>‘read.ENVI.HySpex.R’</li><li>‘read.ENVI.Nicolet.R’</li><li>‘read.ENVI.R’</li><li>‘read.ini.R’</li><li>‘read.jdx.R’</li><li>‘read.mat.Cytospec.R’</li><li>‘read.mat.Witec.R’</li><li>‘read.spc.Kaiser.R’</li><li>‘read.spc.R’</li><li>‘read.spc.Shimadzu.R’</li><li>‘read.spe.R’</li><li>‘read.txt.Horiba.R’</li><li>‘read.txt.long.R’</li><li>‘read.txt.Renishaw.R’</li><li>‘read.txt.Shimadzu.R’</li><li>‘read.txt.wide.R’</li><li>‘read.txt.Witec.R’</li><li>‘regexps.R’</li><li>‘replace.R’</li><li>‘sample.R’</li><li>‘scale.R’</li><li>‘seq.R’</li><li>‘show.R’</li><li>‘spc.bin.R’</li><li>‘spc.fit.poly.R’</li><li>‘spc.identify.R’</li><li>‘spc.loess.R’</li><li>‘spc.NA.approx.R’</li><li>‘spc.rubberband.R’</li><li>‘spc.spline.R’</li><li>‘split.R’</li><li>‘split.string.R’</li><li>‘splitdots.R’</li><li>‘subset.R’</li><li>‘Summary.R’</li><li>‘sweep.R’</li><li>‘trellis.factor.key.R’</li><li>‘units.R’</li><li>‘unittest.R’</li><li>‘validate.R’</li><li>‘vandermonde.R’</li><li>‘wc.R’</li><li>‘wl.R’</li><li>‘wl2i.R’</li><li>‘wleval.R’</li><li>‘write.txt.long.R’</li><li>‘write.txt.wide.R’</li><li>‘y-pastenames.R’</li><li>‘zzz.R’</li></ol>

# Try using the lintr package on a single file
lint("aggregate.R")

[1m/path/to/directory/aggregate.R:1:1: [22m[34mstyle: [39m[1mLines should not be more than 80 characters.[22m
.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[1m/path/to/directory/aggregate.R:1:24: [22m[34mstyle: [39m[1mRemove spaces before the left parenthesis in a function call.[22m
.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."),
                       ^
[1m/path/to/directory/aggregate.R:1:38: [22m[34mstyle: [39m[1mRemove spaces before the left parenthesis in a function call.[22m
.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."),
                                     ^
[1m/path/to/directory/aggregate.R:1:56: [22m[34mstyle: [39m[1mVariable and function name style should be snake_case.[22m
.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."),
                                                       ^~~
[1m/path/to/directory/aggregate.R:1:67: [22m[34mstyle: [39m[1mRemove spaces before the left parenthesis in a function call.[22m
.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."),
                                                                  ^ To get a better sense of the types of style suggestions `lintr` is proposing:

# Collect lintr suggestions
lint.sugg <- lint("aggregate.R")

# Turn suggestions into a data frame
lint.df <- as.data.frame(lint.comments)
head(lint.df, 3)

A data.frame: 3 × 7
filename	line_number	column_number	type	message	line	linter
<chr>	<dbl>	<dbl>	<chr>	<chr>	<chr>	<chr>
/path/to/directory/aggregate.R	1	1	style	<span style=white-space:pre-wrap>Lines should not be more than 80 characters. </span>	<span style=white-space:pre-wrap>.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."), </span>	<span style=white-space:pre-wrap>line_length_linter </span>
/path/to/directory/aggregate.R	1	24	style	<span style=white-space:pre-wrap>Remove spaces before the left parenthesis in a function call. </span>	<span style=white-space:pre-wrap>.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."), </span>	function_left_parentheses_linter
/path/to/directory/aggregate.R	1	38	style	<span style=white-space:pre-wrap>Remove spaces before the left parenthesis in a function call. </span>	<span style=white-space:pre-wrap>.aggregate <- function (x, by = stop ("by is needed"), FUN = stop ("FUN is needed."), </span>	function_left_parentheses_linter

Additionally, lintr can be integrated with Atom’s Linter for “on the fly linting”:

atom-lintr-example

While lintr gives you comments and suggestions related to your code, itself does not make any changes. Hence, the user must decide whether the changes should be made or rejected. Though this is mostly a good thing, but if your code base contains more than 100 files that need to be linted, reviewing and manually implementing these suggestions can be cumbersome. Luckily, issue #96 - Consistent code style hints at using the styler package.

2.3. Applying `styler` to hyperSpec

The styler package helps R user systematically enfore style across single R files and even entire packages or directories.

# Install the styler package from CRAN
install.packages("styler")

also installing the dependencies ‘R.methodsS3’, ‘R.oo’, ‘R.utils’, ‘R.cache’, ‘rematch2’

The downloaded binary packages are in
	/var/folders/4_/gg3mjn693bz4v5w1sxh71rtc0000gp/T//RtmpltRTPs/downloaded_packages

The styler package has a lot of customization capability, and so it’s possible to define your own style guide or style transformers that the package will enforce when styling code. For testing purposes, we’ll adhere to the tidyverse style guidelines:

# Load styler library
library(styler)

# Adhere do tidyverse style guide
tidyverse <- tidyverse_style(
  scope = "tokens",
  strict = TRUE,
  indent_by = 2,
  start_comments_with_one_space = FALSE,
  reindention = tidyverse_reindention(),
  math_token_spacing = tidyverse_math_token_spacing()
)

# Perform directory level styling
style_dir(getwd(), transformers = tidyverse)

Styling  112  files:
 aggregate.R            ℹ
 all.equal.R            ℹ
 apply.R                ℹ
 Arith.R                ℹ
 as.data.frame.R        ℹ
 barbiturates.R         ℹ
 bind.R                 ℹ
 call.list.R            ℹ
 chk.hy.R               ℹ
 chondro.R              ℹ
 collapse.R             ℹ
 colMeans.R             ℹ
 Compare.R              ℹ
 count_lines.R          ℹ
 cov.R                  ℹ
 decomposition.R        ℹ
 deprecated.R           ℹ
 dim.R                  ℹ
 dimnames.R             ℹ
 DollarNames.R          ℹ
 droplevels.R           ℹ
 empty.R                ℹ
 extract.R              ℹ
 factor2num.R           ℹ
 fileio.optional.R      ℹ
 fix_spc_colnames.R     ℹ
 flu.R                  ℹ
 getbynames.R           ℹ
 guesswavelength.R      ℹ
 hyperspec-class.R      ℹ
 hyperspec-package.R    ℹ
 initialize.R           ℹ
 labels.R               ℹ
 laser.R                ℹ
 levelplot.R            ℹ
 makeraster.R           ℹ
 map.identify.R         ℹ
 map.sel.poly.R         ℹ
 mark.dendrogram.R      ℹ
 mark.peak.R            ℹ
 Math.R                 ℹ
 matlab.palette.R       ℹ
 mean_sd.R              ℹ
 merge.R                ℹ
 mvtnorm.R              ℹ
 normalize01.R          ℹ
 options.R              ℹ
 orderwl.R              ℹ
 paracetamol.R          ℹ
 paste.row.R            ℹ
 pearson.dist.R         ℹ
 plot.R                 ℹ
 plotc.R                ℹ
 plotmap.R              ℹ
 plotmat.R              ℹ
 plotspc.R              ℹ
 plotvoronoi.R          ℹ
 qplot.R                ℹ
 qplotmixmap.R          ℹ
 quantile.R             ℹ
 rbind.fill.R           ℹ
 read.asc.Andor.R       ℹ
 read.asc.PerkinElmer.R ℹ
 read.ENVI.HySpex.R     ℹ
 read.ENVI.Nicolet.R    ℹ
 read.ENVI.R            ℹ
 read.ini.R             ℹ
 read.jdx.R             ℹ
 read.mat.Cytospec.R    ℹ
 read.mat.Witec.R       ℹ
 read.spc.Kaiser.R      ℹ
 read.spc.R             ℹ
 read.spc.Shimadzu.R    ℹ
 read.spe.R             ℹ
 read.txt.Horiba.R      ℹ
 read.txt.long.R        ℹ
 read.txt.Renishaw.R    ℹ
 read.txt.Shimadzu.R    ℹ
 read.txt.wide.R        ℹ
 read.txt.Witec.R       ℹ
 regexps.R              ✔
 replace.R              ℹ
 sample.R               ℹ
 scale.R                ℹ
 seq.R                  ℹ
 show.R                 ℹ
 spc.bin.R              ℹ
 spc.fit.poly.R         ℹ
 spc.identify.R         ℹ
 spc.loess.R            ℹ
 spc.NA.approx.R        ℹ
 spc.rubberband.R       ℹ
 spc.spline.R           ℹ
 split.R                ℹ
 split.string.R         ℹ
 splitdots.R            ℹ
 subset.R               ℹ
 Summary.R              ℹ
 sweep.R                ℹ
 trellis.factor.key.R   ℹ
 units.R                ℹ
 unittest.R             ℹ
 validate.R             ℹ
 vandermonde.R          ℹ
 wc.R                   ℹ
 wl.R                   ℹ
 wl2i.R                 ℹ
 wleval.R               ℹ
 write.txt.long.R       ℹ
 write.txt.wide.R       ℹ
 y-pastenames.R         ℹ
 zzz.R                  ℹ
────────────────────────────────────────
Status	Count	Legend
✔ 	1	File unchanged.
ℹ 	111	File changed.
✖ 	0	Styling threw an error.
────────────────────────────────────────
Please review the changes carefully!

Neat! styler outputs the changes it made and encourages users to review changes carefully!

3. Hard

Task: Tackle one of the “good first issues” in the hyperSpec.tidyverse issue thread

3.1. Contributing to the hyperSpec.tidyverse project

The goal of the hyperSpec.tidyverse project is to create an R package that makes hyperSpec (hyperSpec objects) work smoothly with the tidyverse. Practically, this means making analog tidyverse/dplyr functions (e.g., select(), rename(), mutate(), transmute(), etc.) for hyperSpec objects. In the end, the hope is to support “wrangling of hyperSpec objects in a dplyr-like style”.

3.2. Picking a “good first issue”

I decided to tackle issue #5 - dplyr::rename() because it was the first “good first issue” and although I’m not super familiar with tidyverse/dplyr, rename() seems like a straightforward function to implement. github-issue

3.3. Creating a structure to work on hyperSpec.tidyverse

I like to use Atom to manage projects and quickly develop coding ideas and RStudio for concrete implementation. Of course, before I can start working on my contribution I need to do:

Fork hyperSpec.tidyverse repository
Clone hyperSpec.tidyverse repository locally (i.e., computer hardrive)
View project in Atom
Open project in RStudio

3.4. Software Description

The task is to create a fortified version of dplyr::rename() so that it works smoothly with hyperSpec objects.

use-case-rename

3.4.1. The nature of `rename()` and `select()`

The dplyr package is a grammar of data manipulation. This makes common data manipulation task intuitive. Unsurprisingly, R users often want to rename the columns of their data frames or even select subsets of their tables to do more fine-grained analysis. It turns outs, these two operations are pretty similar.

3.4.2. `dplyr::rename()`

# Load dplyr and test data package
library(dplyr)
library(hflights)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

# Create a data frame using test data
df <- data.frame(hflights)[1:3, 1:3]
df

A data.frame: 3 × 3
	Year	Month	DayofMonth
	<int>	<int>	<int>
5424	2011	1	1
5425	2011	1	2
5426	2011	1	3

# Rename columns of data frame
df %>% rename(YearNewColName = Year) # same as rename(df, YearNewColName = Year)

A data.frame: 3 × 3
	YearNewColName	Month	DayofMonth
	<int>	<int>	<int>
5424	2011	1	1
5425	2011	1	2
5426	2011	1	3

Observe df %>% rename(NewColName = OldColName, ...) only operates on the column names of the data frame and so while the column names of the data frame may be changed, the number of columns is preserved. So, if I pass in a $n \times\ m$ data frame to rename(), then the result will be an $n \times\ m$ data frame. dplyr::rename() is equivalent to:

names(df)[names(df) == "OldColName"] <- "NewColName"

3.4.3. `dplyr::select()`

# Select columns of data frame
df %>% select(Year) # same as select(df, year)

A data.frame: 3 × 1
	Year
	<int>
5424	2011
5425	2011
5426	2011

Observe df %>% select(ColName1, ColName2, ...) selects a subset of the columns as specified by the column names. Thus, select operates on the entire data frame. So, if I pass in a $n \times\ m$ data frame, then the result will be a data frame with $\#cols_{selected} \times \ m$ . dplyr::select() is equivalent to:

df[names(df) == "OldColName"]

However, you can use select() to also rename columns. That is, you can give new names to the selected subset: df %>% select(NewColName = OldColName, ...) is equivalent to:

df <- df[names(df) == "OldColName"]
names(df)[names(df) == "OldColName"] <- "NewColName"

# Rename column with select
df %>% select(YearNewColName = Year)

A data.frame: 3 × 1
	YearNewColName
	<int>
5424	2011
5425	2011
5426	2011

3.5. Function Design

3.5.1. The nature of `rename()` and `select()` cont.

While both rename() and select() operate on data frames (i.e., tables with column names) the difference between the two functions is that the former only changes column names of the data frame while the latter creates a subset of the data frame; in doing so, the names (column names) of these subsets can be changed as well.

3.5.2. `rename()` and `select()` for hyperSpec

What does this mean for hyperSpec objects? Well, hyperSpec objects are a S4 class that contain 4 slots:

@wavelength - numeric vector
@data - a data frame with the spectra matrix ($spc)
@labels - list of appropriate labels
@log - …

3.5.3. Nature of hyperSpec objects

In fact, the hyperSpec packages comes with five data sets (hyperSpec objects):

chondro - a Raman map of chondrocytes in cartilage
flu - a set of fluorescence spectra of a calibration series
laser - a time series of an unstable laser emission
paracetamol - a Raman spectrum of paracetamol (acetaminophene) ranging from 100 to 3200 cm−1 with overlapping wavelength ranges.
barbiturates - GC-MS spectra with differing wavelength axes as a list of 286 hyperSpec objects

Let’s take a look at the structure of these objects:

# Load hyperSpec, Data Explorer, networkD3, and cli packages
library(hyperSpec)
library(DataExplorer)
library(networkD3)

# Create a function for exploring the structure hyperSpec objects
explore.object <- function(obj){
    print(obj)
    diagonalNetwork(plot_str(obj, print_network = FALSE))
}

# Take a look at the five hyperSpec objects
print("chondro")
explore.object(chondro)

print("flu")
explore.object(flu)

print("laser")
explore.object(laser)

print("paracetamol")
explore.object(paracetamol)

# We'll avoid printing barbiturates as it contains 200+ hyperSpec objects

[1] "chondro"
hyperSpec object
   875 spectra
   5 data columns
   300 data points / spectrum
wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798
data:  (875 rows x 5 columns)
   1. y: y [numeric] -4.77 -4.77 ... 19.23
   2. x: x [numeric] -11.55 -10.55 ... 22.45
   3. filename: filename [character] rawdata/chondro.txt rawdata/chondro.txt ... rawdata/chondro.txt
   4. clusters: clusters [factor] matrix matrix ... lacuna + NA
   5. spc: I / a.u. [matrix300] 501.8194 500.4552 ... 169.2942 ![](/hyperSpec.gsoc2020/img/chondro-graph.png)

[1] "flu"
hyperSpec object
    6 spectra
    3 data columns
    181 data points / spectrum
wavelength: lambda/nm [numeric] 405.0 405.5 ... 495
data:  (6 rows x 3 columns)
    1. spc: I[fl]/"a.u." [matrix181] 27.15000 66.80133 ... 294.6495
    2. filename: filename [character] rawdata/flu1.txt rawdata/flu2.txt ... rawdata/flu6.txt
    3. c: c / (mg / l) [numeric] 0.05 0.10 ... 0.3 ![](/hyperSpec.gsoc2020/img/flu-graph.png)

[1] "laser"
hyperSpec object
    84 spectra
    3 data columns
    36 data points / spectrum
wavelength: lambda/nm [numeric] 404.5828 404.6181 ... 405.8176
data:  (84 rows x 3 columns)
    1. t: t / s [numeric] 0 2 ... 5722
    2. spc: I / a.u. [matrix36] 164.650 179.724 ... 112.086
    3. filename: filename [character] rawdata/laser.txt.gz rawdata/laser.txt.gz ... rawdata/laser.txt.gz ![](/hyperSpec.gsoc2020/img/laser-graph.png)

[1] "paracetamol"
hyperSpec object
    1 spectra
    2 data columns
    4064 data points / spectrum
wavelength: Delta * tilde(nu)/cm^-1 [numeric] 96.7865 98.1432 ... 3200.07
data:  (1 rows x 2 columns)
    1. spc: I / a.u. [matrix4064] 2056.50 2224.84 ... 299.229
    2. filename: filename [character] txt.Renishaw/paracetamol.txt ![](/hyperSpec.gsoc2020/img/paracetamol-graph.png)

So, not only does a hyperSpec object have to contain the 4 slots mentioned above, the @data slot must also contain the spectra matrix (hyperSpecObject@data$spc). Accordingly, the dplyr operations rename() and select() should only augment the @data slot. However, augmenting the data slot augments the @label slot. In other words, if I’m going to create a function that changes the column names of the data frame using dplyr::rename(), the function must also ensure that the spc column is not renamed and that the labels are correctly updated. Then, a function design (pseudo-code) for rename.hyperSpec() should look like this:

# Function design for rename.hyperSpec()
rename.hyperSpec() <- function(hyperSpecObject, NewColName = OldColName, ...){
  # Check if user passed in a hyperSpec object
  chk.hy(hyperSpecObject)
  # Use dplyr::rename() to rename hyperSpec object data slot
  res <- dplyr::rename(hyperSpecObject@data, NewColName = OldColName, ...)
  # Check if $spc was renamed
  if(spc.name.changed(res)){
    # Throw an error
    error("spc has been renamed")
  }else{
    # Update the data slot with newly renamed data frame
    hyperSpecObject@data <- res
    # Update labels of the hyperSpec object
    labels(hyperSpecObject) <- update.labels(hyperSpecObject)
    # Return hyperSpec object
    return{hyperSpecObject}
  }
}

3.6. Function Implementation

A few dplyr operations have already been implemented for hyperSpec (e.g., filter.hyperSpec() and select.hyperSpec()). Fortunately, dplyr::select() and dplyr::rename() are similar operations, so hyperSpec’s implementation of select() should be a good guide to follow for rename().

Using the above-mentioned function requirements and function design for rename.hyperSpec() and select.hyperSpec() as a guide, an implementation of rename() could be the following:

# Function implementation for rename.hyperSpec()
rename.hyperSpec <- function(.data, ...){
  # Check if user passed in a hyperSpec object
  chk.hy(.data)
  # Use dplyr::rename() to rename hyperSpec object data slot
  res <- rename(.data@data, ...)
  # Check if $spc was renamed
  if(!'spc' %in% colnames(res)){
    # Throw an error
    stop("$spc cannot be renamed")
  }else{
    # Get new and current column names
    labels.to.update <- setdiff(colnames(res), colnames(.data))
    labels.to.remove <- setdiff(colnames(.data), colnames(res))
    # Update the data slot with newly renamed data frame
    .data@data <- res
    # Update labels of hyperSpec object
    new.labels <- lapply(labels(.data, labels.to.update), as.expression)
    labels(.data)[c(labels.to.remove)] <- NULL
    labels(.data)[c(labels.to.update)] <- new.labels
    .data
  }
}

# Rename data frame of hyperSpec object
chondro %>% rename.hyperSpec(region = clusters)
chondro %>% rename.hyperSpec(spc2 = spc) # should throw an error

hyperSpec object
spectra
data columns
data points / spectrum
wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798
data:  (875 rows x 5 columns)
y: y [numeric] -4.77 -4.77 ... 19.23
x: x [numeric] -11.55 -10.55 ... 22.45
filename: filename [character] rawdata/chondro.txt rawdata/chondro.txt ... rawdata/chondro.txt
region: region [factor] matrix matrix ... lacuna + NA
spc: I / a.u. [matrix300] 501.8194 500.4552 ... 169.2942

Error in rename.hyperSpec(., spc2 = spc): $spc cannot be renamed
Traceback:

chondro %>% rename.hyperSpec(spc2 = spc)
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
eval(quote(`_fseq`(`_lhs`)), env, env)
eval(quote(`_fseq`(`_lhs`)), env, env)
`_fseq`(`_lhs`
freduce(value, `_function_list`)
withVisible(function_list[[k]](value))
function_list[[k]](value)
rename.hyperSpec(., spc2 = spc)
stop("$spc cannot be renamed")   # at line 9 of file <text>

3.7. Testing and Validation

Now that rename() has been implemented it’s important to test the code. Unit testing (UT) is an automated procedure for checking the correctness of pieces/components/cases of a function/module. The life cycle of UT and code development could like the following:

$\mbox{function-implementation} \leftrightarrow \mbox{unit-test} \leftrightarrow \mbox{check-test} \leftrightarrow \mbox{update -implementation}$

Note: It’s important to know whether or not there is a problem with your code or the unit test your wrote for the code.

3.7.1. Writing unit test in R

In R, UT is carried out through the testthat package. The basic structure of testthat test are the following:

test_that("a description of my test", {
        expect_condition(function_name(parameter), expected_result))
    })

See Hadley Wickham’s Testing chapter to get more information about setting up unit testing for R. Additionally, André Müller’s “Unit Testing in R” is a helpful resource for R test development.

3.7.2. Writing unit test for `rename.hyperSpec()`

Because the column renaming functionality is provided by dplyr::rename() I don’t need to test if that works (it’s already being tested). Instead, the main things I want to test is that rename() doesn’t allow the renaming of spc and that when the columns of the data frame are renamed the labels are correctly updated:

Certainly there are more tests I could write for rename(), but I think these three are a good start. We can use covr to check the testing coverage:

Note: “One hundred percent unit test coverage does not mean we had good tests, or even that the tests are complete. The tests could be missing important data and only testing with data that succeeds, failing to test data that causes failures. One hundred percent unit test coverage doesn’t say anything about missing code, missing error handling, or missing requirements.” - John Ruberto

3.8. Documentation

Now that the function has been implemented and tested the last thing to do is document the function and provide an example is the package vingette. RStudio and the roxygen package makes it easy to build documentation for functions and packages:

The documentation output generated by roxygen can be viewed here

3.9. Submitting a pull request for `rename.R`

The last thing to do is commit and then submit a pull request for rename.R and wait for a code review

4. Very Hard

Task: Set up a github repo and R package skeleton and copy a hyperspec qplot function into the new package with one unit test.

I chose qplotspc because it seemed simple and because I wanted to know more about the spectra within the hyperSpec objects.

So, the main thing I'm trying to understand is how does one test a plotting function (maybe the ggplot2 repo has some good resources). Furthermore, I need to understand what the function does and how the function does it. Basically, I need to understand the parameters/arguments and how they get processed in the function body.

4.1. `qplotspc()` usage

See how qplotspc and other plotting functions are used on hyperSpec Plotting Examples with ggplot2

4.2. `qplotspc()` parameters

The parameters of qplotspc are the following:

x - hyperSpec object
wl.range - wavelength ranges to plot
mapping - uses gpgplot’s geom_line?
spc.nmax - maximum number of spectra to plot
map.lineonly - if TRUE, mapping will be handed to ggplot2 geom_line instead of ggplot2 ggplot.
debuglevel - if > 0, additional debug output is produced

4.3. `qplotspc()` function body

The function body of qplotspc is the following:

qplotspc <- function(x,
                     wl.range = TRUE, ...,
                     mapping = aes_string(x = ".wavelength", y = "spc", group = ".rownames"),
                     spc.nmax = hy.getOption("ggplot.spc.nmax"),
                     map.lineonly = FALSE,
                     debuglevel = hy.getOption("debuglevel"))
{
  chk.hy(x)
  validObject(x)

  ## cut away everything that isn't asked for _before_ transforming to data.frame
  if(nrow(x) > spc.nmax)
  {
    if(debuglevel >= 1L)
    {
      message("Number of spectra exceeds spc.nmax. Only the first ", spc.nmax, " are plotted.")
    }
    x <- x[seq_len(spc.nmax)]
  }
  wl.range <- wl2i(x, wl.range, unlist = FALSE)
  x <- x[, , unlist(wl.range), wl.index = TRUE]
  df <- as.long.df(x, rownames = TRUE, na.rm = FALSE) # with na.rm trouble with wl.range

  ## ranges go into facets
  if(length(wl.range) > 1L)
  {
    tmp <- wl.range
    for(r in seq_along(tmp))
    {
      tmp [[r]][TRUE] <- r
    }
    df$.wl.range <- rep(unlist(tmp), each = nrow(x))
  }
  df <- df[!is.na(df$spc), , drop = FALSE]
  if(map.lineonly)
  {
    p <- ggplot(df) +
      geom_line(mapping = mapping, ...)
  }
  else
  {
    p <- ggplot(df, mapping = mapping) +
      geom_line(...)
  }
  p <- p + xlab(labels(x, ".wavelength")) + ylab(labels(x, "spc"))
  if(!is.null(df$.wl.range))
  {
    p <- p + facet_grid(. ~ .wl.range,
                        labeller = as_labeller(rep(NA, nlevels(df$.wl.range))),
                        scales = "free", space = "free") +
      theme(strip.text.x = element_text(size = 0))
  }
  p
}

4.3.1. Understanding the pieces of `qplotspc`

Okay, clearly there is a bunch of stuff going on in this function, so let’s try and break this into pieces first:

piece_1: parameters list

function(x, # expects a hyperSpec object
         wl.range = TRUE, # defaults to true => will plot the wavelength ranges
         ..., # passes additional arguments to ggplot geom_line()
         mapping = aes_string(x = ".wavelength", y = "spc", group = ".rownames"), # map the variables in the data to visual properties
         spc.nmax = hy.getOption("ggplot.spc.nmax"), # gets the number of spectra to be plotted based on the current package option
         map.lineonly = FALSE, # defaults to handing mapping to ggplot2 ggplot
         debuglevel = hy.getOption("debuglevel") # gets the debuglevel based on the current hyperSpec package option
       )

piece_2: checking hyperSpec object

chk.hy(x)
validObject(x)

piece_3: transforming hyperSpec object into a data frame

# Collect the first spc.nmax number of spectra
if(nrow(x) > spc.nmax)
{
  # Report warning to user
  if(debuglevel >= 1L)
  {
    message("Number of spectra exceeds spc.nmax. Only the first ", spc.nmax, " are plotted.")
  }
  # Collect a subset of the hyperSpec object's spectra (i.e., number of rows)
  x <- x[seq_len(spc.nmax)]
}
# Get indices of specified wavelength ranges
wl.range <- wl2i(x, wl.range, unlist = FALSE)
# Collect a subset of the hyperSpec object's wavelength (i.e., number of columns)
x <- x[, , unlist(wl.range), wl.index = TRUE]
# Turn hyperSpec object into a data frame
df <- as.long.df(x, rownames = TRUE, na.rm = FALSE) # with na.rm trouble with wl.range

piece_4: turning ranges into facets

# Replace the wavelength ranges with a vector containing 1 to length(wl.ranges)
if(length(wl.range) > 1L)
{
  tmp <- wl.range
  for(r in seq_along(tmp))
  {
    tmp [[r]][TRUE] <- r
  }
  # Update wavelength ranges for all rows
  df$.wl.range <- rep(unlist(tmp), each = nrow(x))
}
# Collect non NA rows and ensure a data frame is returned
df <- df[!is.na(df$spc), , drop = FALSE]

piece_5: Handling map.lineonly input

# Let geom_line handle mapping
if(map.lineonly)
{
  p <- ggplot(df) +
    geom_line(mapping = mapping, ...)
}
# Let ggplot handle mapping
else
{
  p <- ggplot(df, mapping = mapping) +
    geom_line(...)
}
# Add x and y labels to plot
p <- p + xlab(labels(x, ".wavelength")) + ylab(labels(x, "spc"))

piece_6: adding facets

# Check if the data frame has wavelength ranges
if(!is.null(df$.wl.range))
{
  # Add facets and theme to plot
  p <- p + facet_grid(. ~ .wl.range,
                      labeller = as_labeller(rep(NA, nlevels(df$.wl.range))),
                      scales = "free", space = "free") +
    theme(strip.text.x = element_text(size = 0))
}
# Return plot
p

All right! So I broke the function into six parts, which were divided according the start of control structures (i.e., if()). This was done to better understand how the function arguments were being handled based on the input given by the user. This was also done to try and develop more comprehensive unittest. Now, hopefully it's a little more clearer how to test each part of the code.

4.4. Writing unit test for `qplotspc`

Again, the plan is to write one unit test for each part of the function:

See for the remaining unittest

Finally, check package coverage:

hyperSpec.gsoc2020

Meeting notes and progress related to hyperSpec GSoC2020

R GSoC 2020 hyperSpec

1. Easy

2. Medium

2.1. The nature of linting

2.2. Applying lintr to hyperSpec

2.3. Applying styler to hyperSpec

3. Hard

3.1. Contributing to the hyperSpec.tidyverse project

3.2. Picking a “good first issue”

3.3. Creating a structure to work on hyperSpec.tidyverse

3.4. Software Description

3.4.1. The nature of rename() and select()

3.4.2. dplyr::rename()

3.4.3. dplyr::select()