A Short Introduction to Spectroscopy: Notes on Information, Energy, Matter, and Light
Goals of GSoC2020 - RGSoC2020 application proposal
Days Before GSoC2020 - Solutions to the entrance exam
My Commits During GSoC2020
- repo cbeleites/hyperSpec
Report for Week of 5-4-20 (Week 1)
Week-1-expected: Create a clear communication schedule for weekly meetings, protocols for check-ins, and a high-level road map for successful contribution to the hyperSpec project (e.g., forking, merging, pulling, etc.).
Week-1-actual: Created a clear communication schedule for weekly meetings, protocols for check-ins, and a high-level road map for successful contribution to the hyperSpec project (e.g., forking, merging, pulling, etc.).
1. Infrastructure for Open-source Contribution
Before contributing to or creating a repository, there must be an individual or team that manages contributions and the people who make them: contributors - a person or organization responsible for making contributions to the repository (open-source projects don't write themselves, really). And so, we'll specify that contributions can come in the form of coding contributions (CCs) and non-coding contributions (NCCs):
- coding contributions - contributions that consist of the addition or deletion of code
- non-coding contributions - contributions that support the project but are not CCs
From the perspective of a team hosting their project on GitHub, this means managing code repositories and their associated issues, pull requests (PRs), code, branches, contributor access, and code reviews. For this first week (and at the beginning of any project) the r-hyperspec team focused on NCCs.
1.2. Social and Team Infrastructure (Project management)
For the r-hyperspec team, the infrastructure for CCs is Git/GitHub. GitHub also provides infrastructure for NCCs (e.g., opening, commenting on, and closing issues). However, there are a number of tools that IMHO significantly aid with managing NCCs. Such tools help with the organization of knowledge, the planning of contributions, the implementation of contributions, and various other "behind the scenes" tasks that aid not only the CCs but also the social dynamics between the contributors. The r-hyperspec GSoC2020 mentor Roman Kiselev aided the team with developing and managing such team infrastructure:
1.2.1. Trello
Roman created a Trello board for the r-hyperspec team. Trello essentially hosts public and private kanban boards - "an agile project management tool designed to help visualize work, limit work-in-progress, and maximize efficiency (or flow)".
1.2.2. Github
Throughout the summer, Roman and the rest of the r-hyperspec team would attach GitHub issues and useful resources to the board, such as this: Why you should not use (long-lived) feature branches. GitHub resources like that were super helpful as references for contributors both familiar and unfamiliar with GitHub.
1.2.3. Trello + GitHub integration
Roman also showed the team how to integrate the two. From my perspective, this feature was essentially training wheels for me, as I was the most inexperienced with Git/GitHub and project management. In fact, if one looks at the use of this feature throughout the summer via the public Trello board or GitHub, they would find that its use decreases slowly over time (as I am a slow learner). The r-hyperspec team Trello board can be viewed here.
Report for Week of 5-11-20 (Week 2)
Week-2-expected: Finalize development tools and workflow for contributing to the project over the summer. Inquire about helpful R programming and spectroscopy resources from mentors and digest them.
Week-2-actual:
- The r-hyperspec team discussed project licenses, our public Trello board, contributor guidelines, GitHub workflows, template repositories, and general R package building.
- I worked on turning the offline hyperSpec legacy Makefile system (which ran some tests) into what would eventually become the current online CI r-hyperspec ecosystem.
- My PR from the days before GSoC was merged. That was a particularly large commit because of the nature of the changes.
- I opened an issue about a problem that I found particularly challenging related to hySpc.dplyr (Generalizing transmute.R to other columns that contain matrices). In fact, this issue would not be resolved ("closed") until June 13, 2020.
1. Package skeleton (template repo)
1.1 Templates
In general, a template is a pattern for a repeatable format for a document or file. The primary benefit of such a structure is that it does not have to be recreated from scratch, which ensures a level of consistency across projects. And so indeed, it makes sense to also have templates for repositories. And if it makes sense for repositories, perhaps it also makes sense for issues, pull requests, and a range of collaborative structures.
1.2 Template repository
A template repository can be made on GitHub by going to the settings of the repository and checking the template repository box. Then, you can generate new repositories by using the URL endpoint:
https://github.com/username/repository.name/generate
This is equivalent to clicking the "Use this template" button in the repository's code tab. Once the derived repository is created, it's a good idea to set the template repository as a remote via:
git remote add template [URL of the template repo]
Now, changes made to the template can be propagated to the derived repo by fetching the template remote and merging in the changes.
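For example, assuming the template's default branch is named main, the first sync might look like (the --allow-unrelated-histories flag is needed because a repo generated from a template does not share history with it):
git fetch template
git merge template/main --allow-unrelated-histories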
1.3. Creating a hyperSpec package skeleton
Because I will be creating a lot of R packages, there should be a package pattern to ensure a consistent infrastructure style is present across the r-hyperspec ecosystem.
Then, every hyperSpec package should include the following components:
- GitHub repository
- R project file
- unit tests via hySpc.testthat
- continuous integration (CI) via Travis CI and AppVeyor
- code coverage via covr and Codecov
1.3.1. Create a Github repository
Make a new GitHub repository for the package-to-be. Clone the GitHub repository locally. Clone the hyperSpec.skeleton repository (now the hySpc.skeleton package). Commit/publish the local repository.
1.3.2. Create an R package via RStudio
It is best to develop all packages from within RStudio:
- Create a new RStudio project from the existing local repository.
- Configure the project’s build tools so that they are set to “Package”.
- Check all of the Roxygen options.
- Edit the DESCRIPTION file appropriately.
- Use the function devtools::load_all().
- Clean and rebuild the package.
- Install and restart.
Note: You should have the devtools, roxygen2, covr, and testthat R packages installed.
1.3.3. Set up testthat for unit testing
The hyperSpec.skeleton repository comes with the necessary infrastructure for unit testing via the testthat package. See the link for a review of unit testing and this link for the testthat manual.
Note: The r-hyperspec team has a standard of unit tests being written within the same file as the function definition.
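As a rough sketch of that standard (the test(fun) <- ... replacement form is my assumption from hySpc.testthat's design; check the package manual for the exact API):
# Function definition and its unit test living in the same .R file
add_one <- function(x) {
  x + 1
}
# Attach the test to the function so it travels with the definition
hySpc.testthat::test(add_one) <- function() {
  testthat::test_that("add_one increments numeric input", {
    testthat::expect_equal(add_one(1), 2)
  })
}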
1.3.4. Set up CI
All r-hyperspec packages include CI via Travis CI and AppVeyor. The .yml files for both of these services are included in the package directory.
Note: You need Travis CI and AppVeyor accounts. Make sure the Travis CI and AppVeyor apps are installed in the GitHub repository's settings.
1.3.5. Set up code coverage
All r-hyperspec packages report code coverage via codecov.io. The .yml file for this service is included in the package directory. For generating code coverage reports from within R, see the covr GitHub repository and the package manual.
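For instance, a minimal local coverage check, run from the package root, might look like this:
# Compute and inspect test coverage for the package in the working directory
library(covr)
cov <- package_coverage()
percent_coverage(cov) # overall coverage as a single number
report(cov)           # interactive HTML report of per-line coverage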
Note: You need to have a codecov.io account. Make sure the Codecov app is installed in the Github repository’s settings.
1.3.6. Adding badges
Badges for the packages go in the README.md file. Badges for Travis CI, AppVeyor, Codecov, and repo status can be found on each service's respective website. Additionally, a comprehensive list of badges can be found at Naereen/badges.
Report for Week of 5-18-20 (Week 3)
Week-3-expected: Finalize development tools and workflow for contributing to the project over the summer. Inquire about helpful R programming and spectroscopy resources from mentors and digest them.
Week-3-actual: The r-hyperspec team discussed project licenses, the public Trello board, contributor guidelines, GitHub workflows, template repositories, and general R package building. Reviewed one PR in the hyperSpec repository. Worked on the hySpc.skeleton framework, where in theory a derived package, say hyperSpec.derivedpkg or hySpc.*, would collect the changes committed to the skeleton as specified last week. Then, I began working on generalizing mutate.R. Finally, I did some research into drat and pkgdown, and worked on trying to get hyperSpec to build correctly.
Helpful resources:
- Bryan A. Hanson's blog
- Bryan also pointed me to this gem - Design of R language
1. Generalizing mutate.R and improving setLabels.R
1.1. mutate
I need to update mutate.R so that it accounts for hyperSpec objects that have other data slot columns with matrices. At the moment, mutate.R only checks to see if the spc column is being mutated/transmuted. Then, if it is, it "manually" assigns the value to be:
.data@data$spc <- .data@data$spc*2
But again, I need to do this for all matrix data columns:
.data@data$var <- .data@data$var*2
.data@data$var <- .data@data$spc*2
But I think checking if a column contains matrices is pretty straightforward:
is.matrix(.data@data$var) # returns TRUE if the column is a matrix
This means I could just devise a helper function that checks whether the "expression" is a matrix and then accordingly updates the "name":
if (args_name[i] %in% colnames(tmp_hy)) {
  tmp_hy@data[c(args_name[i])] <- .data@data[c(args_name[i])]
}
.
.
.
if (args_names[i] %in% colnames(.data)) {
  # create a tmp var holding what needs to be saved
  tmp_var <- eval(parse(text = paste0(".data@data$", args_names[i])))
  tmp_hy@data[c(args_names[i])] <- tmp_var
}
handle_mat <- function(.data, expr) {
  var <- eval(parse(text = paste0(".data@data$", expr)))
  if (is.matrix(var)) {
    eval(parse(text = paste0("tmp_hy@data$", args_name[i], " <- tmp_hy@data$", expr)))
    cols2get <- c(cols2get, args_name[i])
  }
  .
  .
  .
  cols2get <- unique(cols2get)
}
make_tmp <- function(arg_names, .data) {
  if (arg_names %in% colnames(.data) && is.matrix(.data@data[[arg_names]])) {
    tmp_var <- .data@data[[arg_names]]
  }
}
get_args <- function(.data, ...) {
  # Collect function arguments
  args <- enquos(...)
  args_names <- names(args)
  # Give nothing, return nothing
  if (length(args) == 0L) {
    return(NULL)
  }
  # Prepare a copy of the original hyperSpec object
  tmp_hy <- .data
  cols2get <- vector() # vector to save the column names to
  # Prepare function arguments for mutate/transmute
  # assumption: the variable name and expr share the same index
  # (i.e., args_names[i] is the expr for the variable names(args[i]))
  for (i in seq_along(args)) {
    expr <- quo_name(quo_get_expr(args[[i]]))
    # Process arguments with no names (assignments)
    if ("" %in% args_names[i]) {
      cols2get <- c(cols2get, expr)
    # Process matrix argument assignments:
    # manipulate the matrix column before passing it on to mutate/transmute
    } else if (args_names[i] %in% colnames(.data) && is.matrix(.data@data[[args_names[i]]])) {
      if (grepl("spc", expr) && !"spc" %in% args_names[i]) {
        # Throw an error
        stop("$spc can only be mutated from itself")
      }
      # ensures operation on the original column
      tmp_hy@data[[args_names[i]]] <- .data@data[[args_names[i]]]
      eval(parse(text = paste0("tmp_hy@data[['", args_names[i], "']] <- tmp_hy@data$", expr)))
      cols2get <- c(cols2get, args_names[i])
    # Process non-matrix argument assignments
    } else {
      assign <- paste(args_names[i], "=", expr, sep = "")
      cols2get <- c(cols2get, assign)
    }
  }
  # Hand off columns (i.e., prepared arguments) to mutate()/transmute()
  cols2get <- unique(cols2get) # transmute/mutate already take care of this...
  list(tmp_data = tmp_hy@data, args = paste(cols2get, collapse = ", "))
}
2. pkgdown and drat
2.1. pkgdown
Use pkgdown to generate online documentation automatically:
# Install pkgdown
install.packages("pkgdown")
also installing the dependency 'highlight'
The downloaded binary packages are in
/var/folders/4_/gg3mjn693bz4v5w1sxh71rtc0000gp/T//RtmpFMJVnd/downloaded_packages
# Load libraries
library(pkgdown)
library(devtools)
Loading required package: usethis
Attaching package: 'devtools'
The following object is masked from 'package:pkgdown':
build_site
# Set up a GitHub Actions workflow for pkgdown
usethis::use_github_action("pkgdown")
v Setting active project to '/Users/erickoduniyi/Documents/Projects/open-source/hyperspec/hyperSpec.skeleton'
v Creating '.github/'
v Adding '^\\.github$' to '.Rbuildignore'
v Adding '*.html' to '.github/.gitignore'
v Creating '.github/workflows/'
v Writing '.github/workflows/pkgdown.yaml'
# Build pkgdown site
pkgdown::build_site()
2.2. drat
For an overview of drat, see this link.
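As a small sketch of the idea (the tarball name, repo path, and account name here are hypothetical):
library(drat)
# Maintainer side: insert a built package tarball into a local drat repo
# (typically published via the gh-pages branch of a GitHub repository)
drat::insertPackage("hySpc.read.txt_0.0.1.tar.gz", repodir = "~/git/drat")
# User side: register the drat repo, then install from it as usual
drat::addRepo("r-hyperspec") # hypothetical GitHub account hosting the drat repo
install.packages("hySpc.read.txt")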
3. Fleshing out the hyperSpec.skeleton GitHub repository
Add template labels to hyperSpec.skeleton so that the hyperSpec ecosystem has a consistent set of labels.
3.1. Shell script for fetching template labels
Each derived repository should include a file (shell script) that retrieves the hyperSpec.skeleton GitHub repository configuration (i.e., labels, pull request templates, issue templates):
# hyperSpec.skeleton shell script
# author: Erick Oduniyi
# Make sure the github-labels node package is installed
npm install github-labels -g
# Get GitHub username and repo name
echo "enter username and name of derived package repository (username/repo)"
read pkg_repo
# Get personal access token
echo "enter personal access token"
read token
# Pass the skeleton labels to the derived package repo
labels -c hyperSpec.skeleton.labels.json "$pkg_repo" -t "$token"
4. Figure out the build/make issue of hyperSpec on macOS
# In R, install the tinytex package
install.packages("tinytex")
It's a good idea to install the full MacTeX 2020 package as well.
# In Terminal, within the hyperSpec directory, run make
make
Update the config file to reflect your user credentials.
Report for Week of 5-25-20 (Week 4)
Week-4-expected: Let the coding continue! Continue making progress on Goals 2 and 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-4-actual: Started replacing the chondro dataset with the faux_cell dataset, worked on mutate.R and setLabels.R, and continued to look into pkgdown and drat.
1. Replacing chondro
I need to replace all of the references to the chondro data set with the faux_cell data set.
From Roman:
@Erick Maybe you can start removal of chondro by first deleting the file chondro.R and the chondro vignette (the whole folder).
This will break everything. Then try to fix it by finding all occurrences of chondro in the code and replacing them with faux_cell.
If it works, you're done! :smile: If not, we'll have to check on a case-by-case basis.
According to Roman, I need to update the examples, then the unit tests, and then the vignettes.
# update.examples expects a directory of .R files where the examples exist
# new_ds: name of the new dataset
# old_ds: name of the old dataset
# Assumption: old_ds and new_ds exist in the package
update.examples <- function(dir, new_ds, old_ds) {
  # Check each .R file
  r_files <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  for (f in r_files) {
    lines <- readLines(f, warn = FALSE)
    # Find all occurrences of old_ds and swap them with new_ds
    # (word boundaries avoid partial matches)
    updated <- gsub(paste0("\\b", old_ds, "\\b"), new_ds, lines)
    if (!identical(lines, updated)) writeLines(updated, f)
  }
  # Then rebuild and test the package; report any error, warning,
  # or broken test to the console and record it in the log
}
Note: So, the first thing I am going to do is test the entire package (can you really submit a package without passing all of the unit tests?):
test("./hyperSpec")
Note: Also, it might be a better idea to run tests locally per file, as opposed to at the directory level.
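For example, something like this (the test file path is hypothetical):
library(testthat)
# Run the tests of a single file instead of the whole test directory
test_file("hyperSpec/tests/testthat/test-faux-cell.R")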
The first thing that needs to be done from the #145 branch is:
make
R CMD build hyperSpec --no-build-vignettes
R CMD check hyperSpec_0.99-20200521.tar.gz --ignore-vignettes --as-cran
Then, one can open up RStudio and hit the Load All button under the More tab. To make sure everything is working properly:
> faux_cell
hyperSpec object
875 spectra
4 data columns
300 data points / spectrum
wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798
data: (875 rows x 4 columns)
1. x: x position [numeric] -11.55 -10.55 ... 22.45
2. y: y position [numeric] -4.77 -4.77 ... 19.23
3. region: [factor] matrix matrix ... matrix
4. spc: intensity (arbitrary units) [matrix, array300] 45 82 ... 176
> chondro
hyperSpec object
875 spectra
5 data columns
300 data points / spectrum
wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798
data: (875 rows x 5 columns)
1. y: y [numeric] -4.77 -4.77 ... 19.23
2. x: x [numeric] -11.55 -10.55 ... 22.45
3. filename: filename [character] rawdata/chondro.txt rawdata/chondro.txt ... rawdata/chondro.txt
4. clusters: clusters [factor] matrix matrix ... lacuna + NA
5. spc: I / a.u. [matrix, array300] 501.8194 500.4552 ... 169.2942
As you can see from the above code snippet, there are some differences between the faux_cell and chondro datasets (i.e., labels, names of data slots)…which could all potentially affect the replacement process.
At the end of the day, I need to replace the chondro data set with the faux_cell data set because the former has such a large overhead. Great, that should be the easiest part of the job (replacement/substitution).
The real work starts after the replacement happens. Because even IF the two data sets are similar, there are a number of things that could break during the replacement process - in particular, the attached tests, @examples, vignettes, and scattered references to chondro.
Note: Examples exist within the .R files, man pages, and in the vignettes.
Report for Week of 6-1-20 (Week 5)
Week-5-expected: Let the coding begin! Start making progress on Goal 1. Stick to the development cycle for all weekly tasks (i.e., describe, design, implement, quality check, test, document, deliver, and iterate). Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-5-actual: This was actually the start of the first coding period, but A LOT is going on locally, nationally, and globally, so I honestly took this week to meditate and organize. Pretty hard to be productive. Hopefully I can get back to it properly next week.
Report for Week of 6-8-20 (Week 6)
Week-6-expected: Let the coding continue! Continue making progress on Goal 1. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-6-actual:
- Reviewed a PR related to a quosure bugfix
- Implemented mutate.R and setLabels.R. Please take a look at the PR here
  - mutate takes care of issue #6 and issue #7 of hySpc.dplyr
  - setLabels implements updating labels as discussed in #7 of hySpc.dplyr
- Worked on replacing chondro with faux_cell in hyperSpec and submitted a PR
1. The Return of hyperSpec.mutate/transmute
I still need to finish fleshing out the details of the hyperSpec.mutate() and hyperSpec.transmute() functions. According to our Monday meeting, the generalized mutate/transmute should allow and disallow the following:
# Is allowed:
hyperSpec.obj %>%
mutate (x = y, x2 = y2) # allowed
hyperSpec.obj %>%
mutate (c = c*2, c = c*0) # allowed
hyperSpec.obj %>%
mutate (y, x, filename, spc2 = spc*2) # allowed
hyperSpec.obj %>%
mutate(spc2 = spc*2) %>%
mutate(spc2) %>%
mutate(spc2*2) # allowed
# Let a and b be columns with row matrices, then
hyperSpec.obj %>%
mutate (a = a*0, a = a*2, a = a*3, b) # allowed
hyperSpec.obj %>%
mutate (a*0, a*2, a*3, b) # allowed
# Is not allowed:
hyperSpec.obj %>%
mutate (y, x, filename, spc = spc*2) # not allowed
hyperSpec.obj %>%
mutate (spc*2) # not allowed
hyperSpec.obj %>%
mutate(spc2 = spc*2) %>%
mutate(spc) # not allowed
Note: transmute works in these cases as well, except that if $spc is not present in the transmutation, a data frame is returned (i.e., transmute(x, y) # => df).
hyperSpec.mutate performs the appropriate mutate or transmute operations by analyzing the input arguments. From the above code snippet you can see that a few cases are accepted and a few are not. Perhaps the most important thing about this function is how it handles operations regarding spc. In general, spc cannot be mutated (mutate(spc*2), mutate(spc = spc*2)). Additionally, all data columns with row matrices are handled similarly to spc, the difference being that mutation on these non-spc matrix columns is fine (mutate(mat.col1*2), mutate(mat.col1 = x)).
The current implementation of mutate.R is mostly expressed within the helper function get_args():
for (i in seq_along(args)) {
  expr <- quo_name(quo_get_expr(args[[i]]))
  # Process arguments with no names (assignments)
  if ("" %in% args_names[i]) {
    cols2get <- c(cols2get, expr)
  # Process matrix argument assignments:
  # manipulate the matrix column before passing it on to mutate/transmute
  # (var_expr holds the evaluated expression, computed earlier in get_args())
  } else if (args_names[i] %in% colnames(.data) && is.matrix(var_expr)) {
    # Handle misuse of spc column
    if (grepl("spc", deparse(substitute(expr))) && !"spc" %in% args_names[i]) {
      # Throw an error
      stop("$spc can only be mutated from itself")
    }
    tmp_hy@data[[args_names[i]]] <- .data@data[[args_names[i]]] # ensures operation on original column
    tmp_hy@data[[args_names[i]]] <- var_expr
    cols2get <- c(cols2get, args_names[i])
  # Process non-matrix argument assignments
  } else {
    assign <- paste(args_names[i], "=", expr, sep = "")
    cols2get <- c(cols2get, assign)
  }
}
Let’s try and distill the update as pseudo-code:
# If the argument has no name (only an expression)
if (args_name[i] is empty_string) {
if (expr is matrix) {
# going to be hard to deal with
if (expr contains `spc`+anything_else) {
stop("spc column can not be mutated")
} else {
# Update tmp .data
# Store expr as column (# just store `mat` not `mat`+anything_else)
}
} else {
# Store expr as column
}
# If name of argument is a column with row matrices
} else if (args_name[i] is matrix) {
if (args_name[i] is `spc`) {
stop("spc column can not be mutated")
} else {
# Update tmp .data
# Store expr as column
}
# If name of argument is not a column with row matrices
} else {
# Create an assignment using paste
# Store expr as column
}
Now, let’s try and implement the solution:
pre_mutation <- function(.data, ...) {
# Collect function arguments
args <- enquos(...)
args_names <- names(args)
# Give nothing, return nothing
if (length(args) == 0L) {
return(NULL)
}
# Make a copy of the original hyperSpec object
tmp_hy <- .data
cols2get <- vector()
# Prepare function arguments for mutate/transmute
# assumption: the variable name and expr
# share the same index (i.e., args_name[i] is the expr for the variable names(args[i]))
for (i in seq_along(args)) {
expr <- quo_name(quo_get_expr(args[[i]]))
col_name <- trimws(gsub("[[:punct:]].*","", expr), "right") # "base" expr
# Capture expression value
if (!grepl("\\D", expr)) {
expr_val <- eval(parse(text = expr))
} else {
expr_val <- eval(parse(text = paste0("tmp_hy@data$", expr)))
}
# Argument has no name (only an expression)
if ("" %in% args_names[i]) {
# Expression is a column with row matrices
if (is.matrix(expr_val)) {
# Mutation is being performed on `spc`
if ("spc" %in% col_name && grepl('[^[:alnum:]]', expr)) {
# Throw error
stop("$spc column cannot be mutated")
# Collect `spc` column
} else if ("spc" %in% expr) {
# Store expr in column
cols2get <- c(cols2get, 'spc')
# Update temporary hyperSpec object before passing it to mutate/transmute
} else {
# Update tmp_hy@data
tmp_hy@data[[col_name]] <- expr_val
# Store expr in column (# just store `mat` not `mat`+anything_else)
cols2get <- c(cols2get, col_name)
}
# Column already exist in the hyperSpec object
} else {
# Store "base" expr in column
cols2get <- c(cols2get, expr)
}
# Expression's name (args_name[i]) is not empty
} else {
# Mutation is being performed on `spc`
if ("spc" %in% args_names[i]) {
# Throw error
stop("$spc column cannot be a named argument")
# Expression is a column with row matrices
} else if (is.matrix(expr_val)) {
# Update tmp_hy@data
tmp_hy@data[[args_names[i]]] <- expr_val
# Store "base" expr in column
cols2get <- c(cols2get, args_names[i])
# "vanilla" assignment
} else {
# Create an assignment using paste
assign <- paste0(args_names[i], "=", expr)
# Store expr in column
cols2get <- c(cols2get, assign)
}
}
}
# Hand off columns (i.e., prepared arguments) to mutate()/transmute()
cols2get <- unique(cols2get) # transmute/mutate might already take care of this...
return(list(tmp_data = tmp_hy@data, args = paste(cols2get, collapse = ", ")))
}
Well, according to dplyr 1.0.0, mutate and transmute now support operations on columns containing matrices. So this basically means the above work is unnecessary, el o el.
mutate.R for a hyperSpec object:
mutate.hyperSpec <- function(.data, ...) {
# Check if user passed in a hyperSpec object
chk.hy(.data)
# Pass mutate arguments to dplyr::mutate
res <- mutate(.data@data, ...)
.data@data <- res
.data
}
And for transmute.R:
transmute.hyperSpec <- function(.data, ...) {
# Check if user passed in a hyperSpec object
chk.hy(.data)
# Pass transmute arguments to dplyr::transmute
res <- transmute(.data@data, ...)
# Update labels
setLabels.select(.data, res)
}
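A hypothetical usage sketch, assuming these methods are exported by hySpc.dplyr (and that the magrittr pipe is available):
library(hySpc.dplyr)
faux_cell %>%
  mutate(spc2 = spc * 2) %>% # allowed: new matrix column derived from spc
  transmute(x, y, spc)       # spc is retained, so a hyperSpec object comes back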
2. Replace chondro with faux_cell
Now that I have had a moment to collect my thoughts, I realize that it’s actually not that difficult of a task.
3. Updating .R files
I used a script to update the majority of the files and then inspected each file to make sure that it wasn't referencing anything that faux_cell did not have in its data slot.
4. Updating tests
For each file, I needed to make sure that the tests could pass with faux_cell. To do this, I ran the following function in the RStudio console:
with_reporter(reporter = SummaryReporter$new(), start_end_reporter = TRUE, get.test(.nameoffunction)())
Additionally, I performed the make -> build -> check cycle after each updated file to ensure that everything was cool.
5. Keep track of updates
Now I have completed replacing chondro with faux_cell in the files that reference chondro. Before committing, I made sure the tests and examples passed with the update; if they did not, I just skipped the file. The only file where that occurred was spc.fit.poly.R.
The files that have a reference to chondro but that I did not touch are:
chondro.R
read.txt.Renishaw.R
6. What else?
The vignettes still make use of examples that reference chondro, so at some point those also need to be updated.
Report for Week of 6-15-20 (Week 7)
Week-7-expected: Let the coding continue! Wrap up progress on Goal 1. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-7-actual: Started implementing the hySpc.read.Witec package (this package would later come to be known as hySpc.read.txt).
hySpc.read.Witec
hySpc.read.Witec is a bridge package for managing all kinds of files produced by Witec instruments. The intended use of this package is along these lines:
install.packages("hySpc.read.Witec")
library("hySpc.read.Witec")
hySpc.read.Witec("name/of/Witec/file")
Report for Week of 6-22-20 (Week 8)
Week-8-expected: Let the coding continue! Continue making progress on Goal 1. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-8-actual:
- Worked on developing the hySpc.skeleton package
- Finally, the PR relating to the generalization of mutate() was merged
1. The Return of hyperSpec.skeleton (hySpc.skeleton)
According to naming conventions, the hyperSpec.skeleton package has been renamed from hyperSpec.skeleton to hySpc.skeleton. Much of the work on this package was done with the gracious help of the rest of the r-hyperspec team.
The hySpc.skeleton package includes the following folders and files:
/R
/github-helpers
/pkgdown
/tests
.travis.yml
CONTRIBUTING.md
DESCRIPTION
LICENSE
NEWS.md
README.md
_pkgdown.yml
appveyor.yml
codecov.yml
project.Rproj
Report for Week of 6-29-20 (Week 9)
Week-9-expected: This period will be used to write a detailed report on the work done in Coding Period 1. All work completed will be uploaded and documented.
June 29 - July 3: Phase 1 Evaluations
Deliverables:
- Distilled hyperSpec package
- New specialized hyperSpec packages for file I/O
- Newly implemented import filters for new file formats
Week-9-actual:
- Got more clarification about the hySpc.testthat package
1. hySpc.read.Witec
This is the start of the file input/output distillation process. Let the spectroscopy gods guide us.
1.1. Reading in ini files
library(ini)
ini_file <- read.ini("/path/to/ini/file.txt")
ini_file$SpectrumData
ini_file$SpectrumMetaData
read.table(file = textConnection(unlist(ini_file)))
names(ini_file)
# I need to cleanly extract the name-value pairs from the ini file, then format it in such a way that it can be used to create a `hyperSpec` object
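A rough sketch of what that extraction could look like (the section name SpectrumMetaData is taken from the example file; the rest is an assumption):
# Coerce one metadata section's name-value pairs into a one-row data frame,
# letting type.convert() guess numeric vs. character fields
meta <- ini_file$SpectrumMetaData
meta_df <- as.data.frame(lapply(meta, type.convert, as.is = TRUE))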
To get the spectrum data from the header information:
for (s in seq_along(i_spectra)) {
  # Pull the raw SpectrumData block for spectrum s and parse it into numbers
  data <- unlist(file[[i_spectra[s] + 2]])
  data <- scan(text = data, quiet = TRUE)
  data <- matrix(data, nrow = nwl, ncol = 2L, byrow = TRUE)
  # First column is the wavelength axis, second column the intensities
  if (s == 1) {
    wl <- data[, 1]
  } else if (!all(wl == data[, 1])) {
    stop("Spectrum ", s, " has different wavelength axis.")
  }
  spc[s, ] <- data[, 2]
}
1.2. Developing hySpc.read.Witec for non-spc columns
Claudia, Roman, and I met on Wednesday to go over a lot of progress on the hySpc.read.Witec package. In particular, we helped each other make sure the package had the correct infrastructure (pkgdown, hySpc.testthat) and fleshed out the functionality for reading in the Witec format. The work that Claudia did to get from the header information of the example file to a complete hyperSpec object was pretty brilliant to watch. While I don't understand all of the details, the current read_txt_Witec_TrueMatch() function has a lot of guides for parsing the remainder of the header information.
In fact, let’s following the guidance left for us:
read_txt_Witec_TrueMatch <- function(file) {
# Set the filename variable to the file that was passed in as a parameter
filename <- file
# Use the ini::read_ini function to read in the ini-like file
file <- read_ini(file)
# Get header information
# Note: A Witec_TrueMatch file could contain more than 1 spectra
# Each spectra has the following headers: SpectrumHeader, SpectrumMetaData, SpectrumData
i_spectra <- which(names(file) == "SpectrumHeader")
# Using the index obtained from the i_spectra variable
# Index the Witec_TrueMatch file for the header information and extract the SpectrumSize
nwl <- sapply(file[i_spectra], function(hdr) hdr$SpectrumSize)
The next step is to get the remainder of the SpectrumHeader data from the Witec_TrueMatch file and attach each entry as an extra data column, roughly spc@data$header_variable <- value, e.g. applied over all header entries with sapply().
2. Completion of GSoC Coding Period 1
Last week was not as productive as I hoped. This week marks the end of the first coding period. So, I wanted to review the deliverables outlined in the proposal:
- Distilled hyperSpec package
- New specialized hyperSpec packages for file I/O
- Newly implemented import filters for new file formats
As far as 1) is concerned, the team has managed to remove the use of Makefiles, which we believed was a critical component in distilling the original hyperSpec package. With this removal came others: the chondro data set and the use of git-lfs. Moreover, the documentation infrastructure has been further developed, so that the documentation can be built online, separately (i.e., through the browser).
We are currently at the very beginning of 2). At the moment, five new packages have been created: hySpc.dplyr, hySpc.testthat, hySpc.read.Witec, hySpc.skeleton, and hySpc.pkgs.
- hySpc.dplyr: Bridge and fortification package for the tidyverse::dplyr universe. Implements all of the standard data wrangling grammars (rename, select, mutate/transmute, filter, between, slice, etc.) for hyperSpec objects.
- hySpc.testthat: Infrastructure package for attaching unit tests to functions, as is standard in the r-hyperspec series (keeps unit tests close to the functions - same file). This package is currently on CRAN.
- hySpc.read.Witec: Distill package for managing files produced by Witec instruments.
- hySpc.pkgs: "repository holding certain packages in the r-hyperspec series (in particular, data-rich packages that are too large to be distributed on CRAN)."
So although there are still approximately the same number of .R files, there has been a considerable effort to develop a leaner contribution, documentation, and testing infrastructure, so that more concentrated hyperSpec distillation can be performed reliably and more efficiently. Though, I need some more clarification on what the difference between 3) and 2) is. Even still, we are making our way toward completing deliverables on time.
Report for Week of 7-6-20 (Week 10)
Week-10-expected: Let the coding continue! Start making progress on Goals 2 and 3. Stick to the development cycle for all weekly tasks. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-10-actual:
- Continued to flesh out the hySpc.read.Witec package (it was later renamed to hySpc.read.txt) and worked on a drat strategy for automated resource moving.
- Submitted a PR for read_txt_WitecTrueMatch.R
1. Github Actions and pkgdown
To be honest, the whole DevOps side of the project (and software development in general) has been lost on me. I don't really understand what GitHub Actions or GitHub workflows are or why they're useful. I also don't really understand how to use pkgdown. So, today we must do research into the nature of both of these systems.
1.1. Github Actions
Well, to get started with GitHub Actions (GA) - a way to trigger tasks (workflows) when one or more events (pull request, issue, merging) have occurred within a repo - one needs a GitHub repo. Additionally, within that repo a .yml file is specified (workflow.yml):
name: hyperSpec CI
on:
  pull_request:        # trigger event, e.g. pull_request or push
    branches: [develop]
jobs:
  test_pull_request:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/
      - uses: actions/
        with:
          ...
      - run: ..
      - run: ..
There are a number of open-source .yml solutions, as well as configurations one can specify.
1.2. Let me tell you a secret
So, what is the difference between "personal access tokens", "GitHub secrets", "keys", etc.?
- Personal Access Tokens - Personal access tokens (PATs) are an alternative to using passwords for authentication to GitHub when using the GitHub API or the command line.
- Secrets - Secrets allow the storing of sensitive information
1.3. pkgdown
According to the main website for pkgdown: "pkgdown is designed to make it quick and easy to build a website for your package."
1.4. drat
According to the maintainers, "Drat is an R package which makes it really easy to provide R packages via a repository, and also makes it easy to use such repositories for package installation and upgrades."
Drat tries to help here and supports two principal modes:
- GitHub, by leveraging gh-pages
- Other repos, using other storage where you can write and provide HTML access
1.5. Issue #2: Use Github Actions to Move .tar.gz over to hySpc.pkgs
According to Bryan, the team will be using drat and hySpc.pkgs to "[hold] certain packages in the r-hyperspec series (in particular, data-rich packages that are too large to be distributed on CRAN)." The purpose of this is to avoid using git-lfs and make, and so that users can just install these larger-than-life packages directly: install_github(hySpc.pkgs). With that being said, we will still need CI/CD via GA. So, to test this out, we're going to move hySpc.read.Witec to hySpc.pkgs.
1.6. Issue #2: Chef Hanson, Chef Oduniyi – Serving up some actions
Over the weekend, Bryan and I worked on fleshing out our drat formula, which basically looks like this, but in GA/.yml speak:
if (package.size > CRAN.size) {
  # put package on GH
} else {
  # put package on CRAN
}
A lot of these hySpc.pkgs will be data packages.
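In plain R, the decision rule amounts to something like this (the tarball name is a placeholder, and the exact limit is an assumption; CRAN's package size policy is roughly 5 MB):
# Route a built package by tarball size
cran_limit <- 5 * 1024^2                # approximate CRAN size limit in bytes
tarball <- "hySpc.chondro_0.0.1.tar.gz" # hypothetical build artifact
if (file.size(tarball) > cran_limit) {
  message("Too large for CRAN - deploy to hySpc.pkgs via drat")
} else {
  message("Within the limit - submit to CRAN")
}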
2. hySpc.read.Witec
Still working on the hySpc.read.Witec function. Most of the parsing of the Witec_TrueMatch file has been done. Additionally, a number of unit tests have been implemented. The next couple of tasks are to:
- find a better name
- provide axis labels
  - x-axis units are in $SpectrumHeader$XDataKind
- allow users to specify extra data to keep
- unit testing
- documentation
Instead of moving on to preparing the appropriate x-axis labels, I'm going to work on allowing users to specify extra data columns, which I believe can be accomplished by having an argument in the read_txt_Witec_TrueMatch function:
read_txt_Witec_TrueMatch <- function(file, keys.2header) {
.
.
.
}
The keys.2header argument is taken from hyperSpec::read.ENVI.R, but I don't really understand how it's implemented. Instead, I'm going to take the dplyr::mutate approach, where the argument .keep can be set to one of c("all", "used", "unused", "none").
read_txt_Witec_TrueMatch <- function(file, keys.2header = c("all", "none")) {
  .
  .
  .
  keys.2header <- arg_match(keys.2header)
  if (keys.2header == "all") {
    # Basically do what I've been doing
  } else if (keys.2header == "none") {
    # Only retain `spc`
  } else if (!keys.2header %in% c("all", "none")) {
    # collect those columns + spc from the generated hyperSpec object
  } else {
    # Default to do what I've been doing
  }
  .
  .
  .
}
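Hypothetical calls, once the argument lands as sketched above (the file name is a placeholder):
# Keep every header entry as an extra data column
spc_all <- read_txt_Witec_TrueMatch("spectra.txt", keys.2header = "all")
# Keep only the wavelengths and the spc matrix
spc_min <- read_txt_Witec_TrueMatch("spectra.txt", keys.2header = "none")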
Report for Week of 7-13-20 (Week 11)
Week-11-expected: Let the coding continue! Continue making progress on Goals 2 and 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-11-actual:
- Reviewed Bryan's PR for the deploy-pkg branch of hySpc.read.txt.
- Learned how to update labels for a repo using the GitHub Label Manager
  - All the labels for the r-hyperspec ecosystem are now "synced".
- Made some small commits to the hySpc.pkgs package
- Made some small commits to hySpc.read.txt
1. Labels
Okay, for whatever reason, I have been letting a couple of things sit on the Trello "In Progress" list for a little too long. So, I want to try to clean up some of these tasks, if not finish them, this week:
In Progress
- Check performance of different ways to import the text data
- Attach labels to hySpc.testthat and hySpc.dplyr
- Update hySpc.skeleton documentation to hyperSpec repo quality
- hySpc.read.Witec
- hySpc.JCAMP-DX
- skeleton package and usethis strategy
- Start making sure test coverage is at least 60% for hyperSpec
Report for Week of 7-20-20 (Week 12)
Week-12-expected: Let the coding continue! Continue making progress on Goals 2 and 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-12-actual: Not much progress this week because of personal issues/medical reasons.
Report for Week of 7-27-20 (Week 13)
Week-13-expected: Let the coding continue! Start making progress on Goal 3. Stick to the development cycle for all weekly tasks (be adaptable too). Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
July 27 - July 31: Phase 2 Evaluations
Deliverables:
- Shielded hyperSpec and associated hyperSpec packages
- Fortified hyperSpec for tidyverse
Week-13-actual: Not much progress this week because of personal issues/medical reasons.
- Worked on the hySpc.chondro package.
Report for week of 8-3-20 (Week 14)
Week-14-expected: Let the coding continue! Continue making progress on Goal 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-14-actual:
- Closed a few issues:
  - Create separate data package (hySpc.chondro) for chondro, in hySpc.chondro
  - Error with pandoc-citeproc, in hySpc.chondro
  - Renaming hySpc.read.Witec to hySpc.read.txt, in hySpc.read.txt
- Opened an issue in hyperSpec:
  - Formality of vignettes in hyperSpec
- Completed small PRs related to hySpc.chondro
- Deleted a PR related to hySpc.chondro
Report for week of 8-10-20 (Week 15)
Week-15-expected: Let the coding continue! Continue making progress on Goal 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-15-actual: Test cases for the remaining styles of Witec files…these tests were taken from the original hyperSpec repo:
# Throw the kitchen sink at it!
read.txt.Witec("fileio/txt.Witec/Witec-Map_full.txt", type = "map", hdr.label = TRUE, hdr.units = TRUE)
read.txt.Witec("fileio/txt.Witec/Witec-Map_label.txt", type = "map", hdr.label = TRUE, hdr.units = FALSE)
read.txt.Witec("fileio/txt.Witec/Witec-Map_unit.txt", type = "map", hdr.label = FALSE, hdr.units = TRUE)
read.txt.Witec("fileio/txt.Witec/Witec-Map_no.txt", type = "map", hdr.label = FALSE, hdr.units = FALSE)
read.txt.Witec("fileio/txt.Witec/Witec-Map_unit.txt",
type = "map", hdr.label = FALSE, hdr.units = TRUE,
points.per.line = 5
)
read.txt.Witec("fileio/txt.Witec/Witec-Map_no.txt",
type = "map", hdr.label = FALSE, hdr.units = FALSE,
lines.per.image = 5
)
read.txt.Witec("fileio/txt.Witec/Witec-Map_no.txt",
type = "map", hdr.label = FALSE, hdr.units = FALSE,
points.per.line = 5, lines.per.image = 5
)
Report for week of 8-17-20 (Week 16)
Week-16-expected: Let the coding continue! Continue making progress on Goal 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-16-actual:
- Reviewed PRs related to hyperSpec
- Completed small PRs related to hyperSpec
- Completed PRs related to hySpc.read.txt
Moving functions from hyperSpec to hySpc.read.txt
Moving the remaining functions related to read.txt out of hyperSpec into hySpc.read.txt: this will require that the corresponding functions in hyperSpec become deprecated (a sketch of a deprecation stub follows the list below). I'm going to need Deprecated.R and deprecation-messages.R, and to copy the following functions:
- read.txt.Horiba.R
  - read.txt.Horiba
  - read.txt.Horiba.xy
  - read.txt.Horiba.t
- read.txt.long.R
  - read.txt.long
- read.txt.Renishaw.R
  - read.txt.Renishaw
  - read.zip.Renishaw
- read.txt.Shimadzu.R
  - read.txt.Shimadzu
- read.txt.wide.R
  - read.txt.wide
- read.txt.Witec.R
  - read.txt.Witec
  - read.dat.Witec
  - read.txt.Witec.Graph
- wc.R
  - wc
- count_lines.R
  - count_lines
- read.asc.Andor.R
  - read.asc.Andor
- read.asc.PerkinElmer.R
  - read.asc.PerkinElmer
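A sketch of what one such deprecation stub could look like (the name of the replacement function is assumed for illustration):
# Stub left behind in hyperSpec after the move; forwards to the new home
read.txt.Renishaw <- function(...) {
  .Deprecated("read_txt_Renishaw") # assumed name of the replacement in hySpc.read.txt
  hySpc.read.txt::read_txt_Renishaw(...)
}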
Report for week of 8-24-20 (Week 17)
Week-17-expected: Let the coding continue! Wrap up progress on Goal 3. Compile a weekly report of progress made. Meet with mentors. (check-in with mentors as necessary)
Week-17-actual:
With the guidance of the r-hyperspec team, I worked on my final GSoC2020 report. I did my best to follow the guidelines as stated here.
August 31: Final Results
All documentation, code modules (files, packages, etc.), and their associated tests have been uploaded, excluding the three deliverables promised at the end of the third coding period:
- Fortified hyperSpec for baseline with bridge packages
- Fortified hyperSpec for EMSC with bridge packages
- Fortified hyperSpec for matrixStats with bridge packages
And so, with the support of the r-hyperspec team, all other deliverables promised for R GSoC 2020 have been…delivered. Furthermore, I plan on continuing to work with the r-hyperspec team for the foreseeable future. That is, the remaining deliverables, issues, features, and their associated documentation will continue to be worked on by me and the r-hyperspec team as time permits.
Final Notes
“There is no such thing as perfect and no plan is ever problem-free. Life does not wait for coding projects or internships to finish. So, if there is a decline in health or other personal and family issues arise, [you] should take a deep breath and let [your] mentors know what is going on without TMI. If [you] foresee a setback in development, make sure to communicate [your] difficulties and get suggestions on how to pivot from mentors so that milestones and deliverables can still be met in a timely manner.” - E. Oduniyi