Managing packages and environments in command-line R
In this session, we will begin by using R from the command line. Later, we will cover similar steps using RStudio Server available on Open OnDemand. We will primarily focus on using the renv package for package management, but we will also note alternatives at the end.
Choosing which version of R to use
Multiple R versions are available in the environment module system. Note that modules are named with ‘r’ and the program available after the module is loaded is ‘R’. With each new minor version of R you use, the renv package will need to be installed.
- First, use the cluster’s environment module system to find and load the version of R you want to use for your project: module spider r or ml spider r (note the lower-case ‘r’!).
- Load the version of R you’d like to use, e.g., module load r/4.4.0 or ml load r/4.4.0.
- Run R to open an R session of the version of R you loaded from a module.
- Run the R command install.packages('renv') to install renv for this version of R.
- Run q() to exit R, and enter n when prompted to save the workspace image.
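Once an R session is open, a quick check like the one below can confirm which version of R the module provided and whether renv is already installed for it. This is a minimal sketch using only base R; the helper name renv_is_installed is hypothetical and not part of renv.

```r
# Report the running R version (it should match the module that was loaded).
cat("Running", R.version.string, "\n")

# Hypothetical helper: TRUE if renv can be found in the current library paths.
renv_is_installed <- function() {
  requireNamespace("renv", quietly = TRUE)
}

if (renv_is_installed()) {
  cat("renv", as.character(utils::packageVersion("renv")), "is available\n")
} else {
  cat("renv is not installed for this R version; run install.packages('renv')\n")
}
```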
Creating and managing R environments with the renv package
To use renv for package management, it needs to be associated with an R project. This could be an empty directory if you are just starting a project or it could be one or more R scripts within a directory. Either way, the scope of the renv environment will be dictated by the working directory in which it is initialized. To initialize an renv environment, you use the renv::init() command. For getting started and for this workshop, we recommend passing two arguments when initializing the environment: renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)). These arguments keep package installations within the project directory instead of your home directory and prevent some potentially faulty URL translations from happening when packages are downloaded from repositories.
- If you are not already in your workshop directory, change into it by running cd /90daydata/shared/$USER/.
- Create a project directory with mkdir my_project and change into it with cd my_project.
- Start an R session with R.
- Initialize renv by running renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)). Some handy messages will appear to describe what renv has done.
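If you expect to initialize several projects the same way, the recommended settings can be kept in one place. The sketch below wraps the renv::init() call from the steps above; the names workshop_settings and init_workshop_project are hypothetical, and the function is deliberately not called here because renv::init() modifies the directory and restarts the R session.

```r
# Settings recommended in this workshop: keep packages inside the project
# (no shared cache) and turn off the repository URL translation.
workshop_settings <- list(use.cache = FALSE, ppm.enabled = FALSE)

# Hypothetical wrapper: initialize renv in `project_dir`.
init_workshop_project <- function(project_dir = getwd()) {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages('renv') first")
  }
  renv::init(project = project_dir, settings = workshop_settings)
}

# Not run automatically; call it from inside the project directory:
# init_workshop_project()
```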
Exercise 1: What files have been added to the project directory? (Hint: use ls -a to include hidden files.) What kind of content do they contain? (Hint: use cat filename to print file contents to the screen.)
Exercise 2: Return to /90daydata/shared/$USER/, create and change directory to a new project folder project1, and save the following R code into a file named exercise2.R. (Hint: use nano exercise2.R.) Open an R session and initialize renv for the project. What kinds of messages appear now?
library(magrittr)
x <- help.search("*", package="base")
N <- 3
for(i in c(1:N,N:1)){
  string <- x$matches$Title %>%
    sample(i) %>%
    cat('\n')
}
R has to be restarted before the project library is set up; until then, our exercise2.R script won’t run successfully.
- Quit the current session and run R to open a new one.
- Run the script with source('exercise2.R').
Now we are set up with a project with an renv environment!
Installing and managing R packages in your project library
Next, we will expand our project with additional packages!
- You can install packages into your environment as you normally would with install.packages('PACKAGE'), or with renv’s expanded installation function renv::install('PACKAGE'), which supports additional remote package sources, e.g., GitHub. If you are interested in learning more about renv::install(), please see its documentation.
- To have renv save the state of the project (i.e., capture all the metadata of the used packages) in the environment configuration file called a ‘lockfile’, run renv::snapshot().
- If you want to assess the state of the environment (i.e., which packages are installed but not used, or which packages are used but not recorded), run renv::status().
- Let’s save the script below in our project and install an old version of the cli package so we can simulate needing to update to the latest version next: install.packages("https://cran.r-project.org/src/contrib/Archive/cli/cli_3.6.1.tar.gz", repos=NULL, type="source").
- If we call renv::status(), it will tell us we are out of sync. If we then call renv::snapshot(), it will update the lockfile to match the project library.
library(magrittr)
library(cli)
x <- help.search("*", package="base")
N <- 3
for(i in c(1:N,N:1)){
  string <- x$matches$Title %>%
    sample(i) %>%
    col_magenta() %>%
    cat('\n')
}
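To see for yourself that the archived copy of cli is the one in the project library, you can compare the installed version with a version number you expect. This is a minimal sketch using base R utilities only; the helper name installed_older_than is hypothetical, and 3.6.2 is used simply as a version number later than the archived 3.6.1.

```r
# Hypothetical helper: is the installed copy of `pkg` older than `version`?
# Returns NA if the package is not installed in the current library paths.
installed_older_than <- function(pkg, version) {
  if (!nzchar(system.file(package = pkg))) {
    return(NA)
  }
  utils::packageVersion(pkg) < package_version(version)
}

# With cli 3.6.1 installed from the archive, this comparison should be TRUE;
# after updating cli to a newer release it should become FALSE.
installed_older_than("cli", "3.6.2")
```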
Exercise 3: Update the version of cli with install.packages('cli'). Modify the environment to be consistent.
- Another renv function to make your project environment consistent is renv::restore(). It updates your project library to match your lockfile.
- For example, if we install the MASS package because we think we may need it but later find that we don’t, renv::restore(clean=TRUE) will remove the unused package from the project library.
- renv::restore() can also be used to revert package version discrepancies, like the one for cli above.
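The restore step can be sketched as a small wrapper that first checks for a lockfile. The function name restore_to_lockfile is hypothetical; it only passes the project, clean, and prompt arguments on to renv::restore(), and prompt = FALSE is an assumption made here so that the call does not wait for confirmation.

```r
# Hypothetical wrapper: bring the project library back in line with renv.lock.
# clean = TRUE also removes packages that are installed but not in the lockfile.
restore_to_lockfile <- function(project_dir = getwd(), clean = TRUE) {
  if (!file.exists(file.path(project_dir, "renv.lock"))) {
    stop("no renv.lock found in ", project_dir)
  }
  renv::restore(project = project_dir, clean = clean, prompt = FALSE)
}

# Not run automatically; call it from inside an renv project:
# restore_to_lockfile()
```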
Reproduce renv projects
In order to make environments and package management truly useful, we need a mechanism to easily reproduce environments. With the renv project files we looked at before, you can have renv reproduce the same environment in a new project directory.
- The renv directories and files that should remain with the project are the renv.lock file, the renv/activate.R and renv/settings.json files, and the .Rprofile file. With these files, the project environment can be easily recreated, therefore helping to ensure that your code and analyses are fully reproducible.
- If you are using git for version control for the project, renv adds the renv files that do not need to be tracked (i.e., the packages themselves) to the .gitignore file for you.
Exercise 4: Create a new project directory in your workshop directory. Copy over the script and lockfile from project1 into the new project. From the new project directory, run R. What happens? Try initializing.
Exercise 5: Create another new project directory in your workshop directory. Copy over all of project1’s files except renv/library and renv/staging (the package files) into the new project. From the new project directory, run R. What happens?
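The set of files listed above as belonging with the project can also be copied from within R. The sketch below uses base R only; the names renv_project_files and copy_renv_files are hypothetical, and your own scripts would still need to be copied separately.

```r
# Files named above as the ones that should stay with a project.
renv_project_files <- c("renv.lock", ".Rprofile",
                        "renv/activate.R", "renv/settings.json")

# Hypothetical helper: copy those files from `from` into `to`, leaving the
# installed packages (renv/library, renv/staging) behind.
copy_renv_files <- function(from, to) {
  dir.create(file.path(to, "renv"), recursive = TRUE, showWarnings = FALSE)
  src <- file.path(from, renv_project_files)
  present <- file.exists(src)
  ok <- file.copy(src[present], file.path(to, renv_project_files[present]),
                  overwrite = TRUE)
  # Named logical vector: which of the files found were copied successfully.
  stats::setNames(ok, renv_project_files[present])
}

# Example call with hypothetical directory names:
# copy_renv_files("project1", "project3")
```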
Managing packages and environments in RStudio Server
The approach of using renv in RStudio is very similar to using renv with command-line R. For completeness, we will create another environment in RStudio Server.
Choosing which version of R to use
Multiple R versions are available when requesting RStudio Server sessions on Open OnDemand. From the Open OnDemand page, select “Interactive Apps” > “RStudio Server”. You will be taken to a page with multiple input fields to configure your RStudio Server session and one of those is “R Version”.
- For the following exercise, select the following inputs:
  - R Version: 4.3.3
  - Account Name: scinet_workshop1
  - Partition: atlas
  - QOS: normal 14-00:00:00
  - Number of hours: 2
  - Number of nodes: 1
  - Number of tasks: 1
  - Additional Slurm Parameters: --reservation=workshop --mem=8G
- When you are in RStudio Server, install renv. Note that we need to install renv again only because we chose a different version of R than the one we used on the command line.
Creating and managing R environments with the renv package
Since renv is project specific, you need to change the working directory to the project directory in which you would like to use renv. Once in the project directory, you can run renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)) to start managing the project environment with renv.
In RStudio Server, there are also additional graphical features in the interface when renv is active. E.g., note that there is an “renv” button at the top of the “Packages” pane. If you click on it, there is a dropdown menu that includes shortcuts to the renv::snapshot() and renv::restore() functions.
- Create a new project directory.
- Initialize renv for the new project with renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)). If you do not see the activation message when R restarts, you will have to activate renv manually with source('renv/activate.R').
- Add at least one script to the project and install the packages it uses with either install.packages('PACKAGE') or renv::install('PACKAGE').
- Take a snapshot of the environment with renv::snapshot() to update the lockfile.
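If you are unsure whether renv was activated after the restart, one rough check is to look at the library paths. The sketch below is a heuristic, not an renv function: the helper name renv_library_active is hypothetical, and it assumes that an active project library lives under the project’s renv/library directory.

```r
# Hypothetical heuristic: treat renv as active when one of the library paths
# lies inside a project's renv/library directory.
renv_library_active <- function(paths = .libPaths()) {
  any(grepl("renv/library", paths, fixed = TRUE))
}

if (!renv_library_active() && file.exists("renv/activate.R")) {
  # Same effect as the manual activation step described above.
  source("renv/activate.R")
}
```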
Exercise 6: Use the renv::install() function to download the development version of the ‘nsyllable’ package on GitHub at ‘quanteda/nsyllable’. Save the script below as an R script in your project directory. What does the program do? Use the “renv” button in the “Packages” pane to make sure this project’s lockfile captures the new package. Open the renv.lock file to see the entry for the new package.
library(magrittr)
library(nsyllable)
x <- help.search("*", package="base")
x$matches["nsyl"] <- nsyllable(x$matches$Title)
for(i in c(5,7,5)){
  correct_length <- x$matches$nsyl == i
  string <- x$matches$Title[correct_length] %>%
    sample(1) %>%
    cat('\n')
}
If you don’t want to use renv
renv is the main solution if you want a handful of commands to manage a project’s packages with a project-specific library and also document that process to increase reproducibility. If, for any reason, you are not a fan of renv, there are some commands and R-related files that can help you at least manage the locations of package installations.
R commands:
- .libPaths(): This function returns the path(s) of available libraries from which packages may be loaded. If there are multiple libraries available (i.e., if .libPaths() returns multiple paths), R will search for packages across libraries in the order in which .libPaths() lists the library paths. The function can also be used to change the available libraries: e.g., .libPaths('/path/to/new/library') makes “/path/to/new/library” the first library searched when loading packages, followed by R’s site and system libraries. To keep any other libraries already on the list, call .libPaths(c('/path/to/new/library', .libPaths())) instead. The directory must already exist, otherwise it is silently ignored. It is thus very handy to call .libPaths('/path/to/project/library') to make your project-specific library the first place in which packages are searched for loading. However, changes to .libPaths() only persist during the R session in which it was run.
- install.packages(): When you install R packages with install.packages(), the default location for installing those packages is the first library path returned by .libPaths(). If you want a package installed somewhere other than that first library path, you can specify the desired path with the lib parameter in install.packages(): e.g., install.packages('PACKAGE', lib='path/to/the/project/library').
- library(): When loading a package with library(), R will search for packages across libraries in the order in which .libPaths() lists the library paths. If you instead want to specify the library path from which a package should be loaded, you can use the lib.loc parameter in library(): e.g., library(PACKAGE, lib.loc='path/to/the/project/library').
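The effect of .libPaths() on the search order can be tried out safely with a temporary directory. This is a minimal sketch in base R; the variable names are hypothetical, and a temporary directory stands in for a real project library.

```r
# Assumed project library location (a temporary directory in this sketch).
project_lib <- file.path(tempdir(), "project_library")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

old_paths <- .libPaths()

# Put the project library first while keeping every previously listed library.
.libPaths(c(project_lib, old_paths))
first_lib <- .libPaths()[1]
print(first_lib)

# install.packages('PACKAGE') would now install into project_lib by default;
# the explicit form is install.packages('PACKAGE', lib = project_lib).

# Put the original search order back (the change is per-session anyway).
.libPaths(old_paths)
```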
R-related file:
- .Renviron: You can specify R environment variables in this file, including R_LIBS_USER, which is a path that will be prepended to the library paths maintained by .libPaths(). For example, if you have R_LIBS_USER=path/to/the/project/library/%v in the .Renviron file in a directory, any R session started from that directory will first look for packages to load in path/to/the/project/library/%v, where %v is replaced by the version of R being used without the patch level (e.g., 4.4). The directory must already exist when R starts, otherwise it is left out of the library paths.
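To see which directory a %v entry refers to for the R version you are running, the expansion can be reproduced by hand. This is a sketch in base R; the template value is the placeholder path from the example above, and the variable names are hypothetical.

```r
# Assumed template, as it might appear after "R_LIBS_USER=" in .Renviron.
template <- "path/to/the/project/library/%v"

# %v stands for the R version without the patch level, e.g. "4.4" for R 4.4.0.
minor_only <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]
r_version_short <- paste(R.version$major, minor_only, sep = ".")

expanded <- sub("%v", r_version_short, template, fixed = TRUE)
print(expanded)

# The directory has to exist before R starts for it to be used:
# dir.create(expanded, recursive = TRUE)
```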