Managing packages and environments in command-line R
In this session, we will begin by using R from the command line. Later, we will cover similar steps using RStudio Server available on Open OnDemand. We will primarily focus on using the renv package for package management, but we will also note alternatives at the end.
Choosing which version of R to use
Multiple R versions are available in the environment module system. Note that modules are named with ‘r’ and the program available after the module is loaded is ‘R’. The renv package needs to be installed again for each new minor version of R you use.
- First, use the cluster’s environment module system to find the versions of R available for your project:
module spider r
or
ml spider r
(note the lower-case ‘r’!).
- Load the version of R you’d like to use, e.g.:
module load r/4.4.0
or
ml load r/4.4.0
- Run the command R to open an R session with the version of R you loaded from the module.
- Run the R command install.packages('renv') to install renv for this version of R.
- Run q() to exit R, and enter n when prompted to save the workspace image.
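Because renv has to be installed once per minor version of R, it can help to check two things before installing: which minor version the loaded module provides, and whether renv is already visible to it. The base-R sketch below does this without installing anything. The variable names are illustrative and not part of renv.

```r
# Which minor version of R is this session running (e.g., "4.4" for R 4.4.0)?
r_version <- format(getRversion())           # e.g., "4.4.0"
r_minor <- sub("\\.[0-9]+$", "", r_version)  # drop the patch level

# Is renv already installed in a library this R version can see?
has_renv <- requireNamespace("renv", quietly = TRUE)

if (has_renv) {
  message("renv ", format(packageVersion("renv")), " is available for R ", r_minor)
} else {
  message("renv is not installed for R ", r_minor, "; run install.packages('renv')")
}
```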
Creating and managing R environments with the renv package
To use renv for package management, it needs to be associated with an R project. The project can be an empty directory if you are just starting, or a directory that already holds one or more R scripts. Either way, the scope of the renv environment is set by the working directory in which it is initialized. To initialize an renv environment, use the renv::init() command. For getting started and for this workshop, we recommend passing two settings when initializing the environment:

renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE))

These settings have two effects:
- use.cache = FALSE installs packages directly into the project directory instead of into renv’s package cache in your home directory.
- ppm.enabled = FALSE stops renv from translating repository URLs into Posit Package Manager URLs. Those translations can be faulty when packages are downloaded from repositories.
- If you are not already in your workshop directory, change into it by running
cd /90daydata/shared/$USER/
- Create a project directory and change into it:
mkdir my_project
cd my_project
- Start an R session with R.
- Initialize renv by running renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)). Some handy messages will appear to describe what renv has done.
Exercise 1: What files have been added to the project directory? (Hint: use ls -a to include hidden files.) What kind of content do they contain? (Hint: use cat filename to print file contents to the screen.)
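If you prefer to stay inside R, you can do the same inspection with base R functions. list.files(all.files = TRUE) plays the role of ls -a, and readLines() plays the role of cat. The helper names list_project_files and show_file below are hypothetical. This is a small sketch, not part of renv.

```r
# List every file in a project, including hidden ones such as .Rprofile,
# but skip the installed package files under renv/library and renv/staging.
list_project_files <- function(project = ".") {
  files <- list.files(project, all.files = TRUE, recursive = TRUE)
  files[!grepl("^renv/(library|staging)/", files)]
}

# Print the contents of one file, like `cat filename` in the shell
show_file <- function(path) {
  cat(readLines(path, warn = FALSE), sep = "\n")
}

# Example, run from the project directory:
# list_project_files()
# show_file(".Rprofile")
```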
Exercise 2: Return to /90daydata/shared/$USER/, create a new project folder named project1, change into it, and save the following R code into a file named exercise2.R. (Hint: use nano exercise2.R.) Open an R session and initialize renv for the project. What kinds of messages appear now?
library(magrittr)
x <- help.search("*", package="base")
N <- 3
for(i in c(1:N,N:1)){
string <- x$matches$Title %>%
sample(i) %>%
cat('\n')
}
R has to be restarted before the project library is set up; until then, the exercise2.R script won’t run successfully.
- Quit the current session and run R to open a new one.
- Run the script with source('exercise2.R').
Now we are set up with a project with an renv environment!
Installing and managing R packages in your project library
Next, we will expand our project with additional packages!
- You can install packages into your environment as you normally would with install.packages('PACKAGE'). Alternatively, renv has an expanded installation function, renv::install('PACKAGE'), that supports additional remote package sources, e.g., GitHub. If you are interested in learning more about renv::install(), please see its documentation (e.g., run ?renv::install in R).
- To have renv save the state of the project in the environment configuration file called a ‘lockfile’, run renv::snapshot(). Saving the state means capturing the metadata of all the packages the project uses.
- To assess the state of the environment, run renv::status(). It reports, for example, which packages are installed but not used and which packages are used but not recorded.
- Let’s save the script below in our project and install an old version of the cli package, so we can simulate needing to update to the latest version next:
install.packages("https://cran.r-project.org/src/contrib/Archive/cli/cli_3.6.1.tar.gz", repos=NULL, type="source")
- If we call renv::status(), it will tell us the project is out of sync. If we then call renv::snapshot(), it will update the lockfile to match the project.
library(magrittr)
library(cli)
x <- help.search("*", package="base")
N <- 3
for(i in c(1:N,N:1)){
string <- x$matches$Title %>%
sample(i) %>%
col_magenta() %>%
cat('\n')
}
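renv::status() and renv::snapshot() decide which packages count as “used” by scanning the project’s R files for calls such as library(), require(), and pkg::fn(). renv does this with its own renv::dependencies() function. The base-R sketch below is only a rough approximation of that idea, meant to show what kind of code is being detected. The helper find_packages is hypothetical and ignores many cases that renv handles.

```r
# Roughly approximate which packages an R script uses by reading its parse
# tokens: library(pkg) / require(pkg) calls and pkg::fn references.
find_packages <- function(path) {
  pd <- utils::getParseData(parse(path, keep.source = TRUE))
  pd <- pd[pd$terminal, ]                  # keep only individual tokens
  pd <- pd[order(pd$line1, pd$col1), ]     # put them in source order

  # Package names written as pkg::fn or pkg:::fn
  pkgs <- pd$text[pd$token == "SYMBOL_PACKAGE"]

  # library(pkg) / require("pkg"): the name is two tokens after the call
  calls <- which(pd$token == "SYMBOL_FUNCTION_CALL" &
                   pd$text %in% c("library", "require"))
  for (i in calls) {
    arg <- pd[i + 2, ]                     # token after the opening "("
    if (arg$token %in% c("SYMBOL", "STR_CONST")) {
      pkgs <- c(pkgs, gsub("^['\"]|['\"]$", "", arg$text))
    }
  }
  sort(unique(pkgs))
}

# Example: find_packages("exercise2.R")
```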
Exercise 3: Update the version of cli with install.packages('cli'). Modify the environment to be consistent.
- Another renv function to make your project environment consistent is renv::restore(). It updates your project library to match your lockfile.
- For example, suppose we install the MASS package because we think we may need it, but later we don’t. renv::restore(clean=TRUE) will then remove it from the project library, because it is not recorded in the lockfile. renv::restore() can also be used to revert package version discrepancies like the one for cli above.
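Conceptually, renv::restore() compares the versions recorded in renv.lock with what is installed in the project library. It then reinstalls packages, or with clean = TRUE removes them, until the two agree. The base-R sketch below covers only the comparison step and rests on three assumptions:
- The expected versions come from a hand-written named character vector, not from the JSON lockfile.
- Versions are compared as exact strings.
- The helper name version_drift is hypothetical.

```r
# Compare installed package versions against expected ones.
# `expected` is a named character vector, e.g. c(cli = "3.6.1").
version_drift <- function(expected, lib.loc = NULL) {
  installed <- vapply(names(expected), function(pkg) {
    tryCatch(as.character(packageVersion(pkg, lib.loc = lib.loc)),
             error = function(e) NA_character_)  # package not installed
  }, character(1))

  # A package drifts if it is missing or its version string differs
  drift <- is.na(installed) | installed != expected
  data.frame(package = names(expected),
             expected = unname(expected),
             installed = unname(installed),
             stringsAsFactors = FALSE)[drift, , drop = FALSE]
}

# Example: version_drift(c(cli = "3.6.1"))
```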
Reproduce renv projects
In order to make environments and package management truly useful, we need a mechanism to easily reproduce environments. With the renv project files we looked at before, you can have renv reproduce the same environment in a new project directory.
- The renv directories and files that should remain with the project are:
  - the renv.lock file
  - the renv/activate.R and renv/settings.json files
  - the .Rprofile file
  With these files, the project environment can be easily recreated, which helps ensure that your code and analyses are reproducible.
- If you are using git for version control for the project, you do not need to track the renv files that hold the installed packages themselves (e.g., renv/library). renv lists these files in a .gitignore file inside the renv directory for you.
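To reproduce a project in a fresh directory, you only need to copy the files listed above plus the project’s scripts. The base-R sketch below shows that copy step. It assumes the standard renv layout named above and copies only top-level R scripts. The helper copy_renv_project and its script_pattern argument are hypothetical.

```r
# Copy the renv metadata files and top-level R scripts from one project
# directory to another, leaving out installed packages (renv/library etc.).
copy_renv_project <- function(from, to, script_pattern = "\\.[Rr]$") {
  keep <- c("renv.lock", ".Rprofile",
            file.path("renv", "activate.R"),
            file.path("renv", "settings.json"))
  scripts <- list.files(from, pattern = script_pattern)  # top-level scripts only
  files <- c(keep, scripts)
  files <- files[file.exists(file.path(from, files))]   # skip anything absent

  dir.create(file.path(to, "renv"), recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(file.path(from, files), file.path(to, files))
  stopifnot(all(ok))
  invisible(files)
}

# Example: copy_renv_project("project1", "project2")
```

From the new directory, renv::restore() can then reinstall the package versions recorded in renv.lock.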
Exercise 4: Create a new project directory in your workshop directory. Copy over the script and lockfile from project1 into the new project. From the new project directory, run R. What happens? Try initializing.
Exercise 5: Create another new project directory in your workshop directory. Copy over all of project1’s files except renv/library and renv/staging (the package files) into the new project. From the new project directory, run R. What happens?
Managing packages and environments in RStudio Server
The approach of using renv in RStudio is very similar to using renv with command-line R. For completeness, we will create another environment in RStudio Server.
Choosing which version of R to use
Multiple R versions are available when requesting RStudio Server sessions on Open OnDemand. From the Open OnDemand page, select “Interactive Apps” > “RStudio Server”. You will be taken to a page with multiple input fields to configure your RStudio Server session and one of those is “R Version”.
- For the following exercise, select the following inputs:
- R Version: 4.3.3
- Account Name: scinet_workshop1
- Partition: atlas
- QOS: normal 14-00:00:00
- Number of hours: 2
- Number of nodes: 1
- Number of tasks: 1
- Additional Slurm Parameters: --reservation=workshop --mem=8G
- When you are in RStudio Server, install renv. We need to install renv again only because we chose a different version of R (4.3.3).
Creating and managing R environments with the renv package
Since renv is project-specific, you need to change the working directory to the project directory in which you would like to use renv. Once in the project directory, you can run renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)) to start managing the project environment with renv.
In RStudio Server, the interface also offers additional graphical features when renv is active. For example, there is an “renv” button at the top of the “Packages” pane. Clicking it opens a dropdown menu that includes shortcuts to the renv::snapshot() and renv::restore() functions.
- Create a new project directory.
- Initialize renv for the new project with renv::init(settings = list(use.cache = FALSE, ppm.enabled = FALSE)). If you do not see the activation message when R restarts, activate the environment manually with source('renv/activate.R').
- Add at least one script to the project and install the packages it uses with either install.packages('PACKAGE') or renv::install('PACKAGE').
- Take a snapshot of the environment with renv::snapshot() to update the lockfile.
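Suppose you are unsure whether the project environment is active, for example because the activation message did not appear. One informal check is whether the first library path points into the project’s renv/library directory. The base-R sketch below implements that check. It is a heuristic, not an renv function, and the helper name renv_active is hypothetical.

```r
# Heuristic: is the first library on the search path inside
# <project>/renv/library? If so, the project's renv library is in use.
renv_active <- function(project = getwd()) {
  first_lib <- normalizePath(.libPaths()[1], winslash = "/", mustWork = FALSE)
  proj_lib <- normalizePath(file.path(project, "renv", "library"),
                            winslash = "/", mustWork = FALSE)
  startsWith(first_lib, proj_lib)
}

# Example, from the project directory: renv_active()
```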
Exercise 6: Use the renv::install() function to download the development version of the ‘nsyllable’ package from GitHub at ‘quanteda/nsyllable’. Save the script below as an R script in your project directory. What does the program do? Use the “renv” button in the “Packages” pane to make sure this project’s lockfile captures the new package. Open the renv.lock file to see the entry for the new package.
library(magrittr)
library(nsyllable)
x <- help.search("*", package="base")
x$matches["nsyl"] <- nsyllable(x$matches$Title)
for(i in c(5,7,5)){
correct_length <- x$matches$nsyl == i
string <- x$matches$Title[correct_length] %>%
sample(1) %>%
cat('\n')
}
If you don’t want to use renv
renv is the main solution if you want a handful of commands to manage a project’s packages with a project-specific library and also document that process to increase reproducibility. If, for any reason, you are not a fan of renv, some commands and R-related files can help you at least manage the locations of package installations.
R commands:
- .libPaths(): This function returns the path(s) of the libraries from which packages may be loaded. If .libPaths() returns multiple paths, R searches for packages across the libraries in the order in which they are listed.
  The function can also set the library paths, but .libPaths('/path/to/new/library') replaces the current user libraries with "/path/to/new/library", keeping only the site and system libraries after it. To prepend a library while keeping the existing ones, use .libPaths(c('/path/to/new/library', .libPaths())). The new library then becomes the first one searched when loading packages. It is thus very handy to call .libPaths(c('/path/to/project/library', .libPaths())) to make your project-specific library the first place in which packages are searched for.
  Only directories that already exist are added, and changes to .libPaths() only persist during the R session in which it was run.
- install.packages(): By default, install.packages() installs packages into the first library path returned by .libPaths(). To install a package somewhere else, specify the desired path with the lib parameter, e.g., install.packages('PACKAGE', lib='path/to/the/project/library').
- library(): When loading a package with library(), R searches the libraries in the order in which .libPaths() lists them. To specify the library from which a package should be loaded, use the lib.loc parameter, e.g., library(PACKAGE, lib.loc='path/to/the/project/library').
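The sketch below puts these three functions together to set up a session-local project library. It assumes a temporary directory as a stand-in for /path/to/the/project/library. The install and load lines are commented out because installation needs network access, and magrittr is just an example package.

```r
# Stand-in for the project library; replace with your real path
proj_lib <- file.path(tempdir(), "project_library")
dir.create(proj_lib, recursive = TRUE, showWarnings = FALSE)  # .libPaths() skips missing dirs

# Prepend (rather than replace) so the existing libraries stay searchable
.libPaths(c(proj_lib, .libPaths()))
print(.libPaths())

# install.packages("magrittr", lib = proj_lib)   # install into the project library
# library(magrittr, lib.loc = proj_lib)           # load explicitly from it
```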
R-related file:
- .Renviron: You can set R environment variables in this file, including R_LIBS_USER, a library path that R adds near the front of the paths returned by .libPaths() when it starts.
  For example, suppose a directory’s .Renviron file contains the line R_LIBS_USER=path/to/the/project/library/%v. Any R session started from that directory will then look for packages in path/to/the/project/library/%v first. Here %v is expanded to the major.minor version of R being used (e.g., 4.4).
  The directory must already exist; otherwise, R leaves it out of the library paths.
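The sketch below writes such a .Renviron entry and also creates the matching directory for the running version of R, since a missing directory is skipped. The project path is a hypothetical placeholder (a temporary directory here). The %v placeholder is written literally, because R expands it at startup.

```r
project <- file.path(tempdir(), "my_project")         # hypothetical project directory
lib_template <- file.path(project, "library", "%v")  # R expands %v at startup

# What %v will become for this R version, e.g. "4.4"
r_minor <- sub("\\.[0-9]+$", "", format(getRversion()))
lib_dir <- sub("%v", r_minor, lib_template, fixed = TRUE)

# Create the version-specific library, then write the .Renviron entry
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(paste0("R_LIBS_USER=", lib_template), file.path(project, ".Renviron"))
```

After this, starting R from that project directory should show the expanded library path at or near the front of .libPaths().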