R in WeBWorK

R integration gives problem authors the ability to run arbitrary computations in R and make their results available to the rest of the PG code as if they had been constructed using the standard PG functions. (In our running example, we could construct the random vector in R with `sample`, and/or calculate its mean with `mean`.) The motivation is R's rich library of high-quality statistical functions and its graphical abilities. While in theory both could be replicated in PG, doing so would take a huge effort that is better spent using the functionality already available in R.

Required setup

Use a compatible version of WeBWorK

As of version 2.13, WeBWorK has the built-in capability to use R code in authoring problems. Make sure that you have updated your server according to the upgrade instructions.

After you have a working WeBWorK server, there are three distinct steps that need to happen:

  1. set up the R server
  2. install the Perl module for the Perl-R bridge
  3. configure WeBWorK with the location of the R server

Set up the R server

Your R server (which can be on the same server as WeBWorK, but in a real-world scenario will more likely run on its own hardware/VM) is quite easy to set up:

  1. install R following OS-specific instructions
  2. install the Rserve server, which allows remote clients to execute R code on the Rserve host and returns the result of the execution in response. The easiest way is to run (as an administrative user) Rscript -e 'install.packages("Rserve", repos="https://cran.rstudio.com")'. Note that this package includes C source that needs to be compiled, so the basic developer tools must be present on the server. On Ubuntu there is also a package r-cran-rserve available.
  3. run the Rserve service as appropriate to your system. The command you need to run is R CMD Rserve. This will start the Rserve daemon, which will listen on port 6311. The daemon only accepts connections from localhost; if you run WeBWorK and Rserve on separate servers, read the section on separate hosts below for additional configuration steps. For systems using systemd (RHEL/CentOS 7 and later, Ubuntu 15 and later), you may use the following instructions to have Rserve start on boot:
    • Download this file, and place it in /usr/lib/systemd/system. Now you can start Rserve with the command (as a superuser) systemctl start rserve. You can also set Rserve to start at startup with the command systemctl enable rserve.
    • Note: if the folder /usr/lib/systemd/system does not exist, your system may expect you to place the file in /lib/systemd/system instead. You should also check to see where R is installed on your system. The file linked to above assumes that R is installed in /usr/lib64/R but on some systems (for example, Ubuntu 18.04) it might be in /usr/lib/R, in which case you'll have to edit the file to correct this path (in two places).
    • Note: if you run the service with a high number of users and do a lot of temporary file I/O (see the last example below), you might eventually run into a situation where Rserve is still running but not responding to requests from WeBWorK. The cause appears to be the total number of open file handles, although this isn't entirely clear. Thus, it can be useful to force a restart of the daemon every few days. You can do this by modifying the [Service] section of the unit file to say
   Restart=always
   RuntimeMaxSec=7d

If you then run

   systemctl daemon-reload
   systemctl restart rserve

then the Rserve service will restart itself once a week, which can avoid this sort of behaviour.

Install the Perl-R bridge

The Perl module Statistics::R::IO implements Rserve's communication protocol in Perl and translates R data structures into Perl's. It is available on CPAN and can be installed in the standard manner for Perl modules, e.g., by running (as an admin user) cpan Statistics::R::IO.
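
A quick way to confirm that the module is visible to a given Perl installation is to try loading it. The following is a minimal sketch and not part of WeBWorK; the helper name module_available is made up for illustration. It only checks the Perl interpreter that runs the script, which may be configured differently from the one used by Apache.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded
# by the Perl interpreter running this script, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Convert Foo::Bar into Foo/Bar.pm, the form "require" expects
    # when it is given a string rather than a bareword.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

my $module = 'Statistics::R::IO';
if (module_available($module)) {
    print "$module is installed\n";
} else {
    print "$module could not be loaded\n";
}
```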

Configure WeBWorK with the location of the R server

PG macros that communicate with R need to know the location of the R server, which is given as a host name. You could set this in "${WW_PREFIX}/webwork2/conf/defaults.conf", but since it is a site-specific setting, you should put it in localOverrides.conf:

$pg{specialPGEnvironmentVars}{Rserve} = {host => "localhost"};

The value of this variable should be a reference to a hash with at least the key host. If you are running Rserve on a non-standard port (i.e., not on 6311), you should specify it with the key port.
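
For instance, a setup with Rserve on a non-standard port might look like the sketch below. The port number 6312 is only a placeholder chosen for illustration.

```perl
# localOverrides.conf (sketch): Rserve on the same machine, non-standard port.
# 6312 is a placeholder; use whatever port your Rserve instance listens on.
$pg{specialPGEnvironmentVars}{Rserve} = {host => "localhost", port => 6312};
```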

You should now be able to load questions which call R. There are a number of such questions already in the OPL, for example Library/UBC/STAT/STAT300/hw07/stat300_hw07_q02.pg (which can be found under Statistics -> Simple linear regression -> Hypothesis tests). If your R server is working properly, you should see a scatterplot in this question.

Additional configuration when WeBWorK and R are on separate hosts

If you run WeBWorK and R on separate hosts, you can either set up a tunnel to forward port 6311 from WeBWorK to R's host, or do the following:

  1. set up Rserve to listen on all network interfaces, not just localhost, by adding a line "remote enable" to the file "/etc/Rserv.conf":
     cat <<'EOF' >> /etc/Rserv.conf
     remote enable
     EOF
  2. Use the correct host name (instead of "localhost") in the "Rserve host" line of "localOverrides.conf". For example:
    $pg{specialPGEnvironmentVars}{Rserve} = {host => "www.example.com"};

Note that in this case you will also want to set up the Rserve host's firewall to only allow connections from the WeBWorK host(s); otherwise it will happily execute arbitrary code from any Rserve client anywhere on the internet!

Troubleshooting Installation Problems

Please note that you must be running at least version 2.12-r (preferably 2.13 or newer) of WeBWorK in order for the integration to work. It's mentioned at the top of the page, but it's easy to miss.

Then, if you happen to be on CentOS 7, there's a special configuration set for root which causes the CPAN install of Statistics::R::IO to not be recognized by Apache and WeBWorK. See this forum post for more details on disabling this. Another forum thread which goes through the troubleshooting step-by-step is here.

If you are using R 3.5.0 or newer, there is an incompatibility issue with version 1.7 of Rserve, which is the version provided by CRAN as of this writing. If you receive errors like "Unrecognized response type" when trying to load problems involving R, then this could be the issue. The solution is to install the latest version of Rserve from source, which can be done by running the following command in the R console (as root):

install.packages('Rserve',,"http://rforge.net/",type="source")

Authoring problems with R code

The way the R integration works is that WeBWorK uses a Perl module that can talk to a server running the Rserve software, which allows remote clients to execute R code on the Rserve's server and returns the execution's result in response. The Perl module converts this response from R's native values (e.g., a generic vector, aka "list") to those understandable to Perl (e.g., an array), making them available to the rest of the PG code.

Loading the macros

To use R code in a problem, include "RserveClient.pl" in the "loadMacros" call at the start of the question. For example:

loadMacros(
  "PGstandard.pl",     # Standard macros for PG language
  "MathObjects.pl",
  "RserveClient.pl"    # <--- R integration
);

Basic Rserve macros

The Rserve software creates an R session for each remote client. This means that clients' interactions with R are kept separate from each other, just as if you started R twice on your local computer. A session persists as long as the client is connected, so that multiple calls from the client using the same session see the objects created in previous calls. (This behaviour mirrors what happens in a local session, where each R command you execute at the console after pressing ENTER sees the results of earlier commands.) When a session is *closed*, its contents are wiped without a trace, just like quitting an R application run locally.

The RserveClient offers macros to start and finish a session, and execute R commands in the current session:

  1. rserve_eval("some R code"): this function sends the R code given as its string argument to Rserve for execution. It returns *an array* representation of the R code's result. (This means that the value of rserve_eval("pi") is an array with a single element 3.14159265358979.) If you want to keep this value and use it in the rest of the problem, assign it to an array variable. For example:

    @pi = rserve_eval("pi");

    Note: Multiple calls within the same problem share the R session and the object workspace, so you can break up your R code into as many rserve_eval statements as you'd like.

  2. rserve_start(), rserve_finish(): Start up and close the current connection to the Rserve server. In normal use, these functions are completely optional because the first call to rserve_eval will start the Rserve session if one is not already open. Similarly, the current session will be automatically closed at the end of the problem. Other than backward compatibility, the only reason for using these functions is to start a new clean session within a single problem, which shouldn't be a common occurrence.
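
As an illustration of the shared session, the fragment below splits a computation across two calls. It is a sketch meant to sit in the setup part of a PG problem that has loaded RserveClient.pl, so it cannot run outside WeBWorK; the variable names are arbitrary.

```perl
# First call: create an object in the R session.
# Single quotes keep the "$" in the R code literal.
rserve_eval('speeds <- cars$speed');

# Second call: the object "speeds" is still available, because both
# calls share one R session for the duration of the problem.
my ($mean_speed) = rserve_eval('mean(speeds)');
```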

A note on Perl quoting rules

Beware of Perl's quoting rules when writing R code. The text in double quotes gets interpreted for escape sequences (e.g., "\n" represents a newline) and variables (e.g., "The value of pi is $pi[0]" will be interpolated into "The value of pi is 3.14159265358979", given the code above). This is a problem if you're trying to extract an element of a list by name using the "$" operator in R, because the text following it will be interpreted as a variable. For instance, running rserve_eval("cars$speed") will not return the "speed" column of the standard "cars" dataset, because "$speed" in the string will be replaced by the value of the PG variable $speed, which, if not yet defined, will be the empty string, so that the R code that actually gets executed is simply "cars". Instead, use single quotes, which prevent variable and escape sequence interpolation and keep the string exactly as entered: rserve_eval('cars$speed').

On the other hand, sometimes you might actually want variable interpolation to be done, for instance to construct R code that uses values of variables constructed with PG functions. For instance:

Context("Numeric");

$pi = Real("pi");
@difference = rserve_eval("pi - $pi");

will calculate the difference between R's and PG's values of "pi" and put the result in the @difference array. Note that the same R code can be constructed using Perl's string concatenation operator dot ("."): rserve_eval('pi - ' . $pi). Personally, I recommend sticking with single quotes to prevent unwanted surprises, and using the dot operator when you need to include the value of a PG variable.
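
The quoting behaviour described above can be checked in plain Perl, without WeBWorK or R. The following is a small sketch; the variables $speed and $mean are stand-ins invented for the demonstration, and the strings are only printed, not sent to R.

```perl
use strict;
use warnings;

# Stand-ins for variables that might exist in a PG problem.
my $speed = 42;
my $mean  = 1.5;

# Double quotes interpolate: "$speed" is replaced by the Perl variable's value.
my $interpolated = "cars$speed";

# Single quotes keep the text verbatim, so R would see its "$" operator.
my $verbatim = 'cars$speed';

# Inside double quotes, a backslash also protects the "$".
my $escaped = "cars\$speed";

# The dot operator splices a Perl value into otherwise literal R code.
my $spliced = 'dnorm(x, mean=' . $mean . ')';

print "$interpolated\n";   # prints cars42
print "$verbatim\n";       # prints cars$speed
print "$escaped\n";        # prints cars$speed
print "$spliced\n";        # prints dnorm(x, mean=1.5)
```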

Displaying R graphics

R has excellent facilities for creating production-quality statistical graphics, from simple scatterplots to complex spatial visualizations overlaid on geographical maps. These graphics can be produced in a variety of formats (in R parlance, devices), from the user's monitor to PDF or JPG files. The RserveClient allows the author to present these graphics in the question by bracketing the R graphing code with the macros rserve_start_plot and rserve_finish_plot, and then showing the produced image in the question with PG's macro image (see the WeBWorK documentation).

The following code is a complete example:

DOCUMENT();

loadMacros(
   "PGstandard.pl",     # Standard macros for PG language
   "MathObjects.pl",
   "RserveClient.pl",
);

# Print problem number and point value (weight) for the problem
TEXT(beginproblem());

#  Setup
Context("Numeric");

$mean = random(-2, 2, .5);

$img = rserve_start_plot('png');
rserve_eval('curve(dnorm(x, mean=' . $mean . '), xlim=c(-4, 4)); 0');
$image_path = rserve_finish_plot($img);

#  Text
Context()->texStrings;
BEGIN_TEXT

What is the mean of the normal distribution shown in the figure below:
\{ ans_rule(5) \}

$PAR

\{ image($image_path, width=>300, height=>300) \}
END_TEXT

Context()->normalStrings;

#  Answers
ANS(Real($mean)->cmp);

ENDDOCUMENT();

The four key lines are as follows:

  1. $img = rserve_start_plot('png'): sets up R to plot to a 'PNG' file and returns a unique plot identifier to be used later.

  2. rserve_eval('curve(...)'): runs plotting commands on the R server

  3. $image_path = rserve_finish_plot($img): completes the plotting to the PNG file and transfers it to a location on the WeBWorK server. Returns the path of the file, which is stored in the Perl variable $image_path and later used as the first argument to the image macro.

  4. \{ image($image_path, width=>300, height=>300) \}: inserts the image into the web page.

Transferring files from the R server

Sometimes it may be convenient to make a file from the R server available to the student via a link in WeBWorK (for instance, using R to generate a potentially randomized data file that the student can download to work on the problem offline). The macro rserve_get_file REMOTE_NAME [, LOCAL_NAME] transfers the file REMOTE_NAME from the R server to WeBWorK's temporary file area and returns the name of the local file, which can then be used by the htmlLink macro. Specifying LOCAL_NAME is optional; if it is omitted, the filename portion of REMOTE_NAME is used.

The following code is a complete example:

DOCUMENT();

loadMacros(
   "PGstandard.pl",     # Standard macros for PG language
   "MathObjects.pl",
   "RserveClient.pl",
);

# Print problem number and point value (weight) for the problem
TEXT(beginproblem());

#  Setup
Context("Numeric");

my ($intercept, $slope) = rserve_eval('coef(lm(log(dist)~log(speed), data = cars))');

my ($remote_file) = rserve_eval('filename <- tempfile(fileext=".csv"); write.csv(cars, filename); filename');
my $local_file = rserve_get_file($remote_file);

($local_url = $local_file) =~ s|$tempDirectory|$tempURL|;

#  Text
Context()->texStrings;
BEGIN_TEXT

What is the slope of the linear regression of log-transformed stopping distance vs. car speed in the dataset linked below:
\{ ans_rule(5) \}

$PAR

\{ htmlLink($local_url, "Download") \} the problem data (CSV file).

END_TEXT

Context()->normalStrings;

#  Answers
ANS(Real($slope)->cmp);

ENDDOCUMENT();

The four key lines are as follows:

  1. my ($remote_file) = rserve_eval('filename <- tempfile(fileext=".csv"); write.csv(cars, filename); filename'): stores the desired dataset into a temporary CSV file on the R server and returns its path, which is stored in Perl variable $remote_file.

  2. my $local_file = rserve_get_file($remote_file): transfers the file from the R server to WeBWorK's temporary file area and returns its path, which is stored in Perl variable $local_file.

  3. ($local_url = $local_file) =~ s|$tempDirectory|$tempURL|: converts the local file path into a URL that can be used as an argument to the htmlLink macro, saving it in Perl variable $local_url.

  4. \{ htmlLink($local_url, "Download") \}: inserts the link to the downloaded file into the web page.
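
The path-to-URL substitution in step 3 can be tried in plain Perl. The sketch below uses made-up values for $tempDirectory and $tempURL (in a real problem WeBWorK supplies them) and a made-up file name. It differs from the problem code in one respect: the directory is anchored with ^ and quoted with \Q...\E, so that characters such as "." in the path are matched literally instead of as regular expression metacharacters.

```perl
use strict;
use warnings;

# Hypothetical values; WeBWorK provides the real ones to PG problems.
my $tempDirectory = '/opt/webwork/webwork2/htdocs/tmp/myCourse/';
my $tempURL       = '/webwork2_files/tmp/myCourse/';

# Hypothetical result of rserve_get_file: a file in the temporary area.
my $local_file = $tempDirectory . 'cars1234.csv';

# Copy the path, then replace the leading directory with the URL prefix.
(my $local_url = $local_file) =~ s|^\Q$tempDirectory\E|$tempURL|;

print "$local_url\n";   # prints /webwork2_files/tmp/myCourse/cars1234.csv
```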