R in WeBWorK

From WeBWorK_wiki
Revision as of 13:29, 24 November 2016 by Dglin (talk | contribs) (Add instructions for switching branches, instructions on systemd, and path to sample problem)

R integration gives problem authors the ability to run arbitrary computations in R and make their results available to the rest of the PG code as if they had been constructed using the standard PG functions. (In our running example, we could construct the random vector in R with `sample`, and/or calculate its mean with `mean`.) The motivation is R's rich library of high-quality statistical functions and its graphical abilities. While in theory both could be replicated in PG, doing so would take a huge effort that is better spent using the functionality already available in R.

Required setup

Use a compatible version of WeBWorK

Starting with PG pull request #214, WeBWorK has a built-in capability to use R code in authoring problems. If you have WeBWorK 2.12 installed, you will need to switch to a version which supports R integration:

  • develop - the development version of WeBWorK. This is the latest WeBWorK code that includes untested features. While efforts are made to keep this stable, it is pre-release, and thus may have undiscovered issues.
  • 2.12-r - a copy of the 2.12 release of WeBWorK with only the Rserve support added on. If you are running 2.12, switching to this branch should not require any significant changes to your WeBWorK installation.

To switch to one of these versions, you will need to run the following commands as root (or add sudo in front of the commands):

   cd /opt/webwork/webwork2
   git fetch origin
   git branch -t [version] origin/[version]
   git checkout [version]
   cd /opt/webwork/pg
   git fetch origin
   git branch -t [version] origin/[version]
   git checkout [version]

(where [version] is either develop or 2.12-r).

If you switch to the develop version, you may need to log into the admin course and upgrade your courses.

After you have a working WeBWorK server, there are three distinct steps that need to happen:

  1. set up the R server
  2. install the Perl module for the Perl-R bridge
  3. configure WeBWorK with the location of the R server

Set up the R server

Your R server (which can be on the same server as WeBWorK, but in a real-world scenario will more likely run on its own hardware/VM) is quite easy to set up:

  1. install R following OS-specific instructions
  2. install the Rserve server, which allows remote clients to execute R code on the R server and returns the result of the execution in response. The easiest way is to run (as an administrative user) Rscript -e 'install.packages("Rserve", repos="https://cran.rstudio.com")'. Note that this package includes C source that needs to be compiled, so the basic developer tools must be present on the server.
  3. run the Rserve service as appropriate to your system. The command you need to run is R CMD Rserve. This starts the Rserve daemon, which listens on port 6311. The daemon only accepts connections from localhost; if you run WeBWorK and Rserve on separate servers, read the section "Additional configuration when WeBWorK and R are on separate hosts" below for the extra configuration steps. For systems using systemd (RHEL/CentOS 7 and later, Ubuntu 15.04 and later), you may use the following instructions to have Rserve start on boot:
    • Download this file, and place it in /usr/lib/systemd/system. Now you can start Rserve with the command (as a superuser) systemctl start rserve. You can also set Rserve to start at startup with the command systemctl enable rserve.

Install the Perl-R bridge

The Perl module Statistics::R::IO implements Rserve's communication protocol in Perl and translates R data structures into Perl's. It is available on CPAN and can be installed in the standard manner for Perl modules, e.g., by running (as an admin user) cpan Statistics::R::IO.
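
Once the module is installed and Rserve is running, the bridge can be checked from the command line before touching any WeBWorK configuration. The following is a minimal sketch, not part of the WeBWorK distribution: it assumes Rserve is listening on localhost at the default port 6311, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Smoke test for the Perl-R bridge. Host and port are assumptions:
# adjust them to wherever your Rserve daemon is listening.
my ($host, $port) = ('localhost', 6311);

# Everything that can fail (missing module, refused connection, R error)
# happens inside eval, so the script always ends with a readable message.
my $status = eval {
    require Statistics::R::IO::Rserve;
    my $rserve = Statistics::R::IO::Rserve->new(server => $host, port => $port);
    my $result = $rserve->eval('1 + 1');   # an R value object
    my $value  = $result->to_pl->[0];      # convert to Perl, take the first element
    $rserve->close;
    "OK: Rserve at $host:$port returned 1 + 1 = $value";
} || "FAILED: $@";

print "$status\n";
```

A line starting with FAILED usually points either to the module not being installed or to Rserve not accepting connections at the given host and port.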

Configure WeBWorK with the location of the R server

PG macros that communicate with R need to know the location of the R server, given as a host name and, optionally, a port. You can set it by modifying "${WW_PREFIX}/webwork2/conf/defaults.conf", but since this is a local setting, you should put it in localOverrides.conf:

$pg{specialPGEnvironmentVars}{Rserve} = {host => "localhost"};

The value of this variable should be a reference to a hash with at least the key host. If you are running Rserve on a non-standard port (i.e., not on 6311), you should specify it with the key port.
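
For instance, if Rserve were listening on a non-default port, the override might look like the sketch below. The port number 6312 is purely illustrative; use whatever port your Rserve daemon was actually configured with.

```perl
# In localOverrides.conf. The port value here is a made-up example;
# omit the "port" key entirely when Rserve uses the default 6311.
$pg{specialPGEnvironmentVars}{Rserve} = {
    host => "localhost",
    port => 6312,
};
```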

You should now be able to load questions which call R. There are a number of such questions already in the OPL, for example Library/UBC/STAT/STAT300/hw07/stat300_hw07_q02.pg (which can be found under Statistics -> Simple linear regression -> Hypothesis tests). If your R server is working properly, you should see a scatterplot in this question.


Additional configuration when WeBWorK and R are on separate hosts

If you run WeBWorK and R on separate hosts, you can either set up a tunnel to forward port 6311 from WeBWorK to R's host, or do the following:

  1. set up Rserve to listen on all network interfaces, not just localhost, by adding the line "remote enable" to the file "/etc/Rserv.conf":
cat <<'EOF' >> /etc/Rserv.conf
remote enable
EOF
  2. use the correct host name (instead of "localhost") in the "Rserve host" line of "localOverrides.conf". For example:
 $pg{specialPGEnvironmentVars}{Rserve} = {host => "www.example.com"};

Note that in this case you will also want to set up the Rserve host's firewall to only allow connections from the WeBWorK host(s), or otherwise it will happily execute arbitrary code from any Rserve client anywhere on the internet!

Authoring problems with R code

The way the R integration works is that WeBWorK uses a Perl module that can talk to a server running the Rserve software, which allows remote clients to execute R code on the Rserve's server and returns the execution's result in response. The Perl module converts this response from R's native values (e.g., a generic vector, aka "list") to those understandable to Perl (e.g., an array), making them available to the rest of the PG code.

Loading the macros

To use R code in a problem, include "RserveClient.pl" in the "loadMacros" call at the start of the question. For example:

loadMacros(
  "PGstandard.pl",     # Standard macros for PG language
  "MathObjects.pl",
  "RserveClient.pl"    # <--- R integration
);

Basic Rserve macros

The Rserve software creates an R session for each remote client. This means that clients' interactions with R are kept separate from each other, just as if you started R twice on your local computer. A session persists as long as the client is connected, so that multiple calls from the client using the same session see the objects created in previous calls. (This behaviour mirrors what happens in a local session, where each R command you execute at the console after pressing ENTER sees the results of earlier commands.) When a session is *closed*, its contents are wiped without a trace, just like quitting an R application run locally.

The RserveClient offers macros to start and finish a session, and execute R commands in the current session:

  1. rserve_eval("some R code"): this function sends the R code given as its string argument to Rserve for execution. It returns *an array* representation of the R code's result. (This means that the value of rserve_eval("pi") is an array with a single element, 3.14159265358979.) If you want to keep this value and use it in the rest of the problem, assign it to an array variable. For example:

    @pi = rserve_eval("pi");

    Note: Multiple calls within the same problem share the R session and the object workspace, so you can break up your R code in as many rserve_eval statements as you'd like.

  2. rserve_start(), rserve_finish(): Start up and close the current connection to the Rserve server. In normal use, these functions are completely optional because the first call to rserve_eval will start the Rserve session if one is not already open. Similarly, the current session will be automatically closed at the end of the problem. Other than backward compatibility, the only reason for using these functions is to start a new clean session within a single problem, which shouldn't be a common occurrence.
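
To illustrate how calls share a session, here is a short PG fragment. It is a sketch that only uses the macros described above; the R object name x and the Perl variable names are arbitrary, and it assumes a working Rserve connection.

```perl
loadMacros("PGstandard.pl", "MathObjects.pl", "RserveClient.pl");

# First call: create a vector in this problem's R session.
rserve_eval('x <- c(2, 4, 6, 8)');

# Second call: the same session still holds x, so R can compute on it.
# A length-one result arrives as a one-element list, hence the parentheses.
my ($mean) = rserve_eval('mean(x)');

# Third call: a longer R vector is returned as a Perl array.
my @squares = rserve_eval('x^2');
```

No call to rserve_start or rserve_finish is needed here, since the first rserve_eval opens the session and the end of the problem closes it.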

A note on Perl quoting rules

Beware of Perl's quoting rules when writing R code. Text in double quotes is interpreted for escape sequences (e.g., "\n" represents a newline) and variables (e.g., "The value of pi is $pi[0]" will be interpolated into "The value of pi is 3.14159265358979", given the code above). This is a problem if you're trying to extract an element of a list by name using the "$" operator in R, because the text following it will be interpreted as a Perl variable. For instance, running rserve_eval("cars$speed") will not return the "speed" column of the standard "cars" dataset, because "$speed" in the string will be replaced by the value of the PG variable $speed, which, if not yet defined, will be an empty string, so that the R code that actually gets executed is simply "cars". Instead, use single quotes, which prevent variable and escape sequence interpolation and keep the string exactly as entered: rserve_eval('cars$speed').

On the other hand, sometimes you may actually want variable interpolation, e.g., to construct R code that uses the values of variables created with PG functions. For instance:

Context("Numeric");

$pi = Real("pi");
@difference = rserve_eval("pi - $pi");

will calculate the difference between R's and PG's values of "pi" and put the result in the @difference array. Note that the same R code can be constructed using Perl's string concatenation operator, the dot ("."): rserve_eval('pi - ' . $pi). Personally, I recommend sticking with single quotes to prevent unwanted surprises, and using the dot operator when you need to include the value of a PG variable.
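
The difference between the three ways of building the R code string can be seen in plain Perl, without any R server. In this sketch the variable values are invented for illustration, and the strings are only printed rather than sent to rserve_eval.

```perl
use strict;
use warnings;

# A Perl variable that happens to share its name with an R list element.
# (The value 42 is made up for this demonstration.)
my $speed = 42;

# Double quotes: Perl substitutes $speed before R would ever see the string.
my $double = "cars$speed";

# Single quotes: the string is kept exactly as written.
my $single = 'cars$speed';

# Concatenation: literal R code in single quotes, Perl value appended with ".".
my $pi     = 3.14;
my $concat = 'pi - ' . $pi;

print "double-quoted: $double\n";   # prints: double-quoted: cars42
print "single-quoted: $single\n";   # prints: single-quoted: cars$speed
print "concatenated:  $concat\n";   # prints: concatenated:  pi - 3.14
```

Only the single-quoted form leaves R's "$" operator intact, which is why it is the safer default.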

Displaying R graphics

R has excellent facilities for creating production-quality statistical graphics, from simple scatterplots to complex spatial visualizations overlaid on geographical maps. These graphics can be produced in a variety of formats (in R parlance, devices), from the user's monitor to PDF or JPG files. The RserveClient allows the author to present these graphics in the question by bracketing the R graphing code with the macros rserve_start_plot and rserve_finish_plot, and then showing the produced image in the question with PG's macro image (see the WeBWorK documentation).

The following code is a complete example:

DOCUMENT();

loadMacros(
   "PGstandard.pl",     # Standard macros for PG language
   "MathObjects.pl",
   "RserveClient.pl",
);

# Print problem number and point value (weight) for the problem
TEXT(beginproblem());

#  Setup
Context("Numeric");

$mean = random(-2, 2, .5);

$img = rserve_start_plot('png');
rserve_eval('curve(dnorm(x, mean=' . $mean . '), xlim=c(-4, 4)); 0');
$image_path = rserve_finish_plot($img);

#  Text
Context()->texStrings;
BEGIN_TEXT

What is the mean of the normal distribution shown in the figure below?
\{ ans_rule(5) \}

$PAR

\{ image($image_path, width=>300, height=>300) \}
END_TEXT

Context()->normalStrings;

#  Answers
ANS(Real($mean)->cmp);

ENDDOCUMENT();

The four key lines are as follows:

  1. $img = rserve_start_plot('png'): sets up R to plot to a 'PNG' file and returns a unique plot identifier to be used later.

  2. rserve_eval('curve(...)'): runs plotting commands on the R server

  3. $image_path = rserve_finish_plot($img): completes the plotting to the PNG file and transfers it to a location on the WeBWorK server. Returns the path of the file, which is stored in the Perl variable $image_path and later used as the first argument to the image macro.

  4. \{ image($image_path, width=>300, height=>300) \}: inserts the image into the web page.

Transferring files from the R server

Sometimes it may be convenient to make a file from the R server available to the student via a link in WeBWorK. (For instance, using R to generate a (potentially randomized) data file that the student can download to work on the problem offline.) The macro rserve_get_file REMOTE_NAME [, LOCAL_NAME] transfers the file REMOTE_NAME from the R server to WeBWorK's temporary file area and returns the name of the local file, which can then be used by the htmlLink macro. Specifying LOCAL_NAME is optional; if it is not specified, the filename portion of REMOTE_NAME is used.

The following code is a complete example:

DOCUMENT();

loadMacros(
   "PGstandard.pl",     # Standard macros for PG language
   "MathObjects.pl",
   "RserveClient.pl",
);

# Print problem number and point value (weight) for the problem
TEXT(beginproblem());

#  Setup
Context("Numeric");

my ($intercept, $slope) = rserve_eval('coef(lm(log(dist)~log(speed), data = cars))');

my ($remote_file) = rserve_eval('filename <- tempfile(fileext=".csv"); write.csv(cars, filename); filename');
my $local_file = rserve_get_file($remote_file);

($local_url = $local_file) =~ s|$tempDirectory|$tempURL|;

#  Text
Context()->texStrings;
BEGIN_TEXT

What is the slope of the linear regression of log-transformed stopping distance on log-transformed car speed in the dataset linked below?
\{ ans_rule(5) \}

$PAR

\{ htmlLink($local_url, "Download") \} the problem data (CSV file).

END_TEXT

Context()->normalStrings;

#  Answers
ANS(Real($slope)->cmp);

ENDDOCUMENT();

The four key lines are as follows:

  1. my ($remote_file) = rserve_eval('filename <- tempfile(fileext=".csv"); write.csv(cars, filename); filename'): stores the desired dataset into a temporary CSV file on the R server and returns its path, which is stored in Perl variable $remote_file.

  2. my $local_file = rserve_get_file($remote_file): transfers the file from the R server to WeBWorK's temporary file area and returns its path, which is stored in Perl variable $local_file.

  3. ($local_url = $local_file) =~ s|$tempDirectory|$tempURL|: converts the local file path into a URL that can be used as an argument to the htmlLink macro, saving it in Perl variable $local_url.

  4. \{ htmlLink($local_url, "Download") \}: inserts the link to the downloaded file into the web page.
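
The substitution in step 3 can be tried in isolation. The sketch below uses invented values for $tempDirectory, $tempURL and the file name; inside a real problem WeBWorK provides the first two, and rserve_get_file returns the third.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the values WeBWorK supplies to a problem.
my $tempDirectory = '/opt/webwork/webwork2/htdocs/tmp/mycourse/';
my $tempURL       = '/webwork2_files/tmp/mycourse/';

# Hypothetical path of the kind rserve_get_file might return.
my $local_file = $tempDirectory . 'file1a2b3c.csv';

# Copy the path, then swap the directory prefix for the URL prefix.
(my $local_url = $local_file) =~ s|$tempDirectory|$tempURL|;

print "$local_url\n";   # prints: /webwork2_files/tmp/mycourse/file1a2b3c.csv
```

The parentheses around the assignment matter: they make the substitution act on the copy in $local_url, leaving $local_file unchanged.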