WeBWorK and High Availability Clusters

by Danny Glin -
Because we run a number of in-class quizzes on WeBWorK, it is important that our server have 100% uptime during the semester. To achieve this, we've built a small cluster to ensure that a single hardware failure doesn't take down the system.

I'll post my experiences here, and I'm curious if anyone else has tried something similar/different.

We have two web servers (VMs on different physical hosts) that handle the Apache side of things. Our central IT department has an enterprise-grade load balancer, so we are using that to handle incoming web requests. Web requests are sent to the load balancer, which then proxies them to one of the two web servers, alternating between them. This means we can have both servers serving WeBWorK requests simultaneously.

The WeBWorK database lives on a redundant (active-passive) pair of servers running MariaDB, with DRBD replicating the data to the backup server and Pacemaker handling failover if the primary server dies.
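For anyone building something similar, the Pacemaker side can be sketched roughly as follows using the pcs shell. This is only a sketch of the approach, not our actual configuration: the resource names, the DRBD resource (r0), and the IP address are all placeholders.

```shell
# Sketch of an active-passive MariaDB pair under Pacemaker/DRBD.
# Resource names, the DRBD resource (r0), and the VIP are placeholders.

# DRBD device holding /var/lib/mysql, promoted on one node at a time
pcs resource create ww_drbd ocf:linbit:drbd drbd_resource=r0 promotable

# Filesystem, floating IP, and MariaDB grouped so they move together
pcs resource create ww_fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/var/lib/mysql fstype=ext4
pcs resource create ww_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24
pcs resource create ww_db ocf:heartbeat:mysql
pcs resource group add ww_group ww_fs ww_vip ww_db

# The group may only run where DRBD is promoted, and only after promotion
pcs constraint colocation add ww_group with Promoted ww_drbd-clone INFINITY
pcs constraint order promote ww_drbd-clone then start ww_group
```

The web servers then point their database configuration at the floating IP, so a failover is invisible to them apart from a brief interruption.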

The /opt/webwork directory on each web server is mapped to a GlusterFS volume, which is replicated across two separate servers. This lets the two web servers read and write the same files, and if one of the Gluster servers fails, the other can continue to serve the files.
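For reference, a replicated Gluster volume like this can be set up along these lines (hostnames, brick paths, and the volume name are placeholders, not our actual setup):

```shell
# Sketch: replicated GlusterFS volume backing /opt/webwork.
# Hostnames (gluster1/gluster2), brick paths, and the volume name
# are placeholders.
gluster volume create webwork replica 2 \
    gluster1:/bricks/webwork gluster2:/bricks/webwork
gluster volume start webwork

# On each web server, mount the volume:
mount -t glusterfs gluster1:/webwork /opt/webwork

# Or the equivalent /etc/fstab entry, with a fallback volfile server:
# gluster1:/webwork  /opt/webwork  glusterfs  defaults,_netdev,backupvolfile-server=gluster2  0 0
```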

Overall this has been pretty reliable, with a couple of caveats:
  1. Having the WeBWorK files stored on a file share means that any request that opens many files can be very slow. This means a lot of patience when using the Library Browser. It doesn't seem to impact students.
  2. When servers fail, they don't necessarily do so gracefully. If a server powers down completely, then things tend to switch over as expected. If a server becomes unresponsive (runs out of memory, disk, etc.), then the other nodes don't always declare it dead, and may continue to try to use it.
I'd like to know what other people are doing to prevent their WeBWorK servers from going down.
In reply to Danny Glin

Re: WeBWorK and High Availability Clusters

by Michael Gage -
This is a great question, Danny, and I hope it will lead to an important discussion which we can summarize on the WeBWorK wiki, on the WeBWorK GitHub wiki (since the question is technical), or both. Universities with a large number of students using WW, or that use WW in class or for timed tests, care a lot about this.

I can't add a lot to the discussion myself, since at Rochester we have only about 1K students per semester, and since we don't use WW in class (just for homework) they are not all hitting the server at once. For us nearly any modern computer (and sometimes even a pretty old one) is adequate.
In reply to Danny Glin

Re: WeBWorK and High Availability Clusters

by Tony Box -
I'm currently trying to set up an AWS-based environment for WW that can scale infinitely.

I'm using Ansible to deploy, configure servers, and set up WW as they pop online. SSL will be offloaded to the load balancer.

The only shared resource between webservers in the cluster will be an NFS mount which contains the /opt/webwork/courses, /opt/webwork/webwork2/DATA, /opt/webwork/webwork2/logs, and /opt/webwork/webwork2/tmp directories, which I believe are the only places the webservers must have write access to. I'm hoping this will minimize the performance impact by not sharing the entire /opt/webwork tree... I guess it should help a bit, theoretically?
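In case it helps anyone, the mounts for that layout could be sketched in /etc/fstab roughly as follows (the NFS server name and export paths are placeholders):

```shell
# Sketch: share only the writable WeBWorK directories over NFS.
# The server name (nfs1) and export paths are placeholders.
# /etc/fstab on each web server:
#   nfs1:/export/webwork/courses  /opt/webwork/courses         nfs  defaults,_netdev  0 0
#   nfs1:/export/webwork/DATA     /opt/webwork/webwork2/DATA   nfs  defaults,_netdev  0 0
#   nfs1:/export/webwork/logs     /opt/webwork/webwork2/logs   nfs  defaults,_netdev  0 0
#   nfs1:/export/webwork/tmp      /opt/webwork/webwork2/tmp    nfs  defaults,_netdev  0 0
mount -a   # mount everything listed in fstab
```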

The database portion is easy in AWS, since I will be using MariaDB in RDS. If we need to scale out the DB infrastructure, that can be done using read replicas.

I can set up health checks that get as fancy as I want, using AWS Lambda and CloudWatch, which can check for specific responses from each server -- if one of them hits a threshold, I can have the offender dropped from the load balancer pool, then have another server spin up and get automatically provisioned via Ansible.
In reply to Danny Glin

Re: WeBWorK and High Availability Clusters

by Jeremy Lounds -
Hello Danny - thank you for the valuable information! Here at Michigan State University, we are in the process of setting up a WeBWorK cluster to start testing, and I have a few questions for you (or anyone else who is doing something similar):

1) Does WeBWorK store any session data outside of the database?

2) Does WeBWorK write to any files inside /opt/webwork/webwork2?

3) Excluding admin tasks and problem set development, does WeBWorK write to /opt/webwork/libraries? In other words, would a *student's* interaction cause any writes to the libraries folder?

The reason I ask questions 2 and 3 is that we are thinking of keeping only the /opt/webwork/courses on the gluster volume, and using a central deployment tool to push library edits from a repository, in hopes of speeding up file access to those files.
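The deployment-tool idea might look something like this (hostnames and paths are placeholders; this is just a sketch of the approach, not actual tooling):

```shell
# Sketch: push library updates from a staging checkout to each web node
# instead of serving the library from the shared Gluster volume.
# Hostnames (ww1, ww2) and paths are placeholders.
for host in ww1 ww2; do
    rsync -a --delete \
        /srv/staging/webwork-open-problem-library/ \
        "$host":/opt/webwork/libraries/webwork-open-problem-library/
done
```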

Thank you, and I hope you have a great day!


In reply to Jeremy Lounds

Re: WeBWorK and High Availability Clusters

by Michael Gage -
(1) Some is stored under webwork2. All normal session data is stored in the database. There are writable logs in webwork2/logs and in webwork2/DATA.

Looking at the write permissions for the server in webwork2, you see that the directories DATA, logs, tmp, and htdocs/tmp are writable. All of these locations are defined in the configuration files and can be moved.

(2) Possibly -- there are working files for constructing hard copy output:
# Contains non-web-accessible temporary files, such as TeX working directories.
$webworkDirs{tmp} = "$webworkDirs{root}/tmp";

There is also a tmp directory, which by default is inside webwork2/htdocs but is typically moved to something like /var/webwork/tmp. It contains web-accessible temporary files that store "on the fly" graphics images and web-accessible links to permanent image files in the OPL:
# Location of web-accessible temporary files, such as equation images.
# These two should be set in localOverrides.conf -- not here since this can be overwritten by new versions.
$webworkDirs{htdocs_temp} = "$webworkDirs{htdocs}/tmp";
$webworkURLs{htdocs_temp} = "$webworkURLs{htdocs}/tmp";
# Location of cached equation images.
$webworkDirs{equationCache} = "$webworkDirs{htdocs_temp}/equations";
$webworkURLs{equationCache} = "$webworkURLs{htdocs_temp}/equations";

Less often used is:

# Location of system-wide data files.
$webworkDirs{DATA} = "$webworkDirs{root}/DATA";

# Used for temporary storage of uploaded files.
$webworkDirs{uploadCache} = "$webworkDirs{DATA}/uploads";

These are defined in defaults.config and can be overridden in localOverrides.conf.

# Directory for temporary files
# Location of web-accessible temporary files, such as equation images.
# Default which is set in defaults.config:
#$webworkDirs{htdocs_temp} = "$webworkDirs{htdocs}/tmp";
#$webworkURLs{htdocs_temp} = "$webworkURLs{htdocs}/tmp";

# Alternate locations -- this allows you to place temporary files in a location
# that is not backed up and is the recommended set up for most installations.
# See
# for more information. Note that the wwtmp directory (or partition) should be
# created under Apache's main server document root which is usually /var/www. If this
# is in a different location on your system, edit the lines below accordingly.
# To implement, uncomment the following 6 lines:
#$webworkDirs{htdocs_temp} = '/var/www/wwtmp';
#$webworkURLs{htdocs_temp} = '/wwtmp';
#$webworkDirs{equationCache} = "$webworkDirs{htdocs_temp}/equations";
#$webworkURLs{equationCache} = "$webworkURLs{htdocs_temp}/equations";
#$courseDirs{html_temp} = "/var/www/wwtmp/$courseName";
#$courseURLs{html_temp} = "/wwtmp/$courseName";

(3) Student work, and for that matter normal editing by instructors, does not modify the library directories. Those are not normally writable by the web server. Instructors' modifications to problems are all local to their course.

(It is possible to install a parallel, writable version of the webwork-open-problem-library to which certain instructors can be given access so that they can correct existing problems and add new problems to the Contrib directory of the OPL -- see

as a provisional method for doing this.)


In reply to Danny Glin

Re: WeBWorK and High Availability Clusters

by Jeremy Lounds -
Hello again,

Another HA question... we are testing WeBWorK connected to a MySQL-compatible Percona XtraDB cluster.

It appears many WeBWorK tables do not have a primary key, which Percona essentially requires for the cluster to operate properly.

Has anyone else added an auto-increment primary key to tables that don't have one? It seems like a safe route to go, since WeBWorK wouldn't "know" about the new column, but obviously, there may be something I am not aware of that requires the table structure to be left exactly as-is.
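For concreteness, the sort of change we are considering looks like this (the table and column names are examples only, and we would test on a copy of the database first):

```shell
# Sketch: add a surrogate auto-increment primary key to a WeBWorK table
# that lacks one.  Table name (myCourse_past_answer) and column name
# (ww_row_id) are placeholders.
mysql webwork -e "
  ALTER TABLE myCourse_past_answer
    ADD COLUMN ww_row_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (ww_row_id);"
```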

Thanks again,


In reply to Jeremy Lounds

Re: WeBWorK and High Availability Clusters

by Michael Gage -
It should be safe as long as you use a new column name. Some of the (newer) tables do have primary keys. I've been meaning to add them to the old tables as a preliminary step toward speeding up the database lookups, but haven't gotten around to it yet.

If you add these primary keys and things seem OK (as I expect), could you send a pull request to and we'll do more testing and pull it into the main repo.

The "create table" code is in webwork2/lib/WeBWorK/DB/Record. One of the newer tables has a unique id (autoincrement primary key), for example, but most of the others do not. Could we use uniq_id for the column name in each table? I don't think that would conflict with any existing columns, and it would make things uniform across tables. Or do you have a better idea?

You can change the tables in MySQL without changing the code, but if you want new courses to have the extra fields then you'll need to make changes in

The changes in the code will only affect new courses. As we find the need, we'll have to add DB access commands to retrieve the uniq_id field, but it may be that it is mostly used internally in MySQL.

Thanks for looking into this.


In reply to Michael Gage

Re: WeBWorK and High Availability Clusters

by Michael Gage -
One extra caution: if you add extra columns to the database directly (without changing the schema in the WeBWorK code), you will get harmless error messages (usually when viewing from the admin page) saying that there are extra columns. If you add the columns to the schema in WeBWorK, then you may get fatal errors asking you to upgrade the course you are working with. (The upgrade tab is on the admin page.)

You could change the database and see if the new uniq_id solves the issues with your outside system. Then, once that is settled, update the schema in WeBWorK so that the uniq_id column will be added automatically to new courses.

Good luck.

In reply to Jeremy Lounds

Re: WeBWorK and High Availability Clusters

by Jeremy Lounds -
FYI, I am moving the discussion of primary keys to another thread.

In reply to Danny Glin

Re: WeBWorK and High Availability Clusters

by Allan Metts -

It's been a while since the last post here, and I haven't seen much other discussion on setting up a highly available WeBWorK configuration in AWS. I recently established an HA deployment at Georgia Tech, and it seems to be working well so far. Here's what it looks like:

  • Everything automatically deployed to AWS using Terraform (infrastructure) and Ansible (server configuration)
  • WeBWorK configured on multiple EC2 instances (in separate AZs) behind an Application Load Balancer. The ALB handles SSL and the associated certificates.
  • Amazon RDS as the MySQL instance. Better to have this managed by AWS instead of me.
  • A shared EFS filesystem for the EC2 instances.

There is discussion upthread about what to put on the shared file system (i.e., all of /opt/webwork, or selected folders). I settled on putting ONLY the courses and htdocs directories on the shared filesystem -- this seemed to be a happy compromise that avoids inconsistent states across the various servers.

I didn't set up any symbolic links to the shared filesystem.  Instead, I simply set $webwork_courses_dir and $webwork_htdocs_dir to their respective shared folders in site.conf.
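For anyone replicating this, the relevant overrides in site.conf look something like the following (the /mnt/efs mount point is a placeholder for wherever the EFS filesystem is mounted):

```perl
# site.conf -- point courses and htdocs at the shared EFS mount.
# /mnt/efs is a placeholder mount point.
$webwork_courses_dir = "/mnt/efs/webwork/courses";
$webwork_htdocs_dir  = "/mnt/efs/webwork/htdocs";
```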

I wanted to keep log files and temporary files on the individual EC2 instances -- both to maximize performance, and to prevent messy diagnostics if something went wrong on an individual server.

We're a week into Summer semester with this configuration, and so far everything seems to be working fine.  Let me know if you spot any potential "gotchas" with this approach.