UsingWW: WeBWorK Server stalling and resolution

I wanted to report an experience we had relating to tracking down a stalling WeBWorK server.

Symptoms: At what appeared to be random intervals, and independent of server load, all requests to the WeBWorK server froze (browsers continued to wait for a response). After some time (as long as 15-20 minutes), the server would resume without any apparent side effects. The timing log (on our server at /opt/webwork/webwork2/logs/timing.log) did not give any evidence of these stalls; the recorded timings were comparable to those at any other time of day. Logging directly onto the server showed no unusual load.

The Culprit: The MySQL database was being backed up. In order to maintain database integrity, a READ lock was put on all of the tables during the backup so that no changes could be made. Once the backup was complete, the lock was released and the server resumed as before.

Observations in the Process of Debugging:

I took advantage of the option to turn on debugging in WEBWORK_ROOT/lib/WeBWorK/Constants.pm so that the apache2/error.log file gave step-by-step updates on which stage of responding to a request had been reached. I then noticed that during the stalls the log entries stopped partway through user authentication, indicating that the stall was occurring during the authentication process. (Caution: I also had to make a few edits so that unencrypted passwords did not end up in the log file.)

I also took advantage of the database debugging option in the global.conf settings file, which copied every SQL request to the error.log file. Both debugging settings confirmed that the stall was occurring during an attempt to write to the database. (Reads were unaffected, so the root (i.e. webwork2/) and individual course login pages continued to render properly.)

The reason the timing.log file did not show evidence of the stalls is that the timing routine only covers the rendering stage, and the stalls happened before rendering began. I have since modified our local installation so that the timing.log file records the full time to respond to a request as well as the rendering time.
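As an illustration only, the difference between timing the rendering stage and timing the whole request can be sketched as follows. This is a Python sketch, not the actual change to WeBWorK (which is written in Perl); the names timed and handle_request, and the two stand-in stages, are hypothetical.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage, timings):
    """Record the wall-clock duration of a stage, in seconds, under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

def handle_request(authenticate, render):
    """Run both stages, timing the render stage and the whole request."""
    timings = {}
    with timed("total", timings):
        authenticate()              # a blocked database write would stall here
        with timed("render", timings):
            render()
    return timings

# Stand-ins for the real stages: a slow "authentication" and a fast "render".
timings = handle_request(lambda: time.sleep(0.2), lambda: time.sleep(0.01))
print("render %.3fs, total %.3fs" % (timings["render"], timings["total"]))
```

A log that records only the render figure looks normal here, while the total shows the time spent waiting before rendering started.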

When I started looking at the timing of the stalls, I found that they were not random at all but were occurring every 6 hours. When I sent the IT department my hypothesis for why we saw the stalls, they confirmed that these times corresponded to the SQL server being backed up.
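A check of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the stall times below are invented for illustration, and the function names and the 10-minute tolerance are hypothetical choices, not values from our logs.

```python
from datetime import datetime, timedelta

def stall_gaps(stall_starts):
    """Return the gaps between consecutive stall start times, in time order."""
    ordered = sorted(stall_starts)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def looks_periodic(stall_starts, period, tolerance):
    """True if every gap is within `tolerance` of a whole multiple of `period`."""
    gaps = stall_gaps(stall_starts)
    if not gaps:
        return False
    for gap in gaps:
        multiples = round(gap / period)
        if multiples < 1 or abs(gap - multiples * period) > tolerance:
            return False
    return True

# Hypothetical stall times, invented for illustration only.
observed = [
    datetime(2012, 8, 27, 0, 4),
    datetime(2012, 8, 27, 6, 2),
    datetime(2012, 8, 27, 12, 5),
    datetime(2012, 8, 28, 0, 3),   # a stall in between went unnoticed
]
print(looks_periodic(observed, timedelta(hours=6), timedelta(minutes=10)))
```

Allowing whole multiples of the period means a stall that nobody noticed does not hide the pattern.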

Resolution: We have rescheduled the backup of the database tables to once a day and during the early morning hours when there should not be much activity.

Ongoing Issue: I think there is still a separate issue, as there is a time in the morning when the actual machine load goes up and response times increase (but are at least manageable). My hypothesis is that some type of auto-update or other scheduled task is running. The server admin has since modified some auto-update settings, but I have yet to confirm whether this has had any effect.

Re: WeBWorK Server stalling and resolution

by Arnold Pizer - Tuesday, 28 August 2012, 5:17 PM

Hi Brian,

Thanks very much for reporting this.

Concerning the slowdowns you are still experiencing: could these be caused by backups? If so, make sure you are not backing up the tmp files and maybe things like the MathJax and jsMath files (of which there are many). See

http://webwork.maa.org/wiki/Store_WeBWorK%27s_temporary_files_in_a_separate_directory_or_partition

and

http://webwork.maa.org/wiki/Clean_Out_Temporary_Files#Using_Cron_Jobs_to_remove_temporary_files

Can you please send us the details on "The reason the timing.log file did not show evidence of the stalls is that this routine only deals with the rendering stage. I have since modified our local installation so that the timing.log file records the full time to respond to a request as well as the rendering time." This certainly is something we should think about adding to the standard distribution. Do I understand correctly that you have a separate database server, or is MySQL running on the WeBWorK server?

Arnie

Re: WeBWorK Server stalling and resolution

by Danny Glin - Tuesday, 28 August 2012, 5:37 PM

Just one quick comment on troubleshooting:

"Reads were unaffected, so the root (i.e. webwork2/) and individual course login pages continued to render properly."

It turns out that the root page (/webwork2/) doesn't have any database interaction at all. That page continues to work on our servers even when the DB server is dead. You don't find out that the DB server is dead until you click on a course, which then triggers the first DB read.

Re: WeBWorK Server stalling and resolution

by Michael Gage - Tuesday, 28 August 2012, 7:41 PM

Danny,

Do you think there is a way to use this observation to give better error messages about what is going wrong when the DB is down?

I don't want to hide useful information from experts, but something helpful and less frightening than the usual perl warnings might be welcomed.

Mike
