UsingWW: Bob Byerly - Server maintenance -- memory leaks?

Server maintenance -- memory leaks?

topic started 2/14/2005; 11:51:23 AM
last post 8/29/2005; 12:12:23 AM

Bob Byerly - Server maintenance -- memory leaks?
2/14/2005; 11:51:23 AM (reads: 2725, responses: 12)

I hope this isn't off-topic, since it concerns a WeBWorK server rather than WeBWorK itself, but I'm hoping that readers will have some suggestions and comments from their own experience.

We've had several crashes this semester (about one every two weeks) on our WeBWorK server that I believe are due to memory leaks in the Apache mod_perl processes. (The server, a 4-processor Xeon with 2 GB of memory, is not rebooting automatically, nor is it recording any error messages. The reason I think this is software-related is that I can cause the same symptoms with a badly written .pg file.) I was wondering if other people have had similar problems and what they are doing about them.

Things we've tried:

I looked at Arnold Pizer's message in http://webhost.math.rochester.edu/webworkdocs/discuss/msgReader$701 and set our Apache parameters similarly. The main difference (and this may have been a mistake) is that I set MaxRequestsPerChild considerably higher (to 200). Since Arnold's message is from the pre-mod_perl days, things may have changed. What do others find optimal for this and other parameters?
We're also trying rebooting daily in the wee hours from a cronjob. (Except Sunday mornings. Last crash happened Sunday evening. Hmm.)
Setting the Linux kernel "panic" parameter to a non-zero value so that it would reboot automatically after a kernel panic had no effect.

I'm considering:

Starting the Apache processes with a hard memory limit (via ulimit). Are there any WeBWorK issues involved with this?
Getting a watchdog card. (I'm aware they exist but don't know much about them.)

Any suggestions will be appreciated. (If we get enough good suggestions on this topic, a WeBWorK FAQ might not be a bad idea!)

Michael Gage - Re: Server maintenance -- memory leaks?
2/14/2005; 12:19:05 PM (reads: 2996, responses: 0)

If there is a really bad memory leak, then using a utility such as top you should be able to watch the memory size of the child grow. We could see this with webwork1.9 when we were caching code and reusing it. The size of the process would grow visibly as we watched. I haven't seen that happen with mod_perl and webwork2.

Type

top

into a unix window to see the most active processes displayed.

What do you put into a .pg file to cause the crash? An infinite loop?

--Mike

Bob Byerly - Re: Server maintenance -- memory leaks?
2/14/2005; 1:19:59 PM (reads: 2973, responses: 0)

I have been using top and I do see the size of the apache processes grow -- usually slowly.

On a freshly started server there are about 7 apache processes each with resident memory size about 27m. Right now I'm looking at some logs of memory usage I've been keeping (redirecting top in batch mode to a file) and observing one apache process whose resident memory usage has grown from 30m to 60m over the course of an hour.
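
A minimal sketch of how such a log could be summarized, assuming it was produced by running top in batch mode and that each process line uses the usual procps column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND). The function names, the column indexes, and the `httpd` command name are assumptions for illustration and may need adjusting to match the actual log.

```python
def parse_size(text):
    """Convert a top-style size such as '27m' or '30720' (KiB) to MiB."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0}
    suffix = text[-1].lower()
    if suffix in units:
        return float(text[:-1]) * units[suffix]
    return float(text) / 1024  # assumption: plain numbers are KiB


def rss_growth(lines, command="httpd", res_col=5, cmd_col=11):
    """Return {pid: (first_mib, last_mib)} for processes named `command`."""
    seen = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= cmd_col or not fields[0].isdigit():
            continue  # skip headers, summary lines, and blank lines
        if fields[cmd_col] != command:
            continue
        pid = int(fields[0])
        size = parse_size(fields[res_col])
        first = seen.get(pid, (size, size))[0]
        seen[pid] = (first, size)  # keep the first sample, update the latest
    return seen
```

Passing the lines of the log file to `rss_growth` gives the first and last resident size seen for each PID, which makes a slowly growing child easy to spot.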

If I recall correctly (I don't want to try this now on our working server, and our development machine is being upgraded), the pg error that caused runaway memory usage was something like passing the "Matrix" function in the new parser package a string rather than the array reference it was expecting. E.g.,

$A=Matrix( "[[1,2],  [3,4] ]");

rather than the correct

$A=Matrix( [[1,2],  [3,4] ]);

When our development server is back up I'll try this again. At least once I was able to observe memory usage growing rapidly after such an error.

If nobody else is having these problems we'll just have to assume it's some problem with our set-up and try some upgrades. But I thought I'd check with others first.

FWIW we're using Apache 1.3.33, which we compiled ourselves with mod_perl-1.29, mod_ssl-2.8, and php4.3.9.

Michael Gage - Re: Server maintenance -- memory leaks?
2/14/2005; 1:39:46 PM (reads: 2997, responses: 0)

For reference, here is the child configuration we are using on our slower machine with 1/2 Gig of memory -- the one that runs hosted.webwork.rochester.edu

StartServers 7
MinSpareServers 7
MaxSpareServers 9
MaxClients 10
MaxRequestsPerChild 100

If we leave too many servers going, swapping occurs -- hosted actually runs faster with fewer servers. If we had more memory we'd leave more servers open.
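
The trade-off can be put as rough arithmetic. The sketch below is only an illustration: the function name, the amount of memory reserved for the rest of the system, and the per-child size are hypothetical numbers, not measurements from the hosted machine.

```python
def max_children(total_mb, reserved_mb, child_mb):
    """Largest number of Apache children that fit in RAM without swapping."""
    if child_mb <= 0:
        raise ValueError("child_mb must be positive")
    return max(0, (total_mb - reserved_mb) // child_mb)


# Hypothetical figures: 512 MB of RAM, 128 MB kept for the OS and database,
# and children that grow to about 40 MB each before being recycled.
print(max_children(512, 128, 40))
```

The per-child figure should be the size a child reaches near the end of its life, not its size when freshly started, since leaked memory counts against the budget too.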

If you suspect a memory leak then cut down on the number of requests per child. Our observations are similar to yours -- there may be a slow memory leak, but not a very fast one.

The optimization above is one we switched to a few weeks ago to resolve some slow response on the hosted machine -- we reduced the number of spare servers. It seems to have helped.

Let us know how your situation evolves.

-- Mike

Bob Byerly - Re: Server maintenance -- memory leaks?
2/14/2005; 1:50:48 PM (reads: 3009, responses: 0)

Thanks Mike. I'll try your parameters and see what happens.

Bob

Davide P. Cervone - Re: Server maintenance -- memory leaks?
2/14/2005; 9:43:20 PM (reads: 2991, responses: 0)

Bob:

The problem you reported with Matrix("[[1,2],[3,4]]") turns out to be a bug in the Parser, and it did cause an infinite loop that eats up memory. I have fixed the error and submitted the changes to the CVS repository. In addition to preventing the infinite loop, I have arranged for Matrix() to evaluate the string to produce the matrix, rather than produce an error (which is what it should have done before). Similarly for Point(), Vector() and Real().

Hope that helps.

Davide

William Wheeler - Re: Server maintenance -- memory leaks?
2/26/2005; 8:26:46 PM (reads: 2967, responses: 0)

WeBWorK2 here at Indiana does leak memory slowly. So we keep the MaxRequestsPerChild at no more than 100.

However, under heavy loads, this may not be sufficient, because Apache "requests" and WeBWorK "requests" are different.

If the Apache configuration directive KeepAlive is set to On (as recommended in the Apache documentation), then the number of Apache requests for an Apache child is the number of connections that the child has handled. So if one sets the Apache configuration directives to

MaxRequestsPerChild 100
KeepAlive On
KeepAliveTimeout 30
MaxKeepAliveRequests 10

then an Apache child might handle up to
MaxRequestsPerChild * MaxKeepAliveRequests = 100 * 10 = 1000
WeBWorK requests before it dies. The point is that if a student makes successive submissions less than KeepAliveTimeout seconds apart, then all of the submissions count as only one request, and the processing will go on until the MaxKeepAliveRequests limit is reached.
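
The same bound as a small calculation, a sketch in which the function name is made up for illustration. It also covers two special cases from the Apache documentation: a value of 0 for MaxRequestsPerChild means a child is never recycled, and a value of 0 for MaxKeepAliveRequests means a connection may carry unlimited requests.

```python
import math


def worst_case_requests(max_requests_per_child, max_keepalive_requests,
                        keepalive=True):
    """Upper bound on HTTP requests one child serves before it is recycled."""
    if max_requests_per_child == 0:
        return math.inf  # 0 means the child is never recycled
    if not keepalive:
        return max_requests_per_child  # one request per connection
    if max_keepalive_requests == 0:
        return math.inf  # 0 means unlimited requests per connection
    return max_requests_per_child * max_keepalive_requests


print(worst_case_requests(100, 10))                   # KeepAlive On
print(worst_case_requests(100, 10, keepalive=False))  # KeepAlive Off
```

This is a worst case: a child only reaches it if every connection it handles is kept alive up to the MaxKeepAliveRequests limit.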

Although it's uncommon for students in science calculus courses (e.g., Stewart's Single Variable textbooks) to submit answers rapidly one after another, it's fairly common here for some students in finite math and business calculus courses to do so in an effort to "guess" the right answer. For instance, I've observed students who are frustrated with counting problems submit the numbers 10, 11, 12, 13, 14, 15, 16, ... one after another at a rate of one every five seconds in an effort to guess the answers to questions.

I have WeBWorK2 record in its timing log the number of WeBWorK requests an Apache child has processed every time the child processes a WeBWorK request. Even though MaxRequestsPerChild=100, I have seen counts of 200-300 WeBWorK requests processed by still active Apache children.

But one cannot just clamp down on MaxRequestsPerChild and MaxKeepAliveRequests in order to hold down memory leakage. In tuning a WeBWorK server, one has to find a balance among memory utilization, the average size of the Apache children (i.e., how much memory leakage to tolerate), the number of Apache children, and the average number of WeBWorK requests processed by Apache children during their lifetimes. Creating a new Apache child takes a significant amount of time relative to the time required to process a WeBWorK request. During periods of peak loading here, our servers run at 80-100% utilization with load factors of 1.0-3.5 and 3-6 WeBWorK requests per second for sustained periods of time. We cannot afford the overhead of frequently creating new Apache children. Fortunately, our servers have 4-8 GB of RAM, so we can tolerate a certain amount of memory leakage.

One additional technique that we use to minimize memory leakage is to route course administration requests to a separate Apache server that runs on the same computer but listens to a different port. The reason is that the course administration functions can consume a lot of RAM for large WeBWorK courses. (Some of our WeBWorK courses have 500-600 students in them.) Some of these courses assign homework for every class meeting, for a total of 25-30 assignments per semester. By the end of the semester, a single invocation of a course administration function can generate a memory leak of 30-50 megabytes. (In WeBWorK1, I even observed memory leaks of more than 100 megabytes just from calling the Professor's page.) Needless to say, we keep the Apache course administration server on an austere diet.

Sincerely,

Bill Wheeler, Indiana University, Bloomington

Maria Nogin - Re: Server maintenance -- memory leaks?
8/23/2005; 1:51:32 AM (reads: 2078, responses: 0)

Bob, how did you solve the problem?

We are having memory problems too. We have 1 GB of memory and 2 GB of swap, most of which is used. Each Apache process grows to over 300 MB after about 150 requests.

We set MaxRequestsPerChild to 75, but that doesn't seem to help.

Maria Nogin, California State University, Fresno

Bob Byerly - Re: Server maintenance -- memory leaks?
8/23/2005; 8:51:32 AM (reads: 2094, responses: 2)

Actually we've set MaxRequestsPerChild to 15, and MaxSpareServers to 10. We very occasionally get a message in our log files that a connection was denied because there weren't enough servers available, but apparently this doesn't happen often enough to provoke student complaints :).

The solution that I think really made the difference for us though was simply to make sure that a server process is killed and a new one respawned whenever it gets too big. We inserted the following in our httpd.conf:

PerlSetEnv PERL_RLIMIT_AS 100:120
PerlModule Apache::Resource

You will need to make sure you have the Perl module Apache::Resource for this to work. There should be documentation with this module that explains these lines, but this is supposed to set the soft limit for an httpd child to 100 MB and the hard limit to 120 MB. You may want to adjust these depending on how many servers you're running and your memory size.
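
To illustrate what a soft:hard address-space limit does at the operating-system level, here is a sketch using Python's standard `resource` module on a Unix system. It is not how Apache::Resource is implemented; the helper names are made up, and treating a megabyte as 1024 * 1024 bytes is an assumption.

```python
import resource
import subprocess

MB = 1024 * 1024  # assumption: megabytes of 1024 * 1024 bytes


def parse_limit(spec):
    """Turn 'soft:hard' in megabytes into a (soft, hard) tuple in bytes."""
    soft, _, hard = spec.partition(":")
    soft_bytes = int(soft) * MB
    hard_bytes = int(hard) * MB if hard else soft_bytes
    if soft_bytes > hard_bytes:
        raise ValueError("soft limit exceeds hard limit")
    return soft_bytes, hard_bytes


def run_with_limit(cmd, spec):
    """Run cmd in a child process whose address space is capped by spec."""
    limits = parse_limit(spec)

    def apply_limits():
        # Runs in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, limits)

    return subprocess.run(cmd, preexec_fn=apply_limits)
```

A process that tries to allocate beyond the limit has the allocation refused by the kernel, which usually ends the process rather than letting it push the machine into swap.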

Bob

Aleksey Nogin - Re: Server maintenance -- memory leaks?
8/23/2005; 5:44:40 PM (reads: 2270, responses: 1)

Tweaking Apache parameters, while very helpful, feels like working around the real problem instead of solving it. Does anybody have any idea what is causing these memory leaks in the first place and how to try getting rid of them?

TIA for any information.

Bob Byerly - Re: Server maintenance -- memory leaks?
8/24/2005; 8:33:22 AM (reads: 2389, responses: 0)

In our case, it definitely seemed to depend on the problems. When certain problem sets were due, memory leaks got much worse. In a few cases, I found that students were entering responses that were malformed in unexpected ways, so I suspect a syntax checking problem. I could only rarely reproduce the difficulties myself, but perhaps I'm not as ingenious as my students.

For us, the problem has apparently gotten less severe as certain parts of WeBWorK, particularly the parser, undergo refinement, and, more importantly, as we gain more experience in writing problems. As the fall semester starts, I'm planning to monitor things carefully and try to find out what kinds of problems provoke these memory leaks, assuming we still have them.

Are there any hooks in WeBWorK for logging what problems are attempted at various times?

Bob

Sam Hathaway - Re: Server maintenance -- memory leaks?
8/27/2005; 11:08:39 AM (reads: 1915, responses: 0)

Bob,

You can enable the timing log in global.conf. This will give you a log of each problem rendered, and how long it took.
-sam

Aleksey Nogin - Re: Server maintenance -- memory leaks?
8/29/2005; 12:12:23 AM (reads: 1879, responses: 0)

I've finally managed to find the biggest source of severe memory leaks in our case: due dates far in the future. One course had a problem set with a due date in the year 20005 (that's with an extra 0), and displaying any information on this problem set resulted in the corresponding Apache process growing by almost 200 MB!

http://bugs.webwork.rochester.edu/show_bug.cgi?id=829 has details.
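
A quick check for such typos might look like the sketch below. It assumes due dates are available as Unix epoch seconds keyed by set name; the function name, the sample data, and the five-year threshold are hypothetical and not part of WeBWorK.

```python
import time

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def far_future_sets(due_epochs, now=None, max_years=5):
    """Return names of sets due more than max_years after `now`."""
    now = time.time() if now is None else now
    cutoff = now + max_years * SECONDS_PER_YEAR
    return [name for name, due in due_epochs.items() if due > cutoff]


# Hypothetical data: one ordinary set and one with a year typed as 20005.
now = 1125273600  # roughly the end of August 2005
sets = {
    "hw1": now + 7 * 24 * 60 * 60,
    "hw2": now + 18000 * SECONDS_PER_YEAR,
}
print(far_future_sets(sets, now=now))
```

Epoch seconds are used here rather than date objects because Python's `datetime` cannot represent years beyond 9999.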

Using WeBWorK

Forum archive 2000-2006
