Supporting WW for very large user bases - request for information

Re: Supporting WW for very large user bases - request for information

by Nathan Wallach -
Edfinity was willing to tell me that they do load-balancing with multiple servers and shared storage, but their back-end WW servers are stateless (their front-end handles all the database work). This is an encouraging sign that such large-scale back-end WW clusters are feasible.

---

Capacity planning remains quite challenging without much data, so I decided to stress-test my development server with a simple free tool to get a reasonable handle on the capacity it provides. I used siege ( https://www.joedog.org/siege-manual/ ) as it is very simple and sufficed for the basic sort of testing I wanted to do, but it may not be sufficient to test a much larger server.

Below is information on the server, and the test results.

The server is a development/testing VM with 10GB RAM and 3 vCPUs, running WW inside Docker on a CentOS base OS. siege was pointed at URLs used for embedding problems in HTML pages, and only requested the main HTML part of each problem via the html2xml interface. The tests ran against a list of 40 html2xml problem URLs. During these load tests, I also used a browser to load one of those pages (not very often). It would render, but the latency was certainly noticeable when the server was under real stress (150 clients), and much less so at 80 clients, when the time to load and render on screen was more or less typical.

This server gets quite stressed when 150 siege clients are hitting it in parallel (lots of swap activity was triggered), but functions reasonably well for 100 siege clients (very moderate swap activity, though CPU usage would max out at times, apparently when Apache processes were being started and stopped). Average response times were a bit better with only 80 siege clients. I suspect that testing with somewhere between 180 and 200 clients would probably DoS the server due to excessive swapping and OOM issues, but I did not try it in practice.

Note: The WW timing.log file does not seem to keep up with the load, and only a small fraction of the render calls are getting logged to it. In my case the file is on NFS, which may be hindering the logging code more than would occur on local storage, but I suspect that the WW logging code for timing.log is simply not up to handling this sort of load.

These simple tests seem to show that this server config can support up to about 2750 "renders" per minute when 100 clients are making constant streams of sequential requests, and about 2650 "renders" per minute when there are only 80 such clients (but with faster average response times). That seems to be a reasonable estimate of maximum capacity for this server instance.
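
The per-minute figures above follow from siege's "Transaction rate" line multiplied by 60. Here is a minimal sketch of that arithmetic, using the rates copied from the siege summaries in this post (the function and variable names are just illustrative):

```python
# Transaction rates (trans/sec) copied from the siege summaries in this post.
RUNS = [
    ("150 clients", 38.33),
    ("100 clients", 46.35),
    ("80 clients, run 1", 41.87),
    ("80 clients, run 2", 44.65),
    ("80 clients, run 3", 44.82),
]


def renders_per_minute(trans_per_sec: float) -> float:
    """Convert a siege transaction rate (per second) to requests per minute."""
    return trans_per_sec * 60.0


if __name__ == "__main__":
    for label, rate in RUNS:
        print(f"{label:>18}: {renders_per_minute(rate):7.0f} renders/min")
```

The estimates quoted above are rounded down a little from these products, which seems prudent given the run-to-run variation visible in the three 80-client runs.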

After some tuning of the mpm_prefork settings and Apache2::SizeLimit (see https://webwork.maa.org/moodle/mod/forum/discuss.php?d=2692#p5887 ) as follows, I got the results shown below from several "siege" runs with different settings.
  • MaxRequestWorkers set to 200
  • MaxConnectionsPerChild set to 25
  • $Apache2::SizeLimit::MAX_PROCESS_SIZE = 420000;
  • $Apache2::SizeLimit::MAX_UNSHARED_SIZE = 420000;
  • $Apache2::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;
There is certainly a tradeoff between the memory growth of the Apache processes and the CPU costs of starting up new Apache workers. It is certainly possible that more careful tuning could somewhat improve performance on the tests, but I'm not sure how much more effort on this is worthwhile at present. I do hope to arrange to run tests with some additional RAM/vCPU resources in the near future to see what sort of scaling / performance behavior I can observe.
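
To put the memory side of that tradeoff in numbers, here is a rough worst-case sketch. It assumes the Apache2::SizeLimit values are in KB (as in that module's documentation) and that every worker grows all the way to the unshared-size limit before being recycled. It also ignores memory shared between workers and everything else running on the VM, so it is a pessimistic bound and not a measurement of this server. The function names are illustrative.

```python
RAM_KB = 10 * 1024 * 1024      # 10 GB VM, expressed in KB
MAX_UNSHARED_KB = 420_000      # $Apache2::SizeLimit::MAX_UNSHARED_SIZE (assumed KB)
MAX_REQUEST_WORKERS = 200      # MaxRequestWorkers


def worst_case_memory_gb(workers: int, per_worker_kb: int) -> float:
    """Memory needed if every worker sits at the size limit, in GB."""
    return workers * per_worker_kb / (1024 * 1024)


def workers_that_fit(ram_kb: int, per_worker_kb: int) -> int:
    """How many limit-sized workers fit in RAM with nothing else running."""
    return ram_kb // per_worker_kb


if __name__ == "__main__":
    need = worst_case_memory_gb(MAX_REQUEST_WORKERS, MAX_UNSHARED_KB)
    fit = workers_that_fit(RAM_KB, MAX_UNSHARED_KB)
    print(f"Worst case for {MAX_REQUEST_WORKERS} workers: {need:.1f} GB")
    print(f"Limit-sized workers that fit in 10 GB: {fit}")
```

Under these assumptions only about 24 limit-sized workers would fit in RAM. Since the server coped with 100 clients, typical workers are presumably well below the limit most of the time, but the gap is consistent with the heavy swapping seen at 150 clients.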

(Note: I found it helpful to put the server's IP address in my /etc/hosts file, which avoided the DNS delays and some of the failures I saw in earlier stress tests run the same way.)

siege -c 150 -t120S -f /home/tani/.siege/url-01.txt

** SIEGE 4.0.4
** Preparing 150 concurrent users for battle.
The server is now under siege...
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out

Lifting the server siege...
Transactions:                   4575 hits
Availability:                  99.93 %
Elapsed time:                 119.37 secs
Data transferred:              49.18 MB
Response time:                  3.83 secs
Transaction rate:              38.33 trans/sec
Throughput:                     0.41 MB/sec
Concurrency:                  146.89
Successful transactions:        4575
Failed transactions:               3
Longest transaction:           55.34
Shortest transaction:           0.33

siege -c 100 -t360S -f /home/tani/.siege/url-01.txt

[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out


Lifting the server siege...
Transactions:                  16685 hits
Availability:                  99.98 %
Elapsed time:                 359.95 secs
Data transferred:             191.27 MB
Response time:                  2.14 secs
Transaction rate:              46.35 trans/sec
Throughput:                     0.53 MB/sec
Concurrency:                   99.37
Successful transactions:       16685
Failed transactions:               3
Longest transaction:           43.86
Shortest transaction:           0.10


siege -c 80 -t180S -f /home/tani/.siege/url-01.txt
** SIEGE 4.0.4
** Preparing 80 concurrent users for battle.
The server is now under siege...[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out

Lifting the server siege...
Transactions:        7530 hits
Availability:       99.91 %
Elapsed time:      179.84 secs
Data transferred:       85.16 MB
Response time:        1.87 secs
Transaction rate:       41.87 trans/sec
Throughput:        0.47 MB/sec
Concurrency:       78.19
Successful transactions:        7531
Failed transactions:           7
Longest transaction:       32.54
Shortest transaction:        0.10

siege -c 80 -t180S -f /home/tani/.siege/url-01.txt
** SIEGE 4.0.4
** Preparing 80 concurrent users for battle.
The server is now under siege...[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out

Lifting the server siege...
Transactions:        8006 hits
Availability:       99.99 %
Elapsed time:      179.30 secs
Data transferred:       90.26 MB
Response time:        1.77 secs
Transaction rate:       44.65 trans/sec
Throughput:        0.50 MB/sec
Concurrency:       79.14
Successful transactions:        8006
Failed transactions:           1
Longest transaction:       36.11
Shortest transaction:        0.10
 
tani@lxtani:~$ siege -c 80 -t180S -f /home/tani/.siege/url-01.txt
** SIEGE 4.0.4
** Preparing 80 concurrent users for battle.
The server is now under siege...[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out
[alert] socket: select and discovered it's not ready sock.c:351: Connection timed out
[alert] socket: read check timed out(30) sock.c:240: Connection timed out

Lifting the server siege...
Transactions:        8048 hits
Availability:       99.96 %
Elapsed time:      179.57 secs
Data transferred:       91.50 MB
Response time:        1.76 secs
Transaction rate:       44.82 trans/sec
Throughput:        0.51 MB/sec
Concurrency:       78.86
Successful transactions:        8048
Failed transactions:           3
Longest transaction:       34.93
Shortest transaction:        0.10
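
For anyone who wants to compare runs like these without reading the numbers by eye, here is a small sketch that parses a siege summary and runs two sanity checks. The parser and its names are my own illustration, not part of siege. The first check uses Little's law (concurrency is roughly transaction rate times response time). The second recomputes availability as successful / (successful + failed), which appears consistent with the figures siege reported in the runs above.

```python
import re

# Matches summary lines such as "Transaction rate:   46.35 trans/sec".
SUMMARY_LINE = re.compile(r"^\s*([A-Za-z ]+):\s+([\d.]+)")


def parse_siege_summary(text: str) -> dict[str, float]:
    """Collect the numeric fields of a siege summary, keyed by label."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            stats[match.group(1).strip()] = float(match.group(2))
    return stats


# Excerpt of the 100-client run from this post.
SAMPLE = """\
Lifting the server siege...
Transactions:                  16685 hits
Availability:                  99.98 %
Elapsed time:                 359.95 secs
Response time:                  2.14 secs
Transaction rate:              46.35 trans/sec
Concurrency:                   99.37
Successful transactions:       16685
Failed transactions:               3
"""

stats = parse_siege_summary(SAMPLE)

# Little's law: average concurrency ~ throughput * average response time.
estimated_concurrency = stats["Transaction rate"] * stats["Response time"]

# Availability recomputed from the success and failure counts.
ok = stats["Successful transactions"]
failed = stats["Failed transactions"]
availability = 100.0 * ok / (ok + failed)

if __name__ == "__main__":
    print(f"Reported concurrency:  {stats['Concurrency']:.2f}")
    print(f"Estimated concurrency: {estimated_concurrency:.2f}")
    print(f"Recomputed availability: {availability:.2f} %")
```

The estimated concurrency lands within rounding error of the reported value, which suggests the clients were kept busy for essentially the whole run and the measured rate reflects the server's limit, not siege's.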