I'm not experienced with gateway quizzes, and am a pretty new member of the WeBWorK community. However, I have great interest in capacity planning and issues of "scale" for WeBWorK for reasons unrelated to online exams - so I am going to toss in my 2 cents and some.
My impression is that in recent years "capacity planning" of WeBWorK servers became of little interest to the community, as modern servers have enough RAM and CPU resources that typical homework usage does not often create noticeable stress or performance problems for most institutions using WeBWorK. We simply allocate enough resources to our servers and do a passable job of tuning the configuration to avoid any significant level of complaints from the students, and leave "well enough alone".
The discussions I have seen about how multiple problems are rendered at once in a gateway quiz make it clear that this is a very demanding manner of using WeBWorK.
Large online exams (and very large numbers of students) can apparently quickly leave the realm of where current setups suffice, and leave us uncertain of what should be done to efficiently and cost-effectively provide the "scale" we may need.
There seems to be a real need for "the community" to work together on investigating and determining best practices for "server load scaling". At present, it does not seem that there is enough accumulated experience to provide a guide of the sort many of us (myself included) would all like to have, which is why there is no such guide available anywhere we know of.
It would also be very nice to have instructions on how to install and operate a "cluster" using a load-balancer and multiple (virtual) servers which could "scale up" and "scale down" (horizontally) as necessary based on expected demand (e.g. scale up in advance for online exams). Note: The database capacity would probably also need to scale up/down, and not only the WeBWorK Apache server capacity.
It could be that using public cloud providers (with their capabilities to do elastic scaling and bill based on "usage") might be a good option for online exams, were we to know how to do that and be able to turn it on/off as needed. Such an approach would hopefully avoid the need for each institution to operate local WW servers/clusters whose capacity is large enough for their large online exams, but overkill for the rest of the time. Hopefully usage based costs would make this an affordable approach. It is probably also possible to get a similar result using on-site solutions using ad-hoc approaches to horizontal scaling, additional VMs, etc. all managed by the local staff as necessary for "high demand" events. Determining the pros and cons of these two options and the costs (both financial and "staff time") involved would be very helpful in my opinion.
Getting from where we are today to where some of us want to be in the future will require the investment of effort by several people with a vested interest in the outcome. Many of us (I speak at least for myself) do not have the background/experience to really do the configuration/testing/engineering needed to design the best practices for server/cluster scaling/capacity planning to be prepared for very high spurts of demand. It is very likely that a team of experienced WeBWorK "experts" together with some IT "scaling" experts would be far more able to advance the necessary investigations and planning effectively than just the "regular" WeBWorK community alone. More of our employers now have a need for WeBWorK at a large scale so hopefully some of them will allocate resources to help with finding the solutions.
I recently did some load testing to try to understand system capacity for a large number of single problem render requests arriving in short periods of time. See https://webwork.maa.org/moodle/mod/forum/discuss.php?d=4748
The test put pretty intensive constant demand on a moderately sized WW virtual server (3 vCPU, 10GB RAM, WW in Docker on CentOS base OS). That server was able to support about 2500 single problem render requests per minute from about 100 "very demanding clients". Designing methodologies to load-test different use cases of different WW installations would be helpful in providing real data useful in the capacity planning decisions.
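To make the idea of a load-test methodology a bit more concrete, here is a minimal Python sketch of a harness that runs a fixed number of concurrent "clients" and reports requests per minute. It is not the script I used for the test above. The names run_load_test and fake_render are made up for illustration, and fake_render is only a stand-in that sleeps; you would replace it with a real HTTP request against a test server you control.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, clients, requests_per_client):
    """Run `clients` concurrent workers, each calling send_request() repeatedly.

    Returns (total_ok, total_failed, elapsed_seconds, ok_requests_per_minute).
    """
    def worker(client_id):
        ok = failed = 0
        for _ in range(requests_per_client):
            try:
                send_request()
                ok += 1
            except Exception:
                # Count any error (timeout, HTTP failure, ...) as a failed request.
                failed += 1
        return ok, failed

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(worker, range(clients)))
    elapsed = time.perf_counter() - start

    total_ok = sum(r[0] for r in results)
    total_failed = sum(r[1] for r in results)
    per_minute = total_ok / elapsed * 60 if elapsed > 0 else 0.0
    return total_ok, total_failed, elapsed, per_minute


def fake_render():
    # Hypothetical stand-in for one problem render request.
    # Replace with a real HTTP call to your own test server.
    time.sleep(0.001)


if __name__ == "__main__":
    ok, failed, elapsed, rpm = run_load_test(
        fake_render, clients=10, requests_per_client=20
    )
    print(f"{ok} ok, {failed} failed in {elapsed:.2f}s -> {rpm:.0f} requests/minute")
```

Keeping the request function separate from the harness means the same loop could be reused for different use cases (single problem renders, gateway quiz pages, answer submissions) by swapping in a different function.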
The memory ballooning of the Apache processes is certainly a critical restriction, but CPU power is needed both to handle the render requests and to replace the Apache child processes sufficiently frequently when the server is under significant load. In the gateway setting where all the questions of a quiz are rendered (and graded) at once (if I understand correctly) - CPU demand will probably be pretty high for each "request" so that having many students start a gateway quiz in a short time is very demanding on the server.
In terms of managing memory usage, I do not think that using "MaxConnectionsPerChild" is likely to be sufficient for Gateway quizzes, as each gateway request makes multiple render requests per call. I would recommend looking into also using "Apache2::SizeLimit" as discussed at https://webwork.maa.org/moodle/mod/forum/discuss.php?d=2692#p5887
but with the setting for "$Apache2::SizeLimit::CHECK_EVERY_N_REQUESTS" set to be a very small number (so that after a Gateway request, the memory jump will be detected quickly).
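To illustrate why a small check interval matters, here is a toy Python model of a single Apache child whose size is only checked every N requests. This is not how Apache2::SizeLimit is implemented, just a sketch of the reasoning. The function name peak_memory_mb and all the numbers (100 MB starting size, 20 MB growth per gateway request, 300 MB limit) are made-up assumptions for illustration.

```python
def peak_memory_mb(growth_per_request_mb, limit_mb, check_every_n,
                   start_mb=100.0, requests=1000):
    """Toy model: largest size one child reaches before it gets replaced.

    The child grows by a fixed amount on every request. Every
    `check_every_n` requests its size is compared to `limit_mb`; if it is
    over the limit the child is replaced by a fresh one of size `start_mb`.
    """
    size = start_mb
    peak = size
    for i in range(1, requests + 1):
        size += growth_per_request_mb
        peak = max(peak, size)
        if i % check_every_n == 0 and size > limit_mb:
            size = start_mb  # child exits, a fresh one takes over
    return peak


if __name__ == "__main__":
    # Hypothetical numbers: each gateway request adds 20 MB, limit is 300 MB.
    for n in (1, 10):
        peak = peak_memory_mb(20, 300, check_every_n=n)
        print(f"check every {n} request(s): peak size {peak} MB")
```

In this model the overshoot past the limit grows with the check interval, and with many children running at once that overshoot is multiplied by the number of children.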
In a slightly different direction - maybe the "gateway quiz" is not the best assignment type for a large online exam. Maybe a new "assignment type" which is more "homework like" in rendering problems one at a time, but having some of the additional features that Gateway quizzes have could provide a better alternative for large online exams in WeBWorK. If the students could navigate from question to question and submit each one individually - the stress on the server would be lower. The price is that students would need to "flip" from question to question and as such make many small "render" requests. I'm not sure what would be needed, but it bears consideration. For now, using the existing "homework" assignment type with the assignment opened for just the few hours of an online exam might be able to support more students with fewer server problems than a Gateway quiz with the same set of questions.
Some other discussion threads that cover load issues:
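A rough back-of-the-envelope calculation, sketched in Python, shows the difference in the opening burst. It assumes the roughly 2500 renders per minute from my single problem test carries over (it may well not for gateway rendering), and the 500 students and 10 problems are made-up numbers. The function names are hypothetical.

```python
def renders_in_opening_burst(students, problems_per_quiz, all_at_once):
    """Renders triggered if every student opens the exam at the same moment.

    all_at_once=True  models a gateway quiz (every problem rendered per request).
    all_at_once=False models homework-style access (one problem per request).
    """
    per_student = problems_per_quiz if all_at_once else 1
    return students * per_student


def minutes_to_clear(renders, renders_per_minute):
    """Time needed to work through a backlog at a steady render rate."""
    return renders / renders_per_minute


if __name__ == "__main__":
    rate = 2500  # renders/minute, taken from the single problem load test
    for label, at_once in (("gateway", True), ("one at a time", False)):
        burst = renders_in_opening_burst(500, 10, all_at_once=at_once)
        print(f"{label}: {burst} renders, "
              f"about {minutes_to_clear(burst, rate):.1f} minutes to clear")
```

The total number of renders over the whole exam would not be smaller with the one-at-a-time approach, and could be larger. The point is only that the load at the start is spread out instead of arriving in one spike.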