Cluster Capability

by Christian McHugh -
Number of replies: 11
We are in the process of upgrading our webwork hardware. I've been looking through the wiki, but don't see anything about clustering capabilities. Would we be able to span webwork over multiple machines for increased failover? If it isn't directly supported out of the box, might it be possible to point two servers at the same mysql instance to load balance connections?
In reply to Christian McHugh

Re: Cluster Capability

by Michael Gage -
It's pretty easy to use remote mysql databases with webwork: see

http://wwrk.maa.org/moodle/mod/forum/discuss.php?d=523#2147

In principle it's also possible to separate the webserver duties and the problem rendering duties. The hooks are all there to allow one machine to handle the web requests and have several other machines crunch the rendering of the problems (which takes most of the time). Look in the lib/WeBWorK/PG folder for the stub of a "Remote.pm" module to replace the "Local.pm" module which is currently used. No-one has actually used this feature so it would undoubtedly take some work to bring the Remote.pm module up-to-date and make it actually connect with other machines. I'd be glad to help with advice if someone has a use for this feature and has some time to work on it.

-- Mike

In reply to Michael Gage

Re: Cluster Capability

by Christian McHugh -
Alright, so I just got around to trying it out. Setting up webwork to point to a remote mysql server is not bad. But I'm now running into problems with clustered mysql. It appears that webwork wants to index a blob data type which is not allowed in mysql cluster:

mysql> alter table location_addresses engine=ndb;
ERROR 1073 (42000): BLOB column 'location_id' can't be used in key specification with the used table type
mysql> alter table locations engine=ndb;
ERROR 1073 (42000): BLOB column 'location_id' can't be used in key specification with the used table type


Is there any good way around this? Must the type be a blob? How about varchar? Has anyone managed to set up webwork to point to clustered mysql?
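
One possible direction, purely as an untested sketch: change the offending key column from BLOB to a bounded VARCHAR before switching engines. The VARCHAR width here is a guess, not the stock size, and it must be wide enough for every existing location_id value; back up the database and check the stock schema first.

```sql
-- Untested sketch: NDB refuses indexed BLOBs, so re-type the key
-- column first (the width of 255 is an assumption, not the stock size).
ALTER TABLE locations MODIFY location_id VARCHAR(255) NOT NULL;
-- With no BLOB left in the key, the engine conversion should be allowed:
ALTER TABLE locations ENGINE=ndb;
```

The location_addresses table would need the same treatment for each of its key columns.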
In reply to Christian McHugh

Re: Cluster Capability

by Christian McHugh -
Since everyone seems to be quiet, might it be easier simply not to index those fields? I don't know the structure of the database, so I'm not exactly sure what the performance impact of that would be. Also, if I did want to go down this route, which sql creation script would I have to edit?

If I modify the sql structure from a stock install (like removing the indexing, nothing more major), is there any chance of tables being dropped and recreated in the regular format with my changes removed?

Or does anyone have any other ideas about running webwork on a mysql cluster?
In reply to Christian McHugh

Re: Cluster Capability

by Sam Hathaway -
Hi Christian,

WeBWorK is not careful about transactions at the moment, so I would not recommend spreading your database across multiple nodes. However, you can probably spread WeBWorK across multiple web servers, as long as the following is true:
  • All servers share the same courses directory and htdocs/tmp directory. NFS or a SAN would be helpful for this.
  • All servers share the same database, configured by $database_dsn in global.conf.
As for database replication, it is generally safe to tweak column types and indexes, but if you do you must pay attention to the database upgrade process. Before running wwdb_upgrade, read lib/WeBWorK/Utils/DBUpgrade.pm to determine if there are any table modifications that will interfere with your local modifications. (And of course, stop WeBWorK and back up your database before upgrading!)
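
To make the shared-database point concrete, here is a sketch of the relevant global.conf lines. The hostname, port, and credentials are placeholders; $database_dsn is the setting named above, and the username/password variable names should be checked against your own global.conf.

```perl
# global.conf on every node -- all servers point at the one database.
# Host, port, and credentials below are placeholders, not real values.
$database_dsn      = "dbi:mysql:webwork:db.example.edu:3306";
$database_username = "webworkWrite";
$database_password = "changeme";
```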

Also, as Mike said, there is support for remote PG rendering, although this is old code and will need work to be production-ready. One big issue is that the rendering daemon (renderd) is a proof-of-concept that doesn't support concurrency at all.

In case anyone reading this would like to devote some serious effort to improving clustering capabilities, the clustering TODO is as follows:
  • General optimizations
    • Move session state out of database and into memcached so that many requests can be read-only.
    • Cache SELECT results with memcached.
  • Support clustered database: make WeBWorK transaction-safe.
    • Add BEGIN/COMMIT/ROLLBACK where necessary.
    • Look into changing SQL table definitions to support replication.
  • Support PG render farms: make remote rendering scalable and stable.
    • Implement PG rendering in WeBWorK::RPC. This gets us a mod_perl-based render server. Yay concurrency.
    • Rewrite WeBWorK::PG::Remote to use WeBWorK::RPC method.
In reply to Sam Hathaway

Re: Cluster Capability

by Christian McHugh -
Thanks for the reply. I noticed your todo of "changing SQL table definitions to support replication" and that is kind of what I'm attempting to do.

Like I said, we are running a mysql cluster, which means there are a couple of limitations of the ndb engine compared to myisam. As of now my stumbling block seems to be the locations and location_addresses tables, which have blob columns as primary keys. Since primary keys are indexed, and mysql cluster does not support indexing blobs, I'm not able to convert them to the ndb engine, or really use them at all.

So my question is: is there anything I can do about this? Do I have any options other than setting up a single mysql server and hoping it doesn't go down? What is the actual layout of the database? It looks like the location_addresses table has two blob columns and both are part of the primary key; what's up with that?

Just overall is there any good way of pointing a webwork machine to a clustered mysql database?

Reference:
So far my only issue seems to be with:
http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-limitations-syntax.html
TEXT and BLOB columns. You cannot create indexes on NDB table columns that use any of the TEXT or BLOB data types.

In reply to Christian McHugh

Re: Cluster Capability

by Christian McHugh -
So I've managed to make things work. However, I gave up on using clustered mysql and am instead using two mysql servers with replication (still behind a heartbeat/ldirectord load balancer). mysql replication does not have any special limitations on replicated data types, and I think the end result should be the same.
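
For anyone taking the same route, a minimal sketch of the my.cnf pieces involved (the server IDs and database name are illustrative; check the exact directives against the mysql replication documentation for your version):

```ini
; master my.cnf -- enable the binary log so changes can be replicated
[mysqld]
server-id    = 1
log-bin      = mysql-bin
binlog-do-db = webwork

; replica my.cnf -- just needs a distinct server id
[mysqld]
server-id = 2
```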
In reply to Christian McHugh

Re: Cluster Capability

by Danny Glin -
Sorry, I haven't been tracking this forum. We've been running Webwork with the mysql cluster suite for a full semester now. It did involve changing some data types since, as you mention, the ndb storage engine can't index text or blob fields. If you'd like, I can provide the details of the modifications I needed to make in order to get things working.
Our general setup is as follows:
  • Two web servers load balanced via LVS.
  • A storage array to contain all of the webwork shared files.
  • Two mysql data servers.
All four servers are running the mysql cluster suite. The two web servers also run the Red Hat Cluster Suite in order to share the same volume on the remote disk array.
In reply to Danny Glin

Re: Cluster Capability

by Christian McHugh -
Thanks Danny. This makes me feel better about our setup working :)

I was wondering, though, what your LVS config for webwork looked like. Now that the semester has started and everyone is using our cluster setup for real, we've noticed the occasional problem with uploading:

http://wwrk.maa.org/moodle/mod/forum/discuss.php?d=183

Might this be due to the round robin load balancing? Any recommendations?

I've also been getting complaints about certain activities taking a very long time. For example, in the library browser hitting the "Set Definition Files" button seems to take a good 10 seconds, where on the old server it was instantaneous.

Thanks
In reply to Christian McHugh

Re: Cluster Capability

by Danny Glin -
We are currently in the process of migrating from LVS to a different load balancer. Unfortunately I can't tell you very much about the LVS configuration since it was set up and maintained by University IT, as they already had it in place for other applications.
I do know that we were able to run LVS without persistence as long as the nodes all had access to the same HTML/tmp directory (and obviously the same courses directory). When we had this set up, we had the secondary node connected via nfs to the primary node, which hosted all of the relevant files.
With regard to the delays accessing certain pages, how are you connected to the shared disk space? It could be a matter of it taking that much longer to access files on disk. I believe that when you click "Set Definition Files" in the Library Browser, the system recurses through all of the subdirectories of the templates directory looking for files of the form *.def. If this is a large directory structure, and is accessed remotely by something like nfs (in our case we are using gfs), it might account for the delay.
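
One way to check whether the *.def scan itself is the slow part is to time an equivalent recursive find over the templates directory. A self-contained sketch on a throwaway tree (on a real install you would point find at your course's templates directory and wrap it in `time`):

```shell
# Mimic the Library Browser's recursive search for set definition files.
tmp=$(mktemp -d)
mkdir -p "$tmp/setDefs/deeper"
touch "$tmp/setDefs/deeper/setDemo.def" "$tmp/notes.txt"
find "$tmp" -name '*.def'   # lists only the .def file
rm -rf "$tmp"
```

If the find alone is slow over NFS or GFS, the page delay is almost certainly disk access rather than webwork itself.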
In reply to Danny Glin

Re: Cluster Capability

by Christian McHugh -
You make a good point with sharing the html/tmp directories. Right now they are independent on each machine.

As for the set definition files, it is accessing a local file system, but the delay seems to be a change from our old webwork version, where either the recursion did not go as deep or we had a different file hierarchy. So I believe these problems could be worked around.

However, I seem to have run into an even stranger problem with load balancing. As I just described in: http://wwrk.maa.org/moodle/mod/forum/discuss.php?d=183

Webwork hangs when accessed through the load balancer, but only from off campus, and only with certain browsers (small update: my laptop running vista from off campus has no problems, even with firefox). I'm not entirely convinced that I'm even replicating the problem consistently. Perhaps only one in 50 connections has this hanging issue, and I've only managed to trigger it from home; I don't know.

Basically, at this point, I'm pretty confused. This behavior seems very very odd, so if you have any ideas I'd love to hear them.
In reply to Christian McHugh

Re: Cluster Capability

by Danny Glin -
My first thought on this is to suggest that there may be a firewall at work. LVS does some fancy encapsulating of packets, and when we first set up our new firewall it was dropping anything sent from the LVS server as badly formed. It could be that either the firewall on the individual Webwork servers is blocking these packets, or the client computer is dropping the packets because of the encapsulation.
We ran for a whole year on LVS without anyone complaining of such issues, so beyond that I don't have any immediate ideas.