WeBWorK Problems

Efficiently Shaking Down Problem Libraries

by Alex Jordan -
Number of replies: 4
I'm interested in taking an existing library of problems and efficiently debugging them. Before I lay out my ideas, if any work has already been done towards this goal, I'd love to hear about it.

Below I have a question about WeBWorK's capabilities, followed by an announcement, followed by a request for input to a checklist I am building.

  1. Most problems involve randomization. Every problem in a set uses a 4-digit seed which determines the "random" variables used in the problem. Would it be possible to have WeBWorK execute a procedure that would compile a problem using every single seed, one at a time? And would this be an outrageous computational load if it were run on a library of say 1000 problems?

    We'd be looking for compilation errors, like an accidental division by 0. We'd be looking for TeX errors in the hardcopy version. We'd also want the procedure to take the literal text of the displayed correct answer and submit it as an answer, checking that the displayed correct answer actually counts as correct. (These issues might only arise for one seed in a thousand.) We'd also check that different seeds do in fact tend to lead to different problems.

  2. Of course there is a lot more to be done when debugging a library for, say, preparation for the OPL. One thing to do is to tidy up the code for readability. I'm working with a colleague on a script that can be run on a pg file to automatically indent and space out the code appropriately.

  3. Then I have this checklist, and I would like to know if anyone can add to it. It's hard for me to imagine a way to automate checking most of these items - I think they mostly need to be checked by hand from problem to problem.
  • Does the problem use Math Objects if it's at all feasible to use them?
  • Is the problem's displayed correct answer actually mathematically correct?
  • Does the problem accept the displayed correct answer as correct? (would be automated via item 1)
  • Does the problem accept alternative expressions for the correct answer? (For example, I just cleaned up a problem that would accept 26/3, but not 26/3+ln(8)-ln(8))
  • Is the problem sufficiently randomized? (would be automated via item 1)
  • Does the problem appropriately allow/disallow decimal approximations to correct numerical answers? Ditto with improper fractions and mixed numbers.
  • Is the expected correct answer of the appropriate Math Object type?
  • Are Math Object type warnings turned off if they provide too big a hint?
  • Are incorrect answers counted as incorrect?
  • Are incorrect answers that are not even of the right type given appropriate type warning messages?
  • Is the wording of the problem acceptable?
  • Are there any typos in the display of the problem on-screen or in the hard copy?
  • Do you have any other observations for improvement of this problem, such as its layout on screen?
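The every-seed sweep in item 1 can be sketched as a small test harness. This is a hedged illustration only: `shake_down`, `render`, and `grade` are hypothetical names, and the two callables stand in for whatever mechanism (a renderProblem.pl-style service, say) actually compiles a problem and scores a submitted answer -- they are not WeBWorK's real API.

```python
def shake_down(problems, seeds, render, grade):
    """Render every (problem, seed) pair and collect three kinds of failures:
    seeds that crash the renderer, seeds whose displayed correct answer is
    not itself graded as fully correct, and problems whose statement barely
    varies across seeds.  `render` and `grade` are injected callables."""
    report = {"render_errors": [], "grading_errors": [], "low_variation": []}
    for prob in problems:
        statements = set()
        for seed in seeds:
            try:
                page = render(prob, seed)      # may die on e.g. division by 0
            except Exception as exc:
                report["render_errors"].append((prob, seed, repr(exc)))
                continue
            statements.add(page["statement"])
            # Feed the displayed correct answer back through the checker.
            if grade(prob, seed, page["correct_answer"]) < 1.0:
                report["grading_errors"].append((prob, seed))
        # Flag problems whose text is (nearly) identical across many seeds.
        if len(statements) <= max(1, len(seeds) // 10):
            report["low_variation"].append(prob)
    return report
```

In practice `render` would send the .pg source and seed to a rendering service and `grade` would submit the answer string back, but those details depend on the installation.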
If anyone is interested in developing a resource for problem testing with me, let's talk. I'll be in San Diego at the joint meetings in January too - maybe we could have a face-to-face meeting.
In reply to Alex Jordan

Re: Efficiently Shaking Down Problem Libraries

by Michael Gage -

Hi Alex,

Thanks for the checklist of suggestions for doing quality assurance on WeBWorK problems.

With regard to (1): the ability to do something like this has been in place for some time. See http://webwork.maa.org/wiki/Client_Editor, http://webwork.maa.org/wiki/Release_notes_for_WeBWorK_2.5.0#National_Problem_Library, the README in webwork2/clients, and the code in checkProblem.pl in that same directory. The current version of checkProblem.pl runs on just a single seed and can check the OPL's 20K+ problems in a few hours -- I usually run it overnight -- without killing the response time.

A lint program such as you suggest in (2) could be very helpful.
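As a toy illustration of what such a tidier might start from, here is a Python sketch that re-indents lines by net bracket depth while passing BEGIN_TEXT ... END_TEXT blocks through verbatim (heredoc-style blocks are whitespace-sensitive). This is a deliberately naive sketch -- it ignores brackets inside strings and comments -- and since PG files are Perl, a production tool would more plausibly build on something like perltidy plus PG-specific rules.

```python
def reindent(source, width=4):
    """Toy PG pretty-printer (hypothetical): re-indents code lines by net
    brace/paren/bracket depth; BEGIN_TEXT..END_TEXT blocks pass through
    untouched because their whitespace is significant."""
    out, depth, in_text = [], 0, False
    for line in source.splitlines():
        stripped = line.strip()
        if in_text:
            if stripped.startswith("END_TEXT"):
                in_text = False
                out.append(stripped)          # terminator must start the line
            else:
                out.append(line)              # keep displayed text verbatim
            continue
        if stripped.startswith("BEGIN_TEXT"):
            in_text = True
            out.append(stripped)
            continue
        if not stripped:
            out.append("")
            continue
        # Closers ( } ] ) ) at the start of a line dedent that line.
        closers_at_start = len(stripped) - len(stripped.lstrip("}])"))
        level = max(depth - closers_at_start, 0)
        out.append(" " * (width * level) + stripped)
        depth += stripped.count("{") + stripped.count("(") + stripped.count("[")
        depth -= stripped.count("}") + stripped.count(")") + stripped.count("]")
        depth = max(depth, 0)
    return "\n".join(out)
```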

As you suggest, there are a few items in your checklist that could be added to the automated unit tester with the concerted effort of a few people over a short period of time. The other items will require a larger group of people (“editors”) to vet the OPL and to continue vetting it as it grows.

There are several other people interested in this project, including John Jones at ASU and Jeff Holt at UVA, who helped create the original NPL. Paul Pearson at Hope College, Gavin LaRose at U. of Michigan and John Jones have been most active in squashing bugs as you can see from the OPL commit logs.  Djun Kim is also interested and I apologize to the other interested parties who I have overlooked in this short list.

Many WeBWorK participants will be at JMM 2013, including myself. Working on software for this project -- and the editing itself -- might be a good topic for a future code camp such as WeBWorK::Winona.

(Keep an eye on http://webwork.maa.org/planet for further blog posts on past and future code camps and other extended comments on WeBWorK. To get your own blog listed, send a request and a URL to me or to Jason Aubrey at U. of Missouri.)

-- Mike

In reply to Alex Jordan

Re: Efficiently Shaking Down Problem Libraries

by Alexander Basyrov -
I think I've made a script that does what you want in item 1 -- render a problem for a list of random seeds, check that there were no warnings or errors, and take the correct answers and submit them to see whether the scores come back as 100%.

I did not look into hardcopy generation; the script I have does not check that the hardcopy would be generated and compiled into a PDF file.

The script is a variation of the renderProblem.pl script. The WebworkClient.pm module also had to be slightly modified. I'm willing to share the scripts I've got and be a part of further discussions.

It could take a while to test 1,000 problems against all 10,000 random seeds that WeBWorK seems to use.
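A back-of-the-envelope calculation shows why the full sweep is expensive (the per-render times below are made-up figures, not measurements):

```python
problems, seeds = 1_000, 10_000
renders = problems * seeds                  # 10,000,000 renders for a full sweep
for secs_per_render in (0.05, 0.2, 1.0):    # hypothetical per-render times
    hours = renders * secs_per_render / 3600
    print(f"{secs_per_render:4} s/render -> {hours:8,.0f} h ({hours / 24:6,.1f} days)")
```

Even at an optimistic 0.05 s per render, the full sweep is nearly a week of serial compute, so sampling a few dozen seeds per problem, or spreading the sweep over several worker processes, seems more practical than literally trying every seed.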

-- Alex
In reply to Alexander Basyrov

Re: Efficiently Shaking Down Problem Libraries

by Michael Gage -
Sounds excellent, Alex. Let's try to get your changes into a near-future distribution of the webwork code. Is this code currently in your GitHub repo? Let's see if we can make some combination of the versions of renderProblem.pl and WebworkClient.pm that will work for all use cases.

 -- Mike 
In reply to Alex Jordan

Re: Efficiently Shaking Down Problem Libraries

by Boyd Duffee -
> We'd also want the procedure to take the literal text of the displayed correct answer and submit that as an answer, checking that the displayed correct answer at least actually counts as correct.

For something like this, I'd recommend Test::WWW::Mechanize from CPAN, which bolts some test-harness functionality onto WWW::Mechanize, a module that knows how to log in, navigate, and scrape webpages. There's a nice technique out there for using your browser to record all the click-throughs on the site and then playing them back with a WWW::Mechanize script.
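The same log-in, submit, and scrape loop can be sketched language-neutrally. Below is a hedged Python version written against an injected requests-style session object (anything with a `.post` method returning a response with `.text`), so a stub can stand in for the live server during testing. Every URL, form-field name, and success marker here is a hypothetical placeholder, not WeBWorK's actual interface -- for real use, Test::WWW::Mechanize's `get_ok`/`submit_form_ok` assertions are the more natural fit.

```python
def submit_and_check(session, base_url, course, problem, answer,
                     success_marker="100%"):
    """Log in, submit `answer` to one problem, and report whether the result
    page claims full credit.  All paths, field names, and the success marker
    are hypothetical placeholders for a real installation's forms."""
    session.post(f"{base_url}/{course}/login",
                 data={"user": "tester", "passwd": "secret"})
    page = session.post(f"{base_url}/{course}/problem/{problem}",
                        data={"AnSwEr0001": answer,
                              "submitAnswers": "Check Answers"})
    return success_marker in page.text
```

Against a live server one would pass in a real HTTP session (e.g. `requests.Session()`); offline, a stub session that fakes the two responses is enough to exercise the logic.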