OPL Maintenance

OPL update after 2.15

by Alex Jordan -
Number of replies: 6
Today I started the process of upgrading some WW servers I manage to 2.15. I started with a "clean" server that had no customizations from 2.14. I pulled webwork2, pg, and the OPL.

I got to a point where I run the OPL update, and there were character encoding error messages (pasted below) among the output that is logged to the screen. Surely something to do with all the utf8 work for 2.15.

Does anyone have insight into what these are telling me? Is it something I should fix locally, or is something not quite right with the distribution?

...
10100 10200 10300 10400 10500 10600 10700 10800 10900 11000
11100 11200 11300 11400 11500 11600 11700utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880013.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880013.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880138.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880138.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880264.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880264.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880375.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 880375.
11800 11900 12000
12100 12200 12300 12400 12500 12600 12700 12800 12900 13000
...
...
23100 23200 23300 23400 23500 23600 23700 23800 23900 24000
utf8 "\xB0" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2023434.
utf8 "\xB0" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2023434.
24100utf8 "\x96" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2025115.
utf8 "\xB0" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2025115.
utf8 "\xB0" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2025332.
utf8 "\xB0" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2026242.
utf8 "\xB0" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2026704.
24200 24300 24400 24500 24600 24700 24800 24900 25000
...
...
32100 32200 32300 32400 32500 32600 32700 32800utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2584581.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 326, <IN> line 2584581.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2585037.
utf8 "\xCA" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2585037.
32900 33000
...
...
35100 35200 35300 35400 35500 35600 35700 35800utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2760070.
35900 36000
36100 36200utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2778309.
utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2778433.
utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2778656.
utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2778755.
utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2778885.
utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2779000.
utf8 "\xA9" does not map to Unicode at /home/webwork/www/webwork-dev.aimath.org/webwork/webwork2/lib/WeBWorK/Utils/Tags.pm line 225, <IN> line 2779114.
36300 36400 36500 36600 36700 36800 36900 37000


In reply to Alex Jordan

Re: OPL update after 2.15

by Alex Jordan -
I think I get it. This is indicating a handful of OPL problems with some "bad" characters in them. Like \xCA is an E with a hat.

Would it be right for me to open a pull request to the OPL that edits these problem files, replacing the offending characters with something safer but still faithful to the original meaning? Would that somehow be a bad thing for WW servers 2.14 or earlier?
In reply to Alex Jordan

Re: OPL update after 2.15

by Danny Glin -
Is the problem occurring because those characters are in the metadata tags? In theory 2.15 is supposed to support foreign language text, so it would make sense if tags could also include extended characters. I don't know if any effort was put into allowing the library database to support this.
In reply to Danny Glin

Re: OPL update after 2.15

by Alex Jordan -
I'm not able to actually find files in the OPL with \xCA, as indicated in the error message. Maybe I am searching in the wrong way.

I do find files with \xB0, using:
grep -r -P --include=\*.pg "\xb0" .

from the OPL's root folder.
This character is a degree symbol, so these problems could be edited to use safer characters.

I find \xb9 (an e with an accent) in only two files. In these two files, it only appears in comments, used in the name Bezier. In those files I could just change the e to a plain e.
  1. I still can't find \xCA in the OPL, which is what the majority of the errors indicate.
  2. Is editing \xb9 to a plain "e" the right thing to do, or should it actually be fine for \xb9 to be in the .pg files, and something about the OPL-update script needs to evolve?
  3. Similar to item 2, but for the degree symbol. I could change that to use math mode, for example.

In reply to Alex Jordan

Re: OPL update after 2.15

by Michael Gage -
Alex,

Try changing the character set of the files in question from latin1 to utf8.
(vim will do this for example). I think that would cure the problem. You might
have to change the character set for the file and then erase and retype the offending character.

The underlying problem is that there are certain characters in latin1, usually
characters with accents but also the copyright symbol, whose latin1 bytes are illegal byte sequences in utf8 (those characters are represented by a different byte sequence in utf8).
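
To make the byte-level picture concrete, here is a small Python sketch (Python is used only for illustration; it is not part of WeBWorK or the OPL scripts). It uses the copyright symbol from the log: in latin1 it is the single byte 0xA9, which on its own is not valid utf8, while the utf8 encoding of the same character is the two bytes 0xC2 0xA9. The helper name `latin1_to_utf8` is made up for this example, and it assumes the input really is latin1.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are assumed to be latin1 (ISO-8859-1) as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


latin1_copyright = b"\xa9"  # the copyright symbol as a single latin1 byte

# A lone 0xA9 is a UTF-8 continuation byte with no lead byte,
# so a strict UTF-8 decode rejects it.
try:
    latin1_copyright.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

utf8_copyright = latin1_to_utf8(latin1_copyright)

print(is_valid_utf8)   # False
print(utf8_copyright)  # b'\xc2\xa9'
```

One caution with this kind of conversion: running it on a file that is already utf8 would double-encode any non-ASCII characters, so it should only be applied to files known to be latin1.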

It would be nice if we could figure out how to catch this error in the OPL script and specify the path of the .pg file that caused it. That would make cleaning up the library (and other files) easier. The error is emitted by perl itself -- not the script -- and I haven't figured out how to catch it and modify the error message.
In reply to Alex Jordan

Re: OPL update after 2.15

by Alex Jordan -
I should have paid closer attention. The files I found with \xB0 and \xA9 are all in either the Contrib/ folder or the Pending/ folder. Does OPL-update even look in those?

I found that this (from webwork-open-problem-library):
grep -r -a -x -v --include=\*.pg '.*' OpenProblemLibrary/
reports all files that contain invalid utf8. And it was a list that matched those error messages, at least by count.
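
For anyone who wants the same check without depending on grep's locale handling, a rough Python equivalent might look like the sketch below. It is only an illustration, not part of OPL-update: the function name `find_invalid_utf8` is hypothetical, and it assumes the problem files end in .pg and are small enough to read into memory. Besides the path, it reports the offset and value of the first undecodable byte in each file.

```python
import os


def find_invalid_utf8(root, suffix=".pg"):
    """Yield (path, byte_offset, bad_byte) for each file under root that is not valid UTF-8."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first byte that could not be decoded.
                yield path, err.start, data[err.start]


if __name__ == "__main__":
    # Assumes this is run from the webwork-open-problem-library folder.
    for path, offset, bad in find_invalid_utf8("OpenProblemLibrary"):
        print(f"{path}: byte 0x{bad:02X} at offset {offset}")
```

Only the first bad byte per file is reported, which is enough to build the list of files to clean up.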

Some problem files had copyright symbols in the comments at the top that I changed to &copy;
Some were popup menu questions, where for some reason the initial question mark that is often presented as the first choice was a "EE?" where the E's had hats.

And some were degree symbols. They happened in math mode anyway, so I just made them ^{\circ}.
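
For a batch version of that degree-symbol edit, something like the following Python sketch could work. It is not what was actually run for these files, and the function name `degree_to_circ` is made up. It assumes the degree symbol is the only non-ASCII character in the file and that every occurrence is inside math mode; if other utf8 characters are present, a blind replacement of the byte 0xB0 could corrupt them, so the result should be reviewed by hand.

```python
def degree_to_circ(raw: bytes) -> bytes:
    """Replace degree symbols (UTF-8 or latin1 encoded) with the TeX ^{\\circ}."""
    circ = rb"^{\circ}"
    # Handle the two-byte UTF-8 form first so no stray 0xC2 byte is left behind,
    # then the single latin1 byte.
    return raw.replace(b"\xc2\xb0", circ).replace(b"\xb0", circ)


print(degree_to_circ(b"\\(30\xb0\\)"))
```

Working on raw bytes rather than decoded text means the same function handles both the latin1 files that trigger the error and any files where the symbol is already stored as utf8.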

I'm going to make a pull request with all this to the OPL.
In reply to Alex Jordan

Re: OPL update after 2.15

by Alex Jordan -
OK. It turned out I hadn't (successfully) pulled an updated OPL in the first place. After doing that, all of the E-hats and copyright symbols went away. I presume Mike or Tani or someone edited those files recently.

But I still hit some files that had the degree symbol. And I opened this PR for those:
https://github.com/openwebwork/webwork-open-problem-library/pull/644