UsingWW: Failure to unarchive 2.16 course in 2.17 or 2.18

I have some archived courses from a 2.16 server that uses a mysql (not mariadb) database. When I attempt to unarchive them into a 2.17 or 2.18 server that is using mariadb, it fails. There are many errors of the same nature as this one:

Failed to restore table 
'course_name_password' with command '2>&1 
/usr/bin/mysql --defaults-file=/tmp/Kt89urvb8x webwork < 
/opt/webwork/courses/course_name/DATA/mysqldump/password.sql'
 (exit=1 signal=0 core=0): ERROR 1273 (HY000) at line 25: Unknown 
collation: 'utf8mb4_0900_ai_ci'

Where of course, "course_name" is the actual course name.

There is some information about this here:
https://dba.stackexchange.com/questions/248904/mysql-to-mariadb-unknown-collation-utf8mb4-0900-ai-ci
But I am not sure what I can trust. Are we (the WeBWorK community) familiar with this issue and how it can be addressed?

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

by Alex Jordan - Friday, 16 June 2023, 11:38 PM

I manually unzipped a course archive file. In the DATA/mysqldump/folder, I search-and-replaced all "utf8mb4-0900-ai-ci" with "utf8mb4_unicode_520_ci". (This is per the suggestion from the link I posted. Note that I tried "uca1400_as_ci" as suggested for newer MariaDB, but that did not work.)

Then I tar'red the folder and I was able to unarchive the course. I poked around and don't see anything amiss. I hope the data is intact, but not sure how I would tell.

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

by Andrew Parker - Wednesday, 21 June 2023, 11:15 AM

IIRC, you want to regex s/utf8mb4_0900_ai_ci/utf8mb4_unicode_ci/g

I recently performed a migration to MariaDB and that is the replacement that worked for me.

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

by Alex Jordan - Friday, 23 June 2023, 2:21 PM

Do we understand the implications of using utf8mb4_unicode_ci versus utf8mb4_unicode_520_ci? Apparently the formre uses this version 4.0.0 of some standard:

http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt

And the other uses version 5.2.0:

http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt

And we are changing from 9.0.0, which mariadb did not impleemeent support for:

https://www.unicode.org/Public/UCA/9.0.0/allkeys.txt

A diff between the 4.0.0 version and 9.0.0 version shows over 15K lines cut. A diff between the 5.2.0 version and 9.0.0 version shows over 22K lines cut. I just don't know what these collations are used for and if it matters for us.

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

by Danny Glin - Friday, 23 June 2023, 3:13 PM

As far as I understand a collation is used for sorting/ordering of strings and case sensitivity. I would expect any of these collations to handle common characters the same way, but they may have different rules about how to order some of the extended characters.

My guess is that the only thing that may change if you use a different collation is that a list of user ids or sets could end up sorted in a different order if there were exotic characters in it. Nothing in WW depends on these sortings so at worst things would be displayed in a different order than before.

If we're going to set a default I would suggest utf8mb4_unicode_ci. The fact that it's older means that it is more likely to be available on any given system, whereas the newer ones may be too recent for some existing servers.

Using WeBWorK

WeBWorK Main Forum

Failure to unarchive 2.16 course in 2.17 or 2.18

Failure to unarchive 2.16 course in 2.17 or 2.18

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

Re: Failure to unarchive 2.16 course in 2.17 or 2.18

Re: Failure to unarchive 2.16 course in 2.17 or 2.18