[ww-devel] webwork-devel Digest, Vol 51, Issue 3
Peter Staab
peter.staab at gmail.com
Tue Sep 15 10:09:32 EDT 2015
I’m at Fitchburg State in Massachusetts. Now that we are back in full swing, I’ve been looking into what I need to do to get you the data. It seems like our IRB would like to review it. I just checked in with the IRB team, and they say that the PIs (Geoff and Mike) would actually submit the IRB application here on our campus. I’m happy to do the grunt work for this, but wondered if others have run into this before. And Geoff or Mike, if you have had this problem before, how can I best help?
Peter
> On Aug 8, 2015, at 1:07 PM, goehle at gmail.com wrote:
>
> I can try to answer these questions as best I can.
>
> * Some IRB panels may reject the data request because students did not have an opportunity to deny participation or deny their data being provided to the project. Some IRB panels may reject the data request because the data involves students under the age of 18. Some IRB panels may reject the data request because it doesn't appear to have passed through any IRB panel.
>
> I had an informal meeting with someone from my institution's IRB panel. They indicated that it wasn't something that needed to be approved by them, for two reasons. First, the project is passive; it does not involve interaction with or manipulation of students or student grades in any way. Second, the student data is anonymized; there is no infringement of privacy because the extracted data can't be tied to any individual student. Of course, our IRB, like many institutions, likes to look over things even when they aren't required to, so it's a good idea to check with your institution's IRB.
>
> * Even though the student data is anonymized, I think there will be institutions that will require the data request to be reviewed by either their IRB or the IRB appropriate for the project. In my experience, every institution has different standards for protecting student data.
>
> This is correct. This project is not covered by 45 CFR part 46, but could still be subject to local regulations. However, at the end of the day we are talking about hundreds of thousands of rows of anonymized data. Not even the university the data originated from is identifiable. People should consider whatever administrative issues might be present, but from a practical point of view there is no real possibility for any invasion of privacy.
>
> * It could be very helpful to have an explanation for the hash process of student data. Since your script does the hashing, I could see an IRB panel asking whether you (or someone else) could undo the hash. In that regard, it would be helpful to know whether the original data (prior to the hash) is a WeBWorK generated id number (e.g. 537), an institution-generated student ID number (e.g. 00827394 or social security number?), or other student identifying information (e.g. John_Smith_2017).
>
> The data used to create the hash consists of things like the set id, the user id, the problem id and so on. This data would be identifying. However, the data is hashed using HMAC-SHA256 (https://en.wikipedia.org/wiki/SHA-2). The underlying SHA-256 function is also used in password hashing and by Bitcoin to verify transactions. Hashing is not encryption: it is a one-way, non-reversible process. There isn't any way to recover the original data except to brute-force guess possible strings and check whether they generate the same hash.
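A minimal Python sketch of the one-way property described above, using the standard `hmac` and `hashlib` modules. It is not the code of the `dump_past_answers` script: the key `example-secret-key`, the `user_537` string format, and the helper names `hash_field` and `brute_force` are hypothetical. It shows that the only route back from a digest is to hash candidate guesses and compare; such a search works only if the guesser knows the key and the message format, and its cost grows with the number of candidate strings.

```python
import hashlib
import hmac

# Hypothetical key: the thread does not say what key dump_past_answers uses.
KEY = b"example-secret-key"


def hash_field(value: str, key: bytes = KEY) -> str:
    """Return the hex HMAC-SHA256 digest of a string (one-way)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def brute_force(target: str, candidates, key: bytes = KEY):
    """Recover a value only by hashing each guess and comparing digests."""
    for guess in candidates:
        if hmac.compare_digest(hash_field(guess, key), target):
            return guess
    return None  # no candidate produced the target hash


digest = hash_field("user_537")  # hypothetical identifier format
print(len(digest))  # 64 hex characters = 256 bits
print(brute_force(digest, (f"user_{n}" for n in range(1000))))  # user_537
```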
>
> * It looks like the homework set name isn't hashed; I also presume the student's hashed information may be associated with an institution in order to keep students uniquely identified. Some IRB panels may raise these issues and whether these pose means for identifying data back to institutions/courses/instructors/students.
>
> The set name is hashed, as is the student id, the course id and the problem id. None of these are tied to any particular university in a visible way. Rather, the university is part of the string that is used to make the hashes. This means that if the same data from the same university is exported twice, it will have the same hash values. However, the only way to go from the hash values to the original data is by brute-force guessing.
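A minimal Python sketch of the behavior described above, assuming a hypothetical key and a hypothetical message that joins the university and the record's ids with `|` (the thread does not specify the script's real key or message layout; `export_hash` is an illustrative name). Hashing the same record twice gives the same digest, while changing only the university changes the digest completely, so the institution is never visible in the output.

```python
import hashlib
import hmac

KEY = b"example-secret-key"  # hypothetical; the real key is not described


def export_hash(university: str, *fields: str, key: bytes = KEY) -> str:
    """Hash a record identifier with the university folded into the message.

    The '|' separator and the field order are assumptions for illustration.
    """
    message = "|".join((university, *fields)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


a = export_hash("univ_a", "course_101", "student_17")
b = export_hash("univ_a", "course_101", "student_17")  # second export, same data
c = export_hash("univ_b", "course_101", "student_17")  # same ids, other school

print(a == b)  # True: re-exporting gives identical hashes
print(a == c)  # False: the university changes every hash
```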
>
> * IRB panels may be very concerned about who will have access to the data. (E.g. is the data only available to the project team and not published/presented? Is it published on a wiki for everyone to process? Is it somehow available back to WeBWorK instructors or administrators?)
>
> As far as I know the plan is to make the data public.
>
> * How long will the data be kept and used?
>
> If the data is made public then this is kind of moot.
>
>
> Cheers.
> Geoff.
>
> On Sat, Aug 8, 2015 at 12:00 PM, <webwork-devel-request at webwork.maa.org> wrote:
> Send webwork-devel mailing list submissions to
> webwork-devel at webwork.maa.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://webwork.maa.org/mailman/listinfo/webwork-devel
> or, via email, send a message with subject or body 'help' to
> webwork-devel-request at webwork.maa.org
>
> You can reach the person managing the list at
> webwork-devel-owner at webwork.maa.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of webwork-devel digest..."
>
>
> Today's Topics:
>
> 1. Re: Data Donation (Wangberg, Aaron)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 8 Aug 2015 15:16:42 +0000
> From: "Wangberg, Aaron" <AWangberg at winona.edu>
> To: WeBWorK development discussion <webwork-devel at webwork.maa.org>
> Subject: Re: [ww-devel] Data Donation
> Message-ID:
> <94A53A65DDDC654F8A07FBA6EAD19CB21EBBC6E9 at EXCHANGE2.winona.edu>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Geoff,
>
> I'm impressed with the method to collect data from multiple institutions. I would caution that instructors should be wary of providing the data without consulting their IRB. The IRB may not raise an issue, but the consequences for providing data without IRB approval could impact not just the instructor but also your project. Institutions have an obligation to protect student data, and I think they would prefer the instructor is upfront with the institution about the plan for sharing data rather than finding out that an instructor has shared some amount of data (unknown to them) collected from hundreds/thousands of students to some project unknown to them.
>
> In reading your data request below, I wanted to highlight a few things that may capture the attention of a vigilant IRB. (I think the data request would be acceptable by most IRB panels, but there will be some who raise these issues, so it would be helpful to have explanations available.)
>
> * Some IRB panels may reject the data request because students did not have an opportunity to deny participation or deny their data being provided to the project. Some IRB panels may reject the data request because the data involves students under the age of 18. Some IRB panels may reject the data request because it doesn't appear to have passed through any IRB panel.
>
> * Even though the student data is anonymized, I think there will be institutions that will require the data request to be reviewed by either their IRB or the IRB appropriate for the project. In my experience, every institution has different standards for protecting student data.
>
> * It could be very helpful to have an explanation for the hash process of student data. Since your script does the hashing, I could see an IRB panel asking whether you (or someone else) could undo the hash. In that regard, it would be helpful to know whether the original data (prior to the hash) is a WeBWorK generated id number (e.g. 537), an institution-generated student ID number (e.g. 00827394 or social security number?), or other student identifying information (e.g. John_Smith_2017).
>
> * It looks like the homework set name isn't hashed; I also presume the student's hashed information may be associated with an institution in order to keep students uniquely identified. Some IRB panels may raise these issues and whether these pose means for identifying data back to institutions/courses/instructors/students.
>
> * IRB panels may be very concerned about who will have access to the data. (E.g. is the data only available to the project team and not published/presented? Is it published on a wiki for everyone to process? Is it somehow available back to WeBWorK instructors or administrators?)
>
> * How long will the data be kept and used?
>
> Many of these items are very picky; I suspect many IRB panels will be satisfied knowing you've received IRB approval for this project. The exceptions will require more time...
>
> Aaron
>
> From: webwork-devel-bounces at webwork.maa.org [mailto:webwork-devel-bounces at webwork.maa.org] On Behalf Of goehle at gmail.com
> Sent: Tuesday, August 04, 2015 10:05 AM
> To: webwork-devel <webwork-devel at webwork.maa.org>
> Subject: [ww-devel] Data Donation
>
> Hi all. I wanted to send out another email about collecting past answer
> data since I wasn't thorough enough in my last one.
>
> As part of the WeBWorK/MAA grant WeBWorK: Improving Student Success in
> Mathematics (http://www.nsf.gov/awardsearch/showAward?AWD_ID=0920341)
> we (Mike Gage, Arnie Pizer and I) are doing "big data" research on
> student answers. The short-term goals of this research are to improve
> the OPL, both by identifying which problems could potentially be broken
> or misleading, and by making set generation easier by allowing
> instructors to see problems which are often assigned together. Another
> goal is to build a large database of answer data which can be made
> available to the public and used for future research.
>
> Since this research does not involve intervention or interaction with
> any individuals, and since it does not involve information which is
> individually identifiable, it does not fall under the Office for Human
> Research Protections 45 CFR part 46 (i.e., it's not covered by IRB). Of
> course, data is still data, even if it's anonymized, so you should only
> participate if you feel comfortable. However, I hope everyone will
> consider it. This is an easy way to give back to the WeBWorK project
> that doesn't require a lot of time or funds.
>
> If you are interested, all you need to do is run the data collection
> script:
> 1) wget --no-check-certificate https://raw.githubusercontent.com/goehle/webwork2/dbscript/bin/dump_past_answers
> 2) perl dump_past_answers
>
> Some things to keep in mind:
> - Do this sometime when your server isn't under a lot of pressure.
> This will put something of a strain on your server. If you have a small
> or medium-sized WeBWorK server, the script should only take 5 minutes or
> so. If you have a really big WeBWorK server, I'm not sure how long it
> will take.
> - The data is automatically uploaded at the end of the script.
> However, if it looks like there are problems with the upload, you can
> just email the resulting .csv.gz file to me at goehle at gmail.com
> - In general, if there are questions you can reach me at
> goehle at gmail.com
>
> Cheers.
>
> Geoff.
>
> agora.cs.wcu.edu/~goehle
>
> Example Row:
> # 0 - Answer ID hash :
> dc1d80eb2e3cb8974969dfb7e1856356aa97d6004f2c75dfb4191b87a25b97b3
> # 1 - Course ID hash :
> fc69ec7fb7ea05da794b2356fe270a96cdb72cca232781fe7e2d7187023696fe
> # 2 - Student ID hash :
> # 3 - Set ID hash
> # 4 - Problem ID hash
> # User Info
> # 5 - Permission Level
> # 6 - Final Status
> # Set Info
> # 7 - Set type
> # 8 - Open Date (unix time)
> # 9 - Due Date (unix time)
> # 10 - Answer Date (unix time)
> # 11 - Final Set Grade (percentage)
> # Problem Info
> # 12 - Problem Path
> # 13 - Problem Value
> # 14 - Problem Max Attempts
> # 15 - Problem Seed
> # 16 - Attempted
> # 17 - Final Incorrect Attempts
> # 18 - Final Correct Attempts
> # 19 - Final Status
> # OPL Info
> # 20 - Subject
> # 21 - Chapter
> # 22 - Section
> # 23 - Keywords
> # Answer Info
> # 24 - Answer timestamp (unix time)
> # 25 - Attempt Number
> # 26 - Raw status of attempt (percentage of correct blanks)
> # 27 - Status of attempt (post computed may be blank)
> # 28 - Number of Answer Blanks
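For anyone working with the exported file, here is a hedged Python sketch that reads rows using the column order listed above. It assumes the .csv.gz file is comma-separated with no header row and that the four date columns hold integer unix timestamps (or are blank). The snake_case column names, including `user_final_status` and `problem_final_status` for the two "Final Status" columns, and the helpers `parse_row` and `read_export` are hypothetical labels, not names used by the script.

```python
import csv
import gzip
from datetime import datetime, timezone

# Hypothetical column names following the numbered list above (0-28).
# Assumes the export is comma-separated with no header row.
COLUMNS = [
    "answer_id_hash", "course_id_hash", "student_id_hash", "set_id_hash",
    "problem_id_hash", "permission_level", "user_final_status", "set_type",
    "open_date", "due_date", "answer_date", "final_set_grade",
    "problem_path", "problem_value", "problem_max_attempts", "problem_seed",
    "attempted", "final_incorrect_attempts", "final_correct_attempts",
    "problem_final_status", "subject", "chapter", "section", "keywords",
    "answer_timestamp", "attempt_number", "raw_status", "status",
    "num_answer_blanks",
]
UNIX_TIME_COLUMNS = {"open_date", "due_date", "answer_date", "answer_timestamp"}


def parse_row(fields):
    """Map one CSV row to a dict, converting unix times to UTC datetimes."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for name in UNIX_TIME_COLUMNS:
        if row[name]:  # leave blank fields as empty strings
            row[name] = datetime.fromtimestamp(int(row[name]), tz=timezone.utc)
    return row


def read_export(path):
    """Yield parsed rows from a gzip-compressed CSV export."""
    with gzip.open(path, "rt", newline="") as handle:
        for fields in csv.reader(handle):
            yield parse_row(fields)
```

Usage: `for row in read_export("past_answers.csv.gz"): ...`, where the file name is a placeholder for whatever the script writes.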
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://webwork.maa.org/pipermail/webwork-devel/attachments/20150808/9ee5058e/attachment-0001.html>
>
> ------------------------------
>
> _______________________________________________
> webwork-devel mailing list
> webwork-devel at webwork.maa.org
> http://webwork.maa.org/mailman/listinfo/webwork-devel
>
>
> End of webwork-devel Digest, Vol 51, Issue 3
> ********************************************