[ww-devel] Big Data Table

Geoff Goehle goehle at gmail.com
Mon Aug 18 10:40:42 EDT 2014


Mike wanted to get started on the Big Data Table design and I figured we
could at least have a discussion about what the possible columns could
be and what some of the challenges are.  In terms of challenges, I see
the following:

-  Table design:  Do we have one giant table for portability and ease of
use, or do we have thinner but separate tables for performance and
assume people know how to do joins? Having thinner tables would also
partly address the next question.
-  Data purity:  A lot of data which people care about can be included
but will not be very "pure".  For example, fields like seed, due date,
and answer date can be included from the appropriate tables, but they
may not actually be the seed, due date, and answer date that the student
had when the answer was recorded.  Most of the time they will be the
same, but there will certainly be times when they are not.  We could
just use the table structure we have now, on the theory that it's the
most accurate reflection of the data we have.
-  Computed columns:  Do we want to have computed columns that do not
reflect any data we actually store and may involve some educated
guessing?  For example, we have the final number of incorrect attempts
and correct attempts and the final score, but we don't have these values
at the time of each attempt.  Do we want to try to compute these values
for each answer row?  What happens when we fail (e.g. the question has
auxiliary fields which were not recorded) or are just wrong (e.g. the
seed changed)?
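
To make the computed-columns question concrete, here is a rough sketch of
how the per-attempt counts might be post-computed from the answer rows.
It is only an illustration, not anything WeBWorK does today: the row keys
(student_id, set_id, problem_id, timestamp, correct_string) and the rule
that an attempt is "correct" only when every blank is marked 1 are
assumptions, and the sketch is in Python just for readability.

```python
from collections import defaultdict


def add_previous_attempt_counts(rows):
    """Annotate answer rows with counts of earlier attempts.

    Each row is a dict with hypothetical keys: student_id, set_id,
    problem_id, timestamp (Unix time) and correct_string (e.g. "101").
    Assumption: an attempt is correct only if every character of
    correct_string is "1".
    """
    # (student, set, problem) -> [incorrect so far, correct so far]
    counts = defaultdict(lambda: [0, 0])
    annotated_rows = []
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        key = (row["student_id"], row["set_id"], row["problem_id"])
        incorrect, correct = counts[key]
        annotated = dict(row, prev_incorrect=incorrect, prev_correct=correct)
        scores = row["correct_string"]
        if not scores:
            # Nothing was recorded for this attempt, so its status is
            # "Possibly Missing"; the running counts are left unchanged.
            annotated["attempt_correct"] = None
        else:
            is_correct = all(ch == "1" for ch in scores)
            annotated["attempt_correct"] = is_correct
            counts[key][1 if is_correct else 0] += 1
        annotated_rows.append(annotated)
    return annotated_rows
```

Note that this only counts what is in the answer rows themselves, so it
inherits the failure cases mentioned above: if an attempt was not
recorded, or was scored against a different seed, the counts will be off
and nothing here will detect it.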

I also put together a list of possible columns.  This list and the
previous questions can be found at
https://github.com/openwebwork/webwork2/wiki/Data-Export-Columns if
people want to weigh in or change things. 

-  Answer ID: Salted hash or Unique int
-  Course ID: Salted hash
-  Student ID: Salted hash
-  Set ID: Salted hash
-  Problem ID: Salted hash
-  Answer Timestamp: Unix time
-  Answer String
-  Answer Correct String: String of 1's and 0's corresponding to
the correctness of each answer
-  Problem Path: Also serves as unique identifier of problem
-  Final Problem Status
-  Total Incorrect Attempts
-  Total Correct Attempts
-  Problem Value:  Possibly impure
-  Problem Max Attempts: Possibly impure
-  Seed: Possibly impure
-  Open Date: Unix time, Possibly impure
-  Due Date: Unix time, Possibly impure
-  Answer Date: Unix time, Possibly impure
-  Set Type
-  Library Subject: Possibly Missing
-  Library Chapter: Possibly Missing
-  Library Section: Possibly Missing
-  Library Keywords: Possibly Missing
-  Status of Attempt:  Post-Computed, Possibly Missing, Possibly Impure
-  Final Set Grade:  Post-Computed, Possibly Missing, Possibly Impure
-  Number Incorrect Previous Attempts: Post-Computed, Possibly Missing,
Possibly Impure
-  Number Correct Previous Attempts: Post-Computed, Possibly Missing,
Possibly Impure
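
For the "Salted hash" ID columns above, one possible way to generate the
values is sketched below.  The list does not say which hash to use, so
the choice of HMAC-SHA256 here is an assumption, as are the function
name pseudonymize and the idea of a single secret salt shared across
one export.

```python
import hashlib
import hmac


def pseudonymize(value, salt):
    """Return a hex digest that stands in for an identifier.

    value is the original ID as a string (e.g. a student or course ID);
    salt is a secret byte string kept by whoever runs the export.
    HMAC-SHA256 is an assumed choice, not something the column list
    specifies.
    """
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical usage: the same ID and salt always give the same digest,
# so rows for one student can still be grouped together after export.
salt = b"replace-with-a-secret-value"
student_key = pseudonymize("jsmith", salt)
```

Using the same salt for every ID column keeps the values consistent
within an export, while changing the salt between exports would make
the IDs unlinkable across them.  Which of those we want is a separate
question.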





