[WWdevel] Re: CVS alternative

Fri Jan 14 11:01:37 EST 2005

Sam Hathaway wrote:

> On Jan 10, 2005, at 5:47 PM, John Jones wrote:
>
>> Sam Hathaway wrote:
>>
>>> On Jan 10, 2005, at 4:13 PM, Michael Gage wrote:
>>>
>>> Can anyone give me more details on how the repository of problem 
>>> sources and the "database" will interact? Based on what little I 
>>> know, it seems to me that the problem source should be part of the 
>>> problem's database record.
>>
>>
>> In a sense, things are reversed.  The problem files initially contain 
>> all of the information; the extra information comes in the form of 
>> special comments in those files.  We then have a script to set up a 
>> mysql database, and to extract the information from the files and 
>> load it into the database.
>
>
> Would it be fair to say that the MySQL database does nothing more than 
> act as an index on the metadata associated with each problem? Or am I 
> missing something?

I think that is a reasonable description.
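
To make the "index" idea concrete, here is a rough Python sketch of the extract-and-load step. It is only an illustration under assumptions. It is not our actual loading script. The `## Tag('value')` comment syntax, the function names, and the use of sqlite3 as a stand-in for MySQL are all hypothetical. Because the problem files stay authoritative, rebuilding the index after a mishap just means running it again.

```python
import re
import sqlite3

# Hypothetical tag syntax, assumed for illustration only:
#   ## DBsubject('Calculus')
# The real tag names and quoting rules may differ.
TAG_RE = re.compile(r"^##\s*(\w+)\('([^']*)'\)\s*$")


def extract_tags(source):
    """Return {tag: value} for every special comment line in a problem file."""
    tags = {}
    for line in source.splitlines():
        m = TAG_RE.match(line.strip())
        if m:
            tags[m.group(1)] = m.group(2)
    return tags


def build_index(problems):
    """Build a fresh metadata index from {file path: file text}.

    sqlite3 (in memory) stands in for MySQL here.  Nothing in the index
    is authoritative; it can always be discarded and regenerated.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tags (path TEXT, tag TEXT, value TEXT)")
    for path, text in problems.items():
        for tag, value in extract_tags(text).items():
            db.execute("INSERT INTO tags VALUES (?, ?, ?)", (path, tag, value))
    db.commit()
    return db
```

A search such as "all problems in the Limits chapter" then becomes a query against the `tags` table, never a scan of the files themselves.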

>> I like this approach since it is easy to reload the database if 
>> something goes wrong, and we are shipping mainly flat text files 
>> (except for the images).
>
>
> I like the simplicity of this, and in a distributed system like this 
> the more we can do with a version control system the better.
>
>>> By the way, has anyone thought about how problems will be packaged? 
>>> Many problems consist of more than one file and it might be worth 
>>> laying out a packaging format, so that a problem and all of its 
>>> auxiliary files and metadata can be distributed as a single file.
>>
>>
>> I hadn't thought of the extra files.  Thus far, the problems were 
>> basically not packaged in any special way.
>>
>> The distribution method I had in mind was that webwork would handle 
>> it behind the scenes.  It would fetch files over http from perl (I 
>> think the perl module is LWP, or something like that).  The entry 
>> point would be an extra tab in the admin course (along with add 
>> course, ..., and then Problem Database).  If you ask it to update 
>> your Problem Library Database, then it fetches the current list of 
>> files/version via http, checks it against your current list, and gets 
>> whatever is new and reloads the database.
>
>
> Shouldn't we leverage the version control system checkout features to 
> fetch and update problem libraries? It seems like a waste to keep the 
> problems in CVS (or Subversion) and then ignore the versioning 
> features of that system and track versions separately and fetch via HTTP.

At one time, there were two reasons for this.  Maybe neither is compelling.

The first was that I thought people might not want the most current 
version of some problems.  This would introduce various complications.  
After talking with Bill about this in Atlanta, we decided against this.  
Along with that, if someone modifies a problem and submits the change 
and it isn't a strict improvement, then we would fork the problem rather 
than replace the original.  Anyway, if people might prefer older 
versions of problems, we would not want the equivalent of "cvs up".

The second reason was that I knew I could get perl to do http requests.  
If someone was missing the needed perl module, it would just become one 
more CPAN module to fetch.  If webwork has to call the cvs command on
each site's system, installation may become more inconvenient.

Now that I think about it, the first reason is out.  The second is 
connected to what system we choose.  If webwork is keeping track of the 
versions and just getting files as needed (i.e., doing some functions 
cvs can do) then it doesn't matter which system is storing files at 
Problem Library Central.  If sites get new versions of problems by the 
equivalent of "cvs up" and we are using subversion, presumably they need 
to have subversion installed.

What I think I have just talked myself into is:

    * if webwork does some of the cvs work and uses http to get files,
      we have more flexibility in how problems are maintained at the
      repository.
    * if we have files transmitted by cvs, cvs will do some of the work
      for us.  But then we either use cvs itself or increase the
      installation hassle of webwork (the latter is something I would
      not want to do).

I just thought of another complication with using a cvs-like system for 
fetching files.  Part of the repository structure currently envisioned 
is that there will be several directories with the same internal 
structure, and the process of downloading the problem library should 
amount to taking their union.  If we polish a problem, then it will move 
in the original repository, but its location on the individual sites' 
machines should not change; only its version number increases.  It is 
not insurmountable, but it is a complication.
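
A rough Python sketch of what "taking their union" could mean. It rests on hypothetical assumptions: each directory is a separate tree with the same internal layout, a problem is identified by its path relative to its tree's root, and a path appearing in two trees is treated as an error. Since a site keys files by relative path, moving a polished problem from one tree to another at Problem Library Central leaves its local location unchanged. A plain cvs checkout, by contrast, mirrors the repository's physical layout.

```python
import os


def union_of_trees(roots):
    """Map each relative file path to the root directory that provides it.

    Hypothetical policy: the roots share one internal layout and each
    relative path lives in exactly one root, so a collision is reported
    instead of being silently resolved.
    """
    merged = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                if rel in merged:
                    raise ValueError(f"{rel} found in both {merged[rel]} and {root}")
                merged[rel] = root
    return merged
```

The local copy would be laid out by the keys alone, so which root a file came from only matters when fetching it.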

>> My guess is that this is how the perl cpan module works, and it is 
>> how the xemacs package system works.
>
>
> By the way, CPAN modules are packaged in "distributions", tarballs 
> which have a predictable naming scheme and layout and a standard way 
> to build and install them.

There are lots of aspects to the cpan process.  I was only thinking of 
the part where it first seems to fetch a file which gives the modules 
available from a site and their current versions.

>> Since knowing which files need updating keys off of version numbers, 
>> we may have to keep those as part of the files' metadata.
>
>
> Would that still be a problem if you were to keep the local copy of 
> the problem database as a checked-out CVS (or Subversion) working copy?

No, then it shouldn't be a problem.  If the individual sites access the 
library via a cvs-like system, then all version tracking (e.g., the 
manifest mentioned below) would be handled by cvs.

>> This approach should still be ok with extra associated files.  They 
>> are listed in the manifest along with the problem files.  So, if you 
>> don't have one at the time of an update, it will be fetched for you.
>
>
> What is the manifest? I don't think you'd need any such thing if you 
> were to use a version control system to track files.

The manifest would be a list of files and version information.  It can 
also contain dependency information if we choose.  The main role would 
be to do simple version tracking, so when updating you only get the new 
and updated files.
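
Here is a rough Python sketch of the manifest comparison. Everything in it is hypothetical: the one-entry-per-line `path version [dependency ...]` format, the integer versions, and the function names. `fetch_manifest` uses Python's standard `urllib.request` module in place of Perl's LWP. Its URL would be whatever Problem Library Central ends up publishing.

```python
import urllib.request


def parse_manifest(text):
    """Parse a hypothetical manifest: one 'path version [dep ...]' per line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, version, *deps = line.split()
        entries[path] = (int(version), deps)
    return entries


def fetch_manifest(url):
    """Download and parse the central manifest over HTTP."""
    with urllib.request.urlopen(url) as resp:
        return parse_manifest(resp.read().decode("utf-8"))


def files_to_fetch(remote, local, selected=None):
    """Paths that are missing locally or have a newer remote version.

    If `selected` is given, only those entries and (transitively) the
    files they depend on are considered; otherwise the whole remote
    manifest is.  A dependency absent from the remote manifest raises
    KeyError, since that would be an error in the manifest itself.
    """
    todo = list(remote if selected is None else selected)
    seen = set()
    while todo:
        path = todo.pop()
        if path in seen:
            continue
        seen.add(path)
        todo.extend(remote[path][1])
    return sorted(p for p in seen
                  if p not in local or local[p][0] < remote[p][0])
```

With `selected`, a site could pull just a few problems plus the images they need. That is where the optional dependency information would earn its keep. A full update only needs the version comparison.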

> Thanks for explaining this all to me. If you get sick of it, just let 
> me know. I always have opinions about things that aren't really my 
> business, but if you'd like to be left alone, say the word. :)

It is good to have some discussion before moving further forward.

John
