Summary:
I think that in the current WeBWorK2 environment,
the following inequality involving the MySQL Table_cache_size
and the number of MySQL file_descriptors should almost always
yield almost optimal MySQL performance:
2 x Max_Apache_child. <= 2 x Table_cache_size <= Num_file_descr. <= 256
More specifically, one should arrange for
Max_Apache_child. <= Table_cache_size <= 128 .
If one's operating system permits more than 256 file descriptors
per process, then the 128 can be replaced by
Max_file_descriptors_per_process / 2 .
To be on the safe side, one should arrange for the inequalities to be strict.
=======================================================================
Explanation.
The MySQL documentation is imprecise, but it seems to be implying
the following:
Definitions:
(1) An active_table is a MySQL table that is being used
in at least one currently executing MySQL query/update
(2) An active_table_use is a use of a MySQL table in a currently
executing MySQL query/update.
(3) An active MySQL connection/thread is a conection/thread
that has a MySQL query/update currently executing.
(4) For each active MySQL connection/thread t,
let M(t) = number of tables being actively used by its
current query/update, i.e., the number of active_table_uses
attributable to that connection/thread.
Remark:
The distinction between (1) and (2) is that a table might be being
used by more than one currently executing MySQL query, because
several MySQL threads can concurrently read the same table.
Furthermore, a single MySQL thread can actively "use" the same
table more than once by joining the table to itself.
Note: An SQL join of tables is a subset of the Cartesian product
of the tables.
Definition: Let N = maximum number of tables in an active MySQL join
(= maximum number of factors in the corresponding Cartesian products).
Lemma 1: N = max{M(t): t is an active connection/thread}
Explanation: A query with an SQL join is the only SQL situation
in which more than one table needs to be open
simultaneously.
Lemma 2: For an active WeBWorK->MySql connection/thread w, M(w)=1.
Justification:
The WW database methods do not currently support SQL joins.
WW does table joins programmatically. For example,
StudentProgress.pm programmatically joins
problemRecord data and studentRecord data to produce
the Student Progress Report.
Assumptions: Currently, there is only one WeBWorK thread per Apache child
and the WeBWorK programs themselves are single-threaded.
Notation: A = {t: t is an active MySQL connection/thread}
W = {t: t is a WeBWorK MySQL connection/thread}
Corollary: | W \cap A | <= Max_Apache_children.
Justification: Although a WeBWorK thread (in Apache) may
have more than one MySQL connection/thread, only one
of them can be active at any time due to the (assumption of)
single-threaded-ness.
Remark: | W | may be large if WeBWorK2 does not close its MySQL
connections, because of the persistence of the WeBWorK2 programs
in Apache's mod_perl environment. So one may perhaps need
a large upper bound for the number MySQL connections.
But, at any instant, most of these connections
will be dormant.
------ Table Cache Size ------------------------
Proposition: The table_cache_size_needed (TCSN) needed (at any moment) is
TCSN = total number of active_table_uses
= Sum_{t in A} M(i)
= Sum_{t in (W \cap A)} M(i) + Sum_{t in A-W} M(i)
= Sum_{t in (W \cap A)} 1 + Sum_{t in A-W} M(i)
= | W \cap A | + Sum_{t in A-W} M(i)
= | W \cap A | + | A-W | N
-----------------------------------------
Assumption: | A-W | = 0 almost always for a WeBWorK MySQL database.
The only exceptions should be when an administrator
or an administrative process (e.g., a roster update
or a gradebook -> Registrar process) is accessing the
database. (For these relatively rare episodes, one can rely
on MySQL's dynamic resizing of the table cache.)
Corollary: TCSN <= Max_Apache_children almost always.
------- File Descriptors -------------------------
Proposition: The number of file descriptors needed (NFDN) (at any
moment) is
NFDN = total_number_of_active_table_uses + number_of_active_tables
= TCSN + number_of_active_tables.
Justification: MySQL documentation.
Corollary: NFDN <= TCSN + TCSN = 2 TCSN
Corollary: NFDN <= 2 Max_Apache_children almost always.
Fact. The number of file descriptors available to a process apparently
varies from one version of Unix to another and from one installation
to another. For instance, in a 32-bit Solaris operating system,
the maximum number of file descriptors available to a process is
2^8 = 256, the number of indices representable on a single 8-bit byte.
Because one byte is the smallest possible storage for an unsigned
integer, it seems likely that every Unix operating system does
provide for at least 2^8 = 256 file descriptors per process.
----------------------------------------------------
The assertions in the summary are consequences of the preceding
corollaries and facts.
----------------------------------------------------
Sincerely,
Bill
<| Post or View Comments |>
|