Statistics Macros

Normal distribution

Usage: normal_prob(a, b, mean=>0, deviation=>1);

Computes the probability that x lies in the interval (a,b) for a normal distribution. The first two arguments are required. Use '-infty' for negative infinity, and 'infty' or '+infty' for positive infinity. The mean and deviation are optional and default to 0 and 1 respectively.
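To make the quantity concrete, here is a minimal standalone Perl sketch that approximates the same probability by numerical integration. It is not the macro's implementation: the helper name normal_prob_sketch, the use of Simpson's rule, and the choice to cut infinite endpoints off at 10 deviations are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not the macro itself: approximates P(lo < x < hi)
# for a normal distribution with composite Simpson's rule.
sub normal_prob_sketch {
    my ($lo, $hi, %opt) = @_;
    my $mean = $opt{mean}      // 0;
    my $dev  = $opt{deviation} // 1;

    # Assumption: an infinite endpoint is replaced by a point 10 deviations
    # from the mean, beyond which the tail area is negligible.
    $lo = $mean - 10 * $dev if $lo eq '-infty';
    $hi = $mean + 10 * $dev if $hi eq 'infty' || $hi eq '+infty';

    my $pi      = 4 * atan2(1, 1);
    my $density = sub {
        my $z = ($_[0] - $mean) / $dev;
        return exp(-$z * $z / 2) / ($dev * sqrt(2 * $pi));
    };

    my $n   = 2000;                      # even number of subintervals
    my $h   = ($hi - $lo) / $n;
    my $sum = $density->($lo) + $density->($hi);
    for my $i (1 .. $n - 1) {
        $sum += ($i % 2 ? 4 : 2) * $density->($lo + $i * $h);
    }
    return $sum * $h / 3;
}

printf "%.4f\n", normal_prob_sketch(-1.96, 1.96);                            # 0.9500
printf "%.4f\n", normal_prob_sketch('-infty', 0);                            # 0.5000
printf "%.4f\n", normal_prob_sketch(85, 115, mean => 100, deviation => 15);  # 0.6827
```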

"Inverse" of normal distribution

Usage: normal_distr(prob, mean=>0, deviation=>1);

Computes the positive number b such that the probability of x being in the interval (0,b) is equal to the given probability (first argument). The mean and deviation are optional and default to 0 and 1 respectively. Caution: since students may use tables, they may only be able to provide an answer correct to 2 or 3 decimal places. Use a tolerance when evaluating answers.

Mean function

Usage: stats_mean(@data);

Computes the arithmetic mean of a list of numbers, data. You may also pass the numbers individually.

Standard Deviation function

Usage: stats_sd(@data);

Computes the sample standard deviation of a list of numbers, data. You may also pass the numbers individually.
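The mean and sample standard deviation described above can be sketched in standalone Perl as follows. The helper names mean_sketch and sd_sketch are hypothetical and are not part of the macro file; the sample standard deviation divides by n - 1.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the two definitions.
sub mean_sketch {
    my @data = @_;
    my $sum  = 0;
    $sum += $_ for @data;
    return $sum / @data;
}

sub sd_sketch {
    my @data = @_;
    my $mean = mean_sketch(@data);
    my $ss   = 0;
    $ss += ($_ - $mean) ** 2 for @data;
    return sqrt($ss / (@data - 1));      # sample sd: divide by n - 1
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "mean = %.4f, sd = %.4f\n", mean_sketch(@data), sd_sketch(@data);
# prints: mean = 5.0000, sd = 2.1381
```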

Sum and Sum of Squares

Usage: stats_SX_SXX(@data);

Computes the sum of the numbers and the sum of their squares.
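These two sums are enough to recover the sample variance without a second pass over the data. The standalone Perl sketch below shows the idea; sx_sxx_sketch is a hypothetical name, and the return order (sum first, then sum of squares) is an assumption based on the function name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (sum of x, sum of x^2).
sub sx_sxx_sketch {
    my ($sx, $sxx) = (0, 0);
    for my $x (@_) {
        $sx  += $x;
        $sxx += $x * $x;
    }
    return ($sx, $sxx);
}

my @data = (1, 2, 3, 4);
my ($sx, $sxx) = sx_sxx_sketch(@data);
my $n   = @data;
my $var = ($sxx - $sx ** 2 / $n) / ($n - 1);   # sample variance from the two sums
printf "SX = %d, SXX = %d, variance = %.4f\n", $sx, $sxx, $var;
# prints: SX = 10, SXX = 30, variance = 1.6667
```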

Function to trim the decimal digits of a floating point number.

Usage: significant_decimals(x,n)

Trims the number x to n decimal digits, rounding the last digit. Example: significant_decimals(0.12345678,4) = 0.1235
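One common way to get this behaviour in plain Perl is sprintf, sketched below. The helper name round_decimals_sketch is hypothetical and the macro may be implemented differently; note that sprintf rounds the binary representation, so exact halfway cases can round either way.

```perl
use strict;
use warnings;

# Hypothetical helper: round $x to $n decimal digits via sprintf.
sub round_decimals_sketch {
    my ($x, $n) = @_;
    return 0 + sprintf("%.${n}f", $x);   # "0 +" turns the string back into a number
}

print round_decimals_sketch(0.12345678, 4), "\n";   # prints 0.1235
```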

Function to generate normally distributed random numbers

Usage: urand(mean,sd,N,digits)

Generates N normally distributed random numbers with the given mean and standard deviation. digits is the number of decimal digits to keep in each number.
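For readers curious how such numbers can be produced, here is a standalone Perl sketch using the Box-Muller transform. This is an illustrative technique only; the source does not say which method urand uses, and normal_rand_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: Box-Muller transform, one normal deviate per pair of uniforms.
sub normal_rand_sketch {
    my ($mean, $sd, $count, $digits) = @_;
    my $two_pi = 8 * atan2(1, 1);
    my @out;
    for (1 .. $count) {
        my $u1 = 1 - rand();             # in (0, 1], so log() is safe
        my $u2 = rand();
        my $z  = sqrt(-2 * log($u1)) * cos($two_pi * $u2);
        push @out, sprintf("%.${digits}f", $mean + $sd * $z);
    }
    return @out;
}

print join(", ", normal_rand_sketch(10.0, 2.0, 5, 2)), "\n";
```

The output differs on every run because rand() is not seeded here.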

Function to generate exponentially distributed random numbers

Usage: exprand(lambda,N,digits)

Generates N exponentially distributed random numbers with the given parameter, lambda. digits is the number of decimal digits to keep in each number.
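A standard way to draw exponential values is inverse transform sampling, sketched below in standalone Perl. The sketch assumes lambda is the rate parameter (so the mean is 1/lambda), which the text above does not state explicitly; exp_rand_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: inverse transform sampling, x = -ln(1 - u) / lambda.
# Assumption: $lambda is the rate of the distribution.
sub exp_rand_sketch {
    my ($lambda, $count, $digits) = @_;
    my @out;
    for (1 .. $count) {
        my $u = rand();                  # in [0, 1), so 1 - $u is never 0
        push @out, sprintf("%.${digits}f", -log(1 - $u) / $lambda);
    }
    return @out;
}

print join(", ", exp_rand_sketch(0.1, 5, 2)), "\n";
```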

Function to generate Poisson distributed random numbers

Usage: poissonrand(lambda,N)

Generates N Poisson distributed random numbers with the given parameter, lambda.
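One simple way to generate Poisson counts is Knuth's multiplication method, sketched here in standalone Perl. It is shown only as an illustration and is practical for small lambda; the source does not say which algorithm poissonrand uses, and poisson_rand_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: Knuth's method. Multiply uniforms until the product
# drops to exp(-lambda) or below; the number of factors minus one is the draw.
sub poisson_rand_sketch {
    my ($lambda, $count) = @_;
    my $limit = exp(-$lambda);
    my @out;
    for (1 .. $count) {
        my ($k, $p) = (0, 1);
        do {
            $k++;
            $p *= rand();
        } while ($p > $limit);
        push @out, $k - 1;
    }
    return @out;
}

print join(", ", poisson_rand_sketch(3, 10)), "\n";
```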

Function to generate Binomial distributed random numbers

Usage: binomrand(p,N,num)

Generates num binomial distributed random numbers with parameters p and N.
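A binomial value is the number of successes in N independent trials, each succeeding with probability p. The standalone Perl sketch below simulates exactly that; binom_rand_sketch is a hypothetical name and the macro itself may use a different method.

```perl
use strict;
use warnings;

# Hypothetical helper: count successes in $trials Bernoulli trials, $count times.
sub binom_rand_sketch {
    my ($p, $trials, $count) = @_;
    my @out;
    for (1 .. $count) {
        my $successes = 0;
        for (1 .. $trials) {
            $successes++ if rand() < $p;
        }
        push @out, $successes;
    }
    return @out;
}

print join(", ", binom_rand_sketch(0.3, 10, 5)), "\n";
```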

Function to generate Bernoulli distributed random numbers

Usage: bernoullirand(p,num,{"success"=>"1","failure"=>"0"})

Generates num Bernoulli distributed random numbers with parameter p. The value for a success is given by the optional "success" parameter. The value for a failure is given by the optional "failure" parameter.
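The effect of the optional labels can be sketched in standalone Perl as below. The helper name bernoulli_rand_sketch is hypothetical; the defaults of 1 and 0 follow the usage line above.

```perl
use strict;
use warnings;

# Hypothetical helper: each draw is the success label with probability $p,
# otherwise the failure label.
sub bernoulli_rand_sketch {
    my ($p, $count, $labels) = @_;
    $labels //= {};
    my $success = $labels->{success} // 1;
    my $failure = $labels->{failure} // 0;
    my @out;
    for (1 .. $count) {
        push @out, (rand() < $p ? $success : $failure);
    }
    return @out;
}

print join(", ", bernoulli_rand_sketch(0.5, 8, { success => "H", failure => "T" })), "\n";
```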

Generate random values from a discrete distribution.

Usage: discreterand($n,@tableOfProbabilities)


Example:

my $total = 10;
my @probabilities = ( [0.1,"A"], [0.4,"B"], [0.3,"C"], [0.2,"D"] );

@result = discreterand($total,@probabilities);
$data = '';
foreach $lupe (@result) {
    $data .= $lupe . ", ";
}
$data =~ s/, $//;   # strip the trailing ", "

This routine generates $n random results drawn from the distribution given in the array. Each element of the array is itself an array reference: its first value is the probability, and its second value is the value associated with that probability.
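Sampling from such a table is usually done by walking the cumulative probabilities until they exceed a uniform random number. The standalone Perl sketch below illustrates that approach; discrete_rand_sketch is a hypothetical name, and the source does not confirm that discreterand works this way.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the first entry whose cumulative probability
# exceeds a uniform draw.
sub discrete_rand_sketch {
    my ($count, @table) = @_;
    my @out;
    for (1 .. $count) {
        my $u    = rand();
        my $cum  = 0;
        my $pick = $table[-1][1];        # fallback if rounding keeps $cum below $u
        for my $entry (@table) {
            $cum += $entry->[0];
            if ($u < $cum) {
                $pick = $entry->[1];
                last;
            }
        }
        push @out, $pick;
    }
    return @out;
}

my @probabilities = ([0.1, "A"], [0.4, "B"], [0.3, "C"], [0.2, "D"]);
print join(", ", discrete_rand_sketch(10, @probabilities)), "\n";
```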

Chi Squared statistic for a two way table

Usage: chisqrTable(@frequencies)

Example:
@row1 = (1,2,2,2);
@row2 = (3,1,2,4);
@row3 = (1,4,2,1);
@row4 = (3,1,4,3);
@row5 = (5,2,2,4);
push(@table,~~@row1);
push(@table,~~@row2);
push(@table,~~@row3);
push(@table,~~@row4);
push(@table,~~@row5);
($chiSquared,$df) = chisqrTable(@table);

Computes the Chi Squared test statistic for a two way frequency table. Returns the test statistic and the number of degrees of freedom. The array used in the argument is a list of references to arrays that have the frequencies for each row. If one of the rows has a different number of entries than the others the routine will throw an error.
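The statistic is the sum over all cells of (observed - expected)^2 / expected, where the expected count is the row total times the column total divided by the grand total, with (rows - 1)(columns - 1) degrees of freedom. A standalone Perl sketch of that calculation follows; chisq_table_sketch is a hypothetical name, it uses plain \@ style references (written here as array literals) rather than PG's ~~ notation, and it assumes no expected count is zero.

```perl
use strict;
use warnings;

# Hypothetical helper: chi-squared statistic and degrees of freedom
# for a list of row references.
sub chisq_table_sketch {
    my @rows = @_;
    my $cols = @{ $rows[0] };
    my (@row_total, @col_total);
    my $total = 0;
    for my $i (0 .. $#rows) {
        die "Row $i has a different number of entries\n"
            unless @{ $rows[$i] } == $cols;
        for my $j (0 .. $cols - 1) {
            $row_total[$i] += $rows[$i][$j];
            $col_total[$j] += $rows[$i][$j];
            $total         += $rows[$i][$j];
        }
    }
    my $chi = 0;
    for my $i (0 .. $#rows) {
        for my $j (0 .. $cols - 1) {
            my $expected = $row_total[$i] * $col_total[$j] / $total;
            $chi += ($rows[$i][$j] - $expected) ** 2 / $expected;
        }
    }
    return ($chi, $#rows * ($cols - 1));
}

my ($chi, $df) = chisq_table_sketch([10, 20], [20, 10]);
printf "chi^2 = %.4f, df = %d\n", $chi, $df;
# prints: chi^2 = 6.6667, df = 1
```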

Calculate the results of a t-test.

Usage: ($t,$df,$p) = t_test(mu,@data);                       # Perform a two-sided t-test.
or:    ($t,$df,$p) = t_test(mu,@data,{'test'=>'right'});     # Perform a right-sided t-test.
or:    ($t,$df,$p) = t_test(mu,@data,{'test'=>'left'});      # Perform a left-sided t-test.
or:    ($t,$df,$p) = t_test(mu,@data,{'test'=>'two-sided'}); # Perform a two-sided t-test.

Computes the t-statistic, the number of degrees of freedom, and the p-value after performing a t-test on the given data. The value of mu is the assumed mean under the null hypothesis. The optional argument sets whether a left, right, or two-sided test is conducted.
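The first two returned values follow directly from the data: t = (mean - mu) / (sd / sqrt(n)) with n - 1 degrees of freedom. The standalone Perl sketch below computes just those two; t_stat_sketch is a hypothetical name, and the p-value is left out because it needs the t distribution's cumulative function, which core Perl does not provide.

```perl
use strict;
use warnings;

# Hypothetical helper: one-sample t statistic and degrees of freedom only.
sub t_stat_sketch {
    my ($mu, @data) = @_;
    my $n   = @data;
    my $sum = 0;
    $sum += $_ for @data;
    my $mean = $sum / $n;
    my $ss   = 0;
    $ss += ($_ - $mean) ** 2 for @data;
    my $sd = sqrt($ss / ($n - 1));       # sample standard deviation
    return (($mean - $mu) / ($sd / sqrt($n)), $n - 1);
}

my ($t, $df) = t_stat_sketch(4, 2, 4, 4, 4, 5, 5, 7, 9);
printf "t = %.4f, df = %d\n", $t, $df;
# prints: t = 1.3229, df = 7
```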

Calculate the results of a two sample t-test.

Usage: ($t,$df,$p) = two_sample_t_test(\@data1,\@data2);                       # Perform a two-sided t-test.
or:    ($t,$df,$p) = two_sample_t_test(\@data1,\@data2,{'test'=>'right'});     # Perform a right-sided t-test.
or:    ($t,$df,$p) = two_sample_t_test(\@data1,\@data2,{'test'=>'left'});      # Perform a left-sided t-test.
or:    ($t,$df,$p) = two_sample_t_test(\@data1,\@data2,{'test'=>'two-sided'}); # Perform a two-sided t-test.

Computes the t-statistic, the number of degrees of freedom, and the p-value after performing a two sample t-test on the given data. The test is whether or not the means are the same. The optional argument can set whether or not a left, right, or two-sided test will be conducted.

Create a data file and make a link to it.

Usage: insertDataLink($PG,linkText,@dataRefs)

Writes the given data to a file and creates a link to the data file. $PG is a reference to an instance of a PGcore object (generally just use $PG in a problem). linkText is the text to appear in the anchor/link. @dataRefs is a list of references, each assumed to be a reference to an array. All of the arrays must have the same length. The last entry in each array is assumed to be the label to use in the first row of the CSV file.

Example:

# Generate random data
@data1 = urand(10.0,2.0,10,2);
@data2 = urand(12.0,2.0,10,2);
@data3 = urand(14.0,4.0,10,2);
@data4 = exprand(0.1,10,2);

# Append the labels for each data set
push(@data1,"w");
push(@data2,"x");
push(@data3,"y");
push(@data4,"z");

BEGIN_TEXT

blah blah

$BR Data: \{ insertDataLink($PG,"the data",(~~@data1,~~@data2,~~@data3,~~@data4)); \} $BR

END_TEXT

Five Point Summary function

Usage: five_point_summary(@data);
or:    five_point_summary(@data,{method=>'includeMedian'});
or:    five_point_summary(@data,{method=>'proper'});

Computes the five point summary of a list of numbers, data. You may also pass the numbers individually. The optional parameter can be used to specify that the median be included in the calculation of the quartiles if it is in the data set or whether proper proportions should be used to calculate the quartiles.
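Quartile conventions differ, so the standalone Perl sketch below should be read as one plausible interpretation rather than the macro's definition. It takes the quartiles as medians of the lower and upper halves of the sorted data, and a flag decides whether the overall median joins both halves when the data set has odd length. The names five_point_sketch and median_of are hypothetical, and the 'proper' method is not covered.

```perl
use strict;
use warnings;

# Median of an already sorted list.
sub median_of {
    my @s = @_;
    my $n = @s;
    return $n % 2 ? $s[($n - 1) / 2] : ($s[$n / 2 - 1] + $s[$n / 2]) / 2;
}

# Hypothetical helper: returns (min, Q1, median, Q3, max).
# Assumption: quartiles are medians of the two halves of the sorted data.
sub five_point_sketch {
    my ($data, $include_median) = @_;
    my @s    = sort { $a <=> $b } @$data;
    my $n    = @s;
    my $half = int($n / 2);
    my $keep = ($n % 2 && $include_median) ? $half + 1 : $half;
    my @lower = @s[0 .. $keep - 1];
    my @upper = @s[$n - $keep .. $n - 1];
    return ($s[0], median_of(@lower), median_of(@s), median_of(@upper), $s[-1]);
}

print join(", ", five_point_sketch([7, 1, 3, 5, 2, 6, 4])), "\n";      # 1, 2, 4, 6, 7
print join(", ", five_point_sketch([7, 1, 3, 5, 2, 6, 4], 1)), "\n";   # 1, 2.5, 4, 5.5, 7
```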

Function to calculate Pearson's sample correlation

Usage:  $cor = sample_correlation(~~@xData,~~@yData);

Calculates Pearson's sample correlation for the given data. The arguments are references to two arrays, where each array contains the associated data.
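Pearson's r is the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. A standalone Perl sketch follows; correlation_sketch is a hypothetical name, and plain Perl array references are used in place of PG's ~~ notation.

```perl
use strict;
use warnings;

# Hypothetical helper: Pearson's sample correlation of two equal-length arrays.
sub correlation_sketch {
    my ($x, $y) = @_;
    my $n = @$x;
    my ($sum_x, $sum_y) = (0, 0);
    $sum_x += $_ for @$x;
    $sum_y += $_ for @$y;
    my ($mean_x, $mean_y) = ($sum_x / $n, $sum_y / $n);

    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return $sxy / sqrt($sxx * $syy);
}

printf "%.4f\n", correlation_sketch([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);   # 1.0000
```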

Function to calculate the linear least squares estimate for the linear relationship between two data sets

Usage:  ($slope,$intercept,$var,$SXX) = linear_regression(~~@xdata,~~@ydata);

Given the x data in @xdata and the y data in @ydata, the least squares regression line is calculated. The function also returns the variance of the residuals as well as SXX, the sum of the squares of the deviations of the x values. This makes it easier to perform calculations on the slope parameter, such as constructing a confidence interval or carrying out inference procedures.

Example:

@xdata = (-1,2,3,4,5,6,7);
@ydata = (6,5,6,7,8,9,11);
($slope,$intercept,$var,$SXX) = linear_regression(~~@xdata,~~@ydata);
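The four returned quantities can be sketched in standalone Perl as below. The helper name linear_regression_sketch is hypothetical, and dividing the residual sum of squares by n - 2 is an assumption (the usual choice for simple linear regression); the text above does not state which divisor the macro uses.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (slope, intercept, residual variance, SXX).
sub linear_regression_sketch {
    my ($x, $y) = @_;
    my $n = @$x;
    my ($sum_x, $sum_y) = (0, 0);
    $sum_x += $_ for @$x;
    $sum_y += $_ for @$y;
    my ($mean_x, $mean_y) = ($sum_x / $n, $sum_y / $n);

    my ($sxx, $sxy) = (0, 0);
    for my $i (0 .. $n - 1) {
        $sxx += ($x->[$i] - $mean_x) ** 2;
        $sxy += ($x->[$i] - $mean_x) * ($y->[$i] - $mean_y);
    }
    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;

    my $sse = 0;
    for my $i (0 .. $n - 1) {
        $sse += ($y->[$i] - $intercept - $slope * $x->[$i]) ** 2;
    }
    my $var = $sse / ($n - 2);           # assumption: n - 2 degrees of freedom
    return ($slope, $intercept, $var, $sxx);
}

my ($slope, $intercept, $var, $sxx) = linear_regression_sketch([1, 2, 3, 4], [3, 5, 7, 9]);
printf "slope = %.2f, intercept = %.2f, var = %.2f, SXX = %.2f\n",
    $slope, $intercept, $var, $sxx;
# prints: slope = 2.00, intercept = 1.00, var = 0.00, SXX = 5.00
```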

Function to calculate the frequencies for the factors in a given data set.

Usage:  %freq = frequencies(@theData)

Finds the factors in the data set and calculates the frequency of occurrence for each factor. Returns a hash whose keys are the factors and whose values are the corresponding frequencies.
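Counting factors with a hash is a short idiom in Perl, sketched below as a standalone example. The helper name frequencies_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: tally how often each distinct value occurs.
sub frequencies_sketch {
    my %freq;
    $freq{$_}++ for @_;
    return %freq;
}

my %freq = frequencies_sketch(qw(a b a c a b));
printf "%s: %d\n", $_, $freq{$_} for sort keys %freq;
# prints:
# a: 3
# b: 2
# c: 1
```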

math/PGstatisticsmacros.pl