Quiz statistics calculations

General issues

Quizzes that allow multiple attempts

For quizzes that allow multiple attempts, by default the report should only include data from the first attempt by each student. Subsequent attempts probably do not satisfy the assumptions that underlie item analysis. However, there should be an option 'Include data from all attempts', with a disclaimer, either near it on screen or in the help file, that this may be statistically invalid. (For small data sets, it may be better to include all data.)

Using the first attempt also avoids problems caused by the 'Each attempt builds on the last' setting.

Within the analysis, when multiple attempts per student are included, each attempt is treated as an independent attempt.

Adaptive mode

Adaptive mode does not pose a problem. We just assume that each item in the test returns a score, and these scores are added up to get the test score. That is, we use the item score including penalties in the calculation of the statistics.

Certainty based marking

Similarly, should CBM and/or negative scoring for multiple choice questions be implemented, we just use the final item score in the calculations, making sure that the formulae do not assume that the minimum item score is zero.

Incomplete attempts

There is an issue about what you do when not all students have answered all questions. Depending on how you handle these missing items, you distort the statistics in different ways.

There are basically two reasons why a student may not have answered a particular question:

  • they may have chosen to omit it, or
  • they may have run out of time, if the test is timed. In this case, omitted questions tend to be towards the end of the test.

Available approaches for handling this are:

  1. Treat omitted items as having a score of 0.
  2. Exclude any attempt with missing scores from the analysis.
  3. Analyse each question separately, and when analysing a particular question, include only the students who answered that question.

I think we should implement 1 for now. This is how Moodle currently works - a blank answer is marked and receives a score of zero, and so is indistinguishable from a wrong answer. Item analysis is most important for summative tests, and the result of that question being in the test is that it gave that student a contribution of 0 marks towards their final score.
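
To make option 1 concrete, here is a minimal sketch in Python (illustrative only; the dict-of-scores layout is an assumption for the example, not Moodle's actual data structures):

  # Treat any position the student did not answer as scoring 0.
  def fill_omitted_with_zero(attempt_scores, positions):
      return {p: attempt_scores.get(p, 0.0) for p in positions}

  positions = [1, 2, 3]
  attempt = {1: 0.5, 3: 1.0}   # position 2 was omitted
  print(fill_omitted_with_zero(attempt, positions))
  # {1: 0.5, 2: 0.0, 3: 1.0}

Once the gaps are filled with zeros, omitted and wrong answers contribute identically to every statistic below.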

Random questions

In a quiz with random questions not all students will have attempted the same set of questions. We need to decide what to do about that too. This is related to the previous section.

At the moment, Moodle does manage to distinguish which actual question each student answered, and analyses each question separately, as in option 3 in the previous section. This is quite nice, but relies on the very computationally expensive way the report currently operates - it would be impossible to improve this by using more SQL and less PHP.

(Need to decide what to do here).

Notation used in the calculations

(The only way I can think of to do this is to type TeX notation like $$x_i$$ even though MoodleDocs does not currently render it nicely. Maybe it will one day. In the meantime, if you want to see things nicely formatted, you will have to copy them to a Moodle with a working TeX filter.)

We have a lot of students $$s \in S$$.

The test has a number of positions $$p \in P$$.

The test is assembled from a number of items $$i \in I$$.

Because of random questions, different students may have received different items in different positions, so $$i(p, s)$$ is the item that student $$s$$ received in position $$p$$.

Let $$I_s$$ be the set of items that student $$s$$ saw. Let $$S_i$$ be the set of students who attempted item $$i$$.
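
As an illustration, both $$I_s$$ and $$S_i$$ can be derived from the $$i(p, s)$$ mapping. A sketch in Python (the nested-dict layout is assumed for the example, not Moodle's schema):

  item_of = {                      # item_of[s][p] = i(p, s)
      'ann': {1: 'q7', 2: 'q3'},   # with random questions, students may
      'bob': {1: 'q7', 2: 'q9'},   # get different items in a position
  }

  I_s = {s: set(pos.values()) for s, pos in item_of.items()}

  S_i = {}
  for s, pos in item_of.items():
      for i in pos.values():
          S_i.setdefault(i, set()).add(s)

  print(I_s)   # e.g. {'ann': {'q7', 'q3'}, 'bob': {'q7', 'q9'}}
  print(S_i)   # e.g. {'q7': {'ann', 'bob'}, 'q3': {'ann'}, 'q9': {'bob'}}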

Each position has a maximum and minimum possible contribution to the test score, $$x_p(min)$$ and $$x_p(max)$$. At the moment in Moodle, $$x_p(min)$$ is always zero, but we cannot assume that will continue to be the case. $$x_p(max)$$ is database column quiz_question_instances.grade.

Then, each student achieved an actual score $$x_p(s)$$ on the item in position $$p$$. So $$x_p(min) \le x_p(s) \le x_p(max)$$.

$$x_p(s)$$ should be measured on the same scale as the final score for the quiz. That is, scaled by quiz_question_instances.grade, but that is already how grades are stored in mdl_question_states.

Each student has a total score

$$T_s = \sum_{p \in P} x_p(s)$$

Similarly, there are the maximum and minimum possible test scores

$$T_{max} = \sum_{p \in P} x_p(max)$$

and

$$T_{min} = \sum_{p \in P} x_p(min)$$
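
These definitions translate directly into code. A sketch (the names and the scores[s][p] layout are assumptions for the example):

  scores = {                   # scores[s][p] = x_p(s), already scaled
      'ann': {1: 0.5, 2: 1.0},
      'bob': {1: 1.0, 2: 0.0},
  }
  x_max = {1: 1.0, 2: 1.0}     # x_p(max), i.e. quiz_question_instances.grade
  x_min = {1: 0.0, 2: 0.0}     # x_p(min), currently always 0 in Moodle

  T = {s: sum(row.values()) for s, row in scores.items()}   # T_s
  T_max = sum(x_max.values())
  T_min = sum(x_min.values())
  print(T, T_max, T_min)       # {'ann': 1.5, 'bob': 1.0} 2.0 0.0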

We need some calculated intermediate quantities for use in the formulae below.

Student's rest of test score for a position: $$X_p(s) = T_s - x_p(s)$$.

Average score for a position: $$\bar{x}_p = \frac{1}{S} \sum_{s \in S} x_p(s)$$.

Average rest of test score for a position: $$\bar{X}_p = \frac{1}{S} \sum_{s \in S} X_p(s)$$.

Score variance for a position: $$V(x_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)^2$$.

Rest of test score variance for a position: $$V(X_p) = \frac{1}{S - 1} \sum_{s \in S} (X_p(s) - \bar{X}_p)^2$$.

Score to rest of test score covariance for a position: $$C(x_p, X_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)(X_p(s) - \bar{X}_p)$$.
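
A direct transcription of these intermediate quantities for a single position $$p$$, continuing the illustrative layout used above (a sketch, not Moodle's implementation):

  scores = {'ann': {1: 0.5, 2: 1.0},
            'bob': {1: 1.0, 2: 0.0},
            'cat': {1: 0.0, 2: 0.5}}   # scores[s][p] = x_p(s)
  p = 1
  students = list(scores)
  S = len(students)                    # number of students

  T = {s: sum(scores[s].values()) for s in students}   # T_s
  x = {s: scores[s][p] for s in students}              # x_p(s)
  X = {s: T[s] - x[s] for s in students}               # X_p(s), rest of test

  x_bar = sum(x.values()) / S          # average score for the position
  X_bar = sum(X.values()) / S          # average rest of test score
  V_x = sum((x[s] - x_bar) ** 2 for s in students) / (S - 1)
  V_X = sum((X[s] - X_bar) ** 2 for s in students) / (S - 1)
  C_xX = sum((x[s] - x_bar) * (X[s] - X_bar) for s in students) / (S - 1)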

Item statistics

Intended question weight

Random guess score

Facility index

Standard deviation

Discrimination index

Discriminative efficiency

Effective question weight

Test statistics

Mean Score

Test mean: $$\bar{T} = \frac{1}{S} \sum_{s \in S} T_s$$

Median Score

Sort all the $$T_s$$, and take the middle one if S is odd, or the average of the two middle ones if S is even.

Standard Deviation

Test standard deviation: $$\sqrt{\frac{1}{S - 1} \sum_{s \in S} (T_s - \bar{T})^2}$$.
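
These three statistics follow the standard definitions, so as a quick check they can be computed with Python's statistics module (statistics.median averages the two middle values when S is even, and statistics.stdev divides by S - 1); this is just an illustration, not Moodle's code:

  import statistics

  T_s = [12.0, 15.5, 9.0, 18.0]    # illustrative total scores
  print(statistics.mean(T_s))      # test mean: 13.625
  print(statistics.median(T_s))    # test median: 13.75
  print(statistics.stdev(T_s))     # test standard deviation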

Skewness and Kurtosis

Probably of limited interest, but included for completeness. First calculate:

$$m_2 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^2$$

$$m_3 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^3$$

$$m_4 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^4$$

Then compute

$$k_2 = \frac{S}{S - 1} m_2$$

$$k_3 = \frac{S^2}{(S - 1)(S - 2)} m_3$$

$$k_4 = \frac{S^2}{(S - 1)(S - 2)(S - 3)} ((S + 1)m_4 - 3(S - 1)m_2^2)$$

Then

Skewness: $$\frac{k_3}{k_2^{3/2}}$$

Kurtosis: $$\frac{k_4}{k_2^2}$$
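
Written out directly as a sketch (illustrative only; note that $$k_4$$ requires at least four students):

  def skewness_and_kurtosis(T):
      S = len(T)
      T_bar = sum(T) / S
      m2 = sum((t - T_bar) ** 2 for t in T) / S
      m3 = sum((t - T_bar) ** 3 for t in T) / S
      m4 = sum((t - T_bar) ** 4 for t in T) / S
      k2 = S / (S - 1) * m2
      k3 = S ** 2 / ((S - 1) * (S - 2)) * m3
      k4 = (S ** 2 / ((S - 1) * (S - 2) * (S - 3))
            * ((S + 1) * m4 - 3 * (S - 1) * m2 ** 2))
      return k3 / k2 ** 1.5, k4 / k2 ** 2   # skewness, kurtosis

  print(skewness_and_kurtosis([12.0, 15.5, 9.0, 18.0, 11.0]))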

Coefficient of Internal Consistency

Error Ratio

Standard Error

Detailed item information