Quiz statistics calculations

{{Moodle 2.0}}
{{Quiz item analysis calculations}}
==Introduction==
This page describes how the quiz and question statistics are calculated. A succinct description of their meanings can be found at [[Quiz report statistics]]. Also see the documentation for the [https://docs.moodle.org/25/en/Quiz_statistics_report Quiz statistics report] itself.


==General issues==

===Quizzes that allow multiple attempts===

For quizzes that allow multiple attempts, by default the report should only include data from the first attempt by each student. (The data for subsequent attempts probably does not satisfy the assumptions that underlie item analysis statistics.) However, there should be an option 'Include data from all attempts', with a disclaimer that this may be statistically invalid, either near it on screen or possibly in the help file. (For small data sets, it may be better to include all data.)

Using the first attempt also avoids problems caused by the 'Each attempt builds on the last' setting.

Within the analysis, when multiple attempts per student are included, each attempt is treated as an independent attempt. (That is, we pretend each attempt was by a different student.)

===Adaptive mode===

Adaptive mode does not pose a problem. Item analysis just supposes that each item in the test returns a score, and that these scores are added up to get the test score; therefore it does not care about adaptive/non-adaptive mode. However, just to be clear, for an adaptive question, the score used in the calculation is the final score for the item, including penalties.

===Certainty based marking===

Similarly, should certainty based marking and/or negative scoring for multiple choice questions be implemented, we just use the final item score in the calculations, making sure that the formulae do not assume that the minimum item score is zero.

===Incomplete attempts===

There is an issue about what to do when not all students have answered all questions. Depending on how you handle these missing items, you distort the statistics in different ways.

There are basically two reasons why a student may not have answered a particular question:
* they may have chosen to omit it, or
* they may have run out of time, if the test is timed. In this case, omitted questions tend to be towards the end of the test.

Available approaches for handling this are:
# treat omitted items as having a score of 0;
# exclude any attempt with missing scores from the analysis;
# analyse each question separately, and when analysing a particular question, include only the students who answered it.

I think we should implement approach 1 for now. This is how Moodle currently works: a blank answer is marked and receives a score of zero, and so is indistinguishable from a wrong answer. Item analysis is most important for summative tests, and the result of that question being in the test is that it gave that student a contribution of 0 marks towards their final score.

===Random questions===

In a quiz with random questions, not all students will have attempted the same set of questions. To account for this, the analysis below distinguishes between positions in the test and test items.

==Notation used in the calculations==


We have a number of students <math>s \in S</math>, who have completed at least one attempt on the quiz.

The test has a number of positions <math>p \in P</math>.

The test is assembled from a number of items <math>i \in I</math>.

Because of random questions, different students may have received different items in different positions, so <math>i(p, s)</math> is the item that student <math>s</math> received in position <math>p</math>.

Let <math>I_s</math> be the set of items that student <math>s</math> saw. Let <math>S_i</math> be the set of students who attempted item <math>i</math>.

Each position has a maximum and minimum possible contribution to the test score, <math>x_p(min)</math> and <math>x_p(max)</math>. At the moment in Moodle, <math>x_p(min)</math> is always zero, but we cannot assume that will continue to be the case. <math>x_p(max)</math> is the database column quiz_question_instances.grade.

Then, each student achieved an actual score <math>x_p(s)</math> on the item in position <math>p</math>. So <math>x_p(min) \le x_p(s) \le x_p(max)</math>.

<math>x_p(s)</math> should be measured on the same scale as the final score for the quiz. That is, scaled by quiz_question_instances.grade; but that is already how grades are stored in mdl_question_states.

We can also think of the student's score on a particular item, <math>x_i(s)</math>. However, in this case, the score should be measured out of the Default question grade for this question. Also, there is an <math>x_i(max)</math> (= Default question grade) and an <math>x_i(min)</math> (currently zero, but if we allow negative marking, that will change). Scaling between the position score and the item score:

<math>x_{i(p, s)}(s) = x_p(s)\frac{x_i(max)}{x_p(max)}</math>.

Each student has a total score

<math>\displaystyle T_s = \sum_{p \in P} x_p(s)</math>

Similarly, there are the maximum and minimum possible test scores

<math>\displaystyle T_{max} = \sum_{p \in P} x_p(max)</math>

and

<math>\displaystyle T_{min} = \sum_{p \in P} x_p(min)</math>
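To make the notation concrete, here is a small illustrative sketch in Python. This is not Moodle code: the data layout and all names (`scores`, `x_p_max`, `total_score`, and so on) are invented for the example. Scores are stored per student and per position, and the test totals follow the definitions above.

```python
# Illustrative sketch (invented names, not Moodle code).
# x_p_max[p] is x_p(max); scores[s][p] is x_p(s), the score that
# student s got on the question in position p.

def total_score(scores_for_student):
    """T_s = sum over positions p of x_p(s)."""
    return sum(scores_for_student.values())

def max_test_score(x_p_max):
    """T_max = sum over positions p of x_p(max)."""
    return sum(x_p_max.values())

# Two students, two positions, each position marked out of 5.
x_p_max = {1: 5.0, 2: 5.0}
scores = {
    'alice': {1: 4.0, 2: 5.0},
    'bob':   {1: 2.0, 2: 3.0},
}

T = {s: total_score(ps) for s, ps in scores.items()}
T_max = max_test_score(x_p_max)
```

With minimum scores all zero, <math>T_{min}</math> would be computed the same way from the per-position minima.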


===Intermediate calculations===

To simplify the form of the formulas below, we need some intermediate calculated quantities, derived from the ones above.

Student's rest of test score for a position: <math>X_p(s) = T_s - x_p(s)</math>.

For any quantity that depends on position (for example <math>x_p</math> or <math>X_p</math>), its average is denoted with an overbar, and is an average over all students, so

<math>\displaystyle \bar{x}_p = \frac{1}{S} \sum_{s \in S} x_p(s)</math>.

When a quantity is a property of an item, a bar denotes an average over all the students who attempted that item, so

<math>\displaystyle \bar{x}_i = \frac{1}{S_i} \sum_{s \in S_i} x_i(s)</math>.

Similarly, we have the variance of a quantity depending on position:

<math>\displaystyle V(x_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)^2</math>

and of a quantity depending on items:

<math>\displaystyle V(x_i) = \frac{1}{S_i - 1} \sum_{s \in S_i} (x_i(s) - \bar{x}_i)^2</math>.

Finally, we need covariances of two quantities, for example:

<math>\displaystyle C(x_p, X_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)(X_p(s) - \bar{X}_p)</math>

<math>\displaystyle C(x_i, X_i) = \frac{1}{S_i - 1} \sum_{s \in S_i} (x_i(s) - \bar{x}_i)(X_i(s) - \bar{X}_i)</math>.
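The averages, variances and covariances above can be sketched directly in Python (illustrative only; the helper names are invented, and the sample, <math>S - 1</math>, denominators match the formulas above):

```python
# Sample statistics with (n - 1) denominators, matching the formulas above.

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """V(x): sum of squared deviations from the mean, over (n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(xs, ys):
    """C(x, y): sum of products of paired deviations, over (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# x_p(s) for four students, and the matching rest-of-test scores X_p(s):
x_p = [4.0, 2.0, 5.0, 1.0]
X_p = [14.0, 8.0, 15.0, 5.0]

V_xp = variance(x_p)            # V(x_p)
C_xp_Xp = covariance(x_p, X_p)  # C(x_p, X_p)
```

The item versions, <math>V(x_i)</math> and <math>C(x_i, X_i)</math>, are the same calculations restricted to the students in <math>S_i</math>.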


==Position statistics==

===Facility index===

This is the average score on the item, expressed as a percentage:

<math>\displaystyle F_p = 100\frac{\bar{x}_p - x_p(min)}{x_p(max) - x_p(min)}</math>.

The higher the facility index, the easier the question is (for this cohort of students).

===Standard deviation===

Again expressed on a percentage scale:

<math>\displaystyle SD_p = 100\frac{\sqrt{V(x_p)}}{x_p(max) - x_p(min)}</math>.

===Discrimination index===

This is the product-moment correlation coefficient between <math>x_p</math> and <math>X_p</math>, expressed on a percentage scale. That is,

<math>\displaystyle D_p = 100r(x_p, X_p) = 100\frac{C(x_p, X_p)}{\sqrt{V(x_p) V(X_p)}}</math>.

The idea is that, for a good question (or at least a question that fits in with the other questions in the test), students who have scored highly on the other parts of the test should also have scored highly on this question, so the score for the question and the score for the test as a whole should be well correlated.

The weakness of this statistic is that, unless the facility index is 50%, it is impossible for the discrimination index to be 100%; to put it another way, if <math>F_p</math> is close to 0% or 100%, <math>D_p</math> will always be very small. That makes this statistic difficult to interpret.

===Discriminative efficiency===

This gets around that weakness in the discrimination index by expressing <math>C(x_p, X_p)</math> as a percentage of the maximum value it could have taken, given the scores the students got on this question and on the test as a whole. That is:

<math>\displaystyle DE_p = 100\frac{C(x_p, X_p)}{C_{max}(x_p, X_p)}</math>

where <math>C_{max}(x_p, X_p)</math> is defined as follows:

When you compute <math>C(x_p, X_p)</math>, you do the sum

<math>\displaystyle C(x_p, X_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)(X_p(s) - \bar{X}_p)</math>

which involves a term for each student, combining their question score and rest-of-test score. That is, you start with an array of <math>x_p(s)</math> and an array of corresponding <math>X_p(s)</math>, one for each <math>s</math>. To compute <math>C_{max}(x_p, X_p)</math>, you just sort these two arrays before applying the above formula. That is, for the purpose of computing <math>C_{max}</math>, you pretend that the first student scored the lowest <math>x_p</math> and the lowest <math>X_p</math>, the second student scored the second lowest <math>x_p</math> and the second lowest <math>X_p</math>, and so on to the last student, who scored the highest <math>x_p</math> and <math>X_p</math>.
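The sorting construction for <math>C_{max}</math> can be sketched as follows (illustrative Python, not the actual Moodle implementation; the function names are invented). Sorting both arrays independently pairs the lowest question score with the lowest rest-of-test score, and so on, which maximises the covariance for the given sets of scores:

```python
# C_max: sort the question scores and the rest-of-test scores
# independently, then apply the ordinary sample covariance formula
# to the sorted pairs. Discriminative efficiency is then C / C_max.

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def c_max(xs, ys):
    return covariance(sorted(xs), sorted(ys))

def discriminative_efficiency(x_p, X_p):
    return 100 * covariance(x_p, X_p) / c_max(x_p, X_p)

x_p = [4.0, 2.0, 5.0, 1.0]
X_p = [8.0, 14.0, 15.0, 5.0]
DE = discriminative_efficiency(x_p, X_p)
```

If the two arrays are already in the same rank order for every student, the covariance equals <math>C_{max}</math> and the discriminative efficiency is 100%.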


===Intended question weight===

How much this question was supposed to contribute to determining the overall test score:

<math>\displaystyle IQW_p = 100\frac{x_p(max) - x_p(min)}{T_{max} - T_{min}}</math>.

===Effective question weight===

This is an estimate of what proportion of the variance in the students' test scores is due to this question:

<math>\displaystyle EQW_p = 100\frac{\sqrt{C(x_p, T)}}{\sum_{p \in P}\sqrt{C(x_p, T)}}</math>.
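As a quick illustration of the intended question weight (hypothetical numbers, invented function name):

```python
# Intended question weight: each position's share of the total
# available marks, as a percentage.

def intended_question_weight(x_p_max, x_p_min, t_max, t_min):
    return 100 * (x_p_max - x_p_min) / (t_max - t_min)

# A position worth 5 marks in a 20-mark quiz (all minimum scores zero):
w = intended_question_weight(5.0, 0.0, 20.0, 0.0)  # 25.0
```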


==Item statistics==

===Number of attempts===

The number of students who got this question as part of a quiz attempt. This is just <math>S_i</math>.

===Random guess score===

This is the score that a student would have got by guessing randomly. It depends on the question type. For types like shortanswer, it is 0, or the score associated with the answer '*', if there is one.

For multiple choice questions (including matching, truefalse, etc.) it is the average score over all the possible choices.

(There should probably be a method in the question type class to compute this.)

===Facility index===

<math>\displaystyle F_i = 100\frac{\bar{x}_i - x_i(min)}{x_i(max) - x_i(min)}</math>.


===Standard deviation===

<math>\displaystyle SD_i = 100\frac{\sqrt{V(x_i)}}{x_i(max) - x_i(min)}</math>.

===Discrimination index===

<math>\displaystyle D_i = 100r(x_i, T) = 100\frac{C(x_i, T)}{\sqrt{V(x_i) V(T)}}</math>.

(It is not possible to define <math>X_i</math>, because the same item may conceivably have been chosen in different positions with different weights, so we substitute <math>T</math> instead.)

===Discriminative efficiency===

<math>\displaystyle DE_i = 100\frac{C(x_i, T)}{C_{max}(x_i, T)}</math>.


==Test statistics==

===Number of Attempts===

This is <math>S</math>.

Note that, depending on the options, we may be counting one attempt per student or all attempts.

===Mean Score===

Test mean <math>\displaystyle = \bar{T} = \frac{1}{S} \sum_{s \in S} T_s</math>


===Median Score===

Sort all the <math>T_s</math>, and take the middle one if <math>S</math> is odd, or the average of the two middle ones if <math>S</math> is even.

===Standard Deviation===

Test standard deviation <math>\displaystyle = SD = \sqrt{V(T)} = \sqrt{\frac{1}{S - 1} \sum_{s \in S} (T_s - \bar{T})^2}</math>.


===Skewness and Kurtosis===

Probably of limited interest, but included for completeness. Skewness is a measure of the asymmetry of a distribution. Kurtosis tells you, relative to a normal distribution, whether your distribution has more of a bulge but thinner tails, or vice versa.

First calculate:

<math>\displaystyle m_2 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^2</math>

<math>\displaystyle m_3 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^3</math>

<math>\displaystyle m_4 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^4</math>

Then compute

<math>\displaystyle k_2 = \frac{S}{S - 1} m_2 = V(T)</math>

<math>\displaystyle k_3 = \frac{S^2}{(S - 1)(S - 2)} m_3</math>

<math>\displaystyle k_4 = \frac{S^2}{(S - 1)(S - 2)(S - 3)} \left((S + 1)m_4 - 3(S - 1)m_2^2\right)</math>

Then

Skewness <math>\displaystyle = \frac{k_3}{k_2^{3/2}}</math>

Kurtosis <math>\displaystyle = \frac{k_4}{k_2^2}</math>
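The steps above translate directly into code (illustrative Python; the function name is invented):

```python
# Skewness and kurtosis from the moment sums m2, m3, m4 and the
# k-statistics k2, k3, k4 defined above. T is the list of test scores;
# S = len(T). Needs S >= 4 so the k4 denominator is non-zero.

def skewness_and_kurtosis(T):
    S = len(T)
    mean_T = sum(T) / S
    m2 = sum((t - mean_T) ** 2 for t in T) / S
    m3 = sum((t - mean_T) ** 3 for t in T) / S
    m4 = sum((t - mean_T) ** 4 for t in T) / S
    k2 = S / (S - 1) * m2
    k3 = S ** 2 / ((S - 1) * (S - 2)) * m3
    k4 = (S ** 2 / ((S - 1) * (S - 2) * (S - 3))
          * ((S + 1) * m4 - 3 * (S - 1) * m2 ** 2))
    return k3 / k2 ** 1.5, k4 / k2 ** 2

skew, kurt = skewness_and_kurtosis([12.0, 15.0, 11.0, 18.0, 14.0])
```

A symmetric set of scores gives a skewness of zero, and a flat (uniform-looking) set gives a negative kurtosis.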


===Coefficient of Internal Consistency===

This is on a percentage scale:

<math>\displaystyle CIC = 100 \frac{P}{P - 1} \left(1 - \frac{1}{V(T)}\sum_{p \in P} V(x_p) \right)</math>

Also called Cronbach's alpha in the literature.

===Error Ratio===

Also a percentage:

<math>\displaystyle ER = 100 \sqrt{1 - \frac{CIC}{100}}</math>

===Standard Error===

<math>\displaystyle SE = \frac{ER}{100}SD</math>

These last three statistics are to do with estimating how reliable the test scores are. If you take the view that the score a student got on the test on the day is a combination of their actual ability and a random error (how lucky they were on the day of the test), then the standard error is an estimate of the luck factor. So if SE ~= 10, and the student scored 60, then you can be quite confident that their actual ability is between 50 and 70.
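The chain from internal consistency to standard error can be sketched as follows (illustrative Python with hypothetical numbers; the function names are invented):

```python
import math

def internal_consistency(position_variances, total_variance):
    """CIC (Cronbach's alpha, as a percentage) from the per-position
    variances V(x_p) and the total-score variance V(T)."""
    P = len(position_variances)
    return 100 * P / (P - 1) * (1 - sum(position_variances) / total_variance)

def error_ratio(cic):
    return 100 * math.sqrt(1 - cic / 100)

def standard_error(er, sd):
    return er / 100 * sd

# Hypothetical example: four positions with V(x_p) = 1 each, V(T) = 16.
cic = internal_consistency([1.0, 1.0, 1.0, 1.0], 16.0)  # 100.0
er = error_ratio(cic)        # 0.0 (perfectly consistent test)
se = standard_error(er, 4.0) # 0.0
```

In this (idealised) example the variances are perfectly consistent, so the error ratio and standard error are zero; real data gives a CIC below 100 and a non-zero standard error.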
