Quiz statistics calculations
General issues
Quizzes that allow multiple attempts
For quizzes that allow multiple attempts, by default the report should only include data from the first attempt by each student. (The data for subsequent attempts probably does not satisfy the assumptions that underlie item analysis statistics.) However, there should be an option 'Include data from all attempts', accompanied by a disclaimer, either near it on screen or in the help file, that this may be statistically invalid. (For small data sets, it may be better to include all data.)
Using the first attempt also avoids problems caused by the 'each attempt builds on the last' setting.
Within the analysis, when multiple attempts per student are included, each attempt is treated as an independent attempt. (That is, we pretend each attempt was by a different student.)
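For illustration, here is a minimal Python sketch of picking the first attempt per student. The record layout (dictionaries with 'student' and 'timestart' keys) is invented for this example, not Moodle's actual tables:
 # Keep only the earliest attempt for each student (hypothetical record layout).
 def first_attempts(attempts):
     first = {}
     for a in attempts:
         s = a['student']
         if s not in first or a['timestart'] < first[s]['timestart']:
             first[s] = a
     return list(first.values())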
Adaptive mode
Adaptive mode does not pose a problem. Item Analysis just supposes that each item in the test returns a score, and these scores are added up to get the test score. Therefore Item Analysis does not care about adaptive/non-adaptive mode. However, just to be clear, for an adaptive question, the score used in the calculation is the final score for the item, including penalties.
Certainty based marking
Similarly, should CBM and/or negative scoring for multiple-choice questions be implemented, we just use the final item score in the calculations, making sure that the formulae do not assume that the minimum item score is zero.
Incomplete attempts
There is an issue of what to do when not all students have answered all questions. Depending on how you handle these missing items, you distort the statistics in different ways.
There are basically two reasons why a student may not have answered a particular question:
- they may have chosen to omit it, or
- they may have run out of time, if the test is timed. In this case, omitted questions tend to be towards the end of the test.
Available approaches for handling this are:
1. Treat omitted items as having a score of 0.
2. Exclude any attempt with missing scores from the analysis.
3. Analyse each question separately, and when analysing a particular question, include only the students who answered that question.
I think we should implement option 1 for now. This is how Moodle currently works: a blank answer is marked and receives a score of zero, and so is indistinguishable from a wrong answer. Item analysis is most important for summative tests, and the effect of that question being in the test is that it contributed 0 marks towards that student's final score.
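A one-line sketch of approach 1, assuming a per-attempt mapping from position to score (hypothetical layout): a position with no recorded score contributes 0, just like a wrong answer.
 def score_or_zero(scores, position):
     # scores: dict mapping position -> recorded score for one attempt.
     return scores.get(position, 0.0)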
Random questions
In a quiz with random questions not all students will have attempted the same set of questions. To account for this, the analysis below distinguishes between positions in the test, and test items.
Notation used in the calculations
Note: The only way I can think of to write this document with all the maths is to type TeX notation like $$x_i$$, even though MoodleDocs does not currently render it nicely. Maybe one day it will. In the meantime, if you want to see things nicely formatted, you will have to copy them to a Moodle with a working TeX filter, or copy and paste the following magic into your browser's address bar and hit Enter:
javascript:(function(){function searchWithinNode(node,re){var pos,imgnode,middlebit,endbit;if(node.nodeType==3){pos=node.data.search(re);if(pos>=0){middlebit=node.splitText(pos);endbit=middlebit.splitText(RegExp.lastMatch.length);imgnode=document.createElement("img");imgnode.src="http://www.mathtran.org/cgi-bin/mathtran?tex="+encodeURI(middlebit.data);middlebit.parentNode.replaceChild(imgnode,middlebit);}}else if(node.nodeType==1&&node.childNodes){for(var child=0;child<node.childNodes.length;++child){searchWithinNode(node.childNodes[child],re);}}}searchWithinNode(document.body,/\$\$(.*?)\$\$/);})();
We have a lot of students $$s \in S$$.
The test has a number of positions $$p \in P$$.
The test is assembled from a number of items $$i \in I$$.
Because of random questions, different students may have received different items in different positions, so $$i(p, s)$$ is the item that student $$s$$ received in position $$p$$.
Let $$I_s$$ be the set of items that student $$s$$ saw. Let $$S_i$$ be the set of students who attempted item $$i$$.
Each position has a maximum and minimum possible contribution to the test score, $$x_p(min)$$ and $$x_p(max)$$. At the moment in Moodle, $$x_p(min)$$ is always zero, but we cannot assume that will continue to be the case. $$x_p(max)$$ is the database column quiz_question_instances.grade.
Then, each student achieved an actual score $$x_p(s)$$ on the item in position $$p$$. So $$x_p(min) \le x_p(s) \le x_p(max)$$.
$$x_p(s)$$ should be measured on the same scale as the final score for the quiz. That is, scaled by quiz_question_instances.grade, but that is already how grades are stored in mdl_question_states.
We can also think of the student's score on a particular item $$x_i(s)$$, so $$x_{i(p, s)}(s) = x_p(s)$$.
Each student has a total score
$$T_s = \sum_{p \in P} x_p(s)$$
Similarly, there are the maximum and minimum possible test scores
$$T_{max} = \sum_{p \in P} x_p(max)$$
and
$$T_{min} = \sum_{p \in P} x_p(min)$$
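To make the notation concrete, here is a rough Python sketch of these totals. The data layout (nested dictionaries keyed by student and position) is invented for illustration, not Moodle's schema:
 # scores[s][p] = x_p(s); x_min[p] and x_max[p] are the minimum and maximum
 # possible scores at position p.
 def total_score(scores, s):
     # T_s = sum over all positions of x_p(s)
     return sum(scores[s].values())

 def total_max(x_max):
     # T_max = sum over all positions of x_p(max)
     return sum(x_max.values())

 def total_min(x_min):
     # T_min = sum over all positions of x_p(min)
     return sum(x_min.values())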
Intermediate calculations
To simplify the form of the formulas below, we need some intermediate calculated quantities, derived from the ones above.
Student's rest of test score for a position: $$X_p(s) = T_s - x_p(s)$$.
Student's rest of test score for a particular item: $$X_i(s) = T_s - x_i(s)$$.
For any quantity that depends on position (for example $$x_p$$ or $$X_p$$), its average is denoted with an overbar, and is an average over all students, so
$$\bar{x}_p = \frac{1}{S} \sum_{s \in S} x_p(s)$$.
When a quantity is a property of an item, a bar denotes an average over all students who attempted that item, so
$$\bar{x}_i = \frac{1}{S_i} \sum_{s \in S_i} x_i(s)$$.
Similarly we have the variance of a quantity depending on position:
$$V(x_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)^2$$
and for a quantity depending on items:
$$V(x_i) = \frac{1}{S_i - 1} \sum_{s \in S_i} (x_i(s) - \bar{x}_i)^2$$.
Finally, we need covariances of two quantities, for example:
$$C(x_p, X_p) = \frac{1}{S - 1} \sum_{s \in S} (x_p(s) - \bar{x}_p)(X_p(s) - \bar{X}_p)$$
$$C(x_i, X_i) = \frac{1}{S_i - 1} \sum_{s \in S_i} (x_i(s) - \bar{x}_i)(X_i(s) - \bar{X}_i)$$.
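These helpers could be computed along the following lines (illustrative Python, using plain lists with one value per student):
 # Sample mean, variance and covariance with the 1/(S - 1) correction,
 # matching the definitions above.
 def mean(values):
     return sum(values) / len(values)

 def variance(values):
     m = mean(values)
     return sum((v - m) ** 2 for v in values) / (len(values) - 1)

 def covariance(xs, ys):
     mx, my = mean(xs), mean(ys)
     return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)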
Position statistics
Facility index
This is the average score at this position, expressed as a percentage of the possible score range:
$$F_p = 100\frac{\bar{x}_p - x_p(min)}{x_p(max) - x_p(min)}$$.
The higher the facility index, the easier the question is (for this cohort of students).
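A sketch of this calculation, reusing mean() from the sketch above:
 def facility_index(xs, x_min, x_max):
     # xs is the list of scores x_p(s) over all students; result is 0..100.
     return 100 * (mean(xs) - x_min) / (x_max - x_min)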
Standard deviation
Again expressed on a percentage scale:
$$SD_p = 100\frac{\sqrt{V(x_p)}}{x_p(max) - x_p(min)}$$.
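And the corresponding sketch, reusing variance() from above:
 import math

 def position_sd(xs, x_min, x_max):
     # Standard deviation at the position, as a percentage of the score range.
     return 100 * math.sqrt(variance(xs)) / (x_max - x_min)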
Discrimination index
Discriminative efficiency
Intended question weight
Effective question weight
Item statistics
Random guess score
Facility index
Standard deviation
Discrimination index
Discriminative efficiency
Test statistics
Number of Attempts
This is $$S$$.
Note that depending on the options, we may be counting one or all attempts per student.
Mean Score
$$\text{Test mean} = \bar{T} = \frac{1}{S} \sum_{s \in S} T_s$$
Median Score
Sort all the $$T_s$$, and take the middle one if $$S$$ is odd, or the average of the two middle ones if $$S$$ is even.
Standard Deviation
$$\text{Test standard deviation} = SD = \sqrt{V(T)} = \sqrt{\frac{1}{S - 1} \sum_{s \in S} (T_s - \bar{T})^2}$$.
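A sketch covering the mean, median and standard deviation of the total scores (T is a list of $$T_s$$ values, one per attempt counted; variance() is as defined earlier):
 import math

 def test_mean(T):
     return sum(T) / len(T)

 def test_median(T):
     ts = sorted(T)
     n = len(ts)
     return ts[n // 2] if n % 2 == 1 else (ts[n // 2 - 1] + ts[n // 2]) / 2

 def test_sd(T):
     return math.sqrt(variance(T))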
Skewness and Kurtosis
Probably of limited interest, but included for completeness. Skewness is a measure of the asymmetry of a distribution. Kurtosis measures how the shape compares to a normal distribution: whether there is more of a central bulge with thinner tails, or vice versa.
First calculate:
$$m_2 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^2$$
$$m_3 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^3$$
$$m_4 = \frac{1}{S} \sum_{s \in S} (T_s - \bar{T})^4$$
Then compute
$$k_2 = \frac{S}{S - 1} m_2 = V(T)$$
$$k_3 = \frac{S^2}{(S - 1)(S - 2)} m_3$$
$$k_4 = \frac{S^2}{(S - 1)(S - 2)(S - 3)} ((S + 1)m_4 - 3(S - 1)m_2^2)$$
Then
$$\text{Skewness} = \frac{k_3}{k_2^{3/2}}$$
$$\text{Kurtosis} = \frac{k_4}{k_2^2}$$
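Putting the moments and k-statistics together (illustrative Python; note this needs at least 4 attempts, or the $$k_4$$ denominator vanishes):
 def skewness_and_kurtosis(T):
     # Moments m2..m4 and k-statistics k2..k4 of the total scores, as above.
     S = len(T)
     t_bar = sum(T) / S
     m2 = sum((t - t_bar) ** 2 for t in T) / S
     m3 = sum((t - t_bar) ** 3 for t in T) / S
     m4 = sum((t - t_bar) ** 4 for t in T) / S
     k2 = S * m2 / (S - 1)
     k3 = S ** 2 * m3 / ((S - 1) * (S - 2))
     k4 = S ** 2 * ((S + 1) * m4 - 3 * (S - 1) * m2 ** 2) / ((S - 1) * (S - 2) * (S - 3))
     return k3 / k2 ** 1.5, k4 / k2 ** 2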
Coefficient of Internal Consistency
This is on a percentage scale:
$$CIC = 100 \frac{P}{P - 1} (1 - \frac{1}{V(T)}\sum_{p \in P} V(x_p) )$$
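As a sketch (position_scores holds one list of $$x_p(s)$$ values per position, T the total scores; variance() as above):
 def internal_consistency(position_scores, T):
     P = len(position_scores)
     return 100 * (P / (P - 1)) * (1 - sum(variance(xs) for xs in position_scores) / variance(T))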
Error Ratio
Also a percentage.
$$ER = 100 \sqrt{1 - \frac{CIC}{100}}$$
Standard Error
$$SE = \frac{ER}{100} \times SD$$
These last three are to do with estimating how reliable the test scores are. If you take the view that the score a student got on the test on the day is a combination of their actual ability and a random error (how lucky they were on the day of the test), then the standard error is an estimate of the size of that luck factor. So if $$SE \approx 10$$, and the student scored 60, then you can be quite confident that their actual ability is between 50 and 70.
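A sketch of these last two formulas (CIC and the resulting values are on the 0-100 percentage scale):
 import math

 def error_ratio(cic):
     return 100 * math.sqrt(1 - cic / 100)

 def standard_error(cic, sd):
     # sd is the test standard deviation; the result is in score units.
     return error_ratio(cic) / 100 * sd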