Item analysis theoretical background: Difference between revisions

Revision as of 16:01, 30 November 2007

I am trying to understand what the state of the art is with Item analysis. I am struggling to find my way into the literature, but have started this page to record my progress. So far the following references were found after several hours in the OU library around classmark 371.26.--Tim Hunt 05:58, 30 November 2007 (CST)

References I have got hold of

J J Barnard, Item Analysis in Test Construction, pp. 195-206, in Geoffrey N Masters & John P Keeves 1999, Advances in Measurement in Educational Research and Assessment, Pergamon.

Mentions two topics: "Classical Test Theory" and "Item Response Theory".
"Item analysis is not a substitute for the originality, effort and skill of the item writer and relatively poor statistical results can be overruled on logical grounds."
"The two most basic statistics computed and examined during item analysis are the items' difficulty and discrimination values."
The difficulty is basically the average score for the item. The higher the average score, the easier the item is.
When computing this average, you have to decide whether to ignore students who did not submit an answer, or to include them as zero score. And you have to consider that in a timed test, questions near the end are more likely to be missed.
For discrimination, there are different techniques for questions with and without partial scores.
For questions that are scored 0/1 (dichotomously scored), the point-biserial correlation is most commonly used.
You should allow for the fact that the score for this item is included in the score for the whole test. However, for tests with many questions, the correction is small.
Item reliability index (Gulliksen's product): r_it S_i, that is, the item-total correlation r_it multiplied by the item standard deviation S_i.
The above is Classical Test Theory.
Item Response Theory is based on more computer-intensive techniques, which involve fitting models to the data (maximum likelihood estimation).
"It can be concluded that CTT and IRT should not be viewed as rival theoretical frameworks. A duet, rather than a duel between CTT and IRT will provide most information to the test developer. The results obtained from a CTT based item analysis can yield useful information in finding flaws in items and guiding the test developer towards choosing an appropriate IRT model. The advantages that IRT parameters offer should subsequently be used for constructing tests for specific purposes, ..."

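The Classical Test Theory statistics in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something taken from Barnard: the function names are made up for this page, the "correction" is assumed to mean correlating the item with the total score of the other items, and S_i is assumed to be the population standard deviation of the item scores.

```python
import math
import statistics


def item_difficulty(scores, max_score=1.0, skip_missing=True):
    """Average item score as a fraction of max_score (higher means easier).

    scores holds one entry per student; None means no answer was submitted.
    skip_missing=True ignores those students, False counts them as zero.
    """
    if skip_missing:
        used = [s for s in scores if s is not None]
    else:
        used = [0.0 if s is None else s for s in scores]
    return sum(used) / (len(used) * max_score)


def pearson(xs, ys):
    """Pearson correlation. With 0/1 values in xs this is the point-biserial.

    Raises ZeroDivisionError if either list has no variation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses, item):
    """Discrimination figures for one dichotomously scored item.

    responses is a list of rows, one per student, each a list of 0/1 scores.
    """
    item_scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    # Assumed correction: remove the item's own score from the test total.
    rest = [t - s for t, s in zip(totals, item_scores)]
    r_it = pearson(item_scores, totals)
    s_i = statistics.pstdev(item_scores)
    return {
        "r_it": r_it,
        "r_it_corrected": pearson(item_scores, rest),
        "reliability_index": r_it * s_i,  # Gulliksen's product r_it * S_i
    }


# Hypothetical data: five students, three 0/1 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
stats = item_statistics(responses, item=0)
```

With only three items, the corrected correlation for item 0 comes out well below the uncorrected one, because the item's own score makes up a large share of the total. With many items the two values move closer together, which is the point made in the notes.
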
R L Ebel 1972, Essentials of Educational Measurement, Prentice Hall.

There was a new edition of this book by Ebel and Frisbie in 1991.

William A Mehrens & Irvin J Lehmann 1973, Measurement and Evaluation in Education and Psychology, Holt Rinehart and Winston Inc.

R L Thorndike 1971, Educational Measurement, American Council on Education.

References to try to get

These last two above have probably both been superseded by:

R L Thorndike 2004, Measurement and Evaluation in Psychology and Education (Seventh edition), Prentice Hall.

This looks like it might be worth getting (previous edition cited by J J Barnard):

L Crocker & J Algina 2006, [http://www.amazon.com/Introduction-Classical-Modern-Test-Theory/dp/0495395919/ Introduction to Classical and Modern Test Theory], Wadsworth Pub Co.

