Fundamentos teóricos del análisis de ítems



I am trying to understand what the state of the art is with Item analysis. I am struggling to find my way into the literature, but have started this page to record my progress. So far the following references were found after several hours in the OU library around classmark 371.26.--Tim Hunt 05:58, 30 November 2007 (CST)

References I have

J J Barnard, "Item Analysis in Test Construction", pp. 195-206, in Geoffrey N Masters & John P Keeves (eds) 1999, Advances in Measurement in Educational Research and Assessment, Pergamon.

  • Mentions two topics "Classical Test Theory" and "Item Response Theory".
  • "Item analysis is not a substitute for the originality, effort and skill of the item writer and relatively poor statistical results can be overruled on logical grounds."
  • "The two most basic statistics computed and examined during item analysis are the items' difficulty and [discrimination] values."
  • The difficulty is basically the average score for the item: the higher the average score, the easier the item.
  • When computing this average, you have to decide whether to ignore students who did not submit an answer, or to count them as scoring zero. You also have to consider that in a timed test, questions near the end are more likely to be missed.
  • For discrimination, there are different techniques for questions with and without partial scores.
  • For questions that are scored 0/1 (dichotomously scored) point biserial correlation is most commonly used.
  • You should allow for the fact that the score for this item is included in the score for the whole test. However, for tests with many questions, the correction is small.
  • Item reliability index (Gulliksen): the product r_it · s_i, that is, the item-test correlation multiplied by the item standard deviation.
  • The above is Classical Test Theory.
  • Item Response Theory is based on more computer-intensive techniques, involving fitting models to the data (maximum likelihood estimation).
  • "It can be concluded that CTT and IRT should not be viewed as rival theoretical frameworks. A duet, rather than a duel, between CTT and IRT will provide most information to the test developer. The results obtained from a CTT based item analysis can yield useful information in finding flaws in items and guiding the test developer towards choosing an appropriate IRT model. The advantages that IRT parameters offer should subsequently be used for constructing tests for specific purposes, ..."
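The two classical statistics described above (difficulty and point-biserial discrimination with the item-removed correction) can be sketched in Python. This is my own minimal illustration, not code from Barnard's chapter; the function names and the `include_missing` switch for unanswered questions are assumptions.

```python
import statistics

def item_difficulty(scores, max_score=1.0, include_missing=True):
    """Mean item score as a fraction of the maximum; higher means easier.

    `scores` may contain None for students who did not answer. Whether to
    drop them or count them as zero is the analyst's choice (see the notes
    above); include_missing=True counts them as zero.
    """
    if include_missing:
        vals = [0.0 if s is None else s for s in scores]
    else:
        vals = [s for s in scores if s is not None]
    return statistics.mean(vals) / max_score

def point_biserial(item_scores, test_totals):
    """Point-biserial correlation between a 0/1 item and the test total,
    with the item's own score removed from each total (the correction
    mentioned above, which matters most for short tests)."""
    rest = [t - i for i, t in zip(item_scores, test_totals)]
    mean_rest = statistics.mean(rest)
    sd_rest = statistics.pstdev(rest)
    p = statistics.mean(item_scores)   # proportion answering correctly
    q = 1 - p
    mean_correct = statistics.mean(
        r for i, r in zip(item_scores, rest) if i == 1)
    # Standard point-biserial formula using the group of correct answers.
    return (mean_correct - mean_rest) / sd_rest * (p / q) ** 0.5
```

For example, an item answered correctly by 3 of 5 students has difficulty 0.6 when non-responses count as zero.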
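The maximum-likelihood fitting mentioned for IRT can be illustrated with a deliberately simplified one-parameter (Rasch) sketch. This is illustrative only and is not how a real IRT fit works: here the person abilities are assumed known, whereas a genuine fit estimates abilities and item difficulties jointly.

```python
import math

def rasch_item_difficulty(responses, abilities, iters=500, lr=0.5):
    """Maximum-likelihood estimate of one item's Rasch difficulty b,
    given assumed person abilities theta (a simplification).
    Model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    b = 0.0
    for _ in range(iters):
        # Gradient ascent on the log-likelihood; dL/db = sum(P - x).
        grad = sum(
            1 / (1 + math.exp(-(theta - b))) - x
            for x, theta in zip(responses, abilities))
        b += lr * grad / len(responses)
    return b
```

An item answered correctly by all but the weakest student gets a clearly negative (easy) difficulty estimate.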

R L Ebel 1972, Essentials of Educational Measurement, Prentice Hall.

William A Mehrens & Irvin J Lehmann 1973, Measurement and Evaluation in Education and Psychology, Holt Rinehart and Winston Inc.

R L Thorndike 1971, Educational Measurement, American Council on Education.

  • Repeats the point about items at the end of a timed test being omitted by a lot of students leading to skewed statistics.

References to try to get

The last two references above have probably both been superseded by:

R L Thorndike 2004, Measurement and Evaluation in Psychology and Education (Seventh edition), Prentice Hall.

This looks like it might be worth getting (previous edition cited by J J Barnard):

L Crocker & J Algina 2006, Introduction to Classical and Modern Test Theory, Wadsworth Pub Co.

Other points

  • Another book mentioned that sometimes you want to, for example, analyse test data by group (e.g. male/female) to look for possible bias.
  • There is the idea that you can look at the reliability of a test by randomly splitting the class in half, and comparing the statistics for the two halves.
  • What you really want to do is compare item scores to the property you are trying to measure in the test (student's mathematical ability), as opposed to their score on the test as a whole. However, you don't have any measure of the property you are really interested in - the overall test score is the best (only) estimate you have of that.
  • The age of the references I have read so far means that they cannot assume the processing power of modern computers. Therefore, the procedures they describe are unnecessarily simplified.
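The split-half idea above can be sketched as follows. This is my own minimal illustration of what the note describes (randomly splitting the class of students, rather than the more common practice of splitting the test items), and the function name is an assumption.

```python
import random
import statistics

def split_half_check(item_scores, seed=0):
    """Randomly split the class in half and compare a statistic (here,
    item difficulty) between the two halves. If the two values are close,
    the statistic is stable for this class size."""
    rng = random.Random(seed)
    indices = list(range(len(item_scores)))
    rng.shuffle(indices)
    mid = len(indices) // 2
    half_a = [item_scores[i] for i in indices[:mid]]
    half_b = [item_scores[i] for i in indices[mid:]]
    return statistics.mean(half_a), statistics.mean(half_b)
```

Repeating this with several random seeds gives a rough feel for how much the statistic varies between halves.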


What about repeated attempts at a particular quiz by the same student? What does that do to the analysis?

What about adaptive mode?


It is probably enough for Moodle to offer teachers an easy form of item analysis. That will obviously catch defective assessment items.

We probably should not try to implement very complicated item analysis schemes. They are prone to misuse, which is more of a drawback than the extra power they would provide when used correctly.

See also