Uso de analítica

From MoodleDocs
Revision as of 14:16, 29 August 2019


Overview

The Moodle Learning Analytics API is an open system that can become the basis for a very wide variety of models. Models can contain indicators (a.k.a. predictors), targets (the outcome we are trying to predict), insights (the predictions themselves), notifications (messages sent as a result of insights), and actions (offered to recipients of messages, which can become indicators in turn).

learning analytics components.png

Most learning analytics models are not enabled by default. Enabling models for use should be done after considering the institutional goals the models are meant to support. When selecting or creating an analytics model, the following steps are important:

  • What outcome do we want to predict? Or what process do we want to detect? (Positive or negative)
  • How will we detect that outcome/process?
  • What clues do we think might help us predict that outcome/process?
  • What should we do if the outcome/process is very likely? Very unlikely?
  • Who should be notified? What kind of notification should be sent?
  • What opportunities for action should be provided on notification?

Moodle can support multiple prediction models at once, even within the same course. This can be used for A/B testing to compare the performance and accuracy of multiple models.

New feature in Moodle 3.7!
Moodle learning analytics supports two types of models. Machine-learning based models, including predictive models, make use of AI models trained using site history to detect or predict hidden aspects of the learning process. "Static" models use a simpler, rule-based system of detecting circumstances on the Moodle site and notifying selected users. Moodle core ships with three models: Estudiantes en riesgo de abandonar and the static models Actividades próximas pendientes and Sin enseñanza. Additional prediction models can be created by using the Analytics API or by using the new web UI. Each model is based on predicting a single, specific "target," or outcome (whether desirable or undesirable), from a number of selected indicators.

You can view and manage your system models from Site Administration > Analytics > Analytics models.

manage models.png

Existing models

Standard Moodle core contains three models: Estudiantes en riesgo de abandonar and the static models Actividades próximas pendientes and Sin enseñanza. Other models can be added to your system by installing plugins or by using the web UI (see below). Existing models can be examined and altered from the "Analytics models" page in Site administration. These are some of the actions you can perform on an existing model:

analytics models.png
  • Obtener predicciones Train machine learning algorithms with the new data available on the system and get predictions for ongoing courses. Predictions are not limited to ongoing courses; this depends on the model.
  • Ver intuiciones (predicciones) Once you have trained a machine learning algorithm with the data available on the system, you will see insights (predictions) here for each "analysable." In the included model "Estudiantes en riesgo de abandonar," insights may be selected per course. Predictions are not limited to ongoing courses; this depends on the model.
  • Evaluar This is a resource-intensive process, and will not be visible when sites have the "onlycli" setting checked (default). See Evaluating models for more information.
  • Bitácora View previous evaluation logs, including the model accuracy as well as other technical information generated by the machine learning backends like ROC curves, learning curve graphs, the tensorboard log dir or the model's Matthews correlation coefficient. The information available will depend on the machine learning backend in use.
    log info.png
  • Editar You can edit the models by modifying the list of indicators or the time-splitting method. All previous predictions will be deleted when a model is modified. Models based on assumptions (static models) can not be edited.
  • Habilitar / Deshabilitar This allows the model to run training and prediction processes. The scheduled task that trains machine learning algorithms with the new data available on the system and gets predictions for ongoing courses skips disabled models. Previous predictions generated by disabled models are not available until the model is enabled again.

New feature in Moodle 3.7!

  • Exportar Export your site training data to share it with your partner institutions or to use it on a new site. The Export action for models allows you to generate a CSV file containing model data about indicators and weights, without exposing any of your site-specific data. We will be asking for submissions of these model files to help evaluate the value of models on different kinds of sites. Please see the Learning Analytics community for more information.
  • Elementos inválidos del sitio Reports which elements on your site cannot be analysed by this model
  • Borrar predicciones Deletes all of the model's training data and predictions

Core models

Estudiantes en riesgo de abandonar

This model predicts students who are at risk of not completing (dropping out of) a Moodle course, based on low student engagement. In this model, the definition of "dropping out" is "no student activity in the final quarter of the course." The prediction model uses the Community of Inquiry model of student engagement, consisting of three parts:

The prediction model can analyse and draw conclusions from a wide variety of courses, and apply those conclusions to make predictions about new courses. The model is not limited to making predictions about student success in exact duplicates of courses offered in the past. However, there are some limitations:

  1. This model requires a certain amount of data in Moodle from which to make predictions. Currently, only standard Moodle core activities are included in the indicator set (see below). Courses that do not include several Moodle core activities per "time slice" (depending on the time-splitting method) will have poor predictive support in this model. This prediction model will be most effective with fully online courses, or "hybrid"/"blended" courses with substantial online components.
  2. This prediction model assumes that courses have fixed start and end dates, and it is not designed for courses with permanent open enrolment. Models supporting a wider range of course types will be included in future versions of Moodle. Because of this model's design assumptions, it is very important to configure appropriate start and end dates for every course that will use this model. If start and end dates are not set correctly for both past and ongoing courses, predictions cannot be accurate. Because the course end date field was only introduced in Moodle 3.2, and some courses may not have a start date set in the past, we include a command-line interface script:
$ admin/tool/analytics/cli/guess_course_start_and_end.php 

This script attempts to estimate past start and end dates by reviewing student activity logs and student enrolments. After running it, please check that the start and end dates it estimated are reasonably correct.
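The heuristic behind such an estimate can be sketched in a few lines. This is an illustrative Python sketch of the idea only, not the script's actual PHP implementation, and the function name is hypothetical:

```python
from datetime import datetime

def guess_course_dates(student_log_timestamps):
    """Rough estimate of a course's start and end dates.

    Sketch of the idea behind guess_course_start_and_end.php:
    use the earliest and latest student activity as bounds.
    The real script also inspects enrolment records and applies
    further sanity checks.
    """
    if not student_log_timestamps:
        return None, None  # no activity: nothing to estimate from
    return min(student_log_timestamps), max(student_log_timestamps)

# Three logged student actions in a course with unset dates
logs = [datetime(2019, 2, 1), datetime(2019, 1, 15), datetime(2019, 4, 30)]
start, end = guess_course_dates(logs)  # 2019-01-15 and 2019-04-30
```

This is why reviewing the estimates matters: a single stray log entry far outside the real term would skew both bounds.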

Actividades próximas pendientes

New feature in Moodle 3.7!

The static "upcoming activities due" model checks whether there are upcoming activities with due dates, and produces its results on the user's calendar page.

Sin enseñanza

The insights (predictions) from this model inform site managers about courses with an upcoming start date that have no teaching activity. This is a simple "static" model; it does not use a machine learning backend to return predictions. It bases its predictions on assumptions, for example that there is no teaching if there are no students.

Creating and editing models

New feature in Moodle 3.7!

New machine learning models can be created using the Analytics API, by importing a model exported from another site, or by using the new web UI in Moodle 3.7.

new model.png

(Note: "static" models cannot currently be created using the web UI.)

There are four components of a model that can be defined through the web UI:

Target

create model 2.png

Targets represent a "known good": something for which we have very strong evidence of its value. Targets should be designed carefully to align with the institution's curricular priorities. Each model has a single target. The "Analyser" (the context in which targets will be evaluated) is controlled automatically by the selection of the target.

Indicators

Indicators are data points that may help to predict targets. We are free to add many indicators to a model to find out if they predict a target; the only limit is that the data must be available within Moodle and must have a connection to the context of the model (e.g. the user, the course, etc.). The machine learning "training" process will determine how much weight to give to each indicator in the model.

We do want to make sure any indicators we include in a production model have a clear purpose and can be interpreted by participants, especially if they are used to make prescriptive or diagnostic decisions.

Indicators are constructed from data, but the data points need to be processed to make consistent, reusable indicators. In many cases, events are counted or combined in some way, though other ways of defining indicators are possible and will be discussed later. How the data points are processed involves important assumptions that affect the indicators. In particular, indicators can be absolute, meaning that the value of the indicator stays the same no matter what other samples are in the context, or relative, meaning that the indicator compares the sample to others in the context.

indicators.png
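To make the absolute/relative distinction concrete, here is a small Python sketch. It is illustrative only: real Moodle indicators are PHP classes, and the function names and scaling below are hypothetical:

```python
def absolute_indicator(event_count, max_expected=100):
    """Absolute: the value depends only on this sample,
    scaled into [-1, 1] against a fixed expectation."""
    ratio = min(event_count, max_expected) / max_expected
    return 2 * ratio - 1

def relative_indicator(event_count, peer_counts):
    """Relative: the value compares this sample to the other
    samples in the same context (e.g. classmates in the course)."""
    mean = sum(peer_counts) / len(peer_counts)
    spread = (max(peer_counts) - min(peer_counts)) or 1
    return max(-1.0, min(1.0, (event_count - mean) / spread))

# A student with 30 forum views, in a course where peers have 10-50:
abs_value = absolute_indicator(30)                        # -0.4
rel_value = relative_indicator(30, [10, 20, 30, 40, 50])  # 0.0
```

Note how the same raw count (30 views) yields a negative absolute value but a neutral relative one: the student is below a fixed expectation, yet exactly average among peers.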

Analysis intervals

analysis intervals.png

Analysis intervals control how often the model will run to generate insights, and how much information will be included in each generation cycle. The two analysis intervals enabled by default for selection are "Quarters" and "Quarters accumulative." Both options cause models to execute four times: after the first, second, third and fourth quarters of the course (the final execution is used to evaluate the accuracy of the predictions against the actual outcome). The difference lies in how much information is included. "Quarters" includes only information from the most recent quarter of the course in its predictions. "Quarters accumulative" includes the most recent quarter and all previous quarters, and tends to generate more accurate predictions (though it can take more time and memory to execute). Moodle Learning Analytics also includes "Tenths" and "Tenths accumulative" options in core, which you can enable from the Analytics settings page. These generate predictions more frequently.

  • 'Rango único' (Single range) indicates that predictions will be made once, but will take into account a range of time, e.g. one prediction at the end of a course. The prediction is made at the end of the range.
  • 'Próximo...' (Upcoming...) indicates that the model generates an insight based on a snapshot of data at a given moment, e.g. the "no teaching" model looks to see whether there are currently any teachers or students assigned to a course one week before the start of the term, and issues one insight warning the site administrator that no teaching is likely to occur in that empty course.
  • 'Todos los anteriores...' (All previous..., formerly called "accumulative") and 'Last...' methods differ in how much data is included in the prediction. Both "All previous quarters" and "Last quarter" predictions are made at the end of each quarter of a time span (e.g. a course), but "Last quarter" includes only the information from the most recent quarter in the prediction, whereas "All previous quarters" includes all information up to the present.

'Rango único' (Single range) and 'Sin división del tiempo' (No time splitting) methods do not have time constraints. They run during the next scheduled task execution, although models apply different restrictions (e.g. a course must be finished to be used for training, or must contain some data and enrolled students to be used for predictions). 'Single range' and 'No splitting' are not appropriate for models like students at risk of dropping out of courses. They are intended for models like 'No teaching' or 'Spammer user' that are designed to make only one prediction per eligible sample. To explain this with an example: the 'No teaching' model uses the 'Single range' analysis interval; its target class (the main PHP class of a model) only accepts courses that will start during the next week. Once we provide a 'No teaching' insight for a course, we won't provide any further 'No teaching' insights for that course.

The difference between 'Single range' and 'No splitting' is that models analysed using 'Single range' are limited to the analysable element's start and end dates (the course, in the students-at-risk model), while 'No splitting' has no time constraints, and all data available in the system is used to calculate the indicators.

Note: Although the examples above refer to courses, analysis intervals can be used on any analysable element. For example, enrolments can have start and end dates, so an analysis interval could be applied to generate predictions about aspects of an enrollment. For analysable elements with no start and end dates, different analysis intervals would be needed. For example, a "weekly" analysis interval could be applied to a model intended to predict whether a user is likely to log in to the system in the future, on the basis of activity in the previous week.
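The default quarter-based intervals described above can be sketched as follows. This is a simplified Python illustration of the date arithmetic only; the real logic lives in Moodle's time-splitting PHP classes:

```python
from datetime import date

def quarter_ranges(start, end, accumulative=False):
    """(range_start, range_end) pairs for a quarters-based interval.

    "Quarters" analyses only the most recent quarter each time;
    "Quarters accumulative" analyses everything from the course
    start up to the end of that quarter. Sketch only.
    """
    total = end - start
    ranges = []
    for q in range(1, 5):
        q_start = start + (total * (q - 1)) // 4
        q_end = start + (total * q) // 4
        ranges.append((start if accumulative else q_start, q_end))
    return ranges

course_start, course_end = date(2019, 1, 1), date(2019, 5, 1)
plain = quarter_ranges(course_start, course_end)
accum = quarter_ranges(course_start, course_end, accumulative=True)
# plain[3] covers only the last quarter; accum[3] covers the whole course
```

This also makes the trade-off visible: the accumulative variant feeds progressively larger data windows to the indicator calculations, which is why it tends to be more accurate but slower.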

Predictions processor

This setting controls which machine learning backend and algorithm will be used to estimate the model. Moodle currently supports two predictions processors:

  • PHP machine learning backend - implements logistic regression using php-ml (contributed by Moodle)
  • Python machine learning backend - implements a single hidden layer feed-forward neural network using TensorFlow.

You can only choose from the predictions processors enabled on your site.

regression.png
ffnn.png

Each prediction processor may support multiple algorithms in the future.
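Conceptually, the logistic regression used by the PHP backend reduces to a weighted sum of indicator values squashed into a probability by a sigmoid. A minimal Python sketch follows; the weights and bias below are hypothetical, since real models learn them during training:

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) probability range."""
    return 1 / (1 + math.exp(-z))

def predict_probability(indicators, weights, bias):
    """One logistic-regression prediction: weighted sum + sigmoid."""
    z = sum(w * x for w, x in zip(weights, indicators)) + bias
    return sigmoid(z)

# Three indicator values in [-1, 1], with hypothetical trained weights
p = predict_probability([0.5, -1.0, 0.2],
                        weights=[1.2, 0.8, -0.5], bias=0.1)
# p is the predicted probability of the target (e.g. dropping out)
```

The neural-network backend replaces the single weighted sum with a hidden layer of such units, which lets it capture non-linear interactions between indicators.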

Model training

Machine-learning based models require a training process using previous data from the site. "Static" models make use of sets of pre-defined rules, and do not need to be trained.

There are two main categories of machine-learning based analytics models: supervised and unsupervised.

  • Supervised models must be trained by using a data set with the target values already identified. For example, if the model will predict course completion, the model must be trained on a set of courses and enrollments with known completion status.
  • Unsupervised models look for patterns in existing data, e.g. grouping students based on similarities in their behavior in courses.

At the present time, Moodle Learning Analytics only supports supervised models.
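The distinction can be illustrated with a tiny supervised training set (all names and values below are hypothetical, for illustration only):

```python
# Supervised training data for a course-completion target:
# each sample pairs indicator values with a KNOWN outcome label.
training_set = [
    {"indicators": [0.8, 0.5], "completed": 1},
    {"indicators": [-0.9, -0.7], "completed": 0},
    {"indicators": [0.2, -0.1], "completed": 1},
]

# A supervised learner consumes both parts; an unsupervised one
# would receive only the indicator vectors and look for structure.
X = [sample["indicators"] for sample in training_set]
y = [sample["completed"] for sample in training_set]
```

The labels in `y` are exactly what an unsupervised model lacks, and why supervised models can only be trained on courses whose outcomes are already known.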

While we hope to include pre-trained models with the Moodle core installation in the future, at the current time we do not have large enough data sets to train a model for external use. (If you would like to help contribute data for this effort, please see the Moodle Learning Analytics Working Group.)

Training data

The model code includes criteria for "training" and "prediction" data sets. For example, only courses with enrolled students and an end date in the past can be used to train the Estudiantes en riesgo de abandonar model, because it is impossible to determine whether a student dropped out until a course has ended. On the other hand, for this model to make predictions, there must be a course with students enrolled that has started, but not yet ended.

The training set is defined in the php code for the Target. Models can only be trained if a site contains enough data matching the training criteria. Most models will require Moodle log data for the time period covering the events being analysed. For example, the Estudiantes en riesgo de abandonar model can only be trained if there is log data covering student activity in the courses that meet the training criteria. It is possible to train a model on an "archive" system and then use the model on a production system.
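The training/prediction criteria described above can be sketched as a simple eligibility check. This is illustrative Python; the field names are hypothetical, not Moodle's schema:

```python
from datetime import date

def dataset_role(course, today):
    """Which dataset a course can contribute to, following the
    students-at-risk criteria described above (sketch only)."""
    if course["enrolled_students"] == 0:
        return None                    # no students: unusable
    if course["end_date"] < today:
        return "training"              # finished: the outcome is known
    if course["start_date"] <= today:
        return "prediction"            # ongoing: the outcome is still open
    return None                        # not started yet

today = date(2019, 8, 29)
finished = {"start_date": date(2019, 1, 1), "end_date": date(2019, 5, 1),
            "enrolled_students": 42}
ongoing = {"start_date": date(2019, 7, 1), "end_date": date(2019, 12, 1),
           "enrolled_students": 30}
```

Here `finished` would feed the training set and `ongoing` the prediction set, which is why correct course start and end dates matter so much for this model.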

Evaluating models

model menu evaluate.png

This is a manual, resource-intensive process, and it will not be visible from the web UI when the site has the "onlycli" setting checked (the default). This process can be executed independently of enabling or training the model. It causes Moodle to assemble the training data available on the site, calculate all the indicators and the target, and pass the resulting dataset to the machine learning backends, which split the dataset into training data and testing data and calculate its accuracy. Note that the evaluation process uses all information available on the site, even if it is very old. Because the site state changes over time, indicators are calculated more reliably immediately after training data becomes available, so the accuracy returned by the evaluation process may be lower than the model's real accuracy. The metric used to describe accuracy is the Matthews correlation coefficient (a metric used in machine learning for evaluating binary classifications).
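The Matthews correlation coefficient is computed from the four cells of the confusion matrix produced on the held-out testing data. A standard implementation (not Moodle-specific) looks like this:

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts: -1 (total disagreement),
    0 (no better than chance), +1 (perfect prediction)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # degenerate case: a row or column is empty
    return (tp * tn - fp * fn) / denom

# Example: evaluating 100 test samples of a students-at-risk model
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)  # about 0.70
```

Unlike plain accuracy, MCC stays honest on imbalanced data: a model that always predicts "not at risk" scores near 0 even if most students really aren't at risk.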

We recommend that all machine-learning models be evaluated before being enabled on a production site.

To force the model evaluation process to run from the command line:

$ admin/tool/analytics/cli/evaluate_model.php

Use the --help option to see parameters.


On small sites, you can uncheck "onlycli" in the Analytics settings page, and you can then evaluate models from the Analytics models page. However, this is not feasible on production sites.

Reviewing evaluation results

model menu log.png

You can review the results of the model training process by accessing the evaluation log.

model log.png
log info.png

Check for warnings about evaluation completion, model accuracy, and model variability.

invalid site elements.png

You can also check the invalid site elements list to verify which site elements were included or excluded in the analysis. If you see a large number of unexpected elements in this report, it may mean that you need to check your data. For example, if courses don't have appropriate start and end dates set, or enrolment data has been purged, the system may not be able to include data from those courses in the model training process.


Exporting and importing models

Models can also be exported from one site and imported to another.

export menu.png

Exporting models

You can export the data used to train the model, or the model configuration and the weights of the trained model.

export dialogue.png

Note: the model weights are completely anonymous, containing no personally identifiable data! This means it is safe to share them with researchers without worrying about privacy regulations.

Importing models

When a model is imported with weights, the site administrator has the option to evaluate the trained model using site data, or to evaluate the model configuration by re-training it using the current site data.

import model.png
evaluate model.png