Project Inspire

Project state	Development in progress
Tracker issue	MDL-57791
Discussion	Analytics and reporting
Assignee	Elizabeth Dalton, David Monllaó

Welcome to Project Inspire!

Moodle Project Inspire is intended to identify and validate indicators of student, teacher, and institutional engagement in educational activities for the purpose of developing learning analytics software features with the following functions:

Description of learning engagement and progress,
Diagnosis of learning engagement and progress,
Prediction of learning progress, and
Prescription (recommendations) for improvement of learning progress.

Project Inspire will provide learning analytics tools within Moodle Core. These analytics will be based on inputs that will be extracted and validated using data from as many participants from the Moodle community as possible.

Project Inspire was first discussed at MoodleMoot Australia 2016:

<mediaplayer>https://www.youtube.com/watch?v=MHv4vp1hxQc</mediaplayer>

Background

Software packages are now available to support learning analytics within a number of Learning Management Systems, both commercial and open source, including Moodle. However, most of the existing tools, whether for Moodle or other Learning Management Systems, suffer from one or more key limitations:

They are very general "descriptive" analytics systems, requiring considerable skill to interpret,
They make predictions based on very primitive indicators, such as logins or clicks, and/or
They rely on proprietary "black box" algorithms that cannot be examined or validated by institutions.

Moodle HQ is introducing Project Inspire to overcome these limitations with a next-generation learning analytics system that will go beyond descriptive analytics to provide powerful predictive analytics (in Phase I) and diagnostic and prescriptive analytics (in future phases).

Timeline

Project Inspire will be released over multiple phases:

Phase I (Predictive Analytics for Course Completion) with Moodle 3.3

...

How to Participate

In order to deliver a major qualitative improvement in these analytics tools, we will need to collect sample data from a wide variety of Moodle-using institutions. We have prepared a set of tools to help institutions participate:

A Client Data Sharing Agreement, which specifies what data will be collected, how individual personally identifying information (PII) will be protected, and how the data will be used
An Institution Site Survey, a short questionnaire designed to gather key institution-specific goals and practices related to learning analytics
A new local plugin, "Anonymise", which is used on a *copy* of a Moodle database to de-identify both individual and institutional information before sharing with Moodle Pty Ltd. (See MDLSITE-4902 for more information.)

Institution Configuration Variations

During our initial data collection process, we are asking institutions to provide information on how they use Moodle via a Site Survey. The purpose of this survey is to determine which configuration parameters may need to be available to site administrators or course creators to tailor analytics output for the highest accuracy and applicability to the needs of the institution.

Goals for Learning Analytics

Initial predictions in Project Inspire will be based on identifying students at risk of not completing a course successfully. However, we recognize that there are many ways in which an institution might define "successful completion," e.g. a passing final grade, attainment of defined competencies, or course completion as defined in Moodle. In the long term, we also know that there are other potential uses of analytics, including:

Determine “best practices” for successful students
Identify and support teachers not meeting expectations
Determine “best practices” of successful teachers
Identify and prompt enhancement of courses not meeting Instructional Design expectations
Determine “best practices” of most successful course designs

We ask about these goals in our Site Survey.

Type of Institution

We also understand that different types of institutions may intrinsically require different analytics indicators and calculations. For this reason, we are collecting and analyzing data on the following institution types:

K-12
1. K-Primary (to age 11)
2. Middle School (ages 12-14)
3. High school (ages 15-18)
Higher Education
1. Community College (first 2 years post-secondary)
2. Bachelor’s degree (first 4 years post-secondary)
3. Graduate (Master’s, Ph.D., etc.)
Corporate Training
1. For a certification
2. Regulatory requirements (e.g. Title IX)
3. Non-certification/non-regulatory
Community Training (e.g. for NGO)

Role Tracking

Because Moodle supports custom defined roles per site, we will need each site to identify which roles should be considered "students" and "teachers" for analytics purposes. We will also be mindful that there can be other significant roles in the learning analytics environment, e.g. mentors, parents, managers, etc.
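As an illustration of the per-site configuration this implies, the following Python sketch maps role shortnames to analytics categories. The shortnames `student`, `editingteacher` and `teacher` are Moodle's default roles; `tutor_ta` and `mentor` are hypothetical custom roles, and the rule that a teacher role outranks a student role is our assumption rather than a Project Inspire decision.

```python
from typing import Iterable

# Hypothetical per-site mapping from role shortnames to analytics categories.
# "student", "editingteacher" and "teacher" are Moodle's default role shortnames;
# "tutor_ta" and "mentor" stand in for custom roles a site might define.
ANALYTICS_ROLE_MAP = {
    "student": "student",
    "editingteacher": "teacher",
    "teacher": "teacher",   # Moodle's default non-editing teacher role
    "tutor_ta": "teacher",  # hypothetical custom role
    "mentor": "other",      # hypothetical custom role, e.g. a parent or mentor
}


def analytics_category(role_shortnames: Iterable[str]) -> str:
    """Return "teacher", "student" or "other" for a user's roles in one course.

    Unmapped roles count as "other"; a teacher role outranks a student role.
    """
    categories = {ANALYTICS_ROLE_MAP.get(role, "other") for role in role_shortnames}
    if "teacher" in categories:
        return "teacher"
    if "student" in categories:
        return "student"
    return "other"


print(analytics_category(["student"]))                    # student
print(analytics_category(["student", "editingteacher"]))  # teacher
print(analytics_category(["mentor"]))                     # other
```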

Use of Course and Activity Completion Criteria

Course and Activity Completion Criteria are supported in Moodle for all courses and activities. All Activities provide a manual completion option, as well as one or more automated options. The most basic automated Activity completion criterion is whether the learner has viewed the activity, and this criterion is provided for all Activity plugins. Plugin developers may support additional completion criteria, but all Completion status information is stored in a common format within Moodle. Course Completion criteria may be defined for any course, and may include combinations of Activity completion, satisfaction of grade requirements, earning of Badges, or accomplishment of Competencies. Not all institutions use Completion Criteria for Activities or for Courses. For those that do, however, Completion Criteria could be a flexible, well-supported way to indicate which Activities in a course are considered most critical for success. By identifying whether an institution uses Completion in this manner, we can fine-tune the analytics systems to improve predictions and other outputs.
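Where an institution does use Activity Completion to flag critical activities, one simple feature would be the share of those activities each learner has completed. This is a hypothetical sketch: `ActivityCompletion` and `critical_completion_ratio` are illustrative names, the state codes are modelled on the completion constants in Moodle's completion library, and treating a failed completion as not done is our assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# State codes modelled on Moodle's completion constants
# (incomplete, complete, complete with pass, complete with fail).
INCOMPLETE, COMPLETE, COMPLETE_PASS, COMPLETE_FAIL = 0, 1, 2, 3


@dataclass
class ActivityCompletion:
    activity_id: int
    user_id: int
    state: int


def critical_completion_ratio(
    records: Iterable[ActivityCompletion], critical_ids: Set[int], user_id: int
) -> Optional[float]:
    """Share of completion-tracked ("critical") activities a learner has completed.

    Returns None when the course tracks no activities, so the feature is
    treated as unavailable rather than as zero.
    """
    if not critical_ids:
        return None
    done = {
        r.activity_id
        for r in records
        if r.user_id == user_id
        and r.activity_id in critical_ids
        and r.state in (COMPLETE, COMPLETE_PASS)  # a failed completion is not counted
    }
    return len(done) / len(critical_ids)


records = [
    ActivityCompletion(activity_id=1, user_id=7, state=COMPLETE),
    ActivityCompletion(activity_id=2, user_id=7, state=COMPLETE_FAIL),
    ActivityCompletion(activity_id=3, user_id=7, state=COMPLETE_PASS),
]
print(critical_completion_ratio(records, {1, 2, 3, 4}, user_id=7))  # 0.5
```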

Course Schedule Types

Institutions also vary in whether courses have fixed start and end dates vs. self-paced "open" courses. Beginning with Moodle 3.2, a new "Course End Date" field will be provided with each course. In courses with fixed start and end dates, predictions about learner success can be based on the end date. However, in self-paced courses with open-ended dates, this prediction type is irrelevant. In our earliest version of Project Inspire, we expect to support predictions of success by fixed end dates. In future versions, we may also support a prediction of learner course completion date for use in open-ended courses.
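For fixed-date courses, one way to schedule predictions is to divide the time between the course start and end dates into equal intervals and predict at each boundary. The sketch below assumes four intervals and a hypothetical `prediction_times` function; for an open-ended course with no end date it returns `None`, since this prediction type does not apply.

```python
from datetime import datetime
from typing import List, Optional


def prediction_times(
    start: datetime, end: Optional[datetime], intervals: int = 4
) -> Optional[List[datetime]]:
    """Evenly spaced points between course start and end at which to predict success.

    Returns None for open-ended courses, where predicting success by an
    end date does not apply.
    """
    if end is None:
        return None
    if end <= start:
        raise ValueError("course end date must be after the start date")
    step = (end - start) / intervals
    # Boundaries between intervals; the end date itself is excluded.
    return [start + step * i for i in range(1, intervals)]


print(prediction_times(datetime(2017, 1, 1), datetime(2017, 1, 9)))
```

For a course running from 1 to 9 January, this yields prediction points on 3, 5 and 7 January.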

Term End Procedures

Not all institutions have distinct "terms," though many have an equivalent concept, e.g. a fiscal or calendar year that is used to determine when one session of a course ends and a new one begins. Institutions have a variety of means of administering new term "rollover," e.g.

Archive and reset existing courses
Import previous courses to new courses
Re-create courses from scratch
Implement a new Moodle server

These procedures have significant implications for learning analytics systems, as well as for our initial research data collection process. We are exploring how to support as wide a range of administrative procedures as possible while preserving the value of activity data from prior course sessions.

Large Enrollment Procedures

In addition to differences in procedures between terms, institutions also have various ways of handling the enrollment of large numbers of students in a given course within a single term, e.g.

Create one Moodle course per teacher and class of students
Create one large Moodle course, assign multiple teachers, and place students in groups

Again, our intent is to support as many commonly used procedures as possible, while recognizing that data must be abstracted sufficiently for observations made in one environment to be useful in others.
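One possible way to abstract activity data so that it transfers between course structures (not necessarily the approach Project Inspire will take) is to express each learner's activity relative to peers in the same course or group. The following sketch computes within-course z-scores; the input format, a mapping from learner id to activity count, is an assumption.

```python
from statistics import mean, pstdev
from typing import Dict


def within_course_zscores(activity_counts: Dict[int, int]) -> Dict[int, float]:
    """Express each learner's activity count relative to peers in the same course.

    A learner at the course average scores 0.0, whether the course is one
    small class or a large course split into groups.
    """
    if not activity_counts:
        return {}
    values = list(activity_counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Everyone is equally active, so no one stands out.
        return {learner: 0.0 for learner in activity_counts}
    return {learner: (count - mu) / sigma for learner, count in activity_counts.items()}


print(within_course_zscores({101: 10, 102: 20, 103: 30}))
```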

Instructional Mode

The instructional mode of the course also affects the data available for analytics and the types of predictions that can be made.

Online

In a fully online course, all learner activities are mediated through Moodle. This mode of instruction offers the most data with which to make predictions about learner success. However, even in this mode there are often "offline" activities, such as reading assignments, that may not be observable by the Moodle system.

Face to Face

Moodle is also used to support classes that meet regularly in a face-to-face setting. In these cases, Moodle may be used to provide access to common resources such as a syllabus, or as a way for students and teachers to stay in touch outside the classroom. This mode of instruction offers the least data with which to make predictions about learner success. Institutions that rely primarily on this mode of instruction may want to consider whether Moodle can be used to record information about classroom activity, especially activity that is already recorded in another form, such as attendance. Plugins designed for in-class activities, such as Realtime Quiz or ipal (a module that provides in-class "clicker" features), may also be worth considering.

Hybrid/Blended

In a Hybrid or Blended course, some learner activities are mediated through Moodle, while others take place in a face-to-face setting. Future versions of Moodle will need to record the proportion of online vs. face-to-face effort in order to know how much online activity to expect of learners. Again, if face-to-face data can also be collected via Moodle, the accuracy of predictions and guidance can be improved.
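The arithmetic behind "how much online activity to expect" can be sketched as follows, assuming a hypothetical per-course setting for the online share of total expected effort, measured here in hours. With 10 hours of expected weekly effort and 40% of it online, a learner is expected to log about 4 online hours, so 3 observed hours gives a ratio of 0.75.

```python
def online_engagement_ratio(
    observed_online_hours: float, expected_total_hours: float, online_fraction: float
) -> float:
    """Observed online effort relative to the effort the online share implies.

    1.0 means the learner is doing as much online work as expected;
    values below 1.0 mean less.
    """
    if expected_total_hours <= 0:
        raise ValueError("expected_total_hours must be positive")
    if not 0 < online_fraction <= 1:
        raise ValueError("online_fraction must be in (0, 1]")
    expected_online_hours = expected_total_hours * online_fraction
    return observed_online_hours / expected_online_hours


print(online_engagement_ratio(3, 10, 0.4))  # 0.75
```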

Philosophy of Learning

Although learning analytics are becoming a much-lauded feature in contemporary LMS development and integration, very few (if any) of these analytical systems are based on theories of curriculum or learning; most draw instead on practices from web commerce and other forms of business analytics. As an example, the GISMO tool (Mazza and Botturi, 2007) was developed based on indicators suggested by instructors via a simple survey. Dringus (2012) expresses particular concern that too much emphasis will be placed on the most easily collectable data (e.g. production volume), rather than on the extraction of semantic information.

We recognize that different institutions have different requirements for analytics, based on the goals and purposes of the institution. This example of different institutional priorities and their implications is based on Schiro's categories of Curriculum Theory:

Curriculum Theory	Academic Scholar	Social Efficiency	Learner Centered	Social Reconstruction
Learning Theory	Cognitivism	Behaviorism	Constructivism	Social Reconstruction
Outcomes	Student Rankings	Competencies	Learner Satisfaction	Learner Empowerment
Sample Indicators	Assignment Grades	Completion Time	Participation	Group Participation

Goal and Indicator Validation

Externally Stored Outcomes

Often, the outcomes of interest to a learner or institution are not stored within Moodle itself, but in an external system (e.g. a Student Information System or Human Resources system). For analysis purposes, tools are under development to simplify importing final grades (or another indicator of student success) as a new grade item across an entire Moodle site from a single CSV file.
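The import tools themselves are still under development, so the following is only a sketch of a plausible first step: splitting one site-wide CSV file into per-course sets of grades that could then be added as a new grade item in each course. The column names `course_shortname`, `user_idnumber` and `final_grade` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def group_final_grades(csv_file: TextIO) -> Dict[str, Dict[str, float]]:
    """Split a site-wide outcomes CSV into per-course {learner: grade} maps.

    Assumes hypothetical columns course_shortname, user_idnumber, final_grade;
    rows with a blank grade are skipped.
    """
    grades: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        grade = row["final_grade"].strip()
        if not grade:
            continue
        grades[row["course_shortname"]][row["user_idnumber"]] = float(grade)
    return dict(grades)


sample = io.StringIO(
    "course_shortname,user_idnumber,final_grade\n"
    "BIO101,s001,78.5\n"
    "BIO101,s002,\n"
    "CHEM200,s001,91\n"
)
print(group_final_grades(sample))
# {'BIO101': {'s001': 78.5}, 'CHEM200': {'s001': 91.0}}
```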

Generalizability

One might ask, what remains of the data after so much textual content has been encrypted or redacted? The answer is simply the structure of the course itself and the patterns of engagement of learners and instructors within that content and with one another.

Metadata Schema

Open Algorithms

Privacy

Because the analysis needed to develop Project Inspire requires the collection of detailed data from many institutions, extensive measures are in place to ensure the privacy and anonymity of participants.

Legislative Requirements

Both the United States and the European Union have strong data privacy protection laws, especially for students and minors. In the European Union, Directive 95/46/EC refers to anonymisation in Recital 26 to exclude anonymised data from the scope of data protection legislation:

"Whereas the principles of protection must apply to any information concerning an identified or identifiable person; whereas, to determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person; whereas the principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable; whereas codes of conduct within the meaning of Article 27 may be a useful instrument for providing guidance as to the ways in which data may be rendered anonymous and retained in a form in which identification of the data subject is no longer possible;"

In the United States, studies involving student data fall under FERPA.

FERPA (34 CFR §99.31(b)(2)) allows an educational agency or institution, or a party that has received education records or information from education records, such as a State educational authority, to release de-identified student-level data (microdata) from education records for the purpose of educational research by attaching a code to each record that may allow the researcher to match information received from the same source under the specified conditions. These conditions require that the coded de-identified microdata are used only for educational research purposes, that the party receiving the data is not allowed any access to information about how the descriptor is generated and assigned, and that the code cannot be used to identify the student or to match the information from education records with data from any other source. Furthermore, a record descriptor may not be based on a student's Social Security number or other personal information.
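A minimal sketch of the coding scheme described above, assuming the institution holds records keyed by an internal student key: each student receives a random code that is not derived from any personal information, the released records carry only that code, and the lookup table stays with the releasing party. The `code_records` function and the record layout are hypothetical, and the released fields are assumed to be de-identified already.

```python
import secrets
from typing import Dict, List, Tuple

Record = Tuple[str, dict]  # (internal student key, already de-identified fields)


def code_records(records: List[Record]) -> Tuple[List[Record], Dict[str, str]]:
    """Replace each student key with a random code, consistent per student.

    Returns the coded records for release and the private key-to-code table,
    which is kept by the releasing institution and never shared.
    """
    lookup: Dict[str, str] = {}
    released: List[Record] = []
    for student_key, fields in records:
        if student_key not in lookup:
            # A random code carries no information about the student,
            # unlike a code derived from a name or Social Security number.
            lookup[student_key] = secrets.token_hex(8)
        released.append((lookup[student_key], fields))
    return released, lookup


released, private_table = code_records(
    [("stu-0001", {"grade": 72}), ("stu-0002", {"grade": 85}), ("stu-0001", {"grade": 64})]
)
```

Because the codes are random rather than computed, a recipient cannot regenerate or reverse them without the private table.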

This project is exempt from the requirement to obtain individual informed consent under the Code of Federal Regulations, Title 45 (Public Welfare, Department of Health and Human Services), Part 46 (Protection of Human Subjects): §46.101(b)(1)(ii), covering research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods, and §46.101(b)(4), covering the collection or study of existing data recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.

Preserving the confidentiality and privacy of this data has been given the utmost consideration. Following recommendations from Daries et al. (2014), both personally identifying information (PII) and data fields that could be used in combination to uniquely identify an individual are either encrypted or generalized. Log data is currently maintained within the Moodle server, restricted to system administrators with access to the database. When this data is extracted, all uniquely identifying values will be de-identified by salted one-way hashing (one-way encryption) or by summarization. Relationships between data will remain intact per individual, but it will not be possible to decode the data to determine individual identity.
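The salted one-way transformation described above can be sketched with HMAC-SHA256 from the Python standard library, treating the salt as a secret key so that short, guessable values such as numeric user IDs cannot be recovered by brute force. The function names and log-row layout are assumptions for illustration. Because the same user ID always produces the same pseudonym within one extraction, a learner's log entries stay linked to each other.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List

# Secret salt generated once per extraction and never released with the data.
SALT = secrets.token_bytes(32)


def pseudonym(value: str, salt: bytes = SALT) -> str:
    """Salted one-way hash (HMAC-SHA256) of an identifying value."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_log(rows: List[Dict]) -> List[Dict]:
    """Replace user IDs in log rows; the same user always gets the same pseudonym."""
    return [{**row, "userid": pseudonym(str(row["userid"]))} for row in rows]


log = [
    {"userid": 42, "event": "course_viewed"},
    {"userid": 42, "event": "submission_created"},
    {"userid": 7, "event": "course_viewed"},
]
safe_log = pseudonymise_log(log)
```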

Individual De-Identification

The measures taken to ensure individual anonymity include:

Replace all personally identifiable information (PII) in User, Course, and Category records (short text fields, e.g. names, email addresses, course and category names and ID numbers) with unique, consistent identifiers not based on user-identifiable information, e.g. keyed hash values appended with a literal field identifier such as "_firstname"

Replace all long text fields (e.g. forum posts, activity descriptions) with "dummy" text of the same length (e.g. repeated null words)

Replace all attached files with "dummy" files of approximately the same size and type
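The three replacement rules can be sketched as follows. The 16-character truncated HMAC digest, the `lorem` filler word and the `dummy` file name are assumptions made for illustration; the Anonymise plugin's actual implementation may differ.

```python
import hashlib
import hmac
import os
from typing import Tuple


def replace_short_field(value: str, field_name: str, key: bytes) -> str:
    """Keyed hash of a short PII value, labelled with its field, e.g. '1a2b..._firstname'."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return f"{digest}_{field_name}"


def replace_long_text(text: str, filler: str = "lorem ") -> str:
    """Dummy text with exactly the same length as the original."""
    repeats = len(text) // len(filler) + 1
    return (filler * repeats)[: len(text)]


def dummy_file(filename: str, size: int) -> Tuple[str, bytes]:
    """Random content of the same size; keeping the extension preserves the file type."""
    extension = os.path.splitext(filename)[1]
    return f"dummy{extension}", os.urandom(size)


key = os.urandom(32)  # secret key, kept by the institution
print(replace_short_field("Alice", "firstname", key))
print(replace_long_text("I found week 3 hard."))  # lorem lorem lorem lo
```

Using the same key for every record keeps identifiers consistent, so a user appearing in several tables still resolves to the same replacement value.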

Within-Institution De-Identification

Because individual courses may have low enrollment and generally have only one faculty member assigned, individual course names and textual contents are also encrypted to prevent exposure of student and instructor identity.

Institutional De-Identification

Institutional identification becomes an issue for individuals when an institution is small enough that identifying an individual’s relationship with that institution becomes a potential means of identifying the individual. Institutions may also be concerned about revealing data about their internal processes, strengths, and potential weaknesses.

Secure Transmission

Moodle Pty Ltd uses public key encryption with an SSL connection to support the secure transmission of data across the Internet. Access by Moodle Pty Ltd staff to servers storing institutional data is controlled by user IDs and passwords.
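On the sending side, a client can rely on the receiving server's public-key certificate by verifying it over TLS, the successor to SSL. The Python sketch below builds such a client context; the TLS 1.2 minimum is our assumption, not a documented Moodle Pty Ltd requirement.

```python
import ssl


def make_upload_context() -> ssl.SSLContext:
    """TLS client context that authenticates the receiving server by its certificate."""
    # create_default_context() loads the system CA certificates, requires a valid
    # server certificate and checks that it matches the server's host name.
    ctx = ssl.create_default_context()
    # Refuse old protocol versions (assumed policy, not a documented requirement).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_upload_context()
```

The context can then be passed to, for example, `urllib.request.urlopen(url, context=ctx)`; the actual upload endpoint is not specified here.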

Future Considerations

Project Inspire is currently in Phase I, in which we are collecting data for analysis to inform the first rollout of the Descriptive and Predictive analytics features. Following this phase, we will be expanding the scope of Project Inspire to include other aspects of Moodle use.

Third-Party Activity Plugins

In Phase I of Project Inspire, we will limit analysis to Moodle Core plugins. The construction of third-party Moodle plugins varies widely, and data storage and access are not always consistent between plugins. A future version of Project Inspire may prompt new API requirements for data from third-party plugins to be included in analytics.

Custom Fields

Custom fields (e.g. profile fields) will not be included in the analysis and predictions of Phase I. Analytics systems require either numerical data or well-defined, restricted categorical data in order to make predictions. In order to be able to include custom fields (including the custom fields proposed for other contexts by the Moodle User's Group) within analytics, the data stored in these fields will need to be identified according to a known metadata schema, e.g. IEEE LOM.
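To show why a known schema matters, the following sketch one-hot encodes a custom field value against a controlled vocabulary and rejects anything outside it, such as free text. The field and its vocabulary (`full-time`, `part-time`) are hypothetical.

```python
from typing import List, Sequence

# Hypothetical controlled vocabulary for a custom "enrollment status" profile field.
ENROLLMENT_STATUS = ("full-time", "part-time")


def one_hot(value: str, vocabulary: Sequence[str]) -> List[int]:
    """Encode a restricted categorical value; reject anything outside the schema."""
    if value not in vocabulary:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return [1 if value == term else 0 for term in vocabulary]


print(one_hot("part-time", ENROLLMENT_STATUS))  # [0, 1]
```

A free-text entry such as "mostly evenings" would raise an error here, which is exactly the kind of value that cannot feed a prediction model without a schema.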

Data Updates

Project Inspire Phase I will be based on "snapshot" data provided by participating institutions. Future versions may consider ways to incorporate streamed data in the algorithm improvement process.

Textual Analysis

We are aware of progress in the incorporation of textual analysis (e.g. forum posts) in learning analytics predictions. This category of features will not be present in Phase I, but may be considered for inclusion in the future.

External Activity

Much of the data relevant to learner progress is kept in systems outside of Moodle, including Student Information Systems, Human Resources databases, post-course survey systems, etc. As common APIs are developed to incorporate this data into learning analytics systems, we may be able to incorporate more of this data into Project Inspire.

Documentation