Difference between revisions of "BioAuth Plugin"

Revision as of 13:30, 14 June 2013

Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.

BioAuth: A Moodle plugin for determining Quiz authorship
Project state Community bonding period
Tracker issue CONTRIB-4337
Discussion XXX
Assignee Vinnie Monaco

GSOC '13

Introduction

The purpose of the BioAuth plugin is to provide a mechanism for verifying a user's identity based on behavioral biometrics. This is accomplished by capturing keystrokes, mouse movement, or text from a user and matching it against a known template for that user. The initial release of BioAuth will only support test-taker identity verification through keystroke dynamics.

Background

A capture session (or just session) is the duration in which some data is continuously captured from a user. This could be a quiz or some activity on a forum. A user may have multiple sessions captured while they are logged in. Each session should capture data from an independent activity or task, which lasts anywhere from roughly 10 to 60 minutes depending on the task involved.

A subtask is a logical task which must be performed within a session. This could be answering a question on a quiz or posting a response in a forum. Each session consists of several subtasks, which contribute to the goal of the session (completing a quiz, participating in a discussion).

A keystroke event is the atomic event captured by the hardware keyboard of a desktop or laptop computer. It consists of a press time, release time, and key identity. The key is identified by its location on the keyboard, not the ASCII code of the letter which was actually printed on screen. For example, 'A' is the same as 'a', even though they have different ASCII codes, and left shift is not the same as right shift, even though they perform the same function and have no ASCII codes of their own. The identity of the key can usually be determined by the key code, which should be unique for every key on the keyboard. Care must be taken to correlate key codes across different browsers and operating systems, as they may be platform dependent. For more information on events and keycodes, see Quirksmode on JavaScript keys.
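As a rough illustration of the keystroke event described above, the record could be modeled as below. This is only a sketch: the class name, the field names, the millisecond units, and the key code value 65 are assumptions made for the example, not part of the BioAuth design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeystrokeEvent:
    """One key press and release, identified by physical key, not by character."""
    key_code: int      # platform-normalized key identifier (hypothetical field)
    press_time: int    # assumed unit: milliseconds since the session started
    release_time: int  # assumed unit: milliseconds since the session started

    def __post_init__(self):
        # A key cannot be released before it is pressed.
        if self.release_time < self.press_time:
            raise ValueError("release_time must not precede press_time")


# 'A' and 'a' come from the same physical key, so they share one key code.
# The value 65 is illustrative only.
upper_a = KeystrokeEvent(key_code=65, press_time=1000, release_time=1090)
lower_a = KeystrokeEvent(key_code=65, press_time=2000, release_time=2075)
```

Keeping the character out of the record reflects the point that the key's location, not the printed letter, identifies the event.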

A stylometry event is a portion of text that was emitted to the screen in a continuous manner. Stylometry events are separated by a change in focus or cursor position. Events that segment consecutive stylometry events (ending the current stylometry event and beginning a new one) include:

  • Changing the window focus
  • Clicking on a different area of text in the current entry
  • Using the arrow keys to navigate a text area
  • Moving to a different text area.

Using the delete key to modify text which was already printed on screen does not end the current stylometry event, although it does modify the text which is contained in the current stylometry event. Upon completion of a subtask, the sequence of stylometry events will roughly consist of the final text that the user actually typed and that appeared on screen.
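The segmentation rules above can be sketched as follows. The action labels ("type", "delete", "focus_change", and so on) are hypothetical names for the UI events listed, and deletion is simplified to removing the last character of the current event; the real plugin may handle both differently.

```python
# Hypothetical labels for the actions that end a stylometry event.
SEGMENTING_ACTIONS = {"focus_change", "click", "arrow_key", "switch_area"}


def segment_stylometry_events(actions):
    """Split a stream of (action, char) pairs into stylometry events (strings)."""
    events = []
    current = []
    for action, char in actions:
        if action == "type":
            current.append(char)
        elif action == "delete":
            # Deleting edits the current event but does not end it.
            if current:
                current.pop()
        elif action in SEGMENTING_ACTIONS:
            # Close the current event (if it holds any text) and start a new one.
            if current:
                events.append("".join(current))
            current = []
        else:
            raise ValueError(f"unknown action: {action!r}")
    if current:
        events.append("".join(current))
    return events


actions = [
    ("type", "h"), ("type", "e"), ("type", "y"),
    ("delete", None),            # removes the "y" without ending the event
    ("type", "l"),
    ("click", None),             # ends the first event
    ("type", "o"),
]
segments = segment_stylometry_events(actions)
```

In this example the deleted character never appears in the output, so the events hold only the text that remained on screen.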

Authentication

The raw data is processed in several stages in order to make an authentication decision, and either confirm or deny the identity of an individual:

Feature extraction → Fallback Procedure → Outlier Removal → Normalization → Binary Classification

The feature extraction is dependent on the data source (keystroke, stylometry, mouse, etc.), while every stage after that can be performed independently of the data captured. This allows the system to be robust and extensible to other biometrics.
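One way to picture this separation is a driver that takes a source-specific extractor and a list of source-independent stages. This is a minimal sketch, not the plugin's design: the function names, the toy extractor, the pass-through stage, and the threshold classifier are all placeholders invented for the example.

```python
from typing import Callable, Dict, Sequence

FeatureVector = Dict[str, float]


def authenticate(
    raw_data,
    extract: Callable[[object], FeatureVector],
    stages: Sequence[Callable[[FeatureVector], FeatureVector]],
    classify: Callable[[FeatureVector], bool],
) -> bool:
    """Run the source-specific extractor, then the source-independent stages."""
    features = extract(raw_data)
    for stage in stages:
        features = stage(features)
    return classify(features)


# Toy placeholders, for illustration only.
def extract_keystroke_features(hold_times):
    return {"mean_hold": sum(hold_times) / len(hold_times)}


def pass_through(features):
    # Stands in for the fallback, outlier removal, and normalization stages.
    return dict(features)


def toy_classifier(features):
    # Accept when the mean hold time is within 20 ms of an assumed template of 100 ms.
    return abs(features["mean_hold"] - 100.0) < 20.0


decision = authenticate(
    [95, 105, 100], extract_keystroke_features, [pass_through], toy_classifier
)
```

Supporting another biometric would then mean supplying a different extractor while reusing the remaining stages unchanged.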

Feature Extraction

During feature extraction, the raw data (keystroke, stylometry) is mapped to a feature space. Features usually consist of measurements on some distribution in the raw data. The hold time is the duration a key is held, or release time - press time. There are several types of transition times, the most common being release-press and press-press latencies. For example, take two consecutive keystrokes, [A,B]. The hold time of A would be (A_release - A_press) and the transition times would be (B_press - A_release) for release-press and (B_press - A_press) for press-press.
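The timing measurements for the [A,B] example can be worked through in a few lines. This is a sketch that assumes each keystroke is a (press, release) pair of millisecond timestamps; the function names and the sample times are made up for illustration.

```python
def hold_time(keystroke):
    """Duration the key was held: release time minus press time."""
    press, release = keystroke
    return release - press


def transition_times(first, second):
    """Latencies between two consecutive keystrokes."""
    first_press, first_release = first
    second_press, _ = second
    return {
        "release_press": second_press - first_release,
        "press_press": second_press - first_press,
    }


# Keystroke A is pressed at 0 ms and released at 90 ms;
# keystroke B is pressed at 150 ms and released at 230 ms.
a = (0, 90)
b = (150, 230)

a_hold = hold_time(a)             # 90
a_to_b = transition_times(a, b)   # release-press 60, press-press 150
```

Note that a release-press latency can be negative when the second key is pressed before the first is released, which is common in fast typing.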

The hold times of A form a distribution from which the mean and standard deviation may be taken. The same applies to the transition times between A and B, provided that the sequence [A,B] occurs with a high enough frequency in the sample. For missing data, a fallback procedure is used, described in the next section.
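A minimal sketch of turning hold times into per-key features is shown below. The minimum-observation threshold of 2 is an assumption (the source only says the frequency must be "high enough"), and keys below the threshold are marked as missing so that a fallback can fill them in later.

```python
from collections import defaultdict
from statistics import mean, stdev


def hold_time_features(keystrokes, min_observations=2):
    """Map each key to (mean, standard deviation) of its hold times.

    keystrokes: iterable of (key, press_time, release_time) tuples.
    Keys with too few observations map to None (missing feature).
    """
    if min_observations < 2:
        raise ValueError("a standard deviation needs at least 2 observations")
    holds_by_key = defaultdict(list)
    for key, press, release in keystrokes:
        holds_by_key[key].append(release - press)
    features = {}
    for key, holds in holds_by_key.items():
        if len(holds) >= min_observations:
            features[key] = (mean(holds), stdev(holds))
        else:
            features[key] = None  # left for the fallback procedure
    return features


sample = [("A", 0, 90), ("A", 200, 310), ("B", 400, 480)]
features = hold_time_features(sample)
```

Here "A" has two hold times (90 and 110), so it gets a mean and a sample standard deviation, while "B" has only one and is reported as missing.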

Fallback Procedure

Dealing with arbitrary input can lead to missing observations. To compensate for this, a fallback procedure is used: missing features may be computed from observations which are known to be correlated with the missing data.

It is known that the timing information for closely grouped keys correlates well enough to be used in place of missing data. For this reason, a fallback hierarchy based on the physiology of the touch-typist is used.
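The idea of a fallback hierarchy might be sketched as below. The hierarchy shown (key, then finger, then hand, then all keys), the group names, the tiny key set, and the threshold of 3 observations are all assumptions made for illustration; the source does not specify the actual hierarchy.

```python
from statistics import mean

# Hypothetical, heavily abbreviated hierarchy: child -> parent.
PARENT = {
    "f": "left_index", "g": "left_index",
    "j": "right_index", "h": "right_index",
    "left_index": "left_hand", "right_index": "right_hand",
    "left_hand": "all_keys", "right_hand": "all_keys",
}


def leaves_under(node):
    """Return the set of keys that belong to a node of the hierarchy."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return {node}
    keys = set()
    for child in children:
        keys |= leaves_under(child)
    return keys


def hold_time_with_fallback(key, observations, min_observations=3):
    """Return (mean hold time, level used), climbing the hierarchy as needed."""
    node = key
    while node is not None:
        pooled = [t for leaf in leaves_under(node) for t in observations.get(leaf, [])]
        if len(pooled) >= min_observations:
            return mean(pooled), node
        node = PARENT.get(node)  # fall back to the next, larger group
    return None, None


observations = {"f": [100], "g": [90, 110, 100], "j": [80, 85, 90]}
f_feature = hold_time_with_fallback("f", observations)
h_feature = hold_time_with_fallback("h", observations)
```

In this example "f" has too few observations of its own, so its feature is pooled from the keys assumed to share its finger, and "h" (never typed) borrows from "j" in the same way.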

Outlier Removal

Due to inconsistencies in the data provided, outliers must be removed. It is common for a user to take a break while typing, creating a large transition time between keystrokes. These can easily be removed by excluding observations which fall outside a range around the mean, usually +/- 2 standard deviations.
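The rule above can be sketched as a single-pass filter. This is an illustrative version only: the function name and the sample data are made up, and a production version might repeat the pass, since a large outlier inflates both the mean and the standard deviation.

```python
from statistics import mean, stdev


def remove_outliers(observations, num_std=2.0):
    """Keep observations within num_std standard deviations of the mean."""
    if len(observations) < 2:
        # A standard deviation cannot be estimated from fewer than 2 points.
        return list(observations)
    center = mean(observations)
    spread = stdev(observations)
    return [x for x in observations if abs(x - center) <= num_std * spread]


# Transition times in milliseconds; the 5000 ms gap represents a typing break.
transitions = [100, 110, 95, 105, 98, 102, 5000]
cleaned = remove_outliers(transitions)
```

In this sample the 5000 ms pause lies more than two standard deviations from the mean and is dropped, while the six ordinary latencies are kept.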

Normalization

TODO

Binary Classification

TODO

Design

TODO

Schedule

The work will be completed under the following schedule:

Dates | Task | Status
May 27 - June 17 | Draft of design, to be accomplished during Community Bonding Period. Also, deadline for submitting a paper to MRC2013. | In progress
June 17 - July 1 | Binary classifier and authentication decision functions | Not done
July 1 - July 15 | Key logger. DB model. | Not done
July 15 - August 1 | Keystroke feature extractor | Not done
August 1 - August 11 | UI for results and settings | Not done
August 15 - Sept. 1 | Beta testing with trusted volunteers (recruit 10-15 volunteers). Bug fixes and final report. | Not done
Sept. 1 - Sept. 23 | Prepare for release. Scrub code, tests, documentation. | Not done

See also