Biometric Authentication System

The dichotomy model is a powerful, but little-explored, technique for biometric authentication (verification). A comparison of this technique to other authentication techniques could produce an outstanding dissertation in the area of biometrics, especially if it is supported by comparative experiments which could be performed on our extensive biometric (especially keystroke) databases.

This project team will develop a generic authentication system, preferably in Java, using the dichotomy model. The dichotomy model was used in Dr. Cha's dissertation (see the key paper for this project) and in an online fingerprint verification study. See also the subset of dichotomy slides from a conference paper.

Your generic dichotomy-model authentication system will accept feature-vector data in the specified format (see data format link on main Projects page). For example, take the stylometry example data given to the class and repeated below. This example feature data file contains 6 feature vectors, 2 from each of 3 subjects (so we have 3 pattern classes). Each vector has only 2 measurements, so we are operating in a 2D feature space.

Stylometry biometric data example created September 2007
MaryJones/F/26, bachelors degree, Dell laptop, structured email task, 2, 0.13668, 0.53375
MaryJones/F/26, bachelors degree, Dell laptop, structured email task, 2, 0.14378, 0.56275
JohnSmith/M/27, masters degree, Compaq handheld, free email task, 2, 0.53628, 0.43865
JohnSmith/M/27, masters degree, Compaq handheld, free email task, 2, 0.43628, 0.53865
ChrisHill/F/02-04-1983, PhD degree, Dell desktop, free email task, 2, 0.39734, 0.92862
ChrisHill/F/02-04-1983, PhD degree, Dell desktop, free email task, 2, 0.49924, 0.98861

The above three-class example file can easily be converted into a two-class dichotomy-model authentication data file. The two authentication classes are the within-class (same person) and the between-class (different people) categories. We perform the conversion by taking all possible difference vectors. In this case, there are only 3 within-class vector pairs, one for each of the three people, and 12 between-class vector pairs (6*4/2: each of the 6 instances can be compared with the 4 instances from other people, and dividing by 2 eliminates duplicates).

In general, if n people provide m biometric samples each, there are m*(m-1)*n/2 within-class pairs and m*m*n*(n-1)/2 between-class pairs (see the key paper reference above). The number of between-class pairs usually far exceeds the number of within-class pairs. Sometimes the numbers of both within-class and between-class pairs are large (possibly in the millions); in that case the training and test samples can be generated at random rather than explicitly enumerated as indicated here.

For each pair, a difference vector is computed by taking the absolute difference between corresponding vector components. Because our biometric features are in the range 0-1, the difference vector features will also be in the range 0-1.
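As a quick check of the pair-count formulas, the following minimal Java sketch evaluates them for the example file (n = 3 people, m = 2 samples each). The class and method names are illustrative only and are not part of the project specification.

```java
// Sketch: pair counts for n subjects who each provide m samples.
// PairCounts and its method names are hypothetical, chosen for illustration.
class PairCounts {

    // Within-class pairs: each of the n subjects contributes m*(m-1)/2 pairs.
    static long withinClassPairs(long n, long m) {
        return n * m * (m - 1) / 2;
    }

    // Between-class pairs: m*m sample pairs for each of the n*(n-1)/2 subject pairs.
    static long betweenClassPairs(long n, long m) {
        return m * m * n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // The example file: 3 subjects, 2 samples each.
        System.out.println(withinClassPairs(3, 2));   // prints 3
        System.out.println(betweenClassPairs(3, 2));  // prints 12
    }
}
```

The two counts always sum to the total number of unordered sample pairs, (n*m)*(n*m-1)/2, which is 15 for the example file.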

For the illustrative file above, the feature vector records for the first within-class (same person) pair and the first between-class (different people) pair would be:
same, ?, ?, ?, 2, |0.13668-0.14378|, |0.53375-0.56275|
different, ?, ?, ?, 2, |0.13668-0.53628|, |0.53375-0.43865|
This conversion procedure can easily be implemented, and we recommend using Java for coding. You are to use this procedure to convert the feature data that you receive from the mouse movement, stylometry, and keystroke teams into dichotomy-model authentication data for further processing by your team and by the Data Mining team.
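One possible shape for the conversion is sketched below. It is a sketch under stated assumptions, not the required implementation: it assumes records are comma-separated, that the whole first field identifies the subject, that the fifth field is the feature count, that all records have the same number of features, and that the free-text description line has already been removed. The class name DichotomyConverter, its methods, and the five-decimal output formatting are hypothetical. The "?" fields are copied through as placeholders, as in the example records above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch of the n-class to 2-class (dichotomy) conversion.
// Field layout assumed: subject, 3 descriptive fields, feature count, features.
class DichotomyConverter {

    // One parsed record: subject identifier plus its feature values.
    static final class Sample {
        final String subject;
        final double[] features;

        Sample(String subject, double[] features) {
            this.subject = subject;
            this.features = features;
        }
    }

    static Sample parse(String line) {
        String[] f = line.split(",");
        int count = Integer.parseInt(f[4].trim());
        double[] x = new double[count];
        for (int i = 0; i < count; i++) {
            x[i] = Double.parseDouble(f[5 + i].trim());
        }
        return new Sample(f[0].trim(), x);
    }

    // Emits one "same" or "different" record for every unordered pair of samples.
    static List<String> convert(List<String> records) {
        List<Sample> samples = new ArrayList<>();
        for (String r : records) {
            samples.add(parse(r));
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < samples.size(); i++) {
            for (int j = i + 1; j < samples.size(); j++) {
                Sample a = samples.get(i);
                Sample b = samples.get(j);
                String label = a.subject.equals(b.subject) ? "same" : "different";
                StringBuilder sb = new StringBuilder(label + ", ?, ?, ?, " + a.features.length);
                for (int k = 0; k < a.features.length; k++) {
                    // Absolute difference of corresponding components.
                    double d = Math.abs(a.features[k] - b.features[k]);
                    sb.append(String.format(Locale.ROOT, ", %.5f", d));
                }
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "MaryJones/F/26, bachelors degree, Dell laptop, structured email task, 2, 0.13668, 0.53375",
            "MaryJones/F/26, bachelors degree, Dell laptop, structured email task, 2, 0.14378, 0.56275",
            "JohnSmith/M/27, masters degree, Compaq handheld, free email task, 2, 0.53628, 0.43865");
        for (String line : convert(records)) {
            System.out.println(line);
        }
    }
}
```

Enumerating every pair is fine for small files. For the large data sets mentioned earlier, the double loop could be replaced by random sampling of index pairs.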

After implementing the dichotomy-model conversion, the team will obtain authentication system performance results on data from the various biometric front-end teams. A textbook provided to the team (Guide to Biometrics, by Bolle et al., Springer 2004, ISBN 0387400893; it must be returned at the end of the semester) describes the performance statistics, namely False Accept Rate (FAR) and False Reject Rate (FRR), that should be obtained on the mouse movement, stylometry, and keystroke data.

Summary of tasks to perform

  1. Write a conversion program to convert files of n-class feature data (in the specified format) into files of 2-class (inter- and intra-class) dichotomy-model feature data.
  2. Prepare sets of inter- and intra-class data for training and testing. The testing sets must be independent of (different from) the training sets. These data sets might be the output of the conversion program. These data sets should also be made available to Team 6 for further experimentation.
  3. Implement the nearest-neighbor technique to obtain accuracy results on the data. This technique simply computes the Euclidean distance of each testing sample to all the training samples, and assigns the test sample to the class of the nearest training sample. The form of the results we are looking for is shown in the table below. For example, the first results row of the table indicates that we have mouse data, test sample sizes for inter and intra-class of 50 each, 12% False Reject Rate (6 samples falsely rejected out of the 50 intra-class samples), 16% False Accept Rate (8 samples falsely accepted out of the 50 inter-class samples), and a performance of 86% (the remaining 86 samples correctly classified).

    Biometric   Test                  Test Sizes  FRR    FAR    Performance
    Mouse       10 Fixed Buttons      50-50       12.0%  16.0%  86.0%
    Stylometry  PC Email              100-100     5.0%   5.0%   95.0%
    Keystroke   Copy/Desktop-Desktop  500-500     1.2%   0.2%   99.3%
    Keystroke   Copy/Desktop-Laptop   500-500     4.2%   3.2%   96.3%

    Although the initial training and testing data sets might be rather small, we should strive for training and testing data sets of approximately 1000 samples each (500 inter and 500 intra-class pairs). Also, for the mid-semester checkpoint we would like to see at least one small experiment completed.
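The nearest-neighbor step and the FRR/FAR statistics in task 3 could be computed along the following lines. This is a minimal sketch, not a prescribed design: the class name NearestNeighborEval, the use of a boolean label (true for intra-class, "same"), and the array-based data layout are all assumptions made for illustration. It also assumes the test set contains at least one sample of each class, and it breaks distance ties in favor of the earlier training sample.

```java
// Sketch of 1-nearest-neighbor classification with FRR/FAR reporting
// on dichotomy (difference-vector) data. All names are hypothetical.
class NearestNeighborEval {

    // Squared Euclidean distance; the square root does not change which sample is nearest.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the label of the nearest training sample (true = intra-class, "same").
    static boolean classify(double[][] trainX, boolean[] trainSame, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double d = squaredDistance(trainX[i], x);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return trainSame[best];
    }

    // Returns {FRR, FAR, performance} as fractions in the range 0-1.
    static double[] evaluate(double[][] trainX, boolean[] trainSame,
                             double[][] testX, boolean[] testSame) {
        int intra = 0, inter = 0, falseReject = 0, falseAccept = 0;
        for (int i = 0; i < testX.length; i++) {
            boolean predictedSame = classify(trainX, trainSame, testX[i]);
            if (testSame[i]) {
                intra++;
                if (!predictedSame) falseReject++;   // genuine pair rejected
            } else {
                inter++;
                if (predictedSame) falseAccept++;    // impostor pair accepted
            }
        }
        double frr = (double) falseReject / intra;
        double far = (double) falseAccept / inter;
        double performance = 1.0 - (double) (falseReject + falseAccept) / (intra + inter);
        return new double[] { frr, far, performance };
    }

    public static void main(String[] args) {
        double[][] trainX = { { 0.01, 0.02 }, { 0.40, 0.10 } };
        boolean[] trainSame = { true, false };
        double[][] testX = { { 0.02, 0.03 }, { 0.35, 0.12 } };
        boolean[] testSame = { true, false };
        double[] r = evaluate(trainX, trainSame, testX, testSame);
        System.out.println("FRR=" + r[0] + " FAR=" + r[1] + " Performance=" + r[2]);
    }
}
```

With the first table row as a reference point, 6 false rejects out of 50 intra-class samples and 8 false accepts out of 50 inter-class samples give FRR = 6/50 = 12%, FAR = 8/50 = 16%, and performance = 86/100 = 86%, which matches how evaluate combines the counts.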