Most of the projects this semester concern biometrics. The common biometrics include face, iris, fingerprint, and voiceprint. This semester's biometric projects cover the less common biometrics of mouse movement, stylometry, and keystroke patterns. We have chosen these because it is easier to perform new research and to publish in less-studied areas. All biometrics have authentication and identification applications. In authentication (verification) applications a user is either accepted or rejected (a binary response: yes, you are the person you claim to be, or no, you are not). In identification applications a user is identified from within a population of, say, n users (a one-of-n response), which is usually a more difficult problem.
This semester we will continue the projects on the mouse movement, stylometry, and keystroke biometrics, but with new directions compared to the earlier ones. In the earlier projects we attacked the identification problem to establish the feasibility of each biometric, reasoning that a good result (reasonably high recognition accuracy) on identification would be more significant than one on the easier authentication problem. Also, for ease of implementation, we used a simple classification technique called nearest neighbor. This semester we will focus more on the authentication problem, which is usually considered the more important one and the one for which comparable evaluation statistics can be obtained.
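The difference between the two problems can be illustrated with a minimal sketch built on nearest neighbor. The Euclidean distance, the function names, the threshold value, and the sample data below are all assumptions made for illustration, not the earlier projects' actual implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(sample, enrolled):
    """Identification (one-of-n): return the ID of the enrolled sample
    nearest to the unknown sample.
    enrolled is a list of (user_id, feature_vector) pairs."""
    best_id, _ = min(enrolled, key=lambda rec: euclidean(sample, rec[1]))
    return best_id

def authenticate(sample, claimed_id, enrolled, threshold):
    """Authentication (binary): accept only if the sample lies within
    `threshold` of some enrolled sample of the claimed user.
    The threshold is a hypothetical tuning parameter."""
    distances = [euclidean(sample, vec)
                 for uid, vec in enrolled if uid == claimed_id]
    return bool(distances) and min(distances) <= threshold

# Made-up enrollment data: two features per sample.
enrolled = [
    ("alice", [0.10, 0.20]),
    ("bob",   [0.90, 0.80]),
    ("alice", [0.15, 0.25]),
]
sample = [0.12, 0.22]

who = identify(sample, enrolled)                              # one-of-n answer
accept_alice = authenticate(sample, "alice", enrolled, 0.1)   # yes/no answer
accept_bob = authenticate(sample, "bob", enrolled, 0.1)
```

Identification always returns some user, whereas authentication can reject. Varying the threshold trades false accepts against false rejects, and rates of that kind are among the statistics usually reported for authentication systems.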
This semester's biometric projects will focus on the front-end system components, data gathering and feature extraction, and will provide the feature data to the Authentication and Data Mining teams for back-end classification processing. The feature vector files should be in the following human-readable text format. The first record identifies the file (type of biometric and other important characteristics of the data), and each subsequent record consists of the ID of the user, the date the biometric sample was taken, the number of features in the feature vector, and the feature vector data. A more precise format may be provided later by the instructor or the Data Mining team. If the date of capture for the earlier-collected data is not known (the most likely case), try to estimate it (say, to within a month) from discussions with the customers or previous team members. If no estimate is possible, use a default of 1/1/1900. System performance results can be reported in the technical papers of both the front-end biometric team and the back-end team obtaining the results.
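As a rough illustration of the record layout described above, the following sketch writes and reads such a file. The comma delimiter, the header text, the month/day/year date style, and all function names are assumptions, since the precise format has not yet been specified, and the sample values are made up.

```python
import io
from datetime import date

# Default capture date from the text, used when the real date is unknown.
DEFAULT_DATE = date(1900, 1, 1)

def format_record(user_id, features, captured=None):
    """Build one record: user ID, capture date, feature count, features.
    Comma-separated fields are an assumed layout, not a specified one."""
    d = captured or DEFAULT_DATE
    date_text = f"{d.month}/{d.day}/{d.year}"
    values = ",".join(str(float(v)) for v in features)
    return f"{user_id},{date_text},{len(features)},{values}"

def write_feature_file(out, header, samples):
    """Write the identifying first record, then one record per sample.
    samples is an iterable of (user_id, captured_or_None, features)."""
    out.write(header + "\n")
    for user_id, captured, features in samples:
        out.write(format_record(user_id, features, captured) + "\n")

def parse_record(line):
    """Parse one sample record, checking the declared feature count."""
    fields = line.strip().split(",")
    user_id, date_text, count = fields[0], fields[1], int(fields[2])
    features = [float(v) for v in fields[3:]]
    if len(features) != count:
        raise ValueError(f"expected {count} features, found {len(features)}")
    month, day, year = (int(p) for p in date_text.split("/"))
    return user_id, date(year, month, day), features

# Usage with made-up data; the header wording is hypothetical.
buf = io.StringIO()
write_feature_file(buf, "biometric=keystroke", [
    ("u01", None, [120.0, 0.25]),            # unknown date, default applies
    ("u02", date(2006, 9, 15), [98.5]),
])
lines = buf.getvalue().splitlines()
records = [parse_record(line) for line in lines[1:]]
```

Storing the feature count in each record lets the back-end teams detect truncated or malformed lines before classification.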
Although we probably cannot undertake all six of these projects this semester, we expect to do at least four and possibly five of them.