Evaluating a Biometric Age Estimation System

Using Scientific Method and Hypothesis Testing

In this exercise we will conduct an experiment using a biometric system found on the internet. Because many biometric systems are available online, similar experiments can easily be designed.

This exercise uses Microsoft's Age Estimator, which estimates a person's age from a photo. For a reasonable statistical analysis, the exercise works best with a class or group of at least 10 people, preferably 20-30 or more.

**Before performing the exercise, the following should be discussed:**

- Biometrics as a component of cybersecurity (covered in an earlier presentation)
- The nature of research, particularly the scientific method and hypothesis testing
- A preview of the Excel spreadsheet

**Possible hypotheses:**
Now, discuss possible hypotheses related to the age estimator and come to a consensus on a particular hypothesis to test.
Several plausible hypotheses are listed here, but we will test the first one in this exercise.

- For adult users, the age estimator tends to underestimate a person's age so they feel younger and good about themselves.
- For young people, like your high school students, the age estimator tends to overestimate a person's age so they see themselves as older, more mature, and important.
- For a group with a reasonable age distribution, the age estimation error increases as a person's age increases.
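
Each hypothesis above can be turned into a number computed from the collected ages. As a minimal sketch for the third hypothesis, the following Python code computes the Pearson correlation between actual age and the absolute estimation error; a value near +1 would be consistent with errors growing with age. The function names and the sample values are illustrative assumptions, not part of the exercise spreadsheet.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


def age_error_correlation(actual_ages, estimated_ages):
    """Correlate actual age with the absolute error |estimated - actual|."""
    abs_errors = [abs(est - act) for act, est in zip(actual_ages, estimated_ages)]
    return pearson(actual_ages, abs_errors)


# Made-up illustrative values, not real measurements.
actual = [20, 30, 40, 50]
estimated = [21, 32, 43, 54]  # absolute errors 1, 2, 3, 4 grow with age
r = age_error_correlation(actual, estimated)
print(f"correlation between age and absolute error: {r:.2f}")
```

With a real class, the correlation would rarely be this clean; a small group can produce a high or low value by chance alone.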

**This exercise consists of steps typically used in the scientific method:**

**Collecting data:** The data required for this exercise consist of the actual and machine-estimated ages of all the participants (the students in the classroom or, in this case, the workshop participants).

- One or several participants take photos of all participants, easily done with smartphones.
- One or several participants enter the photos into the estimator to obtain the age estimates.

**Performing an analysis of the data:** The data are entered into a prepared spreadsheet that automatically performs the analysis.

- One participant enters the actual and estimated age data into the spreadsheet.

**Determining whether or not the hypothesis is confirmed:**

- The spreadsheet results are examined and discussed to determine whether or not the hypothesis is confirmed.
- Note that inferential statistics can be used to rigorously test a hypothesis. This involves considerable math, but such procedures could be investigated by advanced students.

**Key ingredient of exercise:**

- The prepared spreadsheet that automatically calculates the histogram and associated statistics -- min, max, mean, median, mode, and standard deviation -- once the actual and estimated age data are entered.
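
For groups that want to cross-check the spreadsheet, the same summary statistics can be sketched in a few lines of Python using only the standard library. This sketch assumes the quantity of interest is the signed error (estimated age minus actual age), so a negative mean indicates underestimation; the spreadsheet's own layout and formulas may differ, and the sample values are made up for illustration.

```python
import statistics


def summarize_errors(actual_ages, estimated_ages):
    """Summary statistics of the signed errors (estimated - actual)."""
    if len(actual_ages) != len(estimated_ages):
        raise ValueError("need exactly one estimate per actual age")
    errors = [est - act for act, est in zip(actual_ages, estimated_ages)]
    return {
        "min": min(errors),
        "max": max(errors),
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "mode": statistics.mode(errors),     # most frequent error value
        "stdev": statistics.stdev(errors),   # sample standard deviation
    }


# Made-up illustrative values, not real measurements.
actual = [25, 31, 40, 52, 60]
estimated = [23, 30, 38, 53, 58]
summary = summarize_errors(actual, estimated)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here `statistics.stdev` is the sample standard deviation (dividing by n - 1); `statistics.pstdev` would give the population version, so check which one the spreadsheet uses before comparing results.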

**Student learning outcomes:**

- Students learn about a biometric system, in this case an age estimator.
- Students learn how to do research using the scientific method and hypothesis testing.
- They learn the methodology of collecting data, analyzing the data, and hypothesis testing.
- They can also learn about experimental controls to minimize extraneous variables.
  - In this experiment, for example, photo capture could be limited to one camera/smartphone so as not to introduce variations due to different camera resolutions.

- They learn mathematical measures used in statistics: min, max, mean (average), median, mode, and standard deviation.
- Students also learn about histograms. Probability distributions can also be taught here since the probability distribution in this case is obtained directly from the histogram by dividing the counts by the number of age estimates.
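
The step from histogram to probability distribution can be sketched as follows: count how often each value occurs, then divide each count by the total number of values. The function name and the sample values (signed errors in years) are illustrative assumptions.

```python
from collections import Counter


def empirical_distribution(values):
    """Map each distinct value to its relative frequency (count / total)."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)          # the histogram: value -> count
    total = len(values)
    return {value: counts[value] / total for value in sorted(counts)}


# Made-up illustrative errors (estimated - actual), not real measurements.
errors = [-2, -1, -2, 1, -2]
distribution = empirical_distribution(errors)
for value, probability in distribution.items():
    print(f"error {value:+d}: {probability:.2f}")
```

The probabilities always sum to 1, which is a quick check students can apply to their own spreadsheet column.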

- They learn that the histogram and the mean, median, and mode values give a quick, rough idea of whether the hypothesis is supported.
- More precise statistical hypothesis tests quantify the strength of the evidence by computing a p-value: the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. A p-value below 0.05 (5%) is the usual threshold for statistical significance (a Google search will find material on statistical hypothesis testing).
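
One simple test that advanced students could try is the sign test, sketched below for the first hypothesis (the estimator tends to underestimate adult ages). The choice of test is an assumption made here for illustration; the exercise does not prescribe one. Under the null hypothesis, an underestimate and an overestimate are equally likely, so the number of underestimates follows a binomial distribution with probability 0.5. The sample values are made up, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def sign_test_underestimate(actual_ages, estimated_ages):
    """One-sided sign test: is the estimate below the actual age too often?

    Returns (underestimates, usable_pairs, p_value). Exact matches are dropped.
    """
    pairs = list(zip(actual_ages, estimated_ages))
    under = sum(1 for act, est in pairs if est < act)
    over = sum(1 for act, est in pairs if est > act)
    n = under + over
    if n == 0:
        raise ValueError("every estimate equals the actual age; nothing to test")
    # P(X >= under) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(under, n + 1)) / 2 ** n
    return under, n, p_value


# Made-up illustrative values, not real measurements.
actual = [25, 31, 40, 52, 60, 33, 47, 29, 55, 38]
estimated = [23, 30, 38, 53, 58, 31, 44, 27, 50, 36]
under, n, p_value = sign_test_underestimate(actual, estimated)
print(f"{under} of {n} estimates were too low, p-value = {p_value:.4f}")
```

The sign test ignores the size of each error and uses only its direction, which keeps the math within reach of a high school class; a paired t-test would use the error sizes as well but rests on stronger assumptions.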

- Students learn how to develop a sophisticated spreadsheet to analyze experimental data.
  - For example, have one or several students examine and explain the formulas used in the spreadsheet.