Interactive Visual System*
The goal of visual pattern recognition during the past fifty years
has been the development of automated systems that rival or even
surpass human accuracy, at higher speed and lower cost.
Human interaction is considered, if at all,
only to deal with "rejects" in the final step.
There are pronounced differences between human and machine cognitive abilities.
Humans bring to recognition a rich set of contextual constraints and
superior noise-filtering abilities, and so excel in gestalt tasks
such as object-background separation.
Computers, however, can store thousands of images and associations
between them, never forget a name or a label, and compute geometric
moments and probability distributions.
These differences suggest that a system that combines human and
machine abilities can, in some situations, outperform both.
This is the general goal of CAVIAR
(Computer Assisted Visual InterActive Recognition).
Interesting research problems include:
- Segmentation is an extremely difficult problem for computer vision,
while humans have great ability at it.
Research on interactive segmentation, meaning strong segmentation
with assistance from a human, will have
great impact not only on visual pattern recognition,
but also on computer vision, content-based image/video retrieval, image/video
databases, and model-based image/video coding.
- An interactive classifier may differ from a traditional classifier,
since its purpose is not to find the single top candidate
but to eliminate the most irrelevant candidates
and to order the remaining candidates to help the user make a decision.
- System architecture, integration, and evaluation are also of interest,
in particular measuring how much interactive recognition improves
on both a fully automatic system and
a layperson in some applications.
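The interactive classifier described in the list above can be illustrated with a minimal Python sketch. It is not the CAVIAR implementation: the `Candidate` type, the `shortlist` function, the non-negative similarity scores, and the two thresholds are all assumptions chosen for illustration. The idea is only that the classifier discards clearly irrelevant candidates and hands the user a short ordered list instead of a single answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str    # hypothetical class label
    score: float  # assumed non-negative; higher means a better match


def shortlist(candidates, max_shown=3, min_relative_score=0.2):
    """Order candidates by score and drop the clearly irrelevant ones.

    A candidate is dropped when its score falls below a fraction
    (min_relative_score) of the best score. Both parameters are
    illustrative assumptions, not values from the CAVIAR system.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    cutoff = ranked[0].score * min_relative_score
    kept = [c for c in ranked if c.score >= cutoff]
    return kept[:max_shown]


# Made-up scores for five placeholder classes.
scores = [
    Candidate("species A", 0.42),
    Candidate("species B", 0.05),
    Candidate("species C", 0.38),
    Candidate("species D", 0.31),
    Candidate("species E", 0.01),
]

for rank, candidate in enumerate(shortlist(scores), start=1):
    print(rank, candidate.label, candidate.score)
```

With these made-up scores the user would see species A, C, and D in that order and make the final choice among them.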
This application of CAVIAR is designed to recognize wild flowers,
or other families of similar objects,
more accurately than machine vision and faster than most laypersons.
It draws on sequential pattern recognition, image databases, expert systems,
pen computing, and digital camera technology.
For a description of a rudimentary system implemented on a laptop,
see the paper
"Interactive Visual Pattern Recognition"
by G. Nagy and J. Zou, Proc. Int. Conf. Pattern Recognition (ICPR), 2002.
This project concerns the development of the system on a handheld computer
together with the direct capture and processing of associated photos.
Suggested steps to follow are:
- Develop a project plan and make initial decisions, such as:
  - Determine whether the existing laptop code, or a portion thereof,
    can be ported to a handheld
  - Determine the difficulty of using the handheld as a client
    to a remote server running the pattern recognition code
  - Decide on a handheld computer (Palm OS) and digital camera combination
    so that equipment can be purchased with our limited budget
- If possible, port or convert the laptop code to run on the handheld
- Test the interactive portion by limiting the choices
based on number of petals, etc.
- Have the system process photos from the connected camera
- Test the overall system -- photo processing combined with human interaction
If the code does not fit into the handheld,
or cannot be ported/converted to run on the handheld,
it is likely due to the size of the pattern recognition portion of the code.
The alternative is to run that code on a server, but that may also
be difficult or impractical.
Thus, for a minimal system,
the code that provides the human interaction should be
ported/converted and the interaction tested by having the system limit
the choices based on number of petals and other features entered by the user.
Photos of the resulting choices can then be shown to the user
for a final decision.
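Limiting the choices by number of petals and other user-entered features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FlowerRecord` fields, the placeholder records in `DATABASE`, the photo file names, and the `limit_choices` function are all hypothetical and are not taken from the existing laptop code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowerRecord:
    name: str    # placeholder name, not a real species
    petals: int  # number of petals
    color: str   # dominant petal color
    photo: str   # hypothetical file name of a reference photo


# Placeholder records standing in for the real flower database.
DATABASE = [
    FlowerRecord("flower-1", 5, "yellow", "flower-1.jpg"),
    FlowerRecord("flower-2", 5, "white", "flower-2.jpg"),
    FlowerRecord("flower-3", 4, "yellow", "flower-3.jpg"),
    FlowerRecord("flower-4", 6, "white", "flower-4.jpg"),
]


def limit_choices(records, **observed):
    """Keep only the records that match every feature the user entered.

    Features passed as None are treated as "not entered" and skipped,
    so the user can supply as few or as many features as desired.
    """
    remaining = list(records)
    for attribute, value in observed.items():
        if value is None:
            continue
        remaining = [r for r in remaining if getattr(r, attribute) == value]
    return remaining


# The user reports five petals but has not entered a color yet.
for record in limit_choices(DATABASE, petals=5, color=None):
    print(record.name, record.photo)
```

The photos of the remaining records would then be shown to the user, who makes the final decision. Entering a further feature, for example `color="white"`, narrows the list again.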
It has been suggested that the iPAQ or the Sharp Zaurus would be suitable
handheld platforms. Both take a plug-in camera, and both run Linux.
We are not sure whether the iPAQ has a camera driver for Linux. TWAIN
is Windows oriented. Other candidates might be Palm OS and Ricoh.
Yet another is the HP Jornada 586 Pocket PC with the HP pocket camera, as used in a
CMU sign translation project demonstrated at ICPR 2002;
although that project tries to solve a simpler problem fully automatically,
its hardware requirements appear to be the same as CAVIAR's.
* Much of the project description background was taken directly from
Jie Zou's Web site.