Early visual processing as a method to speed up computations on visual input data has long been
discussed in the computer vision community. The general goal of such approaches is to keep irrelevant
information away from the costly higher-level visual processing algorithms. Inserting this
additional filter layer speeds up the overall approach without changing the visual
processing methodology.
Inspired by the layered architecture of the human visual processing apparatus, several approaches
for early visual processing have recently been proposed. The most promising in this field is the
extraction of a saliency map to determine regions of current attention in the visual field. Such saliency
can be computed in a bottom-up manner: according to the theory, static regions of attention emerge
from a certain color footprint, and dynamic regions of attention emerge from connected blobs of textures
moving uniformly in the visual field. Top-down saliency effects are either unconscious, arising from
inherent mechanisms such as inhibition-of-return (the attention paid to a region automatically decreases
over time if the properties of that region do not change), or volitional, arising from
cognitive feedback, e.g. if an object moves consistently in the visual field. These bottom-up
and top-down saliency effects have been implemented and evaluated in a previous computer vision
system for the project JAST.
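
The inhibition-of-return effect described above can be illustrated with a minimal sketch. It assumes that attention decays exponentially while a region stays unchanged; the function name `update_attention`, the `decay` factor, and the `restored` level are hypothetical and are not taken from the JAST system.

```python
def update_attention(attention, changed, decay=0.8, restored=1.0):
    """Return the attention level of every region after one time step.

    attention: dict mapping a region id to its current attention level.
    changed:   set of region ids whose properties changed in this step.
    decay:     assumed multiplicative decay for unchanged regions (hypothetical value).
    restored:  assumed level given back to a region that changed (hypothetical value).
    """
    return {
        region: (restored if region in changed else level * decay)
        for region, level in attention.items()
    }


if __name__ == "__main__":
    levels = {"cup": 1.0, "hand": 1.0}
    # Only the hand changes, so attention on the static cup fades step by step.
    for _ in range(3):
        levels = update_attention(levels, changed={"hand"})
    print(levels)
```

A multiplicative decay is only one possible choice; a linear decrease or a fixed timeout would model the same effect.
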
In this paper an extension applying evolutionary processes is proposed. The prior vision system utilized
multiple threads to analyze the regions of attention delivered from the early processing mechanism.
Here, in addition, multiple saliency units are used to produce these regions of attention. All of these
saliency units have different parameter sets. The idea is to let the population of saliency units create
regions of attention, then evaluate the results with cognitive feedback, and finally apply the genetic
mechanisms: mutation and cloning of the units that perform best at computing regions of attention, and
extinction of those that perform worst. A fitness function can be derived by evaluating whether
relevant objects are found in the regions created.
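
The evolutionary cycle outlined above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the system: the class `SaliencyUnit`, the Gaussian mutation, the parameter name `color_weight`, and the hit-rate fitness are all hypothetical choices.

```python
import random
from dataclasses import dataclass


@dataclass
class SaliencyUnit:
    # Hypothetical parameter set; the actual saliency parameters are not listed here.
    params: dict
    fitness: float = 0.0


def mutate(params, rng, sigma=0.1):
    """Return a copy of params with Gaussian noise added to every value (assumed operator)."""
    return {name: value + rng.gauss(0.0, sigma) for name, value in params.items()}


def fitness(regions, contains_relevant_object):
    """Fraction of proposed regions in which the feedback callback found a relevant object."""
    if not regions:
        return 0.0
    hits = sum(1 for region in regions if contains_relevant_object(region))
    return hits / len(regions)


def evolve(population, n_replace, rng):
    """One generation: remove the n_replace worst units and add mutated clones of the best."""
    if not 0 <= n_replace <= len(population) // 2:
        raise ValueError("n_replace must be between 0 and half the population size")
    ranked = sorted(population, key=lambda unit: unit.fitness, reverse=True)
    survivors = ranked[:len(ranked) - n_replace]
    clones = [SaliencyUnit(mutate(parent.params, rng)) for parent in ranked[:n_replace]]
    return survivors + clones


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [0.2, 0.9, 0.5, 0.1]
    population = [SaliencyUnit({"color_weight": 1.0}, fitness=s) for s in scores]
    for unit in evolve(population, n_replace=1, rng=rng):
        print(unit)
```

Keeping the population size constant, with one clone per removed unit, is a design choice of this sketch; it keeps the per-frame cost of the saliency stage predictable.
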
Various experiments show that the approach significantly speeds up visual processing,
especially robust realtime object recognition, compared to an approach without saliency-based
preprocessing. Furthermore, the evolutionary algorithm improves the overall performance of
the preprocessing system in terms of quality, as the system automatically and autonomously tunes
the saliency parameters. The computational overhead produced by the periodic clone/delete/mutate
operations can be handled well within the realtime constraints of the experimental computer vision
system. Nevertheless, limitations apply whenever the visual field contains no significant
saliency information for some time while the population keeps tuning its parameters. In this case
overfitting prevents generalization, and the evolutionary process may have to be reset by manual intervention.