Monday, September 14, 2009

GaZIR: Gaze-based Zooming Interface for Image Retrieval (Kozma L., Klami A., Kaski S., 2009)

From the Helsinki Institute for Information Technology, Finland, comes a research prototype called GaZIR for gaze based image retrieval built by Laszlo Kozma, Arto Klami and Samuel Kaski. The GaZIR prototype uses a light-weight logistic regression model as a mechanism for predicting relevance based on eye movement data (such as viewing time, revisit counts, fixation length etc.) All occurring on-line in real time. The system is build around the PicSOM (paper) retrieval engine which is based on tree structured self-organizing maps (TS-SOMs). When provided a set of reference images the PicSOM engine goes online to download a set of similar images (based on color, texture or shape)

Abstract
"We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to any content-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gaze patterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking."


Fig1. "Screenshot of the GaZIR interface. Relevance feedback gathered from outer rings influences the images retrieved for the inner rings, and the user can zoom in to reveal more rings."

Fig2. "Precision-recall and ROC curves for userindependent relevance prediction model. The predictions (solid line) are clearly above the baseline of random ranking (dash-dotted line), showing that relevance of images can be predicted from eye movements. The retrieval accuracy is also above the baseline provided by a naive model making a binary relevance judgement based on whether the image was viewed or not (dashed line), demonstrating the gain from more advanced gaze modeling."

Fig 3. "Retrieval performance in real user experiments. The bars indicate the proportion of relevant images shown during the search in six different search tasks for three different feedback methods. Explicit denotes the standard point-and-click feedback, predicted means implicit feedback inferred from gaze, and random is the baseline of providing random feedback. In all cases both actual feedback types outperform the baseline, but the relative performance of explicit and implicit feedback depends on the search task."
  • László Kozma, Arto Klami, and Samuel Kaski: GaZIR: Gaze-based Zooming Interface for Image Retrieval. To appear in Proceedings of 11th Conference on Multimodal Interfaces and The Sixth Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI), Boston, MA, USA, Novermber 2-6, 2009. (abstract, pdf)

No comments: