Eye-tracking holds great promise for improving the mixed reality experience. While eye-tracking devices can map gaze accurately onto 2D surfaces, estimating the depth of gaze points remains a challenging problem. Most gaze-based interaction applications rely on estimation techniques that map gaze data to corresponding targets on a 2D surface. This approach inevitably biases the outcome, since the objects nearest along the line of sight are more often taken to be the target of interest. A viable solution is to estimate gaze as a 3D coordinate (x, y, z) rather than the traditional 2D coordinate (x, y). This article first introduces a new, comprehensive 3D gaze dataset collected in a realistic scene setting, using a head-mounted eye-tracker and a depth estimation camera. We then present a novel depth estimation model, trained on the new gaze dataset, that accurately predicts gaze depth from calibrated gaze vectors. This method can support a mapping between gaze and objects in 3D space. The presented model improves the reliability of depth measurement of visual attention in real scenes, as well as the accuracy of gaze depth estimation in virtual reality environments. Improving situational awareness with 3D gaze data will benefit several domains, in particular human-vehicle interaction, autonomous driving, and augmented reality.
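To make the abstract's approach concrete, the sketch below shows one plausible way to regress gaze depth from calibrated gaze vectors: a small multilayer perceptron trained on (gaze vector, depth) pairs. The input layout (concatenated left/right unit gaze vectors), network size, loss, and placeholder data are assumptions for illustration only and are not the authors' implementation.

```python
# Minimal sketch, not the paper's model: an MLP that regresses gaze depth z
# from calibrated gaze vectors. All layer sizes and the 6-D input layout
# (left + right unit gaze directions) are illustrative assumptions.
import torch
import torch.nn as nn


class GazeDepthMLP(nn.Module):
    def __init__(self, in_dim: int = 6, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted gaze depth (e.g., metres)
        )

    def forward(self, gaze_vectors: torch.Tensor) -> torch.Tensor:
        return self.net(gaze_vectors).squeeze(-1)


# Toy training loop on random stand-in data; in practice the inputs would be
# calibrated gaze vectors from a head-mounted eye-tracker and the targets
# ground-truth depths from a depth camera.
model = GazeDepthMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

gaze = torch.randn(256, 6)        # placeholder calibrated gaze vectors
depth = torch.rand(256) * 5.0     # placeholder ground-truth depths in metres

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(gaze), depth)
    loss.backward()
    optimizer.step()
```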