In this work, we aim to address the needs of human analysts to automatically summarize the content of large swaths of overhead imagery. We present our approach to this problem using deep neural networks, providing detection and segmentation information to enable fine-grained description of scene content for human consumption. Four different perception systems were run on blocks of large-scale satellite imagery: (1) semantic segmentation of roads, buildings, and vegetation; (2) zone segmentation to identify commercial, industrial, residential, and airport zones; (3) classification of objects such as helipads, silos, and water towers; and (4) object detection to find vehicles. Results are filtered based on a user's zoom level in the swath and subsequently summarized as textual bullets and statistics. Our framework tiles the image swaths into blocks (chips) at a resolution of approximately 30 cm for each perception system. For semantic segmentation, overlapping imagery is processed to avoid edge artifacts and to improve segmentation results by voting for the category label of each pixel visible from multiple chips. Our approach to zone segmentation is based on classification models that vote for a chip belonging to a particular zone type; regions surrounded by chips classified as a particular category are assigned a higher score. We also provide an overview of our experience using OpenStreetMap (OSM) for pixel-wise annotation (for semantic segmentation), image-level labels (for classification), and end-to-end captioning methods (image to text). These capabilities are envisioned to aid the human analyst through an interactive user interface, whereby scene content is automatically summarized and updated as the user pans and zooms within the imagery.
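The per-pixel voting over overlapping chips described above can be sketched in a few lines of NumPy; this is a minimal illustration only, and the function name, score layout, and chip-offset bookkeeping are assumptions for clarity rather than details taken from the paper.

```python
import numpy as np

def vote_segmentation(chip_logits, chip_origins, swath_shape, num_classes):
    """Fuse per-chip class scores into a swath-level label map by per-pixel voting.

    chip_logits : list of (H, W, num_classes) arrays of class scores, one per chip
    chip_origins: list of (row, col) offsets of each chip within the swath
    swath_shape : (rows, cols) of the full swath
    """
    votes = np.zeros((*swath_shape, num_classes), dtype=np.float32)
    for logits, (r0, c0) in zip(chip_logits, chip_origins):
        h, w, _ = logits.shape
        # Each overlapping chip casts one vote for its predicted class at every pixel it covers.
        pred = logits.argmax(axis=-1)
        votes[r0:r0 + h, c0:c0 + w] += np.eye(num_classes, dtype=np.float32)[pred]
    # The final label at each pixel is the class with the most votes across overlapping chips.
    return votes.argmax(axis=-1)
```

Because chips overlap, pixels near chip boundaries receive votes from several predictions, which is what suppresses the edge artifacts mentioned above.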
KEYWORDS: Video, Visualization, Image segmentation, Semantic video, Video surveillance, Data storage, RGB color model, Visual process modeling, Convolution, Video processing
In this work, we aim to address the needs of human analysts to consume and exploit data given the proliferation of overhead imaging sensors. We have investigated automatic captioning methods capable of describing and summarizing scenes and activities by providing natural-language textual descriptions for overhead full motion video (FMV). We have integrated methods to provide three types of outputs: (1) summaries of short video clips; (2) semantic maps, where each pixel is labeled with a semantic category; and (3) dense object descriptions that capture object attributes and activities. We show results on the publicly available VIRAT and Aeroscapes datasets.
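As a minimal sketch of how the three output types listed above might be grouped per clip, the container below uses hypothetical class and field names that are assumptions for illustration, not the paper's data model.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class ClipDescription:
    """Hypothetical container for the three output types produced for one short FMV clip."""
    summary: str                       # natural-language summary of the clip
    semantic_map: np.ndarray           # (H, W) array of per-pixel semantic category labels
    object_descriptions: List[str] = field(default_factory=list)  # per-object attributes and activities

def to_text(desc: ClipDescription) -> str:
    """Render a clip description as a short textual report for an analyst."""
    lines = [desc.summary]
    lines += [f"- {obj}" for obj in desc.object_descriptions]
    return "\n".join(lines)
```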
Automated semantic labeling of complex urban scenes in remotely sensed 2D and 3D data is one of the most challenging steps in producing realistic 3D scene models and maps. Recent large-scale public benchmark data sets and challenges for semantic labeling with 2D imagery have been instrumental in identifying state-of-the-art methods and enabling new research. 3D data from lidar and multi-view stereo have also been shown to provide valuable additional information that enables improved semantic labeling accuracy. In this work, we describe the development of a new large-scale data set combining public lidar and multi-view satellite imagery with pixel-level truth for ground labels and instance-level truth for building labels. We demonstrate the use of this data set to evaluate methods for ground and building labeling tasks, establish performance expectations, and identify areas for improvement. We also discuss initial steps toward further leveraging this data set to enable machine learning for more complex semantic and instance segmentation and 3D reconstruction tasks. All software developed to produce this public data set and to enable metric scoring is also released as open source code.
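For the pixel-level ground-label task, scoring is typically based on per-class intersection-over-union; the sketch below illustrates that kind of metric in NumPy. The function name, ignore label, and array format are assumptions, and this is not the released scoring code itself.

```python
import numpy as np

def per_class_iou(pred, truth, num_classes, ignore_label=255):
    """Pixel-level intersection-over-union per class for one labeled tile.

    pred, truth: (H, W) integer label maps; pixels equal to ignore_label are excluded.
    Returns an array of length num_classes, with NaN for classes absent from both maps.
    """
    valid = truth != ignore_label
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        truth_c = (truth == c) & valid
        inter = np.logical_and(pred_c, truth_c).sum()
        union = np.logical_or(pred_c, truth_c).sum()
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)
```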