KEYWORDS: Image segmentation, 3D image processing, Image analysis, 3D displays, Communication engineering, Image classification, Edge detection, Imaging systems, Digital image processing, Radio over Fiber
With the increasing demand for 3D content, the conversion of existing two-dimensional (2D) content to three-dimensional (3D) content has gained wide interest in 3D image processing. Estimating the relative depth map of a single-view image is a key step in 2D-to-3D conversion. In this paper, we propose an automatic conversion method that estimates the depth information of a single-view image based on the degree of focus of segmented regions and then generates a stereoscopic image. First, we conduct image segmentation to partition the image into homogeneous regions. Then, we construct a higher-order statistics (HOS) map, which represents the spatial distribution of the high-frequency components of the input image. The HOS is known to be well suited to detection and classification problems because it can suppress Gaussian noise while preserving some of the non-Gaussian information. From these two cues we estimate a relative depth map, which is then refined by post-processing. Finally, a stereoscopic image is generated by calculating the parallax value of each region from the generated depth map and the input image.
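The two cues above can be sketched as follows: a minimal illustration assuming a fourth-order central moment as the HOS focus measure and a precomputed segmentation label map. The function names and the window size are our own choices, not the paper's.

```python
import numpy as np

def hos_map(image, window=5):
    """Fourth-order central moment of the image in a local window,
    a simple higher-order-statistics (HOS) focus measure."""
    img = image.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            out[y, x] = np.mean((patch - patch.mean()) ** 4)
    return out

def region_depth(hos, labels):
    """Assign each segmented region the mean HOS of its pixels:
    a sharper (more in-focus) region is assumed closer to the camera."""
    return {int(lab): float(hos[labels == lab].mean())
            for lab in np.unique(labels)}
```

A highly textured (in-focus) region then receives a larger depth score than a smooth (out-of-focus) one, which is the ordering the relative depth map needs.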
As digital broadcasting technologies have progressed rapidly, users' expectations for realistic and interactive broadcasting services have also increased. As one such service, 3D multi-view broadcasting has recently received much attention. In general, all the view sequences acquired at the server are transmitted to the client, and the user then selects some or all of the views according to the display capabilities. However, this kind of system requires high processing power at both the server and the client, which poses a difficulty in practical applications. To overcome this problem, a relatively simple approach is to transmit only the two view sequences requested by the client in order to deliver a stereoscopic video. In such a system, effective communication between the server and the client is one of the key aspects.
In this paper, we propose an efficient multi-view system that transmits two view sequences and their depth maps according to the user's request. The view selection process is integrated into MPEG-21 DIA (Digital Item Adaptation) so that our system is compatible with the MPEG-21 multimedia framework. DIA is generally composed of resource adaptation and descriptor adaptation. One merit of our approach is that the SVA (stereoscopic video adaptation) descriptors defined in the DIA standard are used to deliver user preferences and device capabilities. Furthermore, multi-view descriptions related to the multi-view camera and system are newly introduced. The syntax of the descriptions and their elements is represented in an XML (eXtensible Markup Language) schema. When the client requests an adapted descriptor (e.g., view numbers) from the server, the server sends the associated view sequences. Finally, we present a method that can reduce the visual discomfort that may occur while viewing stereoscopic video. This phenomenon happens when the view changes, as well as when a stereoscopic image exhibits excessive disparity caused by a large baseline between two cameras. To address the former, IVR (intermediate view reconstruction) is employed for a smooth transition between two stereoscopic view sequences; for the latter, a disparity adjustment scheme is used.
Finally, through the implementation of a testbed and the accompanying experiments, we demonstrate the value and feasibility of our system.
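The descriptor-driven view selection can be illustrated with a toy adaptation routine. The field names and the clamping policy are assumptions for illustration only, not the DIA descriptor syntax itself.

```python
def adapt_views(request, available_views):
    """Toy resource adaptation in the spirit of MPEG-21 DIA: the client
    sends a descriptor naming its two preferred view numbers; the server
    clamps the request to the views it actually has and returns the
    stereo pair to stream.  Field names are hypothetical."""
    left = min(max(request["left_view"], 0), available_views - 1)
    right = min(max(request["right_view"], 0), available_views - 1)
    if left == right:  # a stereo pair needs two distinct views
        right = min(left + 1, available_views - 1)
        left = max(right - 1, 0)
    return left, right
```

For example, a request for views (2, 3) from an 8-view server is served as-is, while an out-of-range request is clamped to the nearest valid pair.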
KEYWORDS: Computer programming, Video, Video processing, Cameras, Visualization, Video compression, Video coding, Image visualization, Image quality, Internet
The progress of data transmission technology over the Internet has spread a variety of realistic content. One such content type is multi-view video, which is acquired from multiple camera sensors. In general, multi-view video processing requires as many encoders and decoders as there are cameras, and this processing complexity makes practical implementation difficult.
To solve this problem, this paper considers a simple multi-view system utilizing a single encoder and a single decoder. On the encoder side, the input multi-view YUV sequences are combined in GOP (group of pictures) units by a video mixer. The mixed sequence is then compressed by an H.264/AVC encoder. The decoder side is composed of a single decoder and a scheduler controlling the decoding process. The goal of the scheduler is to assign an approximately equal number of decoded frames to each view sequence by estimating the decoder utilization of a GOP and subsequently applying frame skip algorithms. Furthermore, for the frame skip, efficient frame selection algorithms are studied for the H.264/AVC baseline and main profiles, based upon a cost function related to perceived video quality.
Our proposed method has been evaluated on various multi-view test sequences adopted by MPEG 3DAV. Experimental results show that approximately equal decoder utilization is achieved for each view sequence, so that every view sequence is displayed fairly. Finally, the performance of the proposed method is compared with a simulcast encoder in terms of bit rate and PSNR using a rate-distortion curve.
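The scheduler's goal of equal per-view utilization can be sketched roughly as follows. The equal-split policy and the per-frame cost values are illustrative assumptions, not the paper's exact cost function.

```python
def schedule_gop(views, budget):
    """Distribute a per-GOP decoding budget evenly across view
    sequences; within each view, the frames whose skipping would hurt
    perceived quality least (lowest cost) are dropped first.

    views  : {view_id: [(frame_type, skip_cost), ...]}  # hypothetical format
    budget : total number of frames the decoder can handle per GOP
    """
    per_view = budget // len(views)  # equal utilization per view
    plan = {}
    for vid, frames in views.items():
        # keep the frames with the highest cost (e.g. I/P before B)
        order = sorted(range(len(frames)),
                       key=lambda i: frames[i][1], reverse=True)
        plan[vid] = sorted(order[:per_view])
    return plan
```

With two 4-frame views and a budget of 6, each view keeps 3 frames and its cheapest frame (typically a non-reference B frame) is skipped.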
Stereoscopic conversion of two-dimensional (2-D) video is considered based upon image motion. In general, a stereoscopic camera with two imaging sensors is required to capture stereoscopic video, whereas stereoscopic conversion directly converts 2-D video to 3-D stereoscopic video. The image motions needed to generate a stereoscopic image can be computed by any motion estimation algorithm. However, it is well known that estimated motion vectors generally deviate from the true motion, which poses a difficulty in utilizing such data. To overcome this problem, each image frame is classified as either a primary frame (PF) or a secondary frame (SF) depending upon the reliability of its motion vectors. Furthermore, a stereoscopic generation method for the PF is presented.
For performance evaluation, we apply the proposed method to five test sequences and measure its accuracy. Experimental results show that our method achieves an accuracy of more than about 90 percent in terms of the detection ratio of primary frames. Furthermore, a variety of MPEG-encoded test sequences are applied directly, and the results confirm that the proposed method is well designed.
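A minimal sketch of the PF/SF decision, assuming reliability is measured as agreement with the dominant (median) motion of the frame; the thresholds are illustrative values, not the paper's.

```python
import numpy as np

def classify_frame(motion_vectors, tol=2.0, ratio=0.7):
    """Label a frame as a primary frame (PF) when most block motion
    vectors agree with the dominant (median) motion, i.e. the motion
    field is reliable enough for stereoscopic synthesis; otherwise the
    frame is a secondary frame (SF).  `tol` and `ratio` are
    illustrative thresholds."""
    mv = np.asarray(motion_vectors, dtype=float)      # shape (N, 2)
    dominant = np.median(mv, axis=0)
    dist = np.linalg.norm(mv - dominant, axis=1)
    reliable = np.mean(dist < tol)                    # fraction of agreeing blocks
    return "PF" if reliable >= ratio else "SF"
```

A frame with coherent block motion is classified PF, while a frame whose vectors scatter widely falls back to SF.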
KEYWORDS: Video, Cameras, Video compression, 3D video compression, Prototyping, 3D image processing, 3D displays, Video processing, Stereoscopic displays, 3D video streaming
In this paper, we propose a design for a multi-view stereoscopic HD video transmission system based on MPEG-21 Digital Item Adaptation (DIA). It focuses on compatibility and scalability to meet various user preferences and terminal capabilities. A large variety of multi-view 3D HD video types exist, depending on the methods of acquisition, display, and processing. By following the MPEG-21 DIA framework, the multi-view stereoscopic HD video is adapted according to user feedback: a user is served multi-view stereoscopic video that corresponds to his or her preferences and terminal capabilities. With our preliminary prototype, we verify that the proposed design can support two different types of display device (stereoscopic and auto-stereoscopic) and switching between two available viewpoints.
KEYWORDS: Image compression, Computer programming, Cell phones, Personal digital assistants, Video compression, Chromium, Mobile devices, LCDs, RGB color model, Image processing
Recently, in mobile environments, a variety of character images have been provided to mobile phones and PDAs. Most character images are indexed images with 256 index values, requiring lossless compression due to the nature of the indexed data. Encoded image data is delivered to user clients from the content provider's server. Mobile devices are usually equipped with low processing power compared with desktop devices, so both a simple decoding process and a satisfactory compression ratio at the encoder are required. In particular, the complexity of the decoder needs to be low for fast display of images on the LCD screen.
Many lossless compression schemes have been proposed and adopted in practice. We propose an efficient compression scheme satisfying these requirements. The main difference between conventional approaches and our proposed method is that ours targets character image sequences, where each sequence is composed of twelve images. In experiments performed on many test sequences, we show that our proposed method achieves a compression ratio of 3.32:1.
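The idea of exploiting redundancy across the twelve images of a sequence can be sketched with a simple differential run-length coder. This is an assumption-laden toy, not the proposed codec: any regions that stay identical from one character image to the next collapse into long zero runs.

```python
def rle_encode(data):
    """Run-length encode a flat list of palette indices."""
    out, i = [], 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.append((data[i], run))
        i += run
    return out

def rle_decode(pairs):
    out = []
    for value, run in pairs:
        out.extend([value] * run)
    return out

def encode_sequence(frames):
    """Code the first frame directly and every later frame as the
    modular difference from its predecessor, then run-length code both
    streams; decoding reverses the steps, so the scheme is lossless."""
    coded = [rle_encode(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        diff = [(c - p) % 256 for p, c in zip(prev, cur)]
        coded.append(rle_encode(diff))
    return coded
```

The decoder only needs additions and list expansion, which matches the requirement of a very light decoding process on mobile devices.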
This paper presents a new approach to combining real video and synthetic objects. The purpose of this work is to apply the proposed technology in fields such as advanced animation, virtual reality, and games, where computer graphics has long been used. Recently, some applications have added real video to graphic scenes to augment the realism that computer graphics lacks. This approach, called augmented or mixed reality, can produce a more realistic environment than the exclusive use of computer graphics. Our approach differs from virtual reality and augmented reality in that computer-generated graphic objects are combined with a 3D structure extracted from monocular image sequences. The realization of our proposed approach is carried out in the following steps: (1) We derive the 3D structure from test image sequences; this requires the estimation of 3D depth followed by the construction of a height map, and, given the contents of the test sequences, the height map represents the 3D structure. (2) The height map is modeled by Delaunay triangulation or a Bezier surface, and each planar surface is texture-mapped. (3) Finally, graphic objects are combined with the height map. Because the 3D structure of the height map is already known, step (3) is easily carried out. Following this procedure, we produced an animation video demonstrating the combination of the 3D structure and graphic models. Users can navigate the realistic 3D world rendered on the display monitor.
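Step (2) can be illustrated with a minimal regular-grid triangulation of a height map, used here as a simple stand-in for Delaunay triangulation; the mesh layout is our own choice.

```python
def grid_mesh(height_map):
    """Triangulate a regular-grid height map into two triangles per
    cell.  Each vertex is (x, y, z) with z taken from the map; the
    triangle tuples index into the flat vertex list."""
    h, w = len(height_map), len(height_map[0])
    verts = [(x, y, height_map[y][x]) for y in range(h) for x in range(w)]
    tris = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x
            tris.append((i, i + 1, i + w))          # upper-left triangle
            tris.append((i + 1, i + w + 1, i + w))  # lower-right triangle
    return verts, tris
```

Once such a mesh exists, step (3) amounts to placing a graphic object at (x, y, height_map[y][x]), since the surface height at every point is known.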
KEYWORDS: Video, Motion estimation, Cameras, 3D image processing, Image processing, Linear filtering, 3D vision, Video processing, Visualization, 3D displays
We present a new method for converting monoscopic video to stereoscopic video. The key characteristic of our proposed method is that it can process the non-horizontal camera/object motion present in most image scenes. It is well known that non-horizontal motion causes vertical parallax, and thus visual discomfort, to human eyes. The proposed methodology is composed of four major steps. First, given a current video frame, we estimate a motion vector for each block by a conventional block matching motion estimation algorithm; the motion vector is composed of horizontal and vertical disparities. Second, the norm of the motion vector is computed for each block, which eliminates the vertical disparity. Because estimated motion vectors can be unreliable, a low-pass filter is applied to the norms to enhance their reliability. Third, each block is shifted in the horizontal direction by the norm of its motion vector, which is transformed into binocular parallax; shifting only horizontally eliminates the effects of the vertical disparity. Finally, all the shifted blocks are synthesized into a new image, producing a stereoscopic image pair composed of the original image and its synthesized counterpart. With proper 3D viewing devices, users can perceive 3D depth from the stereoscopic image. Preliminary experiments have demonstrated that stable stereoscopic image pairs can be produced by applying our proposed method to a variety of monoscopic videos with non-horizontal camera panning and/or object motion.
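The four steps can be sketched as follows, assuming block motion vectors are already available from a block matching stage. The 3x3 box filter and the motion-to-parallax gain are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def synthesize_right_view(frame, mv, block=8, gain=0.5):
    """Shift each block horizontally by the (smoothed) norm of its
    motion vector; using the norm discards the vertical component, so
    no vertical disparity reaches the output.  `gain` maps motion
    magnitude to parallax and is a hypothetical value."""
    norms = np.linalg.norm(mv, axis=2)            # per-block |v|, shape (By, Bx)
    # 3x3 box filter to suppress unreliable vectors
    padded = np.pad(norms, 1, mode="edge")
    smooth = np.zeros_like(norms)
    for dy in range(3):
        for dx in range(3):
            smooth += padded[dy:dy + norms.shape[0], dx:dx + norms.shape[1]]
    smooth /= 9.0
    out = np.zeros_like(frame)
    h, w = frame.shape
    for by in range(norms.shape[0]):
        for bx in range(norms.shape[1]):
            shift = int(round(gain * smooth[by, bx]))
            y0, x0 = by * block, bx * block
            for x in range(x0, min(x0 + block, w)):
                xs = min(max(x + shift, 0), w - 1)  # clamp to frame width
                out[y0:y0 + block, xs] = frame[y0:y0 + block, x]
    return out
```

A frame with no motion is reproduced unchanged (zero parallax), while moving blocks are displaced horizontally in proportion to their motion magnitude.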
We present a recursive estimation technique for recovering the focus of expansion (FOE) from unreliable motion or optical flow. FOE estimation is important for the analysis of camera motion, especially when the camera motion is purely translational. Our work is based on the observation that there is a strong dependence between FOE estimation and motion flow: as the FOE depends on the motion flow, a good motion flow can in turn be obtained from an accurate FOE. We assume that the camera motion is purely translational and that there is no object motion in the scene. The technique used to eliminate unreliable motion flow is the orthogonal regression method, which we combine with the FOE estimation. Experiments using both simulated and real scenes show that our proposed method works robustly as the percentage of outliers varies.
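A minimal sketch of the recursive idea: under pure translation, every flow line passes through the FOE, so the FOE can be found as a least-squares intersection of the flow lines while the flows farthest from the current estimate (by orthogonal distance) are discarded each round. The iteration count and keep ratio below are assumptions.

```python
import numpy as np

def estimate_foe(points, flows, iters=3, keep=0.8):
    """Recursive FOE estimate for a purely translating camera: solve a
    least-squares intersection of the flow lines (orthogonal-distance
    criterion) and, at each iteration, drop the flows farthest from
    the current estimate before re-solving."""
    pts = np.asarray(points, float)
    dirs = np.asarray(flows, float)
    for _ in range(iters):
        n = np.stack([-dirs[:, 1], dirs[:, 0]], axis=1)   # line normals
        n /= np.linalg.norm(n, axis=1, keepdims=True)
        # normal equations of  min_q  sum_i (n_i . (q - p_i))^2
        A = np.einsum("ni,nj->ij", n, n)
        b = np.einsum("ni,nj,nj->i", n, n, pts)
        foe = np.linalg.solve(A, b)
        # orthogonal distance of each flow line to the estimate
        d = np.abs(np.einsum("ni,ni->n", n, foe - pts))
        order = np.argsort(d)[:max(2, int(keep * len(pts)))]
        pts, dirs = pts[order], dirs[order]
    return foe
```

Once the outlying flows are pruned, the remaining inlier lines intersect (nearly) exactly at the FOE, which is the mutual-refinement loop the abstract describes.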
Humans subjectively evaluate the content of a scene. For content-based indexing and retrieval, we index and retrieve scenes containing moving objects, because they remain in our memory longer than static scenes. The importance of processing moving objects has been demonstrated in image compression, content-based data processing, and a variety of video processing techniques. This paper proposes a method for segmenting and then compressing images that include moving objects. An image scene is usually composed of a motion region (MR) and a static region (SR). For simplicity, the camera motion region is assigned to the SR, because it has similar characteristics. The MR is extracted by our segmentation technique: a line scan-based segmentation method composed of motion estimation and label assignment. The dominant region, containing the largest number of blocks with the same label, is classified as the SR; the remaining region is the MR. The MR is then processed by lossless compression, or by lossy compression with a low compression ratio, to preserve high quality, while the SR is processed by a lossy method with a high compression ratio. Rather than applying separate methods to the MR and SR, we use a hybrid DCT-based compression method. Experiments on test video clips show an increased compression ratio with respect to lossless compression and better visualization of the moving objects compared with lossy compression.
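The hybrid region-dependent compression can be sketched with a single DCT pipeline whose quantizer step depends on the MR/SR mask: a fine step preserves the moving objects, a coarse step shrinks the static background. The quantizer values are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)
    return C

def hybrid_compress(frame, motion_mask, q_mr=2.0, q_sr=32.0, block=8):
    """One DCT pipeline, two quantizer steps: fine (`q_mr`) inside the
    motion region, coarse (`q_sr`) in the static region.  Returns the
    reconstructed frame after quantize/dequantize."""
    C = dct_matrix(block)
    h, w = frame.shape
    recon = np.empty((h, w), dtype=float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            q = q_mr if motion_mask[y // block, x // block] else q_sr
            blk = frame[y:y + block, x:x + block].astype(float)
            coef = C @ blk @ C.T
            coef = np.round(coef / q) * q      # quantize / dequantize
            recon[y:y + block, x:x + block] = C.T @ coef @ C
    return recon
```

Because the DCT basis is orthonormal, the reconstruction error per block is governed directly by the quantizer step, so MR blocks come back far closer to the original than SR blocks.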
Teleradiology is the practice of radiology at a distance: medical images are acquired at one location and transmitted to one or more distant sites, where they are displayed for diagnosis. Timely availability of medical images over a variety of communication networks is one of the primary goals of teleradiology. In this paper, we propose a medical image compression method that can be effectively utilized in teleradiology systems using low-bit-rate communication networks. For this purpose, we make use of regions of interest (ROIs) that may be clinically important in medical images. Our study shows that the proposed compression method can reduce the transmission time significantly when the ROI occupies a small fraction of the image. For example, if twenty percent of an image belongs to the ROI (an ROI ratio of 0.2), the compression ratio is increased by a factor of about three compared with lossless compression, and the transmission time is accordingly reduced by the same factor. In addition, by preserving the clinically important regions, the risk of misdiagnosis is much lower than with lossy compression.
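The arithmetic behind the ROI-ratio example can be made explicit. The per-region compression ratios below are illustrative assumptions, not measured values: if a fraction p of the image is coded losslessly at ratio r_l and the rest lossily at ratio r_y, the compressed size per original byte is p/r_l + (1-p)/r_y.

```python
def overall_ratio(roi_ratio, cr_lossless=2.0, cr_lossy=20.0):
    """Overall compression ratio when the ROI fraction is coded
    losslessly and the non-ROI lossily.  The per-region ratios are
    hypothetical, chosen only to illustrate the scaling."""
    compressed = roi_ratio / cr_lossless + (1 - roi_ratio) / cr_lossy
    return 1.0 / compressed
```

With these assumed ratios, an ROI ratio of 0.2 gives 1 / (0.2/2 + 0.8/20) ≈ 7.1, i.e. several times the lossless-only ratio of 2, and the ratio falls monotonically as the ROI grows, consistent with the behavior described above.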
In most medical images, regions of interest (ROIs) that may include clinically important information exist and occupy a small portion of the image. Based on this observation, we present compression methods that can effectively compress medical images with ROIs. They are implemented such that ROIs are reversibly compressed while the non-ROI (the region outside the ROIs) is irreversibly compressed. In this paper, we present and analyze three different compression schemes: a DCT-based compression, a DCT/HINT compression, and a HINT-based compression. Our study shows that the compression ratio decreases exponentially as the ROI ratio (the portion of ROIs in the image) increases, while the RMSE (root-mean-squared error) depends little on the ROI ratio. To verify this, we tested seven heart X-ray images, twelve head MR images, ten abdomen CT images, and ten chest CT images. Our experimental results show that the DCT-based compression is the best of the three proposed methods in terms of compression ratio, algorithm complexity, and reconstructed image quality.
Since most medical images are composed of regions with different characteristics, it has been demonstrated that applying a combined method to the various image components can preserve higher image quality. The major image components are: (a) smooth areas, (b) sharp edges, (c) texture, and (d) noise. In practice, sharp edges and general textures are the two main components of concern in radiological image compression. A unified perspective of transform coding is reviewed to find out how a high compression ratio can be achieved with a lossy compression technique. Theoretically, the resolving power of the image quality is associated with a composite modulation transfer function (MTF) when an image obtained from an x-ray device is coupled with a digitization module and processed by lossy compression; it is very difficult to use a single global MTF (or a band of MTFs) to represent such a system. In this paper, we concentrate on clinical considerations for various applications of radiological image compression. Three different applications and their associated compression strategies are discussed. Based on these strategies, we believe that many compression methods are suitable for clinical implementation, given appropriate clinical guidance and technical modification.