Hundreds of experiments over the last decade on the retrieval of OCR documents performed by the Information Science Research Institute have shown that OCR errors do not significantly affect retrievability. We extend those results to show that in the case of proximity searching, the removal of running headers and footers from OCR text will not improve retrievability for such searches.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Extracting the logical structure of book documents is important for the automatic construction of electronic document databases. The table of contents of a book plays an important role in representing the overall logical structure and reference information of the book. In this paper, a new method is proposed to extract the hierarchical logical structure of book documents, along with the reference information, by combining spatial and semantic information from the table of contents. Experimental results obtained from testing on various book documents demonstrate the effectiveness and robustness of the proposed approach.
The Medical Article Records System (MARS) has been developed at the U.S. National Library of Medicine (NLM) for automated data entry of bibliographical information from medical journals into MEDLINE, the premier bibliographic citation database at NLM. Currently, a rule-based algorithm (called ZoneCzar) is used for labeling important bibliographical fields (title, author, affiliation, and abstract) on medical journal article page images. While rules have been created for medical journals with regular layout types, new rules have to be manually created for any input journals with arbitrary or new layout types. Therefore, it is of interest to label any journal articles independent of their layout styles. In this paper, we first describe a system (called ZoneMatch) for automated generation of crucial geometric and non-geometric features of important bibliographical fields based on string-matching and clustering techniques. The rule-based algorithm is then modified to use these features to perform style-independent labeling. We then describe a performance evaluation method for quantitatively evaluating our algorithm and characterizing its error distributions. Experimental results show that the labeling performance of the rule-based algorithm is significantly improved when the generated features are used.
In this article we describe the approach taken by the first web search engines, discuss the state of the art, and present some of the challenges for the future.
Retrieving documents by subject matter is the general goal of information retrieval and other content access systems. There are aspects of textual content, however, which form equally valid selection criteria. One such aspect is that of sentiment or polarity, indicating the author's opinion of or emotional relationship with some topic. Recent work in this area has treated polarity effectively as a discrete aspect of text. In this paper we present a lightweight but robust approach to combining topic and polarity, thus enabling content access systems to select content based on a certain opinion about a certain topic.
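As a toy illustration of the kind of combination described, the sketch below scores a text separately for topic relevance and polarity; the word lists and scoring rule are invented for the example and are not the authors' method.

```python
# Hypothetical polarity lexicon; a real system would use a learned or curated one.
POS = {"excellent", "good", "great"}
NEG = {"poor", "bad", "terrible"}

def topic_polarity(text, topic_terms):
    # Topic relevance: fraction of words that are topic terms.
    # Polarity: positive-word count minus negative-word count.
    words = text.lower().split()
    topic = sum(w in topic_terms for w in words) / max(len(words), 1)
    polarity = sum(w in POS for w in words) - sum(w in NEG for w in words)
    return topic, polarity
```

A content access system could then select texts whose topic score and polarity both exceed chosen cutoffs.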
This paper presents a decision tree based adaptive binarization method for text retrieval in color document images. This method extends Niblack's windowed thresholding technique and employs hue (H), saturation (S), and value (V). First, an observation window is retrieved, and based on the standard deviation of H, S, and V, a pre-defined decision tree is used to select the proper variables to employ. Secondly, the Karhunen-Loeve Transform (KLT) is used to eliminate correlation and reduce dimension. Finally, the center point of the window is classified based on a 2-D standard normal distribution. The results show that our binarization method generates better results than Niblack's and global thresholding binarization methods such as Otsu's on color document images. A comparison using a commercial OCR system shows that our method can be used in various situations for high quality text retrieval.
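The Niblack technique the method extends computes a per-pixel threshold from the mean and standard deviation of a surrounding window, T = m + k·s. A minimal grayscale sketch, with window size and k chosen arbitrarily for illustration:

```python
def niblack_threshold(image, window=3, k=-0.2):
    # image: 2-D list of grayscale values in [0, 255].
    # Returns a binary map: 1 where the pixel exceeds its local threshold.
    h, w = len(image), len(image[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[cy][cx]
                    for cy in range(max(0, y - half), min(h, y + half + 1))
                    for cx in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            t = mean + k * var ** 0.5  # Niblack: T = mean + k * stddev
            out[y][x] = 1 if image[y][x] > t else 0
    return out
```

The paper's contribution replaces this single-channel rule with a decision tree over the H, S, and V channels.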
Existing word image retrieval algorithms suffer from either low retrieval precision or high computation complexity. We present an effective and efficient approach for word image matching by using gradient-based binary features. Experiments over a large database of handwritten word images show that the proposed approach consistently outperforms the existing best handwritten word image retrieval algorithm, Dynamic Time Warping (DTW) with profile-based shape features. Not only does the proposed approach have much higher retrieval accuracy, but it is also 893 times faster than DTW.
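DTW, the baseline being compared against, aligns two variable-length feature sequences by dynamic programming. A minimal one-dimensional sketch (the actual baseline uses profile-based shape features rather than scalars):

```python
def dtw_distance(a, b):
    # Classic dynamic-time-warping distance between two feature sequences.
    n, m = len(a), len(b)
    INF = float('inf')
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three admissible moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

The O(nm) table per comparison is what makes DTW-based retrieval slow, and is what fixed-length binary features avoid.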
In order to overcome poor readability of text and recognizability of image features in low resolution thumbnails, a novel image representation of compound document images - a SmartNail representation - is presented. SmartNails are replacements or supplements to traditional thumbnails for compound documents and contain cropped and scaled image and text segments. Image- and text-based analysis are merged to generate a layout for a particular display size with selected readable text and recognizable image regions. The analysis is efficiently performed by using information from document layout analysis and JPEG 2000 compressed file headers.
This paper presents a novel method of automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system to automatically construct navigation links into re-mastered books. We then introduce the core algorithm based on text matching for building the links. The proposed method utilizes the tree-structured dictionary and directional graph of the table of contents to efficiently conduct the text matching. Information fusion further increases the robustness of the algorithm. The experimental results on the MIT Press digital library project are discussed and the key functional features of the system are illustrated. We have also investigated how the quality of the OCR engine affects the linking algorithm. In addition, the analogy between this work and Web link mining has been pointed out.
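The core text-matching idea can be sketched as a similarity ratio between a table-of-contents entry and OCR'd page headings; here `difflib` stands in for the paper's tree-structured dictionary and directional graph, and the 0.8 cutoff is an arbitrary choice for the example.

```python
import difflib

def link_toc_entries(toc_entries, page_headings, min_ratio=0.8):
    # For each table-of-contents entry, find the best-matching heading
    # among (page, heading) pairs; a link is made only above min_ratio,
    # which tolerates OCR errors in either string.
    links = {}
    for entry in toc_entries:
        best, best_r = None, 0.0
        for page, heading in page_headings:
            r = difflib.SequenceMatcher(None, entry.lower(), heading.lower()).ratio()
            if r > best_r:
                best, best_r = page, r
        if best_r >= min_ratio:
            links[entry] = best
    return links
```

The threshold is what lets linking survive OCR noise such as "I" misread as "l".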
Slide identification is very important when creating e-Learning materials, as it detects slides being changed during lecture movies. Simply detecting the change would not be enough for e-Learning purposes, because knowing which slide is displayed in a given frame is also important for creating e-Learning materials. A matching technique combined with a presentation file containing answer information is very useful in identifying slides in a movie frame. We propose two methods for slide identification in this paper. The first is character-based, which uses the relationship between the character code and its coordinates. The other is image-based, which uses normalized correlation and dynamic programming. We used actual movies to evaluate the performance of these methods, both independently and in combination, and the experimental results revealed that they are very effective in identifying slides in lecture movies.
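The normalized correlation used by the image-based method can be sketched on flattened intensity vectors; real frames are 2-D images, so the short vectors below are placeholders.

```python
def normalized_correlation(a, b):
    # Normalized correlation between two equal-length intensity vectors:
    # +1 for identical patterns, -1 for inverted ones, 0 when uncorrelated.
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0
```

Mean subtraction and normalization make the score robust to the brightness and contrast shifts between a projected slide and its source image.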
In the context of multimedia meeting recordings and analysis, we introduce a new kind of multimedia alignment, which aims at reunifying documents with all kinds of temporal media. The alignment proposed in this article uses the similarities that exist between the documents’ content and the speech transcript’s content in order to provide temporal indexes to printable documents. Several document content alignment strategies are discussed in this article and evaluated at various levels of granularity.
In this paper, we propose a block adaptive binarization (BAB) method using a modified quadratic filter (MQF) to binarize ill-conditioned business card images acquired by personal digital assistant (PDA) cameras. In the proposed method, a business card image is first partitioned into 8×8 blocks, which are then classified into character blocks (CBs) and background blocks (BBs) for locally adaptive processing. Each CB is windowed with a 24×24 rectangular window centered on the CB, and the windowed blocks are improved by the preprocessing filter MQF, in which the threshold selection scheme of the QF is modified. The 8×8 center block of the improved block is binarized with the threshold, and a binary image is obtained by tiling each binarized block in its original position. Experimental results show that the quality of binary images obtained by the proposed method is much better than that of the conventional global binarization (GB) using the QF. In addition, the proposed method yields about a 43% improvement in character recognition rate over GB using the QF.
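The first stage, partitioning into 8×8 blocks and separating character blocks from background blocks, can be sketched with a simple variance test; the variance threshold here is an assumption for illustration, not the paper's classifier.

```python
def classify_blocks(image, block=8, var_thresh=100.0):
    # Partition a grayscale image into block×block tiles and label each
    # as a character block (CB, high intensity variance) or a background
    # block (BB, nearly uniform). Keys are the (row, col) block origins.
    h, w = len(image), len(image[0])
    labels = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [image[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            labels[(by, bx)] = 'CB' if var > var_thresh else 'BB'
    return labels
```

Only the CBs would then receive the windowed MQF preprocessing and local thresholding.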
Symbolic Indirect Correlation (SIC) is a new classification method for unsegmented patterns. SIC requires two levels of comparisons. First, the feature sequences from an unknown query signal and a known multi-pattern reference signal are matched. Then, the order of the matched features is compared with the order of matches between every lexicon symbol-string and the reference string in the lexical domain. The query is classified according to the best matching lexicon string in the second comparison. Accuracy increases as classified feature-and-symbol strings are added to the reference string.
Guiding a recognition task using a language model is commonly accepted as having a positive effect on accuracy and is routinely used in automated speech processing. This paper presents a quantitative study of the impact of the use of word models in online handwriting recognition applied to form-filling tasks on handheld devices. Two types of word models are considered: a dictionary, typically from a few thousand up to a hundred thousand words; and a grammar or regular expression generating a language several orders of magnitude larger than the dictionary. It is reported that the improvement in accuracy obtained by the use of a grammar compares with the gain provided by the use of a dictionary. Finally, the impact of the word models on user acceptance of online handwriting recognition in a specific form-filling application is presented.
Using handwritten characters, we address two questions: (i) what is the group identification performance of different alphabets (upper and lower case), and (ii) what are the best characters for the verification task (same-writer/different-writer discrimination), given demographic information about the writer such as ethnicity, age, or sex. The Bhattacharyya distance is used to rank different characters by their group discriminatory power, and the k-NN classifier to measure the individual performance of characters for group identification. Given the tasks of identifying the correct gender, age, ethnicity, or handedness, the accumulated performance of characters varies between 65% and 85%.
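For two univariate Gaussian class-conditional feature distributions, the Bhattacharyya distance used for the ranking has a closed form; a minimal sketch (the paper's features are multivariate, so this is the 1-D special case):

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    # Bhattacharyya distance between two 1-D Gaussians:
    # D = (mu1-mu2)^2 / (4*(var1+var2))
    #     + 0.5 * ln((var1+var2) / (2*sqrt(var1*var2)))
    return ((mu1 - mu2) ** 2 / (4.0 * (var1 + var2))
            + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))
```

Characters whose per-group feature distributions are farther apart in this sense discriminate the groups better, which is the basis of the ranking.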
In this paper, we compare the performance of three classifiers used to identify the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers (Support Vector Machines (SVM), Gaussian Mixture Model (GMM) and k-Nearest-Neighbor (k-NN)) are used to identify different scripts at the word level (glyphs separated by white space). These three classifiers are applied to a variety of bilingual dictionaries and their performance is compared. Experimental results show the capability of Gabor filter to capture script features and the effectiveness of these three classifiers for script identification at the word level.
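Each of the 16 channels comes from a Gabor filter at a particular orientation and scale; the real part of one such kernel can be generated as below. The parameter values are illustrative, not those used in the paper.

```python
import math

def gabor_kernel(size=7, theta=0.0, sigma=2.0, lam=4.0, gamma=0.5):
    # Real part of a Gabor kernel: a Gaussian envelope modulating a
    # cosine wave of wavelength lam, rotated to orientation theta.
    half = size // 2
    k = [[0.0] * size for _ in range(size)]
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            k[y + half][x + half] = env * math.cos(2 * math.pi * xr / lam)
    return k
```

Convolving a word image with a bank of such kernels at several orientations and averaging the responses yields the channel features fed to the classifiers.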
Text segmentation plays a crucial role in a text recognition system. A comprehensive method is proposed for Tibetan/English text segmentation. Two algorithms, based on Tibetan inter-syllabic tshegs and a discriminant function, respectively, are presented to perform skew detection before text line separation. Then a dynamic recursive character segmentation algorithm integrating multi-level information is developed. The encouraging experimental results on a large-scale Tibetan/English mixed text set show the validity of the proposed method.
In this paper we propose a general framework for character segmentation in complex multilingual documents, which is an endeavor to combine the traditionally separated segmentation and recognition processes into a cooperative system. The framework contains three basic steps: Dissection, Local Optimization, and Global Optimization, which are designed to fuse various properties of the segmentation hypotheses hierarchically into a composite evaluation to decide the final recognition results. Experimental results show that this framework is general enough to be applied to a variety of documents. Finally, a sample system based on this framework for recognizing Chinese, Japanese, and Korean documents is described and its experimental performance is reported.
A Tibetan optical character recognition (OCR) system plays a crucial role in Chinese multi-language information processing. This paper proposes a new statistical method to perform multi-font printed Tibetan/English character recognition. A robust Tibetan character recognition kernel is elaborately designed. Combined with previous English character recognition techniques, the recognition accuracy on a test set containing 206,100 multi-font printed characters reaches 99.67%, which shows the validity of the proposed method.
The digitization of ancient Chinese documents presents new challenges to the OCR (optical character recognition) research field due to the large ancient Chinese character set, variant font types, and versatile document layout styles, as these documents are historical reflections of thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for the recognition of ancient Chinese documents with regular font types and layout styles. Based on previous work on multilingual OCR in the TH-OCR system, we focus on the design and development of two key technologies: character recognition and page segmentation. Experimental results show that the developed character recognition kernel of 19,635 Chinese characters outperforms our original traditional Chinese recognition kernel, and a benchmark test on printed ancient Chinese books shows that the proposed system is effective for regular ancient Chinese documents.
To accommodate the variability in the writing styles of different individuals, a scheme for off-line recognition of isolated handwritten Oriya numerals is presented here. Oriya is a popular script in India. The scheme is mainly based on features obtained from the water reservoir concept as well as topological and structural features of the numerals. The features used in the recognition scheme include reservoir-based features (the number of reservoirs, their sizes, heights, and positions, and the water flow direction), topological features (the number of loops and the centre-of-gravity positions of the loops), the ratio of reservoir/loop height to numeral height, profile-based features, and features based on jump discontinuities. The proposed scheme is tested on 3550 samples collected from individuals of various backgrounds, and we obtained an overall recognition accuracy of about 97.74%.
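One of the topological features, the number of loops, can be sketched as counting background regions fully enclosed by ink in a binary numeral bitmap; this is a generic hole-counting sketch, not the authors' exact procedure.

```python
def count_loops(bitmap):
    # Count enclosed background regions (loops) in a binary glyph bitmap,
    # where 1 = ink. A background component that never touches the image
    # border is enclosed by ink, i.e. a loop (as in '0', '6', '8').
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sy, sx):
        # Flood-fill one background component; report border contact.
        stack, touches_border = [(sy, sx)], False
        seen[sy][sx] = True
        while stack:
            y, x = stack.pop()
            if y in (0, h - 1) or x in (0, w - 1):
                touches_border = True
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and bitmap[ny][nx] == 0:
                    seen[ny][nx] = True
                    stack.append((ny, nx))
        return touches_border

    loops = 0
    for y in range(h):
        for x in range(w):
            if bitmap[y][x] == 0 and not seen[y][x] and not flood(y, x):
                loops += 1
    return loops
```

The same flood-fill bookkeeping extends naturally to measuring loop sizes and centre-of-gravity positions.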
We propose a concise definition of the skew angle of a document, based on mathematical morphology. This definition has the advantage of being applicable to both binary and grey-scale images. We then discuss various possible implementations of this definition and show that the results we obtain are comparable to those of existing algorithms.
This paper describes an efficient algorithm for inverse halftoning of scanned color document images to resolve problems with interference patterns such as moire and graininess when the images are displayed or printed out. The algorithm is suitable for software implementation and useful for high quality printing or display of scanned document images delivered via networks from unknown scanners. A multi-resolution approach is used to achieve practical processing speed under software implementation. Through data-driven, adaptive, multi-scale processing, the algorithm can cope with a variety of input devices and requires no information on the halftoning method or properties (such as coefficients in dither matrices, filter coefficients of error diffusion kernels, screen angles, or dot frequencies). Effectiveness of the new algorithm is demonstrated through real examples of scanned color document images, as well as quantitative evaluations with synthetic data.
Automatic understanding of document images is a hard problem. Here we consider a sub-problem, automatically extracting content from filled form images. Without pre-selected templates or sophisticated structural/semantic analysis, we propose a novel approach based on clustering the component-block-projection-vectors. By combining spectral clustering and minimal spanning tree clustering, we generate highly accurate clusters, from which the adaptive templates are constructed to extract the filled-in content. Our experiments show this approach is effective for a set of 1040 US IRS tax form images belonging to 208 types.
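The vectors being clustered can be sketched as column-wise ink projections of a binary component block; this simple signature is a stand-in for whatever exact projection the authors use.

```python
def projection_vector(block):
    # Column-wise ink projection of a binary block (1 = foreground):
    # the count of foreground pixels in each column, a compact shape
    # signature that two instances of the same form field share.
    cols = len(block[0])
    return [sum(row[x] for row in block) for x in range(cols)]
```

Blocks from different scans of the same form type yield near-identical vectors, so clustering them groups the images by form type before template construction.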