This paper describes the use of Conditional Random Fields (CRFs), which exploit contextual information, to
automatically label extracted segments of scanned documents as machine-print, handwriting, or noise. Such
labeling can serve as an indexing step for a context-based image retrieval system or a biometric signature
verification system. A simple region-growing algorithm first segments the document into a number of
patches, and a label for each patch is then inferred using a CRF model. The model is flexible enough
to treat signatures as a type of handwriting and to isolate them from machine-print and noise. Its robustness
stems from the way a CRF models spatial dependencies among neighboring labels as well as in the observed
data. Maximum pseudo-likelihood estimates of the CRF parameters are learned using conjugate gradient
descent. Labels are inferred by computing their probabilities under the model with Gibbs sampling.
Experimental results show that this approach assigns correct labels to 95.75% of the data. The CRF-based
model is shown to outperform neural networks and naive Bayes.
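
The label-inference step can be illustrated with a minimal sketch of Gibbs sampling over a small patch graph. This is not the paper's implementation: the per-patch scores in `unary`, the Potts-style agreement bonus `coupling`, and the function names are assumptions made for illustration, since the abstract does not specify the feature functions or the learned parameters.

```python
import math
import random

LABELS = ("machine-print", "handwriting", "noise")


def gibbs_label_marginals(unary, neighbors, coupling=1.0,
                          sweeps=500, burn_in=100, seed=0):
    """Estimate per-patch label marginals by Gibbs sampling.

    unary[i][l]  -- hypothetical score of label l for patch i (higher is better)
    neighbors[i] -- indices of the patches spatially adjacent to patch i
    coupling     -- assumed bonus for each neighbor that shares the label
    """
    rng = random.Random(seed)
    n, k = len(unary), len(LABELS)
    # Start from the label each patch prefers when neighbors are ignored.
    state = [max(range(k), key=lambda l: unary[i][l]) for i in range(n)]
    counts = [[0] * k for _ in range(n)]

    for sweep in range(sweeps):
        for i in range(n):
            # Conditional score of each label given the current neighbor labels.
            scores = [
                unary[i][l]
                + coupling * sum(1 for j in neighbors[i] if state[j] == l)
                for l in range(k)
            ]
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - top) for s in scores]
            state[i] = rng.choices(range(k), weights=weights)[0]
        if sweep >= burn_in:
            for i in range(n):
                counts[i][state[i]] += 1

    kept = sweeps - burn_in
    return [[c / kept for c in row] for row in counts]


def infer_labels(marginals):
    """Pick the label with the highest estimated marginal for each patch."""
    return [LABELS[max(range(len(LABELS)), key=lambda l: row[l])]
            for row in marginals]
```

In this sketch, a patch whose own evidence is ambiguous tends to take the label of its confidently labeled neighbors, which is the kind of contextual effect the abstract attributes to the CRF.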