The paper proposes an approach for matching digitized copies of business documents. This task arises when comparing two versions of the same document, a genuine one and a possible forgery, to find modifications, for example in the banking sector, where contracts concluded on paper must be checked to prevent fraud. The method matches two documents by comparing images of text lines using a Variational Autoencoder (VAE) trained on genuine images and computing the Fisher information metric to find modifications. Experiments were conducted on the public Payslips dataset (in French). The results show high quality and reliability in detecting document forgeries and are compared with those of a method that applies OCR and image matching.
The paper proposes an approach to training a convolutional neural network using information on the level of distortion of the input data. The learning process is modified with an additional layer, which is subsequently removed, so the architecture of the original network does not change. As an example, a LeNet5 network is considered, with training data based on MNIST symbols and Gaussian blur with a variable level as the distortion model. This approach incurs no loss of network quality and yields a significant error-free zone in the responses on the test data, which is absent with the traditional approach to training. The responses are statistically dependent on the level of distortion of the input image, and the relationship between them is strong.
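As a rough illustration of the distortion model named in the abstract above, the following sketch blurs a small image with a Gaussian kernel whose sigma is drawn at random and returns sigma together with the sample, so that the level of distortion is available as training information. It does not reproduce the paper's additional layer or its LeNet5 setup; the function names, the 3-sigma kernel radius, the edge clamping and the sigma range are all assumptions made for this example.

```python
import math
import random


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (assumed radius: 3 * sigma)."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve a sequence with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D list of floats (rows, then columns)."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def distorted_sample(image, rng, max_sigma=2.0):
    """Return (blurred image, sigma); sigma is the known distortion level."""
    sigma = rng.uniform(0.1, max_sigma)  # assumed range of distortion levels
    return gaussian_blur(image, sigma), sigma
```

A training pipeline built on this idea would call `distorted_sample(image, random.Random(seed))` for each input and pass the returned sigma to whatever part of the training procedure consumes the distortion level.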
This paper considers problems in developing stochastic models consistent with the results of character image recognition in a video stream. Assumptions about the structure and properties of the constructed models are formulated. The model components are described using the Dirichlet distribution and its generalizations. The parameters of these distributions are determined using statistical estimation methods. The Akaike information criterion is used to rank the models. The agreement of the proposed theoretical distributions with the sample data is verified.
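The model ranking step mentioned in the abstract above can be pictured with a short sketch. It evaluates the Dirichlet log density, sums it into a log-likelihood, and ranks candidate models by the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The model names, log-likelihood values and parameter counts below are made-up placeholders, not results from the paper, and parameter estimation itself is not shown.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log density of Dirichlet(alpha) at a point x on the probability simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def log_likelihood(samples, alpha):
    """Sum of log densities over independent samples."""
    return sum(dirichlet_log_pdf(x, alpha) for x in samples)


def aic(max_log_likelihood, num_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * num_params - 2 * max_log_likelihood


def rank_by_aic(models):
    """Sort (name, max_log_likelihood, num_params) tuples by ascending AIC."""
    scored = [(name, aic(ll, k)) for name, ll, k in models]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical fitted models: all numbers here are placeholders for illustration.
candidates = [
    ("dirichlet", -1250.0, 10),
    ("generalized_dirichlet", -1238.0, 18),
]
ranking = rank_by_aic(candidates)  # lowest AIC first
```

A lower AIC is preferred, so a model with more parameters ranks first only if its gain in log-likelihood outweighs the 2k penalty.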
This paper discusses the task of document recognition in a sequence of video frames. To optimize processing speed, the stability of recognition results obtained from several video frames is estimated. For identity document (Russian internal passport) recognition on a mobile device, it is shown that the number of observations needed to obtain a precise recognition result can be decreased significantly.
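One simple way to picture the idea in the abstract above is a stopping rule: per-frame results are combined, and processing stops once the combined result has stayed unchanged for a few consecutive frames. The sketch below uses majority voting and a fixed `patience` threshold as stand-ins. The abstract does not describe the paper's actual stability estimate, so the function name, the voting scheme and the `patience` parameter are assumptions.

```python
from collections import Counter


def recognize_with_early_stop(frame_results, patience=3):
    """Consume per-frame recognition strings until the combined result is stable.

    Returns (combined_result, frames_used). The combined result is the most
    frequent string seen so far; processing stops once it has not changed
    for `patience` consecutive frames.
    """
    votes = Counter()
    combined = None
    unchanged = 0
    used = 0
    for result in frame_results:
        used += 1
        votes[result] += 1
        best = votes.most_common(1)[0][0]
        if best == combined:
            unchanged += 1
        else:
            combined = best
            unchanged = 1
        if unchanged >= patience:
            break  # remaining frames are never processed
    return combined, used


# Hypothetical per-frame outputs for one text field.
frames = ["IVANOV", "IVANOV", "IVANQV", "IVANOV", "IVANOV", "IVANOV"]
result, frames_used = recognize_with_early_stop(frames, patience=3)
# -> ("IVANOV", 3): three of the six frames were enough
```

Because `frame_results` can be any iterable, a generator that runs recognition lazily would skip the remaining frames entirely once the rule fires.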
In this paper we consider the task of improving optical character recognition (OCR) results for document fields on low-quality and average-quality images using N-gram models. Cyrillic fields of the Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first is based on the hypothesis that a symbol depends on its two adjacent symbols, and the second is based on computing marginal distributions and Bayesian networks. The algorithms are compared and experimental results within a real document OCR system are presented; it is shown that document field OCR accuracy can be improved by more than 6% for low-quality images.
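The first approach in the abstract above, a symbol depending on its two adjacent symbols, can be sketched as a rescoring step: each OCR alternative is weighted by how often that symbol appears between the same left and right neighbours in a reference word list. This is a minimal sketch under stated assumptions, not the paper's algorithm. The function names, the boundary markers `^` and `$`, the additive smoothing, the use of the top OCR choice as neighbour context, and the Latin sample words are all assumptions. The Bayesian network approach is not shown.

```python
from collections import defaultdict


def train_context_counts(words):
    """Count how often a symbol occurs between each (left, right) neighbour pair."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        padded = "^" + word + "$"  # assumed word-boundary markers
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i + 1])][padded[i]] += 1
    return counts


def rescore_field(ocr_alternatives, counts, smoothing=1.0):
    """Pick, per position, the symbol maximizing OCR score * context probability.

    ocr_alternatives: list of dicts {symbol: ocr_probability}, one per position.
    Neighbour context is taken from the top OCR choice (a simplification).
    """
    top = [max(alts, key=alts.get) for alts in ocr_alternatives]
    padded = ["^"] + top + ["$"]
    result = []
    for i, alts in enumerate(ocr_alternatives):
        context = counts.get((padded[i], padded[i + 2]), {})
        total = sum(context.values()) + smoothing * len(alts)

        def score(symbol):
            prior = (context.get(symbol, 0) + smoothing) / total
            return alts[symbol] * prior

        result.append(max(alts, key=score))
    return "".join(result)


# Hypothetical reference words and OCR output; "Q" narrowly beats "O" in OCR.
counts = train_context_counts(["IVANOV", "IVANOVA", "PETROV", "SIDOROV"])
ocr = [{"I": 1.0}, {"V": 1.0}, {"A": 1.0}, {"N": 1.0},
       {"O": 0.45, "Q": 0.55}, {"V": 1.0}]
corrected = rescore_field(ocr, counts)  # -> "IVANOV"
```

In the usage above, the context ("N", "V") has been seen twice with "O" and never with "Q", so the context prior outweighs the small OCR preference for "Q".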