Comparison of neural network training algorithms for handwriting recognition and scope for optimization.
Recent times have seen the digitization of most text documents. The documents which have been maintained in the form of hard-copies are now being converted into scanned soft copies. However, scanned documents have one small shortcoming-they cannot be edited or viewed in document viewers. As a result of this, the documents whose scanned copies show fringing effects and noise cannot be presented as the original formatted copies of the hard-copy. This has driven the industry towards optical character recognition (OCR) which simply means recognition of characters (alphabets and numerals) from scanned documents or images. Handwriting recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs.
Closely observing the capability of human mind in the recognition of handwriting, we find that humans are able to recognize characters even though they might be seeing that style for the first time. This is possible because of their power to visualize parts of the known styles into the unknown character. In this paper we try to depict the same power into machines.
Early offline Character Recognition software required training the program on a specific font before it could be accurately input. Similarly, when inputting handwriting, the program would have to be process the handwriting that would be incredibly time consuming with very low accuracy. The methods used are now relatively static, with only a little bit research going into developing entirely new methods, and most research going into refining existing procedures to make them ever more accurate.
The currently used techniques like histogram and Dynamic Time Warping (DTW) for segmentation and SOM and HMM for recognition give efficient results but at the cost of complexity and high processing time.
The concept proposed here is a solution crafted to perform character recognition of hand-written scripts in English, a language having official status worldwide.
The algorithm used to implement the entire OCR paradigm was based on preprocessing, character recognition and finally display of output. Pre-processing includes image acquisition, binarizing the image and then segmenting it. The following flowchart depicts the algorithm:
[FIGURE 1 OMITTED]
Pre-Processing of Image
The preprocessing of the image before the final detection is carried out using grayscale and threshold. The image is converted into binary and 1 and 0 is assigned to white and black respectively. Any noise with pixel value less than 30 pixels is removed.
The image is cropped such that all the blank space around the image is removed. The cropped image is then scanned horizontally until it reaches the row whose sum of pixel values is 0 (completely black). This row represents the end of the line. The image is clipped at this point and stored in a new variable. This process is continued till the end of the image is not reached. Thus each line of the image is extracted and stored in a matrix.
The principle of line segmentation is implemented for alphabet segmentation. The segmented line is scanned vertically till a column whose sum of pixel value equal to 0 is reached. The image is segmented at this point, stored in a new matrix and the process continues. In this way all the alphabets of the line in a capital letter text are segmented.
For word segmentation, average value of the width of the segmented words of the line is taken. Through trial and error a value equal to 1.5 times the average width of the alphabet was decided as an appropriate threshold for determining the space between two words.
Segmentation of cursive handwriting proved to be a challenge due to the variation of handwriting from person to person. For this reason the segmentation of cursive alphabets was kept to be user dependent. The user was made to write 3 sample characters at the top of the page. From these characters we calculated the ink size, space threshold and average character width.
Character Recognition Algorithm
The basic database which was prepared for the entire character recognition process contained font in standard capital handwriting. This includes font database used in template matching. In case of template matching, the font was used as the database; in neural networks, they were used to train the neural network with different techniques.
The most straight forward technique used is the template matching technique in which the templates of alpha-numerals of various types are iteratively compared to the query alpha- numeral. The objective criteria on which the matching takes place is the correlation coefficient.
Consider two series x(i) and y(i) where i=0,1,2...N-1. The correlation coefficient r is defined as:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (1)
The algorithm implemented can be listed as follows:
1. A database of alpha numerals of capital handwritten alphabets was created. Each character in the database was sized to 40*30, 26 alphabets and 10 numerals.
2. The input image was then filtered to remove noise and was then converted into a binary image.
3. The segmented character from the input image was then correlated in spatial domain with every character in the database (36 characters).
4. According to the correlation principle, the character from the database which gives the highest correlation coefficient value is the closest match which is selected as the output.
Template Matching has been abbreviated as TM for further use in this paper.
Back-Propagation Feed-Forward Neural Network
The first type of neural network to which the OCR problem was tested for was BPNN which is a feed-forward network. The network was designed to have three hidden layers with 12, 16 and 20 neurons respectively.
The algorithm used can be listed as:
1. Database creation: A database of alpha numerals was created. Each character in the database was sized to 40x30.
2. Target data matrix creation: The target data matrix had to be of size 1*1200. Each element in the row was assigned a unique value for a particular character.
3. The network was simulated using segmented characters from an input image.
Learning Vector Quantization
The LVQ method was used to classify the training data into clusters that aided in the recognition process while simulating the neural network.
The LVQ was tested with bit-mapping, Fan-beam transformation and Radon transformation. For LVQ the following abbreviations have been used: LVQ with bit mapping LVQ-B; LVQ with Fan-bean LVQ-F; LVQ with Radon LVQ-R.
Linear Vector Quantization using Bit Mapping: In Bit-mapping LVQ the segmented characters are directly fed as an input to Neural Network for training.
The algorithm can be stated as:
1. Data base creation: A database of alpha-numerals was created. Each character in the database was sized to 40*30.
2. Target data matrix creation: The target data matrix had to be of size 40*30. The numbers of columns were made equal to total characters in the database i.e. 36 as in Bit-mapping LVQ the input image is converted into corresponding matrix. Target data matrix changes for LVQ-F and LVQ-R which is as described below.
3. The output is in terms of values 1 to 36; denoting the alpha numerals.
Linear Vector Quantization using Fan-beam Transformation: This method produces Fan-beam projections of the database as Target data matrix. Likewise Fanbeam projections of segmented characters of the input image are provided as input to the neural network for recognition.
Linear Vector Quantization using Radon Transformation: This method performs Radon transform on the database to produce Target data matrix. Likewise Radon projections of segmented characters of the input image are provided as input to the neural network for recognition.
Selection of Output
It was observed that though the networks with different training algorithms and database gave nearly same efficiencies the correctly recognized characters of each network were to some extent different. We developed a Hybrid algorithm which exploited this difference of the networks.
A robust hybrid algorithm has been suggested here with the help of which the best recognized character is selected as the final output.
Here a pool of differently trained networks is put together. The networks are trained using LVQ algorithm with Radon and Fan-beam projections. Each of the networks is trained with a different database.
The input character is passed through each of the networks and the output is separately obtained. The character recognized most number of times is given as the output. In case of a conflict template matching is done of the input character with the contestant outputs and the one with higher correlation coefficient is given as the output.
The LVQ training method along with Hybrid algorithm is abbreviated as LVQ-H.
The accuracy of segmentation was found to vary with style of writing.
The Accuracy of segmentation can be summarized as:
The techniques of template matching and neural network were applied to the various random texts and the results were compared. Back Propagation Neural Network (BPNN) was found to give the most incorrect recognitions.
The results were compared on the basis of Percentage Accuracy (PA). PA discusses the percentage of correctly recognized alphabets in the given input. The database was tested for 20 test images with around 300 characters (plain text as well as cursive) and the cumulative character-wise accuracy are tabulated for comparison.
The PA can be summarized as:
Another important parameter is Training Time. It refers to the time required to train a network using a particular algorithm. The training time measured for various algorithms have been taken as a percentage of time taken by BPNN, since it requires maximum time to train.
Segmentation and Spacing Errors
1. Errors in segmentation show a high dependence on the handwriting of the user.
2. The segmentation errors for cursive handwriting were higher than that of plain text handwriting. This is primarily because the cursive handwriting have typical indentations at the edges and hence the spacing between the subsequent alpha-numerals changes.
Target Sequence Error
In this type of error, instead of detecting the alphabet correctly, an alphabet that was placed on either side of the given alphabet in the database is detected.
It occurs only in the Back-Propagation Feed Forward Networks, as the mapping in BPNN is non-linear and not clustering based; the errors are independent of the features or topology of the image and are only dependent on the physical proximity of the input in the database.
Feature Based Error
This error involves recognition of the characters with their similar looking counter parts like 'O' and 'C' or 'H' and 'N'. It occurs in most of the methods including template matching and LVQ based Feed Forward Neural Network.
In template matching, correlation coefficient is found out for the input matrix and the template from database
The LVQ based Feed-Forward Neural Network involves clustering of the inputs based on their physical proximity at the training epoch. Thus it formulates a NN which clusters these inputs. Thus it is very likely that the clustering of the Characters may go wrong due to feature based proximity of a few alpha-numerals.
Database Related Errors
The errors were also found out to be dependent on the type of the database.
The database used must be a very generalized so that there will not be any user dependency.
This particular error was resolved with the help of the hybrid algorithm which took into account different database before finalizing the output.
In Bit-mapping LVQ the segmented characters are directly fed as an input to Neural Network for training. Since the segmented characters are not processed, it yields poor results.
Radon projections yielded acceptable results when used with LVQ based NN as in Radon the segmented characters are feature extracted by means of projection. But the results were not as good for Radon transform because of the parallel line projection parallax problem.
In Fan-beam LVQ NN the segmented characters undergoes feature extraction by means of conical projection. Thus better feature extraction is possible as compared to Radon.
In Radon and Fan-beam the number of elements of the image matrix are increased as compared to Bit-mapping due to projection method, while in Bitmapping size of the matrix remains same as the image. The number of elements depends on the distance in pixels from the vertex to the center of rotation and rotation angle.
Advantage of Hybrid Algorithm
1. It was observed that the use of more than one type of network for recognition greatly improved the efficiency. We created six LVQ networks, three each of Radon and Fan-beam with three different databases.
2. Among these databases one was standard block handwriting while one database was cursive handwriting.
3. It was noted that out of six on an average around four networks gave correct recognition which clearly depicts that if the networks were individually tested the results would have suffered.
4. The only drawback of this algorithm was that the processing time increased six times since the character had to go through six different networks before the final output could be given.
Training and Processing Time Analysis
The training time was found to vary depending upon the training parameters for any given method.
BPNN takes the maximum time to train. This is because of more number of hidden layers which creates a large number of inter connections between the neurons.
LVQ bit mapping takes comparatively lesser time to train which is due to the lesser number of elements in the input.
LVQ Radon and Fan-beam took more time to train owing to the greater number of elements created due to the projection method.
Thus, there is a tradeoff between bit mapping and projection LVQ methods in terms of efficiency and training time.
It is important to note that the BPNN yields particularly inaccurate results and has high processing times and hence must be avoided.
The processing time for template matching is extremely low because of direct correlation calculated between the character and the database.
The average calculation speed for calculating the Radon Transform is higher compared to the rate of evaluation of Fan-beam projection of any character
Bit-mapping based techniques proved to be the fastest techniques as the procedure for calculation of transform is absent
Although the hybrid algorithm has the highest processing time it is the most efficient method and hence the method should be applied only when the primary concern is accuracy. For instances, where time is of concern LVQ with Fan-beam is the best solution.
 "Offline Handwriting Recognition using Genetic Algorithm "ShashankMathur; VaibhavAggarwal; Himanshu Joshi; Anil Ahlawat Sixth International Conference on Information Research and Applications -i.Tech 2008, Varna, Bulgaria, June-July 2008.
 "A Real-time DSP-Based Optical Character Recognition System for Isolated Arabic characters using the TI TMS320C6416T" HaidarAlmohri, John S. Gray, HishamAlnajjar, University of Hartford, Proceedings of The 2008 IAJC-IJME International Conference ISBN 978-1-60643-379-9
 "Optical Character Recognition for printed Tamil text using Unicode" Seethalakshmi R, Sreeranjani T.R., Balachandar T.,Abnikant Singh, Markandey Singh, RitwajRatan, Sarvesh Kumar,(Shanmugha Arts Science Technology and Research Academy,Thirumalaisamudram, Thanjavur, Tamil Nadu, India), September 2005.
 "Word base line detection in handwritten text recognition systems" Kamil R. Aida-zade and Jamaladdin Z. Hasanov
 "Off-Line Cursive Handwritten Tamil Character Recognition" R. Jagadeesh Kannan And R. Prabhakar, Rmk Engineering College, Chennai ,India
 "An Offline System for Handwritten Signature Recognition" Ioana Barbantan; Camelia Vidrighin; Raluca Borca, Technical University of Cluj Napoca
 "A New Segmentation Algorithm for Online Handwritten Word Recognition Persian Script" Sara Izadi; Mehdi Haji; Ching Y. Suen ,Centre for Patternin Recognition and Machine Intelligence, 1455 de Maisonneuve Blvd. West,Montral, Qubec, Canada, H3G 1M8
Rishu Singh, Shaadab Zafar, Amit Ghodake Sanjay Gandhe and Neha Vartak
Sardar Patel Institute of Technology, India
Table1: Segmentation Accuracy Segmentation Type Accuracy (%) Plain Text (Capital) 98 Cursive (Capital) 94 Cursive (Small) 82 Table 2: Percentage Accuracy Training Algorithm Plain Text (%) Cursive (%) Overall PA TM 62 54 59 BPNN 38 26 34 LVQ-B 62 58 61 LVQ-F 74 62 70 LVQ-R 68 60 65 LVQ-H 84 78 82
|Printer friendly Cite/link Email Feedback|
|Author:||Singh, Rishu; Zafar, Shaadab; Ghodake, Amit; Gandhe, Sanjay; Vartak, Neha|
|Publication:||International Journal of Computational Intelligence Research|
|Date:||Jul 1, 2011|
|Previous Article:||On genetic algorithm and multiple preprocessors assisted feature boosting for electronic nose signal processing.|
|Next Article:||(n,t)-closeness with faster checking algorithm.|