CharacTell's Unique Approach to Handwriting Recognition
 The recognition kernels of all the leading ICR technologies rely on feature extraction from the image of a character. Generally, hundreds or thousands of features are extracted from each character. These features are then analyzed using various methods, such as neural networks. All these methods need a large learning set, with tens of thousands of samples for 'learning' each character.
 Until now, the recognition of lowercase letters in the form-processing market was considered almost unsolvable. Human handwriting includes many letters that can be ambiguous: some people write the letter "e" or "r" in exactly the same way that others write the letter "c" or "v" respectively. Only CharacTell's approach can recognize and distinguish such problematic characters.
 
 Currently, handwriting recognition products achieve a recognition rate of about 85% for mixed uppercase and lowercase letters (non-cursive) and about 90% for uppercase letters only. This means that almost every second word requires human correction. Meanwhile, research has shown that people find handwriting recognition software useful only when it recognizes over 97% of the characters.
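
 The step from per-character rates to "almost every second word" can be checked with a short calculation. The sketch below is only an illustration: it assumes that character errors are independent and that a typical word has five letters, neither of which is stated in the text.

```python
def word_accuracy(char_accuracy, word_length):
    """Probability that every character of a word is recognized correctly.

    Assumption (not from the text): per-character errors are independent.
    """
    return char_accuracy ** word_length


# Assumed word length of 5 letters, chosen only for illustration.
for rate in (0.85, 0.90, 0.97):
    per_word = word_accuracy(rate, 5)
    print(f"{rate:.0%} per character -> {per_word:.0%} per 5-letter word")
```

 Under these assumptions, 85% to 90% per character leaves roughly half of all five-letter words with at least one error, while 97% per character leaves most words fully correct.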
 
 CharacTell's JustICR uses a new approach, which can be compared to the world of DNA in microbiology. JustICR first creates a 'string' from the character's image. This string can be regarded as a "DNA chain". The DNA chain is made of sub-strings that can be called "genes".
 
 JustICR then tries to identify the character. The recognition process for a character is analogous to the problem of identifying a baby's father from DNA chains. Each gene is matched against a database of genes that were produced from the learning set. Each gene is given a certain weight for each possible recognition result. Each DNA chain contains at most 28 genes.
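
 The matching step described above can be pictured as weighted voting. The following is a minimal sketch, not JustICR's actual algorithm: the gene database contents, the two-symbol gene length, the `split_into_genes` helper and the rule of summing weights are all hypothetical, and the sketch ignores the location of each gene in the chain.

```python
from collections import defaultdict

MAX_GENES = 28  # the text states that a DNA chain holds at most 28 genes

# Hypothetical gene database built from a learning set:
# gene -> {candidate character: weight}
GENE_DB = {
    "ab": {"e": 0.7, "c": 0.3},
    "bc": {"e": 0.6, "c": 0.4},
    "cd": {"c": 0.9},
}


def split_into_genes(chain, gene_len=2):
    """Cut a DNA chain into overlapping sub-strings (an assumed gene definition)."""
    genes = [chain[i:i + gene_len] for i in range(len(chain) - gene_len + 1)]
    return genes[:MAX_GENES]


def recognize(chain, gene_db=GENE_DB):
    """Sum the weights of all genes found in the database; return the best character."""
    scores = defaultdict(float)
    for gene in split_into_genes(chain):
        # A gene either exists in the database or it does not (a Boolean test).
        for char, weight in gene_db.get(gene, {}).items():
            scores[char] += weight
    if not scores:
        return None  # no known gene was found in this chain
    return max(scores, key=scores.get)


print(recognize("abc"))
print(recognize("bcd"))
```

 In this toy database the chain "abc" accumulates more weight for "e", while "bcd" accumulates more weight for "c", which mirrors the "e"/"c" ambiguity mentioned earlier.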
 
 ICR experts feel that nothing new can emerge in the field of handwriting recognition. Can the character genes be represented as features? Well, no. The number of possible genes is huge, while the number of features is fixed by the algorithm. Each gene either exists or does not, while features are generally integers and not Booleans. The only information contained in the genes is whether they exist and where they are located in the DNA chain.
 
 Now comes the interesting part. The number of samples required to teach JustICR a new handwriting is extremely low. In fact, after learning from a single sample of each character, JustICR already achieves a reasonable recognition rate.
 
 CharacTell's new product, SoftWriting, makes good use of JustICR's short learning curve. When SoftWriting tries to recognize an additional document, it uses the learning data from the previous document. Thanks to the short learning curve, SoftWriting achieves very high recognition quality from the first document it recognizes. For the first document, the algorithm works as follows: after a small fraction of the document has been recognized, all the recognized words that appear in the dictionary are used as the training set for the rest. This method can improve the recognition rate from 50% per word to 90% per word from the first document submitted.
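
 The first-document procedure can be sketched as a simple filter: words from the first pass are kept as training data only if the dictionary confirms them. This is an illustrative sketch under assumptions, not SoftWriting's implementation; the function name, the data layout and the sample words are hypothetical.

```python
def select_training_words(first_pass, dictionary):
    """Keep first-pass results whose recognized text appears in the dictionary.

    first_pass: list of (word_image_id, recognized_text) pairs (assumed layout).
    dictionary: set of known lowercase words.
    """
    return [(image_id, text) for image_id, text in first_pass
            if text.lower() in dictionary]


# Hypothetical first-pass output for a small fraction of a document.
first_pass = [("w1", "hello"), ("w2", "wcrld"), ("w3", "Form")]
dictionary = {"hello", "world", "form"}

training_set = select_training_words(first_pass, dictionary)
print(training_set)
```

 The misread word "wcrld" is not in the dictionary, so it is left out, and only the confirmed words would be used to teach the recognizer the writer's handwriting.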
 
 SoftWriting uses several proprietary technologies in addition to JustICR's recognition capabilities. Unlike most other recognition engines, which turn the images into black and white immediately, SoftWriting scans the images as gray/color bitmaps. SoftWriting includes a special algorithm that converts the gray/color images to black and white images. This algorithm is extremely important because scanning pads written on with blue pens generally produces images of poor quality that are difficult to recognize. After conversion, a proprietary algorithm that analyzes the lines, words and connected characters is applied. The recognition kernel also uses a dictionary in order to achieve the best results.
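
 SoftWriting's conversion algorithm is proprietary and is not described here. Purely to illustrate why converting from color can help with blue ink, the sketch below uses a swapped-in, much simpler technique: thresholding the darkest color channel of each pixel, which is never brighter than the pixel's gray value. The function name, the threshold of 160 and the sample pixels are assumptions.

```python
def binarize(pixels, threshold=160):
    """Convert rows of (r, g, b) pixels (0..255) to 0 for ink and 1 for paper.

    Simple stand-in technique: a pixel counts as ink if its darkest
    channel falls below the threshold. Not SoftWriting's algorithm.
    """
    return [[0 if min(r, g, b) < threshold else 1 for (r, g, b) in row]
            for row in pixels]


# Hypothetical 2x2 scan: white paper, blue ink, off-white paper, black ink.
image = [
    [(255, 255, 255), (90, 110, 220)],
    [(250, 250, 250), (30, 30, 30)],
]
print(binarize(image))
```

 Both the blue and the black pixel are classified as ink here, while the two paper pixels stay white.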