The recognition kernels of all the leading ICR technologies use feature extraction of the
image of a character. Generally, hundreds or thousands of features are extracted from each
character. These features are then analyzed using various methods, such as neural networks.
All these methods require a large learning set, with tens of thousands of samples needed for
'learning' each character.
CharacTell's ACR (first marketed in our API product JustICR) introduced a new approach,
which can be compared to the microbiology world of DNA. First a 'string' is created from each
character's image. This string can be regarded as a "DNA chain". The DNA chain is made of
sub-strings that can be called "genes".
Based on this "DNA chain", the character is identified. The recognition process of a character
can be thought of as analogous to the problem of identifying the father of a baby from the DNA
chains of many fathers. Each gene is matched with a database of genes that were produced
from the learning set. Each gene is given a certain weight for each possible recognition result.
The number of genes in each DNA chain is less than or equal to 28.
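The matching step described above can be illustrated with a minimal sketch. It is not CharacTell's implementation: how genes are derived from a character image is not described here, so genes are represented as plain string tokens, and the weighting rule (the share of learning samples containing a gene that belong to each character) is an assumption chosen for simplicity. The names `train`, `recognize` and `MAX_GENES` are hypothetical.

```python
from collections import defaultdict

MAX_GENES = 28  # upper bound on genes per DNA chain, as stated in the text


def train(samples):
    """Build a gene database mapping gene -> {label: weight}.

    `samples` is an iterable of (genes, label) pairs. The weight used here
    (fraction of samples containing the gene that carry the label) is an
    assumed scheme, not the vendor's.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for genes, label in samples:
        # Count each gene once per sample, respecting the chain-length bound.
        for gene in set(genes[:MAX_GENES]):
            counts[gene][label] += 1

    database = {}
    for gene, per_label in counts.items():
        total = sum(per_label.values())
        database[gene] = {label: n / total for label, n in per_label.items()}
    return database


def recognize(genes, database):
    """Sum the per-label weights of every known gene and return the best label."""
    scores = defaultdict(float)
    for gene in genes[:MAX_GENES]:
        for label, weight in database.get(gene, {}).items():
            scores[label] += weight
    if not scores:
        return None  # no gene in this chain was seen during learning
    return max(scores, key=scores.get)
```

In this sketch a gene seen only in samples of one character votes for it with full weight, while a gene shared by several characters splits its vote among them.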
ICR experts feel that nothing can be new in the field of handwriting recognition. Can the character
genes be represented as features? Well, no. The number of possible genes is huge while the
number of features is fixed by the algorithm. Each gene can exist or not, while the features are
generally integers and not Booleans. The only information a gene carries is whether it exists
in a given sample and, if it does, its location in the DNA chain.
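The difference between the two representations can be sketched as data structures. This is an illustration under assumptions, not the ACR format: the gene tokens (`"g17"`, `"g4"`) and the helper `gene_positions` are hypothetical, and the feature values are made up.

```python
def gene_positions(chain):
    """Record which genes are present in a chain and where they occur."""
    positions = {}
    for index, gene in enumerate(chain):
        positions.setdefault(gene, []).append(index)
    return positions


# Feature-based view: a vector whose length is fixed by the algorithm,
# holding one integer measurement per feature.
feature_vector = [12, 0, 7, 3]

# Gene-based view: only the genes that actually occur are stored,
# together with their locations in the DNA chain.
sample_genes = gene_positions(["g17", "g4", "g17"])
```

A gene that does not occur in a sample simply has no entry, so the set of possible genes can be very large without enlarging any individual sample.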
Now comes the interesting part. The number of samples that is required to teach ACR a new
character, font or handwriting style is extremely low. This means that the privilege of creating
custom classifiers - to improve recognition results or address new business problems - is not
reserved for the developers of the technology. Now users can do so and expand the scope
of their application of ICR.
CharacTell has implemented ACR in all its products, each time optimizing it to the application
at hand - open notes, data in forms, etc.
A white paper about ACR is also available.