
ABBYY Announces FineReader Engine 7.0 SDK.


Business Editors/High-Tech Writers

FREMONT, Calif.--(BUSINESS WIRE)--Sept. 17, 2003

Next Generation Toolkit Adds Extended Functionality to Address Traditional Vertical Markets As Well As Specialized Data and Document Capture Projects

ABBYY(R), a leading developer of document recognition and linguistic technologies, today announced ABBYY FineReader Engine 7.0, the next generation of the company's Software Development Kit (SDK) for integrating ABBYY OCR, ICR, OMR and barcode recognition technologies into Windows applications.

With the announcement of FineReader Engine 7.0, ABBYY dramatically expands the reach of its technology beyond traditional vertical markets (including finance, government and healthcare) by offering extended functionality for niche projects such as library archiving and Chinese and Japanese recognition. In addition to core platform enhancements to overall accuracy, document analysis, and export functions, FineReader Engine 7.0 adds sophisticated new modules for recognition of ancient and historical texts, PDF files, invoices, barcodes, and Asian characters.

"ABBYY's goal is to deliver recognition technologies that help organizations transform documents into manageable data that can be processed, searched, indexed, edited, sent, or tabulated. As recognition technologies become more advanced, the true technological challenge in achieving this lies in the ability to address specialized texts and document formats," explained Vadim Tereshchenko, FineReader division vice president. "With FineReader 7.0, we offer new add-on modules offering technology breakthroughs that expand the ability of our software to perform in key vertical markets, as well as niche markets with specific conversion needs."

With the release of ABBYY FineReader Engine 7.0, developers will gain access to the powerful functionality of a high-level OCR system which is already being used by many leading companies worldwide such as Cardiff, Kofax, Lexmark, Panasonic, Toshiba, and ZyLab. ABBYY FineReader, ABBYY's flagship OCR application based on the FineReader Engine, has won more than 100 awards worldwide since 1998.

Platform Enhancements in 7.0

FineReader Engine 7.0 is based on an entirely new recognition platform that offers the following enhancements:

Recognition Accuracy

Enhancements to ABBYY's proprietary IPA technology and other tools for fine-tuning recognition increase FineReader's accuracy significantly over previous versions. A major contributor to the overall letter, word, and line recognition accuracy is the addition of new structural character models. In addition, new image preprocessing algorithms increase the technology's ability to read documents that have text printed over an image, low-contrast documents, and poorly scanned pages. These improvements in accuracy are possible due to further enhancements of the two image preprocessing technologies that aid in recognizing this kind of text: Adaptive Binarization and Intelligent Background Removal. Adaptive Binarization uses a "dynamic" or "intelligent threshold" technique, which tunes the image contrast line by line and word by word, optimizing the characters' quality in order to achieve the most accurate recognition results. Intelligent Background Removal removes textures and background "noise" that could interfere with text recognition, even on complex or degraded documents.
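The "dynamic threshold" idea can be illustrated with a generic local-mean threshold. The sketch below is not ABBYY's proprietary algorithm, which the release does not describe in detail; the function name `adaptive_binarize`, the window `radius` and the `offset` margin are assumptions made for illustration.

```python
def adaptive_binarize(gray, radius=1, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    Illustrative local-mean threshold (an assumption, not ABBYY's method):
    a pixel becomes ink (1) when it is darker than the mean of its local
    window by more than `offset`; otherwise it is background (0).
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(1 if gray[y][x] < local_mean - offset else 0)
        result.append(row)
    return result


# Background fades from light (200) to darker (140). One faint mark (150)
# sits on the light side and one darker mark (100) on the dark side.
# A single global threshold of 128 would keep only the second mark.
page = [
    [200, 190, 180, 170, 160, 150, 140],
    [200, 150, 180, 170, 160, 100, 140],
    [200, 190, 180, 170, 160, 150, 140],
]
binary = adaptive_binarize(page)
```

Because each pixel is compared with its own neighbourhood rather than with one page-wide value, both marks survive even though their absolute brightness differs.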

Improved Document and Image Analysis

FineReader Engine 7.0 offers a new algorithm, Multilevel Document Analysis (MDA), a process that examines the document at various levels -- from characters to words, lines, and paragraphs. Ultimately, FineReader Engine reconstructs the entire document. With this sophisticated document and image analysis algorithm, FineReader Engine "understands" each formatting element on a document. As a result, applications developed using the FineReader Engine will be able to retain complex layouts, such as placement of images and columns on the page, formatting of tables, and font sizing. Other key benefits include improved recognition accuracy of complex tables, multiple-column documents with images, HTML formatting, and bullet points.
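To make the idea of working upward from characters to words and lines concrete, here is a minimal sketch. It is not ABBYY's MDA algorithm; the function `group_characters`, the tuple layout `(char, x, y, width)` and the `gap` parameter are hypothetical, chosen only to show one bottom-up grouping pass.

```python
from collections import defaultdict


def group_characters(chars, gap=5):
    """Group (char, x, y, width) tuples into lines of words.

    Illustrative assumption: characters sharing the same `y` form a line,
    and a horizontal gap wider than `gap` starts a new word.
    """
    lines = defaultdict(list)
    for ch in chars:
        lines[ch[2]].append(ch)

    document = []
    for y in sorted(lines):  # top-to-bottom line order
        words, current, prev_right = [], "", None
        for char, x, _, width in sorted(lines[y], key=lambda c: c[1]):
            if prev_right is not None and x - prev_right > gap:
                words.append(current)  # gap found: close the current word
                current = ""
            current += char
            prev_right = x + width
        words.append(current)
        document.append(words)
    return document


# Characters arrive in arbitrary order, as a recognizer might emit them.
boxes = [
    ("S", 45, 0, 10), ("O", 0, 0, 10), ("7", 0, 20, 10), ("R", 22, 0, 10),
    ("K", 67, 0, 10), ("C", 11, 0, 10), ("D", 56, 0, 10),
]
layout = group_characters(boxes)
```

A real analyzer would also have to handle skewed baselines, columns, tables and images; this sketch covers only the character-to-word-to-line step.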

New Export and Synthesis Capabilities

ABBYY FineReader 7.0 also delivers significant improvements in export and synthesis, which include:

-- Improved PDF Export. FineReader Engine now creates linearized PDF files that are optimized for publishing on the Web.

-- Improved WYSIWYG HTML Output. The retention of complex formatting elements (like text flowing around non-rectangular images) has been improved in HTML. The resulting HTML files are now smaller in size, which is particularly important for documents published on the Internet.

-- Output to Microsoft PowerPoint

-- Smaller file sizes when exporting results to Microsoft Word

New Image Input Formats:

FineReader supports image input of JPEG 2000 files.

Extended Functionality with New Add-On Modules

With the development of FineReader Engine 7.0, ABBYY focuses on fine-tuning its technology to deliver special features and functions that address niche applications. FineReader Engine's add-on modules offer specialized functionality to support software developers, system integrators and VARs working with specific types of projects, documents or files. FineReader Engine 7.0 add-on modules include:

1. PDF Opening

ABBYY FineReader Engine 7.0 uses an intelligent opening scheme for PDF documents. FineReader Engine 7.0 now recognizes PDFs in the following manner: it first extracts the text layer from the PDF file, then takes the original image from the same PDF and performs standard recognition, and finally compares the recognition results against the extracted text. This approach ensures higher recognition accuracy, particularly with PDF documents that have unusually encoded underlying text.
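The comparison step can be sketched with Python's standard `difflib` module. The release does not say how FineReader resolves disagreements, so the decision rule here (trust the text layer when it closely matches the OCR result, otherwise fall back to OCR) is an assumption, as are the name `choose_text` and the `min_similarity` threshold.

```python
from difflib import SequenceMatcher


def choose_text(text_layer, ocr_text, min_similarity=0.8):
    """Return (chosen_text, similarity) for one PDF page.

    Hypothetical rule: if the embedded text layer agrees with the OCR
    result, keep the text layer; if it diverges (for example because of
    an unusual encoding), prefer the OCR result.
    """
    similarity = SequenceMatcher(None, text_layer, ocr_text).ratio()
    if similarity >= min_similarity:
        return text_layer, similarity
    return ocr_text, similarity


# A healthy text layer matches what OCR sees on the page image.
good_choice, good_score = choose_text("Invoice 2003", "Invoice 2003")

# A text layer with a broken encoding shares nothing with the OCR result.
bad_choice, bad_score = choose_text("\x01\x02\x03\x04", "Invoice 2003")
```

The threshold is a tuning knob: a stricter value sends more pages down the OCR path, while a looser one keeps more of the embedded text.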

2. FineReader XIX: Fraktur/Black Letter Script Recognition

FineReader 7.0 offers the industry's first omnifont OCR solution for "Fraktur" or "Black Letter" prints used in historical texts from the 19th and 20th centuries. FineReader will recognize elaborate, calligraphic-type prints as well as old-style roman-type characters, such as the elongated "s" used in early English or French texts. This feature, developed together with the European METAe archiving project, has been tested by leading universities. Well-suited for archiving a variety of old books and documents, the FineReader XIX module includes dictionaries to support German, English, French, Italian, and Spanish.

3. Extended XML Output Module

The Extended XML Output module exports recognition results tagged with document structure information, including the location of graphics, tables, paragraphs and even characters, as well as the full formatting information about characters, paragraphs and tables. Post-recognition processing makes it easy to export this information to external applications, such as document management and content management systems and databases (MS SQL Server, Oracle or MS SharePoint). XML output is offered in the following formats:

-- Native XML (includes all information of the recognized document)

-- Microsoft Word XML. Recognized files can be exported as native XML files using the schema defined by Microsoft Word 2003.

-- ASCII XML Output. A special ASCII XML Output module has been designed for DMS and archiving applications. Resulting files contain information about character positions and character confidence levels and can be easily indexed. The module automatically eliminates those parts of text which have a low confidence level.
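The ASCII XML idea (character positions plus confidence levels, with low-confidence text dropped) can be sketched using Python's standard `xml.etree.ElementTree`. The element and attribute names (`page`, `char`, `left`, `top`, `confidence`), the function `to_ascii_xml` and the cutoff value are hypothetical; they do not reflect ABBYY's actual schema, which the release does not show.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(chars, min_confidence=50):
    """Serialize (char, left, top, confidence) tuples to an XML string.

    Illustrative schema only (an assumption, not ABBYY's format).
    Characters below `min_confidence` are left out of the output.
    """
    page = ET.Element("page")
    for char, left, top, confidence in chars:
        if confidence < min_confidence:
            continue  # drop low-confidence characters
        node = ET.SubElement(
            page,
            "char",
            left=str(left),
            top=str(top),
            confidence=str(confidence),
        )
        node.text = char
    return ET.tostring(page, encoding="unicode")


recognized = [("A", 10, 20, 98), ("B", 22, 20, 31), ("C", 34, 20, 87)]
xml_text = to_ascii_xml(recognized)

# An indexing system can read positions and confidences back out.
root = ET.fromstring(xml_text)
kept = [(node.text, int(node.get("left"))) for node in root]
```

Keeping position and confidence as attributes lets an indexing system decide later which characters to trust, instead of baking that decision into plain text.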

4. Chinese and Japanese Recognition

ABBYY FineReader Engine 7.0 now has an add-on module for Chinese (Traditional and Simplified) and Japanese (Hiragana, Katakana and Kanji) OCR. Seamlessly integrated with the core engine, this module allows developers to use FineReader Engine's existing API (Application Programming Interface) to execute recognition for Chinese and Japanese texts. Functions include: recognition of multi-language documents (Chinese-English and Japanese-English texts), automatic recognition of vertical and horizontal texts, automatic detection of text blocks, tables, columns and pictures on a document, manual drawing of recognition zones, detailed information about recognized characters, and export of recognized text into RTF, XML, HTML, TXT, CSV, and DBF file formats. Companies with pan-Pacific conversion projects can benefit from this module.

5. Document Analysis for Invoices

A special OCR module developed for the financial and banking market segments, Document Analysis for Invoices can be used as a pre-processing engine for the conversion of semi-structured documents such as invoices, payment drafts, checks and transfers. In this pre-processing role, the module is designed to find as much text on these documents as possible, including characters and numbers -- even if this information is located within stamps, logos or small text areas.

In contrast to standard OCR, this specialized OCR module assumes all printed information on documents is text and ensures that important text information is not incorrectly identified as graphic elements and that words or numerical values are not separated into multiple characters. As a result, a maximum of textual information is obtained from the document, including the coordinates, and is available for analysis, field-by-field processing and parsing, which are performed at subsequent processing stages by other systems.

6. OMR (Optical Mark Recognition) Module

The Optical Mark Recognition (OMR) module recognizes simple check marks, grouped check marks, model check marks and check marks with "corrections" made by hand.

7. 2D Barcode Recognition Module (PDF417)

The 2D barcode module recognizes PDF417, the industry standard for 2D barcodes. It is ideal for recognizing and categorizing product labels and packages. PDF417 encodes up to 1.1 kilobytes of data, including text and graphics information.

Specifications

The FineReader Engine SDK consists of a set of DLLs (Dynamic Link Libraries) and an API that conforms to the COM (Component Object Model) standard and is easily accessed with Visual Studio .NET, C/C++, Visual Basic or any other development tool supporting COM components. The FineReader Engine offers complete access to low-level OCR/ICR/OMR/barcode functionality and does not require a graphical user interface. FineReader Engine 7.0 is backward compatible with version 6.0.

ABBYY also offers a version of its OCR development tool kit for the Linux platform. FineReader Engine software development kit for the Linux platform supports a Linux-based programming and operating environment and provides access to ABBYY OCR functionality through an application programming interface (API) and via the Command Line interface.

Trial Version

ABBYY offers a free, fully functional 60-day trial version of ABBYY FineReader Engine 7.0 so that prospective customers can test the Engine under real working conditions without any limitation of functionality. To obtain an evaluation copy, please contact an ABBYY salesperson at www.abbyyusa.com.

Pricing and Availability

ABBYY FineReader Engine 7.0 will be available towards the end of 2003. ABBYY offers flexible pricing options that allow developers to select the type of licensing model that is best suited to their product and sales strategy. For additional product information, visit ABBYY's website at http://www.abbyy.com

About ABBYY USA

ABBYY USA is a member of the ABBYY Software House Group. ABBYY specializes in the development of software for optical character recognition (OCR), intelligent character recognition (ICR), linguistics, semantics, and electronic lexicography. Leading products from ABBYY include the FineReader line of OCR, ICR and barcode software, and FineReader development tools. ABBYY OCR/ICR technologies are licensed by leading companies worldwide. For more information about ABBYY, visit: www.abbyyusa.com or contact ABBYY USA, 3823 Spinnaker Ct., Fremont, CA 94538. Phone: (510) 226-6717. Fax: (510) 226-6069. E-mail: info@abbyyusa.com.
COPYRIGHT 2003 Business Wire
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
