CambridgeDocs Announces Version 2.01 of the PDF-XML Converter; Integration of PDF-XML Converter into xDoc Converter platform allows unlocking PDF content in batch and server modes.

BOSTON -- CambridgeDocs Corp. (www.cambridgedocs.com) today announced the release of Version 2.01 of its xDoc PDF-XML Converter and its integration into the xDoc Converter Desktop and Server products, significantly enhancing an already powerful platform for extracting document content to meaningful XML.

The PDF file format is widely used because it combines content security with high-fidelity document rendering. Its drawback is that the very same mechanisms that protect PDF source content also make that content exceptionally difficult to update, index or share with other document systems as anything other than closed, un-interpretable PDF files. CambridgeDocs' PDF-XML Converter overcomes these issues by enabling PDF content to be converted to XML. As XML, content that was previously locked in PDF can be meaningfully used for indexing by search engines, XML repositories and content management systems -- for example, allowing it to be stored as chapters, sections, tables or cells within any repository for fast, easy and accurate re-use.
"Integration of the PDF-XML Converter into the java based xDoc Converter Server gives us a previously un-reachable degree of flexibility in managing our clients PDF content," says Spencer Ewald, President of NXTBook Media NXTbook Media is a digital publishing company that provides the creation, distribution, and tracking of digital magazines and other digital publications. NXTbooks use Adobe Flash and XML to display digital publications both online and offline. . "As we extend our client's reach into their customer base by presenting their content with the look and feel of actual magazine and catalog pages, we are now able to provide high-lighted search results within the actual images of the original content, combining meaningful access to PDF data as well as the layout the designers originally had in mind." The xDoc PDF-XML Converter extracts PDF content to XML and provides best-of-breed functionality for enabling conversion that yields: --Stylistic XML, including format, layout and content information --Extraction of financial data --Organization of related XML "chunks", such as financial tables --Compatibility with existing target XML schemas or DTD's, such as Docbook or DITA --Conversion to HTML/XHTML, with visual information than surpasses even Google's "view as HTML HTML in full HyperText Markup Language Markup language derived from SGML that is used to prepare hypertext documents. Relatively easy for nonprogrammers to master, HTML is the language used for documents on the World Wide Web. " functionality --Conversion to simple text Version 2.01 adds the PDF-XML Converter as a special module in the xDoc Converter 2.01 platform and includes sample conversions of PDF documents into a variety of XML formats, such as Docbook and DITA. The release also adds a new and improved user interface, called the TableDef interface for extracting financial data using positioning and textual clues. 
The integration of the PDF-XML Converter into the xDoc Converter enables easy access to its functionality by consolidating download, installation and licensing processes. It also provides access to xDoc's rich Visual Mapping tool and works with xDoc's Adobe(R) Acrobat(R) plug-in. The PDF-XML Conversion functionality is available for download now at www.cambridgedocs.com/downloads.htm.

About CambridgeDocs

CambridgeDocs is a leader in the emerging market for XML-based content integration. This market deals with the integration of legacy content with new XML-based systems (e.g. Content Management, Enterprise Information Portals, EAI, and Web Services) and standards (e.g. DocBook, DITA, XBRL, S1000D, MIL-3001, SPL, HRXML, RIXML, FPML, NewsML, or any custom XML schema or DTD). Towards this end, CambridgeDocs provides a technology platform and services for taking existing unstructured and semi-structured internal and external content (e.g. MS Word, HTML, PDF, Quark) and transforming it into "meaningful XML".
Once transformed, the content can be made available for delivery through XML-based Web Services, classified and indexed within Enterprise Information Portals, and aggregated, assembled and published in multiple formats, including formats for wireless and mobile devices.