CambridgeDocs Announces PDF XML Converter for Transforming PDF into XML; End User Tool Allows for Low-Cost Conversion of PDF Files to XML, XHTML, XSL:FO and RTF.

Business Editors/High-Tech Writers

CAMBRIDGE, Mass.--(BUSINESS WIRE)--Dec. 9, 2003

CambridgeDocs today announced the release of its PDF XML Converter, a stand-alone utility for those who want to extract and leverage content that is stored in Adobe PDF files. This utility, part of the CambridgeDocs XML Content Backbone, showcases the power of XML as both an intermediate and destination format for documents. The PDF XML Converter is capable of performing diverse transformations from a source PDF file.

"More and more organizations see PDF as a strategic standard for the distribution and consumption of electronic documents. Many companies will only distribute printed information - reports, memos, documentation and invoices," said Rizwan Virk, CTO of CambridgeDocs. "However, it is difficult to get information from PDF files. The PDF XML Converter allows a low-cost way for individual users to extract content out of PDF files and to transform them into another format."

The XML conversion extracts richly formatted XML from a PDF document. The XHTML conversion creates an HTML version of a PDF document, including images, vector graphics and more. The XSL:FO conversion creates an XSL:FO (XSL: Formatting Objects) document from a PDF document.

"XML, XHTML, and XSL:FO are emerging open standards for the representation of text-oriented, or unstructured content. PDF is currently the most popular format used when distributing unstructured content," said Kedron Wolcott, co-founder and VP Engineering of CambridgeDocs. "This utility makes it possible to reuse the content in a PDF document by converting it into XML."

The PDF XML Converter can be used with CambridgeDocs' other XML-related product offerings for a complete desktop-to-enterprise document/content integration strategy. The PDF XML Converter is also able to transform a PDF file into RTF for editing in Microsoft Word.

Adobe PDF (Portable Document Format) has become a de facto standard for sharing printed materials electronically. One of its strengths is its ability to position items at specific points on the page.
Because of this, content published as PDF files has been difficult to edit and modify.

The PDF XML Converter is available for immediate download from the CambridgeDocs website (http://www.cambridgedocs.com). It retails for $495, but is being offered for a special introductory price of $199.

About CambridgeDocs

CambridgeDocs is a leader in the emerging market for XML-based content integration and publishing. This market deals with the integration of legacy content with new XML-based systems (e.g. Content Management, Enterprise Information Portals, EAI, and Web Services) and standards (e.g. DocBook, HRXML, RIXML, IRXML, FPML, DAS-XML, NewsML, and any custom XML schemas/DTDs). To this end, CambridgeDocs provides a technology platform and services for taking existing unstructured and semi-structured internal and external content (e.g. MS Word, HTML, PDF, Quark) and transforming it into "meaningful XML". Once transformed, the content can be made available for delivery through XML-based Web Services, classified and indexed within Enterprise Information Portals, and aggregated, assembled and published in different formats, including support for wireless and mobile devices.

The xDoc Converter is the first step in CambridgeDocs' strategy for providing Content Interoperability via a middleware platform, the CambridgeDocs XML Content Backbone. The CambridgeDocs XML Content Backbone allows for sharing, indexing, migrating, repurposing, republishing and delivery of content between numerous legacy formats and a variety of enterprise content systems.