schlitt.info - php, photography and private stuff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Tobias Schlitt
:Date: Wed, 23 Dec 2009 13:19:22 +0100
:Revision: 1
:Copyright: CC by-nc-sa

================================
Convert from and to OpenDocument
================================

:Keywords: opendocument, ODF, eZ Components, PHP, convert, ODT, eZ, tutorial
:Description: Tutorial on how to read, create and style OpenDocument in PHP
    using the eZ Document component from eZ Components.

Yesterday the latest stable release of the `eZ Components`__ project, number
`2009.2`__, was released. For this release I worked on support for
OpenDocumentText__ (ODT) in the Document component. In this article I show you
how to import OpenDocumentText documents and convert them into any of the
formats supported by the component, how to export data into ODT and how to
apply styles to the generated documents. You will also see how ODT and PDF can
be exported with the very same styling information to make them look almost
identical.

__ http://ezcomponents.org/
__ http://ezcomponents.org/resources/news/news-2009-12-21
__ http://en.wikipedia.org/wiki/OpenDocument

Before I get going with the technical stuff, I'd like to thank
`Derick Rethans`__ for his efforts in the eZ Components project and the amazing
cooperation over the past ~4 years at eZ Systems.
`Derick is leaving the company`__ at the end of the year and I want to wish him
all the best for the future and especially for the upcoming changes. It was a
pleasure working with you, mate!

__ http://derickrethans.nl/
__ http://derickrethans.nl/good_bye_ez_systems.php

In this first version with ODT support, the Document component only supports
the FODT (flat ODT) format for import and export. This variant of
OpenDocumentText consists of a single plain XML file, not a ZIP package like
normal ODT files. However, such files should be supported by most
`OpenOffice.org`__ versions and they can even contain images and other media
data. I hope I can implement support for real ODT files as a first step for the
next release.

__ http://openoffice.org

To create an FODT file with OpenOffice.org, simply choose the file format in
the save dialog. If your installation of OpenOffice.org does not offer this
format, check your distribution for a package like
*openoffice.org-filter-binfilter* (Ubuntu). Versions of OpenOffice.org that
support the format will also open existing FODT files. If your desktop
environment did not register the file type correctly, just force it to open
*.fodt* files with OpenOffice.org.

-------------
Importing ODT
-------------

As of version 1.3, the Document component is capable of importing FODT files.
As usual, you can convert the content of such files into the internal format of
the component (`Docbook XML`__) and from there export to any of the other
supported formats.
The following example shows importing an ODT and exporting RST::

    <?php

    // Load the flat ODT file.
    $odt = new ezcDocumentOdt();
    $odt->loadFile( 'example.fodt' );

    // Convert it to the internal Docbook XML representation.
    $docbook = $odt->getAsDocbook();

    // Convert Docbook to reStructuredText and store the result.
    $converter = new ezcDocumentDocbookToRstConverter();
    $rst = $converter->convert( $docbook );

    file_put_contents( 'example.txt', $rst );

    ?>

__ http://www.docbook.org/

In ``$odt`` a new instance of ``ezcDocumentOdt`` is created. This object is
capable of writing, validating and, as seen in the example, reading FODT files.
The method ``getAsDocbook()`` performs the actual conversion to Docbook XML and
returns an instance of ``ezcDocumentDocbook``. You could save the Docbook XML
at this stage or, as shown, go on converting it.

An ``ezcDocumentDocbookToRstConverter`` is used to convert from Docbook to
`reStructuredText`__, a format commonly used for documentation in software
projects. As you might have guessed, ``$rst`` contains an instance of
``ezcDocumentRst``, which is saved in the last line of the snippet. You could
have called ``save()`` on the RST document to get its text content. This is not
necessary here since the magic ``__toString()`` method has the very same
effect.

__ http://docutils.sourceforge.net/rst.html

You can review the `input FODT`__ and the `output RST`__ online to see the
conversion. The import mechanism not only converts the semantic elements
contained in the ODT file, which are quite few, but also performs some
heuristic magic to recognize e.g. emphasized text. I hope I find time to work
some more on this aspect and expose an API for it in the future.

__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/example.fodt
__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/example.txt

--------------
Generating ODT
--------------

The opposite way, generating FODT from a Docbook XML document, is as easy as
the import was. For just a plain conversion, you can use the ``ezcDocumentOdt``
class, import an ``ezcDocumentDocbook`` and save it.
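
The plain conversion just described could look roughly like the following. This
is only a minimal sketch, not code taken from the component documentation: it
assumes that ``ezcDocumentOdt`` offers the same ``createFromDocbook()`` and
``save()`` methods that the other document classes in this article use, and
that the eZ Components autoloader is already registered. The function name
``docbookFileToFodt()`` and the file names are made up for illustration:

```php
<?php

// Assumption: the eZ Components autoloader has been registered before this
// function is called, so the ezcDocument* classes can be loaded on demand.

/**
 * Converts a Docbook XML file into a flat ODT file without using an explicit
 * converter object. Function name and parameters are hypothetical.
 *
 * Returns the number of bytes written, or false on failure.
 */
function docbookFileToFodt( $docbookFile, $fodtFile )
{
    // Read the Docbook XML source document.
    $docbook = new ezcDocumentDocbook();
    $docbook->loadFile( $docbookFile );

    // Let the ODT document import the Docbook document.
    $odt = new ezcDocumentOdt();
    $odt->createFromDocbook( $docbook );

    // save() is assumed to return the serialized document as a string.
    return file_put_contents( $fodtFile, $odt->save() );
}

// Usage (file names are placeholders):
// docbookFileToFodt( 'example.xml', 'example.fodt' );

?>
```

This shortcut gives you no handle on the conversion itself, which is what the
converter class is for.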
For more tuning opportunities, you should rather use an
``ezcDocumentDocbookToOdtConverter``. You'll see why in the next section. ::

    <?php

    // Read the XHTML page, keeping only the article content.
    $xhtml = new ezcDocumentXhtml();
    $xhtml->setFilters( array(
        new ezcDocumentXhtmlElementFilter(),
        new ezcDocumentXhtmlMetadataFilter(),
        new ezcDocumentXhtmlXpathFilter(
            '//div[@id="opensource_blog_0712_scalar_type_hints_in_php"]'
        ),
    ) );
    $xhtml->loadFile( '0712_scalar_type_hints_in_php.html' );

    $docbook = $xhtml->getAsDocbook();

    // Convert Docbook to flat ODT and store the result.
    $converter = new ezcDocumentDocbookToOdtConverter();
    $odt = $converter->convert( $docbook );

    file_put_contents( '0712_scalar_type_hints_in_php.fodt', $odt );

    ?>

This example loads an XHTML file, which I stored locally `from my website`__
using Firefox. First, the XHTML content is read. Since web pages usually
contain more than just the plain content, e.g. navigation and ads, an
additional filter is appended to the ``ezcDocumentXhtml`` instance. The
``ezcDocumentXhtmlXpathFilter`` extracts the nodes identified by the given
XPath expression and uses them as the document content instead of the full
document.

__ /opensource/blog/0712_scalar_type_hints_in_php.html

The loaded XHTML document is again converted to Docbook XML in the same way you
saw when FODT was loaded. Similarly, a converter instance
(``ezcDocumentDocbookToOdtConverter``) performs the conversion to FODT, and
saving the document works the same way, too.

You can find the source article online `in my blog`__ and the
`generated FODT for download`__.

__ /opensource/blog/0712_scalar_type_hints_in_php.html
__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/0712_scalar_type_hints_in_php.fodt

-----------
Styling ODT
-----------

Styling an exported FODT document works almost exactly like PDF styling in the
Document component: using a subset of CSS. The coolest thing about this is that
you can create PDF and ODT from the same source with the same style sheet and
they will look almost identical.
The following example is a bit longish and therefore split into 2 parts. It's
based on the previous example, so I don't repeat the loading of the XHTML.

To apply styling information to an ODT or PDF you need to define a style sheet
(file or string) in a format similar to CSS, the so-called PCSS::

    article {
        font-family: DejaVuSans;
        font-size: 10pt;
        font-weight: normal;
        color: #000000;
    }

    article > section > title {
        color: #444578;
        font-size: 24pt;
        font-family: DejaVuSans;
        font-weight: bold;
    }

    article > section > section > title {
        color: #444578;
        font-size: 20pt;
        font-family: DejaVuSans;
        font-weight: normal;
        border-bottom: 1pt solid #444578;
    }

    /* ... */

    literallayout {
        margin: 10pt 30pt;
        padding: 10pt;
    }

    emphasis {
        color: #444578;
        font-weight: bold;
    }

    ulink {
        color: #444578;
        text-decoration: underline;
    }

The addressing rules are based on Docbook XML elements. This example defines
default formatting for the pages on the ``<article>`` element, which is the
root of the XML. Docbook uses a nesting model to define sections and, with
them, headlines. You therefore need rules like the ones shown to style the
different heading levels. I skipped some of them here to shorten the example.

The ``<literallayout>`` element is commonly used for listings, ``<emphasis>``
is an in-line tag to accent text passages and ``<ulink>`` is used for web
links. As you can see, the formatting rules look exactly like CSS, except that
not all of the style attributes defined in CSS are supported (yet).

To utilize this style sheet, it needs to be loaded during the ODT export. Note
that a default style sheet is always loaded, so your custom style sheet only
needs to redefine what you want to change. ::

    <?php

    // … ($docbook is loaded as in the previous example)

    $converter = new ezcDocumentDocbookToOdtConverter();
    $converter->options->styler->addStylesheetFile( 'ezc.pcss' );

    $odt = $converter->convert( $docbook );

    file_put_contents( '0712_scalar_type_hints_in_php_styled.fodt', $odt );

    // …

    ?>

The ODT converter comes with a default styling mechanism which uses a PCSS
definition. In the future it will be possible to implement custom styling
mechanisms, for whatever reason. To load the style sheet file, you just need to
call the ``addStylesheetFile()`` method on the default styler. This way, you
can also add multiple style sheets, which may override each other's definitions
in the order they get loaded. You know this mechanism from CSS.

That's all you need to do for styling the ODT. `Download the result`__ to
validate its beauty.
:)

__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/0712_scalar_type_hints_in_php_styled.fodt

To render a PDF with the very same style sheet, a little bit more work is
necessary::

    <?php

    // … ($docbook is loaded as in the previous examples)

    $pdf = new ezcDocumentPdf();
    $pdf->options->driver = new ezcDocumentPdfHaruDriver();

    // Register the font files for each font variation used in the PCSS.
    $pdf->options->driver->registerFont(
        'DejaVuSans',
        ezcDocumentPdfHaruDriver::FONT_PLAIN,
        array( '/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf' )
    );
    $pdf->options->driver->registerFont(
        'DejaVuSans',
        ezcDocumentPdfHaruDriver::FONT_BOLD,
        array( '/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans-Bold.ttf' )
    );

    $pdf->loadStyles( 'ezc.pcss' );
    $pdf->createFromDocbook( $docbook );

    file_put_contents( '0712_scalar_type_hints_in_php_styled.pdf', $pdf );

    ?>

This example uses the `libharu driver`__ for rendering the PDF. You need the
`pecl/haru`__ extension installed to reproduce it. Since the PDF converter, in
contrast to ODT, needs to have the actual font files at hand, you need to
register the fonts used in your style sheet.

__ http://ezcomponents.org/docs/api/latest/introduction_Document.html#writing-pdf
__ http://pecl.php.net/package/haru

The method ``registerFont()`` expects the name of the font used in the PCSS,
the variation of the font (e.g. bold) for which the font file is used and a
list of font files that contain the font definition. You can specify multiple
font files here, since some drivers cannot cope with some font formats. Using
TTF is fine with Haru, though. After registering the fonts, you need to load
the style sheet and are ready to render the PDF.

That's all. You can compare the results of the ODT__ and PDF__ exports to see
that they look almost exactly the same.

__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/0712_scalar_type_hints_in_php_styled.fodt
__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/0712_scalar_type_hints_in_php_styled.pdf

.. hint::
   Note: The Haru output driver suffers from a little bug__ which keeps the
   PDF from being rendered at the time this article is written. You can
   `download a patch`__ attached to the bug report to make it work, and I
   expect the issue to be fixed soon after X-mas.

__ http://issues.ez.no/IssueView.php?Id=15987
__ http://issues.ez.no/IssueView.php?Id=15987&ProjectId=1&Anchor=Attachment6881

----------
Conclusion
----------

In the first release of ODT support in the eZ Document component you can import
and export FODT files, which can be read and written by most OpenOffice.org
distributions. You can style exported ODTs using PCSS, a CSS subset, just as
you can when exporting PDF documents.

For future versions of the Document component I want to implement real ODT
support, which should not be that hard since it's basically a matter of
handling the ZIP archives. In addition I want to implement even better and
customizable detection of semantics in ODT, which has very few semantics
included. Another idea is to provide templating for adding headers and footers,
as well as supporting additional style information.

If you want to reproduce the above examples and maybe use them for your own
experiments, you can `download a complete package`__ including sources,
examples and results. More `documentation on the Document component`__ can be
found on the eZ Components website. Feedback is very welcome as a comment to
this blog.

__ http://files.schlitt.info/blog/0716_convert_from_to_opendocument/examples.tar.bz2
__ http://ezcomponents.org/s/Document

In this sense, merry X-Mas and a happy new 2010! Cheers!

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79

Trackbacks
==========

Comments
========

- sardius@SMSDAM at Wed, 30 Dec 2009 02:54:37 +0100

  I just initiated a download , thank you very much for the script conversion
  to and from ODT is what i need now.

- sth at Sun, 17 Jan 2010 10:18:04 +0100

  It Looks fine to me. It will be try it.

- Stefan at Tue, 26 Jan 2010 12:52:07 +0100

  Just makes me angry. An error occured in eZ Components: 'Visitor error:
  Warning: 'You try to convert an invalid docbook document. This may lead to
  invalid output.' in line 0 at position 0.' This happens when trying to
  convert an \*.fodt file as well as when trying to convert (valid!) DocBook
  4.5 or DocBook 5.0 files. It seems the DocBook converter is still a bit
  messy. The ezcDocumentDocbook validateFile method complains about illegal
  elements that are clearly allowed at their position within the document
  (i.e. articleinfo).

- Toby at Tue, 26 Jan 2010 13:05:08 +0100

  Hi Stefan, I'm sorry, but our DocBook handler just handles a sub-set of the
  original DocBook standard. It is mainly meant for internal storage of all
  incoming markup languages. This is also the reason why DocBook is not
  officially listed as a supported format here:
  /opensource/blog/0716_convert_from_to_opendocument We use DocBook as the
  intermediate format for conversions and therefore only need a certain subset
  of markup from it.

- Hans at Sat, 08 Jan 2011 23:23:13 +0100

  Do I need php5 to get it work?