Chapter W10. An Introduction to XML
 
RPM packages covered in this chapter:
  • perl-XML-Dumper
  • perl-XML-Encoding
  • perl-XML-Grove 
  • perl-XML-Parser
  • perl-XML-Twig
  • docbook-dtd30-sgml
  • docbook-dtd31-sgml
  • docbook-dtd40-sgml
  • docbook-dtd412-xml
  • docbook-dtd41-sgml
  • docbook-dtd41-xml
  • docbook-style-dsssl
  • docbook-utils
  • docbook-utils-pdf
  • openjade
  • tetex
The human mind treats a new idea the way the body treats a strange protein
-- it rejects it
-- P. Medawar.


What is XML?

XML is a markup language derived from SGML and shaped by the experience gained with HTML. SGML documents can be converted to many output formats, such as HTML, DVI, LaTeX, plain text and PostScript.

The X in XML stands for Extensible, so XML stands for Extensible Markup Language. It is extensible because XML lets you define your own customized markup language.

For example, HTML has a fixed tag for bold text:

<b>This paragraph is in BOLD</b>

In XML, we can define our own tags.
 

XML is defined as a simplified subset of SGML (Standard Generalized Markup Language), the international standard metalanguage for text markup systems (ISO 8879).

SGML also uses tags. For example, the following is a valid XML document:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<book>
<title>An Example Book</title>
<titleabbrev>Example</titleabbrev>
<bookinfo>
  <legalnotice><para>No notice is required.</para></legalnotice>
  <author><firstname>Norman</firstname><surname>Walsh</surname></author>
</bookinfo>
<dedication>
<para>
This book is dedicated to you.
</para>
</dedication>
<preface><title>Foreword</title>
<para>
Some content is always required.
</para>
</preface>
<chapter><title>Required Chapter</title>
<para>
At least one chapter, reference, part, or article is required in a book.
</para>
</chapter>
<appendix><title>Optional Appendix</title>
<para>
Appendixes are optional.
</para>
</appendix>
</book>
We can see that the tags in the previous XML document describe a book.

XML with the DocBook DTD is the current reference format for text processing on Linux.
 

First example in XML

All the KDE tips are written in XML. For example, the "tips" shown by the ktip program are stored in an XML file.

Exactly:

<tip category="Konqueror">
<html>
<p>The <b>"Location" label</b> in Konqueror is draggable.</p>
<p>This means you can create shortcuts (e.g. on the desktop or the panel)
by dragging it there with the mouse. You can also drop it on to Konsole or
edit fields to get the URL typed in there (as you can with links or files
displayed in Konqueror).</p>
</html>
</tip>

<tip category="LooknFeel">
<html>
<p>The style of all GUI elements (buttons, edit fields, toolbars and so on)
can be configured in Preferences
(K Menu->Preferences <em>or</em> Control Center)
by selecting LookNFeel -> Style.</p>
<br>
<center><img src="hicolor/48x48/apps/style.png"></center>
</html>
</tip>
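
A quick way to inspect such a tip is to feed it to an XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the tip texts are shortened copies of the ones above, and the helper names is_well_formed and tip_category are invented for this illustration. Note that the second tip, as printed here, uses HTML-style <br> and <img> tags without closing them, so a strict XML parser is expected to reject it.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the first tip: every tag is properly closed.
FIRST_TIP = """<tip category="Konqueror">
<html>
<p>The <b>"Location" label</b> in Konqueror is draggable.</p>
</html>
</tip>"""

# Shortened copy of the second tip: <br> and <img> are never closed.
SECOND_TIP = """<tip category="LooknFeel">
<html>
<p>The style of all GUI elements can be configured in Preferences.</p>
<br>
<center><img src="hicolor/48x48/apps/style.png"></center>
</html>
</tip>"""


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def tip_category(text):
    """Return the value of the category attribute on the root <tip>."""
    return ET.fromstring(text).get("category")


if __name__ == "__main__":
    print(is_well_formed(FIRST_TIP))
    print(is_well_formed(SECOND_TIP))
    print(tip_category(FIRST_TIP))
```

Writing <br/> and <img .../> in the second tip would make it acceptable to the same parser.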
 

Another example document, this one from the Qt XML documentation:

<document>
<book>
  <title>Practical XML</title>
  <author title="Ms" name="Eris Kallisti"/>
  <chapter>
    <title>A Namespace Called fnord</title>
  </chapter>
</book>
</document>
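
To show how a program reads elements and attributes from such a document, here is a minimal sketch in Python with the standard xml.etree.ElementTree module (the Qt documentation itself uses the Qt XML classes, which are not shown here). The function name summarize is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<document>
<book>
  <title>Practical XML</title>
  <author title="Ms" name="Eris Kallisti"/>
  <chapter>
    <title>A Namespace Called fnord</title>
  </chapter>
</book>
</document>"""


def summarize(xml_text):
    """Collect the book title, the author and the chapter titles."""
    root = ET.fromstring(xml_text)      # root is the <document> element
    book = root.find("book")            # first <book> child
    author = book.find("author")        # empty element that carries attributes
    return {
        "title": book.findtext("title"),
        "author": f'{author.get("title")} {author.get("name")}',
        "chapters": [c.findtext("title") for c in book.findall("chapter")],
    }


if __name__ == "__main__":
    print(summarize(DOC))
```

Element content (the titles) is read with findtext, while the data stored in attributes (the author's title and name) is read with get.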
 

What is DocBook?

DocBook, or more precisely the DocBook DTD, is a set of declarations, that is, a Document Type Definition (DTD), on which an SGML document bases its structure.

This DTD is an inventory of tag names developed by a team. The definitions also include the semantics of each tag.

The DocBook DTD was originally developed by HaL Computer Systems and O'Reilly & Associates around 1991. The specification moved to the Davenport Group in 1994, and then to its current home: OASIS, the Organization for the Advancement of Structured Information Standards.

The latest version is 4.1.2.

Where are these DocBook DTD definitions located?

On a RedHat system, they are installed in "/usr/share/sgml/docbook/xml-dtd-4.1.2/docbookx.dtd" and belong to the package docbook-dtd412-xml:

[root@ftosx1 root]# rpm -ql docbook-dtd412-xml
/usr/share/doc/docbook-dtd412-xml-1.0
/usr/share/doc/docbook-dtd412-xml-1.0/40chg.txt
/usr/share/doc/docbook-dtd412-xml-1.0/41chg.txt
/usr/share/doc/docbook-dtd412-xml-1.0/ChangeLog
/usr/share/doc/docbook-dtd412-xml-1.0/readme.txt
/usr/share/sgml/docbook/xml-dtd-4.1.2
/usr/share/sgml/docbook/xml-dtd-4.1.2/calstblx.dtd
/usr/share/sgml/docbook/xml-dtd-4.1.2/catalog
/usr/share/sgml/docbook/xml-dtd-4.1.2/dbcentx.mod
/usr/share/sgml/docbook/xml-dtd-4.1.2/dbgenent.mod
/usr/share/sgml/docbook/xml-dtd-4.1.2/dbhierx.mod
/usr/share/sgml/docbook/xml-dtd-4.1.2/dbnotnx.mod
/usr/share/sgml/docbook/xml-dtd-4.1.2/dbpoolx.mod
/usr/share/sgml/docbook/xml-dtd-4.1.2/docbook.cat
/usr/share/sgml/docbook/xml-dtd-4.1.2/docbookx.dtd
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-amsa.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-amsb.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-amsc.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-amsn.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-amso.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-amsr.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-box.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-cyr1.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-cyr2.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-dia.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-grk1.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-grk2.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-grk3.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-grk4.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-lat1.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-lat2.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-num.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-pub.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/ent/iso-tech.ent
/usr/share/sgml/docbook/xml-dtd-4.1.2/soextblx.dtd
[root@ftosx1 root]#

On the Web, the files are available on the OASIS web site, in the DocBook pages: http://www.oasis-open.org/docbook/xml/4.1.2/

DocBook documents are therefore SGML documents that conform to the DocBook DTD. The DocBook specification is related to the XML specification, but the two are different: XML defines the syntax, while the DocBook DTD defines one particular set of tags.

A Perl script is available to transform a DTD into an XML Schema: A Conversion Tool from DTD to XML Schema

A minimal DocBook book

Generally, DocBook DTD documents start with the line:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
 

We list here a very simple document:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN"[]>

<book id="FTLinuxCourse">
 <bookinfo>
  <title>Fast Training Linux Course</title>

  <authorgroup>
   <author>
    <firstname>Giovanni</firstname>
    <surname>Orlando</surname>
    <affiliation>
     <address>
      <email>gorlando@futuretg.com</email>
     </address>
    </affiliation>
   </author>
  </authorgroup>

  <copyright>
   <year>2001</year>
   <holder>Future Technologies</holder>
  </copyright>

  <legalnotice>
   <para>
     This documentation is copyrighted by Future Technologies Inc.
   </para>
  </legalnotice>
 </bookinfo>

<toc></toc>

  <chapter id="intro">
      <title>Introduction</title>
  <para>
        FTLinuxCourse is a web-based training course for RedHat Linux and
        other distributions.
        It is based on the idea of offering open material (HTML files), so that
        you learn while browsing the FastTrainingLinuxCourse, which is full of
        sources, shell scripts and files, while also browsing the local Linux filesystem
        and the Web for the original ideas and documents written by the creators of
        Linux and its components: Tcl, C, C++, Expect, the Linux kernel, the X Window System,
        the Qt libraries, KDE, GNOME, Motif, Perl, Python, HTML, PHP, and in general
        any possible Linux subject, in a well-organized training course available as six
        courses: BASE, WebMaster, X Window, Networking, Programming and System Administration.
  </para>

    <sect1>
      <title>What does FTLinuxCourse cover?</title>
      <para>
        FTLinuxCourse covers all imaginable Linux topics, such as programming languages:
        <function>C</function>, <function>C++</function>,
        <function>Tcl</function>, <function>Tk</function>,
        <function>Xt</function> Programming, <function>Motif</function> Programming,
        <function>Qt</function> Programming to develop KDE Applications,
        <function>Gtk</function> Programming to develop GNOME Applications,
        <function>Bash</function> Shell Programming and other SysAdm
        shell script programming like <function>Expect</function>.
      </para>
    </sect1>

  </chapter>

</book>
[root@ftosx1 root]#

The previous document has its sections (<sect1>) and paragraphs (<para>) inside a chapter (<chapter>), which is in turn inside the book (<book>) definition.
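
The nesting just described can be made visible with a few lines of code. The following minimal sketch uses Python's standard xml.etree.ElementTree module on a reduced copy of the document; it treats the fragment as XML and leaves out the SGML DOCTYPE line, which that parser is not meant to handle. The function name outline and the shortened paragraph texts are assumptions for this illustration only.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the book above, without the DOCTYPE line and <bookinfo>.
SAMPLE = """<book id="FTLinuxCourse">
  <chapter id="intro">
    <title>Introduction</title>
    <para>FTLinuxCourse is a web-based training course.</para>
    <sect1>
      <title>What does FTLinuxCourse cover?</title>
      <para>Programming languages and system administration.</para>
    </sect1>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return one line per element, indented by its nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SAMPLE))))
```

Each level of indentation in the printed outline corresponds to one level of tag nesting in the document.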

To "compile" this SGML document, we run the docbook2html (db2html) command:

[root@ftosx1 root]# db2html ftlinuxcourse.sgml
output is ftlinuxcourse
Using catalogs: /etc/sgml/sgml-docbook-3.1.cat
Using stylesheet: /usr/share/sgml/docbook/utils-0.6.9/docbook-utils.dsl#html
Working on: /root/ftlinuxcourse.sgml
Done.
[root@ftosx1 root]#

This process creates a directory with the name of the file.

Therefore a directory called ftlinuxcourse will be created, containing the necessary files in the requested format.

These files have a very simple format.

DocBook also includes tags to create books, articles and other similar documents.

For a book, we can have the following format:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<book>
<bookinfo>
<title>My First Book</title>
<author><firstname>Jane</firstname><surname>Doe</surname></author>
<copyright><year>1998</year><holder>Jane Doe</holder></copyright>
</bookinfo>
<preface><title>Foreword</title> ... </preface>
<chapter> ... </chapter>
<chapter> ... </chapter>
<chapter> ... </chapter>
<appendix> ... </appendix>
<appendix> ... </appendix>
<index> ... </index>
</book>
 

Using jade

Jade is an implementation of the DSSSL style language; it is the tool needed to convert DocBook documents into the different output formats.

The name Jade stands for James' DSSSL Engine; James is James Clark.

On Linux, jade is a symbolic link to openjade.

[root@ftosx1 root]# file /usr/bin/jade
/usr/bin/jade: symbolic link to openjade
[root@ftosx1 root]# file /usr/bin/openjade
/usr/bin/openjade: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped
 

DSSSL (Document Style Semantics and Specification Language) is an International Standard, ISO/IEC 10179:1996, for specifying document transformation and formatting in a platform- and vendor-neutral manner. DSSSL can be used with any document format for which a property set can be defined according to the Property Set Definition Requirements of ISO/IEC 10744. In particular, it can be used to specify the presentation of documents marked up according to ISO 8879:1986, Standard Generalized Markup Language (SGML).

Jade is simply a program, developed by James Clark, that processes documents according to DSSSL style sheets.

Jade is used in Linux ...

[root@ftosx1 root]# jade --help
jade:E: invalid option "-"
jade:I: usage is "jade [-vCegG2s] [-b encoding] [-f error_file] [-c catalog_sysid] [-D dir] [-a link_type] [-A arch] [-E max_errors] [-i entity] [-w warning_type] [-d dsssl_spec] [-V variable[=value]] [-t (fot|rtf|tex|mif|sgml|xml)] [-o output_file] sysid..."
[root@ftosx1 root]#

For example, the Maximum RPM book is compiled with commands like:

/usr/bin/jade -d ../redhat-html.dsl -t sgml -ihtml -V html-index ../max-rpm.sgml ;
 

/usr/bin/jade -d ../redhat-html.dsl -t sgml -ihtml ../max-rpm.sgml
 

These commands require the dsl file (that is, the DSSSL definition) that RedHat provides in redhat-html.dsl.

Jade and DSSSL may be used to publish DocBook documents. You can consult the section: Using Jade and DSSSL to Publish DocBook Documents

While DocBook does not include a program to create TeX files, jade does!

Jade also has a TeX companion, called jadetex, which is a symbolic link to virtex.

[root@ftosx1 root]# file /usr/bin/jadetex
/usr/bin/jadetex: symbolic link to virtex
[root@ftosx1 root]#

Linux Documentation in DocBook format

After the DocBook release, the Linux community moved slowly from the older SGML formats to the DocBook format.

All Linux documentation is available in this format.

[root@ftosx1 src]# ls -al
total 145307
drwxr-xr-x    5 root     root          200 Oct  5 18:20 .
drwxr-xr-x   18 root     root          396 Sep 18 15:34 ..
lrwxrwxrwx    1 root     root           12 Oct  5 18:20 linux -> linux-2.4.10
drwxr-xr-x   14 1046     wine          463 Sep 23 19:37 linux-2.4.10
-rw-r--r--    1 root     root     124835840 Sep 26 09:28 linux-2.4.10.tar
drwxr-xr-x   16 root     root          675 Oct  5 13:06 linux-2.4.7-2
-rw-r--r--    1 root     root     23952949 Sep 13 13:00 linux-2.4.7-2.tgz
drwxr-xr-x    7 root     root          141 Sep  8 21:55 redhat
[root@ftosx1 src]#
 

In the Linux source tree, the current DocBook documentation is available in the directory:

[root@ftosx1 DocBook]# pwd
/usr/src/linux/Documentation/DocBook
[root@ftosx1 DocBook]# ls
deviceiobook.tmpl    Makefile           parport-multi.fig      procfs-guide.tmpl  videobook.tmpl
kernel-api.tmpl      mcabook.tmpl       parport-share.fig      sis900.tmpl        wanbook.tmpl
kernel-hacking.tmpl  mousedrivers.tmpl  parport-structure.fig  tulip-user.tmpl    z8530book.tmpl
kernel-locking.tmpl  parportbook.tmpl   procfs_example.c       via-audio.tmpl
[root@ftosx1 DocBook]#

The DocBook format is so important that the Linux Makefile also supports it, and allows you to generate SGML, HTML, PostScript and PDF files.

[root@ftosx1 DocBook]# cd ../..
[root@ftosx1 linux]# pwd
/usr/src/linux
[root@ftosx1 linux]# more Makefile
...
sgmldocs:
        chmod 755 $(TOPDIR)/scripts/docgen
        chmod 755 $(TOPDIR)/scripts/gen-all-syms
        chmod 755 $(TOPDIR)/scripts/kernel-doc
        $(MAKE) -C $(TOPDIR)/Documentation/DocBook books

psdocs: sgmldocs
        $(MAKE) -C Documentation/DocBook ps

pdfdocs: sgmldocs
        $(MAKE) -C Documentation/DocBook pdf

htmldocs: sgmldocs
        $(MAKE) -C Documentation/DocBook html
...

For example, to create the HTML version of the Linux docs, you simply need to run the command below.

It is convenient to remove lines 16457 to 16549 in the file Documentation/DocBook/kernel-api.sgml and lines 227-9 in the file Documentation/DocBook/deviceiobook.sgml before running the command.

[root@ftosx1 linux]# make htmldocs

After the compilation we will find all the HTML files in their respective directories:

[root@ftosx1 linux]# cd Documentation/DocBook/deviceiobook
[root@ftosx1 deviceiobook]# ls -al
total 39
drwxr-xr-x    3 root     root          293 Oct  5 20:52 .
drwxr-xr-x   16 root     root         1573 Oct  5 20:52 ..
-rw-r--r--    1 root     root         1717 Oct  5 20:52 bugs.html
-rw-r--r--    1 root     root         2181 Oct  5 20:52 c77.html
-rw-r--r--    1 root     root         2266 Oct  5 20:52 doingio.html
-rw-r--r--    1 root     root         1897 Oct  5 20:52 intro.html
-rw-r--r--    1 root     root         1890 Oct  5 20:52 ln21.html
-rw-r--r--    1 root     root         2960 Oct  5 20:52 mmio.html
drwxr-xr-x    2 root     root          345 Oct  5 20:51 stylesheet-images
-rw-r--r--    1 root     root         3856 Oct  5 20:52 x44.html
-rw-r--r--    1 root     root         2551 Oct  5 20:52 x65.html
-rw-r--r--    1 root     root
[root@ftosx1 deviceiobook]#
 

The Linux source tree includes the tmpl documents.

[root@ftosx1 DocBook]# more deviceiobook.tmpl
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN"[]>

<book id="DoingIO">
 <bookinfo>
  <title>Bus-Independent Device Accesses</title>

  <authorgroup>
   <author>
    <firstname>Matthew</firstname>
    <surname>Wilcox</surname>
    <affiliation>
     <address>
      <email>matthew@wil.cx</email>
     </address>
    </affiliation>
   </author>
  </authorgroup>
...
 <legalnotice>
   <para>
     This documentation is free software; you can redistribute
     it and/or modify it under the terms of the GNU General Public
     License as published by the Free Software Foundation; either
     version 2 of the License, or (at your option) any later
     version.
   </para>
...
[root@ftosx1 DocBook]#
 

The Makefile works as follows:

First the files are transformed to SGML ...

deviceiobook.sgml: deviceiobook.tmpl
        $(TOPDIR)/scripts/docgen <deviceiobook.tmpl >deviceiobook.sgml
 

... and then to HTML

%:      %.sgml
        @(which db2html > /dev/null 2>&1) || \
         (echo "*** You need to install DocBook stylesheets ***"; \
          exit 1)
        -$(RM) -r $@
        db2html $<
        if [ ! -z "$(JPG-$@)" ]; then cp $(JPG-$@) $@; fi
 

The program name db2html stands for docbook2html.

In the same way, you can run the following commands from the Linux kernel directory:

[root@ftosx1 linux]# make psdocs

and

[root@ftosx1 linux]# make pdfdocs

These commands call the catalogs and the stylesheets necessary to create the documents:

Using catalogs: /etc/sgml/sgml-docbook-3.1.cat
Using stylesheet: /usr/share/sgml/docbook/utils-0.6.9/docbook-utils.dsl#print

DocBook benefits

The previous example illustrates the DocBook benefits well. We can write a book or a project using the DocBook specification and obtain valid documents in HTML, PDF, PS and plain TXT.

FTLinuxCourse Complete includes DocBook: The Definitive Guide.
 

StarOffice 6.0 XML

The latest StarOffice suite supports a new XML-based format. All new

In detail,

Of course, all applications support this XML format, including StarWriter, StarImpress and the other programs.

Internally, a scalc document (like any other document) has the following format:

...
<!DOCTYPE office:document-meta PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "office.dtd">
<office:document-meta xmlns:office="http://openoffice.org/2000/office" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="http://openoffice.org/2000/meta" office:version="1.0">
 <office:meta>
  <meta:generator>OpenOffice.org 638 (Linux)</meta:generator><!--638h(Build:7133)-->
  <meta:initial-creator>Giovanni Orlando</meta:initial-creator>
  <meta:creation-date>2000-09-26T13:36:13</meta:creation-date>
  <dc:date>2001-09-12T10:04:02</dc:date>
  <meta:print-date>2001-09-10T19:51:36</meta:print-date>
  <dc:language>en-US</dc:language>
  <meta:editing-cycles>953</meta:editing-cycles>
  <meta:editing-duration>P6DT1H43M35S</meta:editing-duration>
  <meta:user-defined meta:name="Info 0"/>
  <meta:user-defined meta:name="Info 1"/>
  <meta:user-defined meta:name="Info 2"/>
  <meta:user-defined meta:name="Info 3"/>
  <meta:document-statistic meta:table-count="3" meta:cell-count="261"/>
 </office:meta>
...
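
The listing above uses XML namespaces (the office:, meta: and dc: prefixes). As a minimal sketch of how a program could read such metadata, the following Python code uses the standard xml.etree.ElementTree module on a shortened copy of the listing, without the DOCTYPE line. The function name read_meta and the choice of fields are assumptions made for this example; this is not how StarOffice itself reads its files.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, copied from the xmlns declarations in the listing.
NS = {
    "office": "http://openoffice.org/2000/office",
    "dc": "http://purl.org/dc/elements/1.1/",
    "meta": "http://openoffice.org/2000/meta",
}

# Shortened copy of the metadata shown above.
META_XML = """<office:document-meta xmlns:office="http://openoffice.org/2000/office" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="http://openoffice.org/2000/meta" office:version="1.0">
 <office:meta>
  <meta:generator>OpenOffice.org 638 (Linux)</meta:generator>
  <meta:initial-creator>Giovanni Orlando</meta:initial-creator>
  <dc:language>en-US</dc:language>
  <meta:document-statistic meta:table-count="3" meta:cell-count="261"/>
 </office:meta>
</office:document-meta>"""


def read_meta(xml_text):
    """Extract a few metadata fields from a namespaced document."""
    root = ET.fromstring(xml_text)
    meta = root.find("office:meta", NS)
    stats = meta.find("meta:document-statistic", NS)
    # Namespaced attributes are stored under "{namespace-uri}name" keys.
    table_key = "{%s}table-count" % NS["meta"]
    return {
        "generator": meta.findtext("meta:generator", namespaces=NS),
        "language": meta.findtext("dc:language", namespaces=NS),
        "tables": int(stats.get(table_key)),
    }


if __name__ == "__main__":
    print(read_meta(META_XML))
```

The prefixes themselves are only shorthand; what identifies an element is the namespace URI, which is why the lookup goes through the NS map.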

StarOffice 6.0 is now split into separate programs.

This program is swriter.

We can access this program by running soffice, or from any other program with File->New.

This is "scalc" (StarCalc), the normal spreadsheet.
This is sdraw (StarDraw).

You can draw flow charts with it.

This is simpress, for presentations.

Other old and new programs are also included, like "smath" to write formulae, "sweb" to write Web pages and "smaster" to write books.

Minimal database support is present:

The standard New menu

The New menu continues to be the same standard menu. With this choice any program can open any "office" file type.
 
 






In this sense, every StarOffice program is soffice. Each program opens a new task for the request.

StarWriter also offers support for scanners.

KOffice also offers this support, but StarOffice has more than

With this choice StarOffice is not only one of the lightest UNIX office suites available today, but also offers
 

Exercises

  1. Modify the first example using a form with three buttons, one for each command.
Tests
  1. What does XML mean? What does the X in XML mean?
  2. In what language was XML written?
  3. List two reasons for XML popularity.
  4. List two recent applications that adopt XML as a standard language.
  5. Is it possible to get a phrase in bold using XML?
  6. What is DocBook?
  7. Is DocBook the standard format for Linux documentation?
  8. What does jade do?
  9. Is it possible to transform files from SGML to TeX with Jade? ... from SGML to HTML?
  10. Is it possible to generate HTML files from DocBook files?
  11. What is a DTD?
  12. Are DTD and XML equivalent?
  13. Why is jade used, when there are different docbook commands?
  14. What is DSSSL?
  15. Is it possible to transform DocBook files to TeX?
  16. Can Jade transform the files to TeX?

 

Read the answers to the exercises.
 

Check the Interactive Exam Cram WebMaster: Try the interactive cram ...
 
 

Internet Resources for this Chapter.