Chapter 4. XmlStarlet Reference

1. Querying XML documents

The XmlStarlet 'select' (or 'sel') option can be used to query or search XML documents. Here is the synopsis for the 'xml sel' command:

XMLStarlet Toolkit: Select from XML document(s)
Usage: xml sel <global-options> {<template>} [ <xml-file> ... ]
where
  <global-options> - global options for selecting
  <xml-file> - input XML document file name/uri (stdin is used if missing)
  <template> - template for querying XML document with following syntax:

<global-options> are:
  -Q or --quiet             - do not write anything to standard output.
  -C or --comp              - display generated XSLT
  -R or --root              - print root element <xsl-select>
  -T or --text              - output is text (default is XML)
  -I or --indent            - indent output
  -D or --xml-decl          - do not omit xml declaration line
  -B or --noblanks          - remove insignificant spaces from XML tree
  -E or --encode <encoding> - output in the given encoding (utf-8, unicode...)
  -N <name>=<value>         - predefine namespaces (name without 'xmlns:')
                              ex: xsql=urn:oracle-xsql
                              Multiple -N options are allowed.
  --net                     - allow fetching DTDs or entities over the network
  --help                    - display help

Syntax for templates: -t|--template <options>
where <options>
  -c or --copy-of <xpath>   - print copy of XPATH expression
  -v or --value-of <xpath>  - print value of XPATH expression
  -o or --output <string>   - output string literal
  -n or --nl                - print new line
  -f or --inp-name          - print input file name (or URL)
  -m or --match <xpath>     - match XPATH expression
  --var <name> <value> --break or
  --var <name>=<value>      - declare a variable (referenced by $name)
  -i or --if <test-xpath>   - check condition <xsl:if test="test-xpath">
  --elif <test-xpath>       - check condition if previous conditions failed
  --else                    - check if previous conditions failed
  -e or --elem <name>       - print out element <xsl:element name="name">
  -a or --attr <name>       - add attribute <xsl:attribute name="name">
  -b or --break             - break nesting
  -s or --sort op xpath     - sort in order (used after -m) where
    op is X:Y:Z,
      X is A - for order="ascending"
      X is D - for order="descending"
      Y is N - for data-type="number"
      Y is T - for data-type="text"
      Z is U - for case-order="upper-first"
      Z is L - for case-order="lower-first"

There can be multiple --match, --copy-of, --value-of, etc. options
in a single template. The effect of applying command-line templates
can be illustrated with the following XSLT analogue:

xml sel -t -c "xpath0" -m "xpath1" -m "xpath2" -v "xpath3" \
        -t -m "xpath4" -c "xpath5"

is equivalent to applying the following XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <xsl:call-template name="t1"/>
  <xsl:call-template name="t2"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:copy-of select="xpath0"/>
  <xsl:for-each select="xpath1">
    <xsl:for-each select="xpath2">
      <xsl:value-of select="xpath3"/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>
<xsl:template name="t2">
  <xsl:for-each select="xpath4">
    <xsl:copy-of select="xpath5"/>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

The 'select' option basically lets you avoid writing an XSLT stylesheet to perform queries on XML documents: various combinations of command-line parameters generate an XSLT stylesheet and apply it to XML documents with a single command. Very often you do not really care what XSLT was created for you by the 'select' command, but when you do, use the -C or --comp switch to see exactly which XSLT is applied to your input.

The 'select' option supports many EXSLT functions in XPath expressions (the generated stylesheet declares prefixes such as math:, str:, date: and set: for them).

Here are a few examples that will help you understand how 'xml select' works:

EXAMPLE:

Count elements matching an XPath expression:

xml sel -t -v "count(/xml/table/rec/numField)" table.xml

Input (table.xml):

<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>String Value</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
    <rec id="3">
      <numField>-23</numField>
      <stringField>stringValue</stringField>
    </rec>
  </table>
</xml>

Output:

3

Let's take a closer look at what it did internally. For that we will use the '-C' option:

$ xml sel -C -t -v "count(/xml/table/rec/numField)"
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:exslt="http://exslt.org/common"
 xmlns:math="http://exslt.org/math"
 xmlns:date="http://exslt.org/dates-and-times"
 xmlns:func="http://exslt.org/functions"
 xmlns:set="http://exslt.org/sets"
 xmlns:str="http://exslt.org/strings"
 xmlns:dyn="http://exslt.org/dynamic"
 xmlns:saxon="http://icl.com/saxon"
 xmlns:xalanredirect="org.apache.xalan.xslt.extensions.Redirect"
 xmlns:xt="http://www.jclark.com/xt"
 xmlns:libxslt="http://xmlsoft.org/XSLT/namespace"
 xmlns:test="http://xmlsoft.org/XSLT/"
 extension-element-prefixes=
   "exslt math date func set str dyn saxon xalanredirect xt libxslt test"
 exclude-result-prefixes="math str">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:value-of select="count(/xml/table/rec/numField)"/>
</xsl:template>
</xsl:stylesheet>

Leaving out the XML declaration and the extension namespace declarations for brevity:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:value-of select="count(/xml/table/rec/numField)"/>
</xsl:template>
</xsl:stylesheet>

Every -t option is mapped to a named XSLT template. Options after '-t' are mapped to XSLT elements:

  • -v to <xsl:value-of>

  • -c to <xsl:copy-of>

  • -e to <xsl:element>

  • -a to <xsl:attribute>

  • -s to <xsl:sort>

  • -m to <xsl:for-each>

  • -i to <xsl:if>

  • and so on

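By default, subsequent options (for instance, a second -m) produce XSLT elements nested inside the preceding ones (<xsl:for-each> for '-m'). To break this nesting, put '-b' or '--break' after the options that should stay nested: each -b closes the most recently opened element, so the options that follow continue at the enclosing level.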
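The mapping and nesting rules above can be made concrete with a small generator. This is a hypothetical Python sketch, not XmlStarlet's own code: it handles only -t, -m, -i, -v, -c, -o, -n and -b, keeps a stack of open elements so that -b closes the innermost one, and renders -o as <xsl:text>, which may differ from what XmlStarlet actually emits. The names options_to_xslt and close_innermost are invented for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# Options that open an XSLT element; later options nest inside it until -b.
OPENERS = {"-m": ("xsl:for-each", "select"), "-i": ("xsl:if", "test")}
# Options that emit a single, self-contained XSLT element.
LEAVES = {"-v": ("xsl:value-of", "select"), "-c": ("xsl:copy-of", "select")}
NEWLINE = "<xsl:value-of select=\"'&#10;'\"/>"


def close_innermost(body, stack):
    """Close the most recently opened element at the indent it was opened at."""
    depth = len(stack)
    body.append("  " * depth + stack.pop())


def options_to_xslt(args):
    """Hypothetical sketch: map a subset of 'xml sel' template options to XSLT."""
    templates = []  # one list of output lines per -t
    stack = []      # closing tags of the -m / -i elements still open
    it = iter(args)
    for opt in it:
        if opt == "-t":
            while stack:  # a new -t closes everything left open by the previous one
                close_innermost(templates[-1], stack)
            templates.append([])
            continue
        if not templates:
            raise ValueError("template options must follow -t")
        body = templates[-1]
        indent = "  " * (len(stack) + 1)
        if opt in OPENERS:
            elem, attr = OPENERS[opt]
            body.append(f"{indent}<{elem} {attr}={quoteattr(next(it))}>")
            stack.append(f"</{elem}>")
        elif opt in LEAVES:
            elem, attr = LEAVES[opt]
            body.append(f"{indent}<{elem} {attr}={quoteattr(next(it))}/>")
        elif opt == "-o":
            # rendered as xsl:text in this sketch
            body.append(f"{indent}<xsl:text>{escape(next(it))}</xsl:text>")
        elif opt == "-n":
            body.append(indent + NEWLINE)
        elif opt == "-b":
            if not stack:
                raise ValueError("-b with nothing to close")
            close_innermost(body, stack)
        else:
            raise ValueError(f"option not handled by this sketch: {opt}")
    while stack:
        close_innermost(templates[-1], stack)

    out = ['<xsl:stylesheet version="1.0" '
           'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
           '<xsl:template match="/">']
    out += [f'  <xsl:call-template name="t{i}"/>'
            for i in range(1, len(templates) + 1)]
    out.append("</xsl:template>")
    for i, body in enumerate(templates, 1):
        out.append(f'<xsl:template name="t{i}">')
        out += body
        out.append("</xsl:template>")
    out.append("</xsl:stylesheet>")
    return "\n".join(out)


print(options_to_xslt(["-t", "-c", "xpath0", "-m", "xpath1", "-m", "xpath2",
                       "-v", "xpath3", "-t", "-m", "xpath4", "-c", "xpath5"]))
```

For the two-template command line from the synopsis, this prints the same template structure as the XSLT analogue given there, minus the XML declaration.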

Below are a few more examples:

EXAMPLE

Count all nodes in XML documents, printing each input file name followed by its node count.

xml sel -t -f -o " " -v "count(//node())" xml/table.xml xml/tab-obj.xml

Output:

xml/table.xml 32
xml/tab-obj.xml 41
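Assuming xml/table.xml is the table.xml listing shown earlier, the count of 32 is larger than its 11 elements because count(//node()) also counts text nodes, including the whitespace-only ones between tags. The Python sketch below approximates that count with xml.etree.ElementTree by adding one node for each non-empty text or tail string. It is only an approximation: ElementTree's default parser drops comments and processing instructions, so those are not counted. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed to match the table.xml listing above; whitespace matters here.
TABLE_XML = """<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>String Value</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
    <rec id="3">
      <numField>-23</numField>
      <stringField>stringValue</stringField>
    </rec>
  </table>
</xml>"""


def count_xpath_nodes(xml_text):
    """Approximate count(//node()): elements plus non-empty text nodes."""
    root = ET.fromstring(xml_text)
    total = 0
    for elem in root.iter():
        total += 1                          # the element itself
        if elem.text:                       # character data before the first child
            total += 1
        if elem is not root and elem.tail:  # character data after this element
            total += 1
    return total


print(count_xpath_nodes(TABLE_XML))  # 32
```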

EXAMPLE

Find XML files matching an XPath expression (here, files containing an 'object' element):

xml sel -t -m //object -f xml/table.xml xml/tab-obj.xml

Result output:

xml/tab-obj.xml
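Because -m maps to <xsl:for-each>, -f runs once per matching element, so a file with several 'object' elements would have its name printed several times. To list each file at most once, an -i test instead of -m, for example `xml sel -t -i //object -f -n xml/table.xml xml/tab-obj.xml`, should work. The per-match behaviour can be sketched in Python; the file contents below are hypothetical, since the real xml/tab-obj.xml is not shown.

```python
import xml.etree.ElementTree as ET


def names_per_match(docs, tag):
    """Mimic '-t -m //TAG -f': one copy of the input name for every TAG
    element found, in document order (the document element included)."""
    names = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        names.extend(name for _ in root.iter(tag))
    return names


# Hypothetical inputs standing in for the two files on the command line.
docs = {
    "xml/table.xml": "<xml><table><rec id='1'/></table></xml>",
    "xml/tab-obj.xml": "<xml><object name='t1'/></xml>",
}
print(names_per_match(docs, "object"))  # ['xml/tab-obj.xml']
```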

EXAMPLE

Calculate an XPath value using an EXSLT (XSLT extensions) function

echo "<x/>" | xml sel -t -v "math:abs(-1000)"

Result output:

1000

EXAMPLE

Add elements and attributes from the command line with 'xml sel'

echo "<x/>" | xml sel -t -m / -e xml -e child -a data -o value

Result Output:

<xml><child data="value"/></xml>
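Since nothing breaks the nesting here, -e child ends up inside -e xml, and -o value becomes the content of the attribute opened by -a data. As a cross-check of that nesting (not of how XmlStarlet builds its output), the same tree can be built with Python's xml.etree.ElementTree; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_child_doc():
    """Build the tree produced by: -m / -e xml -e child -a data -o value"""
    root = ET.Element("xml")              # -e xml
    child = ET.SubElement(root, "child")  # -e child, nested inside (no -b)
    child.set("data", "value")            # -a data, with -o value as its content
    return ET.tostring(root, encoding="unicode")


print(build_child_doc())  # <xml><child data="value" /></xml>
```

ElementTree writes the empty element with a space before the slash; the two spellings are equivalent XML.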

EXAMPLE

Query an XML document and produce a sorted text table

xml sel -T -t -m /xml/table/rec -s D:N:- "@id" \
  -v "concat(@id,'|',numField,'|',stringField)" -n xml/table.xml

Result Output:

3|-23|stringValue
2|346|Text Value
1|123|String Value

Equivalent stylesheet

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:for-each select="/xml/table/rec">
    <xsl:sort order="descending" data-type="number" 
      case-order="upper-first" select="@id"/>
    <xsl:value-of select="concat(@id,'|',numField,'|',stringField)"/>
    <xsl:value-of select="'&#10;'"/>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
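The N in D:N matters once the ids reach two digits: compared as text, "10" sorts before "9". A Python sketch of the same query, assuming the records of table.xml above (written compactly, since whitespace does not affect this query), shows both orderings. The function and its parameters are hypothetical; missing child elements yield empty fields, as concat() would.

```python
import xml.etree.ElementTree as ET

# Same records as table.xml above, without the indentation.
TABLE_XML = ("<xml><table>"
             '<rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>'
             '<rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>'
             '<rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>'
             "</table></xml>")


def sorted_rows(xml_text, descending=True, numeric=True):
    """Rough equivalent of -m /xml/table/rec -s D:N:- "@id" -v "concat(...)" -n."""
    recs = ET.fromstring(xml_text).findall("table/rec")
    # N compares @id as a number, T as a string
    key = (lambda r: float(r.get("id"))) if numeric else (lambda r: r.get("id"))
    recs.sort(key=key, reverse=descending)
    return "".join(f"{r.get('id')}|{r.findtext('numField', '')}|{r.findtext('stringField', '')}\n"
                   for r in recs)


print(sorted_rows(TABLE_XML), end="")
```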

EXAMPLE

Predefine namespaces for XPath expressions

xml sel -N xsql=urn:oracle-xsql -t -v /xsql:query xsql/jobserve.xsql

Input (xsql/jobserve.xsql)

$ cat xsql/jobserve.xsql
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="jobserve.xsl"?>
<xsql:query connection="jobs" xmlns:xsql="urn:oracle-xsql" max-rows="5">
  SELECT substr(title,1,26) short_title, title, location, skills
  FROM job
  WHERE UPPER(title) LIKE '%ORACLE%'
  ORDER BY first_posted DESC
</xsql:query>

Result output

  SELECT substr(title,1,26) short_title, title, location, skills
  FROM job
  WHERE UPPER(title) LIKE '%ORACLE%'
  ORDER BY first_posted DESC
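The prefix given to -N does not have to match the one used in the document: XPath matches element names by namespace URI, so -N q=urn:oracle-xsql with /q:query would select the same element. The Python sketch below illustrates that resolution with ElementTree's {uri}local names. value_of is a hypothetical helper, and unlike /xsql:query it searches all elements rather than only the document element.

```python
import xml.etree.ElementTree as ET

XSQL_DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="jobserve.xsl"?>
<xsql:query connection="jobs" xmlns:xsql="urn:oracle-xsql" max-rows="5">
  SELECT substr(title,1,26) short_title, title, location, skills
  FROM job
  WHERE UPPER(title) LIKE '%ORACLE%'
  ORDER BY first_posted DESC
</xsql:query>"""


def value_of(xml_text, qname, namespaces):
    """String value of the first element in document order named `qname`,
    resolving its prefix through `namespaces` (the role of -N name=uri)."""
    prefix, local = qname.split(":")
    expanded = "{%s}%s" % (namespaces[prefix], local)  # ElementTree's {uri}local form
    for elem in ET.fromstring(xml_text).iter(expanded):
        return "".join(elem.itertext())
    return None


print(value_of(XSQL_DOC, "xsql:query", {"xsql": "urn:oracle-xsql"}))
```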

EXAMPLE

Print the structure of an XML document using xml sel (advanced XPath expressions and xml sel command usage)

xml sel -T -t -m '//*' \
-m 'ancestor-or-self::*' -v 'name()' -i 'not(position()=last())' -o . -b -b -n \
xml/structure.xml

Input (xml/structure.xml)

<a1>
  <a11>
    <a111>
      <a1111/>
    </a111>
    <a112>
      <a1121/>
    </a112>
  </a11>
  <a12/>
  <a13>
    <a131/>
  </a13>
</a1>

Result Output:

a1
a1.a11
a1.a11.a111
a1.a11.a111.a1111
a1.a11.a112
a1.a11.a112.a1121
a1.a12
a1.a13
a1.a13.a131
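The outer -m '//*' visits every element in document order, and the inner -m 'ancestor-or-self::*' walks from the document element down to the current one. A recursive walk that carries the chain of ancestor names produces the same listing. This Python sketch uses ElementTree and a hypothetical function name; for namespaced elements elem.tag would be in {uri}local form rather than what name() returns.

```python
import xml.etree.ElementTree as ET

STRUCTURE_XML = """<a1>
  <a11>
    <a111>
      <a1111/>
    </a111>
    <a112>
      <a1121/>
    </a112>
  </a11>
  <a12/>
  <a13>
    <a131/>
  </a13>
</a1>"""


def element_paths(xml_text):
    """Dotted ancestor-or-self path of every element, in document order."""
    paths = []

    def walk(elem, ancestors):
        chain = ancestors + [elem.tag]  # ancestor-or-self, outermost first
        paths.append(".".join(chain))   # '.' between names, none after the last
        for child in elem:              # children in document order
            walk(child, chain)

    walk(ET.fromstring(xml_text), [])
    return paths


print("\n".join(element_paths(STRUCTURE_XML)))
```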

This example is a good demonstration of nesting control with -b. Here is the corresponding XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:for-each select="//*">
    <xsl:for-each select="ancestor-or-self::*">
      <xsl:value-of select="name()"/>
      <xsl:if test="not(position()=last())">
        <xsl:value-of select="'.'"/>
      </xsl:if>
    </xsl:for-each>
    <xsl:value-of select="'&#10;'"/>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

EXAMPLE

Print all links in an XHTML document

xml sel --net --html -T -t -m "//*[local-name()='a']" \
   -o 'NAME: ' -v "translate(. , '&#10;', ' ')" -n \
   -o 'LINK: ' -v @href -n -n \
   http://xmlstar.sourceforge.net/

Sample output

NAME: XmlStarlet SourceForge Site
LINK: http://sourceforge.net/projects/xmlstar/

NAME: XmlStarlet CVS Source
LINK: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/xmlstar/

NAME: XmlStarlet on Freshmeat.Net
LINK: http://freshmeat.net/projects/xmlstarlet/

NAME: XMLStarlet Sourceforge forums
LINK: http://sourceforge.net/forum/?group_id=66612

NAME: XMLStarlet mailing list
LINK: http://lists.sourceforge.net/lists/listinfo/xmlstar-devel
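For comparison, the same NAME/LINK report can be sketched with Python's html.parser. This sketch parses a string instead of fetching the page (--net is not modelled), treats the link text as the character data inside each <a> element, turns newlines into spaces as translate() does, and prints an empty LINK when href is missing, as -v @href would. The class, function and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (name, href) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_a = False
        self._href = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # tag names arrive lower-cased
            self._in_a = True
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            name = "".join(self._text).replace("\n", " ")  # like translate(., newline, ' ')
            self.links.append((name, self._href))
            self._in_a = False


def format_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return "".join(f"NAME: {name}\nLINK: {href}\n\n" for name, href in parser.links)


sample = ('<ul><li><a href="http://sourceforge.net/projects/xmlstar/">'
          'XmlStarlet\nSourceForge Site</a></li></ul>')
print(format_links(sample), end="")
```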