The XMLStarlet 'select' (or 'sel') option can be used to query or search XML documents. Here is the synopsis for the 'xml sel' command:
XMLStarlet Toolkit: Select from XML document(s)
Usage: xml sel <global-options> {<template>} [ <xml-file> ... ]
where
  <global-options> - global options for selecting
  <xml-file> - input XML document file name/uri (stdin is used if missing)
  <template> - template for querying XML document with following syntax:

<global-options> are:
  -C or --comp       - display generated XSLT
  -R or --root       - print root element <xsl-select>
  -T or --text       - output is text (default is XML)
  -I or --indent     - indent output
  -D or --xml-decl   - do not omit xml declaration line
  -B or --noblanks   - remove insignificant spaces from XML tree
  -N <name>=<value>  - predefine namespaces (name without 'xmlns:')
                       ex: xsql=urn:oracle-xsql
                       Multiple -N options are allowed.
  --net              - allow fetch DTDs or entities over network
  --help             - display help

Syntax for templates: -t|--template <options>
where <options>
  -c or --copy-of <xpath>  - print copy of XPATH expression
  -v or --value-of <xpath> - print value of XPATH expression
  -o or --output <string>  - output string literal
  -n or --nl               - print new line
  -f or --inp-name         - print input file name (or URL)
  -m or --match <xpath>    - match XPATH expression
  -i or --if <test-xpath>  - check condition <xsl:if test="test-xpath">
  -e or --elem <name>      - print out element <xsl:element name="name">
  -a or --attr <name>      - add attribute <xsl:attribute name="name">
  -b or --break            - break nesting
  -s or --sort op xpath    - sort in order (used after -m) where
  op is X:Y:Z,
      X is A - for order="ascending"
      X is D - for order="descending"
      Y is N - for data-type="numeric"
      Y is T - for data-type="text"
      Z is U - for case-order="upper-first"
      Z is L - for case-order="lower-first"

There can be multiple --match, --copy-of, --value-of, etc options
in a single template.
The effect of applying command line templates can be illustrated
with the following XSLT analogue

xml sel -t -c "xpath0" -m "xpath1" -m "xpath2" -v "xpath3" \
        -t -m "xpath4" -c "xpath5"

is equivalent to applying the following XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <xsl:call-template name="t1"/>
  <xsl:call-template name="t2"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:copy-of select="xpath0"/>
  <xsl:for-each select="xpath1">
    <xsl:for-each select="xpath2">
      <xsl:value-of select="xpath3"/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>
<xsl:template name="t2">
  <xsl:for-each select="xpath4">
    <xsl:copy-of select="xpath5"/>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

XMLStarlet is a command line toolkit to query/edit/check/transform
XML documents (for more information see http://xmlstar.sourceforge.net/)

Current implementation uses libxslt from GNOME codebase as XSLT processor
(see http://xmlsoft.org/ for more details)
The 'select' option basically lets you avoid writing an XSLT stylesheet to perform queries on XML documents. That is, various combinations of command line parameters generate an XSLT stylesheet and apply it to XML documents with a single command line. Very often you do not care what XSLT the 'select' command created for you, but when you do, you can use the -C or --comp switch to see exactly which XSLT is applied to your input.
The 'select' option supports many EXSLT functions in XPath expressions.
Here are a few examples that will help you understand how 'xml select' works:
EXAMPLE:
Count elements matching XPath expression:
xml sel -t -v "count(/xml/table/rec/numField)" table.xml
Input (table.xml):
<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>String Value</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
    <rec id="3">
      <numField>-23</numField>
      <stringField>stringValue</stringField>
    </rec>
  </table>
</xml>
Output:
3
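As a cross-check on the result, the same count can be reproduced outside XMLStarlet. The sketch below uses Python's standard xml.etree.ElementTree module, which is not part of XMLStarlet and supports only a subset of XPath, so the absolute path is rewritten relative to the root element. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Same content as table.xml in the example above.
TABLE_XML = """\
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
"""

root = ET.fromstring(TABLE_XML)  # root is the <xml> element

# Rough counterpart of count(/xml/table/rec/numField):
# the path is relative to <xml>, so the leading /xml step is dropped.
count = len(root.findall("table/rec/numField"))
print(count)  # prints 3
```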
Let's take a closer look at what it did internally. For that we will use the '-C' option:
$ xml sel -C -t -v "count(/xml/table/rec/numField)"
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:exslt="http://exslt.org/common"
 xmlns:math="http://exslt.org/math"
 xmlns:date="http://exslt.org/dates-and-times"
 xmlns:func="http://exslt.org/functions"
 xmlns:set="http://exslt.org/sets"
 xmlns:str="http://exslt.org/strings"
 xmlns:dyn="http://exslt.org/dynamic"
 xmlns:saxon="http://icl.com/saxon"
 xmlns:xalanredirect="org.apache.xalan.xslt.extensions.Redirect"
 xmlns:xt="http://www.jclark.com/xt"
 xmlns:libxslt="http://xmlsoft.org/XSLT/namespace"
 xmlns:test="http://xmlsoft.org/XSLT/"
 extension-element-prefixes="exslt math date func set str dyn saxon xalanredirect xt libxslt test"
 exclude-result-prefixes="math str">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:value-of select="count(/xml/table/rec/numField)"/>
</xsl:template>
</xsl:stylesheet>
Omitting the XML declaration and the extension namespace declarations for brevity:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:value-of select="count(/xml/table/rec/numField)"/>
</xsl:template>
</xsl:stylesheet>
Every -t option is mapped to an XSLT template. Options after '-t' are mapped to XSLT elements:
-v to <xsl:value-of>
-c to <xsl:copy-of>
-e to <xsl:element>
-a to <xsl:attribute>
-s to <xsl:sort>
-m to <xsl:for-each>
-i to <xsl:if>
and so on
By default, each subsequent nesting option (for instance, -m) produces an XSLT element nested inside the previous one (<xsl:for-each> for '-m'). To break this nesting, put '-b' or '--break' after the first '-m'.
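The mapping and nesting rules described above can be modelled with a small Python sketch. It is a simplified illustration built only from the option table in this section, not XMLStarlet's actual implementation: the helper build_template, the NESTING and LEAF tables, and the naive quoting used for -o are all assumptions made for the example, and -s and -f are left out.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)

# Options that open a new nesting level: option -> (XSLT element, attribute).
NESTING = {"-m": ("for-each", "select"), "-i": ("if", "test"),
           "-e": ("element", "name"), "-a": ("attribute", "name")}
# Options that emit a single element at the current level.
LEAF = {"-v": ("value-of", "select"), "-c": ("copy-of", "select")}


def xsl(tag):
    """Return the namespace-qualified name of an XSLT element."""
    return f"{{{XSL_NS}}}{tag}"


def build_template(name, options):
    """Build one named template from (option, argument) pairs (hypothetical helper)."""
    template = ET.Element(xsl("template"), {"name": name})
    stack = [template]  # stack[-1] is the element currently being filled
    for opt, arg in options:
        if opt in NESTING:
            tag, attr = NESTING[opt]
            stack.append(ET.SubElement(stack[-1], xsl(tag), {attr: arg}))
        elif opt in LEAF:
            tag, attr = LEAF[opt]
            ET.SubElement(stack[-1], xsl(tag), {attr: arg})
        elif opt == "-o":
            # Naive quoting: assumes the literal contains no apostrophe.
            ET.SubElement(stack[-1], xsl("value-of"), {"select": f"'{arg}'"})
        elif opt == "-n":
            ET.SubElement(stack[-1], xsl("value-of"), {"select": "'\n'"})
        elif opt == "-b":
            if len(stack) > 1:  # close the innermost open element
                stack.pop()
        else:
            raise ValueError(f"unsupported option: {opt}")
    return template


# Two -m options nest; a -b between them would make them siblings instead.
nested = build_template("t1", [("-c", "xpath0"), ("-m", "xpath1"),
                               ("-m", "xpath2"), ("-v", "xpath3")])
flat = build_template("t2", [("-m", "xpath1"), ("-b", None), ("-m", "xpath2")])
print(ET.tostring(nested, encoding="unicode"))
print(ET.tostring(flat, encoding="unicode"))
```

In this model every nesting option pushes an element onto a stack and '-b' pops one, which is why two '-b' options are needed to leave both an <xsl:if> and the <xsl:for-each> that contains it.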
Below are a few more examples:
EXAMPLE
Count all nodes in XML documents. Print input name and node count after it.
xml sel -t -f -o " " -v "count(//node())" xml/table.xml xml/tab-obj.xml
Output:
xml/table.xml 32
xml/tab-obj.xml 41
EXAMPLE
Find XML files matching XPath expression (containing 'object' element)
xml sel -t -m //object -f xml/table.xml xml/tab-obj.xml
Result output:
xml/tab-obj.xml
EXAMPLE
Calculate EXSLT (XSLT extensions) XPath value
echo "<x/>" | xml sel -t -v "math:abs(-1000)"
Result output:
1000
EXAMPLE
Adding elements and attributes using command line 'xml sel'
echo "<x/>" | xml sel -t -m / -e xml -e child -a data -o value
Result Output:
<xml><child data="value"/></xml>
EXAMPLE
Query XML document and produce sorted text table
xml sel -T -t -m /xml/table/rec -s D:N:- "@id" -v "concat(@id,'|',numField,'|',stringField)" -n xml/table.xml
Result Output:
3|-23|stringValue
2|346|Text Value
1|123|String Value
Equivalent stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:for-each select="/xml/table/rec">
    <xsl:sort order="descending" data-type="number" case-order="upper-first" select="@id"/>
    <xsl:value-of select="concat(@id,'|',numField,'|',stringField)"/>
    <xsl:value-of select="'&#10;'"/>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
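To make the effect of the sort option concrete, the following sketch reproduces the same table with Python's standard xml.etree.ElementTree module instead of XSLT. It is only an analogue under stated assumptions: 'D' is modelled as reverse=True, 'N' as conversion of @id to a number, and the case-order field is ignored because it does not affect numeric keys. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Same content as xml/table.xml in the earlier example.
TABLE_XML = """\
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
"""

root = ET.fromstring(TABLE_XML)
recs = root.findall("table/rec")  # counterpart of -m /xml/table/rec

# Counterpart of -s D:N:- "@id": descending order, numeric comparison.
recs.sort(key=lambda rec: float(rec.get("id")), reverse=True)

# Counterpart of -v "concat(@id,'|',numField,'|',stringField)" -n
lines = ["|".join([rec.get("id"), rec.findtext("numField"), rec.findtext("stringField")])
         for rec in recs]
print("\n".join(lines))
```

The printed rows match the result output of the example: the record with id 3 first and the record with id 1 last.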
EXAMPLE
Predefine namespaces for XPath expressions
xml sel -N xsql=urn:oracle-xsql -t -v /xsql:query xsql/jobserve.xsql
Input (xsql/jobserve.xsql)
$ cat xsql/jobserve.xsql
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="jobserve.xsl"?>
<xsql:query connection="jobs" xmlns:xsql="urn:oracle-xsql" max-rows="5">
  SELECT substr(title,1,26) short_title, title, location, skills
  FROM job
  WHERE UPPER(title) LIKE '%ORACLE%'
  ORDER BY first_posted DESC
</xsql:query>
Result output
SELECT substr(title,1,26) short_title, title, location, skills
FROM job
WHERE UPPER(title) LIKE '%ORACLE%'
ORDER BY first_posted DESC
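What -N does here is bind a prefix to a namespace URI so that the XPath expression can name elements in that namespace. In XPath only the URI matters, so the prefix used in the expression need not be the one used in the document. The sketch below illustrates this with Python's standard xml.etree.ElementTree module rather than XMLStarlet. The wrapper element and the alternative prefix 'q' are hypothetical, and the line breaks inside the query text are an assumption.

```python
import xml.etree.ElementTree as ET

# Root element of xsql/jobserve.xsql from the example above.
XSQL = """\
<xsql:query connection="jobs" xmlns:xsql="urn:oracle-xsql" max-rows="5">
  SELECT substr(title,1,26) short_title, title, location, skills
  FROM job
  WHERE UPPER(title) LIKE '%ORACLE%'
  ORDER BY first_posted DESC
</xsql:query>
"""

root = ET.fromstring(XSQL)

# ElementTree paths are relative to an element, so the document root is placed
# under a hypothetical wrapper in order to select it by name.
wrapper = ET.Element("wrapper")
wrapper.append(root)

# Prefix-to-URI maps play the role of -N xsql=urn:oracle-xsql.
by_doc_prefix = wrapper.find("xsql:query", {"xsql": "urn:oracle-xsql"})
by_other_prefix = wrapper.find("q:query", {"q": "urn:oracle-xsql"})
unbound = wrapper.find("query")  # no namespace given: does not match

print(by_doc_prefix is root, by_other_prefix is root, unbound is None)
print(by_doc_prefix.text.strip())
```

Both prefixed lookups find the same element, while the unprefixed name finds nothing. That unprefixed case is what predefining the namespace guards against.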
EXAMPLE
Print structure of XML element using xml sel (advanced XPath expressions and xml sel command usage)
xml sel -T -t -m '//*' \
   -m 'ancestor-or-self::*' -v 'name()' -i 'not(position()=last())' -o . -b -b -n \
   xml/structure.xml
Input (xml/structure.xml)
<a1>
  <a11>
    <a111>
      <a1111/>
    </a111>
    <a112>
      <a1121/>
    </a112>
  </a11>
  <a12/>
  <a13>
    <a131/>
  </a13>
</a1>
Result Output:
a1
a1.a11
a1.a11.a111
a1.a11.a111.a1111
a1.a11.a112
a1.a11.a112.a1121
a1.a12
a1.a13
a1.a13.a131
This example is a good demonstration of nesting control. Here is the corresponding XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
<xsl:param name="inputFile">-</xsl:param>
<xsl:template match="/">
  <xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
  <xsl:for-each select="//*">
    <xsl:for-each select="ancestor-or-self::*">
      <xsl:value-of select="name()"/>
      <xsl:if test="not(position()=last())">
        <xsl:value-of select="'.'"/>
      </xsl:if>
    </xsl:for-each>
    <xsl:value-of select="'&#10;'"/>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
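For comparison, the same listing can be produced without XSLT. The sketch below uses Python's standard xml.etree.ElementTree module, which has no ancestor-or-self axis, so it swaps in a different technique: a recursive walk that carries the ancestor names down the tree. The function name element_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same content as xml/structure.xml in the example above.
STRUCTURE_XML = """\
<a1>
  <a11>
    <a111><a1111/></a111>
    <a112><a1121/></a112>
  </a11>
  <a12/>
  <a13><a131/></a13>
</a1>
"""


def element_paths(elem, ancestors=()):
    """Yield the dotted ancestor-or-self path of elem, then of each descendant."""
    path = ancestors + (elem.tag,)
    yield ".".join(path)           # like joining name() over ancestor-or-self::*
    for child in elem:             # children are visited in document order
        yield from element_paths(child, path)


paths = list(element_paths(ET.fromstring(STRUCTURE_XML)))
print("\n".join(paths))
```

The outer '-m' over //* corresponds to the recursion visiting every element, and the inner '-m' with the conditional '-o .' corresponds to joining the collected names with dots.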
EXAMPLE
Print all links of an XHTML document
xml sel --net --html -T -t -m "//*[local-name()='a']" \
   -o 'NAME: ' -v "translate(. , ' ', ' ')" -n \
   -o 'LINK: ' -v @href -n -n \
   http://xmlstar.sourceforge.net/
Sample output
NAME: XmlStarlet SourceForge Site
LINK: http://sourceforge.net/projects/xmlstar/

NAME: XmlStarlet CVS Source
LINK: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/xmlstar/

NAME: XmlStarlet on Freshmeat.Net
LINK: http://freshmeat.net/projects/xmlstarlet/

NAME: XMLStarlet Sourceforge forums
LINK: http://sourceforge.net/forum/?group_id=66612

NAME: XMLStarlet mailing list
LINK: http://lists.sourceforge.net/lists/listinfo/xmlstar-devel