You try to extract the links from an XHTML document like this:
xml sel -t -m "//a" -c . -n page.xhtml
The document contains an <a/> element, but there are no matches:
<html xmlns="http://www.w3.org/1999/xhtml"><body> <a href="http://example.com">A link</a> </body></html>
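The problem is the xmlns="http://www.w3.org/1999/xhtml" attribute on the root element. It places the root element and every element below it in that namespace, so the namespace URI is part of each element's name. An unprefixed name such as a in an XPath expression only matches elements that are in no namespace.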
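One way to see that the URI really is part of the name, as a minimal sketch: match the element by its local name only and print its namespace URI with the standard XPath functions local-name() and namespace-uri(). The scratch directory and the XML variable are illustrative assumptions, as is the binary name (some installations call it xmlstarlet rather than xml).

```shell
# Assumption: the binary is named xmlstarlet or xml, depending on the installation.
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample document in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/page.xhtml" <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml"><body> <a href="http://example.com">A link</a> </body></html>
EOF

# Match on the local name only, then print the namespace URI of the match.
ns=$("$XML" sel -t -m "//*[local-name()='a']" -v "namespace-uri()" -n "$tmp/page.xhtml")
echo "$ns"
```

This should print the XHTML namespace URI, which shows that the element is named a only within that namespace.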
To match namespaced elements you must bind the namespace to a prefix and prepend it to the name:
xml sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a" -c . -n page.xhtml
XML documents can also declare different namespace prefixes on any element in the document. To make namespaces easier to handle, XMLStarlet (versions 1.2.1+) uses the namespace prefixes declared on the root element of the input document. The default namespace is bound to the prefixes "_" and "DEFAULT" (in versions 1.5.0+). So another way to handle the previous example would be:
xml sel -t -m "//_:a" -c . -n page.xhtml
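The three queries can be compared side by side. The sketch below assumes XMLStarlet 1.5.0 or later (for the "_" prefix) and prints only the href attribute with -v rather than copying the whole element. The variable names and the scratch directory are illustrative.

```shell
# Assumption: the binary is named xmlstarlet or xml, depending on the installation.
XML=$(command -v xmlstarlet || command -v xml)

tmp=$(mktemp -d)
cat > "$tmp/page.xhtml" <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml"><body> <a href="http://example.com">A link</a> </body></html>
EOF

# Unprefixed name: looks for <a> in no namespace, so nothing is selected.
plain=$("$XML" sel -t -m "//a" -v "@href" -n "$tmp/page.xhtml")

# Prefix x bound explicitly with -N.
bound=$("$XML" sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a" -v "@href" -n "$tmp/page.xhtml")

# Prefix _ taken from the default namespace declared on the root element.
dflt=$("$XML" sel -t -m "//_:a" -v "@href" -n "$tmp/page.xhtml")

echo "unprefixed: '$plain'"
echo "with -N:    '$bound'"
echo "with _:     '$dflt'"
```

The unprefixed query should come back empty, while the other two should both print the link target.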
This feature can be disabled (versions 1.6.0+) with the global --no-doc-namespace option. When should you disable it? Suppose you are writing a script that handles XML documents that look like this:
<data xmlns:a="http://example.com"> <a:important-data>...</a:important-data> </data>
and also this:
<data xmlns:b="http://example.com"> <b:important-data>...</b:important-data> </data>
Since both documents use the same namespace they are equivalent, even though the prefixes happen to be different. By using --no-doc-namespace and binding the namespace with -N, you can be sure that XMLStarlet's behaviour will be independent of the input document.
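A script along these lines might look like the following sketch. It assumes XMLStarlet 1.6.0 or later. The file names a.xml and b.xml, the prefix d, and the element contents "one" and "two" are made up for illustration, standing in for whatever the real documents contain.

```shell
# Assumption: the binary is named xmlstarlet or xml, depending on the installation.
XML=$(command -v xmlstarlet || command -v xml)

# Two hypothetical inputs: same namespace URI, different prefixes.
tmp=$(mktemp -d)
printf '%s\n' '<data xmlns:a="http://example.com"> <a:important-data>one</a:important-data> </data>' > "$tmp/a.xml"
printf '%s\n' '<data xmlns:b="http://example.com"> <b:important-data>two</b:important-data> </data>' > "$tmp/b.xml"

# Ignore the prefixes declared in each document and bind our own prefix d,
# so the same XPath expression works for both files.
result=$(for f in "$tmp/a.xml" "$tmp/b.xml"; do
  "$XML" --no-doc-namespace sel -N d="http://example.com" -t -v "//d:important-data" -n "$f"
done)
echo "$result"
```

Because the query relies only on the namespace URI, the same command line should extract the data from both files regardless of which prefix each one declares.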
Delete the namespace declarations and all elements that are not in the default namespace from the following XML document:
Input (file ns2.xml):
<doc xmlns="http://www.a.com/xyz" xmlns:ns="http://www.c.com/xyz"> <A>test</A> <B> <ns:C>xyz</ns:C> </B> </doc>
Command:
xml ed -N N="http://www.c.com/xyz" -d '//N:*' ns2.xml | sed -e 's/ xmlns.*=".*"//g'
Output:
<doc> <A>test</A> <B/> </doc>