Chapter 5. Common problems

1. Namespaces and default namespace

1.1. The Problem: Why does nothing match?

You try to extract the links from an XHTML document like this:

xml sel -t -m "//a" -c . -n page.xhtml

The document contains an <a> element, but the query returns no matches:

<html xmlns="http://www.w3.org/1999/xhtml"><body>
     <a href="http://example.com">A link</a>
</body></html>

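The problem is the xmlns="http://www.w3.org/1999/xhtml" attribute on the root element. It places the root element and all elements below it in that namespace, so the namespace URI is part of each element's name. An unprefixed XPath name such as a matches only elements that are in no namespace, so //a finds nothing.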

1.2. The Solution

To match namespaced elements you must bind the namespace to a prefix and prepend it to the name:

xml sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a" -c . -n page.xhtml
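
The difference between the two queries can be checked side by side. The following is a minimal sketch, not part of the original guide: it recreates page.xhtml, assumes the binary is installed as either xmlstarlet or xml (the name varies between systems), and prints only the href attribute with -v to keep the output short.

```shell
# Assumption: the binary is called "xmlstarlet" on some systems and "xml" on others.
XML=$(command -v xmlstarlet || command -v xml || true)

# Recreate the sample document from section 1.1.
cat > page.xhtml <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml"><body>
     <a href="http://example.com">A link</a>
</body></html>
EOF

if [ -n "$XML" ]; then
  # Unprefixed name: looks for <a> in no namespace, so nothing is selected.
  plain=$("$XML" sel -t -m "//a" -v "@href" -n page.xhtml || true)
  # Prefix x bound with -N: selects the XHTML <a> element.
  bound=$("$XML" sel -N x="http://www.w3.org/1999/xhtml" \
            -t -m "//x:a" -v "@href" -n page.xhtml || true)
  echo "plain: [$plain]"
  echo "bound: [$bound]"
fi
```

The prefix x is arbitrary; only the namespace URI it is bound to has to match the one used in the document.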

1.3. A More Convenient Solution

Namespace prefixes are arbitrary, and they can be declared on any element in a document. To make namespaces easier to handle, XMLStarlet (versions 1.2.1+) will use the namespace prefixes declared on the root element of the input document. The default namespace will be bound to the prefixes "_" and "DEFAULT" (in versions 1.5.0+). So another way to handle the previous example would be:

xml sel -t -m "//_:a" -c . -n page.xhtml

This feature can be disabled (versions 1.6.0+) by the global --no-doc-namespace option. When should you disable it? Suppose you are writing a script that handles XML documents that look like this:

<data xmlns:a="http://example.com">
  <a:important-data>...</a:important-data>
</data>

and also this:

<data xmlns:b="http://example.com">
  <b:important-data>...</b:important-data>
</data>

Since both documents use the same namespace, they are equivalent, even though the prefixes happen to be different. By using --no-doc-namespace and binding the namespace with -N, you can be sure that XMLStarlet's behaviour will be independent of the input document.
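
As an illustration, a script along these lines would treat both documents the same way. It is a sketch under a few assumptions: the file names doc-a.xml and doc-b.xml, the element contents "one" and "two", and the prefix d are all made up for the example, and the binary may be installed as xmlstarlet or xml. The global option is placed before the sel command.

```shell
# Assumption: the binary is called "xmlstarlet" on some systems and "xml" on others.
XML=$(command -v xmlstarlet || command -v xml || true)

# Hypothetical sample files: same namespace URI, different prefixes.
printf '%s\n' '<data xmlns:a="http://example.com"><a:important-data>one</a:important-data></data>' > doc-a.xml
printf '%s\n' '<data xmlns:b="http://example.com"><b:important-data>two</b:important-data></data>' > doc-b.xml

if [ -n "$XML" ]; then
  # Ignore the prefixes declared in each document and use our own binding (d).
  results=$(for f in doc-a.xml doc-b.xml; do
    "$XML" --no-doc-namespace sel -N d="http://example.com" \
      -t -v "//d:important-data" -n "$f"
  done)
  echo "$results"
fi
```

Because the script never refers to a or b, it keeps working if a later document picks yet another prefix for the same namespace.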

1.4. Deleting namespace declarations

Delete the namespace declarations and all elements outside the default namespace from the following XML document:

Input (file ns2.xml)

<doc xmlns="http://www.a.com/xyz" xmlns:ns="http://www.c.com/xyz">
  <A>test</A>
  <B>
    <ns:C>xyz</ns:C>
  </B>
</doc>

Command:

xml ed -N N="http://www.c.com/xyz" -d '//N:*' ns2.xml | \
        sed -e 's/ xmlns.*=".*"//g'

Output

<doc>
  <A>test</A>
  <B/>
</doc>
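
Note that the sed pattern in the command above is greedy: it works here because the root tag carries nothing but namespace declarations, but any other attribute that follows them on the same line would be removed as well. A narrower pattern might look like the sketch below. The function name strip_xmlns is hypothetical, and the pattern assumes double-quoted attribute values and declarations that do not span lines.

```shell
# Hypothetical helper: strip namespace declarations from XML read on stdin.
# [^"]* stops at the closing quote of each declaration, so other attributes
# on the same line are left alone. Assumes double-quoted attribute values.
strip_xmlns() {
  sed -E 's/ xmlns(:[A-Za-z_][A-Za-z0-9_.-]*)?="[^"]*"//g'
}

# The extra id attribute is made up for the example; it survives the filter.
printf '%s\n' '<doc xmlns="http://www.a.com/xyz" xmlns:ns="http://www.c.com/xyz" id="1">' | strip_xmlns
```

The function can take the place of the sed call at the end of the pipeline. Like the original, it is a text-level filter and does not parse the XML.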

