Using XPath in ECMAScript
XPath is a wonderful little tool for traversing the DOM tree, and it is available in just about every new browser, save for Internet Explorer. The syntax can seem a tad daunting at first, but the basic parts, which are also the ones you'll be using the most, are quite easy to pick up.
Syntax
There is a pretty good rundown of how document.evaluate works at
developer.mozilla.org, but for
your convenience, I've included the most relevant part:
xpathExpression is a string representing the XPath to be evaluated.
contextNode is the node inside which the search will take place. document is most common.
namespaceResolver is a function that will be passed any namespace prefixes and should return a string representing the namespace URI associated with that prefix. null is common for HTML documents or when no namespace prefixes are used.
resultType is an integer that corresponds to the type of result XPathResult to return. Use named constant properties, such as XPathResult.ANY_TYPE, of the XPathResult constructor, which correspond to integers from 0 to 9.
result is an existing XPathResult to use for the results. null is most common and will create a new XPathResult.
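How you read the answer depends on the resultType of the XPathResult that comes back. The following is a minimal sketch and not part of the quoted documentation: the helper name readResult is made up for illustration, and the numbers in the switch are the values of the XPathResult constants named in the comments.

```javascript
// Hypothetical helper: unwrap an XPathResult according to its resultType.
// The case labels are the integer values of the named XPathResult constants.
function readResult(result) {
  var node, i, nodes = [];
  switch (result.resultType) {
    case 1: // XPathResult.NUMBER_TYPE
      return result.numberValue;
    case 2: // XPathResult.STRING_TYPE
      return result.stringValue;
    case 3: // XPathResult.BOOLEAN_TYPE
      return result.booleanValue;
    case 4: // XPathResult.UNORDERED_NODE_ITERATOR_TYPE
    case 5: // XPathResult.ORDERED_NODE_ITERATOR_TYPE
      // Iterators hand out one node at a time until they return null.
      while ((node = result.iterateNext())) {
        nodes.push(node);
      }
      return nodes;
    case 6: // XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE
    case 7: // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
      // Snapshots are indexed, like an array.
      for (i = 0; i < result.snapshotLength; i++) {
        nodes.push(result.snapshotItem(i));
      }
      return nodes;
    case 8: // XPathResult.ANY_UNORDERED_NODE_TYPE
    case 9: // XPathResult.FIRST_ORDERED_NODE_TYPE
      return result.singleNodeValue;
    default:
      throw new Error('Unexpected resultType: ' + result.resultType);
  }
}
```

In a browser that supports evaluate, readResult(document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null)) would hand back a plain number, because a count() expression produces a number result.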
First try
Once we have that tidbit of information, we can try counting all paragraphs in this document:
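A minimal sketch of such a count, not the page's original script: the function name countHtmlParagraphs and its doc parameter are assumptions, introduced so the function can be pointed at any document object.

```javascript
// Sketch: count every p element with the XPath count() function.
// `doc` is the document to search, normally the global `document`.
function countHtmlParagraphs(doc) {
  var NUMBER_TYPE = 1; // same value as XPathResult.NUMBER_TYPE
  // No namespace resolver (null) and no existing result to reuse (null).
  var result = doc.evaluate('count(//p)', doc, null, NUMBER_TYPE, null);
  return result.numberValue;
}
```

In a page, countHtmlParagraphs(document) returns the number of paragraphs the expression matched.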
Clicking the button will count the paragraphs only if the page is served as regular old HTML:
If it says there are 0 paragraphs, your browser interpreted this page as XHTML, not HTML.
A solution to the problem
The problem is that as soon as one enters the domain of XHTML, namespaces become very important, and XPath 1.0 never uses the default namespace: an unprefixed name in an expression only matches elements that are in no namespace at all. This is probably a source of frustration for many developers, but there is a rather simple solution for this case.
Note the first parameter of the evaluate call: I no longer use //p but
//h:p, and I have defined another function, nsr. If you look back at the
syntax, the third parameter is the namespaceResolver function. It receives the
namespace prefix as a string in its first parameter and should return
the relevant namespace URI. Since using a default namespace does not work, I simply made a namespace
with the 'h' prefix, using the same namespace URI as the default. This works just fine,
because XPath works with the URI, not the prefix.
It is also worth noting that there is no need to do any alterations to the document itself.
All the tags are already defined as being in the http://www.w3.org/1999/xhtml
namespace.
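The resolver and the prefixed query described above might look like the following. It is a sketch under assumptions: only the name nsr, the h prefix, and the namespace URI come from the text, while the countXhtmlParagraphs wrapper and its doc parameter are hypothetical.

```javascript
// Namespace resolver: maps a prefix used in the XPath expression
// to the namespace URI it stands for.
function nsr(prefix) {
  var namespaces = { h: 'http://www.w3.org/1999/xhtml' };
  // Unknown prefixes resolve to null.
  return Object.prototype.hasOwnProperty.call(namespaces, prefix)
    ? namespaces[prefix]
    : null;
}

// Hypothetical wrapper: count the p elements in the XHTML namespace.
function countXhtmlParagraphs(doc) {
  var NUMBER_TYPE = 1; // same value as XPathResult.NUMBER_TYPE
  return doc.evaluate('count(//h:p)', doc, nsr, NUMBER_TYPE, null).numberValue;
}
```

The prefix is a free choice: any other letter would work equally well, as long as the expression and the resolver agree on it.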
Try it for yourself:
If it says there are 0 paragraphs, your browser interpreted this page as HTML, not XHTML.
So what to do with the differing results?
Of course, using browser sniffing is out of the question. If you even had the thought, you need to punish yourself by smacking yourself over the head with a large herring. For all you others that do not currently have a headache, here is the reward in the form of a solution.
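// The opening of this listing was lost; the function wrapper and the probe
// below are a reconstruction. nsr is the resolver function described above.
function countParagraphs() {
    // A non-zero count for the prefixed query means the elements are namespaced.
    var probe = document.evaluate('count(//h:p)',document,nsr,1,null);
    var ns = probe.numberValue > 0 ? 'h:' : '';
    // ns now contains the string 'h:' if namespaces are supported.
    var result = document.evaluate('count(//'+ns+'p)',document,nsr,1,null); // Count all paragraphs
    document.getElementById('cc-count-result').firstChild.data =
        'I counted '+result.numberValue+' paragraphs in this article. ';
}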
Test it:
If the result was 0, contact me at mail at robbiegee dot com. Remember to mention which browser
you are using. PS: Internet Explorer does not support evaluate, so don't bother pointing
that out.
Getting serious
Of course, counting paragraphs is not all that interesting. Let's do something useful, like creating an automatic index of all sections within this page!
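// Reconstructed opening (the start of this listing was lost): probe for namespace support.
var ns = document.evaluate('count(//h:p)',document,nsr,1,null).numberValue > 0 ? 'h:' : '';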
var iterator = document.evaluate('//'+ns+'h2',document,nsr,0,null);
// Note that modifying any of the nodes will invalidate the iterator
var header, headers = new Array();
while(header = iterator.iterateNext())
headers[headers.length] = header;
var index = document.createElement('ul');
index.setAttribute('id','automatic-index');
index.style.border = '1px solid black';
index.style.background = '#eeeeee';
index.style.position = 'fixed';
index.style.top = '0px';
index.style.right = '0px';
for(var i=0,end=headers.length;i
Try it: A box should appear in the top right corner, containing links to all the headers.
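The listing above breaks off at the loop. One way such a loop could finish is sketched below; it is a guess at the approach, not the original code. The function names collectNodes and buildIndex, the generated section-N ids, and the doc parameter are all assumptions. The sketch also asks for a snapshot result rather than an iterator, since a snapshot stays valid when the document is modified afterwards.

```javascript
// Collect the nodes matching `expression` into a plain array.
// A snapshot result is not invalidated by later changes to the document.
function collectNodes(doc, expression, resolver) {
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var snapshot = doc.evaluate(expression, doc, resolver, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Build a ul with one link per header. Headers without an id get a
// generated one (the "section-N" scheme is an assumption of this sketch).
function buildIndex(doc, headers) {
  var index = doc.createElement('ul');
  index.setAttribute('id', 'automatic-index');
  for (var i = 0, end = headers.length; i < end; i++) {
    var header = headers[i];
    if (!header.getAttribute('id')) {
      header.setAttribute('id', 'section-' + (i + 1));
    }
    var link = doc.createElement('a');
    link.setAttribute('href', '#' + header.getAttribute('id'));
    link.appendChild(doc.createTextNode(header.textContent));
    var item = doc.createElement('li');
    item.appendChild(link);
    index.appendChild(item);
  }
  return index;
}
```

With ns and nsr defined as earlier in the article, document.body.appendChild(buildIndex(document, collectNodes(document, '//' + ns + 'h2', nsr))) would add the list to the page; the border and positioning styles from the listing above can then be applied to the returned element.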