Dive Into Python: Python from novice to pro
kgp.py employs several tricks which may or may not be useful to you in your XML processing. The first one takes advantage of the consistent structure of the input documents to build a cache of nodes.
A grammar file defines a series of ref elements. Each ref contains one or more p elements, which can contain a lot of different things, including xrefs. Whenever you encounter an xref, you look for a corresponding ref element with the same id attribute, and choose one of the ref element's children and parse it. (You'll see how this random choice is made in the next section.)
This is how you build up the grammar: define ref elements for the smallest pieces, then define ref elements which "include" the first ref elements by using xref, and so forth. Then you parse the "largest" reference and follow each xref, and eventually output real text. The text you output depends on the (random) decisions you make each time you fill in an xref, so the output is different each time.
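To make that structure concrete, here is a minimal sketch using a made-up two-level grammar. The grammar text, the ids "bit" and "pair", and the summary dictionary are all invented for illustration and are not taken from any grammar file that ships with kgp.py. The script only lists, for each ref, how many p alternatives it offers and which ids its xrefs point to.

```python
from xml.dom import minidom

# Hypothetical grammar, invented for illustration: "bit" is the smallest
# piece, and "pair" includes it twice by way of xref.
GRAMMAR = """<?xml version="1.0"?>
<grammar>
  <ref id="bit">
    <p>0</p>
    <p>1</p>
  </ref>
  <ref id="pair">
    <p><xref id="bit"/><xref id="bit"/></p>
  </ref>
</grammar>"""

doc = minidom.parseString(GRAMMAR)

# For each ref, record the number of p alternatives and the xref targets.
summary = {}
for ref in doc.getElementsByTagName("ref"):
    ref_id = ref.attributes["id"].value
    choices = ref.getElementsByTagName("p")
    targets = [x.attributes["id"].value
               for x in ref.getElementsByTagName("xref")]
    summary[ref_id] = (len(choices), targets)
    print(ref_id, len(choices), targets)
```

Here "pair" is the "largest" reference: parsing it means following both xrefs down to "bit" and picking one of the two p alternatives each time.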
This is all very flexible, but there is one downside: performance. When you find an xref and need to find the corresponding ref element, you have a problem. The xref has an id attribute, and you want to find the ref element that has that same id attribute, but there is no easy way to do that. The slow way to do it would be to get the entire list of ref elements each time, then manually loop through and look at each id attribute. The fast way is to do that once and build a cache, in the form of a dictionary.
    def loadGrammar(self, grammar):
        self.grammar = self._load(grammar)
        self.refs = {}
        for ref in self.grammar.getElementsByTagName("ref"):
            self.refs[ref.attributes["id"].value] = ref
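The slow way and the fast way can be compared side by side in a standalone script. This is only a sketch: the grammar string and the helper names find_ref_slow and build_ref_cache are assumptions made up for this example, not part of kgp.py. Both approaches return the very same node object; the difference is that the cache walks the list of ref elements once instead of on every lookup.

```python
from xml.dom import minidom

# Hypothetical grammar, invented for illustration.
GRAMMAR = """<grammar>
  <ref id="bit"><p>0</p><p>1</p></ref>
  <ref id="pair"><p><xref id="bit"/><xref id="bit"/></p></ref>
</grammar>"""

doc = minidom.parseString(GRAMMAR)


def find_ref_slow(doc, ref_id):
    """Scan every ref element on each call (the slow way)."""
    for ref in doc.getElementsByTagName("ref"):
        if ref.attributes["id"].value == ref_id:
            return ref
    raise KeyError(ref_id)


def build_ref_cache(doc):
    """Walk the ref elements once and index them by id (the fast way)."""
    return {ref.attributes["id"].value: ref
            for ref in doc.getElementsByTagName("ref")}


refs = build_ref_cache(doc)

# Both lookups lead to the same element node in the tree.
assert refs["pair"] is find_ref_slow(doc, "pair")
print(sorted(refs))
```

The cache costs one pass over the document up front, after which every lookup is a single dictionary access.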
Once you build this cache, whenever you come across an xref and need to find the ref element with the same id attribute, you can simply look it up in self.refs.
    def do_xref(self, node):
        id = node.attributes["id"].value
        self.parse(self.randomChildElement(self.refs[id]))
You'll explore the randomChildElement function in the next section.
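Until then, the following sketch shows the whole lookup working end to end outside of kgp.py. Everything in it is an assumption for illustration: the grammar is made up, random_child_element is a simplified stand-in for randomChildElement (it just picks any direct child element), and expand is a stripped-down substitute for the real parse method that handles only text nodes, xref elements, and plain containers.

```python
import random
from xml.dom import minidom

# Hypothetical grammar, invented for illustration.
GRAMMAR = """<grammar>
  <ref id="bit"><p>0</p><p>1</p></ref>
  <ref id="pair"><p><xref id="bit"/><xref id="bit"/></p></ref>
</grammar>"""

doc = minidom.parseString(GRAMMAR)

# Build the id -> ref element cache once.
refs = {ref.attributes["id"].value: ref
        for ref in doc.getElementsByTagName("ref")}


def random_child_element(node):
    """Simplified stand-in: pick one direct child element at random."""
    children = [c for c in node.childNodes
                if c.nodeType == c.ELEMENT_NODE]
    return random.choice(children)


def expand(node):
    """Return the text a node produces, resolving xrefs via the cache."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    if node.nodeType == node.ELEMENT_NODE and node.tagName == "xref":
        target = refs[node.attributes["id"].value]
        return expand(random_child_element(target))
    # Any other element: concatenate whatever its children produce.
    return "".join(expand(child) for child in node.childNodes)


result = expand(random_child_element(refs["pair"]))
print(result)
```

Each run prints two characters, each either 0 or 1, because every xref to "bit" triggers its own random choice.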