Essays/XMLwithAmpersand
The SAX XML parser exhibits annoying, though acceptable, behavior when it processes a text node containing an embedded ampersand, written in the XML as the entity &amp;. Instead of delivering the node's text in one call to the "characters" callback, the parser may deliver it through several consecutive calls. This behavior is documented on this page, which, ironically, is rendered very poorly.
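One way to observe the splitting is a minimal callback class that records each chunk it receives. This is a hypothetical sketch. It assumes the same xml/sax addon and callback conventions (saxclass, startDocument, characters, endDocument, process_<locale>_) that Oleg's code below relies on. The locale pchunks and the noun C are illustrative names, not part of the addon.

```j
NB. Hypothetical demo (not from the original essay): record every chunk
NB. that the parser passes to the "characters" callback, one box per call.
require 'xml/sax format'
saxclass 'pchunks'

startDocument=: 3 : 'C=: 0$a:'    NB. start with an empty list of boxes
characters=: 3 : 'C=: C,<y'       NB. box each chunk separately
endDocument=: 3 : 'C'             NB. process_pchunks_ returns this result

cocurrent 'base'

process_pchunks_ '<t>Getting Started &amp; Then Some</t>'
```

If the parser splits the text at the entity, the result holds more than one box. Razing it with ; recovers the full decoded text.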
Here's a method for dealing with this, adapted from some example code written by Oleg.
NB.* xmlEGboxed.ijs: use elements or attributes to fill boxed table
NB. http://www.jsoftware.com/pipermail/programming/2008-December/013300.html

require 'xml/sax format'

saxclass 'pboxed'

startDocument=: 3 : 0
  LASTL=: L=: 0 [ S=: ''   NB. Level counter L, leading paths S.
  HREF=: ''                NB. Stores attributes to get HREFs.
  Z=: i.0 2                NB. Will contain final result.
)

endDocument=: 3 : 'Z'

startElement=: 4 : 0
  L=: >:L [ S=: S,<y
  if. y-:'bookmark' do. HREF=: x getAttribute 'href' end.
)

endElement=: 3 : 0
  L=: <:L [ S=: }:S
)

characters=: 3 : 0
  s2=. _2{.S
  if. s2 -: ;:'bookmark title' do.
    if. L~:LASTL do. Z=: Z,y;HREF               NB. Either initialize or
    else. Z=: (<y,~>(<_1 0){Z) (<_1 0)}Z end.   NB. accumulate more.
  end.
  LASTL=: L   NB. A next call at this same level is treated as a continuation.
)

NB. =========================================================
cocurrent 'base'
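This code accumulates each bookmark's title and URL as a row of a two-column boxed table. The first characters call for a title starts a new row. A following call made at the same nesting level as the previous characters call (LASTL equal to L) is treated as a continuation, and its text is appended to the title in the last row.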
Here's some sample XML with embedded ampersands.
egSmall=: 0 : 0
<?xml version="1.0"?>
<!DOCTYPE xbel PUBLIC "+//IDN python.org//DTD XML Bookmark Exchange Language 1.1//EN//XML" "http://pyxml.sourceforge.net/topics/dtds/xbel-1.1.dtd">
<xbel>
  <title>Bookmarks</title>
  <desc>Bookmarks</desc>
  <folder id="rdf:#$FvPhC3" folded="no">
    <title>Bookmarks Toolbar Folder</title>
    <desc>Add bookmarks to this folder &amp; see them displayed on the Bookmarks Toolbar</desc>
    <bookmark href="http://www.bogus.org/HeyHo/LetsGo.html">
      <title>Getting Started &amp; Then Some</title>
    </bookmark>
    <bookmark href="http://fxfeeds.mozilla.com/" modified="1209052290">
      <title>Headlines &amp; Deadlines</title>
    </bookmark>
  </folder>
  <bookmark href="http://www.jsoftware.com/" added="1146880810" visited="1209017433">
    <title>J Home &amp; Homeboys</title>
  </bookmark>
</xbel>
)
Here's the result of using the code on this example:
   load 'xmlEGboxed.ijs'
   process_pboxed_ egSmall
+---------------------------+--------------------------------------+
|Getting Started & Then Some|http://www.bogus.org/HeyHo/LetsGo.html|
+---------------------------+--------------------------------------+
|Headlines & Deadlines      |http://fxfeeds.mozilla.com/           |
+---------------------------+--------------------------------------+
|J Home & Homeboys          |http://www.jsoftware.com/             |
+---------------------------+--------------------------------------+
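A variation on the same idea, sketched here rather than taken from Oleg's code, is to append every chunk to a buffer and emit a row only when the title element closes. With this approach, continuation chunks never have to be recognized by nesting level. By the logic of the level test above, two sibling bookmarks written with no whitespace text between them would have their titles merged into one row, because no characters call at a different level intervenes to reset LASTL. The buffered version does not depend on that. The sketch assumes the same xml/sax callbacks as above. The locale pbuffered and the names BUF and egTwo are hypothetical.

```j
NB. Hypothetical buffered variant of xmlEGboxed.ijs (a sketch, not Oleg's code).
NB. Every characters chunk is appended to BUF; a row is emitted at </title>.
require 'xml/sax format'
saxclass 'pbuffered'

startDocument=: 3 : 0
  S=: ''          NB. stack of open element names
  HREF=: ''       NB. href of the current bookmark
  BUF=: ''        NB. text collected since the last element boundary
  Z=: i.0 2       NB. result table: title ; href
)

endDocument=: 3 : 'Z'

startElement=: 4 : 0
  S=: S,<y
  BUF=: ''                                 NB. start a fresh text buffer
  if. y-:'bookmark' do. HREF=: x getAttribute 'href' end.
)

characters=: 3 : 'BUF=: BUF,y'            NB. append each chunk, however it was split

endElement=: 3 : 0
  NB. Emit a row only when a bookmark's title closes; BUF then holds the whole title.
  if. (_2{.S) -: ;:'bookmark title' do. Z=: Z,BUF;HREF end.
  S=: }:S
  BUF=: ''
)

cocurrent 'base'

NB. Hypothetical sample: sibling bookmarks with no whitespace between them.
egTwo=: 0 : 0
<xbel><bookmark href="http://www.jsoftware.com/"><title>J Home &amp; Homeboys</title></bookmark><bookmark href="http://fxfeeds.mozilla.com/"><title>Headlines &amp; Deadlines</title></bookmark></xbel>
)

process_pbuffered_ egTwo
```

The trade-off is that the buffer is reset at every element boundary. Text split around a child element, as in mixed content, would therefore only keep the part after the last child.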