Legacy Documentation : not up-to-date
The original ARQ parser will be removed from Jena
Normally, both ARP and Jena are used to read files either from the local machine or from the Web. A different use case, addressed here, is when the XML source is available in-memory in some way. In these cases, ARP and Jena can be used as a SAX event handler, turning SAX events into triples, or a DOM tree can be parsed into a Jena Model.
1. Overview
To read an arbitrary SAX source as triples to be added into a Jena
model, it is not possible to use a
Model.
read
()
operation. Instead, you construct a SAX event handler of class
SAX2Model
,
using the
create
method, install these as the handler on your SAX event source, and
then stream the SAX events. It is possible to have fine-grained
control over the SAX events, for instance, by inserting or deleting
events, before passing them to the
SAX2Model
handler.
Sample Code
This code uses the Xerces parser as a SAX event stream, and adds
the triple to a
Model
using
default options.
// Use your own SAX source.
XMLReader saxParser = new SAXParser();
// set up SAX input
InputStream in = new FileInputStream("kb.rdf");
InputSource ins = new InputSource(in);
ins.setSystemId(base);
Model m = ModelFactory.createDefaultModel();
String base = "http://example.org/";
// create handler, linked to Model
SAX2Model handler = SAX2Model.create(base, m);
// install handler on SAX event stream
SAX2RDF.installHandlers(saxParser, handler);
try {
try {
saxParser.parse(ins);
} finally {
// MUST ensure handler is closed.
handler.close();
}
} catch (SAXParseException e) {
// Fatal parsing errors end here,
// but they will already have been reported.
}
Initializing SAX event source
If your SAX event source is a subclass of XMLReader
, then the
installHandlers
static method can be used as shown in the sample. Otherwise, you
have to do it yourself. The
installHandlers
code is like this:
static public void installHandlers(XMLReader rdr, XMLHandler sax2rdf)
throws SAXException
{
rdr.setEntityResolver(sax2rdf);
rdr.setDTDHandler(sax2rdf);
rdr.setContentHandler(sax2rdf);
rdr.setErrorHandler(sax2rdf);
rdr.setFeature("http://xml.org/sax/features/namespaces", true);
rdr.setFeature(
"http://xml.org/sax/features/namespace-prefixes",
true);
rdr.setProperty(
"http://xml.org/sax/properties/lexical-handler",
sax2rdf);
}
For some other SAX source, the exact code will differ, but the required operations are as above.
Error Handler
The SAX2Model handler supports the setErrorHandler method, from the Jena RDFReader interface. This is used in the same way as that method to control error reporting.
A specific fatal error, new in Jena 2.3, is ERR_INTERRUPTED, which indicates that the current Thread received an interrupt. This allows long jobs to be aborted on user request.
Options
The SAX2Model
handler supports the
setProperty
method, from the Jena
RDFReader
interface. This is used in nearly the same way to have fine grain
control over ARPs behaviour, particularly over error reporting, see
the I/O howto. Setting SAX or
Xerces properties cannot be done using this method.
XML Lang and Namespaces
If you are only treating some document subset as RDF/XML then it is
necessary to ensure that ARP knows the correct value for xml:lang
and desirable that it knows the correct mappings of namespace
prefixes.
There is a second version of the
create
method, which allows specification of the xml:lang
value from the
outer context. If this is inappropriate it is possible, but hard
work, to synthesis an appropriate SAX event.
For the namespaces prefixes, it is possible to call the
startPrefixMapping
SAX event, before passing the other SAX events, to declare each
namespace, one by one. Failure to do this is permitted, but, for
instance, a Jena Model will then not know the (advisory) namespace
prefix bindings. These should be paired with endPrefixMapping
events, but nothing untoward is likely if such code is omitted.
Using your own triple handler
As with ARP, it is possible to use this functionality, without using other Jena features, in particular, without using a Jena Model. Instead of using the class SAX2Model, you use its superclass SAX2RDF. The create method on this class does not provide any means of specifying what to do with the triples. Instead, the class implements the ARPConfig interface, which permits the setting of handlers and parser options, as described in the documentation for using ARP without Jena.
Thus you need to:
- Create a SAX2RDF using SAX2RDF.create()
- Attach your StatementHandler and SAXErrorHandler and optionally your NamespaceHandler and ExtendedHandler to the SAX2RDF instance.
- Install the SAX2RDF instance as the SAX handler on your SAX source.
- Follow the remainder of the code sample above.
Using a DOM as Input
None of the approaches listed here work with Java 1.4.1_04. We suggest using Java 1.4.2_04 or greater for this functionality. This issue has no impact on any other Jena functionality.
Using a DOM as Input to Jena
The DOM2Model
subclass of SAX2Model, allows the parsing of a DOM using ARP. The
procedure to follow is:
- Construct a
DOM2Model
, using a factory method such ascreateD2M
, specifying the xml:base of the document to be loaded, the Model to load into, optionally the xml:lang value (particularly useful if using a DOM Node from within a Document). - Set any properties, error handlers etc. on the
DOM2Model
object. - The DOM is parsed simply by calling the
load(Node)
method.
Using a DOM as Input to ARP
DOM2Model is a subclass of SAX2RDF, and handlers etc. can be set on the DOM2Model as for SAX2RDF. Using a null model as the argument to the factory indicates this usage.