wasm-demo/demo/ermis-f/python_m/cur/1788

From: paul at prescod.net (Paul Prescod)
Date: Tue, 4 May 1999 00:54:27 GMT
Subject: Parsing
Message-ID: <372E4543.68A125EC@prescod.net>
Content-Length: 1609
X-UID: 1788

I am using Aycock's package to handle some parsing but I am having trouble
because the language I am parsing is highly context sensitive. I don't
have any trouble dealing with the context-sensitivity in the so-called
"context free grammar" part of the package (the parser) but in the scanner
it is killing me.

Let's pretend I am parsing a tagged (but non-SGML) language where there is
an element "URL". Within "URL" elements, the characters < and > are
illegal: they must be escaped as \< and \>.

Elsewhere they are not. Here is the grammar I would *like* to write
(roughly):

Element ::= <URL> urlcontent </URL>
urlcontent = (([^<>\/:]* ("\<"|"\>"|":"|"/"|"\\"))*
Element ::= <NOT-A-URL> anychar* </NOT-A-URL>

Of course this is a made-up syntax because I don't think you can put
regular expressions in Aycock's BNF. I've used tools that do allow this so
I'm not sure how to handle it. This is also a made-up (simplified) example
so demonstrating how I can do it all in the scanner is probably not
helpful.

I could handle it if I could switch scanners mid-stream (for URL elements)
but Aycock's scanner finishes up before the parser even gets under way!
Should I scan and then parse (at a high level) and then rescan and reparse
the URLs? Is there a package that allows me to mix the lexical and
syntactic levels more?

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Diplomatic term: "We had a frank exchange of views."
Translation: Negotiations stopped just short of shouting and
             table-banging. (Brill's Content, Apr. 1999)