From: paul at prescod.net (Paul Prescod)
Date: Tue, 4 May 1999 00:54:27 GMT
Subject: Parsing
Message-ID: <372E4543.68A125EC@prescod.net>
Content-Length: 1609
X-UID: 1788

I am using Aycock's package to handle some parsing, but I am having trouble
because the language I am parsing is highly context-sensitive. I don't
have any trouble dealing with the context sensitivity in the so-called
"context-free grammar" part of the package (the parser), but in the scanner
it is killing me.

Let's pretend I am parsing a tagged (but non-SGML) language where there is
an element "URL". Within "URL" elements, the characters < and > are
illegal: they must be escaped as \< and \>.

Elsewhere they are not. Here is the grammar I would *like* to write
(roughly):

Element ::= <URL> urlcontent </URL>
urlcontent = ([^<>\/:]* ("\<"|"\>"|":"|"/"|"\\"))*
Element ::= <NOT-A-URL> anychar* </NOT-A-URL>

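The made-up "urlcontent" rule can be approximated with Python's re module. This is a rough sketch, not Aycock's notation and not a literal transliteration of the rule above: it assumes that inside a URL element a backslash may only start one of the escapes \<, \> or \\, and the names URL_CONTENT, is_valid_url_content and unescape_url_content are hypothetical.

```python
import re

# Hypothetical token definition for the contents of a <URL> element.
# Assumption: any character except <, > and backslash is allowed as-is,
# and a backslash may only begin one of the escapes \<, \> or \\.
URL_CONTENT = re.compile(r"(?:[^<>\\]|\\[<>\\])*")


def is_valid_url_content(text):
    """Return True if text is legal between <URL> and </URL> (assumed rules)."""
    return URL_CONTENT.fullmatch(text) is not None


def unescape_url_content(text):
    """Replace each escape \\<, \\> or \\\\ with the character it stands for."""
    return re.sub(r"\\([<>\\])", r"\1", text)
```

For example, is_valid_url_content(r"http://host/a\<b\>") is true, while is_valid_url_content("a<b") is false because the bare < is not escaped.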
Of course, this is made-up syntax, because I don't think you can put
regular expressions in Aycock's BNF. I've used tools that do allow this, so
I'm not sure how to handle it here. This is also a made-up (simplified)
example, so demonstrating how I could do it all in the scanner is probably
not helpful.

I could handle it if I could switch scanners mid-stream (for URL elements),
but Aycock's scanner finishes up before the parser even gets under way!
Should I scan and then parse (at a high level), and then rescan and reparse
the URLs? Is there a package that allows me to mix the lexical and
syntactic levels more?

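One common way to "switch scanners mid-stream" is a modal scanner, the same idea as start conditions in lex/flex: the scanner carries a state, and the tokens it has already produced decide which patterns apply next. The sketch below is plain Python using the re module, not Aycock's package; the function scan, the two pattern tables and the token names are hypothetical, and it assumes tags look like <NAME> or </NAME>.

```python
import re

# Hypothetical pattern tables, tried in order. In "text" mode, < and > are
# ordinary characters unless they form a tag; in "url" mode they may only
# appear escaped as \< or \>.
TEXT_PATTERNS = [
    ("TAG", re.compile(r"</?[A-Za-z-]+>")),
    ("TEXT", re.compile(r"[^<]+|<")),  # a lone < that is not a tag is text
]
URL_PATTERNS = [
    ("END_URL", re.compile(r"</URL>")),
    ("ESCAPE", re.compile(r"\\[<>\\]")),
    ("URLCHARS", re.compile(r"[^<>\\]+")),
]


def scan(source):
    """Yield (kind, text) tokens, using URL_PATTERNS inside <URL> elements."""
    mode = "text"
    pos = 0
    while pos < len(source):
        patterns = URL_PATTERNS if mode == "url" else TEXT_PATTERNS
        for kind, pattern in patterns:
            match = pattern.match(source, pos)
            if match:
                break
        else:
            raise ValueError(
                f"unexpected character {source[pos]!r} at {pos} in {mode} mode"
            )
        text = match.group()
        yield kind, text
        pos = match.end()
        # The token just produced decides which pattern table is used
        # for the rest of the input.
        if kind == "TAG" and text == "<URL>":
            mode = "url"
        elif kind == "END_URL":
            mode = "text"
```

For instance, list(scan(r"<URL>http://x/a\<b</URL>")) yields a TAG, URLCHARS "http://x/a", ESCAPE "\<", URLCHARS "b" and END_URL, whereas an unescaped < inside the URL raises ValueError. Because scan is a generator, tokens are produced lazily instead of all before parsing starts; letting the parser itself change the mode would need an extra hook that this sketch does not include.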
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

Diplomatic term: "We had a frank exchange of views."
Translation: Negotiations stopped just short of shouting and
table-banging. (Brill's Content, Apr. 1999)