AEBNF
Altered Extended Backus–Naur Form (AEBNF) is a metasyntax notation
used to express context-free grammars. It provides a formal way to describe computer programming languages
and other formal languages. It is an alteration of the Extended Backus–Naur Form
(EBNF) metasyntax notation (ISO/IEC 14977).
Motivation to alter EBNF
When designing TemlCode, we found that EBNF lacked ways to express some grammar rules efficiently and exactly.
A common issue was the use of a general skipping rule (spaces, line feeds, tabs, comments).
EBNF has just one concatenator, ",". To allow the skipping rule between some
terminals but not between others, we would have to expand the grammar into
a variant for each such case, which would make it larger and harder to read.
Another aspect was internationalization. We missed an easy way to define ISO/IEC 10646 UCS code points
as part of a grammar definition. EBNF has an extension facility with the ? .. ? syntax, but it is neither
formal nor quick and exact enough for this purpose.
Short Guide to AEBNF
In this AEBNF specification, some of the metasyntax is the same as in EBNF, while other parts use a different syntax.
There are also additions to the concatenator that allow the definition of
a skipper rule, and added support for the definition of UCS code points.
Basic AEBNF syntax elements:
Element       Syntax
terminal      "x" "xyz" $hex %dec ..
nonterminal   xyz XYZ_09
combinator    & + =
selector      |
specifier     ! ~ * < ... >
group         ( ... ) [ ... ]
comment       { ... }
LITERAL Terminal -> "x" and "xyz"
A string between quotation marks forms a terminal symbol made of characters representable in ASCII,
that is, visible characters, digits, punctuation marks, white-space characters, etc.
DECIMAL Terminal -> %dec
A decimal value after the "%" percent character forms the terminal symbol for the corresponding UCS code point.
HEXADECIMAL Terminal -> $hex
A hexadecimal value after the "$" dollar character forms the terminal symbol for the corresponding UCS code point.
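As a minimal sketch of how the two numeric terminal forms could be resolved, the Python helper below turns a %dec or $hex token into the single character it denotes. The function name terminal_to_char is hypothetical and not part of AEBNF, and the upper limit of 10FFFF (hexadecimal) is the general UCS code point limit rather than something this guide states.

```python
import string


def terminal_to_char(token):
    """Resolve a %dec or $hex terminal to the character it denotes."""
    prefix, digits = token[:1], token[1:]
    if prefix == "%" and digits and all(d in string.digits for d in digits):
        code_point = int(digits, 10)   # decimal digits after "%"
    elif prefix == "$" and digits and all(d in string.hexdigits for d in digits):
        code_point = int(digits, 16)   # hexadecimal digits after "$"
    else:
        raise ValueError(f"not a numeric terminal: {token!r}")
    if code_point > 0x10FFFF:          # assumed limit: highest UCS code point
        raise ValueError(f"outside the UCS code point range: {token!r}")
    return chr(code_point)


print(terminal_to_char("%65"))    # A
print(terminal_to_char("$41"))    # A
print(terminal_to_char("$20AC"))  # €
```

Both %65 and $41 name the same code point, so a grammar author can pick whichever base is more convenient.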
RANGE of Terminals -> ..
A range ".." can appear between two terminals to define a consecutive run of terminals, for example "0".."9" for the decimal digits.
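The following Python sketch shows one way to read a range. It assumes that the range is inclusive at both ends and ordered by code point, which this guide implies but does not state outright. The helper name expand_range is hypothetical.

```python
def expand_range(first, last):
    """List every character from first to last, inclusive, by code point."""
    if len(first) != 1 or len(last) != 1:
        raise ValueError("range bounds must be single characters")
    if ord(first) > ord(last):
        raise ValueError("range bounds are in descending order")
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


print("".join(expand_range("0", "9")))  # 0123456789
print("".join(expand_range("a", "f")))  # abcdef
```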
NON Terminal
Identifier strings name production rules, that is, nonterminals.
AND Combinator -> &
Sequences syntax elements, one immediately following another.
PLUS Combinator -> +
Sequences syntax elements one after another, while the Skipper rule
may match between them (in the place of "+").
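The difference between the two combinators can be sketched in Python as follows. This is an illustration only: it assumes a skipper that accepts spaces, tabs, line feeds and carriage returns, and it handles literal terminals only. The names SKIP_CHARS, skip and match_sequence are hypothetical.

```python
SKIP_CHARS = " \t\n\r"   # assumed skipper: space, tab, line feed, carriage return


def skip(text, pos):
    """Advance pos past any run of skipper characters."""
    while pos < len(text) and text[pos] in SKIP_CHARS:
        pos += 1
    return pos


def match_sequence(text, literals, combinator):
    """Match literals from the start of text, joined by "&" or "+".

    Returns the index just past the match, or None if matching fails.
    """
    pos = 0
    for index, literal in enumerate(literals):
        if index > 0 and combinator == "+":
            pos = skip(text, pos)      # "+" lets the skipper run between elements
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
    return pos


print(match_sequence("ab", ["a", "b"], "&"))     # 2
print(match_sequence("a  b", ["a", "b"], "&"))   # None
print(match_sequence("a  b", ["a", "b"], "+"))   # 4
print(match_sequence("ab", ["a", "b"], "+"))     # 2
```

With "&" the two spaces make the match fail, while "+" accepts the input both with and without them.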
ANY_ORDER Combinator -> =
Sequences syntax elements in such a way that their order is not important.
When used in combination with the Grouping syntax, a special Swapper rule
can additionally be defined with < +|& , … , +|& > placed before the
group. It overrides the Skipper rule and is used to separate
the individual syntax elements. The Swapper rule is applied in the place of the "=" combinator
and means: first comes a syntax element,
then the left combinator < +|&, then the separating syntax elements , … ,
then the right combinator +|& > and then another syntax element.
This way we can define:
<+,",",+>( [A] = [B] = [C] … [X] )
  Each of A, B, C … X occurs at most once, in any order,
  and at least one of them must be present.
  Delimiter between the elements: + "," +
<&,",",&>[ [A] = [B] = [C] … [X] ]
  Each of A, B, C … X occurs at most once, in any order,
  and none of them is required.
  Delimiter between the elements: & "," &
<+,",",&>( A = B = C … X )
  Each of A, B, C … X occurs exactly once, in any order.
  Delimiter between the elements: + "," &
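To make the Swapper rule more concrete, here is a small Python sketch of an ANY_ORDER matcher for literal elements. It is an illustration under assumptions, not a reference implementation: the skipper is assumed to accept spaces, tabs, line feeds and carriage returns, and the names skip and match_any_order are hypothetical. The function corresponds most closely to the first form above (at least one element); a caller can compare the result with the full element list to enforce the "all elements" form.

```python
SKIP_CHARS = " \t\n\r"   # assumed skipper characters


def skip(text, pos):
    """Advance pos past any run of skipper characters."""
    while pos < len(text) and text[pos] in SKIP_CHARS:
        pos += 1
    return pos


def match_any_order(text, elements, left, separator, right):
    """Match distinct elements in any order, joined by left, separator, right.

    left and right are "&" or "+". Returns the elements in the order
    found, or None if the whole text does not match.
    """
    remaining = list(elements)
    found = []
    pos = 0
    while True:
        # Take whichever unused element starts here (longest first).
        candidates = [e for e in remaining if text.startswith(e, pos)]
        if not candidates:
            return None
        element = max(candidates, key=len)
        remaining.remove(element)      # each element may be used only once
        found.append(element)
        pos += len(element)
        if pos == len(text):
            return found
        # Swapper: left combinator, separator, right combinator.
        if left == "+":
            pos = skip(text, pos)
        if not text.startswith(separator, pos):
            return None
        pos += len(separator)
        if right == "+":
            pos = skip(text, pos)


print(match_any_order("C, A ,B", ["A", "B", "C"], "+", ",", "+"))  # ['C', 'A', 'B']
print(match_any_order("A,A", ["A", "B"], "+", ",", "+"))           # None
print(match_any_order("A ,B", ["A", "B"], "&", ",", "&"))          # None
```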
OR Selector -> |
Defines alternatives: one of the syntax elements is detected at a time.
REPETITION Specifier -> *
Fine-tunes the repetition count with lower and upper
bounds. It can also specify the binding combinator used between consecutive
occurrences within the bounds. The default repetition binding combinator is &.
Some examples:
*            repeat zero or more times
*<10>        repeat exactly 10 times
*<..10>      repeat zero to 10 times
*<10..>      repeat 10 or more times
*<+>         repeat zero or more times, with the PLUS combinator
*<&,10>      repeat exactly 10 times, with the AND combinator
*<..10,+>    repeat zero to 10 times, with the PLUS combinator
In combination with Group syntax:
( … )* … repeat whole group one or more times
[ … ]* … repeat whole group zero or more times
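The repetition forms listed above can be read mechanically. The Python sketch below parses a specifier into a (minimum, maximum, binding) triple, with None standing for an open upper bound. The function name parse_repetition is hypothetical, and the sketch models the specifier alone: it does not apply the group rules, under which ( … )* requires at least one occurrence.

```python
def parse_repetition(spec):
    """Parse a repetition specifier into (minimum, maximum, binding).

    maximum is None when the upper bound is open; binding defaults to "&".
    """
    if not spec.startswith("*"):
        raise ValueError(f"not a repetition specifier: {spec!r}")
    minimum, maximum, binding = 0, None, "&"
    body = spec[1:]
    if body:
        if not (body.startswith("<") and body.endswith(">")):
            raise ValueError(f"malformed bounds: {spec!r}")
        for part in body[1:-1].split(","):
            part = part.strip()
            if part in ("&", "+"):
                binding = part                      # binding combinator
            elif ".." in part:
                low, high = part.split("..", 1)     # lower and upper bound
                minimum = int(low) if low.strip() else 0
                maximum = int(high) if high.strip() else None
            else:
                minimum = maximum = int(part)       # exact count
    return minimum, maximum, binding


print(parse_repetition("*"))          # (0, None, '&')
print(parse_repetition("*<10>"))      # (10, 10, '&')
print(parse_repetition("*<..10,+>"))  # (0, 10, '+')
```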
NEGATION Specifier -> !
Treats the failure to recognize the following syntax element as a valid result.
ANY_CASE Specifier -> ~
Valid for terminals. Specifies that letters are detected regardless of their case.
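A rough Python sketch of both specifiers applied to a one-character terminal follows. It rests on an assumption this guide does not spell out: that a negated terminal accepts, and consumes, any single character other than the terminal itself. Case-insensitive comparison is approximated with str.lower. The name match_terminal is hypothetical.

```python
def match_terminal(text, pos, terminal, any_case=False, negate=False):
    """Try to match a one-character terminal at pos.

    Returns the position after the matched character, or None on failure.
    """
    if len(terminal) != 1:
        raise ValueError("this sketch handles one-character terminals only")
    if pos >= len(text):
        return None
    actual, expected = text[pos], terminal
    if any_case:
        actual, expected = actual.lower(), expected.lower()   # "~" specifier
    matched = actual == expected
    if negate:
        matched = not matched   # assumption: "!" accepts any other single character
    return pos + 1 if matched else None


print(match_terminal("F", 0, "f"))                 # None
print(match_terminal("F", 0, "f", any_case=True))  # 1
print(match_terminal("x", 0, '"', negate=True))    # 1
print(match_terminal('"', 0, '"', negate=True))    # None
```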
ZERO_OR_ONE Group -> [ … ]
Optional (non-compulsory). Can also group several terminals and nonterminals.
Means zero or one occurrence of the terminals and nonterminals or of the whole group.
ONE Group -> ( … )
Groups several terminals and nonterminals. Means exactly one occurrence of the whole group.
COMMENT -> { … }
Allows for comments inside a grammar definition.
Grammar of AEBNF itself in AEBNF
Nonterminal   Production rule
Digit         "0".."9"
Unsigned      Digit*<1..>
Letter        "A".."Z" | "a".."z"
Id            ( "_" | Letter ) & [ "_" | Letter | Digit ]*
Skipper       [ %32 | %9 | %10 | %13 | %10%13 ]*
char          %34 & !%34 & %34
string        %34 & !%34*<1..> & %34
hex           "$" & ( ~"a".."f" | Digit )*<1..6>
dec           "%" & Digit*<1..7>
literal       ( char | hex | dec )
sequence      ( char | string | hex | dec )*
BINDING       "&" | "+"
SWAPPER       ( "<" + BINDING + "," + sequence + "," + BINDING + ">" )
COUNT         "<" + <+,",",+>( [ Unsigned ] = [BINDING] ) + ">"
RANGEFR       "<" + <+,",",+>( [ Unsigned + ".." ] = [BINDING] ) + ">"
RANGETO       "<" + <+,",",+>( [ ".." + Unsigned ] = [BINDING] ) + ">"
RANGE         "<" + <+,",",+>( [ Unsigned + ".." + Unsigned ] = [BINDING] ) + ">"
REP           ( "*" & [ COUNT | RANGEFR | RANGETO | RANGE ] )
NEG           ( "!" )
NCASE         [ ["~"] = ["!"] ]
SUB           ( "{" + Id + [Var + ["<" + Id + ">"] ] + "}" )
p01267        p0 | p1 | p2 | p6 | p7
p012345X      p0 | p1 | p2 | p3 | p4 | p5 | pX
Grammar       [SUB] + ( p0 | p1 | p2 | p3 | p4 | p5 | pX | p6 | p7 )
p0            [NEG] & Id & [REP] & [SUB]
p1            [NCASE] & sequence & [REP] & [SUB]
p2            [NCASE] & literal & ".." & literal & [ "-" & sequence ] & [REP] & [SUB]
p3            p01267 + "|" + p01267 + [ "|" + p01267 ]*<+>
p4            p01267 + "&" + p01267 + [ "&" + p01267 ]*<+>
p5            p01267 + "+" + p01267 + [ "+" + p01267 ]*<+>
pX            p01267 + "=" + p01267 + [ "=" + p01267 ]*<+>
p6            [ [SWAPPER] = [NEG] ] & ( "[" + p012345X + "]" ) & [REP] & [SUB]
p7            [ [SWAPPER] = [NEG] ] & ( "(" + p012345X + ")" ) & [REP] & [SUB]
As you can see, two nonterminals, Skipper and Grammar, do not take part in any of the
production rules, because they are special.
The Grammar nonterminal is the root (starting point) of the whole grammar.
The Skipper nonterminal is an AEBNF-specific rule applied in the place of the "+" PLUS combinator
to skip spaces, tabs, comments, etc.
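As a way of checking one's reading of the lexical rules, the Python sketch below hand-translates the Id, Unsigned, hex and dec rules from the grammar above into regular expressions. The translation is an interpretation rather than a generated result, and the Python names ID, UNSIGNED, HEX, DEC and classify are illustrative.

```python
import re

# Hand translations of four lexical rules from the grammar above.
ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")   # ( "_" | Letter ) & [ "_" | Letter | Digit ]*
UNSIGNED = re.compile(r"[0-9]+")             # Digit*<1..>
HEX = re.compile(r"\$[0-9A-Fa-f]{1,6}")      # "$" & ( ~"a".."f" | Digit )*<1..6>
DEC = re.compile(r"%[0-9]{1,7}")             # "%" & Digit*<1..7>


def classify(token):
    """Return the name of the first rule that matches the whole token, or None."""
    rules = (("Id", ID), ("Unsigned", UNSIGNED), ("hex", HEX), ("dec", DEC))
    for name, pattern in rules:
        if pattern.fullmatch(token):
            return name
    return None


print(classify("XYZ_09"))    # Id
print(classify("42"))        # Unsigned
print(classify("$20AC"))     # hex
print(classify("%8364"))     # dec
print(classify("$1234567"))  # None
```

The last call returns None because the hex rule allows at most six digits after the "$" character.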