Table of Contents
TreeDL (Tree Description Language) - is a language for description of tree data structures used in programs and operations on them.
Tree structure description is clear, compact and independent of specific programming language. It can be used as specification of data structures and as source for generation of program code (in various languages) that implements these data structures and operations on them.
Operations in TreeDL language allow to define operations on trees with no changes in tree structure description. The usage of specialized notation for definition of operations eliminates errors caused by inconsistency between tree structure and operation implementation.
Tree structures are used in various kinds of problems. The main kind of problems is processing of formal text, that is text written in some language with formal syntax - in programming language, markup or data description language. TreeDL is used for description of formal text internal representation in program, similar to abstract syntax tree or document object model.
The internal representation of formal text is a main subsystem of programs that process formal texts: syntax analyzer (parser) of formal text builds its internal representation, semantic analyzer checks static semantics restrictions and fills in semantic information, other subsystems use internal representation to further analysis and results generation.
TreeDL allows to define a set of node types that can be used in a tree. A node type defines named relations between parent and children nodes and additional data related to node.
This document describes syntax and semantics of TreeDL. Language constructs are described in the following chapters. For each construct syntax grammar rule is listed, semantics is described and examples of usage are given. Appendixes contain lexical and syntax grammars and semantic restrictions on correct tree description.
Table of Contents
This chapter describes TreeDL aspects related to more than one constructs in tree structure description. These are declarative information, documentation comments and type usage. Further description of each construct is linked to proper common aspects.
Each entity in a tree description can be linked with additional declarative information. It expands capabilities of tree description without change of tree description language. Declarative information is available during tree description analysis and can be used by a tool that process tree description. Keeping declarative information directly in tree description allows to do without additional configuration files. But it is recommended to keep in tree description only information that is immutable for tree description.
Declarative information is specified as a set of named properties:
|
Property name qid
is qualified identifier.
Property value - boolean <BOOL_LITERAL>
,
integer <INT_LITERAL>
or string
<STRING_LITERAL>
.
Example:
[ treedl.language = "java"; has.key = true; max.length = 8; ]
Here three properties are declared with string, boolean and integer values.
For each entity in a tree description can be defined documentation comment. Documentation comments are available during tree description analysis. Format of documentation comment depends on target language and processing tool. For example, when translating to Java programming language, documentation comments can be used as javadoc comments.
Documentation comments <DOCOMMENT>
can be
only in certain places described in syntax grammar of TreeDL language.
Lexical structure of documentation comments are described in Appendix 1, Lexical grammar.
Example:
/** * Documentation comment */
TreeDL language allows to declare node
types and enumeration types. A type
has name <type:ID>
that is specified when type
is referenced. If type is used in module
differrent from module where it was declared, type name should be
qualified by synonym of module
<module:ID>
where type was declared.
Qualified type name
is type name <type:ID>
qualified by synonym of
module <module:ID>
if needed.
Used type can be defined using one of the following constructs:
|
where type can be:
one of the predefined types
predefined_type
;
node type or enumeration type referenced by qualified
name type_ref
;
custom type <TYPE>
with modifier defining cardinality of type:
without modifiers - exactly one element;
"?"
- 0 or 1 element (optional
attribute);
"*"
- 0 or more elements (attribute value
is a list of values of the specified type);
"+"
- 1 or more elements (attribute value
is a list of values of the specified type).
Example:
int
- predefined integer type;
T - node type or enumeratino type defined in TreeDL;
T?
- optional type whose values are values
or type T
or null
;
<java.util.Date>*
- a list of 0 or
more elements of type java.util.Date
defined in
target language.
Table of Contents
TreeDL description unit is a module
|
that consist of
There are two kinds of modules - structure description modules and operation modules.
Structure description module declares a set
of node types that can be used in a tree, and may be operations on them.
This module kind specified by keyword tree
in module
header.
Operation module defines operations on nodes
whose types are declared in referenced structure description modules.
This module kind specified by keyword module
in
module header.
Full name of module is qualified identifier used to reference module from other modules.
Example:
tree com.unitesk.atp.treedl.TDL; ...
Structure description module with full name
com.unitesk.atp.treedl.TDL
.
module com.unitesk.atp.treedl.Checker; ...
Operation module
com.unitesk.atp.treedl.Checker
.
A module can use types declared in other modules. For that full names of referenced modules should be specified.
|
To use type declared in referenced module its name should be
qualified by synonym of module. By default synonym
of module is the last identifier of full name. In case of conflicts or
other reasons, synonym of module can be defined explicitly using
<synonym:ID>
. Full name of module itself also
defines synonym. Scope of visibility of synonyms is module where they
are declared.
Example:
module com.unitesk.atp.treedl.Checker : TreeDL = com.unitesk.atp.treedl.TDL; ...
Operation module com.unitesk.atp.treedl.Checker
uses referenced structure description module
com.unitesk.atp.treedl.TDL
defining
TreeDL
synonym.
|
Result of module translation to programming language can be
modified using customization code. It is possible to specify base types
<TYPE>
, insert code
<CODE>
from header
construct
before and code from body
construct into result of
module translation. Detailed description of processing of these
constructs is in description of translation scheme for specific
programming language.
Example:
[ treedl.language = "java"; ] module com.unitesk.atp.treedl.TDL; header { import java.util.Map; } body { public static Map getTypes() { ... } }
Code from header
è body
constructs is inserted before and into generated class that is the
result of module translation to Java programming language:
package com.unitesk.atp.treedl; // header code import java.util.Map; public class TDL { // body code public static Map getTypes() { ... } ... }
Table of Contents
Declaration of node type
|
consist of:
list of node type members: attributes, children and customization code - body and constructor
Node type name is identifier
<name:ID>
used to reference node type from any
node type declaration or operation definition in the same module.
From other modules node type can be referenced by qualified type name.
If node type marked with abstract
modifier, it
is abstract. Abstract node type can not be
instantiated and can be used as base type for other node types. Abstract
node types correspond to abstract classes in programming
languages.
If node type marked with root
modifier, it is a
type of root node of a tree.
Example:
abstract node Statement { ... }
Abstract base node type Statement
.
root node CompilationUnit { ... }
Root node type.
A node type declaration can be based on other - base
node type declaration. All members of base node type with
qualified type name
type_ref
are inherited and can be used in inherited
node type. By default a node type is inherited from abstract base node
type Node
:
abstract node Node { attribute late Node? parent; }
This node type defines attribute
parent
containing link to parent node or
null
for root node.
Example:
node IfStatement : Statement { ... }
Node type IfStatement
inherited from
Statement
.
A result of node type translation to programming language can be
modified using customization code. It is possible to specify base types
<TYPE>
of node type. Detailed description of
this construct is in scheme of translation for specific programming
language.
Also a code from constructor and body members is inserted in result of node type translation..
The are the following kinds of node type members:
Children define types and number of children node types;
attributes define additional information linked to nodes. Children are particular case of attributes;
Customization code that is inserted in result of node type translation.
|
An attribute links named value of arbitrary type with tree node. Attribute provides two operations to access a value:
get - get current value;
set - set a new value.
Children are
particular case of attributes that define tree structure. A value of
child is child node or list of child nodes. Any tree node except root
node is child node for exactly one parent node.
All tree nodes have attribute
parent
that contains reference to parent node or
null
for root node.
Attribute definition
|
contains of:
keyword attribute
- for attributes,
child
- for children
modifiers
type of attribute
There are two storing modes of attribute value - default and custom.
If the default storing mode of attribute
value is used, operations to access value are created automatically
when tree description is translated to a programming language. A
value of an attribute can be stored, for example, in a field of an
object that corresponds to tree node. The default mode of attribute
storing is used if custom
modifier is not
specified.
The custom storing mode of attribute
value is used if custom
modifier is specified.
Operations to access value should be defined by user using get and
set constructs. Customization code from these constructs is inserted
into the corresponding operations when tree description is
translated to a programming language.
|
Customization code from
get
and set
constructs can
also be used for attributes with the default storing mode. A code of
get construct can use and change a value of variable
<name:ID>
that will be returned as a value
of attribute.
In the default attribute storing mode this variable will be initialized by the current value of attribute.
In the custom attribute storing mode this variable should be initialized by customization code.
Don't insert code that returns value explicitly - it will be
generated automatically. A code of set construct can use a value of
implicitly declared parameter
<name:ID>
.
In the default attribute storing mode, epilogue that set a
value of attribute to value of
<name:ID>
will be generated.
In the custom attribute storing mode, customization code should store a value of attribute explicitly.
Example:
attribute string fullName;
A value of attibute fullName
is stored in a
field of type string
.
attribute custom string name get { name = fullName.substring( fullName.lastIndexOf( '.' ) + 1 ); } set { ... };
A value of attribute name is not stored but calculated from
attribute fullName as defined in code of constructs
get
and set
.
If modifier late
is not specified then
attribute value should be specified when tree node is
created.
If modifier late
and attribute
initializer are specified then during tree node construction
set-operation with the parameter
<init:CODE>
will be called.
If modifier late
is specified and
initializer is not specified then attribute value before the
first call of set-operation will be defined by translation
scheme for specific programming language and restrictions on
attribute value specified by attribute type (for example,
null
value for non-optional attributes) or by
customization code of set-operation, may be violated.
By default set-operation may be called any times.
If modifier setonce
is specified then
set-operation may not be called more than once. For non-late
attributes or late-attributes with initializer it corresponds to
call of set-operation during tree node construction.
If modifier noset
is specified then
set-operation is not defined and its value should be calculated
by get-operation.
An attribute marked by modifier abstract
only declares attribute name and type, but doesn't define its
storing mode. Attribute storing mode is defined by inherited node
type.
Abstract attribute can be declared only in abstract node type.
Example:
abstract node NamedNode { abstract attribute string name; } node DefaultNamedNode : NamedNode { attribute string name; } node CustomNamedNode : NamedNode { attribute custom string name get { ... } set { ... }; }
Abstract node type NamedNode declares abstract attribute name.
Inherited node type DefaultNamedNode
defines
default storing mode for inherited attribute
name
. Inherited node type
CustomNamedNode
defines custom storing mode for
inherited attribute name
and operations on
them.
An attribute declared in base node type can be overridden. Overriding attribute should have the same type and name. For overriding attribute, additional customization code can be specified, initialization time and access mode can be changed. For overridden abstract attribute, storing mode can be defined.
If overridden attribute is not abstract then overriding
attribute shoulbe have modifier override
. Because
attribute storing mode already set in this case, modifier
custom
should be specified or not specified for
both overridden and overriding attributes simultaneously.
Inherited node type can change late attribute to non-late. But non-late attributes should remain non-late.
Allowed number of calls of set-operation can not be changed.
That is, modifier setonce
shold be specified or
not specified for both overridden and overriding attributes
simultaneously.
Overriding attribute can't forbid set-operation defined by overridden attribute. So, noset modifier can't arise in overriding attribute..
Example:
node BaseNode { attribute late int+ intList; } node MyNode : BaseNode { attribute override int+ intList set { ... }; }
The node type MyNode
overrides attribute
intList
: now it is non-late and set operation has
additional code.
Children are particular case of attributes. Child is
specified by child
keyword. Children give
additional restrictions:
Child can't have modifiers abstract
,
custom
or noset
.
A value of child is not arbitrary object, but child node or list of child nodes.
Value of parent
attribute of each child node is parent
node. A node can't be child for more than one other node.
Example:
node IfStatement : Statement { child Expression condition; child Statement thenStatement; child Statement elseStatement; }
Node type IsStatement
defines
condition
, thenStatement
è
elseStatement
children that are root nodes of
corresponding subtrees.
Customization code of construct
|
will be inserted into result of translation of node type to programming language.
Customization code of construct
|
will be inserted in constructor of node type after initialization of children and attributes. This code, for example, can check additional requirements for group of children and attributes of node. It is preferable to write requirements for a single child (attribute) as set-code for it.
Example:
node TryStatement { child Block block; child CatchClause* optCatchClauseList; child Block? optFinallyBlock; constructor { if( sizeOptCatchClauseList() == 0 && getOptFinallyBlock() == null ) { throw new IllegalArgumentException("There is neither catch clause nor finally block"); } } }
During construction of node of type
TryStatement
exception is thrown if node doesn't
correspond to correct try
statement of Java
programming language - both catch and finally blocks can't be
omitted.
Table of Contents
Declaration of enumeration type
|
contains of:
Name of enumeration type is identifier
<name:ID>
used to reference this type from any
other declaration in the same module.
From other modules enumeration type can be referenced by qualified type name.
A range of enumeration type depends on given constants. A set of
these constants is union of a set of base enumeration type
type_ref
(if specified) and set of constants
<const:ID>
specified in declaration of this
enumeration type.
There are two kinds of enumeration types:
enumeration and set of flags.
A kind of enumeration type is specified by keywords
enum
and flags
respectively. Range
of enumeration is a set of all enumeration type constants. Range of set
of flags is a set of all subsets (including empty subset) of a set of
all enumeration type constants.
Example:
enum Color { RED, GREEN, BLUE }
Range of enumeration Color
consists of three
values: RED
, GREEN
,
BLUE
.
enum ExtendedColor : Color { WHITE, BLACK }
Range of set of enumeration ExtendedColor
consists of five values: three values of base enumeration type
Color
and two additional values
WHITE
and BLACK
.
flags Modifiers { ABSTRACT, CUSTOM, LATE, OVERRIDE, NOSET, SETONCE }
Range of set of flags Modifiers
constsis of 64
combinations of 6 flags.
Table of Contents
Operation on tree is function whose parameter
can be of node types, enumerarion types and custom types. Formal parameter of node type
or enumeration type can be marked by keyword virtual
.
In this case, there should be explicitly defined separate function
implementation for each variant of
parameters - for each value of the corresponding enumeration type
or for each known inheritor of node type. For enumerations it is
equivalent to usual switch
statement:
Example:
enum Sign { PLUS, MINUS, MULT, DIV } operation string toString( virtual Sign sign ) { case( PLUS ): { return "+"; } case( MINUS ): { return "-"; } case( MULT ): { return "*"; } case( DIV ): { return "/"; } }
Operation toString
defines string representation
of enumeration type.
Example:
abstract node Expression { child Expression left; child Expression right; } node AdditionalExpression : Expression { /*... */ } node MultiplicativeExpression : Expression { /* ... */ } node RelationalExpression : Expression { /* .. */ } node EqualityExpression : Expression { /* .. */ } enum Type { INT, BOOL, STRING } operation Type getType( virtual Expression expr ) { case( AdditionalExpression expr ): { if( getType( expr.getLeft() ) == Type.STRING || getType( expr.getRight() ) == Type.STRING ) { return Type.STRING; } else { return Type.INT; } } case( MultiplicativeExpression expr ): { return Type.INT; } case( RelationalExpression expr ): case( EqualityExpression expr ): { return Type.BOOL; } }
Operation getType defines rule of expression type calculation.
Operation case should be defined for each non-abstract inheritor of node
type Expression
. Operation cases can share
implementation.
Operations on tree enable to define virtual operations on node types without changings in node types declarations itself. Other way to implement virtual operations without changings in node types declarations - to use popular design pattren Visitor. But visitors don't allow to use arbitrary set of parameters and return type. And visit method should be defined for each node type in module although operations often are defined only for some subset of node types.
If changes concern not only set of operations but set of node types, the usage of visitors can cause errors not caught by compiler. Often visitors use default implementation and override visit methods only for some needed node types. When new node type appears, new visit method is added to default implementation. Inherited visitors remain correct from compiler's point of view, but action for new node type remains unimplemented!
Operations let avoid such errors, because operation case should be defined for each non-abstract inherited node type. This condition doesn't require significant efforts when new node type appears because implementation can be shared between several operation cases.
Operation definition
|
consists of:
optional modifier - virtual
Signature of operation declares:
result type of operation
name of operation
a list of formal parameters of operation
A formal parameter of operation has name and type. Parameter of
node type or enumeration type can be marked by modifier
virtual
. Virtual parameters define set of operation
cases.
Operation case
case_signature
|
consists of:
list of parameters
A list of parameters of operation case is defined by set of
virtual parameters of operation. In operation case for each virtual
parameter of operation one of variants of this
parameter is specified. Variants of parameter of enumeration type are
constants of this type. Variants of parameter of node type are
parameters with the same name and its type is non-abstract inherited
node type declared in one of modules directly or indirectly referenced
by module where operation is declared. If abstract base node type
Node
is specified as a type of virtual parameter,
operation is defined on all node types in all referenced modules.
Exactly one operation case should be defined for each combination
of variants of virtual parameters. When operation is called, code
<CODE>
of operation case corresponding to set
of actual parameters will be executed. Operation case can be defined
either in body of operation either in one of referenced
operations.
Operation is defined on all combinations of virtual parameters, whose node types are declared in modules directly or indirectly referenced by module where operation is declared. Operation can use one or more operations declared in these modules. Qualified (by synonym of module) names of referenced operations are specified after signature of operation. Inherited operations should have the same types of parameters. Referenced operations should be consistent, that is two referenced operations for each common combination of virtual parameters should use the same case, defined in one of these operations or in operation referenced by both of them.
Example:
tree P; node A {}
tree Q : P; abstract node X {} node B {} operation string F( virtual Node n ) { case( P.A n ): { return "A"; } case( B n ): { return "B"; } }
Operation F
is defined for all node types
declared in module Q
and referenced module
P
. Operation case is defined for each of
non-abstract node types.
tree R : Q; node C {} operation string G( virtual Node n ) : Q.F { case ( C node ): { return "C"; } }
Operation G
is defined for all node
types, defined in module R
and referenced modules
P
and Q
. For all node
types from modules P
and Q
operation Q.F
is used. So, operation cases are
defined only for node types from module R
.
If there is another module S that also use module P, and operation H defined on these modules:
tree S : P; node D {} operation string H( virtual Node n ) { case( P.A n ): { return "P.A"; } case( D n ): { return "S.D"; } }
when trying to extend the definition of operation
R.G
on module S using operation H:
tree R : Q, S; operation string G( virtual Node n ) : Q.F, S.H { case ( C node ): { return "C"; } }
there will be conflict because operations
Q.F
and S.H
use different cases
for node type P.A
: the result of operation
Q.F
is "A"
, but the result of
operation S.H
- "P.A"
. To
resolve this conflict, either remove reference to one of this operations
and redefine its cases, either extract implementation of operation for
module P
to separate operation and use it by both
operations:
tree P; node A {} operation string E( virtual Node n ) { case( A n ): { return "P.A"; } }
tree Q : P; abstract node X {} node B {} operation string F( virtual Node n ) : P.E { case( B n ): { return "B"; } }
tree S : P; node D {} operation string H( virtual Node n ) : P.E { case( D n ): { return "S.D"; } }
tree R : Q, S; operation string G( virtual Node n ) : Q.F, S.H { case ( C node ): { return "C"; } }
This definition of operation R.G
is
correct because both referenced operations on module
P
use the same implementation of operation
P.E
.
Table of Contents
Tokens separation are forced by white space symbols and comments. White space symbols are the following ASCII symbols:
SP - space;
HT - horizontal tab;
FF - form feed;
line separators.
Single line comment starts with pair of symbols
'//'
and lasts till end of line.
Multiline comment starts with pair of symbols
'/*'
not followed by '*'
and
lasts till pair of symbols '*/'
Documentation comment <DOCOMMENT>
starts with sequence of symbols '/**'
and lasts
till pair of symbols '*/'
Comments are recognized during lexical analysis of tree
description and are removed from sequence of tokens before syntax
analysis stage. Documentation comments
<DOCOMMENT>
are exception from this
rule.
|
Identifier <ID>
starts from letter
LETTER
followed by sequence from 0 or more letters or
digit DIGIT
. Symbol '@'
at the
start of identifier is not significant and used as prefix to enable
keywords. That is @id
and id
are
the same identifiers.
The following identifiers are reserved as keywords and can not be used as identifiers without prefix:
abstract attribute body bool case char child constructor
custom double enum false flags float get header int late long module
node noset object operation override root set setonce short string tree
true virtual void
Types defined in programming language
<TYPE>
when used in TreeDL are wrapped by angle
brackets '<' ... '>'
. Inside brackets symbols
'\', '<'
è '>'
should be
escaped by '\'
. Paired '<' ...
'>'
don't require escaping.
Values of properties can be boolean, integer and string literals:
|
Used BNF notation:
"token"
- terminal symbol in syntax
grammar;
<name:LEXEME>
- terminal symbol of type
LEXEME
defined in lexical grammar. Optional name
name
is used to distinguish terminals of type
LEXEME
inside rule.
rule
- non-terminal symbol, rule name.
( ... )?
- optional part of a rule.
( ... )*
- repeat 0 or more times.
( ... )+
- repeat 1 or more times.
( ... | ... | ... )#
- repeat 0 or more times
but each alternative can not be used more than once.
|