Template Language

Spoofax

About

The template language aims to eliminate redundancy between the specification of a language's grammar, its completion templates, and its pretty printer. It borrows syntax from SDF and StringTemplate.

Syntax Overview

Template language sections

  templates                         Section with templates t*
     t*
  template options                  Section with options o*
     o*

Productions

  Sort = < e* >                     Template with elements e*
  Sort.Constructor = < e* >         Template with elements e*
  Sort = [ e* ]                     Template with elements e* (alt. brackets)
  Sort.Constructor = [ e* ]         Template with elements e* (alt. brackets)

Placeholders

  <Sort>                            Placeholder (1)
  <Sort?>                           Optional placeholder (0..1)
  <Sort*>                           Repetition (0..n)
  <Sort+>                           Repetition (1..n)
  <Sort*; separator="\n">           Repetition with separator
  <Sort; text="hi">                 Placeholder with replacement text
  <Sort; hide>                      Placeholder hidden from completion template

Escapes

  <\\\'\"\ \t\r\n>                  Element containing escaped characters
  <\u0065>                          Unicode escape
  <\\>                              Escape next line break
  <>                                No output; layout will be allowed here in the grammar
  \<, \>, \[, \], \\                Escaped brackets / backslash (prefer alt. brackets)

Priority specification

  context-free priorities           Refer to templates using sort and constructor name
    {left: Exp.Times Exp.Over} >
    {left: Exp.Plus Exp.Minus}

Lexical syntax

  lexical syntax                    Lexical syntax similar to SDF
    ID = [A-Za-z] [A-Za-z0-9]*

Template options

  keyword -/- [A-Za-z0-9]           Follow restriction for keywords
  tokenize : "()"                   Layout is allowed around (sequences of) these characters
  newlines : none                   Experimental: generate grammar that requires newline characters.
                                    Possible values: none, separating, leading, trailing

Parsing

Each syntax template maps directly to an SDF production. The parsing semantics of a syntax template are given entirely by the generated SDF productions.

  • Literal strings are tokenized on space characters (space, tab). Each token results in one literal in the production. That is, hello world maps to "hello" "world", thus allowing any layout between the two tokens.
  • Additionally, literal strings are tokenized on boundaries between characters from the set given by the tokenize option and any other characters. That is, with tokenize : "()", if(<Exp>) maps to "if" "(" Exp ")", which allows layout between the if keyword and the parentheses.
  • Placeholders translate literally. If a separator option containing any non-layout characters is given, the placeholder maps to a list with separator.
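
The two tokenization rules for literal strings can be sketched in Python. This is only an illustrative model, not the generator's actual code: the function name tokenize_literal is hypothetical, it handles literal text only (placeholders are not modelled), and it assumes that a run of adjacent tokenize characters stays together as one literal, following the "(sequences of)" wording of the tokenize option.

```python
import re


def tokenize_literal(text, tokenize_chars="()"):
    """Split template literal text into the literals of a production.

    Hypothetical helper: a model of the rules described above, not the
    implementation used by the template language.
    """
    literals = []
    # Rule 1: split on spaces and tabs.
    for word in re.findall(r"[^ \t]+", text):
        if not tokenize_chars:
            literals.append(word)
            continue
        chars = re.escape(tokenize_chars)
        # Rule 2: split on boundaries between runs of tokenize characters
        # and runs of any other characters.
        literals.extend(re.findall(f"[{chars}]+|[^{chars}]+", word))
    return literals


print(tokenize_literal("hello world"))  # ['hello', 'world']
print(tokenize_literal("if("))          # ['if', '(']
```

Each returned string corresponds to one quoted literal in the generated production, so layout is allowed between any two of them.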

Unparsing / pretty printing

Overview

The pretty printer that is generated from syntax templates consists of a number of Stratego strategies. Each matches a particular pattern in the AST. If the pattern matches, the strategy (recursively) invokes the pretty printing strategies for the child nodes, and combines the results into a BOX AST. This BOX AST can then be converted to a string using the box2text-string strategy from libstratego-gpp.

For example, for a simple plus-expression syntax template:

  Exp.Plus = <<Exp> + <Exp>> {left}

the pretty printing strategy that may be generated is:

  prettyprint-Exp:
    Plus(a, b) ->
      [ H([SOpt(HS(), "0")], [ <pp-one-Z(prettyprint-Exp)> a, S(" + "), <pp-one-Z(prettyprint-Exp)> b ]) ]

As the example shows, the pretty printing strategies return a list of boxes, which the caller must wrap in a V or Z box (among other things, that is what pp-one-Z does). When you invoke one of the prettyprint-* strategies from your own code, you are responsible for this wrapping.

All pretty printing strategies that are specific to a symbol are hooked up to one global prettyprint-LanguageName strategy, so that it is easy to pretty print an arbitrary subtree of an AST without knowing the type of the root node in advance:

  prettyprint-LanguageName = prettyprint-Exp
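
To make the flow from AST to boxes to text concrete, here is a toy Python model of it. Everything in it is an assumption made for illustration: the classes S, H and V only imitate the box constructors of the same names, the tuple encoding of AST nodes and the leaf case are hypothetical, child boxes are spliced in directly instead of being wrapped by pp-one-Z, and word wrapping is not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class S:
    """String box: literal text."""
    text: str


@dataclass
class H:
    """Horizontal box: children on one line, separated by `hs` spaces."""
    children: list
    hs: int


@dataclass
class V:
    """Vertical box: children on separate lines."""
    children: list


def prettyprint_exp(term):
    """Toy counterpart of prettyprint-Exp; returns a *list* of boxes."""
    if isinstance(term, tuple) and term[0] == "Plus":
        _, a, b = term
        # Imitates H([SOpt(HS(), "0")], [a, S(" + "), b]) from the example.
        return [H([*prettyprint_exp(a), S(" + "), *prettyprint_exp(b)], hs=0)]
    return [S(str(term))]  # hypothetical leaf case, not in the original


def box_to_text(box):
    """Render a box to a string (no word wrapping)."""
    if isinstance(box, S):
        return box.text
    if isinstance(box, H):
        return (" " * box.hs).join(box_to_text(c) for c in box.children)
    if isinstance(box, V):
        return "\n".join(box_to_text(c) for c in box.children)
    raise TypeError(f"not a box: {box!r}")


def prettyprint_string(term):
    # The caller wraps the returned list of boxes in a V box before rendering.
    return box_to_text(V(prettyprint_exp(term)))


print(prettyprint_string(("Plus", "a", ("Plus", "b", "c"))))  # a + b + c
```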

Parenthesization

In Spoofax, it is common not to represent parentheses as AST nodes. After all, the places where an expression needs parentheses can be inferred from the priorities of the operators (which are specified in the grammar) and the structure of the AST. The tool sdf2parenthesize does exactly this, and is integrated into SpoofaxLang via the parenthesize-LanguageName strategy. Invoking parenthesize-LanguageName adds Parenthetical constructors to the AST at every place where the structure of the AST conflicts with the operator priorities specified in the grammar. The generated pretty printer knows how to handle the Parenthetical constructor, because a syntax template for parentheses with the bracket attribute, such as:

  Exp = <(<Exp>)> {bracket}

generates the pretty printing strategy:

  prettyprint-Exp:
    Parenthetical(a) ->
      [ H([SOpt(HS(), "0")], [ S("("), <pp-one-Z(prettyprint-Exp)> a, S(")") ]) ]
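
As a rough illustration of how parentheses can be inferred from priorities and AST structure, the following Python sketch inserts Parenthetical nodes into a tree of binary operators. It is a simplified model under stated assumptions, not the algorithm used by sdf2parenthesize: the tuple encoding of terms and the helper names are hypothetical, the priority table mirrors the context-free priorities example from the syntax overview, and all operators are assumed to be binary and left-associative.

```python
# Higher number binds tighter; mirrors
#   {left: Exp.Times Exp.Over} > {left: Exp.Plus Exp.Minus}
PRIORITY = {"Times": 2, "Over": 2, "Plus": 1, "Minus": 1}


def parenthesize(term):
    """Return a copy of `term` with Parenthetical nodes added where needed.

    Terms are hypothetical tuples (constructor, left, right); anything else
    is treated as a leaf. The input must not already contain Parenthetical.
    """
    if not isinstance(term, tuple):
        return term
    op, left, right = term
    left, right = parenthesize(left), parenthesize(right)
    return (op, wrap(op, left, is_right=False), wrap(op, right, is_right=True))


def wrap(parent_op, child, is_right):
    """Wrap `child` if its position conflicts with the operator priorities."""
    if not isinstance(child, tuple) or child[0] not in PRIORITY:
        return child
    lower = PRIORITY[child[0]] < PRIORITY[parent_op]
    # Left-associative: an equal-priority operator on the right needs parentheses.
    same_level_on_right = is_right and PRIORITY[child[0]] == PRIORITY[parent_op]
    return ("Parenthetical", child) if lower or same_level_on_right else child


print(parenthesize(("Times", ("Plus", "a", "b"), "c")))
# ('Times', ('Parenthetical', ('Plus', 'a', 'b')), 'c')
```

A tree such as Plus(a, Times(b, c)) is returned unchanged, because the child binds tighter than its parent.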

How to use

The idiomatic way to invoke the generated pretty printer, including parenthesization and conversion to a string (with word wrap at the 100th column, if applicable), is:

  prettyprint-LanguageName-string =
    parenthesize-LanguageName
    ; prettyprint-LanguageName
    ; !V([], <id>)
    ; box2text-string(|100)

-- TobiVollebregt - 01 Feb 2012