5. Notation

Scopes source code is written in a notation that introduces syntactic rules before the first function is even written: Scopes List Notation, abbreviated SLN.

Closely related to S-Expressions, SLN can be seen as a human-readable serialization format comparable to YAML, XML or JSON. It has been optimized for simplicity and terseness.

SLN files do not have to contain code. They’re more likely to store configuration or metadata. The examples in this document are therefore schema-free and contain only arbitrary data. They’re not necessarily valid Scopes source code.

5.1. At a Glance

In case you don’t have time to read the full documentation, here’s an example that gives you an overview of all notation aspects:

# below is some random data without any schema

# a naked list of five 32-bit signed integers
1 2 3 4 5

# a list that begins with a symbol 'float-values:' and contains a braced
# sublist of floats.
float-values: (1.0 2.0 3.1 4.2 5.5:f64 inf nan)

# we can also nest the sublist using indentation
# note the extravagant heading, another context-free symbol.
==string-values==
    "A" "B" "NCC-1701\n" "\xFFD\xFF" "\"E\""

# a single top-level element, a single-line string
"I am Locutus of Borg."

# a raw block string
""""
    Ma'am is acceptable in a crunch, but I prefer Captain.
                                    -- Kathryn Janeway

# a list of pairs (also lists), arranged horizontally
(1 x) (2 y) (3 z)
# same list, with last two entries arranged vertically
(1 x)
    (2 y)
    (3 z)
# we can line up all entries by using a semicolon to indicate an empty head
;
    (1 x)
    (2 y)
    (3 z)
# parentheses can also be removed for each line entry
;
    1 x
    2 y
    3 z

# appending values to the parent list in the next line
symbol-values one two three four five \
    six seven-of-nine ten

# line continuation can also begin at the start of the next line
::typed-integers:: 0:u8 1:i8 2:i16 3:u16
    \ 4:u32 5:i32 6:u64 7:i64

# which comes in handy when we want to continue the parent list
people like
    jim kirk
    commander spock
    hikari sulu
    \ and many more

# a list with a symbol header and two entries
address-list
    # a list with a header and three more lists of two values each
    entry
        name: "Jean-Luc Picard"
        age: 59
        address: [email protected]
    entry
        # the semicolon acts as list separator
        name: "Worf, Son of Mogh"; age: 24; address: [email protected]
    # line comments double as block comments
    #entry
        name: "Natasha Yar"
        age: 27
        address: [email protected]

# the same list with braced notation; within braced lists,
  indentation is meaningless.
(address-list
    # a list with a header and three more lists of two values each
    (entry
        (name: "Jean-Luc Picard")
        (age: 59)
        (address: [email protected]))
    (entry (name: "Worf, Son of Mogh") (age: 24)
        (address: [email protected])))

# a list of comma separated values - a comma is always recorded as
  a separate symbol, so the list has nine entries
1, 2, 3,4, 5

# a list of options beginning with a symbol in a list with
  square brace style
[task]
    cmd = "bash"
    # the last element is a symbol in a list with curly brace style
    working-dir = {project-base}

5.2. Formatting Rules

SLN files are always assumed to be encoded as UTF-8.

Whitespace controls scoping in the SLN format. Therefore, to avoid possible ambiguities, SLN files must always use spaces, and one indentation level equals four spaces.

5.3. Element Types

SLN recognizes only four kinds of elements:

  • Numbers
  • Strings
  • Symbols
  • Lists

In addition, users can specify comments which are not part of the data structure.

5.3.1. Comments

Both line and block comments are initiated with a single token, #. A comment lasts from its beginning token to the first non-whitespace character with equal or lower indentation. Some examples for valid comments:

# a line comment
not a comment
# a block comment that continues
  in the next line because the line has
  a higher indentation level. Note, that
        comments do not need to respect
    indentation rules
but this line is not a comment
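
The indentation rule for comments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name strip_comments is hypothetical, only comments that start a line are handled, and a # inside a string is not recognized.

```python
def strip_comments(text):
    """Return the lines of `text` that are not part of a comment.

    Sketch only: handles comments that begin a line (after indentation)
    and ignores the possibility of '#' appearing inside a string.
    """
    kept = []
    comment_indent = None  # column of the '#' that opened the active comment
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            # a blank line has no non-whitespace character, so it never
            # ends a comment; keep it only when no comment is active
            if comment_indent is None:
                kept.append(line)
            continue
        indent = len(line) - len(stripped)
        if comment_indent is not None and indent > comment_indent:
            continue  # deeper indentation: still inside the comment
        comment_indent = None  # equal or lower indentation ends the comment
        if stripped.startswith("#"):
            comment_indent = indent  # a new comment starts on this line
            continue
        kept.append(line)
    return kept
```

Applied to the example above, only the two lines that say they are not comments remain.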

5.3.2. Strings

Strings describe sequences of unsigned 8-bit characters in the range 0-255. A string begins and ends with " (double quotes). The \ escape character can be used to include quotes in a string and to describe unprintable control characters such as \n (return) and \t (tab). Other unprintable characters can be encoded via \xNN, where NN is the character’s hexadecimal code. Strings are parsed as-is, so UTF-8 encoded strings will be copied over verbatim.
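
To make the escape rules concrete, here is a minimal Python sketch of a decoder for a single-line string literal. The function name decode_sln_string is hypothetical, and rejecting unknown escape sequences with an error is an assumption, since the text does not say how they are treated.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
SIMPLE_ESCAPES = {"n": 0x0A, "t": 0x09, "\\": 0x5C, '"': 0x22}

def decode_sln_string(literal):
    """Decode a double-quoted string literal into its raw bytes (sketch)."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string must begin and end with a double quote")
    body = literal[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # unescaped text is copied over verbatim as UTF-8 bytes
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("\\x must be followed by two hex digits")
            out.append(int(digits, 16))
            i += 4
        else:
            # assumption: unknown escapes are treated as errors
            raise ValueError("unknown escape: \\" + esc)
    return bytes(out)
```

The result is a byte string rather than text, because \xNN can produce bytes that are not valid UTF-8 on their own.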

Here are some examples for valid strings:

"a single-line string in double quotations"
"return: \n, tab: \t, backslash: \\, double quote: \", nbsp: \xFF."

5.3.3. Raw Block Strings

Raw block strings provide a way to quote multiple lines of text with characters that should not be escaped. A raw block string begins with """" (four double quotes). A raw block string ends at the first newline before a printable character that has a lower indentation.

Here are some examples for valid raw block strings:

""""a single-line string as a block string
# commented line in between
""""// a multi-line string that describes a valid C function
    #include <stdio.h>
    void a_function_in_c() {
        printf("hello world\n");
    }

5.3.4. Symbols

Like strings, a symbol describes a sequence of 8-bit characters, but acts as a label or bindable name. Symbols may contain any character from the UTF-8 character set and terminate at whitespace or at any character from the set #;()[]{},. A token that parses as a number is read as a number, not as a symbol. Two symbols sharing the same sequence of characters always map to the same value.

As a special case, , is always parsed as a single-character symbol.

Here are some examples for valid symbols:

# classic underscore notation
some_identifier _some_identifier
# hyphenated
some-identifier
# mixed case
SomeIdentifier
# fantasy operators
&+ >~ >>= and= str+str
# numbered
_42 =303
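
The termination rule can be illustrated with a small Python tokenizer. It is a sketch under stated assumptions: the name split_tokens is hypothetical, strings are not handled, and a # simply discards the rest of the line.

```python
TERMINATORS = set("#;()[]{},")

def split_tokens(line):
    """Split one line into tokens; terminator characters become own tokens."""
    tokens = []
    current = ""
    for ch in line:
        if ch == "#":
            break  # simplification: the rest of the line is a comment
        if ch.isspace() or ch in TERMINATORS:
            if current:
                tokens.append(current)  # the running symbol ends here
                current = ""
            if ch in TERMINATORS:
                tokens.append(ch)  # e.g. ',' is always a token of its own
            continue
        current += ch
    if current:
        tokens.append(current)
    return tokens
```

For the input 1, 2, 3,4, 5 this yields nine tokens, because every comma is recorded as a separate symbol regardless of the surrounding whitespace.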

5.3.5. Numbers

Numbers come in two forms: integers and reals. The parser understands integers in the range -(2^63) to 2^64-1 and records them as signed 32-bit values unless the value is too big, in which case it will be extended to 64-bit signed, then 64-bit unsigned. Reals are floating point numbers parsed and stored as IEEE 754 binary32 values.

Numbers can be explicitly given a certain type by appending a : to the number, followed by a numerical typename that is one of i8, i16, i32, i64, u8, u16, u32, u64, f32 or f64.

Here are some examples for valid numbers:

# positive and negative integers in decimal and hexadecimal notation
0 +23 42 -303 12 -1 -0x20 0xAFFE
# positive and negative reals
0.0 1.0 3.14159 -2.0 0.000003 0xa400.a400
# reals in scientific notation
1.234e+24 -1e-12
# special reals
+inf -inf nan
# zero as unsigned 64-bit integer and as signed 8-bit integer
0:u64 0:i8
# a floating-point number with double precision
1.0:f64
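
The width selection for untyped integers can be sketched in Python as follows. The function names are hypothetical, and relying on Python's int(text, 0) for signs and the 0x prefix is an assumption about syntax details that this section does not spell out. No range check is applied to an explicit suffix.

```python
INTEGER_TYPES = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64"}

def infer_integer_type(value):
    """Pick the storage type for an integer without an explicit suffix."""
    if -(2**31) <= value <= 2**31 - 1:
        return "i32"  # default: signed 32-bit
    if -(2**63) <= value <= 2**63 - 1:
        return "i64"  # too big for i32: extend to signed 64-bit
    if 0 <= value <= 2**64 - 1:
        return "u64"  # too big for i64: extend to unsigned 64-bit
    raise ValueError("integer outside the range -(2^63) to 2^64-1")

def parse_integer(token):
    """Return (value, typename) for tokens such as '42', '-0x20' or '0:u8'."""
    literal, colon, suffix = token.partition(":")
    value = int(literal, 0)  # assumption: Python-like decimal/hex syntax
    if colon:
        if suffix not in INTEGER_TYPES:
            raise ValueError("not an integer typename: " + suffix)
        return value, suffix
    return value, infer_integer_type(value)
```

For instance, parse_integer("0xAFFE") reports i32, while the same function reports u64 for 2^64-1.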

5.3.6. Lists

Lists are the only nesting type, and can be scoped either by braces or by indentation. For braces, (), [] and {} are accepted.

Lists can be empty or contain a virtually unlimited number of elements, only separated by whitespace. They typically describe expressions in Scopes.

Here are some examples for valid lists:

# a list of numbers in naked format
1 2 3 4 5
# three empty braced lists within a naked list
() () ()
# a list containing a symbol, a nested list, and three integers
(print (.. "hello world") 303 606 909)
# three nested lists
((()))

5.4. Naked & Braced Lists

Every Scopes source file is parsed as a tree of expression lists.

The classic notation (what we will call braced notation) uses a syntax close to what Lisp and Scheme users know as restricted S-expressions:

(print
    (.. "Hello" "World")
    303 606 909)

As a modern alternative, Scopes offers a naked notation where the scope of lists is implicitly balanced by indentation, an approach used by Python, Haskell, YAML, Sass and many other languages.

This source parses to the same list as the previous, braced example:

# The same list as above, but in naked format.
    A sub-paragraph continues the list.
print
    # elements on a single line with or without sub-paragraph are wrapped
      in a list.
    .. "Hello" "World"

    # values that should not be wrapped have to be prefixed with an
      escape token which causes a continuation of the parent list
    \ 303 606 909

5.4.1. Mixing Modes

Naked lists can contain braced lists, and braced lists can contain naked lists:

# compute the value of (1 + 2 + (3 * 4)) and print the result
(print
    (+ 1 2
        (3 * 4)))

# the same list in naked notation.
  indented lists are appended to the parent list:
print
    + 1 2
        3 * 4

# any part of a naked list can be braced
print
    + 1 2 (3 * 4)

# and a braced list can contain naked parts.
  the escape character \ enters naked mode at its indentation level.
print
    (+ 1 2
        \ 3 * 4) # parsed as (+ 1 2 (3 * 4))

Because it is more convenient for users without specialized editors to write in naked notation, and balancing parentheses can be challenging for beginners, the author suggests using braced notation sparingly and in good taste. Purists and Scheme enthusiasts may however prefer to work with braced lists almost exclusively.

Therefore Scopes’ reference documentation describes all available symbols in braced notation, while code examples make ample use of naked notation.

5.5. Brace Styles

In addition to regular curvy braces (), SLN parses curly {} and square [] brace styles. They are merely meant to provide variety when writing SLN-based formats, and are expanded to simple lists during parsing. Some examples:

[a b c d]
# expands to
(\[\] a b c d)

{1 2 3 4}
# expands to
(\{\} 1 2 3 4)

5.6. List Separators

Both naked and braced lists support a special control character, the list separator ; (semicolon). Known as statement separator in other languages, it groups atoms into separate lists, and makes it possible to reduce the number of parentheses or lines required in complex trees.

In addition, it is possible to list-wrap the first element of a list in naked mode by starting the head of the block with ;.

Here are some examples:

# in braced notation
(print a; print (a;b;); print c;)
# parses as
((print a) (print ((a) (b))) (print c))

# in naked notation
;
    print a; print b
    ;
        print c; print d
# parses as
((print a) (print b) ((print c) (print d)))

There’s a caveat with semicolons in braced mode though: if trailing elements aren’t terminated with ;, they’re not going to be wrapped:

# in braced notation
(print a; print (a;b;); print c)
# parses as
((print a) (print ((a) (b))) print c)
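
The grouping behavior in braced mode, including the caveat about unterminated trailing elements, can be modeled with a short Python sketch. It operates on already-parsed nested lists in which ; appears as a plain string. The function name apply_separators is hypothetical, and the treatment of empty groups is an assumption that the text does not cover.

```python
def apply_separators(items):
    """Group the elements of a braced list at each ';' (sketch).

    Elements before a ';' are wrapped in a list of their own. Trailing
    elements that are not terminated by ';' are left unwrapped.
    """
    result = []
    pending = []
    for item in items:
        if item == ";":
            # assumption: an empty group is kept as an empty list
            result.append(pending)
            pending = []
        elif isinstance(item, list):
            pending.append(apply_separators(item))  # recurse into sublists
        else:
            pending.append(item)
    result.extend(pending)  # unterminated tail is spliced, not wrapped
    return result
```

A list without any semicolon passes through unchanged, which matches the way ordinary braced lists are read.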

5.7. Pitfalls of Naked Notation

As naked notation giveth the user the freedom to care less about parentheses, it also taketh away. In the following section we will discuss the few small difficulties that can arise and how to solve them efficiently.

5.7.1. Single Elements

Special care must be taken when single elements are defined which the user wishes to wrap in a list.

Here is a braced list describing an expression printing the number 42:

(print 42)

The naked equivalent declares two elements in a single line, which are implicitly wrapped in a single list:

print 42

A single element on its own line is not wrapped:

print           # (print
    42          #        42)

What if we want to just print a newline, passing no arguments?

print           # print

The statement above will be ignored because a symbol is resolved but not called. One can make use of the ; (split-statement) control character, which ends the current list:

print;          # (print)

5.7.2. Wrap-Around Lines

There are often situations in which the number of elements in a list conflicts with best practices for formatting source code and exceeds the column limit (typically 80 or 100).

In braced lists, the problem is easily corrected:

# import many symbols from an external module into the active namespace
(import-from "OpenGL"
    glBindBuffer GL_UNIFORM_BUFFER glClear GL_COLOR_BUFFER_BIT
    GL_STENCIL_BUFFER_BIT GL_DEPTH_BUFFER_BIT glViewport glUseProgram
    glDrawArrays glEnable glDisable GL_TRIANGLE_STRIP)

The naked approach interprets each new line as a nested list:

# produces runtime errors
import-from "OpenGL"
    glBindBuffer GL_UNIFORM_BUFFER glClear GL_COLOR_BUFFER_BIT
    GL_STENCIL_BUFFER_BIT GL_DEPTH_BUFFER_BIT glViewport glUseProgram
    glDrawArrays glEnable glDisable GL_TRIANGLE_STRIP

# braced equivalent of the term above; each line is interpreted
# as a function call and fails.
(import-from "OpenGL"
    (glBindBuffer GL_UNIFORM_BUFFER glClear GL_COLOR_BUFFER_BIT)
    (GL_STENCIL_BUFFER_BIT GL_DEPTH_BUFFER_BIT glViewport glUseProgram)
    (glDrawArrays glEnable glDisable GL_TRIANGLE_STRIP))

This can be fixed by using the splice-line control character, \:

# correct solution using splice-line, postfix style
import-from "OpenGL" \
    glBindBuffer GL_UNIFORM_BUFFER glClear GL_COLOR_BUFFER_BIT \
    GL_STENCIL_BUFFER_BIT GL_DEPTH_BUFFER_BIT glViewport glUseProgram \
    glDrawArrays glEnable glDisable GL_TRIANGLE_STRIP

Unlike in other languages, and as previously demonstrated, \ splices at the token level rather than the character level, and can therefore also be placed at the beginning of nested lines, where the parent is still the active list:

# correct solution using splice-line, prefix style
import-from "OpenGL"
    \ glBindBuffer GL_UNIFORM_BUFFER glClear GL_COLOR_BUFFER_BIT
    \ GL_STENCIL_BUFFER_BIT GL_DEPTH_BUFFER_BIT glViewport glUseProgram
    \ glDrawArrays glEnable glDisable GL_TRIANGLE_STRIP

5.7.3. Tail Splicing

While naked notation is ideal for writing nested lists that accumulate at the tail:

# braced
(a b c
    (d e f
        (g h i))
    (j k l))

# naked
a b c
    d e f
        g h i
    j k l

…there are complications when additional elements need to be spliced back into the parent list:

(a b c
    (d e f
        (g h i))
    j k l)

Once again, we can reuse the splice-line control character to get what we want:

a b c
    d e f
        g h i
    \ j k l
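
The wrapping and splicing rules of naked notation can be summarized in a small Python sketch that turns a restricted subset of naked source into nested lists. The name parse_naked is hypothetical. The sketch assumes well-formed input indented in steps of four spaces, and it supports only whitespace-separated tokens, a lone ; as an empty head, and \ at the start of a line. Comments, strings, braces, inline semicolons and a trailing \ are not handled.

```python
def parse_naked(text):
    """Parse a restricted subset of naked notation into nested lists."""
    lines = []
    for raw in text.splitlines():
        if raw.strip():
            indent = len(raw) - len(raw.lstrip(" "))
            lines.append((indent // 4, raw.split()))

    def parse_block(pos, level, parent):
        # consume consecutive lines at `level`, appending to `parent`
        while pos < len(lines) and lines[pos][0] == level:
            tokens = lines[pos][1]
            pos += 1
            if tokens[0] == "\\":
                # splice-line: the values continue the parent list
                parent.extend(tokens[1:])
                continue
            empty_head = tokens == [";"]
            node = [] if empty_head else list(tokens)
            has_children = pos < len(lines) and lines[pos][0] > level
            pos = parse_block(pos, level + 1, node)
            if len(node) == 1 and not empty_head and not has_children:
                parent.append(node[0])  # a single element is not wrapped
            else:
                parent.append(node)
        return pos

    root = []
    parse_block(0, 0, root)
    return root
```

Calling parse_naked on the last example returns a single top-level list in which j, k and l sit directly in the outer list, next to the nested (d e f (g h i)).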

5.7.4. Left-Hand Nesting

When using infix notation, conditional blocks or functions producing functions, lists occur that nest at the head level rather than the tail:

((((a b)
    c d)
        e f)
            g h)

The equivalent naked mode version makes extensive use of list separator and splice-line characters to describe the same tree:

# equivalent structure
;
    ;
        ;
            a b
            \ c d
        \ e f
    \ g h

A more complex tree which also requires splicing elements back into the parent list can be realized with the same combo of list separator and splice-line:

# braced
(a
    ((b
        (c d)) e)
    f g
    (h i))

# naked
a
    ;
        b
            c d
        \ e
    \ f g
    h i

While this example demonstrates the versatility of splice-line and list separator, expressing similar trees in partially braced notation might often be easier on the eyes.

As so often, the best format is the one that fits the context.