Towards a more literate program: spaces in symbol names

Michael Francis
Sep 10, 2022
Photo by Aleks Marinkovic on Unsplash

Do spaces really need to be special in programming and can we make reading programs more accessible?

Feel free to skip the background and go straight to the discussion of why it’s OK to have spaces in symbol names in a C-like language.

Is it possible to build a structured programming language that is more literate? I’m choosing to use the word literate in a more expansive sense than its dictionary definition, under which a literate person is simply one who can read and write. So what is a literate programming language? Firstly, it is probably not one of the esoteric programming languages, which are often defined by how difficult they are to write or read. A fantastic example of these, and perhaps my favorite, is Piet.

Piet uses a visual code, with pictures describing the program. You can spend a few happy hours or days down this rabbit hole, and I highly recommend doing so.

Traditionally, ‘literate programming’ is identified with no less a luminary than Donald Knuth.

More recently, languages such as Julia (julialang.org) have taken up the mantle; see, for example, the Literate.jl library for Julia: https://fredrikekre.github.io/Literate.jl/v2/

As of 2022, Python/Jupyter notebooks are the most common and popular close-to-literate solutions.

One of the most compelling examples, though, is Mykel Kochenderfer’s work for MIT Lincoln Laboratory and the FAA on collision avoidance simulation. This application used the Julia language to support a self-documenting, more literate style.

In the video, you can see the linkage between the textual descriptions and the math. It is eye-opening to see this linkage in something so safety-critical.

Perhaps less well known, GoLang (go.dev) supports a literate mode for Go examples. Unfortunately, this is currently limited primarily to examples and documentation, but it gives a glimpse of what is possible.

Note that Go implements this test/example logic in its tooling (go test and the documentation tools read example functions, and their // Output: comments, straight from the source), and thus it does not have to be limited to a specific area.

I provide a short example of a ‘test example’ in

Full details of usage may be found here.
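As a rough sketch of how Go’s tooling sees an example function, the program below parses a hypothetical hello_test.go file held in a string and uses the standard go/parser and go/doc packages to pull out each example and the text of its // Output: comment. The package name, ExampleIsHello and IsHello are made up for illustration; go test itself goes further, compiling and running the example and comparing its standard output with that text.

```go
package main

import (
	"fmt"
	"go/doc"
	"go/parser"
	"go/token"
)

// src is a hypothetical hello_test.go file holding one example function.
const src = `package hello

import "fmt"

// ExampleIsHello documents IsHello by showing a call and its output.
func ExampleIsHello() {
	fmt.Println(IsHello("hello"))
	// Output: true
}
`

// exampleOutputs maps each example's name to the output text it declares.
func exampleOutputs(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	// ParseComments is required: the "// Output:" line is an ordinary comment.
	f, err := parser.ParseFile(fset, "hello_test.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, ex := range doc.Examples(f) {
		out[ex.Name] = ex.Output // Name has the "Example" prefix removed
	}
	return out, nil
}

func main() {
	outs, err := exampleOutputs(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for name, want := range outs {
		fmt.Printf("example %s expects %q\n", name, want)
	}
}
```

Because the expected output lives in a comment next to the code, the same text serves as documentation and as a test assertion.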

One other area in which Go excels is code generation. This might surprise people, but the ability to generate documentation and additional implementations from a common source code base using reflection makes for a much friendlier and more debuggable environment.

Over the years, I have vacillated between code generation and runtime code. I wrote the Flex and Bison lexer and parser for at least one production language that is still in active use and has generated significant revenue for over twenty years. There were also periods in my career when I focused on runtime dependency graphs and their associated complexity. Both have their place, but in my experience generated code has always been more supportable in the literate space. Having IDEs that provide code completion and documentation from the generated code is a game changer.

So what is next? By design, languages like Go have limitations that restrict direct language extension. However, Julia (and other languages that support macros) has the potential to support a more literate style directly.

One problem that macros are unlikely to fix: few languages allow you to write a sentence and have it operate as code.

Sidebar: perhaps the best example of using English to drive code today is Cucumber (https://cucumber.io), which provides an exceptionally literate statement style for building test cases, specifically for BDD (behavior-driven development): https://cucumber.io/docs/bdd/
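Cucumber works by matching each plain-English step against a pattern registered by a step definition and running the attached code. The sketch below imitates that idea in a few lines of Go with regular expressions. It is not the API of Cucumber or of godog (Cucumber’s Go implementation); the runner, define, exec and runScenario names, as well as the banking steps, are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// step pairs a sentence pattern with the code that sentence drives.
type step struct {
	pattern *regexp.Regexp
	run     func(args []string) error
}

// runner holds registered steps plus the state the sentences act on.
type runner struct {
	steps   []step
	balance int // hypothetical state for the example scenario
}

// define registers a step; the pattern must match the whole sentence.
func (r *runner) define(expr string, fn func([]string) error) {
	r.steps = append(r.steps, step{regexp.MustCompile("^" + expr + "$"), fn})
}

// exec runs the first step whose pattern matches, passing captured groups.
func (r *runner) exec(sentence string) error {
	for _, s := range r.steps {
		if m := s.pattern.FindStringSubmatch(sentence); m != nil {
			return s.run(m[1:])
		}
	}
	return fmt.Errorf("no step matches %q", sentence)
}

// runScenario wires up two English steps and executes the given lines.
func runScenario(lines []string) error {
	r := &runner{}
	r.define(`I deposit (\d+) dollars`, func(a []string) error {
		n, err := strconv.Atoi(a[0])
		if err != nil {
			return err
		}
		r.balance += n
		return nil
	})
	r.define(`my balance should be (\d+) dollars`, func(a []string) error {
		want, err := strconv.Atoi(a[0])
		if err != nil {
			return err
		}
		if r.balance != want {
			return fmt.Errorf("balance is %d, want %d", r.balance, want)
		}
		return nil
	})
	for _, line := range lines {
		if err := r.exec(line); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	err := runScenario([]string{
		"I deposit 30 dollars",
		"I deposit 12 dollars",
		"my balance should be 42 dollars",
	})
	fmt.Println("scenario error:", err)
}
```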

Spaces, can we have them back?

One thing that has often bugged me about programming languages is the requirement to use camel case or underscores in the names of functions, variables, and other symbols composed of multiple English words, for example:

NodeList 

or

node_list (old school)

but why not

Node List

Why do we imbue the space character with such special meaning in programming languages? It is even more pronounced in YAML and Python, where indentation is significant. Don’t get me started.

Many people jump to the conclusion that spaces in names would make a program’s grammar ambiguous. I hope to show that this does not have to be the case, and that we are simply driven by convention.

Consider a function definition. I’ll use Go as the counter-example, but the argument applies to most C-based languages.

In Go, spaces are used as separators in a number of places, for example between an argument and its type, or between the func keyword and the function name. Take the following:

func IsHello( arg string ) ( bool, error ) {
	if arg == "error" {
		return false, fmt.Errorf( "an error" )
	}
	if arg == "hello" {
		return true, nil
	} else {
		return false, nil
	}
}

Let’s take this step by step. Why is the func keyword special? You can alternatively write the following form, which is common with lambdas:

IsHello := func( … ) …

In doing so, I have removed the use of space as a separator and am treating func as a function type constructor. This is, incidentally, not a bad mental model: we have defined a symbol in the scope that reserves the name IsHello as a value whose type is a function.

Now let’s look at the argument list. Again, we see space used as a separator. However, if we define the colon : as the separator between the symbol naming the variable and the variable’s type, we could rewrite this as follows (this is no longer valid Go):
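In today’s Go the function-as-value view already compiles, with one wrinkle: := is only allowed inside a function body, so at package level the same binding needs var. A small sketch, using a compacted version of IsHello:

```go
package main

import "fmt"

// At package level Go requires var; := only works inside function bodies.
var IsHello = func(arg string) (bool, error) {
	if arg == "error" {
		return false, fmt.Errorf("an error")
	}
	return arg == "hello", nil
}

func main() {
	// Inside a function, the := form from the text is legal.
	isHi := func(arg string) bool { return arg == "hi" }

	ok, err := IsHello("hello")
	fmt.Println(ok, err, isHi("hi"))
	// IsHello is an ordinary value whose type is a func type.
	fmt.Printf("%T\n", IsHello)
}
```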

IsHello := func( arg : string ) …

Hold on, this sort of syntax already exists in Go when we initialize a map:

m := map[string]string{
	"One" : "first",
	"Two" : "second",
}

Let us define the colon : operator to mean ‘pair’, so a function takes a list of pairs, a map constructor takes a list of pairs, and so on.

func( ...pair)

If we define the symbol := to be the bind operator, we are saying that the variable is bound to the result of the right-hand side. Note that a pair is a degenerate (but very useful) tuple.
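Go 1.18 generics are enough to sketch the pair idea in current Go, although there the type is spelled Pair[string, int], with the type arguments after the name. Pair, P and MapOf are hypothetical names: P stands in for the proposed k : v operator, and MapOf is a map constructor that takes a list of pairs.

```go
package main

import "fmt"

// Pair is a hypothetical two-element tuple, the value `k : v` would build.
type Pair[K, V any] struct {
	Key   K
	Value V
}

// P builds a Pair, standing in for the proposed colon operator.
func P[K, V any](k K, v V) Pair[K, V] {
	return Pair[K, V]{Key: k, Value: v}
}

// MapOf is a map constructor that takes a list of pairs.
func MapOf[K comparable, V any](pairs ...Pair[K, V]) map[K]V {
	m := make(map[K]V, len(pairs))
	for _, p := range pairs {
		m[p.Key] = p.Value
	}
	return m
}

func main() {
	v := P("hello", 1) // v is bound to a Pair[string, int]
	m := MapOf(P("One", "first"), P("Two", "second"))
	fmt.Println(v.Key, v.Value, m["Two"])
}
```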

With these changes, we should be able to write

v:string := "hello"

or

v:pair( string, int ) := "hello": 1

With Go’s generics the type itself would be spelled Pair[string, int]; written in the prefix style Go uses for map[string]int, it is closer to

v : [string,int]Pair := "hello" : 1

So what did those changes allow us to do? Well, we could conceivably write the following:

Is Hello := func( arg : string ) ( bool, error ) {
	// Code here
}

We have not introduced any ambiguity into the grammar here, yet we now allow spaces in names. Taking this further and adding named return parameters, the following would parse:

Is Hello := func( My Arg : string ) ( My Result : bool, err : error ) {
	// Code here
}

Making these changes requires us to eliminate ambiguities in statements like for and if. We can resolve these directly by requiring parentheses:

if ( … ) { … } else { … }
for (… ) { … }

This also introduces the notion that if and for are not really `special`. Currently, they are keywords: the lexer reserves them, and they cannot be used as names. The same is true of the return keyword, which could instead look more like a function:

return( value, nil )
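Treating if and for as ordinary functions can be sketched in current Go with closures, so that only the chosen branch runs and the loop condition is re-evaluated on every pass. If, For and sumEvens are hypothetical helpers. return is the exception: a Go function cannot return from its caller, so it would still need language support.

```go
package main

import "fmt"

// If is a hypothetical stand-in for the if keyword. Branches are closures,
// so only the selected one is evaluated; otherwise may be nil.
func If(cond bool, then, otherwise func()) {
	if cond {
		then()
	} else if otherwise != nil {
		otherwise()
	}
}

// For is a hypothetical stand-in for a condition-only for loop; cond is a
// closure so it is re-evaluated before every iteration.
func For(cond func() bool, body func()) {
	for cond() {
		body()
	}
}

// sumEvens adds the even numbers in [0, n) using only If and For.
func sumEvens(n int) int {
	total, i := 0, 0
	For(func() bool { return i < n }, func() {
		If(i%2 == 0, func() { total += i }, nil)
		i++
	})
	return total
}

func main() {
	fmt.Println(sumEvens(5))
}
```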

GoLang is a relatively small language, but there are a few other notable places where keywords get special treatment: the go keyword that starts goroutines, and a few special cases like channels.

go func() { … }()

There are several ways to change the syntax; perhaps the simplest is to treat the go command as a function that takes an invoked function. The defer keyword is similar:

go( func() {….}() ) 
defer( func() {…}() )

The other option would be to overload the <- ‘channel’ operator. Think of the go symbol as a global channel that takes function invocations, and of defer as a function-scoped symbol that does the same, so:

go <- func() {…}() 
defer <- My Opened File.close()

Making or defining a channel is a little trickier; personally, I find the current syntax clumsy.

c := make( chan int, 0 ) 

Why is chan special syntax rather than an ordinary generic type? Then, using the new generic style, we could write

c := make( [int]chan, 0 ) 

This could be a more reasonable and consistent way to handle these definitions. But wait, what about directional channels? Why do they have specific syntax at all?
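Go already treats direction as part of a channel’s type (<-chan T is receive-only, chan<- T is send-only), and generics let us give those directions names. A sketch, assuming hypothetical InChan, OutChan, MakeChan and sumVia; a bidirectional channel converts implicitly to either direction, which is what lets MakeChan return one channel as both ends.

```go
package main

import "fmt"

// InChan and OutChan are hypothetical named generic types for the two
// channel directions; Go itself spells them <-chan T and chan<- T.
type InChan[T any] <-chan T  // receive-only end
type OutChan[T any] chan<- T // send-only end

// MakeChan returns the receive and send ends of one buffered channel.
func MakeChan[T any](size int) (InChan[T], OutChan[T]) {
	c := make(chan T, size)
	return c, c
}

// sumVia sends vals through the send end and adds them up from the receive end.
func sumVia(vals ...int) int {
	in, out := MakeChan[int](len(vals))
	for _, v := range vals {
		out <- v
	}
	close(out) // allowed on a send-only channel; close(in) would not compile
	sum := 0
	for v := range in {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(sumVia(1, 2, 3))
}
```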

c := make ([int]inchan, 0 )

Or in our new world

c := make( [int]in chan, 0 )

where in chan, chan, and out chan are different types.

See my musings on how more consistency could be brought to channels, defer et al.

Now, I don’t expect any of this to happen to Go, but when a new language comes along, perhaps it will not assume that spaces are special, and will allow more literate and readable statements in code.

My Function That Calls Google := func(
	My Query : string,
	Configuration Flags : ...Flag Set ) {
	// Lots of code here
}
main := func() {
	Proxy Connection := Make Proxy()
	defer <- Proxy Connection.Close()
	go <- My Function That Calls Google( "Literate Programming", Proxy Connection )
	// Wait for routine to complete
}

It’s not perfect but, in my opinion, it is a step forward.

Sidebar: interestingly (well, I think so), if we allow the lexer to support prefix binding, only the types need to be updated to enable this space insensitivity. Essentially, we would reserve the prefixes if, for, var, etc., supporting mostly backward-compatible behavior…

Thoughts, comments?
