New blog post!
I've been investigating how various languages get away with not requiring semicolons.
I looked at 11 languages and found so many interesting cases I had to share!
Maybe the post didn't make this clear enough: I kind of agree with the people who say that we should just require semicolons. I only want to implement optional semicolons if that can be done well.
However, just making semicolons required is also a bit of a reductive argument. There are so many things to take into account! I truly think Gleam doesn't need semicolons for instance.
@terts older languages were all one statement per line, no semicolons. Like Fortran, except it had an explicit continuation column.
Other file formats also stick to one statement per line, but e.g. a leading space on a line makes it a continuation of the preceding line.
The use of semicolons is introduced when you no longer require one statement per line.
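To make the "leading space means continuation" convention concrete, here is a minimal Python sketch. The function name `join_continuations` and the sample input are made up for illustration, and the choice to skip blank lines is an assumption; real formats differ in such details.

```python
def join_continuations(text):
    """Group lines into statements: one statement per line, except that a
    line starting with a space or tab continues the preceding statement."""
    statements = []
    for line in text.splitlines():
        if not line.strip():
            # Assumption: blank lines carry no meaning and are skipped.
            continue
        if line[0] in " \t" and statements:
            # Leading whitespace: fold this line into the previous statement.
            statements[-1] += " " + line.strip()
        else:
            statements.append(line.rstrip())
    return statements


sample = "Subject: hello\n world\nTo: someone"
print(join_continuations(sample))
```

With a rule like this, no terminator is needed because the start of each line already says whether a statement begins or continues.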
@terts I would urge you not to go down this road. Stick to mandatory semicolons. If it were entirely my call, I'd take a step *farther* toward mandatory statement terminators, and make
fn returns_b() -> rettype {
    a();
    b()
}
a syntax error; you would be obliged to write
fn returns_b() -> rettype {
    a();
    b();
}
and that would return the value returned by b, unlike Rust and current Roto. To return nothing, you would write
fn returns_unit() {
    a();
    b();
    ();
}
@terts This is based on extensive experience with C, Rust, Python, Perl, awk, sh, R, and JavaScript, and some exposure to Ruby, Go, and Lua (all of which I can read, but avoid using for unrelated reasons).
My experience has been that only the extremes - Python's "the indentation _alone_ determines block structure; use semicolons only to cram multiple statements onto the same line" and C's "you must put a semicolon at the end of every statement" - avoid confusing people with edge cases.
@terts And I dislike Rust's "leave off the last semicolon to make the block evaluate to the value of the last expression instead of to ()" rule because that makes the value of the block change depending on the presence or absence of one character that otherwise has minimal semantic significance, so your brain learns to ignore it.
The type checker will usually flag this when you get it wrong; it would be a much worse problem in a language where bugs like this are runtime errors.
@terts In your Gleam example with 1 + 1 1 + 1 here are some more interesting cases to consider:
1 + 1 -1 + 1 (two expressions)
1 + 1 - 1 + 1 (one expression)
1 + 1 -x + 1 (one expression)
1 + 1 - x + 1 (one expression)
I verified on the Gleam playground that this is indeed how they parse.
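One toy model that reproduces those four results, written in Python. This is a guess at an explanation, not a description of Gleam's actual lexer or parser: it assumes that a `-` directly followed by a digit is lexed as part of a negative number literal, and that a new expression starts wherever two operands sit next to each other with no operator between them. The names `TOKEN`, `OPERATORS`, and `count_expressions` are hypothetical.

```python
import re

# Assumption: "-" glued to digits is one negative literal; otherwise "-" is
# an operator. Whitespace is skipped by findall because no pattern matches it.
TOKEN = re.compile(r"-?\d+|[A-Za-z_]\w*|[-+*/]")
OPERATORS = {"+", "-", "*", "/"}


def count_expressions(source):
    """Count expressions, starting a new one whenever an operand directly
    follows another operand."""
    count = 0
    prev_was_operand = False
    for tok in TOKEN.findall(source):
        is_operand = tok not in OPERATORS
        if is_operand and (count == 0 or prev_was_operand):
            count += 1
        prev_was_operand = is_operand
    return count


for src in ["1 + 1 -1 + 1", "1 + 1 - 1 + 1", "1 + 1 -x + 1", "1 + 1 - x + 1"]:
    print(src, "->", count_expressions(src))
```

Note that this toy would also split `1-1` into two expressions, so it only illustrates why whitespace around `-` can matter; it does not claim to match Gleam beyond the four cases above.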
@pervognsen That third Gleam case is...interesting 😄
I think I mention that rule that Swift has in the post!
@terts The biggest issue with comparing semicolon inference in existing languages:
Most ship with an absolutely half-assed implementation because it's an aspect of syntax that is rarely possible to fix after it shipped.
Sadly, it appears your article managed to include only languages with such half-assed implementations.
@terts Look for any language that checks both the token before and the token after the newline to determine whether a semicolon should be inserted.
For instance in this list, have a look at Scala: https://pling.jondgoodwin.com/post/semicolon-inference/#scala
@terts The best approach I found to figure out what needs to go into these before/after sets:
Imagine a hypothetical variant of your language where semicolons are required, then treat any difference between that language and your semicolon-inferred language as a bug in the inference rules.
That pretty much decides 98% of the "how should I actually parse this" ambiguities you might encounter, including the "binary operation split across newline gets treated as unary operator" issue.
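A rough Python sketch of the before/after idea, under loud assumptions: tokens are whitespace-separated, and the two sets `NO_SEMI_AFTER` and `NO_SEMI_BEFORE` are invented for a toy expression language, not taken from Scala or any other real grammar. A semicolon is inferred at a newline unless the token before it cannot end a statement or the token after it cannot start one.

```python
# Hypothetical token sets for a toy language.
NO_SEMI_AFTER = {"+", "-", "*", "/", "=", "(", ",", "."}   # cannot end a statement
NO_SEMI_BEFORE = {"+", "-", "*", "/", "=", ")", ",", "."}  # cannot start a statement


def infer_semicolons(source):
    """Return the token stream with ';' inserted at statement-ending newlines."""
    lines = [line.split() for line in source.splitlines()]
    lines = [toks for toks in lines if toks]  # ignore blank lines
    out = []
    for i, toks in enumerate(lines):
        out.extend(toks)
        nxt = lines[i + 1][0] if i + 1 < len(lines) else None
        if toks[-1] in NO_SEMI_AFTER:
            continue  # the line ends mid-expression
        if nxt is not None and nxt in NO_SEMI_BEFORE:
            continue  # the next line continues this one
        out.append(";")
    return " ".join(out)


print(infer_semicolons("a = 1\n+ 2\nb = 3"))
```

Putting `-` in `NO_SEMI_BEFORE` is the design choice that keeps a binary operation split across a newline from being read as a unary operator on the next line. The check against the "semicolons required" variant then amounts to comparing this output with the hand-terminated version of the same program.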
@terts BCPL had optional semicolons. I suspect the rules are the same as in Go; I'm working from memory, but Ken Thompson would probably have had some exposure to BCPL.
For me, hacking in Go, the optional semicolons are no problem at all and never were, perhaps because of decades-ago exposure to BCPL. They really aren't necessary; if logic says they are, I think that implies a flaw in the assumptions. Perhaps it is self-selection, but Go users seem not to care.
@terts The "always use semicolons in JavaScript" advice is only half a solution - it prevents two statements from accidentally being concatenated into one, but it does NOT prevent the parser from breaking other statements in half.
Having the parser (or lexer) second guessing the coder is a double-edged sword that can easily lead to the coder second guessing the parser :-(
I enjoyed your write-up though!
@terts The SPSS language has a couple of syntax modes.
In one of them, a line that begins at the left margin starts a new command. If the first character on a line is + or - or ., then that character is ignored, which allows new commands to visually start indented except for that prefix character. This probably made some kind of sense on punch cards in the 1960s when the language originated.
The other syntax mode is more sensible. A command ends at a line that ends in a period, or at a blank line.
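The two modes can be sketched in Python roughly as follows. This is written only from the description above, not from SPSS documentation: the function names are hypothetical, the sample commands are just illustrative input, and the handling of blank lines in the first mode is an assumption.

```python
def split_margin_mode(text):
    """First mode: a line starting in column 1 begins a new command; a
    leading '+', '-' or '.' is dropped; indented lines are continuations."""
    commands = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored in this mode
        if line[0] in " \t" and commands:
            commands[-1] += " " + line.strip()
        else:
            body = line[1:] if line[0] in "+-." else line
            commands.append(body.strip())
    return commands


def split_period_mode(text):
    """Second mode: a command ends at a line ending in a period, or at a
    blank line."""
    commands, current = [], []

    def flush():
        joined = " ".join(part for part in current if part)
        if joined:
            commands.append(joined)
        current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()
        elif stripped.endswith("."):
            current.append(stripped[:-1].rstrip())
            flush()
        else:
            current.append(stripped)
    flush()
    return commands


print(split_margin_mode("DO IF x\n+  COMPUTE y = 1\nEND IF"))
print(split_period_mode("FREQUENCIES\n  VARIABLES = a.\nDESCRIPTIVES b\n\nLIST."))
```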