syntax...

The Scheme Programming language

cr88192
I am mostly just requesting opinions here, feel free to ignore if you don't
have any real comment.

yes, I have heard the comments many have about non-s-expression syntaxes
before. my main complaint is mostly that I believe pure s-expressions are
capable of scaring off newbies, and that to "just use emacs/..." is not a
suitable answer in this case (since a newbie will likely be unaware of
things like emacs...).


for my implementation once again the issue of syntax comes up.
I had been forced back to s-expressions as my experience has shown that a
pure line syntax (of the form I created for lb2) is worse than an
indentation syntax for general use (at least this is my impression thus
far).

problems I had with my lb2 syntax:
worse linebreak rules than lb (ugly and inconvenient);
multiple parser block types;
weird syntax/parser behavior that I kept accidentally stepping on;
...

so, now I have gone back in the direction of s-expressions, and have gone
out in another direction.
I created a syntax I am calling lb3, this one is closer to s-expressions,
but has some aspects of c style syntax.

major differences from s-expressions:
it has statements which are, like c, terminated with a semicolon ';';
it has blocks, which now behave like inlined lists (with the internal
contents parsed as statements);
it has infix operators (any symbol starting with characters besides letters,
hyphens, or underscores and being located in the second position in the list
is moved into function position), like lb/lb2 this can be overridden by using
a period '.' as a prefix (allowing infix symbols to appear in second
position, or normal symbols to appear infix);
using operators in prefix position is still allowed.
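
the infix rule above can be sketched roughly as follows. this is Python
rather than lb3, and only an illustration under assumptions, not the actual
parser: forms are assumed to be already read into nested lists, symbols are
plain strings, numbers are ints, and the names looks_like_name and to_prefix
are made up for the example. one more assumption: a bare '-' is treated as
an operator, since the examples write (x - 1) infix even though the rule
lists hyphens among the "name" characters.

```python
def looks_like_name(sym):
    """True if sym starts with a letter, '_' or '-' (so it is left in place).

    Assumption: a bare "-" counts as an operator, because the examples
    in the post write (x - 1) infix.
    """
    if sym == "-":
        return False
    return sym[0].isalpha() or sym[0] in "-_"


def to_prefix(form):
    """Rewrite one parsed form (nested Python lists) into prefix order."""
    if not isinstance(form, list):
        return form  # atoms (symbols as str, numbers as int) pass through
    items = [to_prefix(x) for x in form]
    if len(items) < 2 or not isinstance(items[1], str):
        return items  # nothing symbolic in second position
    second = items[1]
    if second.startswith(".") and len(second) > 1:
        bare = second[1:]
        if looks_like_name(bare):
            # (x .max y) -> (max x y): a normal symbol used infix
            return [bare, items[0]] + items[2:]
        # (f .+ 1) -> (f + 1): an operator kept in second position
        return [items[0], bare] + items[2:]
    if not looks_like_name(second):
        # (x < 2) -> (< x 2): operator moved into function position
        return [second, items[0]] + items[2:]
    return items  # ordinary prefix form, e.g. (< x 2) or (fact x)


print(to_prefix(["x", "*", ["fact", ["x", "-", 1]]]))
```

a form that is already prefix, such as ["<", "x", 2], comes back unchanged,
which is the "prefix is still allowed" case.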

in all cases ';' is required as a terminal for statements (including if they
are making use of blocks).

an alternate notation has been added for 'if' (this is handled by the if
special form and not the parser):
if <cond> then: <true*> [else: <false*>];

it is not allowed to mix the notations though, eg:
if (x < 2) 1 else: (x * (fact (x - 1)));
if (x < 2) then: 1 (x * (fact (x - 1)));
are invalid,
the first will return 1 if true and 'else:' otherwise;
the second will treat both as part of the 'then:' branch.
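
a small sketch of how an 'if' form could split its arguments so that it
behaves as described above, including the two mixed cases. it is written in
Python for illustration only; the function name split_if and the list
representation of the arguments are assumptions, not the real special form.

```python
def split_if(args):
    """Split the arguments of 'if' into (cond, then_forms, else_forms).

    args is everything after the 'if' keyword, already parsed into a
    Python list; keywords such as "then:" are plain strings.
    """
    cond, rest = args[0], args[1:]
    if rest and rest[0] == "then:":
        body = rest[1:]
        if "else:" in body:
            i = body.index("else:")
            return cond, body[:i], body[i + 1:]
        # no 'else:' marker: everything belongs to the 'then:' branch
        return cond, body, []
    # classic form: exactly one true form and at most one false form
    return cond, rest[:1], rest[1:2]


cond = ["x", "<", 2]
rec = ["x", "*", ["fact", ["x", "-", 1]]]
print(split_if([cond, "then:", 1, "else:", rec]))
print(split_if([cond, 1, "else:", rec]))   # false branch is the symbol 'else:'
print(split_if([cond, "then:", 1, rec]))   # both forms land in the then branch
```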

this was to work around an issue that has popped up before with most of my
attempts at a non s-expression syntax:
begin is not very convenient when I want multiple statements after an if.

of course, this added a bit of complexity to my otherwise simple 'if'
special form...

examples:
function fact (x)
{
    if (x < 2) then: 1
        else: (x * (fact (x - 1)));
};

or:
define (fact x)
{
    if (x < 2) 1 (x * (fact (x - 1)));
};

or:
define (fact x) (if (x < 2) 1 (x * (fact (x - 1))));

are valid.

the same goes for:
define (fact x)
    (if (< x 2) 1
        (* x (fact (- x 1))));

thus, it need only differ from s-expressions at the outermost layer if
desired...

comments?...                                            
mis6pittedu
This is very true. One should *never* underestimate the laziness of
people. So, if would-be Schemers are comfortable with some editor which
does not have good support for parentheses, it is possible that they will
turn away from Scheme rather than learn a new IDE (unless they are
students forced to learn Scheme, in which case they could end up hating
the language). Shipping an IDE with Scheme (i.e. DrScheme) is better than
forcing people to learn emacs (or vi), but there will still be people
unsatisfied with it (why should I need to learn an IDE just to learn a
language??).

I am an emacs user, but I was still unsatisfied with the default scheme
mode, so I tried other editors. It was a suicidal move. So I came back to
emacs, asked here, and was pointed to the Quack mode, which actually is
excellent.

Nevertheless, I had to ask here to find it, it didn't work the first
time, and I had to spend some time learning it. On top of that I had to
change the emacs configuration for parentheses, ask comp.emacs a few
other questions, etc. All this time was wasted from the point of view of
learning Scheme, even if it was profitable from the point of view of
learning emacs (which is why I decided to spend the time). But not
everybody would have the patience. All these problems would disappear
with a less parenthesized syntax. IMHO, at least in principle, one should
be able to program with Notepad. That's impossible with Scheme, unless
you count all the parentheses yourself (notice that I have nothing
against prefix notation).


For experimental purposes (I am NOT suggesting to change Scheme) I wrote
a preprocessor that reads pseudo-scheme code and adds parentheses
according to the indentation; it works, so it shows that parentheses are
NOT mandatory and in principle could be supplied automatically by the
IDE. Now, I know very well that *lots* of people have written analogous
utilities to avoid parentheses, and all these utilities are NEVER used by
experienced programmers. That's fine.
I do not use my preprocessor myself (except as a proof of principle),
since I have now mastered emacs and Quack well enough that I am no longer
worried about parentheses. But when you design a new language and you
want to make it accessible to most people (including lay programmers), it
makes sense to lower the barrier and use a syntax which does not require
help from a specialized editor. On the other hand, if you make the syntax
particularly hard, you can be sure that only the more persistent (if not
the best) programmers will learn your language, thus explaining why
Scheme programmers are better than Basic programmers ;)

For the curious reader, here is an example of a parens-saving
pseudo-scheme program, just to show what it looks like:

;; takes some code and returns the list of names defined in it
;; (probably could be written in a better way)

define definitions '(define define-syntax define-macro)

define (defined-names-in code)
  define def #f
  define defined-names ()
  define (walk-recur code)
    cond 
      (list? code) 
        for-each 
          lambda (block) (walk-recur block) 
        : code
      else
        if (memq def definitions)
          set! defined-names (cons code defined-names)
        set! def code
  walk-recur code
  reverse defined-names

display
  defined-names-in 
    quote(
      define x 1
      define y 2
      (+ x y)) ;=> prints the list (x y)

The preprocessor adds a pair of parens per line, unless the line starts
with a colon, and converts this program to the following Scheme code:

(define definitions '(define define-syntax define-macro))

(define (defined-names-in code)
  (define def #f)
  (define defined-names ())
  (define (walk-recur code)
    (cond
      ((list? code)
        (for-each
          (lambda (block) (walk-recur block))
          code))
      (else
        (if (memq def definitions)
          (set! defined-names (cons code defined-names)))
        (set! def code))))
  (walk-recur code)
  (reverse defined-names))

(display
  (defined-names-in
    (quote(
      (define x 1)
      (define y 2)
      (display (+ x y))))))
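
To give an idea of how little machinery the line-wrapping rule needs, here
is a rough Python sketch of it. It is not my actual preprocessor, only an
illustration under assumptions: the name add_parens is made up, indentation
is assumed to use spaces only, a colon line is assumed to continue the form
that sits at its own indentation level, and comments, blank-line layout and
parens that span several lines (like the quote( block above) are not
handled.

```python
def add_parens(source):
    """Wrap each line in parens, nesting by indentation (simplified sketch)."""
    out = []
    stack = []  # indentation of every form that is still open
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines are simply dropped in this sketch
        indent = len(raw) - len(raw.lstrip(" "))
        text = raw.strip()
        is_colon_line = text.startswith(":")
        closes = 0
        # A normal line closes every open form at the same or deeper
        # indentation; a colon line only closes strictly deeper forms,
        # because it continues the form at its own level (assumption).
        while stack and (stack[-1] > indent or
                         (stack[-1] == indent and not is_colon_line)):
            stack.pop()
            closes += 1
        if closes and out:
            out[-1] += ")" * closes
        if is_colon_line:
            out.append(" " * indent + text[1:].strip())  # no new paren
        else:
            out.append(" " * indent + "(" + text)
            stack.append(indent)
    if out:
        out[-1] += ")" * len(stack)  # close whatever is still open
    return "\n".join(out)


example = "\n".join([
    "define (walk-recur code)",
    "  cond",
    "    (list? code)",
    "      for-each",
    "        lambda (block) (walk-recur block)",
    "      : code",
    "    else",
    "      set! def code",
])
print(add_parens(example))
```

The closing parens pile up at the end of the last line of each form, which
is also where a Scheme programmer would normally put them by hand.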

It is interesting to notice that the first version has 22 parens over 24
lines, i.e. less than one paren per line; the regular Scheme version, on
the other hand, has 64 parens, i.e. nearly three times as many
(insignificant) parentheses. Now, if you have a good editor, the
parenthesized version is actually easier to write, since indentation is
not significant and the editor takes care of it; but if you only have
Notepad at your disposal, the first version suddenly becomes appealing.

So, if I were writing a new language, I would consider the fact that you
can have s-expressions without too many parentheses: the price to pay is
significant indentation, which has drawbacks but also some advantages,
such as forcing people to write easy-to-read code. You must decide what
counts the most for you.


Too many confusing ways to write the same thing. Also, the colons in
"then:" and "else:" are completely useless and disturbing. Just my 2
Eurocents.

                  Michele Simionato
                                            
Shriram
[c.l.misc removed from Newsgroups header.]

mis6@pitt.edu (Michele Simionato) writes:


The reason they're never used by experienced programmers is usually
because these begin as proofs-of-concept and never grow beyond that.
As a result, they are useless for writing any actual code.

DrScheme has enough support for processing syntax that one could, in
fact, build quite a nice paren-lite Scheme preprocessor and endow it
with all the usual tools that the Scheme languages of DrScheme have.
If someone were to do this seriously, it would have a chance of really
getting a following.  I would use it!

Shriram
                                            
Eli
This is one of the items on my personal wish-slash-todo-list.  I
thought that something along the Javascript lines would be nice -- but
just for syntax, nothing like the ML/Python/etc embeddings...
                                            
Ray
When I first started learning lisp, I disliked the way the syntax 
worked; I felt like all these parens were in my way. 

But, as I used it and got used to it, I started really depending
on those parens.  They provide clear boundaries.  You can always 
tell exactly what information is available to any routine, by 
what's inside its parens.  There's no precedence ordering to 
remember.  There's one punctuation and it means one thing.  

These days I find syntaxes that use delimiters in other ways 
confusing.  

For example POVRay syntax is maddening because it mixes prefix
operators (constructors) and postfix operators (translations, 
rotations, etc) and puts postfix operations outside the 
delimiters (curly braces) of the stuff the operations affect. 

This isn't very bad when dealing with simple objects like spheres
and cylinders, but when you're dealing with complicated CSG's 
that have hundreds of components and they all have to go together 
in some very specific order of unions and merges and translations 
whose scope affects different subassemblies and so on....  it 
gets frustrating and is often non-obvious.  In order to get it 
right, I usually write the scene files in a lispy syntax and then 
translate into POVray.  It makes it much easier to tell which 
operation affects what subcomponents of the model. 

I keep seeing people who want to reform lispy syntax and reduce
the number of parens; I don't want that any more.  The parens
make the program into an explicit syntax tree that is easy to 
take apart, manipulate, analyze, and understand. 

If I had to change anything, I might consider other ways of 
expressing an explicit syntax tree, but fully parenthesized 
prefix is hard to beat.  Fully parenthesized postfix is an 
equivalent possibility, but would require "stack thinking" 
in addition to "subexpression thinking" to analyze, which 
would get bizarre when dealing with subexpressions of syntax
that aren't procedure calls. Of course, that's equivalently 
bizarre in prefix notation when some idiot introduces syntactic
subexpressions that look like function calls but aren't, so 
maybe it's not a worry anyhow.

				Bear
                                            
Eli
You can safely believe me when I say that I do not want to reform any
syntax at all.  Having a quick "normal" syntax is just something that
can be useful in very specific situations -- like using Scheme for an
application that is to be customized by semi-programmers -- which are
exactly the crowd that will cry about parens no matter what you say.
                                            
mis6pittedu
Agreed. Notice that I did NOT propose to reform Scheme syntax.
I could even argue that a parenthesized syntax is *superior* to
indentation, provided one has a good editor. However, the
original question was the following: having to design a NEW
language, should I design it in such a way that it is easy to
code in independently of the editor?
I do think the answer is yes; the answer is an even more decisive
yes if, as the original poster pointed out, you have the
constraint of making the language "newbie friendly".
Having fewer parentheses is useful if you have these two
constraints; OTOH, once you have mastered emacs or any other
good editor, and you are no longer a newbie programmer, the
parentheses will no longer bother you, so I don't think anybody
would eliminate them afterwards (except maybe in interactive
programming), since they really do have some advantages, as you
point out.
Still, if I were designing a language, I would try to avoid
parentheses as much as possible. So I do think the concerns of
the OP are worthwhile.

           Michele
                                            
Ray
I'll admit to a strong bias. I believe that, in order to be 
maximally useful, a language _has_to_ be a lisp.  I don't 
mean scheme, or common lisp, or any particular kind of lisp, 
but I do mean a language where code can be expressed as data 
and data can be used to express code, where functions are 
first-class values and, by whatever name, you get the functions 
we call lambda and eval and quote, and all the primitive 
constructors and accessors needed to manipulate any kind of 
data expression that your eval can understand as language, 
and where syntax trees are concretely and precisely 
represented in a way open to easy analysis.

All other languages are lesser languages.  I don't mean that 
in a bad way; they have their niches for embedded systems, 
hardware interfaces, and so on. But for general programming, 
Lisp is what's still going to be here a hundred years from 
now -- either as a direct survival of current lisps, or because 
as other languages get more capable they will _necessarily_ 
become lisps.  

There is a threshold of general language utility that only 
Lisps can reach, and all efforts to design a language that 
reaches or surpasses that threshold must ultimately result 
in the reinvention of Lisp. 

Lisp is what's there, mathematically, beyond all the syntax 
of every other language.  It is the most direct expression 
of what computation *IS*. 

So, as a language designer, the first decision I guess is 
whether you're going for all the marbles and trying to 
design the ultimate general-purpose language; if so, start 
with Lisp and see if there's anything in the particular 
dialect you're looking at that you can painlessly clean up 
or usefully add or extend.  The requirement that parens
serve in Lisp is that programs can be read and written
reliably as structured data values.  If you
can find another way to meet that requirement, you can leave
out the parens and still have a lisp.

And if you're not making a lisp, you have to decide what
sacrifices you'll make and whether whatever you get by
making them is worth what they cost and whether you'll
need (and can make) any new constructions to partially
compensate for whatever you're leaving out.



Like I say, decide what you're leaving out, how you'll 
compensate for its absence, and whether it's worth it. 

				Bear
                                            
cr88192
ok.
I try to preserve this, thus most of my "syntaxes" are more like "modified"
s-expressions.

maybe. I see interesting stuff happening in various places (and then try if
possible to rip them off).

maybe, or at least be in many ways similar.

err. semantics is one of those questionable spaces, I try to avoid looking
too deep into this as I can't see the bottom...

this is what I have been trying.
for the most part I still use s-exps for external data (I am still not an
xml convert...).

lb3 basically just replaces some parens with braces and semicolons, the
structure is not really affected that much...

fear not, my lang is still a lisp (maybe cruddy, oh well).
I have most of scheme (except hygienic macros, continuations are presently
broken, I never got around to a usable ports implementation, ...).

my philosophy has generally been a lispy core language+lots of ripped off
features from other languages.
I like the lispy core design;
I like prototype objects (originally inspired by 'self', gradually being
improved as I still discover/fix "misguided" design issues of mine
occasionally...);
I like message passing as a general concept (and have ripped off some bits
from erlang, eg: thread centric message passing/receiving);
other features I have seen along the way, eg: what seems cool and like it
can be implemented without too much effort.

hygienic macros: not that much loss;
continuations: some loss, deciding if and how much to reimplement them;
ports: need to get to it;
...

I am trying to avoid loss where possible, *and* trying to make it newbie
friendly where possible. this is a mixed bag, and I skip on newbie
friendliness if I don't like what it implies...

all for now.
                                            
cr88192
yes, and an important note is that I am still keeping s-expressions
around...

I separate syntaxes by extension:
for s-expressions I use the extension 'lbs' (Lang/Bgb S-expression);
for my first syntax, I used 'lb' (Lang/Bgb [1]);
second syntax, 'lb2' (Lang/Bgb 2);
third syntax, 'lb3' (Lang/Bgb 3).

of these, lbs, lb, and (yet to be demonstrated) lb3 are usable.
'lb2' was a bit horrid, so I will probably not use it anymore...

lb3 is actually the closest to s-expressions thus far, but it is still a
little more complicated. the reason I created it is for the reasons stated
by the previous poster.
sadly, it may or may not be obvious that I am also expecting coders to use
the language in a manner similar to the typical use of imperative
languages... hopefully at least newbies would figure how to make good use of
let and lambda though...


if I expected to be the only one using the language, I would be fine just
using s-expressions and emacs, given that most of the code currently
existing in my lang was written this way (or converted due to annoyances
with the other syntaxes...).

also, lb3 is the first so far that should also work well for data...