Co-expressions Keep Too Much Information


When an Icon or Unicon procedure creates a co-expression, that co-expression
maintains a copy of all the dynamic local variables of that procedure.
Clearly, it makes sense to include copies of the dynamic variables actually
referenced by the co-expression, i.e. those dynamic variables that actually
appear in the "create" expression. But does it make sense for the
co-expression to include copies of dynamic variables that are NOT
referenced? Consider this example:

    global warming

    procedure hello_world()
        local paper, hero
        static shock
        coexp1 := create find( shock, !warming || hero )

The co-expression assigned to coexp1 is required to have its own copy of the
dynamic variable hero, since hero is needed to evaluate the expression. But
this co-expression would also have a copy of the dynamic variable paper,
even though this variable is never referenced by the co-expression and cannot
possibly affect this co-expression's evaluation. As a result, Icon / Unicon
can retain memory that the program can no longer reach, even after garbage
collection. For example:

        paper := some_really_really_big_structure()
        coexp1 := create find( shock, !warming || hero )
        paper := &null
        # This does NOT permit garbage collection of the really, really
        # large structure, since the copy of the paper variable in coexp1
        # still has this structure as its initial value.
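
One way to check this on a given build is to compare block-region usage
before the allocation and after a forced collection. The following is only a
sketch: block_bytes() is a hypothetical helper, list(100000, 0) stands in for
the really big structure, and the values given to shock, hero and warming are
made up for illustration.

    global warming

    # Hypothetical helper: bytes in use in the block region.
    # &storage generates three values (static, string, block);
    # the loop keeps the last one.
    procedure block_bytes()
        local used
        every used := &storage
        return used
    end

    procedure hello_world()
        local paper, hero, coexp1
        static shock
        initial shock := "o"
        hero := "world"
        write("before allocation: ", block_bytes())
        paper := list(100000, 0)    # stand-in for the really big structure
        coexp1 := create find( shock, !warming || hero )
        paper := &null
        collect()                   # force a garbage collection
        write("after collect():   ", block_bytes())
        return coexp1
    end

    procedure main()
        warming := ["hello ", "goodbye "]
        hello_world()
    end

If the second figure stays larger than the first by roughly the size of the
list, the structure is still being held. The exact numbers depend on the
implementation and platform.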

The memory wasted by unused local variables is the source of a longstanding
Icon bug (going back to at least version 7). If an expression such as

    x := create some_expr_not_involving_x()

is used in a loop, and x is a local variable, unreferenceable co-expressions
are generated by each successive create operation. These co-expressions are
not garbage collected. This problem can be circumvented by making x a static
or global variable or by assigning a value to x before the create operation,
as in

    x := &null
    x := create some_expr_not_involving_x()

To understand this bug (and work-around), consider what happens when

    x := create some_expr_not_involving_x()

is evaluated the first time. A co-expression (call it coexp_1) is created.
Since x is a local variable, coexp_1 has a copy of x, initialized to the
current value of x. After coexp_1 is created, it is stored in the local
variable x.

Next consider what happens with this assignment statement in the second
iteration of the loop. A new co-expression is created (call it coexp_2) and
this co-expression has a copy of x. At this time, however, x equals coexp_1,
so the copy of x in coexp_2 is initially equal to coexp_1. Now assuming that
coexp_1 is not stored anywhere else, coexp_1 becomes
unreachable once coexp_2 is assigned to x. Unfortunately, coexp_1 is still
referenced by the copy of x in coexp_2, so coexp_1 is not garbage collected.

This problem gets worse with further iterations. The third iteration of the
above assignment statement will create the co-expression coexp_3 with a copy
of x with the initial value coexp_2, which in turn holds a copy of x with
the initial value of coexp_1. After coexp_3 is stored in x, coexp_1 and
coexp_2 are unreachable, but cannot be garbage collected. If there were 100
iterations, 100 co-expressions would be kept alive, even though only one of
them can be accessed!

Ideally, a co-expression should only contain copies of the local variables
it uses. We can basically get the same effect by making the co-expression
variable block a copy of the local procedure block, but initialize each
unreferenced variable with &null. For example, if the current local
variables are hero and paper, then

        coexp1 := create find( shock, !warming || hero )

may have copies of hero and paper, but only hero would be initialized with
its dynamic counterpart; paper would be initialized as &null.
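
Until the run-time system does this itself, the same effect can be
approximated by hand at the source level. This is a sketch, not the proposed
implementation: stash is a hypothetical static variable (statics are shared
rather than copied), and the values given to shock, hero and warming are made
up for illustration.

    global warming

    procedure hello_world()
        local paper, hero, coexp1
        static shock, stash         # statics are not copied into co-expressions
        initial shock := "o"
        hero := "world"
        paper := list(100000, 0)    # stand-in for a big structure
        stash := paper              # park the value outside the dynamic locals
        paper := &null              # the co-expression's copy of paper is &null
        coexp1 := create find( shock, !warming || hero )
        paper := stash              # restore the procedure's own variable
        stash := &null              # drop the extra reference
        return coexp1
    end

    procedure main()
        local c
        warming := ["hello ", "goodbye "]
        c := hello_world()
        write(@c)
    end

Because stash is shared by all invocations of hello_world, this trick is not
safe if the procedure is recursive.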

Is this worth fixing? If so, how would you approach it? I have some ideas as
to how we could do this.
Thanks for the explanation - I never did quite find out what the actual
problem was.

I guess the source of the problem is two-fold:

1) not analysing strongly enough which variables are referenced

2) simplifying the job by copying the entire stackframe of the invoking
procedure, instead of only the necessary variables.

Is this correct?

If point 2 is correct, then if a compacted stack frame is recorded for the
enclosed code, it must reference the variables at different stack indexes.
This means more work in the compiler. It's occurred to me that creates
written within creates might exacerbate the problem but I haven't put any
thought into that assumption...

I use co-expressions in a critical process, but its lifetime is measured in
seconds at most. I guess the real question is, does anyone have a long
running or very large process which suffers?

A suitable work-around would be to create the co-expression within a small
procedure which is given only the values needed. This way, extraneous locals
simply won't exist.
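
A minimal sketch of that work-around, applied to the earlier example.
make_finder is a hypothetical helper name, and the values given to shock,
hero and warming are made up for illustration.

    global warming

    # Hypothetical helper: its only dynamic variables are its two
    # parameters, so those are the only ones the co-expression copies.
    procedure make_finder(shock, hero)
        return create find( shock, !warming || hero )
    end

    procedure hello_world()
        local paper, hero, coexp1
        static shock
        initial shock := "o"
        hero := "world"
        paper := list(100000, 0)    # stand-in for a big structure
        coexp1 := make_finder(shock, hero)
        paper := &null              # coexp1 has no copy of paper at all
        return coexp1
    end

    procedure main()
        local c
        warming := ["hello ", "goodbye "]
        c := hello_world()
        write(@c)
    end

One difference from the original: the helper receives the value of shock at
the time of the call, so a later assignment to the static shock in
hello_world is not seen by the co-expression.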

Certainly, as a geek I say, fix it on principle, but in practice, who has
the time to do the work, and also the testing to assure that some other
subtle defects haven't crept in? Is it better the Devil you Know? How hard
do you expect the work to be?

You're welcome. Hopefully

NO analysis is done of which dynamic variables are used in the


I'm glad you thought of the case where a "create" expression contains a
"create" expression.  Keep in mind that the inner "create" would be
evaluated in the context of the outer co-expression, so any dynamic
variables in the inner co-expression would be copied from the outer
co-expression!

I am concerned about the complications of creating a "compacted" list of
dynamic variables. That is why I proposed the more conservative approach of
giving each co-expression a complete copy of the dynamic variables, but then
storing &null in each unreferenced variable.

Whoops! I didn't mean to send this before finishing the sentence starting
with "Hopefully". Let me finish it now:
"Hopefully, this explanation also clears up the mystery as to why the two
work-arounds fix the problem."

BTW I have also posted this topic to the Unicon mailing list. Steve Wampler
noted that the unreferenced variables in a co-expression could be accessed
via the "variable" function. IIRC the "variable" function returns the value
of a variable, given the variable's name in string form. Naturally, this
function fails if there is no variable by this name.
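
A small sketch of the kind of access being described. It assumes, as Steve
reports, that variable() called inside a co-expression resolves names against
that co-expression's own copies of the dynamic variables. The variable names
and string values are made up for illustration.

    procedure main()
        local paper, hero, c
        paper := "unreferenced"
        hero := "referenced"
        # Only hero appears by name in the create expression;
        # paper is looked up at run time from a string.
        c := create (write(hero), write(variable("paper")))
        @c
    end

If the co-expression's copy of paper were set to &null at creation, the
second write would see &null instead of the original value. That is the
behavior change at issue.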

This observation does throw a monkey wrench into any scheme for cutting down
on co-expression dynamic variable information. One can argue that because of
this function, none of the unreferenced variables in a co-expression is
"garbage". This is unfortunate, for Icon / Unicon programs that actually use
"variable" to read unreferenced co-expression variables are extremely rare.
I certainly have never seen one!
Hi Frank,

I finally got time to read your entire original post.  I think
your analysis of the situation is spot on, and a good explanation
of the 'memory-leak' that can arise from co-expressions.

Although I commented on the variable() function, it's my
general feeling that its ability to access 'hidden' dynamic
variables from within a co-expression is entirely an
artifact of how co-expressions are currently implemented
and *not* a language feature.  There is no requirement
that such access must exist, and I'd be willing to bet that
no one has ever made use of this feature from within
a co-expression [and they shouldn't, since it's solely
an implementation artifact!].

So, I'm all for an experimental re-implementation of
co-expressions where unreferenced dynamic variables are
set to &null.  If that looks good, and solves the
memory leak problem (which I expect it will), then perhaps
a revisit to further clean up the implementation would be
in order.
Although a fix would certainly be desirable, it seems that it would require
a change to the Icon virtual machine. Aside from the moribund native code
compiler and Jcon, Icon and Unicon are based on a high-level virtual
machine. The *.u* files generated by the Icon translator contain assembly
instructions for the virtual machine.

The virtual machine instruction

    create Lx

creates a co-expression, using the code at label Lx as the start of the
expression, and pushes this new co-expression onto the top of the stack.
Basically, the virtual machine code for the Icon expression

    create Some_Expression

looks like this:

            goto L2
    lab L1
            (Code for Some_Expression)
    lab L2
            create L1

The problem with this scheme is that it does not indicate which
dynamic variables are actually referenced in the co-expression. This
information is not an input to the "create" virtual instruction, so the
virtual machine cannot distinguish between the referenced and un-referenced
dynamic variables.

A way to fix this would be to add a new virtual machine instruction to allow
this distinction to be made. This new instruction would have the form

    covar    n

Executing this instruction would mark local variable n as one that is
referenced in the next co-expression that is created. Assume that
Some_Expression references local variables 2 and 3. Then with this new
instruction, the virtual machine code for the Icon expression

    create Some_Expression

would look like this:

            goto L2
    lab L1
            (Code for Some_Expression)
    lab L2
            covar   2
            covar   3
            create L1

This would permit the virtual machine to distinguish between referenced and
unreferenced dynamic variables.

Of all the changes I've done with Icon, I have never experimented with
changing the virtual machine. Does anyone in the NG / mailing list have
experience in this area? What pitfalls should I watch out for?