NAME
stalin - A global optimizing compiler for Scheme
SYNOPSIS
- stalin
- [-version]
[-I include-directory]*
[[-s|-x|-q|-t]]
[[-treat-all-symbols-as-external|
-do-not-treat-all-symbols-as-external]]
[[-index-allocated-string-types-by-expression|
-do-not-index-allocated-string-types-by-expression]]
[[-index-constant-structure-types-by-slot-types|
-do-not-index-constant-structure-types-by-slot-types]]
[[-index-constant-structure-types-by-expression|
-do-not-index-constant-structure-types-by-expression]]
[[-index-allocated-structure-types-by-slot-types|
-do-not-index-allocated-structure-types-by-slot-types]]
[[-index-allocated-structure-types-by-expression|
-do-not-index-allocated-structure-types-by-expression]]
[[-index-constant-headed-vector-types-by-element-type|
-do-not-index-constant-headed-vector-types-by-element-type]]
[[-index-constant-headed-vector-types-by-expression|
-do-not-index-constant-headed-vector-types-by-expression]]
[[-index-allocated-headed-vector-types-by-element-type|
-do-not-index-allocated-headed-vector-types-by-element-type]]
[[-index-allocated-headed-vector-types-by-expression|
-do-not-index-allocated-headed-vector-types-by-expression]]
[[-index-constant-nonheaded-vector-types-by-element-type|
-do-not-index-constant-nonheaded-vector-types-by-element-type]]
[[-index-constant-nonheaded-vector-types-by-expression|
-do-not-index-constant-nonheaded-vector-types-by-expression]]
[[-index-allocated-nonheaded-vector-types-by-element-type|
-do-not-index-allocated-nonheaded-vector-types-by-element-type]]
[[-index-allocated-nonheaded-vector-types-by-expression|
-do-not-index-allocated-nonheaded-vector-types-by-expression]]
[[-no-clone-size-limit|
-clone-size-limit number-of-expressions]]
[-split-even-if-no-widening]
[[-fully-convert-to-CPS|
-no-escaping-continuations]]
[-du]
[-Ob] [-Om] [-On] [-Or] [-Ot]
[-d0] [-d1] [-d2] [-d3] [-d4]
[-d5] [-d6] [-d7]
[-closure-conversion-statistics]
[-dc] [-dC] [-dH] [-dg] [-dh]
[-d]
[-architecture name]
[[-baseline|
-conventional|
-lightweight]]
[[-immediate-flat|
-indirect-flat|
-immediate-display|
-indirect-display|
-linked]]
[[-align-strings|-do-not-align-strings]]
[-de] [-df] [-dG] [-di] [-dI]
[-dp] [-dP]
[-ds] [-dS] [-Tmk]
[-no-tail-call-optimization]
[-db] [-c] [-k]
[-cc C-compiler]
[-copt C-compiler-option]*
[pathname]
Compiles the Scheme source file pathname.sc first into a C file pathname.c
and then into an executable image pathname. Also produces a database file
pathname.db. The pathname argument is required unless -version is specified.
DESCRIPTION
Stalin is an extremely efficient compiler for Scheme. It is designed to be used
not as a development tool but rather as a means to generate efficient
executable images either for application delivery or for production research
runs. In contrast to traditional Scheme implementations, Stalin is a
batch-mode compiler. There is no interactive READ-EVAL-PRINT loop. Stalin
compiles a single Scheme source file into an executable image (indirectly via
C). Running that image has equivalent semantics to loading the Scheme source
file into a virgin Scheme interpreter and then terminating its execution. The
chief limitation is that it is not possible to LOAD or EVAL new expressions or
procedure definitions into a running program after compilation. In return for
this limitation, Stalin does substantial global compile-time analysis of the
source program under this closed-world assumption and produces executable
images that are small, stand-alone, and fast.
Stalin incorporates numerous strategies for generating efficient code. Among
them, Stalin does global static type analysis using a soft type system that
supports recursive union types. Stalin can determine a narrow or even
monomorphic type for each source code expression in arbitrary Scheme programs
with no type declarations. This allows Stalin to reduce, or often eliminate,
run-time type checking and dispatching. Stalin also does low-level
representation selection on a per-expression basis. This allows the use of
unboxed base machine data representations for all monomorphic types resulting
in extremely high-performance numeric code. Stalin also does global static
life-time analysis for all allocated data. This allows much temporary
allocated storage to be reclaimed without garbage collection. Finally, Stalin
has very efficient strategies for compiling closures. Together, these
compilation techniques synergistically yield efficient object code.
Furthermore, the executable images created by Stalin do not contain
(user-defined or library) procedures that aren't called, variables and
parameters that aren't used, and expressions that cannot be reached. This
encourages a programming style whereby one creates and uses very general
library procedures without fear that executable images will suffer from code
bloat.
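To make the effect of type analysis and representation selection more concrete, the following C sketch contrasts a generic, tagged representation that must dispatch at run time with the monomorphic, unboxed code that becomes possible once every value is known to be an inexact real. It is an illustration only, not Stalin's actual output; the names value, generic_add, sum_generic, and sum_unboxed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged representation: every value carries a run-time tag. */
typedef enum { TAG_FIXNUM, TAG_FLONUM } tag;

typedef struct {
    tag type;
    union { long fixnum; double flonum; } as;
} value;

/* Generic addition: the tags must be inspected before any arithmetic. */
value generic_add(value a, value b) {
    value r;
    if (a.type == TAG_FIXNUM && b.type == TAG_FIXNUM) {
        r.type = TAG_FIXNUM;
        r.as.fixnum = a.as.fixnum + b.as.fixnum;
    } else {
        double x = (a.type == TAG_FIXNUM) ? (double)a.as.fixnum : a.as.flonum;
        double y = (b.type == TAG_FIXNUM) ? (double)b.as.fixnum : b.as.flonum;
        r.type = TAG_FLONUM;
        r.as.flonum = x + y;
    }
    return r;
}

/* Summation over tagged values: one dispatch per element. */
value sum_generic(const value *v, size_t n) {
    value acc = { TAG_FIXNUM, { 0 } };
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = generic_add(acc, v[i]);
    return acc;
}

/* Monomorphic version: no tags, no dispatch, raw machine doubles. */
double sum_unboxed(const double *v, size_t n) {
    double acc = 0.0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Both functions compute the same sum; the second is the shape of code that a compiler can emit only when analysis has established a single type for every element.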
OPTIONS
- -version
- Prints the version of Stalin and exits immediately.
The following options control preprocessing:
- -I
- Specifies the directories to search for Scheme include
files. This option can be repeated to specify multiple directories. Stalin
first searches for include files in the current directory, then each of
the directories specified in the command line, and finally in the default
installation include directory.
- -s
- Includes the macros from the Scheme->C compatibility
library. Currently, this defines the WHEN and UNLESS syntax.
- -x
- Includes the macros from the Xlib and GL library.
Currently, this defines the FOREIGN-FUNCTION and FOREIGN-DEFINE syntax.
This implies -s.
- -q
- Includes the macros from the QobiScheme library. Currently,
this defines the DEFINE-STRUCTURE syntax, among other things. This implies
-x.
- -t
- Includes the macros needed to compile Stalin with itself.
This implies -q.
The following options control the precision of flow analysis:
- -treat-all-symbols-as-external
- During flow analysis, generate a single abstract external
symbol that is shared among all symbols.
- -do-not-treat-all-symbols-as-external
- During flow analysis, when processing constant expressions
that contain symbols, generate a new abstract internal symbol for each
distinct symbol constant in the program. This is the default.
- -index-allocated-string-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate strings, generate a new abstract string for
each such expression. This is the default.
- -do-not-index-allocated-string-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate strings, generate a single abstract string
that is shared among all such expressions.
Note that there are no versions of the above options for element type because
the element type of a string is always char. Furthermore, there are no
versions of the above options for constant expressions because there is always
only a single abstract constant string.
- -index-constant-structure-types-by-slot-types
- During flow analysis, when processing constant expressions
that contain structures, generate a new abstract structure for each set of
potential slot types for that structure.
- -do-not-index-constant-structure-types-by-slot-types
- During flow analysis, when processing constant expressions
that contain structures, generate a single abstract structure that is
shared among all sets of potential slot types for that structure. This is
the default.
- -index-constant-structure-types-by-expression
- During flow analysis, when processing constant expressions
that contain structures, generate a new abstract structure for each such
expression. This is the default.
- -do-not-index-constant-structure-types-by-expression
- During flow analysis, when processing constant expressions
that contain structures, generate a single abstract structure that is
shared among all such expressions.
- -index-allocated-structure-types-by-slot-types
- During flow analysis, when processing procedure-call
expressions that can allocate structures, generate a new abstract
structure for each set of potential slot types for that structure.
- -do-not-index-allocated-structure-types-by-slot-types
- During flow analysis, when processing procedure-call
expressions that can allocate structures, generate a single abstract
structure that is shared among all sets of potential slot types for that
structure. This is the default.
- -index-allocated-structure-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate structures, generate a new abstract
structure for each such expression. This is the default.
- -do-not-index-allocated-structure-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate structures, generate a single abstract
structure that is shared among all such expressions.
Note that, currently, pairs are the only kind of structure that can appear in
constant expressions. This may change in the future, if the reader is extended
to support other kinds of structures.
- -index-constant-headed-vector-types-by-element-type
- During flow analysis, when processing constant expressions
that contain headed vectors, generate a new abstract headed vector for
each potential element type for that headed vector.
- -do-not-index-constant-headed-vector-types-by-element-type
- During flow analysis, when processing constant expressions
that contain headed vectors, generate a single abstract headed vector that
is shared among all potential element types for that headed vector. This
is the default.
- -index-constant-headed-vector-types-by-expression
- During flow analysis, when processing constant expressions
that contain headed vectors, generate a new abstract headed vector for
each such expression. This is the default.
- -do-not-index-constant-headed-vector-types-by-expression
- During flow analysis, when processing constant expressions
that contain headed vectors, generate a single abstract headed vector that
is shared among all such expressions.
- -index-allocated-headed-vector-types-by-element-type
- During flow analysis, when processing procedure-call
expressions that can allocate headed vectors, generate a new abstract
headed vector for each potential element type for that headed vector.
- -do-not-index-allocated-headed-vector-types-by-element-type
- During flow analysis, when processing procedure-call
expressions that can allocate headed vectors, generate a single abstract
headed vector that is shared among all potential element types for that
headed vector. This is the default.
- -index-allocated-headed-vector-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate headed vectors, generate a new abstract
headed vector for each such expression. This is the default.
- -do-not-index-allocated-headed-vector-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate headed vectors, generate a single abstract
headed vector that is shared among all such expressions.
- -index-constant-nonheaded-vector-types-by-element-type
- During flow analysis, when processing constant expressions
that contain nonheaded vectors, generate a new abstract nonheaded vector
for each potential element type for that nonheaded vector.
- -do-not-index-constant-nonheaded-vector-types-by-element-type
- During flow analysis, when processing constant expressions
that contain nonheaded vectors, generate a single abstract nonheaded
vector that is shared among all potential element types for that nonheaded
vector. This is the default.
- -index-constant-nonheaded-vector-types-by-expression
- During flow analysis, when processing constant expressions
that contain nonheaded vectors, generate a new abstract nonheaded vector
for each such expression. This is the default.
- -do-not-index-constant-nonheaded-vector-types-by-expression
- During flow analysis, when processing constant expressions
that contain nonheaded vectors, generate a single abstract nonheaded
vector that is shared among all such expressions.
- -index-allocated-nonheaded-vector-types-by-element-type
- During flow analysis, when processing procedure-call
expressions that can allocate nonheaded vectors, generate a new abstract
nonheaded vector for each potential element type for that nonheaded
vector.
- -do-not-index-allocated-nonheaded-vector-types-by-element-type
- During flow analysis, when processing procedure-call
expressions that can allocate nonheaded vectors, generate a single
abstract nonheaded vector that is shared among all potential element types
for that nonheaded vector. This is the default.
- -index-allocated-nonheaded-vector-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate nonheaded vectors, generate a new abstract
nonheaded vector for each such expression. This is the default.
- -do-not-index-allocated-nonheaded-vector-types-by-expression
- During flow analysis, when processing procedure-call
expressions that can allocate nonheaded vectors, generate a single
abstract nonheaded vector that is shared among all such expressions.
Note that, currently, constant expressions cannot contain nonheaded vectors and
nonheaded vectors are never allocated by any procedure-call expression. ARGV
is the only nonheaded vector. These options are included only for completeness
and in case future extensions to the language allow nonheaded vector constants
and procedures that allocate nonheaded vectors.
- -no-clone-size-limit
- Allow unlimited polyvariance, i.e. make copies of
procedures of any size.
- -clone-size-limit
- Specify the polyvariance limit, i.e. make copies of
procedures that have fewer than this many expressions. Must be a
nonnegative integer. Defaults to 80. Specify 0 to disable
polyvariance.
- -split-even-if-no-widening
- Normally, polyvariance will make a copy of a procedure only
if it is called with arguments of different types. Specify this option to
make copies of procedures even when they are called with arguments of the
same type. This will allow them to be in-lined.
- -fully-convert-to-CPS
- Normally, lightweight CPS conversion is applied, converting
only those expressions and procedures needed to support escaping
continuations. When this option is specified, the program is fully
converted to CPS.
- -no-escaping-continuations
- Normally, full continuations are supported. When this
option is specified, the only continuations that are supported are those
that cannot be called after the procedure that created the continuation
has returned.
- -du
- Normally, after flow analysis, Stalin forces each type set
to have at most one structure-type member of a given name, at most one
headed-vector-type member, and at most one nonheaded-vector-type member.
This option disables this, allowing type sets to have multiple
structure-type members of a given name, multiple headed-vector-type
members, and multiple nonheaded-vector-type members. Sometimes yields more
efficient code and sometimes yields less efficient code.
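The restriction imposed by -no-escaping-continuations corresponds to what C's setjmp(3) and longjmp(3) can express: a continuation that is invoked only while the procedure that created it is still active. The following C sketch shows such an early-exit use. It is a hand-written illustration under that assumption, not code generated by Stalin, and the names escape, scan, and first_negative are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical "continuation": a jump target plus a slot for the result.
   The slot is volatile because it is written between setjmp and longjmp. */
typedef struct {
    jmp_buf k;
    volatile long index;
} escape;

/* Walks the array and invokes the continuation at the first negative element. */
static void scan(const int *v, size_t n, escape *e) {
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            e->index = (long)i;
            longjmp(e->k, 1);   /* valid only while first_negative is active */
        }
    }
}

/* Returns the index of the first negative element, or -1 if there is none. */
long first_negative(const int *v, size_t n) {
    escape e;
    assert(v != NULL || n == 0);
    e.index = -1;
    if (setjmp(e.k) != 0)
        return e.index;         /* reached through longjmp from scan */
    scan(v, n, &e);
    return -1;                  /* scan returned normally: nothing found */
}
```

Calling longjmp after first_negative has returned would be undefined behavior in C, which mirrors the restriction that such a continuation cannot be called after the procedure that created it has returned.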
The following options control the amount of run-time error-checking code
generated. Note that, independent of the settings of these options, Stalin
will always generate code that obeys the semantics of the Scheme language for
correct programs. These options control only the level of safety, that is, the
degree of run-time error checking for incorrect programs.
- -Ob
- Specifies that code to check for out-of-bound vector or
string subscripts is to be suppressed. If not specified, a run-time error
will be issued if a vector or string subscript is out of bounds. If
specified, the behavior of programs that have an out-of-bound vector or
string subscript is undefined.
- -Om
- Specifies that code to check for out-of-memory errors is to
be suppressed. If not specified, a run-time error will be issued if
sufficient memory cannot be allocated. If specified, the behavior of
programs that run out of memory is undefined.
- -On
- Specifies that code to check for exact integer overflow is
to be suppressed. If not specified, a run-time error will be issued on
exact integer overflow. If specified, the behavior of programs that cause
exact integer overflow is undefined. Currently, Stalin cannot generate
overflow-checking code, so this option must always be specified.
- -Or
- Specifies that code to check for various run-time
file-system errors is to be suppressed. If not specified, a run-time error
will be issued when an unsuccessful attempt is made to open or close a
file. If specified, the behavior of programs that make such unsuccessful
file-access attempts is undefined.
- -Ot
- Specifies that code to check that primitive procedures are
passed arguments of the correct type is to be suppressed. If not specified, a
run-time error will be issued if a primitive procedure is called with
arguments of the wrong type. If specified, the behavior of programs that
call a primitive procedure with data of the wrong type is undefined.
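The trade-off that these options control can be pictured with two hand-written C accessors, taking -Ob as the example. This is a minimal sketch, not the code Stalin generates; the vector type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical vector representation: a length plus the element storage. */
typedef struct {
    size_t length;
    const long *elements;
} vector;

/* Checked access: an out-of-bounds subscript produces a run-time error. */
long vector_ref_checked(const vector *v, size_t i) {
    assert(v != NULL);
    if (i >= v->length) {
        fprintf(stderr, "vector-ref: subscript %lu out of bounds\n",
                (unsigned long)i);
        exit(EXIT_FAILURE);
    }
    return v->elements[i];
}

/* Unchecked access: no comparison and no branch, but an out-of-bounds
   subscript now has undefined behavior. */
long vector_ref_unchecked(const vector *v, size_t i) {
    assert(v != NULL);
    return v->elements[i];
}
```

For correct programs the two accessors return the same values; they differ only in what happens when the subscript is wrong.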
The following options control the verbosity of the compiler:
- -d0
- Produces a compile-time backtrace upon a compiler
error.
- -d1
- Produces commentary during compilation describing what the
compiler is doing.
- -d2
- Produces a decorated listing of the source program after
flow analysis.
- -d3
- Produces a decorated listing of the source program after
equivalent types have been merged.
- -d4
- Produces a call graph of the source program.
- -d5
- Produces a description of all nontrivial native procedures
generated.
- -d6
- Produces a list of all expressions and closures that
allocate storage along with a description of where that storage is
allocated.
- -d7
- Produces a trace of the lightweight closure-conversion
process.
- -closure-conversion-statistics
- Produces a summary of the closure-conversion statistics.
These are automatically processed by the program bcl-to-latex.sc
which is run by the bcl-benchmark script (both in the
/usr/local/stalin/benchmarks directory) to produce tables II, III,
and IV of the paper Flow-Directed Lightweight Closure
Conversion.
The following options control the storage management strategy used by compiled
code:
- -dc
- Disables the use of alloca(3). Normally, the
compiler will use alloca(3) to allocate on the call stack when
possible.
- -dC
- Disables the use of the Boehm conservative garbage
collector. Normally, the compiler will use the Boehm collector to allocate
data whose lifetime is not known to be short. Note that the compiler will
still use the Boehm collector for some data if it cannot allocate that
data on the stack or on a region.
- -dH
- Disables the use of regions for allocating data.
- -dg
- Generate code to produce diagnostic messages when region
segments are allocated and freed.
- -dh
- Disables the use of expandable regions and uses fixed-size
regions instead.
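As a rough picture of region allocation, the following C sketch implements a fixed-size region: objects are carved out of one block by bumping an offset, and the whole region is reclaimed in a single step without garbage collection. It is a simplified illustration and not Stalin's run-time system; the region type, the three functions, and the REGION_ALIGN rounding are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed allocation granularity, chosen to keep common objects aligned. */
#define REGION_ALIGN sizeof(long double)

/* Hypothetical fixed-size region: one block plus a bump offset. */
typedef struct {
    unsigned char *base;
    size_t used;
    size_t capacity;
} region;

/* Obtains the backing block. Returns 1 on success and 0 on failure. */
int region_init(region *r, size_t capacity) {
    assert(r != NULL);
    r->base = malloc(capacity);
    r->used = 0;
    r->capacity = (r->base != NULL) ? capacity : 0;
    return r->base != NULL;
}

/* Bump allocation. Returns NULL when the region cannot hold the request. */
void *region_alloc(region *r, size_t size) {
    size_t rounded;
    void *p;
    assert(r != NULL);
    if (size > r->capacity)
        return NULL;
    rounded = (size + REGION_ALIGN - 1) / REGION_ALIGN * REGION_ALIGN;
    if (rounded > r->capacity - r->used)
        return NULL;
    p = r->base + r->used;
    r->used += rounded;
    return p;
}

/* Reclaims every object in the region at once. */
void region_release(region *r) {
    assert(r != NULL);
    free(r->base);
    r->base = NULL;
    r->used = 0;
    r->capacity = 0;
}
```

An expandable region would obtain a further segment instead of returning NULL when the current block is full.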
The following options control code generation:
- -d
- Specifies that inexact reals are represented as C doubles.
Normally, inexact reals are represented as C floats.
- -architecture
- Specify the architecture for which to generate code. The
default is to generate code for whatever architecture the compiler is run
on. Currently, the known architectures are IA32, IA32-align-double, SPARC,
SPARCv9, SPARC64, MIPS, Alpha, ARM, M68K, PowerPC, and S390.
- -baseline
- Do not perform lightweight closure conversion. Closures are
created for all procedures. The user would not normally specify this
option. It is only intended to measure the effectiveness of lightweight
closure conversion. It is used by the bcl-benchmark script (in the
/usr/local/stalin/benchmarks directory) to produce tables II, III,
and IV of the paper Flow-Directed Lightweight Closure
Conversion.
- -conventional
- Perform a simplified version of lightweight closure
conversion that does not rely on interprocedural analysis. Attempts to
mimic what `conventional' compilers do (whatever that is). The user would
not normally specify this option. It is only intended to measure the
effectiveness of lightweight closure conversion. It is used by the
bcl-benchmark script (in the /usr/local/stalin/benchmarks
directory) to produce tables II, III, and IV of the paper Flow-Directed
Lightweight Closure Conversion.
- -lightweight
- Perform lightweight closure conversion. This is the
default.
- -immediate-flat
- Generate code using immediate flat closures. This is not
(yet) implemented.
- -indirect-flat
- Generate code using indirect flat closures. This is not
(yet) implemented.
- -immediate-display
- Generate code using immediate display closures.
- -indirect-display
- Generate code using indirect display closures. This is not
(yet) implemented.
- -linked
- Generate code using linked closures. This is the
default.
- -align-strings
- Align all strings to fixnum alignment. This will not work
when foreign procedures return strings that are not aligned to fixnum
alignment. It will also not work when ARGV is used, since those strings
are not aligned to fixnum alignment either. This is the
default.
- -do-not-align-strings
- Do not align strings to fixnum alignment. This must be
specified when strings returned by foreign procedures are not aligned to
fixnum alignment.
- -de
- Enables the compiler optimization known as EQ? forgery.
Sometimes yields more efficient code and sometimes yields less efficient
code.
- -df
- Disables the compiler optimization known as forgery.
- -dG
- Pass arguments using global variables instead of parameters
whenever possible.
- -di
- Generate if statements instead of switch statements for
dispatching.
- -dI
- Enables the use of immediate structures.
- -dp
- Enables representation promotion. Promotes some type sets
from squeezed to squished or squished to general if this will decrease the
amount of run-time branching or dispatching representation coercions.
Sometimes yields more efficient code and sometimes yields less efficient
code.
- -dP
- Enables copy propagation. Sometimes yields more efficient
code and sometimes yields less efficient code.
- -ds
- Disables the compiler optimization known as squeezing.
- -dS
- Disables the compiler optimization known as squishing.
- -Tmk
- Enables generation of code that works with the Treadmarks
distributed-shared-memory package. Currently this option is not fully
implemented and is not known to work.
- -no-tail-call-optimization
- By default, Stalin generates code that is properly tail recursive
in all but the rarest of circumstances, and appropriate options can coerce
it into generating properly tail-recursive code in all circumstances.
Some tail-recursive calls, those where the call site
is in-lined in the target, are translated as C goto statements and always
result in properly tail-recursive code. The rest are translated as C
function calls in tail position. This relies on the C compiler to perform
tail-call optimization. gcc(1) versions 2.96 and 3.0.2 (and perhaps
other versions) perform tail-call optimization on IA32 (and perhaps other
architectures) when -foptimize-sibling-calls is specified.
(-O2 implies -foptimize-sibling-calls.) gcc(1) only
performs tail-call optimization on IA32 in certain circumstances. First,
the target and the call site must have compatible signatures. To guarantee
compatible signatures, Stalin passes parameters to C functions that are
part of tail-recursive loops in global variables. Second, the target must
not be declared __attribute__ ((noreturn)). Thus Stalin will not
generate a __attribute__ ((noreturn)) declaration for a function
that is part of a tail-recursive loop even if Stalin knows that it never
returns. Third, the function containing the call site cannot call
alloca(3). gcc(1) does no flow analysis. Any call to
alloca(3) in the function containing the call site, no matter
whether the allocated data escapes, will disable tail-call optimization.
Thus Stalin disables stack allocation of data in any procedure in-lined in
a procedure that is part of a tail-recursive loop. Finally, the call site
cannot contain a reentrant region because reentrant regions are freed upon
procedure exit and a tail call would require an intervening region
reclamation. Thus Stalin disables allocation of data on a reentrant region
in any procedure that is part of a tail-recursive loop. Disabling these
optimizations incurs a cost for the benefit of achieving tail-call
optimization. If your C compiler does not perform tail-call optimization
then you may wish not to pay the cost. The
-no-tail-call-optimization option causes Stalin not to take the four
measures above, which exist so that gcc(1) can perform tail-call
optimization on the generated code. Even when this option is specified, Stalin still
translates calls, where the call site is in-lined in the target, as C goto
statements. There are three rare occasions that can still foil proper tail
recursion. First, if you specify -dC you may force Stalin to use
stack or region allocation even in a tail-call cycle. You can avoid this
by not specifying -dC. Second, gcc(1) will not perform
tail-call optimization when the function containing the call site applies
unary & to a local variable. gcc(1) does no flow analysis. Any
application of unary & to a local variable in the function containing
the call site, no matter whether the pointer escapes, will disable
tail-call optimization. Stalin can generate such uses of unary & when
you specify -de or don't specify -df. You can avoid such
cases by specifying -df and not specifying -de. Finally,
gcc(1) will not perform tail-call optimization when the function
containing the call site calls setjmp(3). gcc(1) does no
flow analysis. Any call to setjmp(3) in the function containing the
call site, no matter whether the jmp_buf escapes, will disable
tail-call optimization. Stalin translates certain calls to
call-with-current-continuation as calls to setjmp(3). You
can force Stalin not to do so by specifying -fully-convert-to-CPS.
Stalin will generate a warning in the first and third cases, namely, when
tail-call optimization is foiled by reentrant-region allocation or calls
to alloca(3) or setjmp(3). So you can hold off specifying
-fully-convert-to-CPS or refraining from specifying -dC
until you see such warnings. No such warning is generated, however, when
uses of unary & foil tail-call optimization. So you might want to
always specify -df and refrain from specifying -de if you
desire your programs to be properly tail recursive.
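The two translation strategies described for tail calls can be pictured in hand-written C. In this sketch, a self tail call becomes a goto, and two mutually tail-recursive functions receive their argument in a global variable so that both have the same signature. It illustrates the idea only and is not Stalin's generated code; all names are hypothetical, and whether the calls in tail position actually become jumps depends on the C compiler and its options.

```c
#include <assert.h>
#include <limits.h>

/* A self tail call whose call site is in-lined in its target can be
   expressed as a goto: rebind the parameters and jump to the top. */
unsigned long sum_to(unsigned long n, unsigned long acc) {
loop:
    if (n == 0)
        return acc;
    assert(acc <= ULONG_MAX - n);   /* guard against unsigned wraparound */
    acc += n;
    n -= 1;
    goto loop;                      /* replaces sum_to(n - 1, acc + n) */
}

/* The argument travels in a global variable, so both functions of the
   tail-recursive loop have the identical signature int (void). */
static unsigned long arg_n;

static int is_odd_loop(void);

static int is_even_loop(void) {
    if (arg_n == 0)
        return 1;
    arg_n -= 1;
    return is_odd_loop();           /* call in tail position */
}

static int is_odd_loop(void) {
    if (arg_n == 0)
        return 0;
    arg_n -= 1;
    return is_even_loop();          /* call in tail position */
}

/* Entry point: stores the argument in the global and enters the loop. */
int is_even(unsigned long n) {
    arg_n = n;
    return is_even_loop();
}
```

Neither function in the loop calls alloca(3) or setjmp(3), and neither applies unary & to a local variable, so none of the obstacles listed above applies to this example.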
The following options control the C-compilation phase:
- -db
- Disables the production of a database file.
- -c
- Specifies that the C compiler is not to be called after
generating the C code. Normally, the C compiler is called after generating
the C code to produce an executable image. This implies -k.
- -k
- Specifies that the generated C file is not to be deleted.
Normally, the generated C file is deleted after it is compiled.
- -cc
- Specifies the C compiler to use. Defaults to
gcc(1).
- -copt
- Specifies the options that the C compiler is to be called
with. Normally the C compiler is called without any options. This option
can be repeated to allow passing multiple options to the C compiler.
FILES
/usr/local/stalin/include/ default directory for Scheme include files and
library archive files
/usr/local/stalin/include/Scheme-to-C-compatibility.sc include file for
Scheme->C compatibility
/usr/local/stalin/include/QobiScheme.sc include file for QobiScheme
/usr/local/stalin/include/xlib.sc include file for Xlib FPI
/usr/local/stalin/include/xlib-original.sc include file for Xlib FPI
/usr/local/stalin/include/libstalin.a library archive for Xlib FPI
/usr/local/stalin/include/gc.h include file for the Boehm conservative
garbage collector
/usr/local/stalin/include/libgc.a library archive for the Boehm
conservative garbage collector
/usr/local/stalin/include/stalin.architectures the known architectures
and their code-generation parameters
/usr/local/stalin/include/stalin-architecture-name shell script that
determines the architecture on which Stalin is running
/usr/local/stalin/stalin-architecture.c program to construct a new entry
for stalin.architectures with the code-generation parameters for the
machine on which it is run
/usr/local/stalin/benchmarks directory containing benchmarks from the
paper Flow-Directed Lightweight Closure Conversion
/usr/local/stalin/benchmarks/bcl-benchmark script for producing tables
II, III, and IV from the paper Flow-Directed Lightweight Closure
Conversion
/usr/local/stalin/benchmarks/bcl-to-latex.sc Scheme program for producing
tables II, III, and IV from the paper Flow-Directed Lightweight Closure
Conversion
SEE ALSO
sci(2),
scc(2),
gcc(1),
ld(1),
alloca(3),
setjmp(3),
gc(8)
BUGS
Version 0.11 is an alpha release and contains many known bugs. Not everything is
fully implemented. Bug mail should be addressed to Bug-Stalin@AI.MIT.EDU and
not to the author. Please include the version number (0.11) in the message.
Periodic announcements of bug fixes, enhancements, and new releases will be
made to Info-Stalin@AI.MIT.EDU. Send mail to Info-Stalin-Request@AI.MIT.EDU
to be added to the Info-Stalin@AI.MIT.EDU mailing list.
AUTHOR
Jeffrey Mark Siskind
THANKS
Rob Browning packaged version 0.11 for Debian Linux.