Compaq Tru64 UNIX Switch Disclosure
SPEC CPU2000
Compaq Computer Corporation
Revised 21 May 2002

This SPEC CPU2000 switch disclosure is for Tru64 UNIX (formerly known as
Digital UNIX).

This document was originally written in November 1999, and will be
updated to add new switches used in later submissions.  An attempt is
made to be cumulative, so some switches listed from earlier submissions
might not be used in later submissions.

Switches are given in alphabetical order rather than by product or
benchmark.  It is hoped that this order will be convenient for the
reader of the NOTES section of a SPEC CPU2000 disclosure who wants to
look up a specific command or switch.  The collating sequence ignores
upper/lower case, hyphens, and the presence of "no" for negation.  That
is, if you are looking for "-nomumble", try looking under "-mumble".

Note: some switches in this disclosure statement are not used directly,
but are generated by other switches (e.g. "-fast").

-align dcommons (Compaq Fortran)
    Aligns COMMON block entities on natural boundaries up to 8 bytes.

-align sequence (Compaq Fortran)
    Specifies that components of a SEQUENCEd derived type are to be
    aligned according to the alignment rules set by the user (which by
    default cause components to be aligned on natural boundaries).

-all -ldensemalloc -none (linker)
    The "dense malloc" library provides a memory allocation strategy
    that packs memory more tightly, at a slight cost in execution
    speed of malloc and free.  The "-all" and "-none" options surrounding
    the reference to libdensemalloc cause all symbols in the library to
    be included in the images.
   
-ansi_alias (Compaq C)
    Directs the compiler to assume the ANSI C aliasing rules.
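
    As a rough illustration (not taken from any benchmark source), the
    ANSI C aliasing rules let a compiler assume that an int object and
    a float object do not overlap.  The function name below is
    hypothetical.

```c
/* Hypothetical example: ip and fp point to incompatible types, so the
 * ANSI C aliasing rules allow the compiler to assume they refer to
 * different objects. */
int store_int_then_float(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* assumed not to modify *ip */
    return *ip;   /* the compiler may fold this to the constant 1 */
}
```

    Code that deliberately accesses the same memory through pointers of
    incompatible types does not satisfy this assumption.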

-ansi_args (Compaq C)
    Tells the compiler that the source code follows all ANSI rules about
    arguments; that is, whether the type of an argument matches the type
    of the parameter in the called function, or whether a function
    prototype is present so the compiler can automatically perform the
    expected type conversion.
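
    A minimal sketch of the second case, where a visible prototype
    lets the compiler convert the argument automatically.  The function
    names are hypothetical.

```c
/* The prototype is visible at the call site below. */
double half(double x);

double half_of_three(void)
{
    /* The int constant 3 is converted to 3.0 because of the prototype. */
    return half(3);
}

double half(double x)
{
    return x / 2.0;
}
```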

-arch <cpu>  (Compaq C, Compaq Fortran)
    Generate code that may include instructions which are newly 
    introduced with <cpu>.  For example, "ev56" adds byte/word load and
    store, and "ev6" adds sqrt.  See also -tune, below.

-arl=n (KAP C)
    Informs KAP what level of data aliasing may be present in the
    program:
     0 kapc makes no assumptions about data aliasing.
     1 A pointer will not contain its own address.
     2 No objects represented by function parameters overlap in memory.
     3 Globals, function parameters, and locals form distinct groups.
     4 No aliases for objects.

-assume bigarrays (Compaq Fortran)
    Suppresses run-time checking for distributed small array dimensions,
    for increased performance if using the -wsf option.

-assume noaccuracy_sensitive (Compaq Fortran)
    Same as -fp_reorder

-assume nomath_errno (Compaq C)
    Allows the compiler to reorder or combine computations to improve
    the performance of those math functions that it recognizes as
    intrinsic functions.

-assume restricted_pointers (Compaq C)
    During the lifetime of any given pointer "p", the memory locations 
    accessed through it are not accessed by any other memory references.
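
    The C99 "restrict" qualifier expresses a similar promise for
    individual pointers.  The sketch below uses it only to illustrate
    the kind of assumption involved; the function name is hypothetical
    and this is not a description of how the switch is implemented.

```c
#include <stddef.h>

/* dst, a, and b are each promised not to overlap the others, so the
 * compiler may load a[i] and b[i] without worrying that an earlier
 * store to dst changed them. */
void add_vectors(double *restrict dst, const double *restrict a,
                 const double *restrict b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```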

-assume trusted_short_alignment (Compaq C)
    Specifies that this is a strictly-conforming ANSI C program with
    respect to the dereferencing of pointer-to-short variables.  This
    allows the compiler to assume that any short accessed through a
    pointer is naturally aligned (as the C language requires).
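
    As an illustration, the two hypothetical helpers below contrast a
    load through a pointer-to-short, which is only valid when the
    address is naturally aligned, with a byte-wise copy that works at
    any address.

```c
#include <string.h>

/* Valid only if p is naturally aligned for short; this is the case the
 * switch allows the compiler to assume. */
short load_short_aligned(const short *p)
{
    return *p;
}

/* Works for any address: memcpy copies the bytes without dereferencing
 * a possibly misaligned pointer-to-short. */
short load_short_any_alignment(const unsigned char *p)
{
    short value;

    memcpy(&value, p, sizeof value);
    return value;
}
```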

-assume nozsize (Compaq Fortran)
    Omits run-time checking for zero-sized array sections for increased
    performance if using the -wsf option.

cc (compiler)
    If the Developer's Toolkit is NOT installed, this command invokes
    the system C compiler.  If the toolkit has been installed, then it
    invokes the compiler in /usr/lib/cmplrs/cc.dtk

-ckapargs='...' (KAP C)
    Pass the switches between apostrophes to the KAP optimizer.

-Dalloca=__builtin_alloca (Compaq C)
    Portability switch, used for gcc.  Specifies that the builtin
    version of alloca is to be used.

+CFB (notes only)
    As explained in the notes section, "+CFB" is merely an abbreviation
    to help readability of the notes.  Look elsewhere in the notes 
    section for a description of what this abbreviation means.

cxx (compiler)
    Invokes the C++ compiler.

-D_INTRINSICS (Compaq C)
    Declares certain functions to be intrinsic.  When a function is
    intrinsic, the compiler is free to generate faster code that
    provides the same function behavior (but may not actually call the
    function).

-D_INLINE_INTRINSICS (Compaq C)
    Directs the compiler to inline some of the intrinsic functions,
    avoiding the overhead of a function call.

-D_FASTMATH (Compaq C)
    Redefines the names of certain common math routines so that faster
    but slightly less accurate functions are used. 

-DALPHA (crafty)
    Portability switch for crafty.  Specifies that longs are 64 bits,
    that we do not need to say "long long" to get 64 bit quantities, and
    that the architecture is little endian.
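
    The properties this switch asserts can be checked at run time with
    a small sketch such as the following.  The helper names are
    hypothetical and are not part of crafty.

```c
#include <limits.h>

/* Returns 1 if long is 64 bits wide on this platform, 0 otherwise. */
int long_is_64_bits(void)
{
    return sizeof(long) * CHAR_BIT == 64;
}

/* Returns 1 if the least significant byte is stored first. */
int is_little_endian(void)
{
    unsigned int probe = 1;

    return *(const unsigned char *)&probe == 1;
}
```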

-DSPEC_CPU2000_DUNIX (perlbmk)
    Portability switch for perlbmk; see the source code in module
    benchspec/CINT2000/253.perlbmk/src/spec_config.h for the exact
    effect.  Sets items such as the number of bytes in a long, little
    endian byte order, and how to invoke the C preprocessor, and
    indicates that "fcntl" is available.

-DSPEC_CPU2000_LP64 (gap, vortex)
    Specifies that longs and pointers are 64 bits.

-DSYS_HAS_CALLOC_PROTO (gap)
    Specifies that the system already defines the function calloc.

-DSYS_HAS_IOCTL_PROTO (gap)
    Specifies that the system already defines ioctl.

-DSYS_IS_BSD (gap)
    Specifies that the system is compatible with BSD Unix, using
    conventions such as "/" for directory separation, Unix signals,
    string concatenation, etc.

-fb name (spike)
    Causes spike to look for feedback information stored in files
    name.Counts.*

f77 (compiler)
    EITHER invokes the f90 compiler with some flags set that increase
    compatibility with the older f77 standard, OR invokes the older
    compiler, if the link in /bin/f77 has been set as specified in the
    release notes.  Initial CPU2000 submissions set the link for the
    older compiler.

f90 (compiler)
    Invokes the f90 compiler

-fast (Compaq C)
    Provides a single method for turning on a collection of 
    optimizations for increased performance, namely:
       -ansi_alias
       -ansi_args
       -assume nomath_errno
       -assume trusted_short_alignment
       -D_INTRINSICS
       -D_INLINE_INTRINSICS
       -D_FASTMATH
       -float
       -fp_reorder
       -ifo
       -intrinsics
       -O3 
       -readonly_strings

-fast (Compaq Fortran)
    Provides a single method for turning on a collection of 
    optimizations for increased performance, namely:
      -align dcommons
      -arch host
      -assume noaccuracy_sensitive
      -math_library fast
      -O4 (the default)
      -tune host
    For f90 and f95, -fast also sets
      -align sequence
      -assume bigarrays
      -assume nozsize.  

fdo_pre0 = mkdir /tmp/pb; rm -f /tmp/pb/${baseexe}*  (CPU2000 config file) 
    Causes the SPEC tools to clean the directories where feedback is
    accumulated, to remove data from any previous compiles.

-feedback file (spike)
    Causes spike to use the feedback database in the named file.

-fixed (Compaq Fortran)
    Portability switch, used by galgel.  Indicates that source code is
    in fixed (72 column) format.

-fkapargs='...' (KAP Fortran)
    Pass the switches between apostrophes to the KAP optimizer.

-float (Compaq C)
    Tells the compiler that it is not necessary to promote expressions
    of type float to type double.
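
    The difference between evaluating in float and promoting to double
    first can be seen in a small sketch.  The function names are
    hypothetical; the volatile store is there only to force the result
    to be rounded to float.

```c
/* Sum evaluated and stored in single precision. */
float add_as_float(float a, float b)
{
    volatile float sum = a + b;   /* rounded to float when stored */

    return sum;
}

/* Same operands promoted to double before the addition. */
double add_as_double(float a, float b)
{
    return (double)a + (double)b;
}
```

    With a = 16777216.0f and b = 1.0f, the float sum cannot represent
    the extra 1 and stays at 16777216, while the double sum is
    16777217.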

-fp_reorder (Compaq C, Compaq Fortran)
    Allows floating-point operations to be reordered during
    optimization based on algebraic identities.
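
    Reordering can change results because floating-point addition is
    not associative.  The sketch below evaluates the same three-term
    sum in two orders; the function names are hypothetical, and the
    volatile temporaries only pin down the order of evaluation.

```c
/* Computes (a + b) + c. */
double sum_left_to_right(double a, double b, double c)
{
    volatile double t = a + b;

    return t + c;
}

/* Computes a + (b + c). */
double sum_right_to_left(double a, double b, double c)
{
    volatile double t = b + c;

    return a + t;
}
```

    With a = 1e17, b = -1e17, and c = 1.0, the first order yields 1.0,
    while in the second order the 1.0 is lost when it is added to
    -1e17, so the result is 0.0.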

+GEMFB (Compaq C)
    Use GEM (i.e. compiler) feedback.  This is an abbreviation used to
    make the notes section simpler, and not an actual switch.  Look
    elsewhere in the notes section for the details.

-g3 (Compaq C, Compaq C++, Compaq Fortran)
    Allow symbols in optimized code

+IFB (notes only)
    As explained in the notes section, "+IFB" is merely an abbreviation
    to help readability of the notes.  Look elsewhere in the notes 
    section for a description of what this abbreviation means.

-ifo (Compaq C)
    Performs inter-file optimizations.

-inline speed (Compaq C, Compaq Fortran)
    Provides inline expansion of function calls even when doing so may
    significantly increase the size of the program.

-[no]intrinsics
    The -intrinsics option causes the compiler to recognize intrinsic
    functions wherever it can automatically, based only on name and
    call signature.

kcc (compiler)
    This command invokes the KAP C high-level optimizer and then
    invokes the Compaq C compiler.  Note: switches are passed to
    the KAP C optimizer within the -ckapargs="<kapc switches>" phrase;
    other switches are directed to the Compaq C compiler.

kf77 (compiler)
    This command invokes the KAP Fortran 77 high-level optimizer and
    then invokes the Fortran 77 compiler.  When the f77 compiler is
    invoked, KAP adds the following switches:
     -fast
     -non_shared (single CPU only)
     -tune host                            

    Note: Optimization switches are passed to the KAP Fortran 77 optimizer
    within the -fkapargs="<kapf switches>" phrase; other switches are
    directed to the Compaq Fortran 77 compiler.

kf90 (compiler)
    This command invokes the KAP Fortran 90 high-level optimizer and
    then invokes the Fortran 90 compiler.  When the f90 compiler is
    invoked, KAP adds the following switches:
     -fast
     -non_shared (single CPU only)
     -tune host                            

    Note: Optimization switches are passed to the KAP Fortran 90 optimizer
    within the -fkapargs="<kapf switches>" phrase; other switches are
    directed to the Compaq Fortran 90 compiler.

-ldensemalloc 
    Please see above, under "-all -ldensemalloc -none"

-ldxml (library)
    Specifies that the program should be linked with the Compaq extended
    math library, cxml, which includes optimized BLAS functions.

-math_library fast (Compaq Fortran)
    Select math library routines that provide faster performance.  For
    certain ranges of input values, the faster routines may not provide
    a result that is as accurate as provided by the default.

max_per_proc_address_space (Compaq Tru64 Unix)
    Current maximum amount, in bytes, of user process address space.

max_per_proc_data_size (Compaq Tru64 Unix)
    Maximum size, in bytes, of a data segment for each process.

max_proc_per_user (Compaq Tru64 Unix)
    This parameter is set in /etc/sysconfigtab to control how many 
    processes are allowed per user.

max_thread_per_user (Compaq Tru64 Unix)
    This parameter is set in /etc/sysconfigtab to control how many 
    threads are allowed per user.

-non_shared (ld)
    Directs the linker to produce a static executable.  The output 
    object created by the linker will not use any shared objects during
    execution.

-none
    Please see above, under "-all -ldensemalloc -none"

-noporder
    Disables procedure ordering.

-O0 through -O5 (Compaq Fortran)
    Fortran's general optimization level.  
    O0 disable all optimizations
    O1 local optimizations and common subexpressions
    O2 global optimizations such as code motion, strength reduction,
       lifetime analysis, and code scheduling 
    O3 additional global optimizations that may cost more space, such 
       as loop unrolling and code replication 
    O4 inline expansion 
    O5 software pipelining and loop transformation 
       
-O0 through -O4 (Compaq C)
    Compaq C's general optimization level.
    O0 disable all optimizations
    O1 local optimizations and common subexpressions; global
       optimizations such as code motion, strength reduction,
       lifetime analysis, and code scheduling
    O2 additional global optimizations that may cost more space,
       such as loop unrolling and code replication
    O3 inline expansion
    O4 software pipelining

    NOTE: when kcc is used, optimization levels are effectively one 
    less than stated in the command line, for historical reasons.   That
    is, "kcc -O4" eventually invokes the compiler backend with the same
    optimization level as would be used by "cc -O3".

-O0 through -O4 (Compaq C++)
    C++ General optimization level:
    O0 no optimization
    O1 Optimize for space
    O2 Optimize for time
    O3 Same as O2
    O4 Additional speed optimizations at the expense of space

-o file (spike)
    Names the desired output file.

-o=n (KAP Fortran)
    KAP's general optimization level.  
    0 No optimization
    1 Induction variables recognized, loop interchanging
    2 Lifetime analysis
    3 Additional loop interchanging, wraparound variables
    4 Loop interchanging around reductions
    5 Array expansion

-o=n (KAP C)
    KAP C's general optimization level.  
    0 No loop optimization performed
      Only simple analysis performed
    1 Simple loop optimization performed
      Loops distributed to optimize only part of loop
    2 Loops in a loop nest optimized
      Lifetime analysis performed
      More powerful data dependence tests performed
    3 Special techniques used to break data dependence cycles
      Triangular loops recognized
      Loop interchanging performed to improve memory referencing
      Special case data dependence tests used
    4 Two versions of a loop generated to break data dependence
      arc when necessary
      Exact data dependence tests used
      Wraparound variables recognized
    5 Array expansion and loop fusion enabled

ONESTEP (SPEC CPU2000 config file)
    Setting ONESTEP=YES tells the SPEC tools to build from all sources
    in one step.  For more information, search for "ONESTEP" in the run
    rules.

per_proc_data_size (Compaq Tru64 Unix)
    Current maximum size, in bytes, of a data segment for each process.

+PFB (notes only)
    As explained in the notes section, "+PFB" is merely an abbreviation
    to help readability of the notes.  Look elsewhere in the notes 
    section for a description of what this abbreviation means.

-pipeline (Compaq C, Compaq Fortran)
    Enables software pipelining, that is, "wrap around" of loop
    iterations to reduce latency.
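
    The idea can be sketched by hand in C: the load for the next
    iteration is issued while the current iteration is still being
    computed.  This is only an illustration of the technique, with a
    hypothetical function name; the compiler performs the
    transformation on its own internal representation.

```c
#include <stddef.h>

/* Multiplies each element of in[] by k, overlapping the load for
 * iteration i + 1 with the multiply and store for iteration i. */
void scale_pipelined(const int *in, int *out, size_t n, int k)
{
    size_t i;
    int current;

    if (n == 0)
        return;
    current = in[0];                 /* prologue: load for iteration 0 */
    for (i = 0; i + 1 < n; i++) {
        int next = in[i + 1];        /* load for iteration i + 1 */
        out[i] = current * k;        /* multiply and store for iteration i */
        current = next;
    }
    out[n - 1] = current * k;        /* epilogue: finish the last iteration */
}
```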

pixie (Program Analysis Tool)
    Invokes the profiling tool to instrument an executable image.
   
-prof_dir <directory> (Compaq C)
    Specifies a location to which the profiling data files (.Counts and
    .Addrs) are written.

-prof_gen_noopt (Compaq C)
    Generates an executable image that has profiling code added to it
    and which is not optimized (this may improve the profile accuracy).

-prof_use_feedback (Compaq C) 
    Uses profiling feedback to improve runtime performance.

-readonly_strings (Compaq C)
    Makes string literals read-only for improved performance.
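
    Code that needs to modify string data should work on a writable
    copy rather than on the literal itself.  A minimal sketch, with a
    hypothetical function name:

```c
#include <string.h>

/* Copies the literal "hello", including its terminating '\0', into the
 * caller's writable 6-byte buffer and then modifies the copy. */
void make_greeting(char dst[6])
{
    memcpy(dst, "hello", 6);
    dst[0] = 'H';   /* safe: dst is writable, the literal is untouched */
}
```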

RM_SOURCES=<file> (SPEC CPU2000 config file)
    Tells the SPEC tools not to use a certain source file, normally
    because it will be replaced by a cxml library.

spike (optimizer)
    Performs code optimization after a program has been linked, such as
    code layout for efficient cache usage, deleting unreachable code,
    and optimization of address computations.  Spike is most effective
    when it uses profile information to guide optimization.

    Example: the first usage of this tool in a SPEC CPU2000 submission
    was in November, 2000, as:
        spike -feedback ${baseexe} -o tmp ${baseexe}; mv tmp ${baseexe}
    In the above commands, the SPEC tools use ${baseexe} to refer to 
    the filename of the executable without any directory specifiers 
    or extensions that the tools will add later.  (In this instance,
    "base" does not refer to the concept of base as used in the run
    rules.)  Spike optimizes the executable, looking for feedback
    data in the executable itself (placed there by the compiler).
    The output of spike is written to a temporary file, which is then 
    immediately moved to replace the original executable.

+SPIKEFB (optimizer)
    Use SPIKE feedback.  This is an abbreviation used to make the
    notes section simpler, and not an actual switch.  Look elsewhere
    in the notes section for the details.

-split_threshold n
-splitThresh n
    Adjusts the threshold used by procedure splitting in code layout to
    decide which code is frequently and infrequently executed. The default
    is .99, which means that the most frequently executed basic blocks that
    make up at least 99 percent of the estimated execution time are
    considered frequently executed and the rest are marked as infrequently
    executed. Increasing the threshold can help when the profile is not
    representative. For example, try a value of .999.
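
    The rule described above can be sketched as follows.  This is an
    illustration only, not spike's actual implementation; the function
    name and the use of per-block execution counts as the time
    estimate are assumptions.

```c
#include <stddef.h>

/* counts[] holds estimated execution counts per basic block, sorted in
 * descending order.  Returns how many of the leading (most frequently
 * executed) blocks are needed to reach the given fraction of the
 * total; those blocks would be treated as frequently executed. */
size_t count_hot_blocks(const unsigned long *counts, size_t n,
                        double threshold)
{
    double total = 0.0;
    double running = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += (double)counts[i];
    for (i = 0; i < n; i++) {
        if (running >= threshold * total)
            break;                   /* threshold already reached */
        running += (double)counts[i];
    }
    return i;
}
```

    For counts of 900, 95, 3, and 2, a threshold of .99 marks the first
    two blocks as frequently executed, while .999 marks all four.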

-stats dstride (pixie)
    Causes pixie to instrument the program to examine memory access
    strides.

-stride_prefetch (spike)
    Causes spike to optimize prefetches, based on feedback collected as
    to useful strides.

-transform_loops (Compaq Fortran)
-notransform_loops
    Enables/disables a group of loop transformation optimizations that
    apply to array references within loops, including loop blocking,
    distribution, fusion, and interchange.
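
    Loop interchange, one of the transformations named above, is shown
    here by hand in C as an illustration.  C stores arrays row by row,
    whereas Fortran stores them column by column, so the preferred loop
    order in Fortran is the reverse of the one shown.  Function names
    and array sizes are hypothetical.

```c
#define ROWS 3
#define COLS 4

/* Before interchange: the inner loop walks down a column, so
 * consecutive accesses are COLS elements apart in memory. */
long sum_column_order(int m[ROWS][COLS])
{
    long total = 0;
    int i, j;

    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}

/* After interchange: the inner loop walks along a row, so consecutive
 * accesses are adjacent in memory. */
long sum_row_order(int m[ROWS][COLS])
{
    long total = 0;
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}
```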

-tune <cpu> (Compaq C, Compaq Fortran)
    Generate code that is optimized for a particular cpu model.  This
    switch preferentially tunes for the specified model, but assumes
    that the code may be run on any processor that implements the
    instruction set called for in -arch.   For example, the combination
    "-tune ev6 -arch ev56" specifies that the code should be scheduled
    for ev6 class machines while still preserving the ability to run
    quickly on machines that lack the sqrt instruction.  See also -arch,
    above.

-unroll n (Compaq C, Compaq Fortran)
    Specifies the depth of loop unrolling.
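
    As an illustration of what an unrolling depth of 4 means, the
    sketch below unrolls a summation loop by hand.  The function name
    is hypothetical; the compiler applies the equivalent transformation
    automatically.

```c
#include <stddef.h>

/* Sums n ints, processing four elements per pass of the main loop. */
long sum_unrolled_by_4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)       /* main loop: four elements per pass */
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)               /* remainder loop: leftover elements */
        total += a[i];
    return total;
}
```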

-ur=<n> (KAP Fortran) [alternate spelling: -unroll]
    The maximum number of iterations to unroll inner loops.
    Used within -fkapargs="".

-v (all compilers)
    Turn on verbose mode, so the compiler driver will print its steps as
    it goes.  Has no effect on the generated executable.

-xtaso_short (Compaq C) 
    Directs the compiler to allocate 32-bit pointers by default.  You
    can still use 64-bit pointers, but only by the use of pragmas. 
    Using this switch can cause conflicts between the compiler's
    assumptions about pointer sizes and the assumptions in the system
    libraries.  Diagnostic messages will be generated at compile time 
    unless the installation option "protect_headers_setup(8)" has been 
    used [run: /usr/lib/cmplrs/cc/protect_headers_setup.sh -l ]. (It is, 
    in fact, used in the CPU2000 submissions, as requested by the manpage).