Flag description file for Sun compiled SPECcpu2000 binaries using the 
Sun Studio 11 Compiler and for the Solaris 10 OS.

This file is for flags used with the Opteron-based systems.

Last updated:  1-Sep-2006

----------------------------------------------------------------------------
Sun Studio 11 compiler flags
----------------------------------------------------------------------------

Portability Flags:

-Dalloca=__builtin_alloca (Portability: SPEC Tools) Portability switch, used
           for 176.gcc: allow use of compiler's internal builtin alloca. 

-DFMAX_IS_DOUBLE Specifies whether FMAX is double or float.  Used in 252.eon.

-DHAS_LONGLONG (Portability: SPEC Tools) Portability switch, used for
           186.crafty: allows use of the designated prototype. 

-DHAS_STDIO_PROTO (Portability: SPEC Tools) Portability switch, used for
           254.gap: allows use of the designated prototype. 

-DHOST_WORDS_LITTLE_ENDIAN Portability switch, used for 176.gcc: Host system
           is little-endian.

-DLITTLE_ENDIAN_ARCH Portability switch, used for 186.crafty: Host
           architecture is little-endian.

-DSPEC_CPU2000_LP64  Compile using LP64 programming model. 

-DSPEC_CPU2000_SOLARIS_X86 (Portability: SPEC Tools) Portability switch, used
           for 253.perlbmk: selects header files and code paths compatible
           with Solaris.

-DSYS_HAS_ANSI  System is ANSI compliant.  Used in 254.gap.

-DSYS_HAS_CALLOC_PROTO (Portability: SPEC Tools) Do not supply a prototype for
           calloc().  Portability switch, used for 254.gap: allows use of the
           designated prototype.

-DSYS_HAS_IOCTL_PROTO (Portability: SPEC Tools) Portability switch, used for
           254.gap: allows use of the designated prototype.

-DSYS_HAS_MALLOC_PROTO (Portability: SPEC Tools) Do not supply a prototype for
           malloc().  Portability switch, used for 254.gap: allows use of the
           designated prototype.

-DSYS_HAS_READ_PROTO (Portability: SPEC Tools) Portability switch, used for
           254.gap: allows use of the designated prototype. 

-DSYS_HAS_SIGNAL_PROTO (Portability: SPEC Tools) Portability switch, used for
           254.gap: allows use of the designated prototype.

-DSYS_HAS_STRING_PROTO (Portability: SPEC Tools) Portability switch, used for
           254.gap: allows use of the designated prototype. 

-DSYS_HAS_TIME_PROTO (Portability: SPEC Tools) Portability switch, used for
           254.gap: allows use of the designated prototype.

-DSYS_IS_USG (Portability: SPEC Tools) Portability switch, used for 254.gap:
           selects code compatible with USG-based systems. 

-DUNIX     Compile for a Unix system. Use portability settings like host
           endianness, OS type, and ANSI language extensions to be compatible
           with Unix systems.

-DUSE_STRERROR Portability switch, used for 252.eon: use C library function
           strerror.

-e         Accept extended (132 character) input source lines (FORTRAN)

-fixed     Accept fixed-format input source files (FORTRAN)

=================================================================
Optimization Flags:

-Abcopy (optimizer)  
           Increase the probability that the compiler will perform
           memcpy/memset transformations. 

-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>]
        [:mi][:recursion=1] (optimizer)

           Control the optimizer's loop inliner:

            cp=<n> The minimum call site frequency counter in order to
                   consider a routine for inlining.

            cs=<n> Set inline callee size limit to n. The unit roughly
                   corresponds to the number of instructions.

            inc=<n> The inliner is allowed to increase the size of the
                    program by up to n%.

            irs=<n> Allow routines to increase by up to n. The unit
                    roughly corresponds to the number of instructions.

            rs=<n>  The inliner only considers routines smaller than n pseudo
                    instructions as possible inline candidates.

            mi      Perform maximum inlining (without considering code
                    size increase).

            recursion=1 Allow routines that are called recursively to
                        still be eligible for inlining.

-Ashort_ldst (optimizer)  
           Convert multiple short memory operations into single long memory
           operations.

-Ashort_ldst:ldld:  
           Convert multiple short memory loads into single long load
           operations.

-Atile:skewp[:b<n>] (optimizer)
           Perform loop tiling which is enabled by loop skewing. Loop skewing
           is a transformation that transforms a non-fully interchangeable
           loop nest to a fully interchangeable loop nest. The optional b<n>
           sets the tiling block size to n.

-autopar   Enable automatic loop parallelization

           Find and parallelize appropriate loops. Do dependence analysis
           (analyze loops for data dependences). Do loop restructuring.  If
           optimization is not -O3 or higher, it is raised to -O3.

           Number of Threads:  To run a parallelized program in a
           multithreaded environment, you must set the PARALLEL or
           OMP_NUM_THREADS environment variable prior to execution. These
           variables tell the runtime system the maximum number of software
           threads the program can create.  Set PARALLEL or OMP_NUM_THREADS
           to the desired level of parallelism, up to the maximum number of
           hardware threads available on the target platform.  If you do not
           set either PARALLEL or OMP_NUM_THREADS, the code will run with the
           default value of 1.

           If PARALLEL and OMP_NUM_THREADS are both set, they must be set to
           the same value, otherwise the runtime library will issue an error
           message.

           If you use -autopar and compile and link in one step, linking will
           automatically include the microtasking library and the thread-safe
           Fortran runtime library.  If you use -autopar and compile and link
           in separate steps, then you must link with f95 -autopar as well.
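
           The thread-count rules above can be modeled in a few lines. This
           is a minimal sketch, not the runtime library's actual logic: the
           function name resolve_thread_count is hypothetical, and raising
           an exception is an assumption standing in for the error message
           that the runtime library issues.

```python
def resolve_thread_count(env):
    """Model the PARALLEL / OMP_NUM_THREADS rules described for -autopar.

    `env` is a dict of environment variables (hypothetical input).
    """
    parallel = env.get("PARALLEL")
    omp = env.get("OMP_NUM_THREADS")

    # If both variables are set, they must be set to the same value.
    if parallel is not None and omp is not None and int(parallel) != int(omp):
        # Assumption: an exception stands in for the runtime error message.
        raise ValueError("PARALLEL and OMP_NUM_THREADS must be set to the same value")

    chosen = parallel if parallel is not None else omp
    # If neither variable is set, the program runs with the default of 1.
    return int(chosen) if chosen is not None else 1


print(resolve_thread_count({}))                   # 1
print(resolve_thread_count({"PARALLEL": "4"}))    # 4
```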

-dalign    Selects generation of faster double word load/store instructions,
           and alignment of double and quad data on their natural boundaries
           in common blocks.  

-depend=yes  Selects dependence analysis to better optimize DO loops.

-fast      This is a convenience option for selecting a set of optimizations
           for performance and it chooses the following switches that are
           defined elsewhere in this page:

                          (C)
                            -fns
                            -fsimple=2
                            -fsingle
                            -ftrap=%none
                            -nofstore 
                            -xalias_level=basic
                            -xbuiltin=%all
                            -xdepend
                            -xlibmil
                            -xlibmopt
                            -xO5
                            -xregs=frameptr 
                            -xtarget=native

                          (Fortran)
                            -xtarget=native
                            -xO5
                            -xlibmil
                            -fsimple=2
                            -dalign
                            -xlibmopt
                            -depend=yes
                            -fns
                            -ftrap=common
                            -pad=local
                            -xvector=yes
                            -xprefetch=yes
                            -xprefetch_level=2
                            -nofstore 

-fns       Select non-standard floating point mode.
     
           This flag causes the nonstandard floating point mode to be enabled
           when a program begins execution. By default, the nonstandard
           floating point mode will not be enabled automatically.

           Warning: When nonstandard mode is enabled, floating point
           arithmetic may produce results that do not conform to the
           requirements of the IEEE 754 standard.  See the Numerical
           Computation Guide for more information (see docs.sun.com).

-fsimple=1 Select floating-point optimization preferences.  Allow conservative
           simplifications. The resulting code does not strictly conform to
           IEEE 754, but numeric results of most programs are unchanged.

           With -fsimple=1, the optimizer can assume the following:

              - IEEE 754 default rounding/trapping modes do not change after
                process initialization.

              - Computations producing no visible result other than potential
                floating point exceptions might be deleted.

              - Computations with Infinity or NaNs as operands need not
                propagate NaNs to their results; e.g., x*0 might be replaced
                by 0.

              - Computations do not depend on sign of zero.

           With -fsimple=1, the optimizer is not allowed to optimize
           completely without regard to roundoff or exceptions. In particular,
           a floating-point computation cannot be replaced by one that
           produces different results with rounding modes held constant at run
           time.
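
           To see why replacing x*0 by 0 is not strictly IEEE 754 conformant,
           the sketch below compares the two forms. It uses Python floats
           (IEEE 754 doubles) purely as an illustration; the function names
           are hypothetical and this is not compiler output.

```python
import math


def times_zero_strict(x):
    # What strict IEEE 754 arithmetic computes.
    return x * 0.0


def times_zero_simplified(x):
    # The replacement that the text says -fsimple=1 permits.
    return 0.0


inf = float("inf")

# Infinity or NaN operands: the strict result is NaN, not 0.
print(times_zero_strict(inf))                     # nan
print(times_zero_simplified(inf))                 # 0.0

# Sign of zero: the strict result of -1.0 * 0.0 is -0.0.
print(math.copysign(1.0, times_zero_strict(-1.0)))      # -1.0
print(math.copysign(1.0, times_zero_simplified(-1.0)))  # 1.0
```

           For ordinary finite, positive operands the two forms agree, which
           is why the numeric results of most programs are unchanged.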

-fsimple=2 Selects aggressive floating-point optimizations.  This option might
           be unsuited for programs requiring strict IEEE 754 standards
           compliance.

-fsingle   (-Xt and -Xs modes only)  Causes the compiler to evaluate float
           expressions as single precision, rather than double precision.
           (This option has no effect if the compiler is used in either -Xa or
           -Xc modes, as float expressions are already evaluated as single
           precision.)

-ftrap=t   Sets the IEEE 754 trapping mode in effect at startup.

           t is a comma-separated list that consists of one or more of the
           following: %all, %none, common, [no%]invalid, [no%]overflow,
           [no%]underflow, [no%]division, [no%]inexact.

           The default is -ftrap=%none.

           This option sets the IEEE 754 trapping modes that are established
           at program initialization. Processing is left-to-right. 

           common - invalid, division by zero, and overflow.

           %none - the default, turns off all trapping modes.

           Do not use this option for programs that depend on IEEE standard
           exception handling; you can get different numerical results,
           premature program termination, or unexpected SIGFPE signals.

-lbsdmalloc  General purpose memory allocation package that supports the
           routines malloc, free and realloc. They maintain a table of free
           blocks for efficient allocation and coalescing of free storage.
           When there is no suitable space already free, the allocation
           routines call sbrk(2) to get more memory from the system.
           Additional information can be obtained from the bsdmalloc man page
           and the ld man page.

-lm        Link with math library

-lmopt     This chooses the math library that is optimized for speed

-lmvec     Link with vector math library 


-M <mapfile> Reads mapfile as a text file of directives to ld.  This option
           can be specified multiple times.  If mapfile is a directory, then
           all regular files, as defined by stat(2), within the directory are
           processed.  See the Linker and Libraries Guide for a description of
           mapfiles. Example mapfiles are provided in /usr/lib/ld. See FILES
           in the ld man page.

-M /usr/lib/ld/map.bssalign
           Linker mapfile that enables the creation of a 'bss' segment, and
           aligns the segment at 4 MB.  This effectively provides an
           appropriate alignment for large page mapping of the heap, and thus
           can be useful when building dynamic executables.  See ppgsz(1).

-nofstore  Cancels forcing expressions to have the precision of the result.  

-O         Synonym for -O3 (Fortran, C++)

-On        Synonym for -xO[n] (Fortran, C++)

-pad=local Local padding to improve use of cache.

-Qoption <pr> <ls>      
             Pass option list <ls> to the compiler phase <pr> 

                     CC      C++ front end
                     cg      Code generator
                     f90comp Fortran first pass
                     iropt   Global optimizer
                     ld      linker
                     ube_ipa interprocedural analyzer

           "-Qoption" can also be spelled as 
           "-qoption"

-qoption CC -iropt-prof      
           Use iropt in the profile phase of the compiler.  iropt is the
           global optimizer.

-Qoption iropt -Aujam:inner=g      
           Increase the probability that small-trip-count inner loops will
           be fully unrolled.

-Qoption iropt -Rloop_dist   
           Do not perform loop distribution transformations.

-Qoption ld -M,/usr/lib/ld/map.bssalign
           Pass "-M,/usr/lib/ld/map.bssalign" option to the linker (ld)
           component.

           Linker mapfile that enables the creation of a 'bss' segment, and
           aligns the segment at 4 MB.  This effectively provides an
           appropriate alignment for large page mapping of the heap, and thus
           can be useful when building dynamic executables.  See ppgsz(1).

-Qoption ube_ipa -inl_alt (Fortran x86)
           Invokes Interprocedural analyzer (x86).

-Qoption ube -xcallee=no     
           Do not assume callee-save registers are saved.  -xcallee=yes is
           the default.

RM_SOURCES = lapak.f90 (SPEC tools)
           This option allows building the benchmark 178.galgel without its
           copy of the lapak.f90 source; instead, it links the LAPACK
           routines from the sunperf library.

RM_SOURCES = cfftb.f90 cffti.f90 cfftf.f90 (SPEC tools)
           This option allows building the benchmark 187.facerec without its
           copy of the cfftb.f90 cffti.f90 cfftf.f90 sources; instead, it
           links routines from the sunperf library.

-stackvar  Force all local variables to be allocated on the stack.

           Allocates all the local variables and arrays in routines onto the
           memory stack unless otherwise specified. This option makes these
           variables automatic rather than static and provides more freedom to
           the optimizer when parallelizing loops with calls to subprograms.

-unroll=n  Enable unrolling of DO loops n times where possible.

                  n is a positive integer.
                  n = 1, inhibits all loop unrolling
                  n > 1, this option suggests to the optimizer that
                         it unroll loops n times.

           If any loops are actually unrolled, then the executable file is
           larger.
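
           As an illustration of what unrolling by n = 4 means, the sketch
           below writes the transformation out by hand. The function names
           are hypothetical; the compiler performs the equivalent rewrite on
           DO loops internally, and this is not its generated code.

```python
def sum_rolled(values):
    # Original loop: one element per iteration.
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    # Unrolled loop: four elements per iteration, so the loop test and
    # branch execute a quarter as often. The body is four times larger,
    # which is why unrolled executables grow.
    total = 0
    n = len(values)
    i = 0
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Cleanup loop for the leftover 0 to 3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


print(sum_rolled(list(range(10))), sum_unrolled_by_4(list(range(10))))  # 45 45
```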

-W2,-Arestrict_g  Assumes global pointers are not aliased (restricted).

-W2,-switch[,-switch...] (C) 
           Send the listed switch(es) to the global optimizer. See the
           definitions of the individual switches elsewhere in this page.

-Wd,-iropt-prof  Use iropt in the profile phase of the compiler.  iropt is
           the global optimizer.

-xalias_level[=<a>]     
           where <a> is one of: any, basic, weak, layout, strict, std, strong.
           Allows the compiler to perform type-based alias analysis at the
           given alias level (C). If you do not supply <a> with -xalias_level,
           the compiler assumes -xalias_level=any.

              - any  -  The compiler assumes that all memory references can
                alias at this level. There is no type-based alias analysis.

              - basic - assume ISO C9X aliasing rules for basic types only.

              - std - assume ISO C9X aliasing rules.

              - strong - assume all pointers are type safe (strongly typed).

-xarch=isa This option limits the code generated by the compiler to the
           instructions of the specified instruction set architecture.

             - generic   This is the default.  This option generates 32-bit
                         applications.

             - sse2      Adds the SSE2 instruction set to the 32-bit
                         pentium_pro instruction set.

             - sse2a     Adds the AMD extensions to SSE2 instruction set.

             - amd64     Compile 64-bit Solaris x86 applications.

             - amd64a    Adds the AMD extensions (3DNow!, 3DNow!  extensions,
                         and MMX extensions) to the AMD64 architecture and
                         generates 64-bit ELF format binary file.

             - native    This is the default for the -fast option. The
                         compiler chooses the appropriate setting for the
                         current system processor it is running on and
                         generates 32-bit applications.

           If -xarch=isa is specified multiple times the last specification
           applies.

-xautopar  Synonym for -autopar

-xbuiltin=%all 
           Substitute intrinsic functions or inline system 
           functions where profitable for performance.

-Xc        Specifies the degree of conformance to the ISO C standard. Strictly
           conformant ISO C, without K&R C compatibility extensions.

-xcache=c Define cache for optimizer 
          c must be one of the following:
             o generic
             o native
             o s1/l1/a1[/t1]
             o s1/l1/a1[/t1]:s2/l2/a2[/t2]
             o s1/l1/a1[/t1]:s2/l2/a2[/t2]:s3/l3/a3[/t3]

          The si, li, ai, and ti, are defined as follows:

             - si The size of the data cache at level i, in kilobytes
             - li The line size of the data cache at level i, in bytes
             - ai The associativity of the data cache at level i
             - ti The number of hardware threads sharing the cache at level i
               The ti parameters are optional. A value of 1 is used if not
               present.

          This option specifies the cache properties that the optimizer can
          use. It does not guarantee that any particular cache property is
          used.
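
          The s/l/a[/t] syntax above can be decoded as in the sketch below.
          parse_xcache is a hypothetical helper written for illustration, and
          the cache values in the example are placeholders rather than a
          statement about any particular processor.

```python
def parse_xcache(spec):
    """Decode a -xcache=<spec> value into a list of cache levels."""
    if spec in ("generic", "native"):
        return spec
    levels = []
    for level in spec.split(":"):            # one group per cache level
        fields = [int(f) for f in level.split("/")]
        if len(fields) == 3:
            fields.append(1)                 # ti is optional; 1 if absent
        if len(fields) != 4:
            raise ValueError("expected s/l/a[/t], got: " + level)
        size_kb, line_bytes, assoc, threads = fields
        levels.append({
            "size_kb": size_kb,              # si: size in kilobytes
            "line_bytes": line_bytes,        # li: line size in bytes
            "assoc": assoc,                  # ai: associativity
            "threads": threads,              # ti: hardware threads sharing
        })
    if not 1 <= len(levels) <= 3:
        raise ValueError("between one and three cache levels are allowed")
    return levels


for entry in parse_xcache("64/64/2:1024/64/16"):
    print(entry)
```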

-xcrossfile[=<n>]       
          Enable optimization and inlining across source files, n={0|1}.  The
          default is -xcrossfile=0 which specifies that no cross file
          optimizations are performed.  -xcrossfile is equivalent to
          -xcrossfile=1. 

          Normally, the scope of the compiler's analysis is limited to each
          separate file on the command line.  With -xcrossfile, the compiler
          analyzes all the files named on the command line as if they had been
          concatenated into a single source file.

-xdepend  Analyze loops for data dependencies.

-xipo[=<n>] Enable optimization and inlining across source files, n={0|1|2}.
          At -xipo=2, the compiler performs interprocedural aliasing analysis
          as well as optimization of memory allocation and layout to improve
          cache performance.

-xlibmil  Selects inlining of certain math library routines.

-xlibmopt Selects linking the optimized math library.

-xlic_lib=sunperf Link in the Sun supplied performance libraries

-xO1      Does basic local optimization (peephole).

-xO2      xO1 and more local and global optimizations.

-xO3      Besides what xO2 does, it optimizes references or definitions for
          external variables. Loop unrolling and software pipelining are also
          performed.

-xO4      xO3 plus function inlining.

-xO5      Besides what xO4 does, it enables speculative code motion.

-xpad=common[:<n>] (Fortran)
          If multiple same-sized arrays are placed in common, insert padding
          between them for better use of cache. n specifies the amount of
          padding to apply, in units that are the same size as the array
          elements. If no parameter is specified then the compiler selects one
          automatically.
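
          The benefit of padding can be illustrated with a simple cache
          model. This is a sketch under stated assumptions: a hypothetical
          direct-mapped 64 KB cache with 64-byte lines, 8-byte array
          elements, and arrays of 8192 elements. None of these figures come
          from the compiler documentation.

```python
ELEM_BYTES = 8       # assumed element size (double precision)
ARRAY_ELEMS = 8192   # assumed array length: 64 KB per array


def cache_set(address, cache_bytes=65536, line_bytes=64):
    """Set index of a byte address in a hypothetical direct-mapped cache."""
    return (address // line_bytes) % (cache_bytes // line_bytes)


def array_bases(count, pad_elems):
    """Byte offsets of `count` same-sized arrays laid out back to back,
    with `pad_elems` elements of padding after each array."""
    stride = (ARRAY_ELEMS + pad_elems) * ELEM_BYTES
    return [k * stride for k in range(count)]


# Without padding, element i of every array maps to the same cache set,
# so a loop touching a(i) and b(i) keeps evicting its own data.
print([cache_set(base) for base in array_bases(3, 0)])   # [0, 0, 0]

# With 8 elements (64 bytes) of padding, each array is shifted by one line.
print([cache_set(base) for base in array_bases(3, 8)])   # [0, 1, 2]
```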

-xpagesize=<n> 
          Set the preferred page size for running the program.  The n value
          must be one of the following: 4K 2M 4M

          You must specify a valid page size for the Solaris OS on the target
          platform, as returned by getpagesizes(3C).  If you do not specify a
          valid page size, the request is silently ignored at run-time.  The
          Solaris OS offers no guarantee that the page size request will be
          honored.
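
          The accepted values and the "silently ignored" behavior can be
          modeled as below. Both helper names are hypothetical, and the list
          of supported sizes is supplied by the caller here; on a real system
          it would come from getpagesizes(3C).

```python
UNITS = {"K": 1024, "M": 1024 * 1024}


def page_size_bytes(text):
    """Convert a value such as '4K', '2M' or '4M' to a byte count."""
    return int(text[:-1]) * UNITS[text[-1].upper()]


def effective_request(text, supported_bytes):
    """Return the requested size in bytes, or None when the platform does
    not support it (modeling a request that is silently ignored)."""
    size = page_size_bytes(text)
    return size if size in supported_bytes else None


supported = [4096, 2097152]                 # illustrative platform list
print(effective_request("2M", supported))   # 2097152
print(effective_request("4M", supported))   # None
```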

-xpagesize_heap=<n> (C, Fortran) 
          Set the preferred heap page size for running the program.  n is the
          same as described for -xpagesize.  You must specify a valid page
          size for the Solaris OS on the target platform, as returned by
          getpagesizes(3C).  If you do not specify a valid page size, the
          request is silently ignored at run-time.

-xpagesize_stack=<n> (C, Fortran) 
          Set the preferred stack page size for running the program. 

          n is the same as described for -xpagesize.  You must specify a valid
          page size for the Solaris OS on the target platform, as returned by
          getpagesizes(3C).  If you do not specify a valid page size, the
          request is silently ignored at run-time.

-xprefetch_level[=<n>]  
          Controls the aggressiveness of the -xprefetch=auto option
          (n={1|2|3})

          -xprefetch_level=1 enables automatic generation of prefetch
          instructions. -xprefetch_level=2 enables additional generation
          beyond level 1 and -xprefetch_level=3 enables additional generation
          beyond level 2.

-xprefetch[=val[,val]]  
          Enable prefetch instructions on those architectures that support
          prefetch.

            - auto Enable automatic generation of prefetch instructions.
            - no%auto Disable automatic generation of prefetch instructions
            - explicit Enable explicit prefetch macros
            - no%explicit Disable explicit prefetch macros
            - no -xprefetch=no is the same as -xprefetch=no%auto,no%explicit
            - yes -xprefetch=yes is the same as -xprefetch=auto,explicit

          Defaults

            If -xprefetch is not specified, -xprefetch=no%auto,explicit is
            assumed.

            If only -xprefetch is specified, -xprefetch=auto,explicit is
            assumed.
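
          The value list and defaults above can be summarized in a small
          model. resolve_xprefetch is a hypothetical helper; in particular,
          the assumption that a setting not named in the list keeps its
          default is not stated in the text.

```python
def resolve_xprefetch(option=None):
    """Return the prefetch settings for a given -xprefetch usage.

    option is None when the flag is absent, "" for a bare -xprefetch,
    or the text after the '=' sign.
    """
    # Flag not specified: -xprefetch=no%auto,explicit is assumed.
    state = {"auto": False, "explicit": True}
    if option is None:
        return state
    if option == "":
        option = "auto,explicit"          # bare -xprefetch
    for val in option.split(","):
        if val == "yes":
            state = {"auto": True, "explicit": True}
        elif val == "no":
            state = {"auto": False, "explicit": False}
        elif val in ("auto", "explicit"):
            state[val] = True
        elif val in ("no%auto", "no%explicit"):
            state[val[3:]] = False
        else:
            raise ValueError("unknown -xprefetch value: " + val)
    return state


print(resolve_xprefetch())        # {'auto': False, 'explicit': True}
print(resolve_xprefetch("yes"))   # {'auto': True, 'explicit': True}
```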

-xprofile   Use the profile feature, shorthand used for the process below

-xprofile=<p>  Collect data for a profile or use a profile to optimize 

          <p>={{collect,use}[:<path>],tcov}

              - collect[:name] Collects and saves execution frequency for
                later use by the optimizer with -xprofile=use. The compiler
                generates code to measure statement execution-frequency.

              - use[:name] Uses execution frequency data to optimize
                strategically. The name is the name of the executable that is
                being analyzed.

-xregs=<r>  Specify the usage of optional registers

-xregs=r[,r...] Specify the usage of registers for the generated code.  r is a
           comma-separated list of one or more of the following:  [no%]appl,
           [no%]float, [no%]frameptr.

           - [no%]frameptr (x86 only): [Does not] Allow the compiler to use
             the frame-pointer register (%ebp on IA32, %rbp on AMD64) as an
             unallocated callee-saves register.

           Using this register as an unallocated callee-saves register may
           improve program run time.  However, it also reduces the capacity of
           some tools, such as the Performance Analyzer and dtrace, to inspect
           and follow the stack. This stack inspection capability is important
           for system performance measurement and tuning.  Therefore, using
           this optimization may improve local program performance at the
           expense of global system performance.

-xrestrict   Treat pointer-valued function parameters as restricted pointers. 

-xtarget=native  Selects options appropriate for the system where the compile
           is taking place, including architecture, chip, and cache sizes. 

-xvector   Enable automatic generation of calls to the vector library
           functions.  Specifying -xvector is equivalent to -xvector=yes.

          It permits the compiler to transform math library calls within DO
          loops into single calls to the equivalent vector math routines when
          such transformations are possible. This could result in a
          performance improvement for loops with large loop counts.

-xvector=simd  Automatic generation of the vector SIMD instructions

=================================================================
Shell options

PARALLEL=n 
           To run a parallelized program in a multithreaded environment,
           the PARALLEL environment variable is set prior to execution to
           specify the number of software threads to use.  For more
           information, see the discussion under -autopar in the compiler
           flags section.

submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix)

   When running multiple copies of benchmarks, the SPEC config file feature
   "submit" is sometimes used to cause individual jobs to be bound to specific
   processors:

    * "submit=" causes the SPEC tools to use this line when submitting jobs.

    * "echo ...> dobmk" causes the generated commands to be written to a
      file, namely dobmk.

    * "pbind -b" causes this copy's processes to be bound to the CPU
      specified by the expression that follows it. See the config file used in
      the submission for the exact syntax, which tends to be cumbersome
      because of the need to carefully quote parts of the expression. When all
      expressions are evaluated, each CPU ends up with exactly one copy of
      each benchmark. The pbind expression may include:

        - "$SPECUSERNUM": the SPEC tools-assigned number for this copy of the
          benchmark.

        - "expr": Calculate simple arithmetic expressions. For example, the
          effect of binding jobs to a (quote-resolved) expression such as:
          expr ( ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 ) ) would be to
          send the jobs to processors whose numbers are: 0,1,2,3, 8,9,10,11,
          16,17,18,19 ...

        - "psrinfo": find out what processors are available

        - "grep on-line": search the psrinfo output for information regarding
          on-line CPUs

    * "awk...print \$1": Pick out the line corresponding to this copy of the
      benchmark and use the CPU number mentioned at the start of this line.

    * sh dobmk actually runs the benchmark. 
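
   The arithmetic in the example "expr" expression above can be checked with
   a short sketch. The function name bound_cpu is hypothetical; the formula
   is the one quoted in the example, with integer division as performed by
   expr.

```python
def bound_cpu(specusernum):
    """CPU number for a given copy, using the example expression:
    ( SPECUSERNUM / 4 ) * 8 + ( SPECUSERNUM % 4 )."""
    return (specusernum // 4) * 8 + (specusernum % 4)


# Copies 0..11 land on four consecutive CPUs out of every eight.
print([bound_cpu(n) for n in range(12)])
# [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
```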

ulimit -s unlimited     Set size of stack segment to unlimited

=================================================================
Kernel Parameters (/etc/system):

autoup=<n> (Unix)
           When the file system flush daemon fsflush runs, it will write to
           disk all modified file buffers that are more than n seconds old.

tune_t_fsflushr=<n> (Unix)
           Controls the number of seconds between runs of the file system
           flush daemon fsflush.