Flag description file for Sun compiled SPECcpu2000 binaries using the 
Sun Studio 11 Compiler and for the Solaris 10 OS.

This file is for flags used with the Opteron-based systems.

----------------------------------------------------------------------------
Sun Studio 11 compiler flags
----------------------------------------------------------------------------

Portability Flags:

-DSPEC_CPU2000_LP64           Compile using LP64 programming model. 

-DFMAX_IS_DOUBLE              Specifies whether FMAX is double or float.
                              Used in 252.eon.

-DSYS_HAS_ANSI                System is ANSI compliant.
                              Used in 254.gap.

-DUNIX 
Compile for a Unix system. Use portability settings like host endianness,
OS type, and ANSI language extensions to be compatible with UNIX
systems.

-DUSE_STRERROR

-Dalloca=__builtin_alloca (Portability: SPEC Tools)
Portability switch, used for 176.gcc: allow use of compiler's internal builtin alloca. 

-DSPEC_CPU2000_SOLARIS_X86 (Portability: SPEC Tools)
Portability switch, used for 253.perlbmk: selects header files and code
paths compatible with Solaris.

-DHOST_WORDS_LITTLE_ENDIAN 
Portability switch, used for 176.gcc: Host system is little-endian.

-DLITTLE_ENDIAN_ARCH 
Portability switch, used for 186.crafty: Host architecture is little-endian.
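
As a rough illustration of what the two little-endian switches above assert
about the host, the sketch below prints the byte layout of a 32-bit value on
a little-endian machine. The helper name to_little_endian is hypothetical and
is not part of the benchmarks or the SPEC tools.

```python
import sys


def to_little_endian(value: int, width: int = 4) -> bytes:
    """Return the in-memory byte layout of an unsigned integer
    as a little-endian host would store it."""
    return value.to_bytes(width, "little")


# The least significant byte (0x78) comes first, i.e. at the lowest address.
layout = to_little_endian(0x12345678)
print(layout.hex())      # 78563412

# Byte order of the machine running this sketch ("little" on x86/Opteron).
print(sys.byteorder)
```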

-DSYS_HAS_CALLOC_PROTO (Portability: SPEC Tools)
Do not supply a prototype for calloc().
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_MALLOC_PROTO (Portability: SPEC Tools)
Do not supply a prototype for malloc().
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_IOCTL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_SIGNAL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_TIME_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_IS_USG (Portability: SPEC Tools)
Portability switch, used for 254.gap: selects code compatible with USG-based systems. 

-DHAS_LONGLONG (Portability: SPEC Tools)
Portability switch, used for 186.crafty: allows use of the designated prototype. 

-DHAS_STDIO_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype. 

-DSYS_HAS_READ_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype. 

-DSYS_HAS_STRING_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype. 

-e                      Accept extended (132 character) input source lines
                        (FORTRAN)

-fixed                  Accept fixed-format input source files (FORTRAN)


-Xa                     Portability flag. Specifies the degree of
			conformance to the ISO C standard. This is the
			default compiler mode. ISO C plus K&R C
			compatibility extensions, with semantic changes
			required by ISO C.

Optimization Flags:


-D                      Set definition for preprocessor.

-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>]
        [:mi][:recursion=1] (optimizer)

	    Control the optimizer's loop inliner:

	    cp=<n> The minimum call site frequency counter in order to
	           consider a routine for inlining.

	    cs=<n> Set inline callee size limit to n. The unit roughly
	           corresponds to the number of instructions.

	    inc=<n> The inliner is allowed to increase the size of the
	            program by up to n%.

	    irs=<n> Allow routines to increase by up to n. The unit
	            roughly corresponds to the number of instructions.

	    rs=<n>  The inliner only considers routines smaller than n pseudo
	   	    instructions as possible inline candidates.

	    mi      Perform maximum inlining (without considering code
	            size increase).

	    recursion=1 Allow routines that are called recursively to
			still be eligible for inlining.


-Wd,-iropt-prof              Use iropt in the profile phase of the compiler.
                             iropt is the global optimizer.

-qoption CC -iropt-prof      Use iropt in the profile phase of the compiler.
                             iropt is the global optimizer.

-Qoption ube -xcallee=no     Do not assume callee-save registers are saved.
-Qoption ube -xcallee=yes    -xcallee=yes is the default.

-Qoption iropt -Rloop_dist   Do not perform loop distribution transformations.

-W2,-Arestrict_g              Assumes global pointers are not aliased (restricted).

-Abcopy (optimizer) 	     Increase the probability that the compiler will perform 
                             memcpy/memset transformations. 

-Ashort_ldst (optimizer)     Convert multiple short memory operations into
                             single long memory operations.

			     -Ashort_ldst:ldld:  Convert multiple short
			     memory loads into single long load
			     operations.
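
The idea behind merging short loads can be sketched as follows. This is only
an illustration, assuming a little-endian layout and two adjacent 16-bit
values; the function names are hypothetical, and the real transformation is
applied by the optimizer to machine instructions, not to source code.

```python
import struct


def two_short_loads(buf: bytes, offset: int) -> tuple:
    """Two separate 16-bit loads from adjacent addresses."""
    lo = struct.unpack_from("<H", buf, offset)[0]
    hi = struct.unpack_from("<H", buf, offset + 2)[0]
    return lo, hi


def one_long_load(buf: bytes, offset: int) -> tuple:
    """One 32-bit load, then split the loaded word into its two halves."""
    word = struct.unpack_from("<I", buf, offset)[0]
    return word & 0xFFFF, word >> 16


data = bytes([0x34, 0x12, 0x78, 0x56])

# Both strategies yield the same pair of values; the second one touches
# memory only once.
print(two_short_loads(data, 0) == one_long_load(data, 0))   # True
```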

-Atile:skewp[:b<n>] (optimizer)

			Perform loop tiling which is enabled by loop
			skewing. Loop skewing is a transformation that
			transforms a non-fully interchangeable loop nest
			to a fully interchangeable loop nest. The optional
			b<n> sets the tiling block size to n.
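
A minimal sketch of plain loop tiling is shown below, to make the role of the
block size concrete. It deliberately leaves out the skewing step, which only
matters when the loop nest is not fully interchangeable. The function names
and the parameter b (standing in for the block size set by b<n>) are
hypothetical.

```python
def untiled(n: int, m: int) -> list:
    """Iteration order of an ordinary two-deep loop nest."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled(n: int, m: int, b: int) -> list:
    """Iteration order of the same nest after tiling with block size b."""
    order = []
    for ii in range(0, n, b):                      # loop over tile rows
        for jj in range(0, m, b):                  # loop over tile columns
            for i in range(ii, min(ii + b, n)):    # iterations inside a tile
                for j in range(jj, min(jj + b, m)):
                    order.append((i, j))
    return order


# The first tile of a 4x4 nest with block size 2 is visited completely
# before the next tile is started.
print(tiled(4, 4, 2)[:4])    # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Tiling changes only the order in which iterations are visited, not the set
of iterations, which is why it is legal only when the loops can be
interchanged.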


-dalign                 Selects generation of faster double word load/store 
                        instructions, and alignment of double and quad data 
                        on their natural boundaries in common blocks.  

-depend=yes             Selects dependence analysis to better optimize DO loops.


-fast                   This is a convenience option for selecting a set
                        of optimizations for performance and it chooses
                        the following switches that are defined elsewhere
                        in this page:

                          (C)
                            -fns
                            -fsimple=2
                            -fsingle
                            -ftrap=%none
                            -nofstore 
                            -xalias_level=basic
                            -xbuiltin=%all
                            -xdepend
                            -xlibmil
                            -xlibmopt
                            -xO5
                            -xregs=frameptr 
                            -xtarget=native

                          (Fortran)
                            -xtarget=native
                            -xO5
                            -xlibmil
                            -fsimple=2
                            -dalign
                            -xlibmopt
                            -depend=yes
                            -fns
                            -ftrap=common
                            -pad=local
                            -xvector=yes
                            -xprefetch=yes
                            -xprefetch_level=2
                            -nofstore 


-fns                    Select non-standard floating point mode.
     
                        This flag causes the nonstandard floating point mode 
                        to be enabled when a program begins execution. By 
                        default, the nonstandard floating point mode will not 
                        be enabled automatically.

                        Warning: When nonstandard mode is enabled, floating 
                        point arithmetic may produce results that do not 
                        conform to the requirements of the IEEE 754 standard. 
                        See the Numerical Computation Guide for more 
                        information (see docs.sun.com).


-fsimple=1              Select floating-point optimization preferences.
			Allow conservative simplifications. The resulting
			code does not strictly conform to IEEE 754, but
			numeric results of most programs are unchanged.

			With -fsimple=1, the optimizer can assume the
			following:

			IEEE 754 default rounding/trapping modes do not
			change after process initialization.

			Computations producing no visible result other
			than potential floating point exceptions might be
			deleted.

			Computations with Infinity or NaNs as operands
			need not propagate NaNs to their results; e.g.,
			x*0 might be replaced by 0.

			Computations do not depend on sign of zero.

		        With -fsimple=1, the optimizer is not allowed to
		        optimize completely without regard to roundoff or
		        exceptions. In particular, a floating-point
		        computation cannot be replaced by one that produces
		        different results with rounding modes held constant
		        at run time.
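
The NaN and sign-of-zero assumptions can be illustrated with ordinary
double-precision arithmetic. The sketch below assumes that Python floats
behave as IEEE 754 doubles, which holds on common platforms; the two function
names are hypothetical and simply contrast strict evaluation with the kind of
substitution the optimizer is allowed to make.

```python
import math


def strict_times_zero(x: float) -> float:
    """x * 0 evaluated under IEEE 754 rules, with no simplification."""
    return x * 0.0


def simplified_times_zero(x: float) -> float:
    """What a simplifying optimizer might substitute: the constant 0."""
    return 0.0


# Infinity * 0 is NaN under strict rules, but 0 after simplification.
print(math.isnan(strict_times_zero(math.inf)))        # True
print(simplified_times_zero(math.inf))                # 0.0

# A negative operand yields -0.0 under strict rules; the simplified
# form loses the sign of zero.
print(math.copysign(1.0, strict_times_zero(-3.0)))      # -1.0
print(math.copysign(1.0, simplified_times_zero(-3.0)))  # 1.0
```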

-fsimple=2              Selects aggressive floating-point optimizations.
			This option might be unsuited for programs
			requiring strict IEEE 754 standards compliance.

-fsingle                (-Xt and -Xs modes only)  Causes the compiler to
                        evaluate float expressions as single precision,
                        rather than double precision.  (This option has no
                        effect if the compiler is used in either -Xa or
                        -Xc modes, as float expressions are already
                        evaluated as single precision.)
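
The difference between evaluating a float expression in single and in double
precision can be sketched as follows. This is an illustration only: it
emulates single precision by rounding through a 4-byte IEEE 754 value, and
the helper names are hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add_as_single(a: float, b: float) -> float:
    """float + float with the result kept in single precision."""
    return to_single(to_single(a) + to_single(b))


def add_as_double(a: float, b: float) -> float:
    """float + float with the expression evaluated in double precision."""
    return to_single(a) + to_single(b)


a, b = 33554432.0, 1.0   # 2**25 and 1, both exactly representable as float

print(add_as_single(a, b))   # 33554432.0 (the added 1 is rounded away)
print(add_as_double(a, b))   # 33554433.0
```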

-ftrap=t                Sets the IEEE 754 trapping mode in effect at startup.

                        t is a comma-separated list that consists of one or 
                        more of the following: %all, %none, common, 
                        [no%]invalid, [no%]overflow, [no%]underflow, 
                        [no%]division, [no%]inexact.

                        The default is -ftrap=%none.

                        This option sets the IEEE 754 trapping modes that are 
                        established at program initialization. Processing is
                        left-to-right. 

                        common - invalid, division by zero, and overflow.

                        %none - the default, turns off all trapping modes.

			Do not use this option for programs that depend on
			IEEE standard exception handling; you can get
			different numerical results, premature program
			termination, or unexpected SIGFPE signals.
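
Left-to-right processing of the list can be sketched as below. This is a
reading of the description above, not the compiler's implementation: the
function name resolve_ftrap is hypothetical, and the sketch assumes that
"common" adds its three traps to whatever is already enabled.

```python
TRAPS = ("invalid", "overflow", "underflow", "division", "inexact")
COMMON = ("invalid", "division", "overflow")


def resolve_ftrap(spec: str) -> set:
    """Return the set of traps enabled by a -ftrap=<spec> value."""
    enabled = set()                       # the default is %none
    for item in spec.split(","):
        item = item.strip()
        if item == "%all":
            enabled = set(TRAPS)
        elif item == "%none":
            enabled = set()
        elif item == "common":
            enabled |= set(COMMON)        # assumption: additive
        elif item.startswith("no%") and item[3:] in TRAPS:
            enabled.discard(item[3:])
        elif item in TRAPS:
            enabled.add(item)
        else:
            raise ValueError(f"unknown -ftrap item: {item!r}")
    return enabled


# Later items override earlier ones: everything except inexact.
print(sorted(resolve_ftrap("%all,no%inexact")))
```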


-lbsdmalloc             General purpose memory allocation package that
                        supports the routines malloc, free and realloc.
                        They maintain a table of free blocks for efficient
                        allocation and coalescing of free storage. When
                        there is no suitable space already free, the
                        allocation routines call sbrk(2) to get more
                        memory from the system. Additional information can
                        be obtained from the bsdmalloc man page and from
                        the ld man page.


-lm                     Link with math library

-lmopt                  This chooses the math library that is optimized for 
                        speed

-M <mapfile>            Reads mapfile as a text file of directives to ld.
                        This option can be specified multiple times. If
                        mapfile is a directory, then all regular files,
                        as defined by stat(2), within the directory are
                        processed. See Linker and Libraries Guide for a
                        description of mapfiles. Example mapfiles are
                        provided in /usr/lib/ld. See FILES.

-M /usr/lib/ld/map.bssalign
			Linker mapfile that enables the creation of a
			'bss' segment, and aligns the segment at 4Mb.
			This effectively provides an appropriate alignment
			for large page mapping of the heap, and thus can
			be useful when building dynamic executables.  See
			ppgsz(1)

-Qoption ld -M,/usr/lib/ld/map.bssalign
			Pass "-M,/usr/lib/ld/map.bssalign" option to the 
                        linker (ld) component.

			Linker mapfile that enables the creation of a
                        'bss' segment, and aligns the segment at 4Mb.
                        This effectively provides an appropriate alignment
                        for large page mapping of the heap, and thus can
                        be useful when building dynamic executables.  See
                        ppgsz(1)

-nofstore               Cancels forcing expressions to have the precision of 
                        the result.  

-pad=local              Local padding to improve use of cache.


-stackvar               Force all local variables to be allocated on the stack.

			Allocates all the local variables and arrays in
			routines onto the memory stack unless otherwise
			specified. This option makes these variables
			automatic rather than static and provides more
			freedom to the optimizer when parallelizing loops
			with calls to subprograms.


-xalias_level[=<a>]     where <a> is one of: any, basic, weak, layout,
                        strict, std, strong. It allows the compiler to
                        perform type-based alias analysis at the given
                        alias level (C). If you do not supply <a> with
                        -xalias_level, the compiler assumes
                        -xalias_level=any.

                        any  -  The compiler assumes that all memory
                                references can alias at this level. There
                                is no type-based alias analysis.

                        basic - assume ISO C9X aliasing rules for basic types 
                        only.

                        std - assume ISO C9X aliasing rules.

                        strong - assume all pointers are type safe (strongly 
                        typed).

-xarch=isa              This option limits the code generated by the
                        compiler to the instructions of the specified
                        instruction set architecture.

                        generic   This is the default. This option
                                  generates 32-bit applications.

                        sse2      Adds the SSE2 instruction set 

                        amd64     Compile 64-bit Solaris x86 applications.

                        native    This is the default for the -fast
                                  option. The compiler chooses the
                                  appropriate setting for the processor
                                  of the system it is running on and
                                  generates 32-bit applications.

-xbuiltin=%all          Substitute intrinsic functions or inline system 
                        functions where profitable for performance.

-Xc                     Specifies the degree of conformance to the ISO C
			standard. Strictly conformant ISO C, without K&R C
			compatibility extensions.


-xcrossfile[=<n>]       Enable optimization and inlining across source
                        files, n={0|1}.  The default is -xcrossfile=0
                        which specifies that no cross file optimizations
                        are performed.  -xcrossfile is equivalent to
                        -xcrossfile=1. 

                        Normally, the scope of the compiler's analysis is
                        limited to each separate file on the command line.
                        With -xcrossfile, the compiler analyzes all the
                        files named on the command line as if they had
                        been concatenated into a single source file.

-xdepend                Analyze loops for data dependencies.

-xipo[=<n>]             Enable optimization and inlining across source
                        files, n={0|1|2}.  At -xipo=2, the compiler
                        performs interprocedural aliasing analysis as well
                        as optimization of memory allocation and layout
                        to improve cache performance.

-xlibmil                Selects inlining of certain math library routines.

-xlibmopt               Selects linking the optimized math library.

-xlic_lib=sunperf       Link in the Sun supplied performance libraries

-O  (Fortran)           Enables optimization. -O implies -O3.
                        -xO[n] is a synonym for -O[n].

-xO1                    Does basic local optimization (peephole).

-xO2                    xO1 and more local and global optimizations.

-xO3                    Besides what xO2 does, it optimizes references or 
                        definitions for external variables. Loop unrolling and 
                        software pipelining are also performed.

-xO4                    xO3 plus function inlining.

-xO5                    Besides what xO4 does, it enables speculative code 
                        motion.

-xprefetch_level[=<n>]  Controls the aggressiveness of the -xprefetch=auto 
                        option (n={1|2|3})

                        -xprefetch_level=1 enables automatic generation of
                        prefetch instructions. -xprefetch_level=2
                        enables additional generation beyond level 1 and
                        -xprefetch_level=3 enables additional generation
                        beyond level 2.

-xprefetch[=val[,val]]  Enable prefetch instructions on those architectures
                        that support prefetch.

                        auto      

                          Enable automatic generation of prefetch
                          instructions.

                        no%auto

                          Disable automatic generation of prefetch instructions

                        explicit

                          Enable explicit prefetch macros

                        no%explicit

                          Disable explicit prefetch macros

                        yes

                          -xprefetch=yes is the same as
                          -xprefetch=auto,explicit

                        no

                          -xprefetch=no is the same as
                          -xprefetch=no%auto,no%explicit

                        Defaults

                          If -xprefetch is not specified,
                          -xprefetch=no%auto,explicit is assumed.

                          If only -xprefetch is specified,
                          -xprefetch=auto,explicit is assumed.


-xprofile               Use the profile feature, shorthand used for the process
                        below

-xprofile=<p>           Collect data for a profile or use a profile to optimize 
                        <p>={{collect,use}[:<path>],tcov}

                        collect[:name]

                        Collects and saves execution frequency for later
                        use by the optimizer with -xprofile=use. The
                        compiler generates code to measure statement
                        execution-frequency.

                        use[:name]

                        Uses execution frequency data to optimize
                        strategically. The name is the name of the
                        executable that is being analyzed.

-xregs=<r>              Specify the usage of optional registers


-xregs=r[,r...]         Specify the usage of registers for the generated code.
			r is a comma-separated list of one or  more of the
			following:  [no%]appl, [no%]float,
			[no%]frameptr.

			[no%]frameptr	(x86 only):

			[Does not] Allow the    compiler to use the
			frame-pointer register (%ebp  on IA32, %rbp on
			AMD64) as an unallocated callee-saves register.

			Using this register as an unallocated callee-
			saves register may improve program run time.
			However, it also reduces the capacity of some
			tools, such as the Performance Analyzer and
			dtrace, to inspect and follow the stack. This
			stack inspection capability is important for
			system performance measurement and tuning.
                        Therefore, using this optimization may improve
                        local program performance at the expense of
                        global system performance.

-xrestrict              Treat pointer-valued function parameters as restricted pointers. 

-xtarget=native         Selects options appropriate for the system where
			the compile is taking place, including
			architecture, chip, and cache sizes. 

-xvector                Enable automatic generation of calls to the vector
			library functions.  Specifying -xvector is
			equivalent to -xvector=yes.

			It permits the compiler to transform math  library
			calls within DO loops into single calls to the
			equivalent vector math routines when such
			transformations are possible. This could result in
			a performance improvement for loops with large
			loop counts.


-xvector=simd           Automatic generation of the vector SIMD instructions


-Qoption <pr> <ls>      Pass option list <ls> to the compiler phase <pr>
			(Fortran, C++):

                        f90comp Fortran first pass

                        iropt Global optimizer

                        cg Code generator

-Qoption iropt -Aujam:inner=g      

			Increase the probability that small-trip-count
			inner loops will be fully unrolled.

-Qoption ube_ipa -inl_alt (Fortran x86)
			Invokes Interprocedural analyzer (x86).

-W2,-switch[,-switch...] (C) 
			Send the listed switch(es) to the global
			optimizer. See the definitions of the individual
			switches elsewhere in this page.

-xpad=common[:<n>] (Fortran)
			If multiple same-sized arrays are placed in
			common, insert padding between them for better use
			of cache. n specifies the amount of padding to
			apply, in units that are the same size as the
			array elements. If no parameter is specified then
			the compiler selects one automatically.
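
Why padding between same-sized arrays can help is sketched below. The cache
geometry is an assumption chosen for illustration (a direct-mapped cache with
1024 sets of 64-byte lines, 8-byte elements) and does not describe any
particular Opteron cache; all names are hypothetical.

```python
LINE_BYTES = 64      # assumed cache line size
NUM_SETS = 1024      # assumed number of sets (direct-mapped, 64 KB)
ELEM_BYTES = 8       # size of one array element (double precision)


def cache_set(address: int) -> int:
    """Cache set that a byte address maps to under the assumed geometry."""
    return (address // LINE_BYTES) % NUM_SETS


def array_bases(count: int, length: int, pad_elems: int = 0) -> list:
    """Start addresses of `count` arrays laid out back to back in common,
    with pad_elems elements of padding after each array."""
    stride = (length + pad_elems) * ELEM_BYTES
    return [k * stride for k in range(count)]


def colliding(bases: list, index: int = 0) -> bool:
    """True if element `index` of two or more arrays maps to the same set."""
    sets = [cache_set(base + index * ELEM_BYTES) for base in bases]
    return len(set(sets)) < len(sets)


# Three arrays of 8192 doubles (64 KB each): without padding, the same
# element of every array competes for one cache set.
print(colliding(array_bases(3, 8192)))                # True
print(colliding(array_bases(3, 8192, pad_elems=8)))   # False
```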

-xpagesize=<n> (C, Fortran) 
                        Set the preferred page size for running the program. 
			The n value must be one of the following:

                        On x86:
                        4K 2M 4M

			You must specify a valid page size for the Solaris
			OS on the target platform, as returned by
			getpagesizes(3C).  If you do not specify a valid
			page size, the request is silently ignored at
			run-time.  The Solaris OS offers no guarantee that
			the page size request will be honored.
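
A small sketch of how a page size value such as 4K or 2M translates to bytes
is given below. It checks only against the x86 values listed above; whether a
size is actually valid on a given system is determined by the OS, so the list
and the helper names here are illustrative assumptions.

```python
UNIT_BYTES = {"K": 1024, "M": 1024 * 1024}
LISTED_X86_SIZES = ("4K", "2M", "4M")    # values listed in this file


def page_size_bytes(text: str) -> int:
    """Convert a page size such as '4K' or '2M' to a byte count."""
    text = text.strip().upper()
    if len(text) < 2 or text[-1] not in UNIT_BYTES or not text[:-1].isdigit():
        raise ValueError(f"unrecognized page size: {text!r}")
    return int(text[:-1]) * UNIT_BYTES[text[-1]]


def is_listed_for_x86(text: str) -> bool:
    """True if the value is one of the x86 sizes listed in this file."""
    return text.strip().upper() in LISTED_X86_SIZES


print(page_size_bytes("2M"))        # 2097152
print(is_listed_for_x86("8K"))      # False
```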


-xpagesize_heap=<n> (C, Fortran) 
                        Set the preferred heap page size for running the program. 
			n is the same as described for -xpagesize.  You
			must specify a valid page size for the Solaris OS
			on the target platform, as returned by
			getpagesizes(3C).  If you do not specify a valid
			page size, the request is silently ignored at
			run-time.


-xpagesize_stack=<n> (C, Fortran) 
                        Set the preferred stack page size for running the program. 

			n is the same as described for -xpagesize.  You
			must specify a valid page size for the Solaris OS
			on the target platform, as returned by
			getpagesizes(3C).  If you do not specify a valid
			page size, the request is silently ignored at
			run-time.
			


-xprofile=<p>           Collect or optimize with runtime profiling data. <p> must be 
                        collect[:nm], use[:nm], or tcov.  At runtime a program compiled 
                        with -xprofile=collect:nm will create the subdirectory nm.profile 
                        to hold the runtime feedback information. nm is an optional name.

-xprofile=collect       Collect profile data for feedback directed optimizations.

-xprofile=use           Use data collected for profile feedback.

ulimit -s unlimited     Set size of stack segment to unlimited

submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix)

When running multiple copies of benchmarks, the SPEC config file feature
submit is sometimes used to cause individual jobs to be bound to specific
processors:


    * submit= causes the SPEC tools to use this line when submitting
      jobs.

    * echo ...> dobmk causes the generated commands to be written to a
      file, namely dobmk.

    * pbind -b causes this copy's processes to be bound to the CPU
      specified by the expression that follows it. See the config file
      used in the submission for the exact syntax, which tends to be
      cumbersome because of the need to carefully quote parts of the
      expression. When all expressions are evaluated, each CPU ends up
      with exactly one copy of each benchmark. The pbind expression may
      include:

          o $SPECUSERNUM: the SPEC tools-assigned number for this copy of the benchmark.
          o expr: Calculate simple arithmetic expressions. For example,
            the effect of binding jobs to a (quote-resolved) expression such as:
            expr ( ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 ) )
            would be to send the jobs to processors whose numbers are:
            0,1,2,3, 8,9,10,11, 16,17,18,19 ...
          o psrinfo: find out what processors are available

	  o grep on-line: search the psrinfo output for information
	    regarding on-line cpus

          o awk...print \$1: Pick out the line corresponding to this copy of the
            benchmark and use the CPU number mentioned at the start of this line.

    * sh dobmk actually runs the benchmark.
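
The arithmetic in the expr example above can be checked with a short sketch.
The function name bound_cpu is hypothetical; the formula is the one from the
example, with integer division and remainder as expr computes them.

```python
def bound_cpu(specusernum: int) -> int:
    """CPU number that copy number `specusernum` would be bound to
    under the example expression."""
    return (specusernum // 4) * 8 + (specusernum % 4)


# Copies 0..11 land on the first four CPUs of each group of eight.
print([bound_cpu(n) for n in range(12)])
# [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
```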

Kernel Parameters (/etc/system):

autoup=<n> (Unix)
When the file system flush daemon fsflush runs, it will write to disk all
modified file buffers that are more than n seconds old.

tune_t_fsflushr=<n> (Unix)
Controls the number of seconds between runs of the file system flush daemon,