Copyright 2003, 2004 PathScale, Inc. All Rights Reserved.

PathScale EKO Compiler Suite (Fortran, C and C++ compilers)
flag descriptions, for SPEC CPU2000 submissions.

Portability Flags:

-DSPEC_CPU2000_LP64           Compile using LP64 programming model.
-DLINUX_i386                  Linux Intel system, use "long long" as the
                                  64-bit variable type.
-DHAS_ERRLIST                 Prog env provides specification for
                                  "sys_errlist[]".
-DNDEBUG                      Defining this disables any assert macros used for
                                  debugging.
-DSPEC_CPU2000_NEED_BOOL      Use SPEC provided definition of the boolean type.
-DSPEC_CPU2000_LINUX_I386     Compile for an I386 system running Linux.
-DSPEC_CPU2000_GLIBC22        Compatibility with 2.2 & later versions of glibc.
-DSYS_IS_USG                  Specifies that the operating system is
                                  USG compliant.
-DSYS_HAS_TIME_PROTO          Do not explicitly declare  time().
-DSYS_HAS_SIGNAL_PROTO        Do not explicitly #include <signal.h>
-DSYS_HAS_IOCTL_PROTO         Do not explicitly declare  ioctl().
-DSYS_HAS_ANSI                System is ANSI compliant.
-DSYS_HAS_CALLOC_PROTO        Do not explicitly declare  calloc().
-DHAVE_SIGNED_CHAR            System supports a "signed char" type.
-DWANT_STDC_PROTO             Use function prototypes as in standard C.
-fixedform                    Tells the f90 compiler to use fixed format
                                  (F77 72-column format) instead of F90 free
                                  format.


Optimization Flags:

Some suboptions either enable or disable a feature. To enable a feature,
either specify only the suboption name or specify =1, =ON, or =TRUE. To disable
a feature, specify =0, =OFF, or =FALSE.  These values are case-insensitive:
'on' and 'ON' mean the same thing. Below, ON and OFF indicate the enabling or
disabling of a feature.

-CG[:...]
              Code Generation option group: control the optimizations
              and transformations of the instruction-level code generator.

-CG:cflow=(ON|OFF)
              A value of OFF disables control flow optimization in the code
              generation. Default is ON.

-CG:gcm=(ON|OFF)
              Specifying OFF disables the instruction-level global code
              motion optimization phase. The default is ON.

-CG:load_exe=n
              Specifies the threshold for subsuming a memory load operation
              into the operand of an arithmetic instruction.  The value of 0
              turns off this subsumption optimization.  The default is 1,
              when this subsumption is performed only when the result of the
              load has only one use.  This subsumption is not performed if the
              number of times the result of the load is used exceeds the
              value n, a non-negative integer.

-CG:local_fwd_sched=(ON|OFF)
              Changes the instruction scheduling algorithm to work forward
              instead of backward for the instructions in each basic block.
              The default is OFF.

-CG:p2align=(ON|OFF)
              Align loop heads to 64-byte boundaries. The default is OFF.

-CG:p2align_freq=n
              Aligns branch targets based on execution frequency.  This option
              is meaningful only under feedback-directed compilation.  The
              default value n=0 turns off the alignment optimization.  Any
              other value specifies the frequency threshold at or above which
              this alignment will be performed by the compiler.

-CG:prefetch=(ON|OFF)
              Turning this OFF suppresses any generation of prefetch
              instructions in the code generator.  This has the same effect
              as -LNO:prefetch=0. The default is ON which implies using
              default prefetch algorithms.

-CG:use_movlpd=(ON|OFF)
              Makes the code generator use the MOVLPD SSE2 instruction
              instead of MOVSD. See Opteron's instruction description for the
              difference between these two instructions.  The default is OFF.

-fb_create <prefix for feedback data files>
              Used to specify that an instrumented executable program
              is to be generated. Such an executable is suitable for
              producing feedback data files with the specified prefix
              for use in feedback-directed compilation (FDO).  The commonly
              used prefix is "fbdata".  This is OFF by default.

-fb_opt <prefix for feedback data files>
              Used to specify feedback-directed compilation (FDO) by extracting
              feedback data from files with the specified prefix, which were
              previously generated using -fb_create.  The commonly used prefix
              is "fbdata".  This optimization is off by default.

-fno-exceptions
              Tells the compiler that the program does not use exception
              handling, so it can perform more aggressive optimization in
              the code.  The generation of exception handling constructs
              is also suppressed.  Under this flag, code that uses exception
              handling cannot be guaranteed to work correctly.  Note that
              the absence of exception handling constructs in a function
              does not mean that the function can be compiled with this
              flag.  For exception handling to work properly, the scopes
              crossed between throwing and catching an exception must
              all have been compiled with exceptions on.

-fno-math-errno
              Do not set errno after calling math functions that are executed
              with a single instruction, e.g., sqrt.  A program that relies
              on IEEE exceptions for math error handling may want to use this
              flag for speed while maintaining IEEE arithmetic compatibility.
              This is implied by -Ofast.  The default is -fmath-errno.

-INLINE:aggressive=(ON|OFF)
              Tells the compiler to be more aggressive about inlining.  The
              default is -INLINE:aggressive=OFF.

-IPA[:...]
              IPA option group:  control the inter-procedural analyses and
              transformations performed.  Note that giving just the group
              name without any options, i.e., -IPA, will invoke the
              interprocedural analyzer.  -IPA is off by default unless
              -Ofast is specified.

-ipa          Same as -IPA alone.

-IPA:callee_limit=(n)
              Functions whose size exceeds this limit will never be
              automatically inlined by the compiler.  The default is n=2000.

-IPA:ctype=(ON|OFF)
              Turns on optimizations that speed up interfaces to the constructs
              defined in ctype.h by assuming that the program will not be run
              in a multi-threaded environment.  The default is OFF.

-IPA:field_reorder=(ON|OFF)
              Enables the re-ordering of fields in large structs based
              on their reference patterns in feedback compilation to
              minimize data cache misses. The default is OFF.

-IPA:linear=(ON|OFF)
              Controls linearization of array references.  When inlining
              Fortran subroutines, IPA tries to map formal array parameters
              to the shape of the actual parameter.  It may not always be
              able to map them.  When it cannot map a parameter, it
              linearizes the array reference.  By default, it will not inline
              such call sites because they may cause performance problems.
              The default is OFF.

-IPA:plimit=(n)
              Inline calls to a procedure until the procedure has grown to
              a size of n.  The default is 2500.

-IPA:pu_reorder=(0|1|2)
              Controls the phase that optimizes the layout of the program
              units (functions) in the program.
              0 = Disables procedure reordering (default)
              1 = Reorder based on the frequency with which different
                  procedures are invoked.
              2 = Reorder based on caller-callee relationship.

-IPA:space=(n)
              Inline until a program expansion of n% is reached.  This defaults
              to 100.

-L/opt/acml2.1.0/gnu64/lib -lacml -lg2c
              The flags above are needed to use the PathScale compiler to link
              with the ACML (AMD Core Math Library) 2.0 library.  libg2c is the
              g77 runtime library needed to link with the gnu-compiled, 64-bit
              version of ACML that gets installed at /opt/acml2.0/gnu64
              by default. ACML is available as a free download from
              www.developwithamd.com/acml.  libg2c is in the g77 RPMs that ship
              with Linux distributions.

-LNO:
              Loop Nest Optimizer option group: specifies options and
              transformations performed on loop nests.  The -LNO: option
              group is enabled only if the -O3 option is also specified on
              the compiler command line.

-LNO:blocking[=(ON|OFF)]
              Enable/disable the cache blocking transformation.  The default
              is on at -O3 or higher.

-LNO:fission=(0|1|2)
              This option controls loop fission. The options can be one of the
              following:
              0 = Disables loop fission (default)
              1 = Performs normal fission as necessary
              2 = Specifies that fission be tried before fusion

              If -LNO:fission=1:fusion=1 or -LNO:fission=2:fusion=2 are
              specified, then fusion is performed.

-LNO:full_unroll,fu=N
              Fully unroll innermost loops with trip_count <= N inside LNO.
              N can be any integer between 0 and 100.  The default value for N
              is 5.  Setting this flag to 0 disables full unrolling of small
              trip count loops inside LNO.

-LNO:full_unroll_size=N
              Fully unroll innermost loops with unrolled loop size <= N inside
              LNO.  N can be any integer between 0 and 10000. The conditions
              implied by the full_unroll option must also be satisfied for
              the loop to be fully unrolled. The default value for N is 1600.

-LNO:full_unroll_outer=(ON|OFF)
              Fully unroll outer innermost loops (i.e., stand-alone loops not
              belonging to any loop nest) with known trip count.  The
              conditions implied by both the full_unroll and the
              full_unroll_size options must be satisfied for the loop to be
              fully unrolled. The default is OFF.

-LNO:fusion=n
              Perform loop fusion, n: 0 - off, 1 - conservative,
              2 - aggressive.  The default is 1.

-LNO:prefetch[=(0|1|2|3)]
              Specify level of prefetching.
                   0 = Prefetch disabled.
                   1 = Prefetch is done only for arrays that are always
                       referenced in each iteration of a loop, the default.
                   2 = Prefetch is done without the above restrictions.
                   3 = Most aggressive.

-LNO:prefetch_ahead=n
              Prefetch n cache line(s) ahead.  The default is 2.

-m32
              Generates code according to the 32-bit ABI, also known as x86
              or IA32.

-m3dnow
              Enable use of 3DNow instructions. The default is OFF.

-O or -O2
              Turn on extensive optimization.  The optimizations at this level
              are generally conservative, in the sense that they (1) are
              virtually always beneficial, (2) provide improvements
              commensurate to the compile time spent to achieve them,
              and (3) avoid changes which affect such things as floating
              point accuracy.

-O3
              Turn on aggressive optimization.  The optimizations at this level
              are distinguished from -O2 by their aggressiveness, generally
              seeking highest-quality generated code even if it requires
              extensive compile time.  They may include optimizations which are
              generally beneficial but occasionally hurt performance.
              This includes but is not limited to turning on the Loop Nest
              Optimizer, -LNO:opt=1, and setting
              -OPT:ro=1:IEEE_arith=2:Olimit=9000.

-Ofast
              Equivalent to "-O3 -ipa -OPT:Ofast -fno-math-errno".
              -OPT:Ofast is described below.

-OPT:alias=<name>
              Specifies the pointer aliasing model to be used.  By
              specifying one or more of the following for <name>, the
              compiler is able to make assumptions throughout the compilation:

              typed        Assume that the code adheres to the ANSI/ISO C
                           standard which states that two pointers of different
                           types cannot point to the same location in memory.
                           This is on by default when -Ofast is specified.

              restrict     Specifies that distinct pointers are assumed
                           to point to distinct, non-overlapping objects.
                           This is off by default.

              disjoint     Specifies that any two pointer expressions are
                           assumed to point to distinct, non-overlapping
                           objects.  This is off by default.

-OPT:div_split=(ON|OFF)
              Enable/disable changing x/y into x*(recip(y)).  This is
              OFF by default but is enabled by -OPT:Ofast or
              -OPT:IEEE_arithmetic=3.

-OPT:fast_complex=(ON|OFF)
              Setting fast_complex=ON enables fast calculations for values
              declared to be of type complex.  When this is set to ON,
              complex absolute value (norm) and complex division use fast
              algorithms that are more likely to overflow or underflow than
              the standard algorithms.  OFF is the default.  fast_complex=ON
              is enabled if -OPT:roundoff=3 is in effect.

-OPT:goto=(OFF|ON)
              Disable/enable the conversion of GOTOs into higher level
              structures like FOR loops.  The default is ON for -O2 or higher.

-OPT:IEEE_arithmetic,IEEE_arith=(n)
              Specify the level of conformance to IEEE 754 floating-point
              roundoff/overflow behavior.  n can be one of the following:

              1   Adheres to IEEE accuracy.  This is the default when
                  optimization levels -O0, -O1 and -O2 are in effect.

              2   May produce inexact results not conforming to IEEE 754.
                  This is the default when -O3 is in effect.

              3   All mathematically valid transformations are allowed.

-OPT:IEEE_NaN_Inf=(ON|OFF)
              OFF specifies non-IEEE-754 results in operations that might
              have IEEE 754 NaN or infinity operands; this enables many
              optimizations which would be invalid for NaN or infinity
              operands.  The default is ON.

-OPT:transform_to_memlib=(ON|OFF)
              When ON, this option enables transformation of loop constructs
              to calls to memcpy or memset. Default is ON when target
              processor is EM64T, OFF otherwise.

-OPT:Ofast
              Use optimizations selected to maximize performance.
              Although the optimizations are generally safe,
              they may affect floating point accuracy due to rearrangement
              of computations.  This effectively turns on the following
              optimizations:
                -OPT:ro=2:Olimit=0:div_split=ON:alias=typed -TARG:msse2=on

-OPT:Olimit=(n)
              Disable optimization when size of program unit is > n. When n
              is 0, program unit size is ignored and optimization process
              will not be disabled due to compile time limit.  The default is
              0 when -Ofast is specified, otherwise the default is 6000
              under -O2 and 9000 under -O3.

-OPT:roundoff,ro=(n)
              Specifies the level of acceptable departure from source
              language floating-point, round-off, and overflow semantics. n
              can be one of the following:

              0   Inhibits optimizations that might affect the
                  floating-point behavior.  This is the default when
                  optimization levels -O0, -O1, and -O2 are in effect.

              1   Allows simple transformations that might cause limited
                  round-off or overflow differences.  Compounding such
                  transformations could have more extensive effects.
                  This is the default level when -O3 is in effect.

              2   Allows more extensive transformations, such as the
                  reordering of reduction loops.  This is the default
                  level when -Ofast is specified.

              3   Enables any mathematically valid transformation.

-OPT:treeheight=(ON|OFF)
              The value ON turns on re-association in expressions to reduce
              the expressions' tree height.  The default value is OFF.

-OPT:unroll_times_max,unroll_times=(n)
              Unroll inner loops by a maximum of n.  The default is 4.

-OPT:unroll_size=(n)
              Sets the ceiling of maximum number of instructions for an
              unrolled inner loop. If n = 0, the ceiling is disregarded.

-TENV:X=(0|1|2|3|4)
              Specify the level of enabled exceptions that will be assumed
              for purposes of performing speculative code motion (default
              is 1 at all optimization levels).  In general, an instruction
              will not be speculated (i.e. moved above a branch by the
              optimizer) unless any exceptions it might cause are disabled
              by this option.  At level 0, no speculative code motion may
              be performed.  At level 1, safe speculative code motion may
              be performed, with IEEE-754 underflow and inexact exceptions
              disabled.  At level 2, all IEEE-754 exceptions are disabled
              except divide by zero.  At level 3, all IEEE-754 exceptions
              are disabled including divide by zero.  At level 4, memory
              exceptions may be disabled or ignored.

-TENV:frame_pointer=(ON|OFF)
              Default is ON for C++ and OFF otherwise.
              Local variables in the function stack frame are addressed via
              the frame pointer register.  Ordinarily, the compiler will
              replace this use of frame pointer by addressing local variables
              via the stack pointer when it determines that the stack pointer
              is fixed throughout the function invocation.  This frees up the
              frame pointer for other purposes.  Turning this flag on forces
              the compiler to use the frame pointer to address local variables.
              This flag defaults to on for C++ because the exception handling
              mechanism relies on the frame pointer register being used to
              address local variables.  This flag can be turned off for C++
              for programs that do not throw exceptions.

-WOPT:aggstr=(ON|OFF) 
              ON instructs the scalar optimizer to perform aggressive strength
              reduction, in which all induction expressions within a loop are
              replaced by temporaries that are incremented together with
              the loop variable.  When OFF, strength reduction is only
              performed for non-trivial induction expressions.  Turning this
              off sometimes can improve performance when registers are scarce.

-WOPT:mem_opnds=(ON|OFF)
              ON makes the scalar optimizer preserve any memory operands of
              arithmetic operations so as to promote subsumption of memory
              loads into the operands of arithmetic operations.  The default
              is OFF.

-WOPT:retype_expr=(ON|OFF)
              ON enables the optimization in the compiler that converts 64-bit
              address computation to use 32-bit arithmetic as much as
              possible.  The default is OFF.

-WOPT:val=(0|1|2)
              Controls the number of times the value-numbering optimization is
              performed in the global optimizer, with the default being 1.
              This optimization tries to recognize expressions that will
              compute identical run-time values and changes the program to
              avoid re-computing them.


Additional Remarks
==================

BIOS Setting Definitions
  DRAM Interleave
    This entry defines whether or not data will be interleaved among the four
    data banks within individual DRAMs.

  Node Memory Interleave
    This entry defines whether or not data addresses will be alternating
    between both processors in 4KB blocks.

  ACPI SRAT Table
    This entry defines whether or not the Static Resource Affinity Table is
    exported by the BIOS to a location where the operating system can see it.
    The SRAT may only be exported when Node Memory Interleave is disabled.