########################################################################
#07/20/2005
#Acer Incorporated.
#SPEC CPU2000 v1.2 Flag Descriptions
#PathScale EKOPath(TM) Compiler Suite v2.1
#
########################################################################

PathScale EKOPath(TM) Compiler Suite (Fortran, C and C++ compilers)
flag descriptions, for SPEC CPU2000 submissions.

Portability Flags:

-DSPEC_CPU2000_LP64 Compile using LP64 programming model.
-DLINUX_i386 Linux Intel system, use "long long" as
the 64-bit variable type.
-DHAS_ERRLIST Programming environment provides specification for
"sys_errlist[]".
-DFMAX_IS_DOUBLE Specifies whether FMAX is double or float.
-DNDEBUG Defining this disables any assert macros used for
debugging.
-DSPEC_CPU2000_NEED_BOOL Use SPEC provided definition of the boolean type.
-DSPEC_CPU2000_LINUX_I386 Compile for an I386 system running Linux.
-DSPEC_CPU2000_GLIBC22 Compatibility with 2.2 and later versions of glibc.
-DSYS_IS_USG Specifies that the operating system is
USG compliant.
-DSYS_HAS_TIME_PROTO Do not explicitly declare time().
-DSYS_HAS_SIGNAL_PROTO Do not explicitly #include <signal.h>
-DSYS_HAS_IOCTL_PROTO Do not explicitly declare ioctl().
-DSYS_HAS_ANSI System is ANSI compliant.
-DSYS_HAS_CALLOC_PROTO Do not explicitly declare calloc().
-fixedform tells f90 compiler to use fixed format
(F77 72 column format), instead of F90 free format.
-DHAVE_SIGNED_CHAR System supports a "signed char" type.
-DWANT_STDC_PROTO Use function prototypes as in standard C.

Optimization Flags:

Some suboptions either enable or disable a feature. To enable a feature,
either specify only the suboption name or specify =1, =ON, or =TRUE. To
disable a feature, specify =0, =OFF, or =FALSE. These values are
case-insensitive: 'on' and 'ON' mean the same thing. Below, ON and OFF indicate
the enabling or disabling of a feature.
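
The value rules above can be sketched in C. This is only an illustration of
the documented spellings, not the compiler's own option parser; the function
names suboption_enabled and equal_nocase are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two NUL-terminated strings. */
static int equal_nocase(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/*
 * Hypothetical helper: interprets the text that follows "=" in a suboption.
 * Returns 1 (enable), 0 (disable), or -1 if the value is not one of the
 * spellings documented above.
 */
int suboption_enabled(const char *value)
{
    /* A bare suboption name (no "=value") enables the feature. */
    if (value == NULL || *value == '\0')
        return 1;
    if (equal_nocase(value, "1") || equal_nocase(value, "on") ||
        equal_nocase(value, "true"))
        return 1;
    if (equal_nocase(value, "0") || equal_nocase(value, "off") ||
        equal_nocase(value, "false"))
        return 0;
    return -1;
}
```

Under this sketch, "ON", "on", and a bare suboption name all enable a
feature, while "OFF", "off", "0", and "FALSE" all disable it.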

-CG[:...]
Code Generation option group: control the optimizations
and transformations of the instruction-level code generator.

-CG:cflow=(ON|OFF)
A value of OFF disables control flow optimization in the code
generator. The default is ON.

-CG:gcm=(ON|OFF)
Specifying OFF disables the instruction-level global code
motion optimization phase. The default is ON.

-CG:load_exe=n
Specifies the threshold for subsuming a memory load operation into
the operand of an arithmetic instruction. A value of 0 turns
off this subsumption optimization. The default is 1, in which case
subsumption is performed only when the result of the load has only
one use. Subsumption is not performed if the number of times
the result of the load is used exceeds n, a non-negative
integer.

-CG:local_fwd_sched=(ON|OFF)
Changes the instruction scheduling algorithm to work forward
instead of backward for the instructions in each basic block.
The default is OFF.

-CG:p2align=(ON|OFF)
Align loop heads to 64-byte boundaries. The default is
OFF.

-CG:p2align_freq=n
Aligns branch targets based on execution frequency. This option
is meaningful only under feedback-directed compilation. The
default value n=0 turns off the alignment optimization. Any
other value specifies the frequency threshold at or above which
this alignment will be performed by the compiler.

-CG:prefetch=(ON|OFF)
Turning this OFF suppresses any generation of prefetch instructions
in the code generator. This has the same effect as -LNO:prefetch=0.
The default is ON which implies using default prefetch algorithms.

-CG:prefetchnta=(ON|OFF)
Prefetch when data is non-temporal at all levels of the cache
hierarchy. This is for data streaming situations in which the
data will not need to be re-used soon. The default is OFF.

-fb_create <prefix for feedback data files>
Used to specify that an instrumented executable program
is to be generated. Such an executable is suitable for
producing feedback data files with the specified prefix
for use in feedback-directed compilation (FDO). The commonly
used prefix is "fbdata". This is OFF by default.

-fb_opt <prefix for feedback data files>
Used to specify feedback-directed compilation (FDO) by extracting
feedback data from files with the specified prefix, which were
previously generated using -fb_create. The commonly used prefix
is "fbdata". This optimization is off by default.

-fb_phase=(0,1,2,3,4)
Used to specify the compilation phase at which instrumentation
for the collection of profile data is performed, so is useful
only when used with -fb_create. The values must be in the
range 0 to 4. The default value is 0, and specifies the
earliest phase for instrumentation, which is after the
front-end processing.

-fno-exceptions
Tells the compiler that the program does not use exception
handling, so it can perform more aggressive optimization in
the code. The generation of exception handling constructs
is also suppressed. Under this flag, code that uses exception
handling cannot be guaranteed to work correctly. Note that
the absence of exception handling constructs in a function does not mean
that the function can be compiled with this flag. For
exception handling to work properly, the scopes
crossed between throwing and catching an exception must
all have been compiled with exceptions on.

-fno-math-errno
Do not set ERRNO after calling math functions that are executed
with a single instruction, e.g., sqrt. A program that relies
on IEEE exceptions for math error handling may want to use this
flag for speed while maintaining IEEE arithmetic compatibility.
This is implied by -Ofast. The default is -fmath-errno.

-GRA:home=(ON|OFF)
Turns on or off the rematerialization optimization for non-local
user variables in the Global Register Allocator. The default
value is ON.

-INLINE:aggressive=(ON|OFF)
Tells the compiler to be more aggressive about inlining. The
default is -INLINE:aggressive=OFF.

-IPA[:...]
IPA option group: control the inter-procedural analyses and
transformations performed. Note that giving just the group name
without any options, i.e., -IPA, will invoke the interprocedural
analyzer. -IPA is off by default unless -Ofast is specified.

-ipa Same as -IPA alone.

-IPA:callee_limit=(n)
Functions whose size exceeds this limit will never be
automatically inlined by the compiler. The default is n=2000.

-IPA:ctype=(ON|OFF)
Turns on optimizations that speed up interfaces to the constructs
defined in ctype.h by assuming that the program will not be run
in a multi-threaded environment. The default is OFF.

-IPA:field_reorder=(ON|OFF)
Enables the re-ordering of fields in large structs based
on their reference patterns in feedback compilation to
minimize data cache misses. The default is OFF.

-IPA:linear=(ON|OFF)
Controls conversion of a multi-dimensional array to a single
dimensional (linear) array that covers the same block of memory.
When inlining Fortran subroutines, IPA tries to map formal
array parameters to the shape of the actual parameter. In the
case that it cannot map the parameter, it linearizes the array
reference. By default, IPA will not inline such callsites
because they may cause performance problems. The default is OFF.

-IPA:min_hotness=(n)
When feedback information is available, a call site to a
procedure must be invoked with a frequency that exceeds
the threshold specified by n before the procedure
will be inlined at that call site. The default is n=10.

-IPA:plimit=(n)
Inline calls to a procedure until the procedure has grown to
a size of n. The default is n=2500.

-IPA:pu_reorder=(0|1|2)
Controls the phase that optimizes the layout of the program
units (functions) in the program.
0 = Disables procedure reordering (default)
1 = Reorder based on the frequency in which different
procedures are invoked.
2 = Reorder based on caller-callee relationship.

-IPA:small_pu=(n)
A procedure with size smaller than n is not subjected to the
plimit restriction. The default is n=30.

-IPA:space=(n)
Inline until a program expansion of n% is reached. This defaults
to 100.

-IPA:use_intrinsic[=(ON|OFF)]
Enable/disable loading the intrinsic version of standard library
functions. The default is OFF.

-L/opt/acml2.5.1/pathscale64/lib -lacml
The flags above are needed to use the PathScale compiler to link
with the ACML (AMD Core Math Library) 2.5.1 library. The
PathScale-compiled, 64-bit version of ACML is installed
at /opt/acml2.5.1/pathscale64 by default. ACML is available as a free
download from http://www.developwithamd.com/acml.

-LANG:short_circuit_conditionals=(ON|OFF)
Handles .AND. and .OR. via short-circuiting, in which the
second operand is not evaluated if unnecessary, even if it
contains side effects. Applies only to Fortran. Default is ON.

-LNO:
Loop Nest Optimizer option group: controls the optimizations and
transformations performed on loop nests. The -LNO: option group is
enabled only if the -O3 option is also specified on the compiler
command line.

-LNO:blocking[=(ON|OFF)]
Enable/disable the cache blocking transformation. The default
is on at -O3 or higher.

-LNO:fission=(0|1|2)
This option controls loop fission. The options can be one of the
following:

0 = Disables loop fission (default)

1 = Performs normal fission as necessary

2 = Specifies that fission be tried before fusion

If -LNO:fission=1:fusion=1 or -LNO:fission=2:fusion=2 is
specified, then fusion is performed.
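
As a rough illustration of what fission and fusion mean at the source level,
the hand-written C sketch below shows the same computation as two separate
loops (the shape fission produces) and as a single loop (the shape fusion
produces). The function names are hypothetical and the code is not compiler
output; it assumes the two arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Split form: two loops over the same range, as produced by fission. */
void update_split(double *a, double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * 2.0;
    for (size_t i = 0; i < n; ++i)
        b[i] = a[i] + 1.0;
}

/* Fused form: one loop performs both updates, as produced by fusion. */
void update_fused(double *a, double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        a[i] = a[i] * 2.0;
        b[i] = a[i] + 1.0;
    }
}
```

Both forms compute the same values. The fused form reads a[i] while it is
still in a register or cache; the split form keeps each loop body smaller,
which can help register allocation.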

-LNO:full_unroll,fu=N
Fully unroll innermost loops with trip_count <= N inside LNO.
N can be any integer between 0 and 100. The default value for N
is 5. Setting this flag to 0 disables full unrolling of small
trip count loops inside LNO.
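
The effect of full unrolling can be sketched by hand in C. The example
assumes a loop with a constant trip count of 4, which is within the default
threshold of 5; TRIP_COUNT and the function names are hypothetical, and the
unrolled version is written manually rather than taken from compiler output.

```c
#include <assert.h>
#include <stddef.h>

#define TRIP_COUNT 4   /* hypothetical trip count, below the default N=5 */

/* Rolled form: counter update, compare, and branch on every iteration. */
double dot_rolled(const double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    double s = 0.0;
    for (size_t i = 0; i < TRIP_COUNT; ++i)
        s += a[i] * b[i];
    return s;
}

/* Fully unrolled form: same operations in the same order, no loop overhead. */
double dot_unrolled(const double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    double s = 0.0;
    s += a[0] * b[0];
    s += a[1] * b[1];
    s += a[2] * b[2];
    s += a[3] * b[3];
    return s;
}
```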

-LNO:full_unroll_size=N
Fully unroll innermost loops with unrolled loop size <= N inside
LNO. N can be any integer between 0 and 10000. The conditions
implied by the full_unroll option must also be satisfied for
the loop to be fully unrolled. The default value for N is 1600.

-LNO:full_unroll_outer=(ON|OFF)
Control the full unrolling of loops with known trip count that
do not contain a loop and are not contained in a loop. The
conditions implied by both the full_unroll and the
full_unroll_size options must be satisfied for the loop to be
fully unrolled. The default is OFF.

-LNO:fusion=n
Perform loop fusion, n: 0 - off, 1 - conservative, 2 - aggressive.
The default is 1.

-LNO:interchange[=(ON|OFF)]
Specifying OFF disables the loop interchange transformation in
the loop nest optimizer. Default is ON.

-LNO:opt=n
This option controls the LNO optimization level. n can be one of
the following:
0 = Disables nearly all loop nest optimizations
1 = Performs full loop nest transformations (default)

-LNO:ou_prod_max=n
Indicates that the product of unrolling of the various outer
loops in a given loop nest is not to exceed n, where n is a
positive integer. The default is 16.

-LNO:outer_unroll_max,ou_max=(n)
Outer_unroll_max indicates that the compiler may unroll outer
loops in a loop nest by as many as n per loop, but no more.
The default is 4.

-LNO:prefetch[=(0|1|2|3)]
Specify level of prefetching.
0 = Prefetch disabled.
1 = Prefetch is done only for arrays that are always
referenced in each iteration of a loop, the default.
2 = Prefetch is done without the above restrictions.
3 = Most aggressive.

-LNO:prefetch_ahead=n
Prefetch n cache line(s) ahead. The default is 2.

-LNO:sclrze=(ON|OFF)
Turns on/off the optimization that replaces an array by a scalar
variable. The default is ON.

-m32
Generates code according to the 32-bit ABI, also known as x86
or IA32.

-m3dnow Enable use of 3DNow instructions. The default is OFF.

-msse2
Enable SSE2 extension. This is the default under -m64 or
-OPT:Ofast. Under -m32, the default is -mno-sse2.

-O or -O2
Turn on extensive optimization. The optimizations at this level are
generally conservative, in the sense that they (1) are virtually
always beneficial, (2) provide improvements commensurate to the
compile time spent to achieve them, and (3) avoid changes which
affect such things as floating point accuracy.

-O3
Turn on aggressive optimization. The optimizations at this level
are distinguished from -O2 by their aggressiveness, generally
seeking highest-quality generated code even if it requires extensive
compile time. They may include optimizations which are generally
beneficial but occasionally hurt performance. This includes but
is not limited to turning on the Loop Nest Optimizer, -LNO:opt=1,
and setting -OPT:ro=1:IEEE_arith=2:Olimit=9000.

-Ofast Equivalent to "-O3 -ipa -OPT:Ofast -fno-math-errno." -OPT:Ofast is
described below.

-OPT:alias=<name>
Specifies the pointer aliasing model to be used. By
specifying one or more of the following for <name>, the
compiler is able to make assumptions throughout the compilation:
typed Assume that the code adheres to the ANSI/ISO C
standard which states that two pointers of different
types cannot point to the same location in memory.
This is on by default when -Ofast is specified.

restrict Specifies that distinct pointers are assumed
to point to distinct, non-overlapping objects.
This is off by default.

disjoint Specifies that any two pointer expressions are
assumed to point to distinct, non-overlapping objects.
This is off by default.

-OPT:div_split=(ON|OFF)
Enable/disable changing x/y into x*(recip(y)). This is
OFF by default but is enabled by -OPT:Ofast or
-OPT:IEEE_arithmetic=3.
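
The x/y to x*(recip(y)) rewrite can be illustrated with a hand-written C
sketch. Here the reciprocal is simply computed as 1.0 / y, which is an
assumption made for illustration; how the compiler actually obtains the
reciprocal is not described in this file. The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: one division per element. */
void scale_div(double *out, const double *x, double y, size_t n)
{
    assert(out != NULL && x != NULL && y != 0.0);
    for (size_t i = 0; i < n; ++i)
        out[i] = x[i] / y;
}

/* Split form: one reciprocal up front, then one multiplication per element. */
void scale_recip(double *out, const double *x, double y, size_t n)
{
    assert(out != NULL && x != NULL && y != 0.0);
    const double r = 1.0 / y;
    for (size_t i = 0; i < n; ++i)
        out[i] = x[i] * r;
}
```

When 1.0 / y is exactly representable (for example y = 2.0) the two forms
agree bit for bit. For other divisors the split form can differ in the last
bits of the result, which is why the transformation is tied to the relaxed
floating-point settings named above.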

-OPT:fast_complex=(ON|OFF)
Setting fast_complex=ON enables fast calculations for values
declared to be of type complex. When this is set to ON,
complex absolute value (norm) and complex division use fast
algorithms that are more likely to overflow or underflow than
the standard algorithms. OFF is the default. fast_complex=ON
is enabled if -OPT:roundoff=3 is in effect.

-OPT:goto=(OFF|ON)
Disable/enable the conversion of GOTOs into higher level
structures like FOR loops. The default is ON for -O2 or higher.

-OPT:IEEE_arithmetic,IEEE_arith=(n)
Specifies the level of conformance to IEEE 754 floating-point
roundoff/overflow behavior. n can be one of the following:

1 Adheres to IEEE accuracy. This is the default when
optimization levels -O0, -O1 and -O2 are in effect.

2 May produce inexact results not conforming to IEEE 754.
This is the default when -O3 is in effect.

3 All mathematically valid transformations are allowed.

-OPT:IEEE_NaN_Inf=(ON|OFF)
OFF specifies non-IEEE-754 results in operations that might
have IEEE 754 NaN or infinity operands; this enables many
optimizations which would be invalid for NaN or infinity
operands. The default is ON.

-OPT:transform_to_memlib=(ON|OFF)
When ON, this option enables transformation of loop constructs
to calls to memcpy or memset. Default is ON when target
processor is EM64T, OFF otherwise.
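
A hand-written C sketch of the kind of loop this transformation targets,
next to the library call it would be replaced with. The function names are
hypothetical; the pairing shown is an illustration, not compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Loop form: clears a buffer one byte at a time. */
void clear_loop(unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i)
        buf[i] = 0;
}

/* Library form: the same effect expressed as a memset call. */
void clear_memset(unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    memset(buf, 0, n);
}

/* Loop form: copies n ints between non-overlapping arrays. */
void copy_loop(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Library form: the same effect expressed as a memcpy call. */
void copy_memcpy(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof *dst);
}
```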

-OPT:Ofast
Use optimizations selected to maximize performance.
Although the optimizations are generally safe,
they may affect floating point accuracy due to rearrangement
of computations. This effectively turns on the following
optimizations:
-OPT:ro=2:Olimit=0:div_split=ON:alias=typed -TARG:msse2=on

-OPT:Olimit=(n)
Disable optimization when size of program unit is > n. When n
is 0, program unit size is ignored and optimization process
will not be disabled due to compile time limit. The default is
0 when -Ofast is specified, otherwise the default is 6000
under -O2 and 9000 under -O3.

-OPT:roundoff,ro=(n)
Specifies the level of acceptable departure from source
language floating-point, round-off, and overflow semantics. n
can be one of the following:

0 Inhibits optimizations that might affect the
floating-point behavior. This is the default when
optimization levels -O0, -O1, and -O2 are in effect.

1 Allows simple transformations that might cause limited
round-off or overflow differences. Compounding such
transformations could have more extensive effects.
This is the default level when -O3 is in effect.

2 Allows more extensive transformations, such as the
reordering of reduction loops. This is the default
level when -Ofast is specified.

3 Enables any mathematically valid transformation.
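
Why reordering a reduction loop is a floating-point accuracy question can be
seen in a small hand-written C sketch. The second function sums even and odd
elements in separate accumulators, which is one possible reordering chosen
here for illustration; the function names are hypothetical and this is not
necessarily the reordering the compiler performs.

```c
#include <assert.h>
#include <stddef.h>

/* Source order: a single accumulator, strictly left to right. */
double sum_in_order(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Reordered: even and odd elements accumulate separately, then combine. */
double sum_two_accumulators(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double even = 0.0, odd = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += x[i];
        odd += x[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        even += x[i];
    return even + odd;
}
```

For the input {1e20, 1.0, -1e20, 1.0} in IEEE 754 double precision, the
in-order sum is 1.0 because the first 1.0 is absorbed by 1e20, while the
two-accumulator sum is 2.0. Both are valid evaluations of the same
mathematical expression, but they are not bit-identical.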

-OPT:treeheight=(ON|OFF)
The value ON turns on re-association in expressions to reduce
the expressions' tree height. The default value is OFF.

-OPT:unroll_analysis=(ON|OFF)
The default value of ON lets the compiler analyze the
content of the loop to determine the best unrolling
parameters, instead of strictly adhering to the
-OPT:unroll_times_max and -OPT:unroll_size parameters.

-OPT:unroll_times_max,unroll_times=(n)
Unroll inner loops by a maximum of n. The default is 4.

-OPT:unroll_size=(n)
Sets the ceiling of maximum number of instructions for an
unrolled inner loop. If n = 0, the ceiling is disregarded.

-static
Suppresses dynamic linking at run-time for shared libraries;
uses static linking instead.

-TARG:msse2[=(ON|OFF)]
ON enables the use of scalar floating point instructions
present in the SSE instruction set. The default is ON
for the 64-bit ABI, and OFF for the 32-bit ABI.

-TENV:X=(0|1|2|3|4)
Specify the level of enabled exceptions that will be assumed
for purposes of performing speculative code motion (default
is 1 at all optimization levels). In general, an instruction
will not be speculated (i.e. moved above a branch by the
optimizer) unless any exceptions it might cause are disabled
by this option. At level 0, no speculative code motion may
be performed. At level 1, safe speculative code motion may
be performed, with IEEE-754 underflow and inexact exceptions
disabled. At level 2, all IEEE-754 exceptions are disabled
except divide by zero. At level 3, all IEEE-754 exceptions
are disabled including divide by zero. At level 4, memory
exceptions may be disabled or ignored.

-TENV:frame_pointer=(ON|OFF)
Default is ON for C++ and OFF otherwise.
Local variables in the function stack frame are addressed via
the frame pointer register. Ordinarily, the compiler will
replace this use of frame pointer by addressing local variables
via the stack pointer when it determines that the stack pointer
is fixed throughout the function invocation. This frees up the
frame pointer for other purposes. Turning this flag on forces
the compiler to use the frame pointer to address local variables.
This flag defaults to on for C++ because the exception handling
mechanism relies on the frame pointer register being used to
address local variables. This flag can be turned off for C++
for programs that do not throw exceptions.

-Wl,-x
Passes the -x option to the linker. With this flag set, the
linker will not preserve local (non-global) symbols in the output
symbol table. The linker enters external and static symbols
only. This option conserves space in the output file. This is
OFF by default.

-WOPT:aggstr=(ON|OFF)
ON instructs the scalar optimizer to perform aggressive strength
reduction, in which all induction expressions within a loop are
replaced by temporaries that are incremented together with
the loop variable. When OFF, strength reduction is only
performed for non-trivial induction expressions. Turning this
off sometimes can improve performance when registers are scarce.
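
Strength reduction of an induction expression can be sketched by hand in C.
In the second function the product i * stride is replaced by a temporary
that is incremented together with the loop variable, as the description
above outlines. The function names are hypothetical and the code is not
compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the induction expression i * stride is recomputed each trip. */
long sum_strided_mul(const int *a, size_t count, size_t stride)
{
    assert(a != NULL);
    long s = 0;
    for (size_t i = 0; i < count; ++i)
        s += a[i * stride];
    return s;
}

/* Strength-reduced form: idx tracks i * stride using additions only. */
long sum_strided_add(const int *a, size_t count, size_t stride)
{
    assert(a != NULL);
    long s = 0;
    size_t idx = 0;
    for (size_t i = 0; i < count; ++i, idx += stride)
        s += a[idx];
    return s;
}
```

The extra temporary idx occupies a register for the whole loop, which is the
cost that can outweigh the benefit when registers are scarce.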

-WOPT:if_conv=(ON|OFF)
Enables the translation of simple IF statements to conditional
move instructions in the target CPU. Default is
ON.

-WOPT:mem_opnds=(ON|OFF)
ON makes the scalar optimizer preserve any memory operands of
arithmetic operations so as to promote subsumption of memory
loads into the operands of arithmetic operations. The default
is OFF.

-WOPT:retype_expr=(ON|OFF)
ON enables the optimization in the compiler that converts 64-bit
address computation to use 32-bit arithmetic as much as
possible. The default is OFF.

-WOPT:val=(0|1|2)
Controls the number of times the value-numbering optimization is
performed in the global optimizer, with the default being 1.
This optimization tries to recognize expressions that will
compute identical run-time values and changes the program to avoid
re-computing them.

----------------------------------------------------------------------------
Other Notes
----------------------------------------------------------------------------

/usr/bin/taskset [options] [mask] [pid | command [arg] ... ]

taskset is used to set or retrieve the CPU affinity of a running process given its
PID or to launch a new COMMAND with a given CPU affinity. The CPU affinity is
represented as a bitmask, with the lowest order bit corresponding to the first logical
CPU and the highest order bit corresponding to the last logical CPU.
When taskset returns, it is guaranteed that the given program has been scheduled to
a legal CPU.
The default behaviour of taskset is to run a new command with a given affinity mask:
taskset [mask] [command] [arguments]

The taskset command is used in the following form in the config file:

submit= MYMASK=`printf '0x%x' \$((1<<\$SPECUSERNUM))`; /usr/bin/taskset \$MYMASK $command

$MYMASK is the bitmask (in hexadecimal) corresponding to a specific SPECUSERNUM. For example, the $MYMASK value for the first copy
of a rate run will be 0x00000001, for the second copy of the rate run it will be 0x00000002,
etc. Thus, the first copy of the rate run will have a CPU affinity of CPU0, the second copy will have an
affinity of CPU1, etc.
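
The mask arithmetic in the submit line can be mirrored in a short C sketch.
It assumes, as the example values above suggest, that copy numbers start at
0; the function names are hypothetical. The "0x%x" format is the one used by
the printf call in the submit line, so it prints the mask without leading
zeros.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Mask with only the bit for the given copy number set (copy 0 -> bit 0). */
unsigned affinity_mask(unsigned copy)
{
    assert(copy < sizeof(unsigned) * CHAR_BIT);   /* keep the shift in range */
    return 1u << copy;
}

/* Formats the mask into buf with "0x%x"; returns the length, as snprintf does. */
int format_affinity_mask(char *buf, size_t size, unsigned copy)
{
    assert(buf != NULL);
    return snprintf(buf, size, "0x%x", affinity_mask(copy));
}
```

For copy numbers 0, 1, and 2 this yields masks 0x1, 0x2, and 0x4, which are
the same values as 0x00000001, 0x00000002, and 0x00000004 and select CPU0,
CPU1, and CPU2 respectively.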