PathScale EKOPath Compiler Suite 2.4

PathScale EKOPath(TM) Compiler Suite (Fortran, C and C++ compilers)
flag descriptions for SPEC CPU2000 submissions.

Portability Flags:

-DSPEC_CPU2000_LP64 Compile using the LP64 programming model.
-DLINUX_i386 Linux on an Intel (i386) system; use "long long" as
the 64-bit integer type.
-DHAS_ERRLIST The programming environment provides a declaration of
"sys_errlist[]".
-DSPEC_CPU2000_NEED_BOOL Use SPEC provided definition of the boolean type.
-DSPEC_CPU2000_LINUX_I386 Compile for an i386 system running Linux.
-DSYS_IS_USG Specifies that the operating system is
USG compliant.
-DSYS_HAS_TIME_PROTO Do not explicitly declare time().
-DSYS_HAS_IOCTL_PROTO Do not explicitly declare ioctl().
-DSYS_HAS_CALLOC_PROTO Do not explicitly declare calloc().
-fixedform Tells the Fortran 90 compiler to use fixed source form
(F77 72-column format) instead of F90 free form.
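The macros above are consumed by preprocessor guards in the program sources. A minimal sketch of what such guards can look like, assuming these macro names; the typedef name spec_int64 and the function now_seconds are hypothetical, and the actual SPEC CPU2000 sources may structure this differently:

```c
#include <time.h>

/* Hypothetical 64-bit integer typedef: -DLINUX_i386 selects
   "long long", while under -DSPEC_CPU2000_LP64 plain "long"
   is already 64 bits wide. */
#if defined(SPEC_CPU2000_LP64) && !defined(LINUX_i386)
typedef long spec_int64;
#else
typedef long long spec_int64;
#endif

/* -DSYS_HAS_TIME_PROTO means <time.h> already declares time(),
   so the program skips its own declaration. */
#ifndef SYS_HAS_TIME_PROTO
extern time_t time(time_t *);
#endif

/* Hypothetical helper: current calendar time in the 64-bit type. */
spec_int64 now_seconds(void)
{
    return (spec_int64)time(NULL);
}
```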

Optimization Flags:

Some suboptions either enable or disable a feature. To enable a feature,
either specify only the suboption name or specify =1, =ON, or =TRUE. To
disable a feature, specify =0, =OFF, or =FALSE. These values are
case-insensitive: 'on' and 'ON' mean the same thing. Below, ON and OFF
indicate the enabling or disabling of a feature.

-CG[:...]
Code Generation option group: control the optimizations
and transformations of the instruction-level code generator.

-CG:cflow=(ON|OFF)
A value of OFF disables control flow optimization in the code
generator. The default is ON.

-CG:cse_regs=N
When performing common subexpression elimination during code
generation, assume there are N extra integer registers available
over the number provided by the CPU. N can be positive, zero, or
negative. The default is positive infinity. See also -CG:sse_cse_regs.

-CG:gcm=(ON|OFF)
Specifying OFF disables the instruction-level global code
motion optimization phase. The default is ON.

-CG:load_exe=n
Specifies the threshold for subsuming a memory load operation into
the operand of an arithmetic instruction. The subsumption is not
performed if the number of times the result of the load is used
exceeds n, a non-negative integer; with n=1, for example, a load is
subsumed only when its result has exactly one use. A value of 0
turns off this subsumption optimization. If the ABI is 64-bit and
the language is Fortran, the default for n is 2; otherwise the
default is 1. See also -CG:sse_load_exe.

-CG:local_fwd_sched=(ON|OFF)
Changes the instruction scheduling algorithm to work forward
instead of backward for the instructions in each basic block.
The default is OFF for 64-bit ABI, and ON for 32-bit ABI.

-CG:movnti=N
Convert ordinary stores to non-temporal stores when writing memory
blocks of size larger than N KB. When N is set to 0, this
transformation is avoided. The default value is 120 (KB).

-CG:p2align=(ON|OFF)
Align loop heads to 64-byte boundaries. The default is
OFF.

-CG:p2align_freq=n
Aligns branch targets based on execution frequency. This option
is meaningful only under feedback-directed compilation. The
default value n=0 turns off the alignment optimization. Any
other value specifies the frequency threshold at or above which
this alignment will be performed by the compiler.

-CG:prefer_legacy_regs=(ON|OFF)
Tell the local register allocator to use the first 8 integer and
SSE registers whenever possible (%rax-%rbp, %xmm0-%xmm7). Instructions
using these registers have smaller instruction sizes. The default
is OFF.

-CG:prefetch=(ON|OFF)
Turning this OFF suppresses any generation of prefetch instructions
in the code generator. This has the same effect as -LNO:prefetch=0.
The default is ON which implies using default prefetch algorithms.

-CG:sse_cse_regs=N
When performing common subexpression elimination during code generation,
assume there are N extra SSE registers available over the number
provided by the CPU. N can be positive, zero, or negative. The default
is positive infinity. See also -CG:cse_regs.

-CG:sse_load_exe=N
This is similar to -CG:load_exe except that this only affects memory
loads to the SSE co-processor. The default is 0. A memory load to the
SSE is subsumed into an arithmetic instruction if it satisfies
either the -CG:sse_load_exe or the -CG:load_exe condition.

-CG:use_prefetchnta=(ON|OFF)
Use non-temporal prefetches (PREFETCHNTA), which fetch data as
non-temporal with respect to all levels of the cache hierarchy. This
is for data streaming situations in which the data will not need to
be re-used soon. The default is OFF.

-CG:use_test=(ON|OFF)
Make the code generator use the TEST instruction instead of CMP. See
Opteron's instruction description for the difference between these two
instructions. The default is OFF.

-fb_create <prefix for feedback data files>
Used to specify that an instrumented executable program
is to be generated. Such an executable is suitable for
producing feedback data files with the specified prefix
for use in feedback-directed compilation (FDO). The commonly
used prefix is "fbdata". This is OFF by default.

-fb_opt <prefix for feedback data files>
Used to specify feedback-directed compilation (FDO) by extracting
feedback data from files with the specified prefix, which were
previously generated using -fb_create. The commonly used prefix
is "fbdata". This optimization is off by default.

-fno-exceptions
Tells the compiler that the program does not use exception
handling, so it can perform more aggressive optimization in
the code. The generation of exception handling constructs
is also suppressed. Under this flag, code that uses exception
handling cannot be guaranteed to work correctly. Note that the
absence of exception handling constructs in a function does not by
itself mean that the function can be compiled with this flag: for
exception handling to work properly, all the scopes crossed between
throwing and catching an exception must have been compiled with
exceptions on.

-fixedform
(For Fortran only) Treat all input source files, regardless of
suffix, as if they were written in fixed source form (f77 72-column
format), instead of F90 free format. By default, only input files
suffixed with .f or .F are assumed to be written in fixed source form.

-fno-math-errno
Do not set errno after calling math functions that are executed
with a single instruction, e.g., sqrt. A program that relies
on IEEE exceptions for math error handling may want to use this
flag for speed while maintaining IEEE arithmetic compatibility.
This is implied by -Ofast. The default is -fmath-errno.

-GRA:optimize_boundary=(ON|OFF)
Allow the Global Register Allocator to allocate the same
register to different variables in the same basic-block.
Default is OFF.

-INLINE:aggressive=(ON|OFF)
Tells the compiler to be more aggressive about inlining. The
default is -INLINE:aggressive=OFF.

-IPA: ...
The inter-procedural analyzer option group controls application of
inter-procedural analysis and optimization, including inlining,
constant propagation, common block array padding, dead function
elimination, alias analysis, and others. Specify -IPA by itself
to invoke the inter-procedural analysis phase with default options.
If you compile and link in distinct steps, you must specify at least
-IPA for the compile step, and specify -IPA and the individual
options in the group for the link step. If you specify -IPA for the
compile step, and do not specify -IPA for the link step, you will
receive an error.

-IPA[:...]
IPA option group: control the inter-procedural analyses and
transformations performed. Note that giving just the group name
without any options, i.e., -IPA, will invoke the interprocedural
analyzer. -IPA is off by default unless -Ofast is specified.

-ipa Same as -IPA alone.

-IPA:addressing=(ON|OFF)
Invoke the analysis of address operator usage. The default is OFF.
-IPA:alias=ON is a prerequisite for this option.

-IPA:aggr_cprop=(ON|OFF)
Enable or disable aggressive inter-procedural constant propagation.
This attempts to avoid passing constant parameters, replacing the
corresponding formal parameters by the constant values. Less
aggressive inter-procedural constant propagation is done by default.
The default setting is ON.

-IPA:alias=(ON|OFF)
Invoke alias/mod/ref analysis. The default is ON.

-IPA:callee_limit=N
Functions whose size exceeds this limit will never be automatically
inlined by the compiler. The default is 500.

-IPA:cgi=(ON|OFF)
Invoke constant global variable identification. This option marks non-
scalar global variables that are never modified as constant, and
propagates their constant values to all files. Default is ON.

-IPA:common_pad_size=N
This specifies the amount by which to pad common block array dimensions.
By default, an amount is automatically chosen that will improve cache
behavior for common block array accesses.

-IPA:cprop=(ON|OFF)
Turn on or off inter-procedural constant propagation. This option
identifies the formal parameters that always have a specific constant
value. Default is ON. See also -IPA:aggr_cprop.
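A minimal before/after sketch of what inter-procedural constant propagation means at the source level, assuming every call site passes the same constant stride of 1. The function names are hypothetical; the compiler performs this on its intermediate representation, not on C source:

```c
#include <stddef.h>

/* Before: stride is a formal parameter, but every call passes 1. */
static double sum_strided(const double *a, size_t n, size_t stride)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i += stride)
        s += a[i];
    return s;
}

/* After propagation: the constant replaces the formal parameter,
   so the loop step is known at compile time. */
static double sum_stride1(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

double total_before(const double *a, size_t n) { return sum_strided(a, n, 1); }
double total_after(const double *a, size_t n)  { return sum_stride1(a, n); }
```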

-IPA:ctype=(ON|OFF)
When ON, causes the compiler to generate faster versions of the <ctype.h>
macros such as isalpha, isascii, etc. This flag is unsafe both in
multi-threaded programs and in all locales other than the 7-bit ASCII
(or "C") locale. The default is OFF. Do not turn this on unless the
program will always run under the 7-bit ASCII (or "C") locale and is
single-threaded.

-IPA:depth=N
Identical to maxdepth=N.

-IPA:dfe=(ON|OFF)
Enable or disable dead function elimination. Removes any functions
that are inlined everywhere they are called. The default is ON.

-IPA:dve=(ON|OFF)
Enable or disable dead variable elimination. This option removes
variables that are never referenced by the program. Default is ON.

-IPA:echo=(ON|OFF)
Option to echo (to stderr) the compile commands and the final link
commands that are invoked from IPA. Default is OFF. This option can
help monitor the progress of a large system build.

-IPA:field_reorder=(ON|OFF)
Enable the re-ordering of fields in large structs based on their
reference patterns in feedback compilation to minimize data cache
misses. The default is OFF.

-IPA:forcedepth=N
This option sets inline depths, directing IPA to attempt to inline
all functions at a depth of (at most) N in the callgraph, instead of
using the default inlining heuristics. This option ignores the
default heuristic limits on inlining. Functions at depth 0 make no
calls to any sub-functions. Functions only making calls to depth 0
functions are at depth 1, and so on.
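The depth rule used by -IPA:forcedepth and -IPA:maxdepth can be computed bottom-up over the call graph: a function's depth is 0 if it makes no calls, and otherwise one more than the deepest function it calls. A sketch using a hypothetical adjacency-matrix representation, assuming no recursion (a recursive cycle has no finite depth under this definition):

```c
#define MAX_FUNCS 8

/* calls[f][g] != 0 means function f calls function g.
   The graph is assumed acyclic (no recursion). */
int call_depth(int calls[MAX_FUNCS][MAX_FUNCS], int nfuncs, int f)
{
    int depth = 0;                       /* leaf: makes no calls */
    for (int g = 0; g < nfuncs; g++) {
        if (calls[f][g]) {
            int d = 1 + call_depth(calls, nfuncs, g);
            if (d > depth)
                depth = d;               /* keep the deepest callee */
        }
    }
    return depth;
}
```

For example, if f0 calls f1 and f2, and f1 calls f2, then f2 is at depth 0, f1 at depth 1, and f0 at depth 2.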

-IPA:inline=(ON|OFF)
This option performs inter-file subprogram inlining during the main
IPA processing. The default is ON. Does not affect the light-weight
inliner.

-IPA:keeplight=(ON|OFF)
This option directs IPA not to send -keep to the compiler, in order
to save space. The default is OFF.

-IPA:linear=(ON|OFF)
Controls conversion of a multi-dimensional array to a single dimensional
(linear) array that covers the same block of memory. When inlining
Fortran subroutines, IPA tries to map formal array parameters to the
shape of the actual parameter. In the case that it cannot map the
parameter, it linearizes the array reference. By default, IPA will not
inline such callsites because they may cause performance problems.
The default is OFF.
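Linearizing a reference means computing the flat offset that a multi-dimensional reference covers. Fortran arrays are column-major, so for a hypothetical A(NROWS, NCOLS) with 1-based indices, A(I, J) corresponds to element (I-1) + (J-1)*NROWS of the linear array. A C sketch of that mapping (function names are hypothetical):

```c
#include <stddef.h>

/* Offset of Fortran element A(i, j) (1-based, column-major) within
   a linear array covering the same memory, for A(nrows, ncols). */
size_t linear_index(size_t i, size_t j, size_t nrows)
{
    return (i - 1) + (j - 1) * nrows;
}

/* Reads A(i, j) through the linearized view. */
double get_elem(const double *a, size_t i, size_t j, size_t nrows)
{
    return a[linear_index(i, j, nrows)];
}
```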

-IPA:map_limit=N
Controls when IPA enables -IPA:sp_partition: N is the maximum size
(in bytes) of input files mapped before IPA invokes -IPA:sp_partition.

-IPA:maxdepth=N
This option directs IPA not to attempt to inline functions at a depth
of more than N in the callgraph, where functions that make no calls
are at depth 0, those that call only depth 0 functions are at depth 1,
and so on. This inlining remains subject to overriding limits on code
expansion. Also see -IPA:forcedepth, -IPA:space, and -IPA:plimit.

-IPA:max_jobs=N
This option limits the maximum parallelism when invoking the compiler
after IPA to (at most) N compilations running at once. The option
can take the following values:

0 = The parallelism chosen is equal to either the number of CPUs, the
number of cores, or the number of hyperthreading units in the
compiling system, whichever is greatest.

1 = Disable parallelization during compilation (default)

>1 = Specifically set the degree of parallelism

-IPA:min_hotness=N
When feedback information is available, a call site to a procedure must
be invoked with a count that exceeds the threshold specified by N
before the procedure will be inlined at that call site. The default
is 10.

-IPA:multi_clone=N
This option specifies the maximum number of clones that can be created
from a single procedure. Default value is 0. Aggressive procedural
cloning may provide opportunities for inter-procedural optimization,
but may also significantly increase the code size.

-IPA:clone_list=(ON|OFF)
Tell the IPA function cloner to list cloning actions as they occur to
stderr. The default is -IPA:clone_list=OFF.

-IPA:node_bloat=N
When this option is used in conjunction with -IPA:multi_clone, it
specifies the maximum percentage growth of the total number of procedures
relative to the original program.

-IPA:plimit=N
This option stops inlining into a specific subprogram once it reaches
size N in the intermediate representation. Default is 2500.

-IPA:pu_reorder=(0|1|2)
Control re-ordering the layout of program units based on their
invocation patterns in feedback compilation to minimize instruction
cache misses. This option is ignored unless under feedback compilation.

0 = Disable procedure reordering. This is the default for non-C++
programs.

1 = Reorder based on the frequency in which different procedures
are invoked. This is the default for C++ programs.

2 = Reorder based on caller-callee relationship.

-IPA:relopt=(ON|OFF)
This option enables optimizations similar to those achieved with the
compiler options -O and -c, where objects are built with the assumption
that the compiled objects will be linked into a call-shared executable
later. The default is OFF. In effect, optimizations based on position-
dependent code (non-PIC) are performed on the compiled objects.

-IPA:small_pu=N
A procedure with size smaller than N is not subjected to the plimit
restriction. The default is 30.

-IPA:sp_partition=[setting]
This option enables partitioning for disk/addressing-saving purposes.
The default is OFF. Mainly used for building very large programs.
Normally, partitioning would be done by IPA internally.

-IPA:space=N
Inline until a program expansion of N% is reached. For example,
-IPA:space=20 limits code expansion due to inlining to approximately
20%. Default is no limit.

-IPA:specfile=filename
Reads additional options from the file filename. The specification file
contains zero or more lines with inliner options in the form expected
on the command line. The specfile option cannot occur in a specification
file, so specification files cannot invoke other specification files.

-IPA:use_intrinsic=(ON|OFF)
Enable/disable loading the intrinsic version of standard library
functions. The default is OFF.

-L/opt/acml3.0.0/pathscale64/lib -lacml
The flags above are needed to use the PathScale compiler to link
with the ACML (AMD Core Math Library) 3.0.0. The PathScale-compiled,
64-bit version of ACML gets installed at /opt/acml2.7.0/gnu64 by
default. ACML is available as a free download from
http://developer.amd.com/acml.aspx.

-LNO:
This option group specifies options for transformations performed
on loop nests. The -LNO: option group is enabled only if the -O3
option is also specified on the compiler command line.

-LNO:apo_use_feedback=(ON|OFF)
Effective only when specified with -apo under feedback-directed
compilation, this flag tells the auto-parallelizer whether to use
the feedback data of the loops in deciding whether each loop should
be parallelized. When the compiler parallelizes a loop, it generates
both a serial and a parallel version. If the trip count of the loop
is small, it is not beneficial to use the parallel version during
execution. When this flag is set to ON and the feedback data indicates
that the loop has small trip count, the auto-parallelizer will not
generate the parallel version, thus saving the runtime check needed
to decide whether to execute the serial or parallel version of the
loop. The default is OFF.

-LNO:build_scalar_reductions=(ON|OFF)
Build scalar reductions before any loop transformation analysis.
Using this flag may enable further loop transformations involving
reduction loops. The default is OFF. This flag is redundant when
-OPT:roundoff=2 or greater is in effect.

-LNO:blocking[=(ON|OFF)]
Enable/disable the cache blocking transformation. The default
is on at -O3 or higher.

-LNO:blocking_size=N
This option specifies a block size that the compiler must use when
performing any blocking. N must be a positive integer that
represents the number of iterations.
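A hand-written sketch of cache blocking on a matrix transpose, where the hypothetical parameter bs plays the role of -LNO:blocking_size (iterations per block). The compiler applies the transformation itself; this only shows the loop shape it produces:

```c
#include <stddef.h>

/* Transposes the n-by-n row-major matrix a into b, visiting the
   iteration space in bs-by-bs tiles so each tile stays in cache.
   bs must be positive. */
void transpose_blocked(const double *a, double *b, size_t n, size_t bs)
{
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t jj = 0; jj < n; jj += bs)
            /* Clamp the tile at the matrix edge when bs does not divide n. */
            for (size_t i = ii; i < ii + bs && i < n; i++)
                for (size_t j = jj; j < jj + bs && j < n; j++)
                    b[j * n + i] = a[i * n + j];
}
```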

-LNO:fission=(0|1|2)
This option controls loop fission. The options can be one of the
following:

0 = Disables loop fission (default)

1 = Performs normal fission as necessary

2 = Specifies that fission be tried before fusion

If -LNO:fission=1:fusion=1 or -LNO:fission=2:fusion=2 is specified,
then fusion is performed.
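Loop fission splits one loop into several loops over the same index range. A hand-written before/after sketch, assuming the two statements are independent (neither reads what the other writes) and the arrays do not overlap, which is what makes the split legal:

```c
#include <stddef.h>

/* Before fission: one loop updates two unrelated arrays. */
void update_fused(double *a, double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + c[i];
        b[i] = b[i] * 2.0;
    }
}

/* After fission: each statement gets its own loop, which can reduce
   register pressure or let each loop be optimized separately. */
void update_split(double *a, double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + c[i];
    for (size_t i = 0; i < n; i++)
        b[i] = b[i] * 2.0;
}
```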

-LNO:full_unroll,fu=N
Fully unroll innermost loops with trip_count <= N inside LNO.
N can be any integer between 0 and 100. The default value for N
is 5. Setting this flag to 0 disables full unrolling of small
trip count loops inside LNO.
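Full unrolling replaces a loop whose trip count is a small compile-time constant with straight-line code. A hand-written sketch for a trip count of 4, which is within the default limit of N = 5 (function names are hypothetical):

```c
/* Before: the trip count is the constant 4. */
double dot4_loop(const double x[4], const double y[4])
{
    double s = 0.0;
    for (int i = 0; i < 4; i++)
        s += x[i] * y[i];
    return s;
}

/* After full unrolling: no loop control remains. The additions keep
   the original left-to-right order, so results are bit-identical. */
double dot4_unrolled(const double x[4], const double y[4])
{
    double s = 0.0;
    s += x[0] * y[0];
    s += x[1] * y[1];
    s += x[2] * y[2];
    s += x[3] * y[3];
    return s;
}
```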

-LNO:full_unroll_size=N
Fully unroll innermost loops with unrolled loop size <= N inside
LNO. N can be any integer between 0 and 10000. The conditions
implied by the full_unroll option must also be satisfied for
the loop to be fully unrolled. The default value for N is 1600.

-LNO:full_unroll_outer=(ON|OFF)
Control the full unrolling of loops with known trip count that
do not contain a loop and are not contained in a loop. The
conditions implied by both the full_unroll and the
full_unroll_size options must be satisfied for the loop to be
fully unrolled. The default is OFF.

-LNO:fusion=n
Perform loop fusion, n: 0 - off, 1 - conservative, 2 - aggressive.
The default is 1.
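Loop fusion is the inverse of fission: adjacent loops over the same range are merged so that data produced by the first is consumed while it is still in registers or cache. A hand-written sketch, assuming the arrays do not overlap (function names are hypothetical):

```c
#include <stddef.h>

/* Before fusion: b[] is written in one pass and read in another. */
void scale_then_add_separate(double *b, const double *a, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* After fusion: a single pass, legal because iteration i of the second
   loop reads only b[i], which iteration i of the first loop wrote. */
void scale_then_add_fused(double *b, const double *a, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = 2.0 * a[i];
        c[i] = b[i] + 1.0;
    }
}
```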

-LNO:fusion_peeling_limit=N
This option sets the limit for the number of iterations allowed to
be peeled in fusion, where N >= 0. The default is N=5.

-LNO:gather_scatter=N
This option enables gather-scatter optimizations. N can be one of the
following:

0 = Disable all gather-scatter optimizations

1 = Perform gather-scatter optimizations in non-nested IF
statements (default)

2 = Perform multi-level gather-scatter optimizations
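One common form of this transformation, sketched by hand below, gathers the indices for which an IF condition holds into a temporary list and then runs a branch-free computation loop over just those indices. This is an illustration under that assumption, not a description of the compiler's exact output; the index buffer idx and both function names are hypothetical:

```c
#include <stddef.h>

/* Original: the IF sits inside the loop body. */
void update_if(double *y, const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] > 0.0)
            y[i] += 3.0 * x[i];
}

/* Gather-scatter form: first gather the qualifying indices into idx
   (caller-provided, at least n entries), then do the arithmetic in a
   loop with no branches, scattering results back through idx. */
void update_gather_scatter(double *y, const double *x, size_t n, size_t *idx)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)       /* gather */
        if (x[i] > 0.0)
            idx[m++] = i;
    for (size_t k = 0; k < m; k++)       /* compute and scatter */
        y[idx[k]] += 3.0 * x[idx[k]];
}
```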

-LNO:hoistif=(ON|OFF)
This option enables or disables hoisting of IF statements inside inner
loops to eliminate redundant loops. Default is ON.
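A hand-written sketch of one form of hoisting an IF out of an inner loop, assuming the condition does not change inside the loop: the test runs once and selects between two branch-free loops (a transformation often called loop unswitching). Function names are hypothetical:

```c
#include <stddef.h>

/* Before: the loop-invariant test on 'negate' runs every iteration. */
void copy_maybe_negate_inside(double *dst, const double *src, size_t n, int negate)
{
    for (size_t i = 0; i < n; i++) {
        if (negate)
            dst[i] = -src[i];
        else
            dst[i] = src[i];
    }
}

/* After hoisting: the test runs once, and each loop body is branch-free. */
void copy_maybe_negate_hoisted(double *dst, const double *src, size_t n, int negate)
{
    if (negate) {
        for (size_t i = 0; i < n; i++)
            dst[i] = -src[i];
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```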

-LNO:ignore_feedback=(ON|OFF)
If the flag is ON then feedback information from the loop annotations
will be ignored in LNO transformations. The default is OFF.

-LNO:ignore_pragmas=(ON|OFF)
This option specifies that the command-line options override directives
in the source file. Default is OFF.

-LNO:local_pad_size=N
This option specifies the amount by which to pad local array dimensions.
The compiler automatically (by default) chooses the amount of padding
to improve cache behavior for local array accesses.

-LNO:non_blocking_loads=(ON|OFF)
(For C/C++ only) The option specifies whether the processor blocks on
loads. If not set, the default of the current processor is used.

-LNO:oinvar=(ON|OFF)
This option controls outer loop hoisting. Default is ON.

-LNO:opt=(0|1)
This option controls the LNO optimization level. The options
can be one of the following:

0 = Disable nearly all loop nest optimizations.

1 = Perform full loop nest transformations. This is the default.

-LNO:ou_prod_max=N
This option indicates that the product of unrolling of the various
outer loops in a given loop nest is not to exceed N, where N is a
positive integer. The default is 16.

-LNO:outer=(ON|OFF)
This option enables or disables outer loop fusion. Default is ON.

-LNO:outer_unroll_max,ou_max=N
The outer_unroll_max option indicates that the compiler may unroll
outer loops in a loop nest by as many as N per loop, but no more.
The default is 4.
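Outer-loop unrolling (often called unroll-and-jam) unrolls an outer loop and merges the resulting copies of the inner loop, so each loaded value is reused across several outer iterations. A hand-written sketch with an outer unroll factor of 2, below the default limit of 4; the function names are hypothetical and n is assumed even to keep the sketch short:

```c
#include <stddef.h>

/* Before: y[i] = sum over j of a[i][j] * x[j], one row at a time. */
void matvec(double *y, const double *a, const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j] * x[j];
        y[i] = s;
    }
}

/* After unrolling the outer loop by 2 and jamming the inner loops:
   each x[j] is loaded once and used for two rows. Assumes n is even. */
void matvec_unroll2(double *y, const double *a, const double *x, size_t n)
{
    for (size_t i = 0; i < n; i += 2) {
        double s0 = 0.0, s1 = 0.0;
        for (size_t j = 0; j < n; j++) {
            double xj = x[j];
            s0 += a[i * n + j] * xj;
            s1 += a[(i + 1) * n + j] * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
}
```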

-LNO:parallel_overhead=N
Effective only when specified with -apo, the parallel_overhead
option controls the auto-parallelizing compiler's estimate of the
overhead (in processor cycles) incurred by invoking the parallel
version of a loop. When the compiler parallelizes a loop, it generates
both a serial and a parallel version. If the amount of work performed
by the loop is small, it may not be beneficial to use the parallel
version during execution. The set value of parallel_overhead is used
in this determination during execution time when the number of
processors and the iteration count of the loop are taken into account.
The default value is 4096. Because the optimal value varies across
systems and programs, this option can be used for parallel performance
tuning.

-LNO:prefetch[=(0|1|2|3)]
Specify the level of prefetching.

0 = Prefetch disabled.

1 = Prefetch is done only for arrays that are always
referenced in each iteration of a loop (default).

2 = Prefetch is done without the above restrictions.

3 = Most aggressive.

-LNO:prefetch_ahead=n
Prefetch n cache line(s) ahead. The default is 2.
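A hand-written sketch of software prefetching with a distance of 2 cache lines, mirroring the -LNO:prefetch_ahead default. It uses the GCC/Clang builtin __builtin_prefetch as a stand-in for the prefetch instructions the compiler emits, and assumes a 64-byte cache line (8 doubles); LINE_DOUBLES, AHEAD_LINES, and sum_with_prefetch are hypothetical names:

```c
#include <stddef.h>

#define LINE_DOUBLES 8   /* assumed: 64-byte line / 8-byte double */
#define AHEAD_LINES  2   /* mirrors the -LNO:prefetch_ahead default */

/* Sums a[0..n-1], requesting data AHEAD_LINES lines ahead of the
   current position once per cache line. */
double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_DOUBLES == 0 && i + AHEAD_LINES * LINE_DOUBLES < n)
            /* args: address, 0 = read access, 3 = high temporal locality */
            __builtin_prefetch(&a[i + AHEAD_LINES * LINE_DOUBLES], 0, 3);
        s += a[i];
    }
    return s;
}
```

The bounds check keeps the prefetch address inside the array; a prefetch is only a hint and does not change the computed sum.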

-LNO:prefetch_verbose=(ON|OFF)
-LNO:prefetch_verbose=ON prints verbose prefetch info to stdout.
Default is OFF.

-LNO:processors=N
Tells the compiler to assume that the program compiled under -apo
will be run on a system with the given number of processors. This
helps in reducing the amount of computation during execution for
determining whether to enter the parallel or serial versions of
loops that are parallelized (see the -LNO:parallel_overhead option).
The default is 0, which means unknown number of processors. The
default value of 0 should be used if the program is intended to run
on different systems with different numbers of processors. If the
option is set to non-zero and the value is different from the number
of processors, the parallelized code will not perform optimally.

-LNO:sclrze=(ON|OFF)
Turn ON or OFF the optimization that replaces an array by a scalar
variable. The default is ON.
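Scalarization replaces an array whose elements are used only as per-iteration temporaries with a scalar that can live in a register. A hand-written sketch; the temporary array t and both function names are hypothetical, and t is assumed not to be read after the loop:

```c
#include <stddef.h>

/* Before: t[] only carries a value from one statement to the next
   within the same iteration, and is not read after the loop. */
void with_temp_array(double *y, const double *x, double *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        t[i] = x[i] * x[i];
        y[i] = t[i] + 1.0;
    }
}

/* After scalarization: the array becomes a register-resident scalar,
   removing n stores and n loads. */
void with_scalar(double *y, const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double t = x[i] * x[i];
        y[i] = t + 1.0;
    }
}
```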

-LNO:simd=(0|1|2)
This option enables or disables inner loop vectorization.

0 = Turn off the vectorizer.

1 = (Default) Vectorize only if the compiler can determine that
there is no undesirable performance impact due to sub-optimal
alignment. Vectorize only if vectorization does not introduce
accuracy problems with floating-point operations.

2 = Vectorize without any constraints (most aggressive).

-m32
Generates code according to the 32-bit ABI, also known as x86
or IA32.

-m64
Compile for 64-bit ABI, also known as AMD64, x86_64, or IA32e.
This is the default.

-m3dnow Enable use of 3DNow! instructions. The default is OFF.

-march=(opteron|athlon|athlon64|athlon64fx|em64t|pentium4|xeon|anyx86|auto)
Compiler will optimize code for the selected platform. auto means to
optimize for the platform that the compiler is running on, which
the compiler determines by reading /proc/cpuinfo. anyx86 means a
generic x86 processor. Under 32-bit ABI, anyx86 is a processor without
SSE2/SSE3/3DNow! support; under 64-bit ABI it is a processor with
SSE2 but without SSE3/3DNow!. The default is opteron.

-mcpu=(opteron|athlon64|athlon64fx|em64t|pentium4|xeon|anyx86|auto)
Compiler will optimize code for the selected platform. auto means to
optimize for the platform that the compiler is running on, which
the compiler determines by reading /proc/cpuinfo. anyx86 means a
generic x86 processor. Under 32-bit ABI, anyx86 is a processor without
SSE2/SSE3/3DNow! support; under 64-bit ABI it is a processor with
SSE2 but without SSE3/3DNow!. The default is opteron.

-msse2
Enable use of SSE2 instructions. This is the default under
both -m64 and -m32.

-msse3
Enable use of SSE3 instructions. Default is ON under -march=em64t.
Otherwise, it is OFF by default.

-mno-sse2
Disable the use of SSE2/SSE3 instructions. This flag is only applicable
to -m32. -mno-sse2 is ignored under -m64 with a warning.

-O or -O2
Turn on extensive optimization. The optimizations at this level are
generally conservative, in the sense that they (1) are virtually
always beneficial, (2) provide improvements commensurate to the
compile time spent to achieve them, and (3) avoid changes which
affect such things as floating point accuracy.

-O3
Turn on aggressive optimization. The optimizations at this level
are distinguished from -O2 by their aggressiveness, generally
seeking highest-quality generated code even if it requires extensive
compile time. They may include optimizations which are generally
beneficial but occasionally hurt performance. This includes but
is not limited to turning on the Loop Nest Optimizer, -LNO:opt=1,
and setting -OPT:ro=1:IEEE_arith=2:Olimit=9000:reorg_common=ON.

-Ofast Equivalent to "-O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math".
-OPT:Ofast is described below.

-OPT: ...
This option group controls miscellaneous optimizations. These options
override defaults based on the main optimization level.

-OPT:alias=<name>
Specifies the pointer aliasing model to be used. By
specifying one or more of the following for <name>, the
compiler is able to make assumptions throughout the compilation:
typed Assume that the code adheres to the ANSI/ISO C
standard which states that two pointers of different
types cannot point to the same location in memory.
This is on by default when -Ofast is specified.

restrict Specifies that distinct pointers are assumed
to point to distinct, non-overlapping objects.
This is off by default.

disjoint Specifies that any two pointer expressions are
assumed to point to distinct, non-overlapping objects.
This is off by default.
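In C99 source, the restrict qualifier expresses for individual pointers the same promise that -OPT:alias=restrict applies to all distinct pointers. A sketch (the function name add_arrays is hypothetical):

```c
#include <stddef.h>

/* With restrict, the compiler may assume dst and src never overlap,
   so it can keep src values in registers or vectorize freely.
   Calling this with overlapping arrays is undefined behavior. */
void add_arrays(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Under -OPT:alias=restrict the compiler makes this assumption for distinct pointers even without the qualifier, so a program that passes overlapping pointers, as in add_arrays(a + 1, a, n - 1), is not safe to compile with that option.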

-OPT:align_unsafe=(ON|OFF)
Instruct the vectorizer (invoked at -O3) to aggressively perform
vectorization by assuming that array parameters are aligned at 128-bit
boundaries. The vectorizer will then generate 128-bit aligned load and
store instructions, which are faster than their unaligned counterparts.
If the assumption is incorrect, the aligned memory accesses will result
in run-time segmentation faults. The default is OFF.

-OPT:asm_memory=(ON|OFF)
A debugging option to be used when debugging suspected buggy inline
assembly. If ON, the compiler assumes each asm has "memory" specified
even if it is not there. The default is OFF.

-OPT:bb=N
This specifies the maximum number of instructions a basic block
(straight line sequence of instructions with no control flow) can
contain in the code generator's program representation. Increasing
this value can improve the quality of optimizations that are applied
at the basic block level, but can increase compilation time in
programs that exhibit such large basic blocks. The default is 1300.
If compilation time is an issue, use a smaller value.

-OPT:cis=(ON|OFF)
Convert SIN/COS pairs using the same argument to a single call
calculating both values at once. The default is ON.

-OPT:div_split=(ON|OFF)
Enable/disable changing x/y into x*(recip(y)). This is
OFF by default but is enabled by -OPT:Ofast or
-OPT:IEEE_arithmetic=3.
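The reason this is not enabled by default: x*(1/y) rounds twice, once for the reciprocal and once for the product, while x/y rounds once. A small C check, assuming IEEE 754 double-precision arithmetic as performed with SSE2 (function names are hypothetical):

```c
/* One rounding: the correctly rounded quotient. */
double div_exact(double x, double y)
{
    return x / y;
}

/* The -OPT:div_split form: reciprocal, then multiply (two roundings). */
double div_split(double x, double y)
{
    double r = 1.0 / y;
    return x * r;
}
```

With x = y = 49.0, div_exact returns exactly 1.0, while div_split returns 0.9999999999999999 because 1/49 is not exactly representable.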

-OPT:early_intrinsics=(ON|OFF)
When ON, this option causes calls to intrinsics to be
expanded to inline code early in the backend compilation.
This may enable more vectorization opportunities if vector
forms of the expanded operations exist. Default is OFF.

-OPT:fast_bit_intrinsics=(ON|OFF)
Setting this to ON will turn off the check for the bit count being
within range for Fortran intrinsics (like BTEST and ISHFT). The
default setting is OFF.

-OPT:fast_complex=(ON|OFF)
Setting fast_complex=ON enables fast calculations for values
declared to be of type complex. When this is set to ON,
complex absolute value (norm) and complex division use fast
algorithms that are more likely to overflow or underflow than
the standard algorithms. OFF is the default. fast_complex=ON
is enabled if -OPT:roundoff=3 is in effect.
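To see why a fast algorithm is more likely to overflow, compare the textbook formula for complex division with Smith's scaled method. Smith's method is used here only as a representative overflow-resistant algorithm; it is not necessarily the one the PathScale runtime uses. The struct cplx type and both function names are hypothetical, chosen to keep the arithmetic explicit:

```c
typedef struct { double re, im; } cplx;

/* Fast textbook formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2).
   c*c + d*d overflows once |c| or |d| exceeds about 1.34e154. */
cplx div_naive(cplx x, cplx y)
{
    double den = y.re * y.re + y.im * y.im;
    cplx q = { (x.re * y.re + x.im * y.im) / den,
               (x.im * y.re - x.re * y.im) / den };
    return q;
}

/* Smith's method: divides by the larger divisor component first,
   so no intermediate squares the divisor's magnitude. */
cplx div_smith(cplx x, cplx y)
{
    cplx q;
    double ar = y.re < 0 ? -y.re : y.re;
    double ai = y.im < 0 ? -y.im : y.im;
    if (ar >= ai) {
        double r = y.im / y.re;
        double den = y.re + y.im * r;
        q.re = (x.re + x.im * r) / den;
        q.im = (x.im - x.re * r) / den;
    } else {
        double r = y.re / y.im;
        double den = y.re * r + y.im;
        q.re = (x.re * r + x.im) / den;
        q.im = (x.im * r - x.re) / den;
    }
    return q;
}
```

Dividing z = 1e300 + 1e300i by itself, the naive formula computes inf/inf and yields NaN, while Smith's method returns exactly 1 + 0i.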

-OPT:fast_exp=(ON|OFF)
This option enables optimization of exponentiation by replacing
the runtime call for exponentiation by multiplication and/or square
root operations for certain compile-time constant exponents (integers
and halves). This can produce differently rounded results than those
from the runtime function. fast_exp is OFF unless -O3 or -Ofast are
specified, or -OPT:roundoff=1 is in effect.
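For a constant integer exponent, the replacement amounts to exponentiation by repeated squaring. A hand-written sketch for x**5 (the function name pow5 is hypothetical; half-integer exponents, which need a square root, are not shown):

```c
/* x**5 with three multiplications instead of a pow() call:
   x2 = x*x, x4 = x2*x2, result = x4*x. Each product is rounded,
   so the result can differ from pow(x, 5.0) in the last bit. */
double pow5(double x)
{
    double x2 = x * x;
    double x4 = x2 * x2;
    return x4 * x;
}
```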

-OPT:fast_io=(ON|OFF)
(For C/C++ only) This option enables inlining of printf(), fprintf(),
sprintf(), scanf(), fscanf(), sscanf(), and printw(). -OPT:fast_io is
only in effect when the candidates for inlining are marked as intrinsic
to the stdio.h and curses.h files. Default is OFF.

-OPT:fast_math=(ON|OFF)
Setting this to ON will tell the compiler to use the fast math functions
tuned for the processor. The affected math functions include log, exp,
sin, cos, sincos, expf and pow. The default setting is OFF. It is turned
on automatically when -OPT:roundoff is at 2 or above.

-OPT:fast_nint=(ON|OFF)
This option uses a hardware feature to implement NINT and ANINT
(both single- and double-precision versions). Default is OFF but
fast_nint=ON is enabled by default if -OPT:ro=3 is in effect.

-OPT:fast_sqrt=(ON|OFF)
This option calculates square roots using the identity sqrt(x)=x*rsqrt(x),
where rsqrt is the reciprocal square root operation. This transformation
generates fairly accurate code. Default is OFF.

-OPT:fast_stdlib=(ON|OFF)
This option controls the generation of calls to faster versions of some
standard library functions. Default is ON.

-OPT:fast_trunc=(ON|OFF)
This option inlines the NINT, ANINT, and AMOD Fortran intrinsics, both
single- and double-precision versions. Default is OFF. fast_trunc is
enabled automatically if -OPT:roundoff=1 or greater is in effect.

-OPT:goto=(OFF|ON)
Disable/enable the conversion of GOTOs into higher level
structures like FOR loops. The default is ON for -O2 or higher.

-OPT:IEEE_arithmetic,IEEE_arith,IEEE_a=(n)
Specify the level of conformance to IEEE 754 floating-point
roundoff/overflow behavior. n can be one of the following:

1 Adheres to IEEE accuracy. This is the default when
optimization levels -O0, -O1 and -O2 are in effect.

2 May produce inexact results not conforming to IEEE 754.
This is the default when -O3 is in effect.

3 All mathematically valid transformations are allowed.

-OPT:IEEE_NaN_Inf=(ON|OFF)
OFF specifies non-IEEE-754 results in operations that might
have IEEE 754 NaN or infinity operands; this enables many
optimizations which would be invalid for NaN or infinity
operands. The default is ON.

-OPT:transform_to_memlib=(ON|OFF)
When ON, this option enables transformation of loop constructs
to calls to memcpy or memset. Default is ON when target
processor is EM64T, OFF otherwise.

-OPT:Ofast
Use optimizations selected to maximize performance.
Although the optimizations are generally safe,
they may affect floating point accuracy due to rearrangement
of computations. This effectively turns on the following
optimizations: -OPT:ro=2:Olimit=0:div_split=ON:alias=typed.

-OPT:Olimit=(n)
Disable optimization when size of program unit is > n. When n
is 0, program unit size is ignored and optimization process
will not be disabled due to compile time limit. The default is
0 when -Ofast is specified, otherwise the default is 6000
under -O2 and 9000 under -O3.

-OPT:roundoff,ro=(n)
Specifies the level of acceptable departure from source
language floating-point, round-off, and overflow semantics. n
can be one of the following:

0 Inhibits optimizations that might affect the
floating-point behavior. This is the default when
optimization levels -O0, -O1, and -O2 are in effect.

1 Allows simple transformations that might cause limited
round-off or overflow differences. Compounding such
transformations could have more extensive effects.
This is the default level when -O3 is in effect.

2 Allows more extensive transformations, such as the
reordering of reduction loops. This is the default
level when -Ofast is specified.

3 Enables any mathematically valid transformation.
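Floating-point addition is not associative, which is why reordering computations (for example, the reduction-loop reordering allowed at level 2) can change results. A minimal C illustration, assuming IEEE 754 double-precision evaluation (function names are hypothetical):

```c
/* Left-to-right order, as written in the source. */
double sum_in_order(double a, double b, double c)
{
    return (a + b) + c;
}

/* Reassociated order, as a roundoff=2 reordering might produce. */
double sum_reordered(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1.0, b = 1e16, c = -1e16, the in-order sum is 0.0 (1.0 is lost when added to 1e16, whose neighboring doubles are 2 apart), while the reordered sum is 1.0.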

-OPT:treeheight=(ON|OFF)
The value ON turns on re-association in expressions to reduce
the expressions' tree height. The default value is OFF.
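To illustrate the re-association (hypothetical function names; unsigned integers are used so both groupings are exactly equal), the balanced form shortens the chain of dependent additions from three to two. For floating-point operands, the same regrouping can change rounding.

```c
#include <assert.h>

/* Left-to-right: three dependent additions (tree height 3). */
unsigned long sum4_chain(unsigned long a, unsigned long b,
                         unsigned long c, unsigned long d)
{
    return ((a + b) + c) + d;
}

/* Re-associated: a + b and c + d do not depend on each other and can
 * execute in parallel, so the tree height drops to 2. */
unsigned long sum4_balanced(unsigned long a, unsigned long b,
                            unsigned long c, unsigned long d)
{
    return (a + b) + (c + d);
}
```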

-OPT:unroll_analysis=(ON|OFF)
The default value of ON lets the compiler analyze the
content of the loop to determine the best unrolling
parameters, instead of strictly adhering to the
-OPT:unroll_times_max and -OPT:unroll_size parameters.

-OPT:unroll_times_max,unroll_times=(n)
Unroll inner loops by a maximum of n. The default is 4.
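The sketch below shows, at source level, what unrolling an inner loop by 4 (the documented default maximum) means. It is an illustration under assumptions (hypothetical names, leftover iterations handled by a simple remainder loop), not the code the compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Original inner loop. */
long dot_simple(const int *a, const int *b, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += (long)a[i] * b[i];
    return s;
}

/* The same loop unrolled by 4, followed by a remainder loop for the
 * last n % 4 iterations. */
long dot_unrolled4(const int *a, const int *b, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += (long)a[i]     * b[i];
        s += (long)a[i + 1] * b[i + 1];
        s += (long)a[i + 2] * b[i + 2];
        s += (long)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)   /* remainder iterations */
        s += (long)a[i] * b[i];
    return s;
}
```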

-OPT:unroll_size=(n)
Sets the maximum number of instructions allowed in an unrolled
inner loop. If n = 0, this limit is disregarded.

-static
Suppresses dynamic linking at run-time for shared libraries;
uses static linking instead.

-TENV:X=(0|1|2|3|4)
Specify the level of enabled exceptions that will be assumed
for purposes of performing speculative code motion (default
is 1 at all optimization levels). In general, an instruction
will not be speculated (i.e. moved above a branch by the
optimizer) unless any exceptions it might cause are disabled
by this option. At level 0, no speculative code motion may
be performed. At level 1, safe speculative code motion may
be performed, with IEEE-754 underflow and inexact exceptions
disabled. At level 2, all IEEE-754 exceptions are disabled
except divide by zero. At level 3, all IEEE-754 exceptions
are disabled including divide by zero. At level 4, memory
exceptions may be disabled or ignored.
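A sketch of the kind of code motion this option governs (hypothetical names). Hoisting the floating-point division above its guard returns the same value, but it executes the division even when d is 0.0, which can raise divide-by-zero (or invalid, for 0.0/0.0). Going by the level descriptions above, that move would be allowed only at -TENV:X=3 or higher.

```c
#include <assert.h>

/* As written: the division executes only when d is nonzero, so it can
 * never raise the IEEE-754 divide-by-zero exception. */
double ratio_guarded(double n, double d)
{
    double r = 0.0;
    if (d != 0.0)
        r = n / d;
    return r;
}

/* Speculated form: the division is moved above the branch and the
 * result is selected afterwards. The returned value is unchanged, but
 * when d == 0.0 the division now executes and raises an exception. */
double ratio_speculated(double n, double d)
{
    double q = n / d;              /* executed unconditionally */
    return (d != 0.0) ? q : 0.0;
}
```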

-TENV:frame_pointer=(ON|OFF)
Default is ON for C++ and OFF otherwise.
Local variables in the function stack frame are addressed via
the frame pointer register. Ordinarily, the compiler will
replace this use of frame pointer by addressing local variables
via the stack pointer when it determines that the stack pointer
is fixed throughout the function invocation. This frees up the
frame pointer for other purposes. Turning this flag on forces
the compiler to use the frame pointer to address local variables.
This flag defaults to on for C++ because the exception handling
mechanism relies on the frame pointer register being used to
address local variables. This flag can be turned off for C++
programs that do not throw exceptions.

-Wl,-x
Passes the -x option to the linker. With this flag set, the
linker will not preserve local (non-global) symbols in the output
symbol table. The linker enters external and static symbols
only. This option conserves space in the output file. This is
OFF by default.

-WOPT:aggstr=N
This controls the aggressiveness of the strength reduction
optimization performed by the scalar optimizer, in which induction
expressions within a loop are replaced by temporaries that are
incremented together with the loop variable. When strength
reduction is overdone, the additional temporaries increase
register pressure, resulting in excessive register spills that
decrease performance. The value must be a non-negative integer
specifying the maximum number of induction expressions that will
be strength-reduced across an index variable increment. When set
to 0, strength reduction is performed only for non-trivial
induction expressions. The default is 11.
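A minimal sketch of strength reduction (hypothetical names): the multiply in the subscript i * stride is replaced by a temporary index that is incremented by stride on each iteration. Each such temporary needs a register, which is the cost -WOPT:aggstr bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the induction expression i * stride is recomputed with a
 * multiply on every iteration. */
long sum_strided(const long *a, size_t n, size_t stride)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i * stride];
    return s;
}

/* After strength reduction: the temporary idx is incremented by stride
 * together with the loop variable, replacing the multiply with an add. */
long sum_strided_reduced(const long *a, size_t n, size_t stride)
{
    long s = 0;
    size_t idx = 0;                 /* always equals i * stride */
    for (size_t i = 0; i < n; i++) {
        s += a[idx];
        idx += stride;
    }
    return s;
}
```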

-WOPT:if_conv=(0|1|2)
Controls the optimization that translates simple IF statements
to conditional move instructions in the target CPU. Setting to
0 suppresses this optimization. The value of 1 designates
conservative if-conversion, in which the context around the IF
statement is used in deciding whether to if-convert. The value
of 2 enables aggressive if-conversion by causing it to be
performed regardless of the context. The default is 1.
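The two functions below (hypothetical names) show a simple IF statement and the select form that if-conversion produces. A compiler can typically implement the select with a conditional move such as x86 CMOV, though the C source alone does not guarantee which instructions are emitted.

```c
#include <assert.h>

/* Simple IF statement as written in the source. */
int clamp_max_branch(int x, int limit)
{
    if (x > limit)
        x = limit;
    return x;
}

/* If-converted form: both candidate values are available and one is
 * selected without a conditional jump. */
int clamp_max_select(int x, int limit)
{
    return (x > limit) ? limit : x;
}
```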

-WOPT:mem_opnds=(ON|OFF)
ON makes the scalar optimizer preserve any memory operands of
arithmetic operations so as to help bring about subsumption of
memory loads into the operands of arithmetic operations. Load
subsumption is the combining of an arithmetic instruction and
a memory load into one instruction. The default is OFF.

-WOPT:retype_expr=(ON|OFF)
ON enables the optimization in the compiler that converts 64-bit
address computation to use 32-bit arithmetic as much as
possible. The default is OFF.

-WOPT:unroll=(0|1|2)
Control the unrolling of innermost loops in the scalar optimizer.
Setting it to 0 suppresses this unroller. The default is 1, which
makes the scalar optimizer unroll only loops that contain IF
statements. Setting it to 2 also applies the unrolling to loop
bodies that are straight-line code; this duplicates the unrolling
done in the code generator and is thus unnecessary.
The default setting of 1 makes this unrolling complementary to
what is done in the code generator. This unrolling is not
affected by the unrolling options under the -OPT group.
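As a source-level illustration (hypothetical names; the unroll factor of 2 is an assumption, since the description above does not state the factor used), here is a loop containing an IF statement and the same loop unrolled by two.

```c
#include <assert.h>
#include <stddef.h>

/* Innermost loop whose body contains an IF statement. */
size_t count_positive(const int *a, size_t n)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0)
            c++;
    return c;
}

/* The same loop unrolled by 2, with one leftover step for odd n. */
size_t count_positive_unrolled2(const int *a, size_t n)
{
    size_t c = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        if (a[i] > 0)
            c++;
        if (a[i + 1] > 0)
            c++;
    }
    if (i < n && a[i] > 0)   /* leftover iteration when n is odd */
        c++;
    return c;
}
```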

-WOPT:val=(0|1|2)
Controls the number of times the value-numbering optimization is
performed in the global optimizer, with the default being 1.
This optimization tries to recognize expressions that will
compute identical run-time values and changes the program to avoid
re-computing them.
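A minimal before/after sketch of what value numbering achieves (hypothetical names): both occurrences of a * b are recognized as computing the same value, so the product is computed once and reused.

```c
#include <assert.h>

/* As written: a * b is computed twice. */
void two_sums(int a, int b, int c, int d, int *x, int *y)
{
    *x = a * b + c;
    *y = a * b + d;
}

/* After value numbering: the shared value is computed once. */
void two_sums_numbered(int a, int b, int c, int d, int *x, int *y)
{
    int t = a * b;   /* single computation of the common expression */
    *x = t + c;
    *y = t + d;
}
```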

----------------------------------------------------------------------------
Other Notes
----------------------------------------------------------------------------

/usr/bin/taskset [options] [mask] [pid | command [arg] ... ]

taskset is used to set or retrieve the CPU affinity of a running
process given its PID, or to launch a new COMMAND with a given CPU
affinity. The CPU affinity is represented as a bitmask, with the
lowest-order bit corresponding to the first logical CPU and the
highest-order bit corresponding to the last logical CPU. When
taskset returns, it is guaranteed that the given program has been
scheduled to a legal CPU.

The default behaviour of taskset is to run a new command with a
given affinity mask:

taskset [mask] [command] [arguments]

The taskset command is used in the following form in the config file:

submit= MYMASK=`printf '0x%x' \$((1<<\$SPECUSERNUM))`; /usr/bin/taskset \$MYMASK $command

$MYMASK is the bitmask (in hexadecimal) corresponding to a specific
SPECUSERNUM. For example, $MYMASK is 0x1 (i.e. 0x00000001) for the
first copy of a rate run, 0x2 for the second copy, and so on. Thus,
the first copy of the rate run has a CPU affinity of CPU0, the second
copy has an affinity of CPU1, etc.
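The following C sketch (hypothetical function name) mirrors the mask computation in the submit line, producing the same strings as printf '0x%x' $((1<<$SPECUSERNUM)): 0x1 for SPECUSERNUM 0, 0x2 for 1, 0x10 for 4. It only builds the mask; it does not invoke taskset or change any affinity.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Copy number `copy` (SPECUSERNUM, starting at 0) gets a mask with only
 * bit `copy` set, i.e. affinity to logical CPU `copy`. The mask is
 * written into buf in the same "0x%x" format as the shell printf, and
 * the numeric mask is returned. */
unsigned int copy_affinity_mask(unsigned int copy, char *buf, size_t len)
{
    /* keep the shift within the width of unsigned int */
    assert(copy < sizeof(unsigned int) * CHAR_BIT);
    unsigned int mask = 1u << copy;
    snprintf(buf, len, "0x%x", mask);
    return mask;
}
```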