Description of compiler flags for Intel C++ Compiler 9.1
--------------------------------------------------------
-O1 optimize for speed, but disable some optimizations which increase
code size for a small speed benefit. Includes inline expansion
except for intrinsic functions, global optimizations, string
pooling optimizations.

-O2 This is the default level of optimization.
Optimizes for speed. The -O2 option includes -O1 optimizations
and in addition enables inlining of intrinsics and more speed
optimizations.

-O3 Builds on -O1 and -O2 optimizations by enabling high-level
optimization. This level does not guarantee higher performance
unless loop and memory access transformations take place. In
conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the
compiler to perform more aggressive data dependency analysis than
for -O2. This may result in longer compilation times.

-Oa[-] assume [do not assume] no aliasing in program

-Qax<codes> generate code specialized for processors specified by <codes>
while also generating generic IA-32 code. <codes> includes
one or more of the following characters:
K Intel Pentium III and compatible Intel processors
W Intel Pentium 4 and compatible Intel processors
N Intel Pentium 4 and compatible Intel processors. Enables new
optimizations in addition to Intel processor-specific optimizations
P Intel Pentium 4 processors and compatible Intel processors with
Streaming SIMD Extensions 3
B Intel Pentium M and compatible Intel processors

-Qx<codes> generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.
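
The difference between -Qax and -Qx can be pictured with a hand-written sketch: with -Qax a specialized routine and a generic routine exist side by side and one is chosen at run time, while with -Qx only the specialized routine would exist. This is an illustration of the idea only, not the compiler's actual dispatch mechanism. The names CpuLevel, sum_generic, sum_specialized and select_sum are hypothetical, and the "specialized" routine is plain C++ standing in for processor-specific code.

```cpp
#include <cstddef>

// Hypothetical capability levels, loosely mirroring the <codes> letters.
enum class CpuLevel { Generic, SSE, SSE2, SSE3 };

// Generic path: safe on any processor.
inline float sum_generic(const float* v, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Stand-in for a processor-specific path. A real one would use
// instructions that exist only on the targeted processors.
inline float sum_specialized(const float* v, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) { s0 += v[i]; s1 += v[i + 1]; }
    if (i < n) s0 += v[i];
    return s0 + s1;
}

using SumFn = float (*)(const float*, std::size_t);

// Pick the specialized path only when the processor supports it;
// otherwise fall back to the generic code.
SumFn select_sum(CpuLevel level) {
    return (level >= CpuLevel::SSE2) ? sum_specialized : sum_generic;
}
```

A caller would obtain the function once, for example select_sum(CpuLevel::SSE3), and then call it like any other function.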

/arch:{SSE|SSE2}
same as -QxK and -QxW respectively

----------------------------------------------------------------------------------
Additional Notes on -QxN and -QxP:
----------------------------------------------------------------------------------
-Qx{N|P} The -QxN and -QxP options target your program to run on Intel Pentium 4
and compatible Intel processors. The resulting code might
contain unconditional use of features that are not supported
on other processors. Programs in which the function main() is
compiled with this option will detect incompatible processors
and generate an error message during execution. This option
also enables new optimizations in addition to Intel processor
specific optimizations.

These options also enable advanced data layout and code restructuring
optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Ob{0|1|2} Controls the compiler's inline expansion.
0: disable inlining.
1: inline functions declared with __inline, and perform C++ inlining
2: inline any function, at the compiler's discretion (same as -Qip)

-Qip enable single-file IP optimizations
(within files, same as -Ob2)

-Qipo multi-file IP optimizations that include:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
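
As a rough illustration of how several of the items above combine, consider a callee that is always called with the same constant. The sketch below is hand-written and the function names are hypothetical. In a real -Qipo build scale() and scale_all() would typically live in different source files, and the "after" version only approximates what the optimizer might produce.

```cpp
// Callee: seen in isolation, 'factor' is an unknown run-time value.
int scale(int x, int factor) {
    if (factor == 1) return x;   // dead once factor is known to be 4
    return x * factor;
}

// Caller: always passes the constant 4.
int scale_all(const int* v, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += scale(v[i], 4);
    return total;
}

// Roughly what remains after inline expansion, interprocedural constant
// propagation and dead code elimination have been applied.
int scale_all_after_ipo(const int* v, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += v[i] * 4;
    return total;
}
```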

-fast The -fast option enhances execution speed across the entire program
by including the following options that can improve run-time performance:

-O3 (maximum speed and high-level optimizations)
-Qipo (enables interprocedural optimizations across files)
-QxP (generate code specialized for Intel Pentium 4 processor
and compatible Intel processors with Streaming SIMD Extensions 3)
-Qprec-div- (disable -Qprec-div)
where -Qprec-div improves precision of FP divides
(some speed impact)

To override one of the options set by -fast, specify that option after the
-fast option on the command line. The options set by -fast may change from
release to release.

-Qansi_alias[-] enable/disable use of ANSI aliasing rules in
optimizations; user asserts that the program adheres to
these rules. The default for C++ is -Qansi_alias-
which is that aliasing rules are not assumed. The default for
the Fortran compiler is -Qansi_alias as described in the
next section. For C++, the -Qansi_alias
flag will enable optimizations that would otherwise be
prevented by potential aliasing.
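
To make the aliasing assumption concrete, the sketch below shows the kind of code that benefits from it and a conforming way to reinterpret the bits of an object. The function names are hypothetical, and float_bits assumes a 32-bit float.

```cpp
#include <cstdint>
#include <cstring>

// Under ANSI aliasing rules an int* and a float* are assumed not to refer
// to the same object, so the compiler may return the constant 1 without
// reloading *count after the store to *value.
int store_and_read(int* count, float* value) {
    *count = 1;
    *value = 2.0f;
    return *count;
}

// Conforming way to look at the bits of a float: copy the bytes instead
// of dereferencing a pointer of an unrelated type.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that instead reads a float through a cast such as *(int*)&f breaks the rules that the user asserts when enabling -Qansi_alias.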

-Qprof_gen instrument program for profiling for the first phase of
two-phase profile-guided optimization

-Qprof_use Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -Qprof_use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files

-Qrcd The Intel compiler uses the -Qrcd option to improve the
performance of code that requires floating-point-to-integer
conversions.

The system default floating point rounding mode is
round-to-nearest. This means that values are rounded during
floating point calculations. However, the C language requires
floating point values to be truncated when a conversion to an
integer is involved. To do this, the compiler must change the
rounding mode to truncation before each floating
point-to-integer conversion and change it back afterwards.

The -Qrcd option disables the change to truncation of the
rounding mode for all floating point calculations, including
floating point-to-integer conversions. Turning on this option
can improve performance, but floating point conversions to
integer will not conform to C semantics.
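
The difference between the two behaviors can be seen in standard C++ without any compiler option. In the sketch below, static_cast gives the truncation that the language requires, while std::lrint converts using the current rounding mode, which is round-to-nearest by default. Treating std::lrint as a model for a conversion that skips the mode switch is an assumption made for illustration; the function names are hypothetical.

```cpp
#include <cmath>

// Conversion as required by the language: the fraction is discarded.
int to_int_truncating(double x) {
    return static_cast<int>(x);
}

// Conversion that honors the current rounding mode
// (round-to-nearest unless the program has changed it).
long to_int_current_mode(double x) {
    return std::lrint(x);
}
```

For 2.7 the first function yields 2 and the second yields 3, which is the kind of discrepancy a program can observe when conversions no longer truncate.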

-Qunroll[n] Specifies the maximum number of times to unroll a loop. Use
n = 0 to disable the unroller.
If n is not specified, the compiler decides whether to unroll and
automatically chooses the maximum number of times to unroll a loop.
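
For readers unfamiliar with the transformation, the sketch below unrolls a simple loop by hand four times, with a remainder loop for the leftover iterations. It only illustrates what unrolling means; the code the compiler generates for a given n may look different. The function names are hypothetical.

```cpp
#include <cstddef>

// Original loop: one element per iteration.
long sum_rolled(const int* v, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Unrolled four times: fewer loop-control instructions per element.
long sum_unrolled4(const int* v, std::size_t n) {
    long s = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
    for (; i < n; ++i) s += v[i];   // remainder iterations
    return s;
}
```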

-Qcxx_features Enables both -GX and -GR (described below), so that C++ Exception
Handling and Runtime Type Information are both enabled.

-GX Enables the full C++ Exception Handling unwind semantics.

-GR Enables C++ Runtime Type Information (RTTI).
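
The two language features that these flags control can be shown in a few lines of standard C++. The sketch below uses dynamic_cast, which depends on RTTI, and a throw/catch pair in which a destructor runs during stack unwinding. All type and function names are hypothetical.

```cpp
#include <stdexcept>

struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};
struct Square : Shape {};

// RTTI: dynamic_cast inspects the dynamic type of the object at run time.
bool is_circle(const Shape& s) {
    return dynamic_cast<const Circle*>(&s) != nullptr;
}

// Helper whose destructor records that it ran.
struct Guard {
    int& counter;
    explicit Guard(int& c) : counter(c) {}
    ~Guard() { ++counter; }
};

int checked_divide(int a, int b) {
    if (b == 0) throw std::invalid_argument("division by zero");
    return a / b;
}

// Exception handling: unwinding destroys 'g' before the handler runs.
// Returns the number of Guard destructors executed.
int unwind_demo() {
    int destroyed = 0;
    try {
        Guard g(destroyed);
        checked_divide(1, 0);
    } catch (const std::invalid_argument&) {
        // g has already been destroyed at this point
    }
    return destroyed;
}
```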

-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure and union
types as one of the following: 1, 2, 4, 8, or 16 (default) bytes.
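
The effect of a stricter packing limit can be seen with #pragma pack, a widely supported compiler extension (MSVC, GCC, Clang and Intel compilers accept it). The sketch assumes that -Zp acts like a file-wide packing limit of this kind; the struct names are hypothetical.

```cpp
#include <cstddef>

// Laid out with the compiler's current alignment rules: padding is
// normally inserted after 'tag' so that 'value' is aligned.
struct Natural {
    char tag;
    int value;
};

// Laid out with a packing limit of 1 byte: no padding at all.
#pragma pack(push, 1)
struct Packed1 {
    char tag;
    int value;
};
#pragma pack(pop)
```

With a 4-byte int, sizeof(Packed1) is 5, while sizeof(Natural) is typically 8.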

-Qprefetch[-] Enables [disables] the insertion of software prefetching by the compiler.
Default is -Qprefetch.

shlW32M2.lib: MicroQuill SmartHeap Library 7.0 available from
http://www.microquill.com/

Description of compiler flags for Intel FORTRAN Compiler 9.1
-------------------------------------------------------------
-O1 optimize for speed, but disable some optimizations which increase
code size for a small speed benefit. Includes inline expansion
except for intrinsic functions, global optimizations, string
pooling optimizations.

-O2 This is the default level of optimization.
Optimizes for speed. The -O2 option includes -O1 optimizations
and in addition enables inlining of intrinsics and more speed
optimizations.

-Qax<codes> generate code specialized for processors specified by <codes>
while also generating generic IA-32 code. <codes> includes
one or more of the following characters:
K Intel Pentium III and compatible Intel processors
W Intel Pentium 4 and compatible Intel processors
N Intel Pentium 4 and compatible Intel processors. Enables new
optimizations in addition to Intel processor-specific optimizations
P Intel Pentium 4 processors and compatible Intel processors with Streaming SIMD Extensions 3
B Intel Pentium M and compatible Intel processors

-Qx<codes> generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

/arch:{SSE|SSE2}
same as -QxK and -QxW respectively

-Qip enable single-file IP optimizations (within files, same as -Ob2)

-fast The -fast option enhances execution speed across the entire program
by including the following options that can improve run-time performance:

To override one of the options set by /fast, specify that option after the
/fast option on the command line. The options set by /fast may change from
release to release.

-Qansi_alias[-] Enables (default) or disables the assumption that the program
adheres to the ANSI Fortran type aliasability rules. For example, an object
of type real cannot be accessed as an integer. See the ANSI
standard for the complete set of rules. The default for this flag
is the reverse in C++, as noted in the previous section.

-Qprof_gen instrument program for profiling for the first phase of
two-phase profile-guided optimization

-Qrcd Enables fast float-to-int conversion.

-Qscalar_rep[-] Enables [disables] scalar replacement performed during loop
transformations (requires /O3). Such replacement is disabled by
default.
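
Scalar replacement itself is language independent, so it is sketched here in C++ rather than Fortran. The "before" loop updates an array element in memory on every inner iteration; the "after" loop keeps the running value in a scalar and stores it once. This is a hand-written illustration with hypothetical function names, and it is valid only when the two arrays do not overlap.

```cpp
#include <cstddef>

// Before: a[i] is read and written on every inner iteration.
void row_sums_before(const double* m, double* a,
                     std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i) {
        a[i] = 0.0;
        for (std::size_t j = 0; j < cols; ++j) a[i] += m[i * cols + j];
    }
}

// After: the array reference is replaced by the scalar 't', which can
// stay in a register for the whole inner loop.
void row_sums_after(const double* m, double* a,
                    std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i) {
        double t = 0.0;
        for (std::size_t j = 0; j < cols; ++j) t += m[i * cols + j];
        a[i] = t;
    }
}
```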

-Qauto Causes all variables to be allocated on the stack, rather than
in local static storage. Does not affect variables that appear in an
EQUIVALENCE or SAVE statement, or those that are in COMMON. Makes all
local variables AUTOMATIC, same as /4Ya.
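
The distinction between static and stack (automatic) storage has a direct analogue in C++, which is used here in place of Fortran. A static local keeps its value between calls, much like a SAVEd Fortran variable, while an automatic local starts fresh on every call. The function names are hypothetical.

```cpp
// Static storage: one copy for the whole program; the value persists.
int count_calls_static() {
    static int calls = 0;
    return ++calls;
}

// Automatic storage: a new copy on the stack for each call.
int count_calls_automatic() {
    int calls = 0;
    return ++calls;
}
```

A program that silently relies on the first behavior for variables not covered by SAVE can change its results when its locals are moved to the stack.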

-Qprefetch[-] Enables [disables] the insertion of software prefetching by the compiler.
Default is -Qprefetch.

Other Notes:
------------
"/" and "-" are both allowable starting tokens for flags passed to the
compiler, i.e., -QxK and /QxK are identical switches.

Additional Libraries Used:
--------------------------

Supplied by MicroQuill:

shlW32M2.lib: MicroQuill SmartHeap Library 7.0 available from
http://www.microquill.com/

Portability options for CPU2000:
--------------------------------
176.gcc:
-Dalloca=_alloca : so as to use the built-in optimized alloca
-Fn : 176.gcc uses alloca and this option tells
the linker to pre-allocate n bytes of stack.
The default amount of stack allocated is not
enough and 176.gcc crashes with a run-time
error.

178.galgel:
-FI : Fixed-format F90 source code.
-F32000000 : Same as with 176.gcc, pre-allocates a 32MB
stack

186.crafty:
-DNT_i386 : Specifies that it is a Windows NT Intel
processor-based system which makes the compiler
use "long long" as the 64-bit variable that
186.crafty needs.
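
A pre-define of this kind usually works by selecting one of several typedefs at compile time. The sketch below shows the general pattern only; it is not taken from the 186.crafty sources, and Word64 and bit are hypothetical names.

```cpp
#include <cstdint>

// The macro NT_i386 would be supplied on the command line with -DNT_i386.
#if defined(NT_i386)
typedef unsigned long long Word64;   // branch selected by -DNT_i386
#else
typedef std::uint64_t Word64;        // fallback used in this sketch
#endif

static_assert(sizeof(Word64) == 8, "a 64-bit type is required");

// Returns a 64-bit word with only bit n set (n must be below 64).
Word64 bit(unsigned n) {
    return static_cast<Word64>(1) << n;
}
```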

253.perlbmk:
-DSPEC_CPU2000_NTOS : Ensures that the code changes necessary
for compilation on Windows get included.
-DPERLDLL : On Windows, we need a perl.exe instead of a
perl.exe and perl.dll. This pre-define ensures
that the changes necessary to get a single,
UNIX-style executable are included, without the
indirect calls that can cause a 10% performance
degradation. This allows the Windows-based
executable to be as close as possible to
the Unix-based one.
-MT : Use the static multi-threaded library; otherwise
it will not compile.

254.gap:
-DSYS_HAS_CALLOC_PROTO :
-DSYS_HAS_MALLOC_PROTO : These two pre-defines tell of the existence
of malloc and calloc prototypes.