----------------------------
-Super Micro Computer, Inc.-
----------------------------

Description of compiler flags for Intel C++ Compiler 9.0
------------------------------------------------------------

-O2
Optimizes for speed. The -O2 option includes the following options:
-Og, -Oi-, -Ot, -Oy, -Ob1, and -Gs. This option defaults to ON.
This option also enables:
* inlining of intrinsics
* Intra-file interprocedural optimizations including:
  * inlining
  * constant propagation
  * forward substitution
  * routine attribute propagation
  * variable address-taken analysis
  * dead static function elimination
  * removal of unreferenced variables
* The following performance optimizations:
  * copy propagation
  * dead-code elimination
  * global register allocation
  * global instruction scheduling and control speculation
  * loop unrolling
  * optimized code selection
  * partial redundancy elimination
  * strength reduction/induction variable simplification
  * variable renaming
  * exception handling optimizations
  * tail recursions
  * peephole optimizations
  * structure assignment lowering and optimizations
  * dead store elimination
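
As an illustration of one item in the list above, strength reduction with
induction variable simplification replaces a per-iteration multiply with a
running addition. The sketch below shows the transformation by hand; the
function names are hypothetical, and the code the compiler actually
generates may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Before: every iteration multiplies the induction variable by the stride. */
void fill_multiples_mul(unsigned *out, size_t n, unsigned stride)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        out[i] = (unsigned)i * stride;
}

/* After: the multiply is replaced by an addition carried across iterations.
   Unsigned arithmetic keeps both versions well defined on wraparound. */
void fill_multiples_add(unsigned *out, size_t n, unsigned stride)
{
    assert(out != NULL || n == 0);
    unsigned value = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = value;
        value += stride;
    }
}
```

Both functions store the same values; only the cost per iteration changes.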

-O3
Optimizes for speed and enables high-level optimizations. This level does
not guarantee higher performance, and it may increase compilation time.
The impact on performance is application dependent; some applications
may not see a performance improvement. The optimizations
include:
* All optimizations done with -O2
* loop unrolling, including instruction scheduling
* code replication to eliminate branches
* padding the size of certain power-of-two arrays to allow more efficient
cache use.
* When used with -Qax or -Qx, it causes the compiler to perform more aggressive
data dependency analysis than for -O2.

-Oa[-]
Assume [not assume] no aliasing. Default Disabled.
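
-Oa applies the no-aliasing assumption to the whole compilation. Standard
C99 offers a narrower way to make the same promise for individual
pointers, the restrict qualifier. The sketch below is an illustration
only; the function name is hypothetical, and it is an assumption that the
compiler treats the two mechanisms similarly.

```c
#include <assert.h>
#include <stddef.h>

/* Adds k * src[i] to dst[i]. The restrict qualifiers promise that dst and
   src never overlap, so loads from src need not be repeated after stores
   to dst. Calling this with overlapping arrays is undefined behavior. */
void scaled_add(float *restrict dst, const float *restrict src,
                size_t n, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

As with -Oa, the promise is not checked; code that breaks it may produce
wrong results.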

-Obn
Controls the compiler's inline expansion. The amount of inline
expansion performed varies with the value of n as follows:
0: Disables inlining. Statement functions are always inlined.
1: Enables (default) inlining of functions declared with the
__inline keyword. Also enables inlining according to the
C++ language.
2: Enables inlining of any function. However, the
compiler decides which functions to inline. Enables
interprocedural optimizations and has the same effect as
-Qip.
Default n=2.

-Og
Enables global optimizations. Default ON.

-Ot
Enables all speed optimizations.

-Oi[-]
Enables/disables inline expansion of intrinsic functions. Default Enabled.

-Ow[-]
Assume [not assume] no cross-function aliasing.

-Oy[-]
Enables [disables] the use of the EBP register in optimizations. When
you disable with -Oy-, the EBP register is used as frame pointer. -Oy- has
the effect of reducing the number of available general-purpose registers
by 1, and can produce slightly less efficient code.
Default Enabled.

-Gf
Enables string-pooling optimization.

-Gs[n]
Disables stack-checking for routines with n or more bytes of local
variables and compiler temporaries. Default: n=4096

-Gy
Packages functions to enable linker optimization. Default ON.

-Qax{K|W|N|P}
Generates specialized code for the processor-specific codes
K, W, N, and P while also generating generic IA-32 code.
K = Intel Pentium III and compatible Intel processors
W = Intel Pentium 4 and compatible Intel processors
N = Intel Pentium 4 and compatible Intel processors. These options also enable
advanced data layout and code restructuring optimizations to improve memory
accesses for Intel processors.
P = Intel Pentium 4 processor with Streaming SIMD Extensions 3

-Qx{K|W|N|P}
Generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

-Qip
Enables interprocedural optimizations within a single file.

-Qipo
Enables multi-file interprocedural (IP) optimizations, which allow inline function
expansion for calls to functions defined in separate files. The compiler decides
whether to create one or more object files based on an estimate of the size of the
application. It generates one object file for small applications and two for large ones.

-Qprof_gen
Instruments the program for profiling, to collect the execution
count of each basic block.

-Qprof_use
Enables the use of profiling dynamic feedback information
during optimization. Turns on -Qfnsplit. Forces function grouping.

-Qrcd
Enables [disables] fast floating-point-to-integer
conversions. This option does not guarantee that
any particular rounding mode will be used.

-Qansi_alias[-]
-Qansi_alias directs the compiler to assume the following:
- Arrays are not accessed out of bounds.
- Pointers are not cast to non-pointer types, and vice-versa.
- References to objects of two different scalar types cannot alias.
For example, an object of type int cannot alias with an object
of type float, or an object of type float cannot alias with an
object of type double.
If your program satisfies the above conditions, setting the -Qansi_alias
flag will help the compiler better optimize the program. However, if your
program does not satisfy one of the above conditions, the -Qansi_alias
flag may lead the compiler to generate incorrect code.
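
A minimal sketch of the third assumption, assuming a 32-bit IEEE 754
float (true on IA-32). The function name is hypothetical. Reading a float
object through a pointer to an unrelated scalar type breaks the
assumption; copying the bytes does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Breaks the type-based aliasing assumption (shown as a comment only):
 *
 *     uint32_t bits = *(const uint32_t *)&f;
 *
 * The float object f is read as if it were a uint32_t object. */

/* Conforming alternative: copy the representation byte by byte. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written in the second style stays correct whether or not
-Qansi_alias is in effect.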

-Qcxx_features
Enables both -GX and -GR, described below, so that C++ Run Time Type
Information and Exception Handling are both enabled.

-GR[-]
Enables [disables] C++ Run Time Type Information (RTTI).

-GX[-]
Enables [disables] C++ Exception Handling. Default Disabled.

-fast
Maximize speed across the entire program. Turns on -O3, -Qipo,
-Qprec-div-, and -QxP.

To override one of the options set by /fast, specify that option after the
/fast option on the command line. The options set by /fast may change from
release to release.

-Qfp_port
Rounds floating-point results at assignments and casts (some speed impact).

-Qprefetch
Enable prefetch insertion. Default ON.

-Qunroll[n]
Specifies the maximum number of times to unroll a loop. n=0 disables
loop unrolling. Default: the compiler uses default heuristics when
unrolling loops.
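
To make the unroll count concrete, the sketch below unrolls a summation
loop four times by hand, with a remainder loop for leftover elements. It
only illustrates the shape of the transformation; the function name is
hypothetical, and the compiler's own unrolled code may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints with the loop body replicated four times per iteration. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* unrolled body: 4 elements per trip */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    long sum = s0 + s1 + s2 + s3;
    for (; i < n; ++i)             /* remainder loop: at most 3 elements */
        sum += a[i];
    return sum;
}
```

Unrolling reduces loop overhead and exposes independent operations to the
scheduler, at the cost of larger code.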

-Qoption,tool,optlist
-Qoption passes an option specified by optlist to a tool, where
optlist is a comma-separated list of options.

tool     Description
------------------------------------
cpp      Specifies the compiler front-end preprocessor
c        Specifies the C++ compiler
asm      Specifies the assembler
link     Specifies the linker
optlist  Indicates one or more valid argument strings for the
         designated program. If the argument is a command-line
         option, you must include the hyphen. If the argument
         contains a space or tab character, you must enclose the
         entire argument in quotation characters (""). You must
         separate multiple arguments with commas.

NOTE: If 'tool' is incorrectly specified, the compiler gives a
warning and the option is ignored. For example, if
-Qoption,f,... is used with the Intel C++ compiler, the
option is ignored with a warning.

-Qoption can be used with the -Qipo flag to refine IPO. The valid options
that can be used for this purpose are:

-ip_args_in_regs=0
Disables the passing of arguments in registers.

-ip_ninl_max_stats=n
Sets the valid max number of intermediate
language statements for a function that is
expanded in line. The number n is a positive
integer. The number of intermediate language
statements usually exceeds the actual number of
source language statements. The default value
for n is 230. The compiler uses a larger limit
for user inline functions.

-ip_ninl_min_stats=n
Sets the valid min number of intermediate
language statements for a function that is
expanded in line. The number n is a positive
integer. The default values for
ip_ninl_min_stats are:
IA-32 compiler: ip_ninl_min_stats = 7

-ip_ninl_max_total_stats=n
Sets the maximum increase in size of a function,
measured in intermediate language statements,
due to inlining. n is a positive integer whose
default value is 2000.

shlW32M.lib:
MicroQuill SmartHeap Library 7.0 available from
http://www.microquill.com/

-Zp{1|2|4|8|16}
Specifies the strictest alignment constraint for structure and union
types as 1, 2, 4, 8, or 16 bytes. Default is 16.
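
The effect of a tighter alignment constraint can be seen on a single
structure with #pragma pack, which Microsoft-compatible compilers treat as
the per-declaration counterpart of -Zp. The sketch below assumes that
pragma is available; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Natural alignment: padding usually follows 'tag' so that 'value'
   starts on its own alignment boundary. */
struct natural {
    char tag;
    int  value;
};

/* Packed to 1-byte alignment, comparable to compiling with -Zp1. */
#pragma pack(push, 1)
struct packed1 {
    char tag;
    int  value;
};
#pragma pack(pop)

size_t natural_value_offset(void) { return offsetof(struct natural, value); }
size_t packed_value_offset(void)  { return offsetof(struct packed1, value); }

/* Number of bytes saved per object by packing. */
size_t padding_saved(void)
{
    assert(sizeof(struct packed1) <= sizeof(struct natural));
    return sizeof(struct natural) - sizeof(struct packed1);
}
```

Packing saves space but can make member accesses unaligned, which may
slow them down.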

-arch:SSE
Enables the compiler to use SSE instructions.

-arch:SSE2
Enables the compiler to use SSE2 instructions.

-Qprec-div[-]
Enables[disables] improved precision of floating-point divides. Disabling may
slightly improve speed. Default Enabled.
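
One common way to trade divide precision for speed is to multiply by a
reciprocal instead of dividing. Whether the compiler does exactly this
when -Qprec-div- is in effect is an assumption here; the sketch, with
hypothetical function names, only shows why the two forms can differ in
the last bit.

```c
#include <assert.h>

/* Full-precision form: the quotient is rounded once. */
double divide_exact(double a, double b)
{
    assert(b != 0.0);
    return a / b;
}

/* Reciprocal form: 1/b is rounded, then the product is rounded again,
   so the result may differ slightly from a / b. */
double divide_via_reciprocal(double a, double b)
{
    assert(b != 0.0);
    double r = 1.0 / b;
    return a * r;
}
```

When the divisor is a power of two, the reciprocal is exact and both forms
agree.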

-Qpc64
Enables floating-point significand precision control. The value is used to round
the significand to the correct number of bits. The value must be either 32, 64,
or 80. Default ON.

Description of compiler flags for Intel Fortran Compiler 9.0
------------------------------------------------------------

-O2
Optimizes for speed. The -O2 option includes the following options:
-Og, -Ot, -Oy, -Ob1, and -Gs. This option defaults to ON.
This option also enables:
* inlining of intrinsics
* Intra-file interprocedural optimizations including:
  * inlining
  * constant propagation
  * forward substitution
  * routine attribute propagation
  * variable address-taken analysis
  * dead static function elimination
  * removal of unreferenced variables
* The following performance optimizations:
  * copy propagation
  * dead-code elimination
  * global register allocation
  * global instruction scheduling and control speculation
  * loop unrolling
  * optimized code selection
  * partial redundancy elimination
  * strength reduction/induction variable simplification
  * variable renaming
  * exception handling optimizations
  * tail recursions
  * peephole optimizations
  * structure assignment lowering and optimizations
  * dead store elimination

-Oa[-]
Assume [not assume] no aliasing

-Ob{0|1|2}
Controls the compiler's inline expansion. The amount of inline
expansion performed varies as follows:
-Ob0: Disable inlining.
-Ob1: Disables (default) inlining unless -Qip or -Ob2 is
specified. Enables inlining of functions.
-Ob2: Enables inlining of any function. However, the
compiler decides which functions to inline. Enables
interprocedural optimizations and has the same effect as
-Qip.

-Og
Enables global optimizations.

-Ot
Enables all speed optimizations.

-Oi[-]
Enables/disables inline expansion of intrinsic functions

-Ow[-]
Assume[not assume] no cross-function aliasing.

-Ox
Same as the -O2 option: enables -Gs, -Ob1, -Og, -Oy, and -Ot.

-Oy[-]
Enables [disables] the use of the EBP register in optimizations. When
you disable with -Oy-, the EBP register is used as frame pointer.

-auto
Determines whether local variables are put on the run-time stack.

-Gf
Enables string-pooling optimization.

-Gs[n]
Disables stack-checking for routines with n or more bytes of local
variables and compiler temporaries. Default: n=4096

-Gy
Packages functions to enable linker optimization.

-fast
Maximize speed across the entire program. Turns on -O3, -Qprec-div-, -QxP, and -Qipo.

-Qax{K|W|N|P}
Generates specialized code for the processor-specific codes
K, W, N, and P while also generating generic IA-32 code.
K = Intel Pentium III and compatible Intel processors
W = Intel Pentium 4 and compatible Intel processors
N = Intel Pentium 4 and compatible Intel processors. This option also enables
advanced data layout and code restructuring optimizations to improve memory
accesses for Intel processors.
P = Intel Pentium 4 processor with Streaming SIMD Extensions 3 (SSE3) support.
This option also enables advanced data layout and code restructuring
optimizations to improve memory accesses for Intel processors.

-Qx{K|W|N|P}
Generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

-Qip
Enables interprocedural optimizations within a single file.

-Qipo
Enables multi-file interprocedural (IP) optimizations that include:
- inline function expansion
- interprocedural constant propagation
- monitoring module-level static variables
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion

-Qprof_gen
Instruments the program for profiling, to collect the execution
count of each basic block.

-Qprof_use
Enables the use of profiling dynamic feedback information
during optimization.

-Qrcd
Enables [disables] fast floating-point-to-integer
conversions. This option does not guarantee that
any particular rounding mode will be used.

-Qansi_alias
Directs the compiler to assume (default) or not assume that the program
adheres to the ANSI Fortran type aliasability rules. For example, an object
of type real cannot be accessed as an integer. See the ANSI
Standard for the complete set of rules.

-Qscalar_rep[-]
Enables[disables] scalar replacement performed during loop
transformations. (requires /O3).

-Qauto
Causes all variables to be allocated on the stack, rather than
in local static storage. Does not affect variables that appear in an
EQUIVALENCE or SAVE statement, or those that are in COMMON. Makes all
local variables AUTOMATIC, same as /4Ya.

-Qunroll[n]
Specifies the maximum number of times to unroll a loop. n=0 disables
loop unrolling.

-Qprefetch[-]
Enables[disables] prefetch insertion (requires -O3).

-Qoption,tool,optlist
-Qoption passes an option specified by optlist to a tool, where
optlist is a comma-separated list of options.

tool     Description
------------------------------------
fpp      Specifies the Fortran preprocessor
f        Specifies the Fortran compiler
asm      Specifies the assembler
link     Specifies the linker
optlist  Indicates one or more valid argument strings for the
         designated tool. You must separate multiple arguments with commas.

-Qoption can be used with the -Qipo flag to refine IPO. The valid options
that can be used for this purpose are:

-ip_args_in_regs=0
Disables the passing of arguments in registers.

-ip_ninl_max_total_stats=n
Sets the maximum increase in size of a function,
measured in intermediate language statements,
due to inlining. n is a positive integer whose
default value is 2000.

shlW32M.lib:
MicroQuill SmartHeap Library 7.0 available from
http://www.microquill.com/

-Zp{1|2|4|8|16}
Specifies the strictest alignment constraint for structure and union
types as 1, 2, 4, 8, or 16 bytes. Default is 16.

-Qprec-div[-]
Enables[disables] improved precision of floating-point divides. Disabling may
slightly improve speed. Default Enabled.

Description of compiler flags for Intel C++ Compiler 8.0
-------------------------------------------------------------------------------

-O2
Optimizes for speed. The -O2 option has the same effect as specifying
the following options: -Og, -Oi, -Ot, -Oy, -Ob1, -Gf, -Gs, and -Gy.
This option defaults to ON.

-Oa[-]
Assume [not assume] no aliasing

-Obn
Controls the compiler's inline expansion. The amount of inline
expansion performed varies with the value of n as follows:
0: Disables inlining.
1: Enables (default) inlining of functions declared with the
__inline keyword. Also enables inlining according to the
C++ language.
2: Enables inlining of any function. However, the
compiler decides which functions to inline. Enables
interprocedural optimizations and has the same effect as
-Qip.
Default n=1.

-Og
Enables global optimizations. Default ON.

-Ot
Enables all speed optimizations. Overrides -Os

-Oi[-]
Enables/disables inline expansion of intrinsic functions. Default Enabled.

-Ow[-]
Assume[not assume] no aliasing within functions, but assume aliasing
across calls.

-Oy[-]
Enables [disables] the use of the EBP register in optimizations. When
you disable with -Oy-, the EBP register is used as frame pointer.
Default Enabled.

-Gf
Enables string-pooling optimization. Default ON.

-Gs[n]
Disables stack-checking for routines with n or more bytes of local
variables and compiler temporaries. Default: n=4096

-Gy
Packages functions to enable linker optimization. Default ON.

-Qax{K|W|N}
Generates specialized code for processor specific codes
K, W, N while also generating generic IA-32 code.
K = Intel Pentium III and compatible Intel processors
W = Intel Pentium 4 and compatible Intel processors
N = Intel Pentium 4 and compatible Intel processors. These options also enable
advanced data layout and code restructuring optimizations to improve memory
accesses for Intel processors.

-Qx{K|W|N}
Generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

-Qip
Enables interprocedural optimizations within a single file.
Same as -Ob2.

-Qprof_gen
Instruments the program for profiling: to get the execution
count of each basic block.

-Qprof_use
Enables the use of profiling dynamic feedback information
during optimization. Turns on -Qfnsplit.

-Qrcd
Enables [disables] fast floating-point-to-integer
conversions. This option does not guarantee that
any particular rounding mode will be used.

-Qansi_alias[-]
-Qansi_alias directs the compiler to assume[not assume] the following:
- Arrays are not accessed out of bounds.
- Pointers are not cast to non-pointer types, and vice-versa.
- References to objects of two different scalar types cannot alias.
For example, an object of type int cannot alias with an object
of type float, or an object of type float cannot alias with an
object of type double.
If your program satisfies the above conditions, setting the -Qansi_alias
flag will help the compiler better optimize the program. However, if your
program does not satisfy one of the above conditions, the -Qansi_alias
flag may lead the compiler to generate incorrect code.

-GR[-]
Enables[disables] C++ Run Time Type Information (RTTI).

-GX[-]
Enables[disables] C++ Exception Handling.

-fast
Maximize speed across the entire program. Turns on -O3 and -Qipo.

-Qfp_port
Rounds floating-point results at assignments and casts (some speed impact).

-Qprefetch
Enable prefetch insertion. Default ON.

-Qunroll[n]
Specifies the maximum number of times to unroll a loop. n=0 disables
loop unrolling.

-Qoption,tool,optlist
-Qoption passes an option specified by optlist to a tool, where
optlist is a comma-separated list of options.

-Qoption can be used with the -Qipo flag to refine IPO. The valid options
that can be used for this purpose are:

-ip_args_in_regs=0
Disables the passing of arguments in registers.

-ip_ninl_max_total_stats=n
Sets the maximum increase in size of a function,
measured in intermediate language statements,
due to inlining. n is a positive integer whose
default value is 2000.

shlW32M.lib:
MicroQuill SmartHeap Library 7.0 available from
http://www.microquill.com/

-Zp{1|2|4|8|16}
Specifies the strictest alignment constraint for structure and union
types as 1, 2, 4, 8, or 16 bytes. Default is 16.

-arch:SSE
Enables the compiler to use SSE instructions.

-arch:SSE2
Enables the compiler to use SSE2 instructions.

-EHc
Specifies that C functions do not throw exceptions. Default ON.

-G7
Target optimization to Intel Pentium 4 processors. Default ON.

-ML
Compiles and links with the static, single-thread C run time library. Default ON.

-QA
Enables all predefined macros and all assertions. Default ON.

-Qfnsplit
Enables function splitting. Default ON.

-Qms1
Instructs the compiler to enable most Microsoft compatibility bugs. Default ON.

-Qmspp
Enables Microsoft C++ 6.0 Processor Pack binary compatibility. Default ON.

-Qpc64
Enables floating-point significand precision control. The value is used to round
the significand to the correct number of bits. The value must be either 32, 64,
or 80. Default ON.

-Qpchi
Enables precompiled header files coexistence to reduce build time. Default ON.

-Qsfalign8
May align stack for functions with 8 or 16 byte vars. Default ON.

-Qvc7
Enables compatibility with Visual C++ .NET. Default ON.

-Qvec_report1
Indicate vectorized loops in diagnostic information. Default ON.

-vmb
Selects the smallest representation for pointers to members. Use this
option if you define each class before you declare a pointer to a member of the class.
Default ON.

Description of compiler flags for Intel C++ Compiler 8.1
----------------------------------------------------------------------------------
-O1 Optimizes for speed, but disables some optimizations that increase
code size for a small speed benefit. Includes inline expansion
except for intrinsic functions, global optimizations, and string
pooling optimizations.

-O2 This is the default level of optimization.
Optimizes for speed. The -O2 option includes O1 optimizations
and in addition enables inlining of intrinsics and more speed
optimizations.

-O3 Builds on -O1 and -O2 optimizations by enabling high-level
optimization. This level does not guarantee higher performance
unless loop and memory access transformations take place. In
conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the
compiler to perform more aggressive data dependency analysis than
for -O2. This may result in longer compilation times.

-Oa[-] assume [do not assume] no aliasing in program

-Qax<codes> generate code specialized for processor extensions
specified by <codes> while also generating generic IA-32 code.
<codes> includes one or more of the following characters:
i Pentium Pro and Pentium II processor instructions
M MMX(TM) instructions
K streaming SIMD extensions (implies i and M above)
W Pentium 4 processor with Streaming SIMD Extensions 2
(implies i, M and K)
N Pentium 4 processor with Streaming SIMD Extensions 2
P Pentium 4 processor with Streaming SIMD Extensions 3

-Qx<codes> generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P} The /QxN and /QxP options target your program to run on Intel Pentium 4
and compatible Intel processors. The resulting code might
contain unconditional use of features that are not supported
on other processors. Programs in which the function main() is
compiled with this option detect incompatible processors
and generate an error message during execution. These options
also enable new optimizations in addition to Intel processor
specific optimizations.

These options also enable advanced data layout and code restructuring
optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Ob{0|1|2} Controls the compiler's inline expansion.
0: disable inlining.
1: disables inlining unless -Qip or -Ob2 are specified.
2: enables inlining of any function. However, the
compiler decides which functions are inlined. This
option enables interprocedural optimizations and has
the same effect as specifying the -Qip option.

-Qip enable single-file IP optimizations
(within files, same as -Ob2)

-Qipo Enables multi-file interprocedural (IP) optimizations that include:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion

-fast The /fast option enhances execution speed across the entire program
by including the following options that can improve run-time performance:

/O3 (maximum speed and high-level optimizations)
/Qipo (enables interprocedural optimizations across files)
/QxP (generate code specialized for Intel Pentium 4 processor with
Streaming SIMD Extensions 3)

To override one of the options set by /fast, specify that option after the
/fast option on the command line. The options set by /fast may change from
release to release.

-Qansi_alias Directs the compiler to assume that the program
adheres to the type-based aliasing rules defined in Section 6.5 of the ISO C
Standard. If your program adheres to these rules, this option will allow
the compiler to optimize more aggressively. If it doesn't adhere to these
rules, it can cause the compiler to generate incorrect code.

-Qprof_gen Instruments the program for profiling for the first phase of
two-phase profile-guided optimization.

-Qprof_use Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -Qprof_use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files

-Qrcd The Intel compiler uses the -Qrcd option to improve the
performance of code that requires floating-point-to-integer
conversions.

The system default floating point rounding mode is
round-to-nearest. This means that values are rounded during
floating point calculations. However, the C language requires
floating point values to be truncated when a conversion to an
integer is involved. To do this, the compiler must change the
rounding mode to truncation before each floating
point-to-integer conversion and change it back afterwards.

The -Qrcd option disables the change to truncation of the
rounding mode for all floating point calculations, including
floating point-to-integer conversions. Turning on this option
can improve performance, but floating point conversions to
integer will not conform to C semantics.
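
The difference between the two conversion behaviors can be sketched in
portable C. The first function below is the truncating conversion that
the C language requires. The second emulates in software what a
conversion under the default round-to-nearest mode would return, without
touching the floating-point control word. Both function names are
hypothetical, and the emulation assumes ties round to even.

```c
#include <assert.h>

/* C semantics: the fractional part is discarded (round toward zero). */
long to_long_truncate(double x)
{
    assert(x > -1e9 && x < 1e9);
    return (long)x;
}

/* Round-to-nearest, ties to even, built on top of truncation. */
long to_long_nearest(double x)
{
    assert(x > -1e9 && x < 1e9);
    long t = (long)x;
    double frac = x - (double)t;          /* exact for this input range */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;
    return t;
}
```

For 2.7 the first function returns 2 and the second returns 3, which is
the kind of discrepancy to expect in code that depends on truncation.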

-Qunroll[n] Specifies the maximum number of times to unroll a loop. Use
n = 0 to disable the unroller. If n is omitted, the compiler decides
whether to unroll and automatically chooses the maximum
number of times to unroll a loop.

-GX Enables the full C++ Exception Handling unwind semantics.

-GR Enables C++ Runtime Type Information (RTTI).

-Qcxx_features Enables both -GX and -GR as described above so C++ Runtime Type Information and
Exception Handling are both enabled

-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure and union
types as one of the following: 1, 2, 4, 8, or 16 (default) bytes.

-Qprefetch[-] Enables [disables] the insertion of software prefetching by the compiler.
Default is /Qprefetch.

shlW32M.lib: MicroQuill SmartHeap Library 6.0 available from
http://www.microquill.com/

Description of compiler flags for Intel FORTRAN Compiler 8.1
-------------------------------------------------------------
-O1 Optimizes for speed, but disables some optimizations that increase
code size for a small speed benefit. Includes inline expansion
except for intrinsic functions, global optimizations, and string
pooling optimizations.

-O2 This is the default level of optimization.
Optimizes for speed. The -O2 option includes O1 optimizations
and in addition enables inlining of intrinsics and more speed
optimizations.

-Qx<codes> generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

-Qip enable single-file IP optimizations (within files, same as -Ob2)

-fast The /fast option enhances execution speed across the entire program
by including the following options that can improve run-time performance:

-O3 (maximum speed and high-level optimizations)
-Qipo (enables interprocedural optimizations across files)
-QxP (generate code specialized for Intel Pentium 4 processor with
Streaming SIMD Extensions 3)

To override one of the options set by /fast, specify that option after the
/fast option on the command line. The options set by /fast may change from
release to release.

-Qansi_alias Directs the compiler to assume (default) or not assume that the program
adheres to the ANSI Fortran type aliasability rules. For example, an object
of type real cannot be accessed as an integer. See the ANSI
standard for the complete set of rules.

-Qprof_gen Instruments the program for profiling for the first phase of
two-phase profile-guided optimization.

-Qrcd Enables fast float-to-int conversion.

-Qscalar_rep[-] Enables [disables] scalar replacement performed during loop
transformations (requires /O3).

-Qprefetch[-] Enables [disables] the insertion of software prefetching by the compiler.
Default is /Qprefetch.

Other Notes:
------------
"/" and "-" are both allowable starting tokens for flags passed to the
compiler; for example, -QxK and /QxK are identical switches.

Compiler options for PGI Fortran compiler 6.0 for Windows XP IA32
-----------------------------------------------------------------

The optimization levels and their meanings are as follows:

-lacml
Link with the AMD Core Math Library 2.5.3, packaged with the
compiler. Also available at www.amd.com

-O0
A basic block is generated for each Fortran statement. No scheduling
is done between statements. No global optimizations are performed.

-O1
Scheduling within extended basic blocks is performed. Some register
allocation is performed. No global optimizations are performed.

-O2
All level 1 optimizations are performed. In addition, scalar
optimizations such as induction recognition and loop invariant motion
are performed by the global optimizer.

-O3
This level performs all level-one and level-two optimizations and
enables more aggressive hoisting and scalar replacement optimizations.

-fast
Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre"

-fastsse
Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz"

-Mpfi
Generate profile feedback instrumentation; this
includes extra code to collect run-time statistics to
be used in a subsequent compile; -Mpfi must also appear
when the program is linked. When the program is run, a
profile feedback file pgfi.out will be generated; see
-Mpfo.

-Mpfo
Enable profile feedback optimizations; there must be a
profile feedback file pgfi.out in the current
directory, which contains the result of an execution of
the program compiled with -Mpfi.

-Mcache_align
Align unconstrained objects of length greater than or equal to 16 bytes on
cache-line boundaries. An unconstrained object is a data object that is not
a member of an aggregate structure or common block. This option does
not affect the alignment of allocatable or automatic arrays.

Note: To effect cache-line alignment of stack-based local variables, the
main program or function must be compiled with -Mcache_align.

-Mfixed
Process source using Fortran fixed-form specifications.

-Mflushz
Set SSE MXCSR register to flush-to-zero mode.
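
Flush-to-zero affects results that fall below the smallest normal
floating-point value. The sketch below, with hypothetical function names,
shows such a value under default (gradual underflow) arithmetic. With
flush-to-zero set, the same operation would be expected to produce 0.0
instead, although this sketch does not change the MXCSR register itself.

```c
#include <assert.h>
#include <float.h>

/* Returns 1 when x is nonzero but smaller in magnitude than the
   smallest normal double, that is, a denormal number. */
int is_denormal(double x)
{
    double mag = x < 0.0 ? -x : x;
    return mag > 0.0 && mag < DBL_MIN;
}

/* Halves x. Halving DBL_MIN underflows into the denormal range. */
double half(double x)
{
    assert(x == x);   /* reject NaN */
    return x * 0.5;
}
```

Denormal arithmetic is often much slower than normal arithmetic, which is
why flushing such results to zero can improve speed.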

-Mipa=[option]
Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align
Instructs the IPA to recognize when pointer targets are all cache-line
aligned, allowing better SSE code generation.

-Mipa=arg
Instructs the IPA to remove arguments replaced by -Mipa=ptr,const

-Mipa=const
Enable propagation of constants across procedure calls.

-Mipa=fast
Equivalent to: -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial

-Mipa=f90ptr
Enable Fortran 90 pointer disambiguation across procedure calls.

-Mipa=globals
Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=inline
Automatically determine which functions to inline

-Mipa=safe
Assume unknown function references are safe

-Mipa=localarg
Externalizes local variables for use with -Mipa=arg

-Mipa=ptr
Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial
Instructs the IPA to eliminate functions that are not called.

-Mlre
Enables loop-carried redundancy elimination.
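
One common illustration of a loop-carried redundancy is a value loaded in
one iteration and loaded again in the next. The sketch below removes the
repeated load by hand, carrying the value in a scalar. It is written in C
for brevity with hypothetical function names, and it is an assumption
that -Mlre performs this particular rewrite.

```c
#include <assert.h>
#include <stddef.h>

/* Before: a[i + 1] is loaded here and again as a[i] on the next trip. */
void pair_sums_naive(const int *a, int *out, size_t n)
{
    assert(a != NULL && out != NULL);
    for (size_t i = 0; i + 1 < n; ++i)
        out[i] = a[i] + a[i + 1];
}

/* After: each element is loaded once and carried to the next iteration. */
void pair_sums_carried(const int *a, int *out, size_t n)
{
    assert(a != NULL && out != NULL);
    if (n < 2)
        return;
    int prev = a[0];
    for (size_t i = 0; i + 1 < n; ++i) {
        int next = a[i + 1];
        out[i] = prev + next;
        prev = next;
    }
}
```

Both versions write the same n - 1 sums; the second makes one memory read
per iteration instead of two.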

-Mnoframe
Eliminate operations that set up a true stack frame pointer for functions.

-Mnovect
Disables the vectorizer.

-Mscalarsse
Utilize the SSE (Streaming SIMD Extensions, where SIMD stands for Single
Instruction Multiple Data) and SSE2 instructions to perform the operations
coded. This implies -Mflushz.

-Munix
Use UNIX calling conventions, no trailing underscores.

-Munroll
Invokes the loop unroller. This also sets the optimization level to 2
if the level is set to less than 2.

c:m Instructs the compiler to completely unroll loops with a
constant loop count less than or equal to m, a supplied constant.
If this value is not supplied, the m count is set to 4.

n:u Instructs the compiler to unroll u times a loop that is
not completely unrolled, or that has a non-constant loop count.
If u is not supplied, the unroller computes the number of times a
candidate loop is unrolled.

-Mvect=sse
Instructs the vectorizer to search for loops, and where possible,
use the SSE or SSE2 and prefetch instructions
(depending on which processor is targeted).

Compiler options for PGI C compiler 6.0 for Windows XP
------------------------------------------------------

The optimization levels and their meanings are as follows:

-lacml
Link with the AMD Core Math Library 2.5.3. Available from www.amd.com

-O0
A basic block is generated for each C statement. No scheduling
is done between statements. No global optimizations are performed.