Description of compiler flags for Intel C++ Compiler 8.1
--------------------------------------------------------
-O1    optimize for speed, but disable some optimizations which increase 
       code size for a small speed benefit. Includes inline expansion 
       except for intrinsic functions, global optimizations, string 
       pooling optimizations.  

-O2	
	Optimizes for speed. The -O2 option includes the following options: 
	-Og, -Ot, -Oy, -Ob1, and -Gs. This option defaults to ON.
	This option also enables:
	* inlining of intrinsics
	* Intra-file interprocedural optimizations including:
	  * inlining
	  * constant propagation
	  * forward substitution
	  * routine attribute propagation
	  * variable address-taken analysis
	  * dead static function elimination
	  * removal of unreferenced variables.
	* The following performance optimizations:
	  * copy propagation
	  * dead-code elimination
	  * global register allocation
	  * global instruction scheduling and control speculation
	  * loop unrolling
	  * optimized code selection
	  * partial redundancy elimination
	  * strength reduction/induction variable simplification
	  * variable renaming
 	  * exception handling optimizations
	  * tail recursions
	  * peephole optimizations
	  * structure assignment lowering and optimizations
	  * dead store elimination
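
As an illustration of one item in the list above (strength reduction /
induction variable simplification), the following is a minimal sketch in C.
It is hand-written source that mimics the transformation, not output from the
compiler. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: one multiplication per iteration forms the index. */
long sum_strided_mul(const int *a, size_t count, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += a[i * stride];
    return total;
}

/* Strength-reduced form: the product i * stride is replaced by an
   induction variable that is advanced with an addition. */
long sum_strided_add(const int *a, size_t count, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    size_t idx = 0;
    for (size_t i = 0; i < count; i++) {
        total += a[idx];
        idx += stride;
    }
    return total;
}
```

Both functions read the same elements and return the same sum. Only the way
the index is computed differs.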


-O3:    Optimizes for speed. Enables high-level optimization. This level does 
	not guarantee higher performance. Using this option may increase the
	compilation time. Impact on performance is application dependent; some
	applications may not see a performance improvement.  The optimizations
	include:
	* All optimizations done with -O2
	* loop unrolling, including instruction scheduling
	* code replication to eliminate branches
	* padding the size of certain power-of-two arrays to allow more efficient
	  cache use.
	* When used with -Qax or -Qx, it causes the compiler to perform more aggressive
	  data dependency analysis than for -O2.

-Oa[-] assume [do not assume] no aliasing in program


-Qax<codes> generate code specialized for processor extensions 
specified by <codes> while also generating generic IA-32 code. 
<codes> includes one or more of the following characters:
    i  Pentium Pro and Pentium II processor instructions
    M  MMX(TM) instructions
    K  streaming SIMD extensions (implies i and M above)
    W  Pentium 4 processor with Streaming SIMD Extensions 2 
       (implies i, M and K)
    N  Pentium 4 processor with Streaming SIMD Extensions 2 
    P  Pentium 4 processor with Streaming SIMD Extensions 3 
    
-Qx<codes>  generate specialized code to run exclusively on processors
            supporting the extensions indicated by <codes> as 
            described above.
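
The difference between -Qax (specialized code plus a generic fallback) and
-Qx (specialized code only) can be pictured as a run-time choice between two
versions of a routine. The sketch below is only a conceptual model in plain C.
The selector and its has_extension argument are hypothetical, and the
compiler's actual dispatch mechanism is not described in this document.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Generic path: stands in for the baseline IA-32 code. */
long sum_generic(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* "Specialized" path: stands in for code tuned for a processor extension.
   Here it is ordinary C that handles two elements per iteration. */
long sum_specialized(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += a[i];
        odd += a[i + 1];
    }
    if (i < n)
        even += a[i];
    return even + odd;
}

/* Hypothetical selector: has_extension would come from a CPU feature check. */
sum_fn select_sum(int has_extension)
{
    return has_extension ? sum_specialized : sum_generic;
}
```

In this model, -Qax corresponds to keeping both paths and selecting one at run
time, while -Qx corresponds to shipping only the specialized path.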

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}   The /QxN and /QxP options target your program to run on Intel Pentium 4 
           and compatible Intel processors.  The resulting code might 
           contain unconditional use of features that are not supported 
           on other processors.  Programs, where the function main() is 
           compiled with this option, will detect non compatible processors 
           and generate an error message during execution.  This option 
           also enables new optimizations in addition to Intel processor 
           specific optimizations.

           These options also enable advanced data layout and code restructuring
	     optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Ob{0|1|2}	Controls the compiler's inline expansion.
		0:  disable inlining.
		1:  disables inlining unless -Qip or -Ob2 are specified.
		2:  enables inlining of any function.  However, the 
                    compiler decides which functions are inlined.  This 
                    option enables interprocedural optimizations and has
                    the same effect as specifying the -Qip option.


-Qip        enable single-file IP optimizations 
           (within files, same as -Ob2)

-Qipo       multi-file IP optimizations that include:
              - inline function expansion
              - interprocedural constant propagation
              - dead code elimination
              - propagation of function characteristics
              - passing arguments in registers
              - loop-invariant code motion

-fast            The /fast option enhances execution speed across the entire program 
                 by including the following options that can improve run-time performance:

                     /O3 (maximum speed and high-level optimizations) 
                     /Qipo (enables interprocedural optimizations across files) 
                     /QxP (generate code specialized for Intel Pentium 4 processor with 
                           Streaming SIMD Extensions 3)

                 To override one of the options set by /fast, specify that option after the 
                 /fast option on the command line. The options set by /fast may change from 
                 release to release.

-Qansi_alias     Directs the compiler to assume that the program
                 adheres to the type-based aliasing rules defined in Section 6.5 of the ISO C
                 Standard.  If your program adheres to these rules, this option will allow
                 the compiler to optimize more aggressively.  If it doesn't adhere to these
                 rules, it can cause the compiler to generate incorrect code.
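
A minimal C sketch of the kind of code these rules distinguish. The function
name is hypothetical, and the expected bit patterns assume that float is a
32-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Conforming: copies the object representation instead of reading the
   float through an lvalue of an incompatible type. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Non-conforming pattern, shown only as a comment:
       uint32_t bits = *(uint32_t *)&f;
   Accessing a float through a uint32_t lvalue violates the type-based
   aliasing rules, so it may break when the compiler assumes conformance. */
```

Code written like float_bits() should stay correct whether or not the
compiler assumes conformance. The commented-out cast is the kind of access
that can be miscompiled under that assumption.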
 

-Qprof_gen       instrument program for profiling for the first phase of 
                 two-phase profile guided optimization

-Qprof_use       Instructs the compiler to produce a profile-optimized 
                 executable and merges available dynamic information (.dyn) 
                 files into a pgopti.dpi file. If you perform multiple 
                 executions of the instrumented program, -Qprof_use merges 
                 the dynamic information files again and overwrites the 
                 previous pgopti.dpi file.
                 Without any other options, the current directory is 
                 searched for .dyn files

-Qrcd           The Intel compiler uses the -Qrcd option to improve the
                performance of code that requires floating-point-to-integer                        
                conversions. 

                The system default floating point rounding mode is
                round-to-nearest. This means that values are rounded during 
                floating point calculations. However, the C language requires 
                floating point values to be truncated when a conversion to an                      
                integer is involved. To do this, the compiler must change the 
                rounding mode to truncation before each floating 
                point-to-integer conversion and change it back afterwards.

                The -Qrcd option disables the change to truncation of the 
                rounding mode for all floating point calculations, including                       
                floating point-to-integer conversions. Turning on this option 
                can improve performance, but floating point conversions to 
                integer will not conform to C semantics.
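
The difference in results can be illustrated with a small C model. This is a
sketch under stated assumptions, not the instruction sequence the compiler
emits: to_int_nearest() imitates in portable C a conversion performed while
the rounding mode is left at round-to-nearest (ties to even). Both function
names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* C semantics: conversion to int discards the fraction (truncation). */
int to_int_truncate(double x)
{
    assert(x > (double)INT_MIN && x < (double)INT_MAX);
    return (int)x;
}

/* Model of a conversion done under round-to-nearest, ties to even. */
int to_int_nearest(double x)
{
    int t = to_int_truncate(x);
    double frac = x - (double)t;
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;
    return t;
}
```

For 2.7, the truncating conversion yields 2, while the round-to-nearest model
yields 3. Code that relies on truncation is therefore sensitive to this option.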

-Qunroll[n]     Specifies the maximum number of times to unroll a loop. Use
                n = 0 to disable the unroller. If n is omitted, the compiler
                decides whether to unroll a loop and automatically chooses the
                maximum number of times to unroll it.
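
For readers unfamiliar with the transformation, the sketch below shows by hand
what unrolling a loop four times looks like in C. The function names are
hypothetical, and the compiler's own unrolled code may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one element per iteration. */
long sum_rolled(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 4: fewer loop-control branches per element, followed by a
   remainder loop for the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

The unrolled version trades larger code for less loop overhead, which is why
an upper limit on the unroll count can be useful.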

-GX             Enables the full C++ Exception Handling unwind semantics. 

-GR             Enables C++ Runtime Type Information (RTTI). 

-Qcxx_features  Enables both -GX and -GR as described above so C++ Runtime Type Information and 
                Exception Handling are both enabled

-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure and union 
                types as one of the following: 1, 2, 4, 8, or 16 (default) bytes.
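
The effect of an alignment limit on structure layout can be seen with the
widely supported #pragma pack, used here as a source-level stand-in for a
1-byte limit. That it matches the behavior of -Zp1 exactly is an assumption.
The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may insert padding before 'value'. */
struct record_natural {
    char    tag;
    int32_t value;
};

/* 1-byte packing: members are placed back to back with no padding. */
#pragma pack(push, 1)
struct record_packed1 {
    char    tag;
    int32_t value;
};
#pragma pack(pop)

/* Returns the number of bytes that 1-byte packing saves for this record. */
size_t packing_savings(void)
{
    assert(sizeof(struct record_packed1) <= sizeof(struct record_natural));
    return sizeof(struct record_natural) - sizeof(struct record_packed1);
}
```

Tighter packing reduces structure size, but 'value' is then no longer
naturally aligned, which can make accesses slower on some processors.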

-Qprefetch[-]   Enables [disables] the insertion of software prefetching by the compiler. 
                Default is /Qprefetch. 

shlW32M.lib:    MicroQuill SmartHeap Library available from 
                http://www.microquill.com/


Description of compiler flags for Intel FORTRAN Compiler 8.1
-------------------------------------------------------------
-O1    optimize for speed, but disable some optimizations which increase 
       code size for a small speed benefit. Includes inline expansion 
       except for intrinsic functions, global optimizations, string 
       pooling optimizations.  

-O2    This is the default level of optimization.  
       Optimizes for speed. The -O2 option includes O1 optimizations 
       and in addition enables inlining of intrinsics and more speed 
       optimizations.


-O3:   Builds on -O1 and -O2 optimizations by enabling high-level 
       optimization. This level does not guarantee higher performance 
       unless loop and memory access transformation take place. In 
       conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the 
       compiler to perform more aggressive data dependency analysis than 
       for -O2. This may result in longer compilation times. 


-Qax<codes> generate code specialized for processor extensions 
specified by <codes> while also generating generic IA-32 code. 
<codes> includes one or more of the following characters:
    i  Pentium Pro and Pentium II processor instructions
    M  MMX(TM) instructions
    K  streaming SIMD extensions (implies i and M above)
    W  Pentium 4 processor with Streaming SIMD Extensions 2 
       (implies i, M and K)
    N  Pentium 4 processor with Streaming SIMD Extensions 2 
    P  Pentium 4 processor with Streaming SIMD Extensions 3 
    
-Qx<codes>  generate specialized code to run exclusively on processors
            supporting the extensions indicated by <codes> as 
            described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}   The /QxN and /QxP options target your program to run on Intel Pentium 4 
           and compatible Intel processors.  The resulting code might 
           contain unconditional use of features that are not supported 
           on other processors.  Programs, where the function main() is 
           compiled with this option, will detect non compatible processors 
           and generate an error message during execution.  This option 
           also enables new optimizations in addition to Intel processor 
           specific optimizations.

           These options also enable advanced data layout and code restructuring
	     optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Qip        enable single-file IP optimizations (within files, same as -Ob2)

-Qipo       multi-file IP optimizations that include:
              - inline function expansion
              - interprocedural constant propagation
              - dead code elimination
              - propagation of function characteristics
              - passing arguments in registers
              - loop-invariant code motion

-fast            The /fast option enhances execution speed across the entire program 
                 by including the following options that can improve run-time performance:

                 -O3   (maximum speed and high-level optimizations) 
                 -Qipo (enables interprocedural optimizations across files) 
                 -QxP  (generate code specialized for Intel Pentium 4 processor with 
                        Streaming SIMD Extensions 3)

                 To override one of the options set by /fast, specify that option after the 
                 /fast option on the command line. The options set by /fast may change from 
                 release to release.

-Qansi_alias     Enables (default) or disables the assumption that the program 
                 adheres to the ANSI Fortran type aliasability rules. For example, an object 
                 of type real cannot be accessed as an integer. See the ANSI 
                 standard for the complete set of rules.

-Qprof_gen       instrument program for profiling for the first phase of 
                 two-phase profile guided optimization

-Qprof_use       Instructs the compiler to produce a profile-optimized 
                 executable and merges available dynamic information (.dyn) 
                 files into a pgopti.dpi file. If you perform multiple 
                 executions of the instrumented program, -Qprof_use merges 
                 the dynamic information files again and overwrites the 
                 previous pgopti.dpi file.
                 Without any other options, the current directory is 
                 searched for .dyn files

-Qrcd            Enables fast float-to-int conversion.

-Qscalar_rep[-]  Enables [disables] scalar replacement performed during loop 
                 transformations (requires /O3).
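
Scalar replacement can be sketched as follows. The example is written in C
rather than Fortran purely for illustration, with hypothetical function names.
It shows the idea of the transformation by hand and is not compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the running sum lives in memory (out[i]) and is re-read and
   re-written on every inner iteration. */
void row_sums_memory(const double *m, size_t rows, size_t cols, double *out)
{
    assert(cols > 0);
    for (size_t i = 0; i < rows; i++) {
        out[i] = 0.0;
        for (size_t j = 0; j < cols; j++)
            out[i] += m[i * cols + j];
    }
}

/* After: the array element is replaced by a scalar that can be kept in a
   register, and is stored to memory once per row. */
void row_sums_scalar(const double *m, size_t rows, size_t cols, double *out)
{
    assert(cols > 0);
    for (size_t i = 0; i < rows; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < cols; j++)
            acc += m[i * cols + j];
        out[i] = acc;
    }
}
```

Both versions compute the same row sums. The second performs far fewer memory
accesses in the inner loop.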

-Qauto           Causes all variables to be allocated on the stack, rather than 
                 in local static storage. Does not affect variables that appear in an 
                 EQUIVALENCE or SAVE statement, or those that are in COMMON. Makes all 
                 local variables AUTOMATIC, same as /4Ya.
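
The distinction between static and stack (automatic) storage is sketched
below in C, where the 'static' keyword plays a role loosely comparable to the
Fortran SAVE attribute. The analogy and the function names are illustrative
assumptions, not part of the compiler documentation.

```c
#include <assert.h>

/* Static storage: the counter keeps its value between calls. */
int next_id_static(void)
{
    static int counter = 0;
    counter += 1;
    return counter;
}

/* Automatic (stack) storage: the counter is re-created on every call. */
int next_id_automatic(void)
{
    int counter = 0;
    counter += 1;
    return counter;
}

/* Small self-check of the two behaviors. */
void storage_demo(void)
{
    int first = next_id_static();
    int second = next_id_static();
    assert(second == first + 1);
    assert(next_id_automatic() == next_id_automatic());
}
```

Code that silently depends on a local variable keeping its value between
calls can change behavior when that variable becomes automatic.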

-Qprefetch[-]   Enables [disables] the insertion of software prefetching by the compiler. 
                Default is /Qprefetch. 


Other Notes: 
------------
"/" and "-" are both allowable starting tokens for flags passed to the 
compiler; for example, -QxK and /QxK are identical switches. 


Portability options for CPU2000:
-------------------------------
176.gcc:     
         -Dalloca=_alloca : so as to use the built-in optimized alloca
         -Fn              : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate n bytes of stack. 
                            The default amount of stack allocated is not 
                            enough, and without this option 176.gcc 
                            crashes with a run-time error.

178.galgel: 
   -FI                    : Fixed-format F90 source code. 
   -F32000000             : Same as with 176.gcc, pre-allocates a 32MB 
                            stack

186.crafty: 
   -DNT_i386              : Specifies that it is a Windows NT Intel 
                            processor-based system which makes the compiler 
                            use "long long" as the 64-bit variable that 
                            186.crafty needs.        

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes needed for porting to 
                            Windows. 
   -DPERLDLL              : On Windows, we need a single perl.exe instead of 
                            a perl.exe plus a perl.dll. This pre-define 
                            enables the changes necessary to get a single, 
                            UNIX-style executable without the indirect 
                            calls that can cause a 10% performance 
                            degradation. This allows the Windows-based 
                            executable to be as close as possible to 
                            the Unix-based one. 
   -MT                    : Use the static multi-threaded library; otherwise 
                            the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :  
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines tell of the existence 
                            of malloc and calloc prototypes. 


Description of compiler flags for Intel FORTRAN Compiler 8.1
-------------------------------------------------------------
-fast   The -fast option enhances execution speed across the entire program by 
          including the following options that can improve run-time performance: 

               -O3 (maximum speed and high-level optimizations). 
               -Qipo (enables interprocedural optimizations across files). 
               -QxP (specific optimization for Intel Pentium 4 processor with Streaming 
                    SIMD Extensions 3). The -fast option does not include -QxP when 
                    compiling on Itanium(R)-based systems. 

          To override one of the options set by -fast, specify that option after the /fast 
          option on the command line. To target -fast optimizations for a specific 
          processor, use one of the -Qx options. For example:

               prompt>icl -fast -QxW source_file.cpp 

          The options set by -fast may change from release to release.

-Gs[n] Disables stack-checking for routines with n or more bytes of local variables 
          and compiler temporaries. Default: n=4096 

-inline:speed   Enable speed optimizations (same as -Ob2 -Ot)

-[no]f77rtl   Specifies that the FORTRAN-77-specific run-time support should be
          used.

-[no]fpp   Determines whether the Fortran preprocessor is run on source files 
          prior to compilation.    

-O1    Optimizes to favor code size and code locality. Disables 
          loop unrolling. /O1 may improve performance for applications 
          with very large code size, many branches, and execution time 
          not dominated by code within loops. In most cases /O2 is 
          recommended over /O1. 

          IA-32 systems: Enables options /Og, /Oi-, /Os, /Oy, /Ob1, 
          and /Gs. Disables intrinsics inlining to reduce code size.

-O2    This is the default level of optimization.  
          Optimizes for code speed. This is the generally recommended 
          optimization level.

          IA-32 systems: Enables options /Og, /Os, /Oy, /Ob1, and /Gs.


-O3    Enables /O2 optimizations and more aggressive optimizations such 
          as loop and memory access transformation. The /O3 optimizations 
          may slow down code in some cases compared to /O2 optimizations. 
          Recommended for applications that have loops with heavy use of 
          floating-point calculations and process large data sets.

          IA-32 systems: In conjunction with /Qax{K|W|N|B|P} and /Qx{K|W|N|B|P} 
          options, this option causes the compiler to perform more aggressive data 
          dependency analysis than for /O2. This may result in longer compilation times.

-Oa[-] Assume [do not assume] no aliasing in program

-Obn  Controls the compiler's inline expansion. The amount of inline expansion 
          performed varies with the value of n as follows: 
               0: Disables inlining. 
               1: Enables (default) inlining of functions declared with the __inline 
                   keyword. Also enables inlining according to the C++ language. 
               2: Enables inlining of any function. However, the compiler decides 
                   which functions to inline. Enables interprocedural optimizations and 
                   has the same effect as /Qip. 

-Og    Enables global optimizations.

-Oi[-] Enables [disables] inline expansion of intrinsic functions. 

-Os    Enables most speed optimizations, but disables optimizations that increase 
          code size for a small speed benefit.

-Ot    Enables all speed optimizations.

-Oy[-] Enables [disables] the use of the EBP register in optimizations. When you 
          disable with /Oy-, the EBP register is used as frame pointer.

-Qansi_alias     Enables (default) or disables the assumption that the
          program adheres to the ANSI Fortran type aliasability rules. For 
          example, an object of type real cannot be accessed as an integer. 
          See the ANSI standard for the complete set of rules.

-Qauto   Causes all variables to be allocated on the stack, rather than in local 
          static storage. Does not affect variables that have the SAVE attribute or 
          appear in an EQUIVALENCE statement or common block; same as 
          -automatic, -auto or -4Ya. Opposite of -Qsave.

          If -recursive or -Qopenmp is specified, the default is -Qauto.

-Qax<codes> Generate code specialized for processor extensions 
          specified by <codes> while also generating generic IA-32 code. 
          <codes> includes one or more of the following characters:
                i  Pentium Pro and Pentium II processor instructions
              M  MMX(TM) instructions
               K  streaming SIMD extensions (implies i and M above)
              W  Pentium 4 processor with Streaming SIMD Extensions 2 
                   (implies i, M and K)
               N  Pentium 4 processor with Streaming SIMD Extensions 2 
               P  Pentium 4 processor with Streaming SIMD Extensions 3 
    
-Qx<codes>  Generate specialized code to run exclusively on processors
           supporting the extensions indicated by <codes> as 
           described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}   The /QxN and /QxP options target your program to run on Intel Pentium
          4 and compatible Intel processors.  The resulting code might 
          contain unconditional use of features that are not supported 
          on other processors.  Programs, where the function main() is 
          compiled with this option, will detect non compatible processors 
          and generate an error message during execution.  This option 
          also enables new optimizations in addition to Intel processor 
          specific optimizations.

          These options also enable advanced data layout and code restructuring
          optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Qip        enable single-file IP optimizations (within files, same as -Ob2)

-Qipo[n]    Enables multifile interprocedural (IP) optimizations (between files). 
          When you specify this option, the compiler performs inline function 
           expansion for calls to functions defined in separate files.

          n is an optional integer that specifies the number of object files the compiler 
          should create. Any integer greater than or equal to 0 is valid.

          If n is 0, the compiler decides whether to create one or more object files based 
          on an estimate of the size of the object file. It generates one object file for small 
          applications, and two or more object files for large applications.

          If n is greater than 0, the compiler generates n object files, unless n exceeds the 
          number of source files (m), in which case the compiler generates only m object files.

          If you do not specify n, the default is 1.
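
The rules above for the number of object files can be summarized in a small C
helper. This is only a restatement of the text as code. The function name and
the sentinel value are hypothetical, and when n is 0 the actual count depends
on a size estimate that is internal to the compiler.

```c
#include <assert.h>

/* Sentinel meaning "the compiler chooses the count itself" (n == 0). */
#define IPO_COMPILER_DECIDES (-1)

/* n: value given to -Qipo[n] (use 1 when n is omitted);
   m: number of source files. */
int ipo_object_files(int n, int m)
{
    assert(n >= 0 && m > 0);
    if (n == 0)
        return IPO_COMPILER_DECIDES;  /* based on an object-size estimate */
    return n > m ? m : n;             /* never more than m object files */
}
```

For example, with five source files, a request for eight object files yields
five, and a request for three yields three.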

          Multi-file IP optimizations that include:
               - inline function expansion
               - interprocedural constant propagation
               - monitoring module-level static variables
               - dead code elimination
               - propagation of function characteristics
               - multifile optimization
               - passing arguments in registers
              - loop-invariant code motion

-Qoption,tool,list	Passes an argument list to another program in the
          compilation sequence, such as the assembler or linker. The 
          parameter 'tool' can be:

               fpp    Specifies the Intel Fortran preprocessor
               f         Specifies the Fortran compiler
               asm    Specifies the assembler
               link    Specifies the linker

          -Qoption can be used with the -Qipo flag to refine IPO. The valid options
          that can be used for this purpose are:

               -ip_args_in_regs=0   Disables the passing of arguments in registers.

               -ip_ninl_max_stats=n   Sets the valid max number of intermediate
                    language statements for a function that is  expanded in line. The 
                    number n is a positive integer. The number of intermediate language
                    statements usually exceeds the actual number of  source language 
                    statements. The default value for n is 230. The compiler uses a larger 
                    limit for user inline functions. 
						
               -ip_ninl_min_stats=n    Sets the valid min number of intermediate 
                    language statements for a function that is  expanded in line. The 
                    number n is a positive integer. The default values for ip_ninl_min_stats 
                    are: 
                         IA-32 compiler: ip_ninl_min_stats = 7 
 
               -ip_ninl_max_total_stats=n   Sets the maximum increase in size of a function,
                    measured in intermediate language statements,  due to inlining. n is a 
                    positive integer whose default value is 2000. 

-Qparallel   Automatically detects loops capable of being executed safely in parallel 
          and generates multithreaded code for these loops.

-Qprec   Improves floating-point precision. Some speed impact.

-Qprefetch[-]   Enables [disables] the insertion of software prefetching by the
          compiler (requires -O3). Default is /Qprefetch. 

-Qprof_gen   Instrument program for profiling for the first phase of 
          two-phase profile guided optimization

-Qprof_use   Instructs the compiler to produce a profile-optimized 
          executable and merges available dynamic information (.dyn) 
          files into a pgopti.dpi file. If you perform multiple 
          executions of the instrumented program, -Qprof_use merges 
          the dynamic information files again and overwrites the 
          previous pgopti.dpi file.
          Without any other options, the current directory is 
          searched for .dyn files

-Qrcd   Enables fast float-to-int conversion.

-Qscalar_rep[-]   Enables [disables] scalar replacement performed during loop 
          transformations (requires -O3).

Additional Libraries Used 
-------------------------

Supplied by MicroQuill:

shlW32M.lib:    MicroQuill SmartHeap Library 7.0 available from 
                http://www.microquill.com/

Other Notes: 
------------
"/" and "-" are both allowable starting tokens for flags passed to the 
compiler; for example, -QxK and /QxK are identical switches. 


Portability options for CPU2000:
-------------------------------
176.gcc:     
   -Dalloca=_alloca       : so as to use the built-in optimized alloca
   -F10000000             : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate n bytes of stack. 
                            The default amount of stack allocated is not 
                            enough, and without this option 176.gcc 
                            crashes with a run-time error.

178.galgel: 
   -FI                    : Fixed-format F90 source code. 
   -F32000000             : Same as with 176.gcc, pre-allocates a 32MB 
                            stack

186.crafty: 
   -DNT_i386              : Specifies that it is a Windows NT Intel 
                            processor-based system which makes the compiler 
                            use "long long" as the 64-bit variable that 
                            186.crafty needs.        

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes needed for porting to 
                            Windows. 
   -DPERLDLL              : On Windows, we need a single perl.exe instead of 
                            a perl.exe plus a perl.dll. This pre-define 
                            enables the changes necessary to get a single, 
                            UNIX-style executable without the indirect 
                            calls that can cause a 10% performance 
                            degradation. This allows the Windows-based 
                            executable to be as close as possible to 
                            the Unix-based one. 
   -MT                    : Use the static multi-threaded library; otherwise 
                            the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :  
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines tell of the existence 
                            of malloc and calloc prototypes.