Description of compiler flags for Intel C++ Compiler 8.0
-------------------------------------------------------

-O2	Optimizes for speed. The -O2 option has the same effect as specifying
       	the following options: -Og, -Oi, -Ot, -Oy, -Ob1, -Gf, -Gs, and -Gy.
       	This option defaults to ON.

-O3    	Optimizes for speed and enables high-level optimization. This level
       	does not guarantee higher performance and may increase compilation
       	time. The impact on performance is application dependent; some
       	applications may not see a performance improvement.

-Oa[-] 	Assumes [does not assume] no aliasing.
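
To illustrate what "no aliasing" means, here is a minimal C sketch. The
function name add_twice is hypothetical and is not taken from any benchmark.
If the compiler is told that dst and src never refer to the same object, it
may keep *src in a register across both additions; when the two pointers do
alias, the second addition must see the value written by the first.

```c
#include <assert.h>
#include <stddef.h>

/* Adds *src to *dst twice. If dst and src point to the same object,
   the second addition sees the value written by the first one. */
void add_twice(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);
    *dst += *src;   /* may change *src when dst == src */
    *dst += *src;   /* *src must be reloaded unless no aliasing is assumed */
}
```

Calling add_twice(&a, &b) with distinct objects is safe under either
assumption; calling add_twice(&c, &c) is the kind of code that a
no-aliasing assumption can break.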

-Obn      	Controls the compiler's inline expansion. The amount of inline
                expansion performed varies with the value of n as follows:
		0:  Disables inlining.
		1:  Enables inlining of functions declared with the __inline
                    keyword (default). Also enables inlining as defined by
                    the C++ language.
		2:  Enables inlining of any function.  However, the 
                    compiler decides which functions to inline.  Enables 
                    interprocedural optimizations and has the same effect as 
                    -Qip.

-Og	Enables global optimizations.

-Ot	Enables all speed optimizations.

-Oi[-] 	Enables [disables] inline expansion of intrinsic functions.

-Ow[-]	Assumes [does not assume] no cross-function aliasing.

-Oy[-]	Enables [disables] the use of the EBP register in optimizations. When
	you disable it with -Oy-, the EBP register is used as the frame pointer.

-Gf	Enables string-pooling optimization.
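
As a rough illustration of string pooling, the C sketch below declares two
identical literals. The names first_copy, second_copy, and
literals_share_storage are hypothetical. With pooling the compiler may keep a
single copy of the text, so the two pointers may compare equal; the C language
allows either outcome.

```c
#include <assert.h>
#include <string.h>

/* Two separate literals with identical contents. With string pooling the
   compiler may store a single copy, so the two pointers may compare equal;
   without pooling they are typically distinct. */
static const char *const first_copy  = "SPEC";
static const char *const second_copy = "SPEC";

/* Returns 1 if both literals share one address, 0 otherwise.
   Either answer is valid C; only the contents are guaranteed equal. */
int literals_share_storage(void)
{
    assert(strcmp(first_copy, second_copy) == 0);
    return first_copy == second_copy;
}
```

Code that writes through a pointer to a literal, or that relies on two
literals having different addresses, is what pooling can expose.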

-Gs[n]	Disables stack-checking for routines with n or more bytes of local
	variables and compiler temporaries. Default: n=4096

-Gy	Packages functions to enable linker optimization.

-Qax{K|W|N}	Generates specialized code for the processors indicated by the
		codes K, W, or N, while also generating generic IA-32 code.
    	K  = Intel Pentium III and compatible Intel processors
    	W  = Intel Pentium 4 and compatible Intel processors
    	N  = Intel Pentium 4 and compatible Intel processors. These options also enable
	     advanced data layout and code restructuring optimizations to improve memory
	     accesses for Intel processors.
    
-Qx{K|W|N}	Generates specialized code that runs exclusively on processors
            supporting the extensions indicated by the codes K, W, or N,
            as described above.


-Qip        Enables interprocedural optimizations within a single file.

-Qipo       Enables multi-file interprocedural (IP) optimizations, including:
              - inline function expansion
              - interprocedural constant propagation
	      - monitoring module-level static variables
              - dead code elimination
              - propagation of function characteristics
              - passing arguments in registers
              - loop-invariant code motion
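
Two of the items above, interprocedural constant propagation and dead code
elimination, can be pictured with a small C sketch. The functions scale and
scale_quietly are hypothetical, and the comments describe what an optimizer
may do, not what this particular compiler is guaranteed to do.

```c
#include <assert.h>

/* Hypothetical helper: 'verbose' selects an extra path. */
static int scale(int x, int verbose)
{
    assert(verbose == 0 || verbose == 1);
    if (verbose) {
        /* Dead code once the optimizer knows every caller passes 0. */
        x += 1000;
    }
    return x * 2;
}

/* The only call site passes the constant 0, so interprocedural constant
   propagation can substitute it into scale(), remove the branch, and
   inline what remains as 'value * 2'. */
int scale_quietly(int value)
{
    return scale(value, 0);
}
```

With multi-file IPO the same reasoning can apply when scale and its callers
live in different source files.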

-Qprof_gen       Instruments the program for profiling, to collect the execution
		 count of each basic block.

-Qprof_use       Enables the use of profiling dynamic feedback information
		 during optimization.

-Qrcd            Enables fast conversion of floating-point values to
		 integers. This option does not guarantee that
		 any particular rounding mode will be used.
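
The practical effect of a rounding-mode difference can be sketched in C.
Standard C conversion truncates toward zero; convert_nearest below is a
hypothetical stand-in for a conversion that rounds to the nearest integer
instead. Which result a program built with -Qrcd actually produces is not
specified here, so treat this only as an illustration of why results can
differ.

```c
#include <assert.h>

/* Standard C conversion: the fractional part is discarded
   (truncation toward zero). */
int convert_truncate(double x)
{
    return (int)x;
}

/* Hypothetical stand-in for a conversion that rounds to the nearest
   integer instead (halfway cases go away from zero in this sketch). */
int convert_nearest(double x)
{
    assert(x > -2147483647.0 && x < 2147483647.0);
    return (int)(x < 0.0 ? x - 0.5 : x + 0.5);
}
```

For an input such as 2.7 the two functions disagree (2 versus 3), which is the
kind of difference a program sensitive to rounding would observe.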

-Qansi_alias[-]  -Qansi_alias directs the compiler to assume the following: 
                    - Arrays are not accessed out of bounds. 
                    - Pointers are not cast to non-pointer types, and vice-versa. 
                    - References to objects of two different scalar types cannot alias. 
		      For example, an object of type int cannot alias with an object 
		      of type float, or an object of type float cannot alias with an 
		      object of type double. 
                 If your program satisfies the above conditions, setting the -Qansi_alias 
		 flag will help the compiler better optimize the program. However, if your
		 program does not satisfy one of the above conditions, the -Qansi_alias
		 flag may lead the compiler to generate incorrect code.
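
A minimal C sketch of the third condition follows. The function name
bits_by_copy is hypothetical, and the test value assumes IEEE 754
single-precision floats. The commented-out line shows the kind of access that
breaks the assumption; copying the bytes is one conforming alternative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Violates the rule (shown as a comment only): it reads a float object
   through an lvalue of a different scalar type.

       uint32_t u = *(uint32_t *)&f;

   Conforming alternative: copy the bytes instead of reinterpreting them. */
uint32_t bits_by_copy(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Code written like bits_by_copy stays correct whether or not the compiler
makes the -Qansi_alias assumptions.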
       		 

-GR[-]           Enables [disables] C++ Run-Time Type Information (RTTI).
		 Default is -GR-.

-GX[-]           Enables [disables] C++ Exception Handling. Default is -GX-.

-fast            Maximizes speed across the entire program. Turns on -O3 and -Qipo.

/Qfp_port   	 Rounds floating-point results at assignments and casts (some speed impact).

/Qprefetch       Accepted with a warning and ignored by the Intel C/C++ Compiler.

-Qunroll[n]	Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.
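
For readers unfamiliar with unrolling, the C sketch below shows by hand what
unrolling a loop four times looks like. The function names are hypothetical,
and the code the compiler generates may differ in its details.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one addition per iteration. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Same loop unrolled by hand four times, with a remainder loop for
   the leftover elements when n is not a multiple of 4. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; ++i)
        total += v[i];
    return total;
}
```

Unrolling trades larger code for fewer loop-control instructions, which is why
a maximum unroll count is configurable.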

/Qoption,tool,optlist /Qoption passes an option specified by optlist to a tool, where
                      optlist is a comma-separated list of options.

		      tool                   Description
                     ------------------------------------  
                       cpp             Specifies the compiler front-end preprocessor 
                       c               Specifies the C++ compiler
                       asm             Specifies the assembler
		       link            Specifies the linker
		       optlist         Indicates one or more valid argument strings for the
		                       designated program. If the argument is a command-line
				       option, you must include the hyphen. If the argument
				       contains a space or tab character, you must enclose the
				       entire argument in quotation marks (""). You must
				       separate multiple arguments with commas.
		      
		      /Qoption can be used with the -Qipo flag to refine IPO. The valid options
		      that can be used for this purpose are:

                      -ip_args_in_regs=0        Disables the passing of arguments in registers.

		      -ip_ninl_max_stats=n      Sets the maximum number of intermediate
		                                language statements for a function that is
						expanded inline. n is a positive integer.
						The number of intermediate language
						statements usually exceeds the actual number of
						source language statements. The default value
						for n is 230. The compiler uses a larger limit
						for user inline functions.
						
                      -ip_ninl_min_stats=n      Sets the minimum number of intermediate
		                                language statements for a function that is
						expanded inline. n is a positive integer.
						The default value for the IA-32 compiler
						is 7.
 
                      -ip_ninl_max_total_stats=n Sets the maximum increase in size of a function,
		                                 measured in intermediate language statements, 
						 due to inlining. n is a positive integer whose 
						 default value is 2000. 

		      

shlW32M6.lib:    MicroQuill SmartHeap Library 6.0 available from 
                 http://www.microquill.com/

-Zp{1|2|4|8|16}	 Specifies the strictest alignment constraint for structure and union
		 types as 1, 2, 4, 8, or 16 bytes. Default is 16.
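
The effect of a packing limit on structure layout can be sketched in C. The
example below assumes a compiler that accepts #pragma pack(push, n), which is
widely supported but is not part of standard C; the structure and function
names are hypothetical. The 1-byte limit is comparable in spirit to compiling
with -Zp1.

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: the compiler normally pads after 'tag' so that
   'value' starts on its own alignment boundary. */
struct natural_record {
    char tag;
    int  value;
};

/* Same members with a 1-byte packing limit: no padding is inserted
   between members. */
#pragma pack(push, 1)
struct packed_record {
    char tag;
    int  value;
};
#pragma pack(pop)

/* Returns how many bytes of padding the packing limit removed. */
size_t padding_saved(void)
{
    assert(sizeof(struct packed_record) == sizeof(char) + sizeof(int));
    return sizeof(struct natural_record) - sizeof(struct packed_record);
}
```

A tighter limit saves space but can make member accesses unaligned, which is
why the setting matters for both layout compatibility and speed.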


-arch:SSE        Enables the compiler to use SSE instructions.

-arch:SSE2       Enables the compiler to use SSE2 instructions.




Description of compiler flags for Intel Fortran Compiler 8.0
------------------------------------------------------------

-O2	Optimizes for maximum speed. The -O2 option has the same effect as 
       	-Ox.  This option defaults to ON.

-O3    	Enables -O2 option with more aggressive optimization, for example, 
	loop transformation. Optimizes for maximum speed but may not improve
	performance for some programs.

-Oa[-] 	Assumes [does not assume] no aliasing.

-Ob{0|1|2}	Controls the compiler's inline expansion. The amount of inline
                expansion performed varies as follows:
		-Ob0:  Disables inlining.
		-Ob1:  Default. Disables inlining unless -Qip or -Ob2 is
		       specified, in which case inlining of functions is enabled.
		-Ob2:  Enables inlining of any function.  However, the 
                       compiler decides which functions to inline.  Enables 
                       interprocedural optimizations and has the same effect as 
                       -Qip.

-Og	Enables global optimizations.

-Ot	Enables all speed optimizations.

-Oi[-] 	Enables [disables] inline expansion of intrinsic functions.

-Ow[-]	Assumes [does not assume] no cross-function aliasing.

-Ox   	Same as the -O2 option: enables -Gs, -Ob1, -Og, -Oy, -Ot, and -Oi.

-Oy[-]	Enables [disables] the use of the EBP register in optimizations. When
	you disable it with -Oy-, the EBP register is used as the frame pointer.

-Gf	Enables string-pooling optimization.

-Gs[n]	Disables stack-checking for routines with n or more bytes of local
	variables and compiler temporaries. Default: n=4096

-Gy	Packages functions to enable linker optimization.

-fast   Maximizes speed across the entire program. Turns on -O3 and -Qipo.

-Qax{K|W|N}	Generates specialized code for the processors indicated by the
		codes K, W, or N, while also generating generic IA-32 code.
    	K  = Intel Pentium III and compatible Intel processors
    	W  = Intel Pentium 4 and compatible Intel processors
    	N  = Intel Pentium 4 and compatible Intel processors. These options also enable
	     advanced data layout and code restructuring optimizations to improve memory
	     accesses for Intel processors.
    
-Qx{K|W|N}	Generates specialized code that runs exclusively on processors
            supporting the extensions indicated by the codes K, W, or N,
            as described above.


-Qip        Enables interprocedural optimizations within a single file.

-Qipo       Enables multi-file interprocedural (IP) optimizations, including:
              - inline function expansion
              - interprocedural constant propagation
	      - monitoring module-level static variables
              - dead code elimination
              - propagation of function characteristics
              - passing arguments in registers
              - loop-invariant code motion

-Qprof_gen       Instruments the program for profiling, to collect the execution
		 count of each basic block.

-Qprof_use       Enables the use of profiling dynamic feedback information
		 during optimization.

-Qrcd            Enables fast conversion of floating-point values to
		 integers. This option does not guarantee that
		 any particular rounding mode will be used.

-Qansi_alias     Enables (default) or disables the assumption that the program
                 adheres to the ANSI Fortran type aliasability rules. For example, an object
                 of type real cannot be accessed as an integer. See the ANSI
                 standard for the complete set of rules.


-Qscalar_rep[-]	 Enables [disables] scalar replacement performed during loop
		 transformations (requires -O3).

-Qunroll[n]	Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.

-Qprefetch[-]    Enables or disables prefetch insertion (requires -O3).

/Qoption,tool,optlist /Qoption passes an option specified by optlist to a tool, where
                      optlist is a comma-separated list of options.

		      tool                   Description
                     ------------------------------------  
                       fpp             Specifies the Fortran preprocessor 
                       f               Specifies the Fortran compiler
                       asm             Specifies the assembler
		       link            Specifies the linker
		       optlist         Indicates one or more valid argument strings for the
		                       designated tool. You must separate multiple arguments with commas.
		      
		      /Qoption can be used with the -Qipo flag to refine IPO. The valid options
		      that can be used for this purpose are:

                      -ip_args_in_regs=0        Disables the passing of arguments in registers.

		      -ip_ninl_max_stats=n      Sets the maximum number of intermediate
		                                language statements for a function that is
						expanded inline. n is a positive integer.
						The number of intermediate language
						statements usually exceeds the actual number of
						source language statements. The default value
						for n is 230. The compiler uses a larger limit
						for user inline functions.
						
                      -ip_ninl_min_stats=n      Sets the minimum number of intermediate
		                                language statements for a function that is
						expanded inline. n is a positive integer.
						The default value for the IA-32 compiler
						is 7.
 
                      -ip_ninl_max_total_stats=n Sets the maximum increase in size of a function,
		                                 measured in intermediate language statements, 
						 due to inlining. n is a positive integer whose 
						 default value is 2000. 


shlW32M6.lib:    MicroQuill SmartHeap Library 6.0 available from 
                 http://www.microquill.com/

-Zp{1|2|4|8|16}	 Specifies the strictest alignment constraint for structure and union
		 types as 1, 2, 4, 8, or 16 bytes. Default is 16.



Other Notes: 
------------
"/" and "-" are both allowable starting tokens for flags passed to the 
compiler; for example, -QxK and /QxK are identical switches.


Compiler options for PGI Fortran compiler 5.1 for Windows XP
-------------------------------------------------------------

The optimization levels and their meanings are as follows:	

+ACML   Link with the AMD Core Math Library 2.0. Available from www.amd.com

-O0	A basic block is generated for each Fortran statement.  No scheduling 
	is done between statements.  No global optimizations are performed.

-O1	Scheduling within extended basic blocks is performed.  Some register 
	allocation is performed.  No global optimizations are performed.

-O2	All level 1 optimizations are performed.  In addition,  scalar
	optimizations such as induction recognition and loop invariant motion 
	are performed by the global optimizer. 
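
Loop-invariant motion and induction recognition can be pictured with a small
sketch. It is written in C rather than Fortran, the function names are
hypothetical, and the second function shows by hand what a global optimizer
may effectively produce from the first.

```c
#include <assert.h>
#include <stddef.h>

/* As written: 'scale * offset' is recomputed on every iteration even
   though neither operand changes inside the loop. */
void fill_naive(int *out, size_t n, int scale, int offset)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (int)i * 4 + scale * offset;
}

/* Hand-optimized equivalent: the invariant product is hoisted out of
   the loop, and the multiplication 'i * 4' is replaced by an induction
   variable that is bumped by 4 each iteration. */
void fill_hoisted(int *out, size_t n, int scale, int offset)
{
    const int invariant = scale * offset;   /* loop-invariant motion */
    int step = 0;                           /* induction variable    */
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        out[i] = step + invariant;
        step += 4;
    }
}
```

Both functions store the same values; only the amount of work done per
iteration differs.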
                
-O3	This level performs all level-one and level-two optimizations and 
	enables more aggressive hoisting and scalar replacement optimizations.

-fast	 Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre" 

-fastsse Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 



-Mcache_align    
     Align unconstrained objects of length greater than or equal to 16 bytes on
     cache-line boundaries. An unconstrained object is a data object that is not
     a member of an aggregate structure or common block. This option does
     not affect the alignment of allocatable or automatic arrays.

     Note: To effect cache-line alignment of stack-based local variables, the
     main program or function must be compiled with -Mcache_align.
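
What cache-line alignment of an unconstrained object means can be sketched in
C11, where the alignment is requested explicitly in the source instead of by a
compiler flag. The 64-byte line size and all names below are assumptions made
for illustration; actual line sizes vary by processor.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; varies by processor */

/* A standalone (unconstrained) array requested on a cache-line boundary. */
static alignas(CACHE_LINE) double samples[8];

/* Returns 1 if the array starts on a cache-line boundary. */
int samples_are_line_aligned(void)
{
    assert(sizeof samples >= 16);
    return ((uintptr_t)samples % CACHE_LINE) == 0;
}
```

An object that starts on a line boundary does not straddle more cache lines
than its size requires, which can help vectorized loads and stores.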

-Mfixed 
     Process source using Fortran 90 fixed-form specifications.

-Mflushz 	 
     Set SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]  Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align  
     Instructs the IPA to recognize when pointer targets are all cache-line 
     aligned, allowing better SSE code generation.

-Mipa=arg  
     Instructs the IPA to remove arguments replaced by -Mipa=ptr,const 

-Mipa=const  
     Enable propagation of constants across procedure calls.

-Mipa=fast  
     Equivalent to: -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial 
              	
-Mipa=globals  
     Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=localarg  
      Externalizes local variables for use with -Mipa=arg

-Mipa=ptr  
     Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
     Instructs the IPA to eliminate functions that are not called.
	
-Mnoframe  
     Eliminate operations that set up a true stack frame pointer for functions.

-Mnosmart   
     Do not run the Smart assembly rewrite tool, which enables post-compilation
     linear assembly scheduling and optimization.

-Mscalarsse   
     Utilize the SSE (Streaming SIMD (Single Instruction, Multiple Data)
     Extensions) and SSE2 instructions to perform the operations coded.
     This implies -Mflushz.

-Munix   Use UNIX calling conventions, no trailing underscores.

-Munroll  
     Invokes the loop unroller.  This also sets the optimization level to 2 
     if the level is set to less than 2.
			
      c:m	Instructs the compiler to completely unroll loops with a
	constant loop count less than or equal to m, a supplied constant.
	If this value is not supplied, the m count is set to 4.

      n:u	Instructs the compiler to unroll u times any loop that is not
	completely unrolled or that has a non-constant loop count.
	If u is not supplied, the unroller computes the number of times a
	candidate loop is unrolled.

-Mvect=sse  
     Instructs the vectorizer to search for loops, and where possible,
     use the SSE or SSE2 and prefetch instructions
     (depending on which processor is targeted).


Portability options for CPU2000:
-------------------------------
176.gcc:     
         -Dalloca=_alloca : so as to use the built-in optimized alloca
         /Fn              : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate n bytes of stack.
                            The default amount of stack allocated is not
                            enough, and 176.gcc crashes with a run-time
                            error.

178.galgel: 
   -Mfixed                : Assume fixed-format source


186.crafty: 
   -DNT_i386              : Specifies that it is a Windows NT Intel 
                            processor-based system which makes the compiler 
                            use "long long" as the 64-bit variable that 
                            186.crafty needs.        
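
How a pre-define such as -DNT_i386 can select the 64-bit type is sketched
below. The type name board64 and the function count_bits are hypothetical and
are not taken from the 186.crafty source; the fallback branch is likewise an
assumption made so that the sketch compiles without the define.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical type name; the real benchmark source may differ. */
#if defined(NT_i386)
typedef unsigned long long board64;   /* selected by -DNT_i386 */
#else
typedef uint64_t board64;             /* assumed fallback for this sketch */
#endif

/* Counts the set bits, exercising all 64 bits of the type. */
int count_bits(board64 b)
{
    int n = 0;
    assert(sizeof(board64) * CHAR_BIT == 64);
    while (b) {
        b &= b - 1;   /* clears the lowest set bit */
        ++n;
    }
    return n;
}
```

Either branch yields a 64-bit unsigned type, so code using board64 behaves the
same regardless of which definition is selected.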

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes needed for porting to
                            Windows.
   -DPERLDLL              : On Windows, we need a single perl.exe instead of a
                            perl.exe and perl.dll. This pre-define enables
                            the changes necessary to build a single,
                            UNIX-style executable without the indirect
                            calls that can cause a 10% performance
                            degradation. This allows the Windows-based
                            executable to be as close as possible to
                            the UNIX-based one.
   /MT                    : Uses the static multi-threaded library; otherwise
                            the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :  
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines tell of the existence 
                            of malloc and calloc prototypes.