Description of compiler flags for Intel C++ Compiler 8.0
--------------------------------------------------------

Portability flags:

-Dalloca=_alloca         Maps alloca to _alloca so that the built-in 
                         optimized alloca is used.
/Fn                      176.gcc uses alloca, and this option tells
                         the linker to pre-allocate n bytes of stack. 
                         The default amount of stack allocated is not 
                         enough, and 176.gcc crashes with a run-time 
                         error.
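
To illustrate why stack size matters for code that relies on alloca: each
call takes its buffer from the current stack frame, so large or deeply nested
requests can exhaust a small default stack. The sketch below is an assumption
for illustration only. The function is_palindrome is hypothetical (it is not
code from 176.gcc), and the header selection assumes a Windows or POSIX-like
toolchain.

```c
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <malloc.h>
#ifndef alloca
#define alloca _alloca   /* same mapping that -Dalloca=_alloca supplies */
#endif
#else
#include <alloca.h>
#endif

/* Hypothetical helper: returns 1 if s reads the same in both directions. */
int is_palindrome(const char *s)
{
    size_t n = strlen(s);
    size_t i;
    /* The scratch buffer lives in this function's stack frame and is
       released automatically when the function returns. */
    char *rev = (char *)alloca(n + 1);

    for (i = 0; i < n; ++i)
        rev[i] = s[n - 1 - i];
    rev[n] = '\0';
    return strcmp(s, rev) == 0;
}
```

Because the buffer size depends on the input, stack usage grows with the
data, which is the kind of pattern a larger pre-allocated stack protects.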

-DNT_i386                Specifies that it is a Windows NT Intel 
                         processor-based system which makes the compiler 
                         use "long long" as the 64-bit variable that 
                         186.crafty needs.        
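
As a rough illustration of why a 64-bit integer type matters to a chess
program: one bit per square of the board fits exactly in a 64-bit word. The
sketch assumes only that "unsigned long long" is at least 64 bits wide. The
names bitboard, bb_set, and bb_count are hypothetical and are not taken from
the 186.crafty sources.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from 186.crafty. */
typedef unsigned long long bitboard;   /* one bit per chessboard square */

/* Return a copy of b with the bit for square sq (0..63) set. */
bitboard bb_set(bitboard b, int sq)
{
    assert(sq >= 0 && sq < 64);
    return b | (1ULL << sq);
}

/* Count occupied squares by clearing the lowest set bit on each pass. */
int bb_count(bitboard b)
{
    int n = 0;
    while (b != 0) {
        b &= b - 1;
        ++n;
    }
    return n;
}
```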

-DSPEC_CPU2000_NTOS      Enables the code changes needed for porting to 
                         Windows. 
   
-DPERLDLL                On Windows, we need a single perl.exe instead of 
                         a perl.exe plus a perl.dll. This pre-define 
                         applies the changes necessary to get a single, 
                         UNIX-style executable and avoids the 
                         indirect calls that can cause a 10% performance 
                         degradation. This allows the Windows-based 
                         executable to be as close as possible to 
                         the Unix-based one.

/MT                      Use the static multi-threaded library; otherwise 
                         it will not compile.

-DSYS_HAS_CALLOC_PROTO   These two pre-defines indicate that prototypes 
-DSYS_HAS_MALLOC_PROTO   for calloc and malloc exist.


Optimization Flags:

-O2
		         Optimizes for speed. The -O2 option has the same effect as specifying
		         the following options: -Og, -Oi, -Ot, -Oy, -Ob1, -Gf, -Gs, and -Gy.
		         This option defaults to ON.

-O3    	
		         Optimizes for speed. Enables high-level optimization. This level does 
		         not guarantee higher performance. Using this option may increase the
		         compilation time. Impact on performance is application dependent; some
		         applications may not see a performance improvement.

-Oa[-] 	
		         Assumes [does not assume] no aliasing.
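
A small sketch of what the no-aliasing assumption means in practice. In
standard C the two pointers below may refer to the same object, so the second
read of *b has to observe the first store. A compiler that is told to assume
no aliasing could instead read *b once and reuse the value, which changes the
answer for the aliased call. The function name add_twice is hypothetical.

```c
/* Hypothetical example: adds *b to *a twice, re-reading *b each time. */
void add_twice(int *a, const int *b)
{
    *a += *b;   /* if a and b point to the same int, this also changes *b */
    *a += *b;   /* so this second read must see the updated value */
}
```

With two distinct objects holding 1, the result in *a is 3. With
add_twice(&x, &x) and x equal to 1, standard C semantics give 4. Code that
depends on the second behavior is not safe to build under a no-aliasing
assumption.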

-Obn      	
		         Controls the compiler's inline expansion. The amount of inline
		         expansion performed varies with the value of n as follows:
		         0:  Disables inlining.
		         1:  Enables (default) inlining of functions declared with the
		             __inline keyword. Also enables inlining according to the
		             C++ language.
		         2:  Enables inlining of any function.  However, the 
		             compiler decides which functions to inline.  Enables 
		             interprocedural optimizations and has the same effect as 
		             -Qip.
		         Default n=1.
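
For illustration, a function declared with the __inline keyword looks like
the sketch below. The names clamp and clamp_sum are hypothetical, and the
example assumes a compiler that accepts the __inline spelling (the Microsoft
and Intel compilers do; GCC and Clang accept it as an alternate keyword).
Whether the call is actually expanded in place remains the compiler's
decision.

```c
/* Hypothetical helper declared with __inline: the kind of function that
   -Ob1 makes a candidate for inline expansion. */
static __inline int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical caller; each clamp() call may be expanded in place. */
int clamp_sum(const int *v, int n, int lo, int hi)
{
    int i, total = 0;
    for (i = 0; i < n; ++i)
        total += clamp(v[i], lo, hi);
    return total;
}
```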

-Og	
		         Enables global optimizations.  Default ON.

-Ot	
		         Enables all speed optimizations.  Overrides -Os

-Oi[-] 	
		         Enables/disables inline expansion of intrinsic functions.  Default Enabled.

-Ow[-]	
		         Assume[not assume] no aliasing within functions, but assume aliasing
		         across calls.

-Oy[-]	
		         Enables [disables] the use of the EBP register in optimizations. When
		         you disable with -Oy-, the EBP register is used as frame pointer. 
		         Default Enabled.

-Gf	
	             Enables string-pooling optimization.  Default ON.

-Gs[n]	
		         Disables stack-checking for routines with n or more bytes of local
		         variables and compiler temporaries. Default: n=4096

-Gy	
		         Packages functions to enable linker optimization.  Default ON.

-Qax{K|W|N}
		         Generates specialized code for processor specific codes 
		         K, W, N while also generating generic IA-32 code. 
		         K  = Intel Pentium III and compatible Intel processors
		         W  = Intel Pentium 4 and compatible Intel processors
		         N  = Intel Pentium 4 and compatible Intel processors. These options also enable
		         advanced data layout and code restructuring optimizations to improve memory
		         accesses for Intel processors.
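
The idea of shipping a specialized code path next to a generic one can be
sketched by hand as follows. This is only an analogy for what the compiler
generates automatically. All names are hypothetical, the "specialized"
routine is an ordinary C stand-in rather than real processor-specific code,
and the capability test is passed in as a flag instead of being read from the
processor.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Generic path: runs on any processor. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Stand-in for the processor-specific path (two accumulators here; a
   real compiler would emit extension-specific instructions instead). */
static long sum_specialized(const int *v, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)
        s0 += v[i];          /* odd element left over */
    return s0 + s1;
}

/* Pick the implementation once, based on a capability flag. */
sum_fn select_sum(int has_extension)
{
    return has_extension ? sum_specialized : sum_generic;
}
```

Both paths must return the same result; only the instructions used to get
there differ.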
    
-Qx{K|W|N}	
		         Generate specialized code to run exclusively on processors
		         supporting the extensions indicated by <codes> as 
		         described above.


-Qip        
		         Enables interprocedural optimizations within a single file.  
		         Same as -Ob2.

-Qipo       
		         Enables multi-file interprocedural optimizations that include:
		         - inline function expansion
		         - interprocedural constant propagation
		         - monitoring module-level static variables
		         - dead code elimination
		         - propagation of function characteristics
		         - passing arguments in registers
		         - loop-invariant code motion

-Qprof_gen       
		         Instruments the program for profiling to obtain the execution
		         count of each basic block.
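
What "execution count of each basic block" means can be pictured with
hand-written counters. This is a conceptual sketch only: the counter array,
the block numbering, and the function are hypothetical, and the compiler's
actual instrumentation and profile data format are not described here.

```c
/* Hypothetical counters, one per basic block of collatz_steps(). */
enum { BLOCK_ENTRY, BLOCK_EVEN, BLOCK_ODD, BLOCK_COUNT };
unsigned long block_hits[BLOCK_COUNT];

/* Number of Collatz steps needed to reach 1, with manual counters
   standing in for compiler-inserted instrumentation. */
int collatz_steps(unsigned long n)
{
    int steps = 0;
    ++block_hits[BLOCK_ENTRY];
    while (n > 1) {
        if (n % 2 == 0) {
            ++block_hits[BLOCK_EVEN];
            n /= 2;
        } else {
            ++block_hits[BLOCK_ODD];
            n = 3 * n + 1;
        }
        ++steps;
    }
    return steps;
}
```

After a training run, counts like these show which blocks are hot, which is
the information a later feedback-directed compile can use.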

-Qprof_use       
		         Enables the use of profiling dynamic feedback information
		         during optimization.  Turns on -Qfnsplit.

-Qrcd            
		         Enables fast conversions of floating-point values to 
		         integers. This option does not guarantee that
		         any particular rounding mode will be used.

-Qansi_alias[-]  
		         -Qansi_alias directs the compiler to assume[not assume] the following: 
		         - Arrays are not accessed out of bounds. 
		         - Pointers are not cast to non-pointer types, and vice-versa. 
		         - References to objects of two different scalar types cannot alias. 
		         For example, an object of type int cannot alias with an object 
		         of type float, or an object of type float cannot alias with an 
		         object of type double. 
		         If your program satisfies the above conditions, setting the -Qansi_alias 
		         flag will help the compiler better optimize the program. However, if your
		         program does not satisfy one of the above conditions, the -Qansi_alias
		         flag may lead the compiler to generate incorrect code.
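
As an example of the third condition, reading a float through an integer
pointer is the kind of access that breaks the assumption, while copying the
bytes does not. The sketch below is illustrative only. The name float_bits is
hypothetical, and the code assumes that float and unsigned int have the same
size and that float uses the IEEE 754 single-precision format.

```c
#include <string.h>

/* Return the bit pattern of a float. Copying bytes with memcpy stays
   within the rules. The cast shown here would instead read a float
   object through an unsigned int lvalue, breaking the assumption:
       return *(unsigned int *)&f;                                   */
unsigned int float_bits(float f)
{
    unsigned int u;
    memcpy(&u, &f, sizeof u);  /* assumes sizeof(float) == sizeof(unsigned int) */
    return u;
}
```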
       		 

-GR[-]           
		         Enables[disables] C++ Run Time Type Information (RTTI).

-GX[-]           
		         Enables[disables] C++ Exception Handling.

-fast            
		         Maximize speed across the entire program. Turns on -O3 and -Qipo.

-Qfp_port   	 
		         Rounds floating-point results at assignments and casts (some speed impact).

-Qprefetch       
		         Enable prefetch insertion.  Default ON.

-Qunroll[n]	 
		         Specifies the maximum number of times to unroll a loop. n=0 disables
		         loop unrolling.
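
To show what unrolling does to a loop, the sketch below writes the
transformation out by hand with an unroll factor of 4. It illustrates the
technique only and is not a description of the code the compiler emits. The
function names are hypothetical.

```c
#include <stddef.h>

/* Straightforward loop: one element per iteration. */
long sum_rolled(const int *v, size_t n)
{
    long s = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Same loop unrolled by a factor of 4, followed by a remainder loop
   for the last n % 4 elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; ++i)
        s += v[i];
    return s;
}
```

The unrolled version executes fewer loop-control branches per element at the
cost of larger code, which is the trade-off the unroll limit controls.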

-Qoption,tool,optlist 
		         -Qoption passes an option specified by optlist to a tool, where
		         optlist is a comma-separated list of options.

		         tool		Description
		         ------------------------------------  
		            cpp		Specifies the compiler front-end preprocessor 
		              c		Specifies the C++ compiler
		            asm		Specifies the assembler
		           link		Specifies the linker
		         optlist	Indicates one or more valid argument strings for the
				 designated program. If the argument is a command-line
				 option, you must include the hyphen. If the argument
				 contains a space or tab character, you must enclose the
				 entire argument in quotation characters (""). You must
				 separate multiple arguments with commas.
		      

	             -Qoption can be used with the -Qipo flag to refine IPO. The valid options
		         that can be used for this purpose are:

		-ip_args_in_regs=0		
				 Disables the passing of arguments in registers.
		
		-ip_ninl_max_stats=n		
				 Sets the valid max number of intermediate
			     language statements for a function that is 
				 expanded in line. The number n is a positive
				 integer. The number of intermediate language
				 statements usually exceeds the actual number of
				 source language statements. The default value
				 for n is 230. The compiler uses a larger limit
				 for user inline functions. 
						
		-ip_ninl_min_stats=n      
				 Sets the valid min number of intermediate 
				 language statements for a function that is 
				 expanded in line. The number n is a positive 
				 integer. The default values for 
				 ip_ninl_min_stats are: 
				 IA-32 compiler: ip_ninl_min_stats = 7 
 
		-ip_ninl_max_total_stats=n 
				 Sets the maximum increase in size of a function,
				 measured in intermediate language statements, 
				 due to inlining. n is a positive integer whose 
				 default value is 2000. 


shlW32M.lib:    
		         MicroQuill SmartHeap Library available from 
		         http://www.microquill.com/

-Zp{1|2|4|8|16}	 
		         Specifies the strictest alignment constraint for structure and union 
		         types as 1, 2, 4, 8, or 16 bytes. Default is 16.
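
The effect of an alignment limit on structure layout can be seen with a
source-level packing pragma. This sketch assumes that #pragma pack, which the
Microsoft and Intel Windows compilers (and GCC and Clang) accept, behaves
like the command-line setting for these simple types. The structure names are
hypothetical.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding before 'value'. */
struct sample_default {
    char   tag;
    double value;
};

/* Same members with a 1-byte alignment limit, comparable to -Zp1. */
#pragma pack(push, 1)
struct sample_packed {
    char   tag;
    double value;
};
#pragma pack(pop)
```

Under the 1-byte limit, value starts at offset 1 and the structure has no
padding; under the default layout it is usually larger.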


-arch:SSE        
		         Enables the compiler to use SSE instructions.

-arch:SSE2
		         Enables the compiler to use SSE2 instructions.

-EHc             
		         Specifies that C functions do not throw exceptions. Default ON.

-G7              
		         Target optimization to Intel Pentium 4 processors.  Default ON.

-ML              
		         Compiles and links with the static, single-thread C run time library.  Default ON.

-QA              
		         Enables all predefined macros and all assertions.  Default ON.

-Qfnsplit        
		         Enables function splitting.  Default ON.

-Qms1            
		         Instructs the compiler to enable most Microsoft compatibility bugs.  Default ON.

-Qmspp           
		         Enables Microsoft C++ 6.0 Processor Pack binary compatibility.  Default ON.

-Qpc64           
		         Enables floating-point significand precision control.  The value is used to round
		         the significand to the correct number of bits.  The value must be either 32, 64, 
		         or 80.  Default ON.

-Qpchi           
		         Enables precompiled header files coexistence to reduce build time.  Default ON.

-Qsfalign8	 
		         May align stack for functions with 8 or 16 byte vars.  Default ON.

-Qvc7            
		         Enables compatibility with Visual C++ .NET.  Default ON.

-Qvec_report1    
		         Indicate vectorized loops in diagnostic information.  Default ON.

-vmb             
		         Selects the smallest representation for pointers to members. Use this
		         option if you define each class before you declare a pointer to a member of the class.
		         Default ON.


Description of compiler flags for Intel C++ Compiler 9.0
--------------------------------------------------------

Portability flags:

-Dalloca=_alloca         Maps alloca to _alloca so that the built-in 
                         optimized alloca is used.
/Fn                      176.gcc uses alloca, and this option tells
                         the linker to pre-allocate n bytes of stack. 
                         The default amount of stack allocated is not 
                         enough, and 176.gcc crashes with a run-time 
                         error.

-DNT_i386                Specifies that it is a Windows NT Intel 
                         processor-based system which makes the compiler 
                         use "long long" as the 64-bit variable that 
                         186.crafty needs.        

-DSPEC_CPU2000_NTOS      Enables the code changes needed for porting to 
                         Windows. 
   
-DPERLDLL                On Windows, we need a single perl.exe instead of 
                         a perl.exe plus a perl.dll. This pre-define 
                         applies the changes necessary to get a single, 
                         UNIX-style executable and avoids the 
                         indirect calls that can cause a 10% performance 
                         degradation. This allows the Windows-based 
                         executable to be as close as possible to 
                         the Unix-based one.

/MT                      Use the static multi-threaded library; otherwise 
                         it will not compile.

-DSYS_HAS_CALLOC_PROTO   These two pre-defines indicate that prototypes 
-DSYS_HAS_MALLOC_PROTO   for calloc and malloc exist.


Optimization Flags:

-O2	
		Optimizes for speed. The -O2 option includes the following options: 
		-Og, -Oi-, -Os, -Oy, -Ob1, and -Gs.  This option defaults to ON.
		This option also enables:
		* inlining of intrinsics
		* Intra-file interprocedural optimizations including:
		  * inlining
		  * constant propagation
		  * forward substitution
		  * routine attribute propagation
		  * variable address-taken analysis
		  * dead static function elimination
		  * removal of unreferenced variables.
		* The following performance optimizations:
		  * copy propagation
		  * dead-code elimination
		  * global register allocation
		  * global instruction scheduling and control speculation
		  * loop unrolling
		  * optimized code selection
		  * partial redundancy elimination
		  * strength reduction/induction variable simplification
		  * variable renaming
		  * exception handling optimizations
		  * tail recursions
		  * peephole optimizations
		  * structure assignment lowering and optimizations
		  * dead store elimination
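
One entry in the list above, strength reduction with induction variable
simplification, can be written out by hand as a before/after pair. This is a
sketch of the transformation only, with hypothetical function names; it does
not show the compiler's actual output.

```c
#include <stddef.h>

/* Before: the loop index is multiplied by the stride on every pass. */
void fill_multiply(int *out, size_t n, int stride)
{
    size_t i;
    for (i = 0; i < n; ++i)
        out[i] = (int)i * stride;
}

/* After: the multiplication is replaced by a running addition. */
void fill_add(int *out, size_t n, int stride)
{
    size_t i;
    int value = 0;
    for (i = 0; i < n; ++i) {
        out[i] = value;
        value += stride;
    }
}
```

Both functions store the same values for small inputs such as these; the
second avoids one multiply per iteration.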

-O3    	
		Optimizes for speed. Enables high-level optimization. This level does 
		not guarantee higher performance. Using this option may increase the
		compilation time. Impact on performance is application dependent; some
		applications may not see a performance improvement.  The optimizations
		include:
		* All optimizations done with -O2
		* loop unrolling, including instruction scheduling
		* code replication to eliminate branches
		* padding the size of certain power-of-two arrays to allow more efficient
		  cache use.
		* When used with -Qax or -Qx, it causes the compiler to perform more aggressive
		  data dependency analysis than for -O2.

-Oa[-] 	
		Assume [not assume] no aliasing.  Default Disabled.

-Obn      	
		Controls the compiler's inline expansion. The amount of inline
		expansion performed varies with the value of n as follows:
		0:  Disables inlining.  Statement functions are always inlined.
		1:  Enables (default) inlining of functions declared with the
		    __inline keyword. Also enables inlining according to the
		    C++ language.
		2:  Enables inlining of any function.  However, the 
		    compiler decides which functions to inline.  Enables 
		    interprocedural optimizations and has the same effect as 
		    -Qip.
		Default n=2.

-Og	
		Enables global optimizations.  Default ON.

-Ot	
		Enables all speed optimizations.  Overrides -Os

-Oi[-] 	
		Enables/disables inline expansion of intrinsic functions.  Default Enabled.

-Ow[-]	
		Assume[not assume] no cross function aliasing.

-Oy[-]	
		Enables [disables] the use of the EBP register in optimizations. When
		you disable with -Oy-, the EBP register is used as frame pointer.  -Oy- has
		the effect of reducing the number of available general-purpose registers by 1,
		and can produce slightly less efficient code.
		Default Enabled.

-Gf	
		Enables string-pooling optimization.

-Gs[n]	
		Disables stack-checking for routines with n or more bytes of local
		variables and compiler temporaries. Default: n=4096

-Gy	
		Packages functions to enable linker optimization.  Default ON.

-Qax{K|W|N}	
		Generates specialized code for processor specific codes 
		K, W, N while also generating generic IA-32 code. 
		K  = Intel Pentium III and compatible Intel processors
		W  = Intel Pentium 4 and compatible Intel processors
		N  = Intel Pentium 4 and compatible Intel processors. These options also enable
		     advanced data layout and code restructuring optimizations to improve memory
		     accesses for Intel processors.
    
-Qx{K|W|N}	
		Generate specialized code to run exclusively on processors
		supporting the extensions indicated by <codes> as 
		described above.


-Qip        
		Enables interprocedural optimizations within a single file.  

-Qipo       
		Enables multi-file interprocedural optimizations, which allow inline function expansion for
		calls to functions defined in separate files.  The compiler decides whether to create
		one or more object files based on an estimate of the size of the application.  It
		generates one object file for small applications and two for large ones.

-Qprof_gen       
		Instruments the program for profiling to obtain the execution
		count of each basic block.

-Qprof_use       
		Enables the use of profiling dynamic feedback information
		during optimization.  Turns on -Qfnsplit.  Forces function grouping.

-Qrcd            
		Enables fast conversions of floating-point values to 
		integers. This option does not guarantee that
		any particular rounding mode will be used.

-Qansi_alias[-]  
		-Qansi_alias directs the compiler to assume the following: 
		    - Arrays are not accessed out of bounds. 
		    - Pointers are not cast to non-pointer types, and vice-versa. 
		    - References to objects of two different scalar types cannot alias. 
		      For example, an object of type int cannot alias with an object 
		      of type float, or an object of type float cannot alias with an 
		      object of type double. 
		If your program satisfies the above conditions, setting the -Qansi_alias 
		flag will help the compiler better optimize the program. However, if your
		program does not satisfy one of the above conditions, the -Qansi_alias
		flag may lead the compiler to generate incorrect code.
       		 

-GR[-]           
		Enables[disables] C++ Run Time Type Information (RTTI).

-GX[-]           
		Enables[disables] C++ Exception Handling.  Default Disabled.

-fast            
		Maximize speed across the entire program. Turns on -O3, -Qipo,
		-Qprec-div-,  and -QxP.

-Qfp_port   	 
		Rounds floating-point results at assignments and casts (some speed impact).

-Qprefetch       
		Enable prefetch insertion.  Default ON.

-Qunroll[n]	 
		Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.  Default: the compiler uses default heuristics when 
		unrolling loops.

-Qoption,tool,optlist 
		-Qoption passes an option specified by optlist to a tool, where
		optlist is a comma-separated list of options.

		tool		Description
		------------------------------------  
		cpp		Specifies the compiler front-end preprocessor 
		c		Specifies the C++ compiler
		asm		Specifies the assembler
		link		Specifies the linker
		optlist		Indicates one or more valid argument strings for the
				designated program. If the argument is a command-line
				option, you must include the hyphen. If the argument
				contains a space or tab character, you must enclose the
				entire argument in quotation characters (""). You must
				separate multiple arguments with commas.

		NOTE: If 'tool' is incorrectly specified, the compiler gives a
		warning and the option is ignored. For example, if
		-Qoption,f,...  is used with the Intel C++ compiler, the
		option is ignored with a warning.
		      
		-Qoption can be used with the -Qipo flag to refine IPO. The valid options
		that can be used for this purpose are:

		-ip_args_in_regs=0        
				Disables the passing of arguments in registers.

		-ip_ninl_max_stats=n      
				Sets the valid max number of intermediate
				language statements for a function that is 
				expanded in line. The number n is a positive
				integer. The number of intermediate language
				statements usually exceeds the actual number of
				source language statements. The default value
				for n is 230. The compiler uses a larger limit
				for user inline functions. 
						
		-ip_ninl_min_stats=n      
				Sets the valid min number of intermediate 
				language statements for a function that is 
				expanded in line. The number n is a positive 
				integer. The default values for 
				ip_ninl_min_stats are: 
				IA-32 compiler: ip_ninl_min_stats = 7 
 
		-ip_ninl_max_total_stats=n 
				Sets the maximum increase in size of a function,
				measured in intermediate language statements, 
				due to inlining. n is a positive integer whose 
				default value is 2000. 
	
shlW32M.lib:    
		MicroQuill SmartHeap Library available from 
		http://www.microquill.com/

-Zp{1|2|4|8|16}	 
		Specifies the strictest alignment constraint for structure and union 
		types as 1, 2, 4, 8, or 16 bytes. Default is 16.


-arch:SSE        
		Enables the compiler to use SSE instructions.

-arch:SSE2       
		Enables the compiler to use SSE2 instructions.

-Qprec-div[-]    
		Enables[disables] improved precision of floating-point divides.  Disabling may
		slightly improve speed.  Default Enabled.

-Qpc64           
		Enables floating-point significand precision control.  The value is used to round
		the significand to the correct number of bits.  The value must be either 32, 64, 
		or 80.  Default ON.


Description of compiler flags for Intel Fortran Compiler 9.0
------------------------------------------------------------

Portability flags:

-FI           Fixed-format F90 source code. 
-F32000000    Same as with 176.gcc, pre-allocates a 32MB 
              stack


Optimization Flags:

-O2	
		Optimizes for speed. The -O2 option includes the following options: 
		-Og, -Ot, -Oy, -Ob1, and -Gs.  This option defaults to ON.
		This option also enables:
		* inlining of intrinsics
		* Intra-file interprocedural optimizations including:
		  * inlining
		  * constant propagation
		  * forward substitution
		  * routine attribute propagation
		  * variable address-taken analysis
		  * dead static function elimination
		  * removal of unreferenced variables.
		* The following performance optimizations:
		  * copy propagation
		  * dead-code elimination
		  * global register allocation
		  * global instruction scheduling and control speculation
		  * loop unrolling
		  * optimized code selection
		  * partial redundancy elimination
		  * strength reduction/induction variable simplification
		  * variable renaming
		  * exception handling optimizations
		  * tail recursions
		  * peephole optimizations
		  * structure assignment lowering and optimizations
		  * dead store elimination

-O3    	
		Optimizes for speed. Enables high-level optimization. This level does 
		not guarantee higher performance. Using this option may increase the
		compilation time. Impact on performance is application dependent; some
		applications may not see a performance improvement.  The optimizations
		include:
		* All optimizations done with -O2
		* loop unrolling, including instruction scheduling
		* code replication to eliminate branches
		* padding the size of certain power-of-two arrays to allow more efficient
		  cache use.
		* When used with -Qax or -Qx, it causes the compiler to perform more aggressive
		  data dependency analysis than for -O2.

-Oa[-] 	
		Assume [not assume] no aliasing

-Ob{0|1|2}	
		Controls the compiler's inline expansion. The amount of inline
		expansion performed varies as follows:
		-Ob0:  Disable inlining.
		-Ob1:  Disables (default) inlining unless -Qip or -Ob2 is
		       specified. Enables inlining of functions.
		-Ob2:  Enables inlining of any function.  However, the 
		       compiler decides which functions to inline.  Enables 
		       interprocedural optimizations and has the same effect as 
		       -Qip.

-Og	
		Enables global optimizations.

-Ot	
		Enables all speed optimizations.

-Oi[-] 	
		Enables/disables inline expansion of intrinsic functions

-Ow[-]	
		Assume[not assume] no cross-function aliasing.

-Ox   	
		Same as the -O2 option: enables -Gs, -Ob1, -Og, -Oy, and -Ot.

-Oy[-]	
		Enables [disables] the use of the EBP register in optimizations. When
		you disable with -Oy-, the EBP register is used as frame pointer.

-auto   
		Determines whether local variables are put on the run-time stack.

-Gf	
		Enables string-pooling optimization.

-Gs[n]	
		Disables stack-checking for routines with n or more bytes of local
		variables and compiler temporaries. Default: n=4096

-Gy	
		Packages functions to enable linker optimization.

-fast   
		Maximize speed across the entire program. Turns on -O3, -Qprec-div-, -QxP, and -Qipo.

-Qax{K|W|N|P} 
		Generates specialized code for processor specific codes 
		K, W, N, P while also generating generic IA-32 code. 
		K  = Intel Pentium III and compatible Intel processors
		W  = Intel Pentium 4 and compatible Intel processors
		N  = Intel Pentium 4 and compatible Intel processors. These options also enable
		     advanced data layout and code restructuring optimizations to improve memory
		     accesses for Intel processors.
		P  = Intel Pentium 4 processor with Streaming SIMD Extensions 3 (SSE3) support. These
		     options also enable advanced data layout and code restructuring optimizations to
		     improve memory accesses for Intel processors.
    
-Qx{K|W|N|P} 
		Generate specialized code to run exclusively on processors
		supporting the extensions indicated by <codes> as 
		described above.


-Qip        
		Enables interprocedural optimizations within a single file.

-Qipo      
		Enables multi-file interprocedural optimizations that include:
		- inline function expansion
		- interprocedural constant propagation
		- monitoring module-level static variables
		- dead code elimination
		- propagation of function characteristics
		- passing arguments in registers
		- loop-invariant code motion

-Qprof_gen       
		Instruments the program for profiling to obtain the execution
		count of each basic block.

-Qprof_use       
		Enables the use of profiling dynamic feedback information
		during optimization.

-Qrcd            
		Enables fast conversions of floating-point values to 
		integers. This option does not guarantee that
		any particular rounding mode will be used.

-Qansi_alias     
		Enables (default) or disables the assumption that the program 
		adheres to the ANSI Fortran type aliasability rules. For example, an object 
		of type real cannot be accessed as an integer. See the ANSI 
		Standard for the complete set of rules.


-Qscalar_rep[-]	 
		Enables[disables] scalar replacement performed during loop
		transformations. (requires /O3).

-Qunroll[n]	
		Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.

-Qprefetch[-]    
		Enables[disables] prefetch insertion (requires -O3).

-Qoption,tool,optlist 
		-Qoption passes an option specified by optlist to a tool, where
		optlist is a comma-separated list of options.

		tool		Description
		------------------------------------  
		fpp		Specifies the Fortran preprocessor 
		f		Specifies the Fortran compiler
		asm		Specifies the assembler
		link		Specifies the linker
		optlist		Indicates one or more valid argument strings for the
				designated tool. You must separate multiple arguments with commas.
		      
		-Qoption can be used with the -Qipo flag to refine IPO. The valid options
		that can be used for this purpose are:

		-ip_args_in_regs=0        
				Disables the passing of arguments in registers.

		-ip_ninl_max_stats=n      
				Sets the valid max number of intermediate
				language statements for a function that is 
				expanded in line. The number n is a positive
				integer. The number of intermediate language
				statements usually exceeds the actual number of
				source language statements. The default value
				for n is 230. The compiler uses a larger limit
				for user inline functions. 
						
		-ip_ninl_min_stats=n      
				Sets the valid min number of intermediate 
				language statements for a function that is 
				expanded in line. The number n is a positive 
				integer. The default values for 
				ip_ninl_min_stats are: 
				IA-32 compiler: ip_ninl_min_stats = 7 
 
		-ip_ninl_max_total_stats=n 
				Sets the maximum increase in size of a function,
				measured in intermediate language statements, 
				due to inlining. n is a positive integer whose 
				default value is 2000. 


shlW32M.lib:    
		MicroQuill SmartHeap Library available from 
		http://www.microquill.com/

-Zp{1|2|4|8|16}	 
		Specifies the strictest alignment constraint for structure and union 
		types as 1, 2, 4, 8, or 16 bytes. Default is 16.

-Qprec-div[-]    
		Enables[disables] improved precision of floating-point divides.  Disabling may
		slightly improve speed.  Default Enabled.

Other Notes: 
------------
"/" and "-" are both allowable starting tokens for flags passed to the 
compiler; for example, -QxK and /QxK are identical switches. 


Compiler options for PGI Fortran compiler 6.0 for Windows XP IA32
-----------------------------------------------------------------

The optimization levels and their meanings are as follows:	

-lacml  
		Link with the AMD Core Math Library 2.5.3, packaged with the
		compiler. Also available at www.amd.com

-O0	
		A basic block is generated for each Fortran statement.  No scheduling 
		is done between statements.  No global optimizations are performed.

-O1	
		Scheduling within extended basic blocks is performed.  Some register 
		allocation is performed.  No global optimizations are performed.

-O2	
		All level 1 optimizations are performed.  In addition,  scalar
		optimizations such as induction recognition and loop invariant motion 
		are performed by the global optimizer. 
                
-O3	
		This level performs all level-one and level-two optimizations and 
		enables more aggressive hoisting and scalar replacement optimizations.

-fast	 
		Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre" 

-fastsse 
		Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 

-Mpfi    
		Generate profile feedback instrumentation; this
		includes extra code to collect run-time statistics to
		be used in a subsequent compile; -Mpfi must also appear
		when the program is linked.  When the program is run, a
		profile feedback file pgfi.out will be generated; see
		-Mpfo.

-Mpfo    
		Enable profile feedback optimizations; there must be a
		profile feedback file pgfi.out in the current
		directory, which contains the result of an execution of
		the program compiled with -Mpfi.

-Mcache_align    
		Align unconstrained objects of length greater than or equal to 16 bytes on
		cache-line boundaries. An unconstrained object is a data object that is not
		a member of an aggregate structure or common block. This option does
		not affect the alignment of allocatable or automatic arrays.

		Note: To effect cache-line alignment of stack-based local variables, the
		main program or function must be compiled with -Mcache_align.

-Mfixed 
		Process source using Fortran 90 fixed-form specifications.

-Mflushz 	 
		Set SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]  
		Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align  
		Instructs the IPA to recognize when pointer targets are all cache-line 
		aligned, allowing better SSE code generation.

-Mipa=arg  
		Instructs the IPA to remove arguments replaced by -Mipa=ptr,const 

-Mipa=const  
		Enable propagation of constants across procedure calls.

-Mipa=fast  
		Equivalent to: -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial 

-Mipa=f90ptr
		Enable Fortran 90 pointer disambiguation across procedure calls.
              	
-Mipa=globals  
		Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=inline
		Automatically determine which functions to inline

-Mipa=safe
		Assume unknown function references are safe

-Mipa=localarg  
		Externalizes local variables for use with -Mipa=arg

-Mipa=ptr  
		Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
		Instructs the IPA to eliminate functions that are not called.

-Mlre
		Enables loop-carried redundancy elimination.
	
-Mnoframe  
		Eliminate operations that set up a true stack frame pointer for functions.

-Mnovect
		Disables the vectorizer.

-Mscalarsse   
		Utilize the SSE (Streaming SIMD (Single Instruction Multiple Data) 
		Extensions) and SSE2 instructions to perform the operations coded. 
		This implies -Mflushz.

-Munix   
		Use UNIX calling conventions, no trailing underscores.

-Munroll  
		Invokes the loop unroller.  This also sets the optimization level to 2 
		if the level is set to less than 2.
			
		c:m	Instructs the compiler to completely unroll loops with a
			constant loop count less than or equal to m, a supplied constant.
			If this value is not supplied, the m count is set to 4.

		n:u	Instructs the compiler to unroll u times, a loop which is
			not completely unrolled, or has a non-constant loop count.
			If u is not supplied, the unroller computes the number of times a
			candidate loop is unrolled.
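
		As a rough illustration of what unrolling "u times" means, the
		sketch below replicates a loop body four times and adds a
		remainder loop for leftover iterations. It is not the code the
		compiler emits; the function name and the choice of Python are
		assumptions made for illustration only.

```python
def sum_unrolled_by_4(values):
    """Sum a list with the loop body replicated 4 times (u = 4)."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled loop: each pass handles four elements.
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop: handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

		The result matches the straightforward one-element-per-pass
		loop; only the number of loop-control steps changes.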

-Mvect=sse  
		Instructs the vectorizer to search for loops, and where possible,
		use the SSE or SSE2 and prefetch instructions
		(depending on which processor is targeted).


Compiler options for PGI C compiler 6.0 for Windows XP
------------------------------------------------------

The optimization levels and their meanings are as follows:	

-lacml  
		Link with the AMD Core Math Library 2.5.3. Available from www.amd.com

-O0	
		A basic block is generated for each C statement.  No scheduling 
		is done between statements.  No global optimizations are performed.

-O1	
		Scheduling within extended basic blocks is performed.  Some register 
		allocation is performed.  No global optimizations are performed.

-O2	
		All level 1 optimizations are performed.  In addition,  scalar
		optimizations such as induction recognition and loop invariant motion 
		are performed by the global optimizer. 
                
-O3	
		This level performs all level-one and level-two optimizations and 
		enables more aggressive hoisting and scalar replacement optimizations.

-fast	 
		Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre" 

-fastsse 
		Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 

-Mpfi    
		Generate profile feedback instrumentation; this
		includes extra code to collect run-time statistics to
		be used in a subsequent compile; -Mpfi must also appear
		when the program is linked.  When the program is run, a
		profile feedback file pgfi.out will be generated; see
		-Mpfo.

-Mpfo    
		Enable profile feedback optimizations; there must be a
		profile feedback file pgfi.out in the current
		directory, which contains the result of an execution of
		the program compiled with -Mpfi.

-Mcache_align    
		Align unconstrained objects of length greater than or equal to 16 bytes on
		cache-line boundaries. An unconstrained object is a data object that is not
		a member of an aggregate structure or common block. This option does
		not affect the alignment of allocatable or automatic arrays.

		Note: To effect cache-line alignment of stack-based local variables, the
		main program or function must be compiled with -Mcache_align.


-Mflushz 	 
		Set SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]  
		Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align  
		Instructs the IPA to recognize when pointer targets are all cache-line 
		aligned, allowing better SSE code generation.

-Mipa=arg  
		Instructs the IPA to remove arguments replaced by -Mipa=ptr,const 

-Mipa=const  
		Enable propagation of constants across procedure calls.

-Mipa=fast  
		Equivalent to: -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial 

-Mipa=f90ptr
		Enable Fortran 90 pointer disambiguation across procedure calls.
              	
-Mipa=globals  
		Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=inline
		Automatically determine which functions to inline

-Mipa=safe
		Assume unknown function references are safe

-Mipa=localarg  
		Externalizes local variables for use with -Mipa=arg

-Mipa=ptr  
 		Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
		Instructs the IPA to eliminate functions that are not called.

-Mlre
		Enables loop-carried redundancy elimination.
	
-Mnoframe  
		Eliminate operations that set up a true stack frame pointer for functions.

-Mnovect
		Disables the vectorizer.

-Mscalarsse   
		Utilize the SSE (Streaming SIMD (Single Instruction Multiple Data) 
		Extensions) and SSE2 instructions to perform the operations coded. 
		This implies -Mflushz.

-Munix   
		Use UNIX calling conventions, no trailing underscores.

-Munroll  
		Invokes the loop unroller.  This also sets the optimization level to 2 
		if the level is set to less than 2.
			
		c:m	Instructs the compiler to completely unroll loops with a
			constant loop count less than or equal to m, a supplied constant.
			If this value is not supplied, the m count is set to 4.

		n:u	Instructs the compiler to unroll u times, a loop which is
			not completely unrolled, or has a non-constant loop count.
			If u is not supplied, the unroller computes the number of times a
			candidate loop is unrolled.

-Mvect=sse  
		Instructs the vectorizer to search for loops, and where possible,
		use the SSE or SSE2 and prefetch instructions
		(depending on which processor is targeted).


Description of compiler flags for PathScale EKOPath(TM) Compiler Suite (Fortran, C and C++ compilers)
-----------------------------------------------------------------------------------------------------

Portability Flags:

-DSPEC_CPU2000_LP64           Compile using LP64 programming model. 
-DLINUX_i386                  Linux Intel system, use "long long" as
                              64bit variable.  
-DHAS_ERRLIST                 Prog env provides specification for
                              "sys_errlist[]".
-DNDEBUG                      Defining this disables any assert macros used for
	                          debugging.
-DSPEC_CPU2000_NEED_BOOL      Use SPEC provided definition of the boolean type.
-DSPEC_CPU2000_LINUX_I386     Compile for an I386 system running Linux.
-DSPEC_CPU2000_GLIBC22        Compatibility with 2.2 & later versions of glibc
-DSYS_IS_USG                  Specifies that the operating system is
                              USG compliant. 
-DSYS_HAS_TIME_PROTO          Do not explicitly declare  time().
-DSYS_HAS_IOCTL_PROTO         Do not explicitly declare  ioctl().
-DSYS_HAS_ANSI                System is ANSI compliant.
-DSYS_HAS_CALLOC_PROTO        Do not explicitly declare  calloc().
-fixedform                    tells f90 compiler to use fixed format
                              (F77 72 column format), instead of F90 free format.  


Optimization Flags:

Some suboptions either enable or disable the feature. To enable a feature, 
either specify only the suboption name or specify =1, =ON, or =TRUE. To disable
a feature, add =0, =OFF, or =FALSE.  These values are
insensitive to case: 'on' & 'ON' mean the same thing. Below, ON & OFF indicate 
the enabling or disabling of a feature.
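
The enable/disable rule above can be summarized in a small sketch. The helper
name suboption_enabled is hypothetical and this is not the compiler's own
option parser; it only restates the rule for ON/OFF style suboptions.

```python
def suboption_enabled(suboption):
    """Return True if a suboption such as 'gcm=OFF' enables the feature."""
    name, sep, value = suboption.partition("=")
    if not sep:
        # Only the suboption name was given, which enables the feature.
        return True
    value = value.upper()  # values are insensitive to case
    if value in ("1", "ON", "TRUE"):
        return True
    if value in ("0", "OFF", "FALSE"):
        return False
    raise ValueError("not an ON/OFF value: " + suboption)
```

For example, suboption_enabled("gcm") and suboption_enabled("gcm=on") both
report the feature as enabled, while suboption_enabled("gcm=FALSE") does not.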

-CG[:...]       
                Code Generation option group: control the optimizations 
                and transformations of the instruction-level code generator.
                
-CG:cflow=(ON|OFF)
                A value of OFF disables control flow optimization in the code 
                generation. Default is ON. 

-CG:gcm=(ON|OFF)
                Specifying OFF disables the instruction-level global code 
                motion optimization phase. The default is ON.

-CG:load_exe=n 
                Specifies the threshold for subsuming a memory load operation into 
                the operand of an arithmetic instruction.  The value of 0 turns 
                off this subsumption optimization.  The default is 1, in which case
                subsumption is performed only when the result of the load has only 
                one use.  This subsumption is not performed if the number of times 
                the result of the load is used exceeds the value n, a non-negative 
                integer.

-CG:local_fwd_sched=(ON|OFF)
                Changes the instruction scheduling algorithm to work forward 
                instead of backward for the instructions in each basic block.
                The default is OFF.
                
-CG:movnti=N
                Convert ordinary stores to non-temporal stores when writing memory
                blocks of size larger than N KB. When N is set to 0, this 
                transformation is avoided.  The default value is 120 (KB).

-CG:p2align=(ON|OFF)
                Align loop heads to 64-byte boundaries.  The default is OFF.

-CG:p2align_freq=n
                Aligns branch targets based on execution frequency.  This option
                is meaningful only under feedback-directed compilation.  The
                default value n=0 turns off the alignment optimization.  Any 
                other value specifies the frequency threshold at or above which 
                this alignment will be performed by the compiler.

-CG:prefetch=(ON|OFF) 
                Turning this OFF suppresses any generation of prefetch instructions 
                in the code generator.  This has the same effect as -LNO:prefetch=0.
                The default is ON which implies using default prefetch algorithms.

-CG:prefetchnta=(ON|OFF) 
                Prefetch when data is non-temporal at all levels of the cache
                hierarchy.  This is for data streaming situations in which the
                data will not need to be re-used soon.  The default is OFF.

-fb_create <prefix for feedback data files> 
                Used to specify that an instrumented executable program
                is to be generated. Such an executable is suitable for
                producing feedback data files with the specified prefix
                for use in feedback-directed compilation (FDO).  The commonly 
                used prefix is "fbdata".  This is OFF by default.
 
-fb_opt <prefix for feedback data files> 
                Used to specify feedback-directed compilation (FDO) by extracting
                feedback data from files with the specified prefix, which were
                previously generated using -fb_create.  The commonly used prefix
                is "fbdata".  This optimization is off by default.

-fno-exceptions
                Tells the compiler that the program does not use exception
                handling, so it can perform more aggressive optimization in
                the code.  The generation of exception handling constructs 
                is also suppressed.  Under this flag, code that uses exception
                handling cannot be guaranteed to work correctly.  Note that
                the absence of exception handling construct does not mean
                that the function can be compiled with this flag.  For
                exception handling to work properly, the scopes
                crossed between throwing and catching an exception must
                all have been compiled with exceptions on. 

-fno-math-errno 
                Do not set errno after calling math functions that are executed
                with a single instruction, e.g., sqrt.  A program that relies 
                on IEEE exceptions for math error handling may want to use this 
                flag for speed while maintaining IEEE arithmetic compatibility.
                This is implied by -Ofast.  The default is -fmath-errno.
                
-GRA:optimize_boundary=(ON|OFF)
                 Allow the Global Register Allocator to allocate the same 
                 register to different variables in the same basic-block.  
                 Default is OFF.               

-INLINE:aggressive=(ON|OFF)
                Tells the compiler to be more aggressive about inlining.  The
                default is -INLINE:aggressive=OFF.

-IPA[:...]
                IPA option group:  control the inter-procedural analyses and
                transformations performed.  Note that giving just the group name
                without any options, i.e., -IPA, will invoke the interprocedural
                analyzer.  -IPA is off by default unless -Ofast is specified.

-ipa            Same as -IPA alone.

-IPA:callee_limit=(n)
                Functions whose size exceeds this limit will never be
                automatically inlined by the compiler.  The default is n=2000.

-IPA:min_hotness=N
		Is applicable only under feedback compilation. A call site's
		invocation count must be at least N before it can be inlined by
		IPA. The default is 10.

-IPA:ctype=(ON|OFF)    
                Turns on optimizations that speed up interfaces to the constructs 
                defined in ctype.h by assuming that the program will not be run 
                in a multi-threaded environment.  The default is OFF.
                
-IPA:field_reorder=(ON|OFF)
                Enables the re-ordering of fields in large structs based
                on their reference patterns in feedback compilation to
                minimize data cache misses. The default is OFF.

-IPA:linear=(ON|OFF)
               Controls conversion of a multi-dimensional array to a single
               dimensional (linear) array that covers the same block of memory.
               When inlining Fortran subroutines, IPA tries to map formal 
               array parameters to the shape of the actual parameter.  In the 
               case that it cannot map the parameter, it linearizes the array 
               reference. By default, IPA will not inline such callsites 
               because they may cause performance problems. The default is OFF. 

-IPA:plimit=(n)
                Inline calls to a procedure until the procedure has grown to
                size of n.  The default is 2500.
                
-IPA:pu_reorder=(0|1|2)
                Controls the phase that optimizes the layout of the program 
                units (functions) in the program.
                0 = Disables procedure reordering (default)
                1 = Reorder based on the frequency in which different 
                    procedures are invoked.
                2 = Reorder based on caller-callee relationship.

-IPA:small_pu=(n)        
                A procedure with size smaller than n is not subjected to the
                plimit restriction.  The default is n=30.
                  
-IPA:space=N
                Inline until a program expansion of N% is reached. For example, 
                -IPA:space=20 limits code expansion due to inlining to
                approximately 20%. Default is no limit.

-IPA:use_intrinsic[=(ON|OFF)]
                Enable/disable loading the intrinsic version of standard library
                functions.  The default is OFF.
                
-L<path_of_acml> -lacml
                The flags above are needed to use the PathScale compiler to link 
                with the ACML (AMD Core Math Library) library.
                ACML is available as a free download
                from http://www.developwithamd.com/acml.

-LANG:short_circuit_conditionals=(ON|OFF)
                Handles .AND. and .OR. via short-circuiting, in which the
                second operand is not evaluated if unnecessary, even if it
                contains side effects.  Applies only to Fortran.  Default is ON.
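
                The effect of short-circuiting on an operand with side effects
                can be sketched as follows. This uses Python's own "and"
                operator as a stand-in for Fortran's .AND.; the function
                names are hypothetical and the sketch says nothing about how
                the compiler implements the option.

```python
calls = []

def second_operand():
    """Operand with a side effect: it records that it was evaluated."""
    calls.append("evaluated")
    return True

def short_circuit_and(first):
    # 'and' skips the right operand when the left one is already false,
    # so the side effect in second_operand() does not happen in that case.
    return first and second_operand()
```

                Calling short_circuit_and(False) leaves calls empty, because
                the second operand is never evaluated.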

-LNO:
                Loop Nest Optimizer option group: specifies options and
                transformations performed on loop nests.  The -LNO: option group
                is enabled only if the -O3 option is also specified on the
                compiler command line.

-LNO:blocking[=(ON|OFF)]
                Enable/disable the cache blocking transformation.  The default
                is on at -O3 or higher.
                
-LNO:fission=(0|1|2)
                This option controls loop fission. The options can be one of the  
                following:

                0 = Disables loop fission (default)

                1 = Performs normal fission as necessary

                2 = Specifies that fission be tried before fusion

                If -LNO:fission=1:fusion=1 or -LNO:fission=2:fusion=2 is
                specified, then fusion is performed.
                

-LNO:full_unroll,fu=N
                Fully unroll innermost loops with trip_count <= N inside LNO. 
                N can be any integer between 0 and 100.  The default value for N 
                is 5.  Setting this flag to 0 disables full unrolling of small 
                trip count loops inside LNO.

-LNO:full_unroll_size=N
                Fully unroll innermost loops with unrolled loop size <= N inside 
                LNO.  N can be any integer between 0 and 10000. The conditions
                implied by the full_unroll option must also be satisfied for 
                the loop to be fully unrolled. The default value for N is 1600.

-LNO:full_unroll_outer=(ON|OFF)
                Control the full unrolling of loops with known trip count that 
                do not contain a loop and are not contained in a loop.  The 
                conditions implied by both the full_unroll and the 
                full_unroll_size options must be satisfied for the loop to be 
                fully unrolled. The default is OFF.

-LNO:fusion=n
                Perform loop fusion, n: 0 - off, 1 - conservative, 2 - aggressive.
                The default is 1.
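
                As a source-level picture of loop fusion (a sketch only; the
                compiler works on its internal representation, and the
                function names here are made up for illustration), two loops
                over the same index range are merged into one:

```python
def separate_loops(a):
    """Two loops over the same index range."""
    b = [0] * len(a)
    c = [0] * len(a)
    for i in range(len(a)):
        b[i] = a[i] + 1
    for i in range(len(a)):
        c[i] = b[i] * 2
    return b, c

def fused_loop(a):
    """The same work after fusing both loop bodies into one loop."""
    b = [0] * len(a)
    c = [0] * len(a)
    for i in range(len(a)):
        b[i] = a[i] + 1
        c[i] = b[i] * 2  # b[i] was just computed in this iteration
    return b, c
```

                Loop fission is the reverse rewrite, splitting fused_loop
                back into the two loops of separate_loops.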

-LNO:opt=(0|1)
                This option controls the LNO optimization level.
                The options can be one of the following:
                     0 = Disable nearly all loop nest optimizations.
                     1 = Perform full loop nest transformations.
                         This is the default.
		
-LNO:prefetch[=(0|1|2|3)]
                Specify level of prefetching.
                     0 = Prefetch disabled.
                     1 = Prefetch is done only for arrays that are always 
                         referenced in each iteration of a loop, the default.
                     2 = Prefetch is done without the above restrictions.
                     3 = Most aggressive.

-LNO:prefetch_ahead=n
                Prefetch  n  cache line(s) ahead.  The default is 2.

-LNO:sclrze=(ON|OFF)
                Turns on/off the optimization that replaces an array by a scalar 
                variable.  The default is ON.

-LNO:simd=(0|1|2)
                This option enables or disables inner loop vectorization.

                0 = Turn off the vectorizer.

                1 = (Default) Vectorize only if the compiler can determine that
                    there is no undesirable performance impact due to sub-optimal 
                    alignment. Vectorize only if vectorization does not introduce 
                    accuracy problems with floating-point operations.

                2 = Vectorize without any constraints (most aggressive).

 
-LNO:simd_reduction=(ON|OFF)
                This controls whether reduction loops will be vectorized.
                Default is ON.

-m32            
                Generates code according to the 32-bit ABI, also known as x86 
                or IA32.
                
-m64            
                Compile for 64-bit ABI, also known as AMD64, x86_64, or IA32e. 
                This is the default.

-m3dnow         Enable use of 3DNow instructions. The default is OFF.

-mcpu=(opteron|athlon64|athlon64fx|em64t|pentium4|xeon|anyx86|auto)
                The compiler will optimize code for the selected platform.  auto means to
                optimize for the platform that the compiler is running on, which
                the compiler determines by reading /proc/cpuinfo.  anyx86 means a
                generic 32-bit x86 processor without SSE2 support.  The
                default is opteron.

-msse2	        
                Enable use of SSE2 instructions.   This is the default under 
                both -m64 and -m32.

-mno-sse2	This flag is only applicable to -m32. -mno-sse2 is ignored 
		under -m64 with a warning.

-O or -O2
                Turn on extensive optimization.  The optimizations at this level are
                generally conservative, in the sense that they (1) are virtually
                always beneficial, (2) provide improvements commensurate to the
                compile time spent to achieve them, and (3) avoid changes which
                affect such things as floating point accuracy.

-O3             
                Turn on aggressive optimization.  The optimizations at this level
                are distinguished from -O2 by their aggressiveness, generally
                seeking highest-quality generated code even if it requires extensive
                compile time.  They may include optimizations which are generally
                beneficial but occasionally hurt performance.  This includes but 
                is not limited to turning on the Loop Nest Optimizer, -LNO:opt=1, 
                and setting -OPT:ro=1:IEEE_arith=2:Olimit=9000.

-Ofast          Equivalent to "-O3 -ipa -OPT:Ofast -fno-math-errno".  -OPT:Ofast is
                described below.

-OPT:alias=<name>
                Specifies the pointer aliasing model to be used.  By
                specifying one or more of the following for <name>, the
                compiler is able to make assumptions throughout the compilation:
                typed        Assume that the code adheres to the ANSI/ISO C
                             standard which states that two pointers of different
                             types cannot point to the same location in memory.
                             This is on by default when -Ofast is specified.

                restrict     Specifies that distinct pointers are assumed
                             to point to distinct, non-overlapping objects.
                             This is off by default.

                disjoint     Specifies that any two pointer expressions are
                             assumed to point to distinct, non-overlapping objects.
                             This is off by default.

-OPT:div_split=(ON|OFF)
                Enable/disable changing x/y into x*(recip(y)).  This is 
                OFF by default but is enabled by -OPT:Ofast or 
                -OPT:IEEE_arithmetic=3.
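
                The rewrite of x/y into x*(recip(y)) is not always exact in
                floating point, which is presumably why it is tied to the
                relaxed arithmetic settings. The sketch below compares the two
                forms using IEEE double precision; the function names are
                hypothetical and the code does not model the compiler's
                actual reciprocal instruction.

```python
def divide(x, y):
    """Plain division, x/y."""
    return x / y

def divide_split(x, y):
    """Division rewritten as x*(recip(y))."""
    recip = 1.0 / y
    return x * recip

def first_mismatch(limit):
    """Return the first integer n in [1, limit) where n/n differs
    between the two forms, or None if they always agree."""
    for n in range(1, limit):
        x = float(n)
        if divide(x, x) != divide_split(x, x):
            return n
    return None
```

                Powers of two divide identically either way, but
                first_mismatch(100) finds a value for which the split form
                no longer returns exactly 1.0.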

-OPT:early_intrinsics=(ON|OFF)
                When ON, this option causes calls to intrinsics to be 
                expanded to inline code early in the backend compilation.
                This may enable more vectorization opportunities if vector
                forms of the expanded operations exist.  Default is OFF.
                
-OPT:fast_complex=(ON|OFF)
                Setting fast_complex=ON enables fast calculations for values 
                declared to be of type complex.  When this is set to ON, 
                complex absolute value (norm) and complex division use fast 
                algorithms that are more likely to overflow or underflow than
                the standard algorithms.  OFF is the default. fast_complex=ON 
                is enabled if -OPT:roundoff=3 is in effect.

-OPT:fast_nint=(ON|OFF)
                This option uses a hardware feature to implement NINT and ANINT 
                (both single- and double-precision versions). Default is OFF but 
                fast_nint=ON is enabled by default if -OPT:ro=3 is in effect.

-OPT:goto=(OFF|ON)
                Disable/enable the conversion of GOTOs into higher level
                structures like FOR loops.  The default is ON for -O2 or higher.

-OPT:IEEE_arithmetic,IEEE_arith,IEEE_a=(n)
                Specify level of conformance to IEEE 754 floating-point
                roundoff/overflow behavior.  n can be one of the following:

                1   Adheres to IEEE accuracy.  This is the default when 
                    optimization levels -O0, -O1 and -O2 are in effect.

                2   May produce inexact result not conforming to IEEE 754.
                    This is the default when -O3 is in effect.

                3   All mathematically valid transformations are allowed.

-OPT:IEEE_NaN_Inf=(ON|OFF)
                OFF specifies non-IEEE-754 results in operations that might 
                have IEEE 754 NaN or infinity operands; this enables many
                optimizations which would be invalid for NaN or infinity
                operands.  The default is ON.

-OPT:Ofast
                Use optimizations selected to maximize performance.  
                Although the optimizations are generally safe,
                they may affect floating point accuracy due to rearrangement
                of computations.  This effectively turns on the following
                optimizations:  -OPT:ro=2:Olimit=0:div_split=ON:alias=typed.
 
-OPT:Olimit=(n)
                Disable optimization when size of program unit is > n. When n
                is 0, program unit size is ignored and optimization process
                will not be disabled due to compile time limit.  The default is
                0 when -Ofast is specified, otherwise the default is 6000
                under -O2 and 9000 under -O3.
                
-OPT:roundoff,ro=(n)
                Specifies the level of acceptable departure from source
                language floating-point, round-off, and overflow semantics. n
                can be one of the following:

                0   Inhibits optimizations that might affect the
                    floating-point behavior.  This is the default when
                    optimization levels -O0, -O1, and -O2 are in effect.

                1   Allows simple transformations that might cause limited
                    round-off or overflow differences.  Compounding such
                    transformations could have more extensive effects.
                    This is the default level when -O3 is in effect.
    
                2   Allows more extensive transformations, such as the
                    reordering of reduction loops.  This is the default 
                    level when -Ofast is specified.

                3   Enables any mathematically valid transformation.
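
                Reordering a reduction loop can change the result because
                floating-point addition is not associative. The sketch below
                contrasts an in-order sum with a sum split into two partial
                sums, one possible reordering chosen here purely for
                illustration; it is not a description of the transformation
                the compiler actually applies.

```python
def sum_in_order(values):
    """Reduction performed strictly in source order."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_two_partials(values):
    """Same reduction split into even-index and odd-index partial sums."""
    even = 0.0
    odd = 0.0
    for i, v in enumerate(values):
        if i % 2 == 0:
            even += v
        else:
            odd += v
    return even + odd
```

                For the input [1e16, 1.0, -1e16, 0.0] the in-order sum loses
                the 1.0 when it is added to 1e16, while the two-partial sum
                keeps it.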

-OPT:transform_to_memlib=(ON|OFF)
                When ON, this option enables transformation of loop constructs 
                to calls to memcpy or memset. Default is ON when target 
                processor is EM64T, OFF otherwise.

-OPT:treeheight=(ON|OFF)
                The value ON turns on re-association in expressions to reduce 
                the expressions' tree height.  The default value is OFF.
                
-OPT:unroll_analysis=(ON|OFF)
                The default value of ON lets the compiler analyze the
                content of the loop to determine the best unrolling
                parameters, instead of strictly adhering to the
                -OPT:unroll_times_max and -OPT:unroll_size parameters.

-OPT:unroll_times_max,unroll_times=(n)
                Unroll inner loops by a maximum of  n.  The default is 4.

-OPT:unroll_size=(n)
                Sets the ceiling of maximum number of instructions for an
                unrolled inner loop. If n = 0, the ceiling is disregarded.
            
-static
                Suppresses dynamic linking at run-time for shared libraries; 
                uses static linking instead.

-TENV:X=(0|1|2|3|4)
                Specify the level of enabled exceptions that will be assumed
                for purposes of performing speculative code motion (default
                is 1 at all optimization levels).  In general, an instruction
                will not be speculated (i.e. moved above a branch by the
                optimizer) unless any exceptions it might cause are disabled
                by this option.  At level 0, no speculative code motion may
                be performed.  At level 1, safe speculative code motion may
                be performed, with IEEE-754 underflow and inexact exceptions
                disabled.  At level 2, all IEEE-754 exceptions are disabled
                except divide by zero.  At level 3, all IEEE-754 exceptions
                are disabled including divide by zero.  At level 4, memory
                exceptions may be disabled or ignored.

-TENV:frame_pointer=(ON|OFF)
                Default is ON for C++ and OFF otherwise.
                Local variables in the function stack frame are addressed via 
                the frame pointer register.  Ordinarily, the compiler will 
                replace this use of frame pointer by addressing local variables 
                via the stack pointer when it determines that the stack pointer 
                is fixed throughout the function invocation.  This frees up the 
                frame pointer for other purposes.  Turning this flag on forces 
                the compiler to use the frame pointer to address local variables.  
                This flag defaults to on for C++ because the exception handling 
                mechanism relies on the frame pointer register being used to 
                address local variables.  This flag can be turned off for C++ 
                for programs that do not throw exceptions. 

-Wl,-x
                Passes the -x option to the linker.  With this flag set, the
                linker will not preserve local (non-global) symbols in the output
                symbol table.  The linker enters external and static symbols
                only.  This option conserves space in the output file.  This is
                OFF by default.

-WOPT:aggstr=(ON|OFF) 
                ON instructs the scalar optimizer to perform aggressive strength
                reduction, in which all induction expressions within a loop are 
                replaced by temporaries that are incremented together with
                the loop variable.  When OFF, strength reduction is only
                performed for non-trivial induction expressions.  Turning this
                off sometimes can improve performance when registers are scarce.
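
                Strength reduction of an induction expression can be pictured
                at source level as below. This is a sketch under the
                assumption of a simple i*stride expression; the function
                names are invented for illustration.

```python
def offsets_multiply(count, stride):
    """Induction expression i*stride computed with a multiply each iteration."""
    return [i * stride for i in range(count)]

def offsets_strength_reduced(count, stride):
    """Same values using a temporary incremented together with the loop variable."""
    result = []
    offset = 0  # temporary that replaces i*stride
    for _ in range(count):
        result.append(offset)
        offset += stride
    return result
```

                Each replaced expression needs its own temporary, which is
                how the transformation can increase register pressure.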

-WOPT:if_conv=(ON|OFF)
                Enables the translation of simple IF statements to conditional
                move instructions in the target CPU. Default is ON.

-WOPT:mem_opnds=(ON|OFF) 
                ON makes the scalar optimizer preserve any memory operands of 
                arithmetic operations so as to help bring about subsumption of  
                memory loads into the operands of arithmetic operations. Load 
                subsumption is the combining of an arithmetic instruction and 
                a memory load into one instruction.  The default is OFF.

-WOPT:retype_expr=(ON|OFF) 
                ON enables the optimization in the compiler that converts 64-bit
                address computation to use 32-bit arithmetic as much as
                possible.  The default is OFF.

-WOPT:unroll=(0|1|2)
                Control the unrolling of innermost loops in the scalar optimizer.
                Setting to 0 suppresses this unroller.  The default is 1, which 
                makes the scalar optimizer unroll only loops that contain IF
                statements.  Setting to 2 makes the unrolling also apply to 
                loop bodies that are straight line code, which duplicates the
                unrolling done in the code generator, and is thus unnecessary.
                The default setting of 1 makes this unrolling complementary to
                what is done in the code generator.   This unrolling is not 
                affected by the unrolling options under the -OPT group.

-WOPT:val=(0|1|2)
                Controls the number of times the value-numbering optimization is
                performed in the global optimizer, with the  default  being  1.
                This optimization tries to recognize expressions that will 
                compute identical run-time values and changes the program to avoid 
                re-computing them.


Description of the submit command MYMASK=`printf '0x%x' \$((1<<\$SPECUSERNUM))`; /usr/bin/taskset \$MYMASK $command
-------------------------------------------------------------------------------------------------------------------
/usr/bin/taskset [options] [mask] [pid | command [arg] ... ]

     taskset is used to set or retrieve the CPU affinity of a running process given its
     PID, or to launch a new COMMAND with a given CPU affinity. The CPU affinity is
     represented as a bitmask, with the lowest order bit corresponding to the first logical
     CPU and the highest order bit corresponding to the last logical CPU.
     When taskset returns, it is guaranteed that the given program has been scheduled to
     a legal CPU.
     The default behaviour of taskset is to run a new command with a given affinity mask:
       taskset [mask] [command] [arguments]
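
     The bitmask convention described above can be illustrated with a small
     shell sketch. The function name cpus_in_mask is hypothetical and is not
     part of taskset; it only lists which logical CPU numbers a given mask
     selects, counting from the lowest order bit.

```shell
# Hypothetical helper (not part of taskset): list the logical CPUs
# selected by an affinity mask, lowest order bit = CPU 0.
cpus_in_mask() {
    mask=$(( $1 ))          # accepts decimal or 0x-prefixed hexadecimal
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"   # append CPU number, space separated
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$out"
}

cpus_in_mask 0x5    # bits 0 and 2 are set
```

     Under this convention a mask of 0x5 would select logical CPUs 0 and 2.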

     The taskset command is used in the following form in the config file:	
 
     submit= MYMASK=`printf '0x%x' \$((1<<\$SPECUSERNUM))`; /usr/bin/taskset \$MYMASK $command  

     $MYMASK is the bitmask (in hexadecimal) corresponding to a specific SPECUSERNUM.
     For example, the $MYMASK value for the first copy of a rate run will be 0x1,
     for the second copy it will be 0x2, and so on. Thus, the first copy of the rate
     run will have a CPU affinity of CPU0, the second copy will have an affinity of
     CPU1, and so on.
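
     As a rough illustration of the mask computation, the sketch below prints
     the mask for the first four copies. The function name mask_for_copy is
     hypothetical, and the sketch assumes copy numbers start at 0, as the
     CPU0 example above implies.

```shell
# Hypothetical helper: print the affinity mask for rate copy number $1,
# using the same printf/shift expression as the submit command.
mask_for_copy() {
    printf '0x%x\n' $(( 1 << $1 ))
}

# Assumption: copies are numbered from 0.
for copy in 0 1 2 3; do
    echo "copy $copy -> $(mask_for_copy "$copy")"
done
```

     Each copy gets a mask with exactly one bit set, so no two copies share
     a logical CPU.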


Description of the submit command 'specperl -e "system sprintf qq{start /b /wait /affinity %x %s}, (1<<$SPECUSERNUM), qq{$command}"'
------------------------------------------------------------------------------------------------------------------------------------
This command is used to bind processes to CPUs.

specperl -e     'script' 
		perl command to invoke one line of script

system "command"
		specperl starts "command" and waits for its termination.

sprintf "format-string" arg1 arg2
		specperl writes a string (to be used later by 'system').
		The operation of sprintf is controlled by format-string.
		%x is conversion to hexadecimal format.
		%s is conversion to string (in effect, it is just appending a string)
		The string written by sprintf is something like
			start /b /wait /affinity 4 command_to_be_executed

qq
		qq is Perl's double-quoting operator for interpolated strings.
		It is a Perl way to write double quotes (") without using a literal
		double-quote character in the string. The resulting statement is equivalent to:
			system sprintf "start /b /wait /affinity %x %s",(1<<$SPECUSERNUM),"$command"
		where $SPECUSERNUM and $command have been (at this point) replaced by the shell.

1<<$SPECUSERNUM
		The number 1 is left-shifted by $SPECUSERNUM positions, yielding a mask where only
		bit number $SPECUSERNUM is set. (Bits are counted from right, starting with zero.)
		This mask is used for the affinity in the 'start' command.
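
		The string assembly described above can be mimicked in a POSIX shell,
		purely as an illustration. The function name build_start_command is
		hypothetical; the real submit line uses specperl and sprintf as shown.

```shell
# Hypothetical helper: build the 'start' command line for copy number $1
# and command $2, mirroring the sprintf format string in the submit line.
build_start_command() {
    printf 'start /b /wait /affinity %x %s\n' $(( 1 << $1 )) "$2"
}

# Copy number 2 gives a mask with only bit 2 set, which is 4 in hexadecimal.
build_start_command 2 "command_to_be_executed"
```

		The sketch only prints the command line; it does not execute it.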
		

Description of the 'start' command used for rate runs:
------------------------------------------------------
Starts a separate window to run a specified program or command.

START ["title"] [/D path] [/I] [/MIN] [/MAX] [/SEPARATE | /SHARED]
      [/LOW | /NORMAL | /HIGH | /REALTIME | /ABOVENORMAL | /BELOWNORMAL]
      [/AFFINITY <hex affinity>] [/WAIT] [/B] [command/program]
      [parameters]

    "title"     Title to display in window title bar.
    path        Starting directory
    B           Start application without creating a new window. The
                application has ^C handling ignored. Unless the application
                enables ^C processing, ^Break is the only way to interrupt
                the application
    I           The new environment will be the original environment passed
                to the cmd.exe and not the current environment.
    MIN         Start window minimized
    MAX         Start window maximized
    SEPARATE    Start 16-bit Windows program in separate memory space
    SHARED      Start 16-bit Windows program in shared memory space
    LOW         Start application in the IDLE priority class
    NORMAL      Start application in the NORMAL priority class
    HIGH        Start application in the HIGH priority class
    REALTIME    Start application in the REALTIME priority class
    ABOVENORMAL Start application in the ABOVENORMAL priority class
    BELOWNORMAL Start application in the BELOWNORMAL priority class
    AFFINITY    The new application will have the specified processor
                affinity mask, expressed as a hexadecimal number.
    WAIT        Start application and wait for it to terminate
    command/program
                If it is an internal cmd command or a batch file then
                the command processor is run with the /K switch to cmd.exe.
                This means that the window will remain after the command
                has been run.

                If it is not an internal cmd command or batch file then
                it is a program and will run as either a windowed application
                or a console application.

    parameters  These are the parameters passed to the command/program


Description of Boot options (boot.ini file):
--------------------------------------------

/EXECUTE 
This option disables no-execute protection. See the /NOEXECUTE switch for more information. 

/NOEXECUTE
This option is only available on 32-bit versions of Windows when running on 
processors supporting no-execute protection. It enables no-execute protection
(also known as Data Execution Prevention - DEP), which results in the Memory 
Manager marking pages containing data as no-execute so that they cannot be 
executed as code. This can be useful for preventing malicious code from exploiting 
buffer overflow bugs with unexpected program input in order to execute arbitrary 
code. No-execute protection is always enabled on 64-bit versions of Windows on 
processors that support no-execute protection. There are several options you can 
specify with this switch:
	
	/NOEXECUTE=OPTIN     Enables DEP for core system images and those specified 
			             in the DEP configuration dialog.
	/NOEXECUTE=OPTOUT    Enables DEP for all images except those specified in 
			             the DEP configuration dialog.
	/NOEXECUTE=ALWAYSON  Enables DEP on all images.
	/NOEXECUTE=ALWAYSOFF Disables DEP.

/NOPAE 
Forces Ntldr to load the non-Physical Address Extension (PAE) version of the Windows kernel,
even if the system is detected as supporting x86 PAE and has more than 4 GB of physical memory.

/USEPMTIMER
This switch forces the MP HAL to use the frequency-independent PMTimer,
and NOT use the frequency-dependent Time Stamp Counter.


Description of BIOS options:
----------------------------

DRAM Bank Interleave
	AUTO		Interleave memory blocks across DRAM chip selects. BIOS will AUTO 
				detect capability on each Node (Default: DISABLED)

Memory Timing
	1T			Enable fast memory timing (Default: 2T)