Description of compiler flags for Intel C++ Compiler 8.0
--------------------------------------------------------

-O2
		Optimizes for speed. The -O2 option has the same effect as specifying
		the following options: -Og, -Oi, -Ot, -Oy, -Ob1, -Gf, -Gs, and -Gy.
		This option defaults to ON.

-O3    	
		Optimizes for speed. Enables high-level optimization. This level does 
		not guarantee higher performance. Using this option may increase the
		compilation time. Impact on performance is application dependent; some
		applications may not see a performance improvement.

-Oa[-] 	
		Assumes [does not assume] no aliasing.

-Obn      	
		Controls the compiler's inline expansion. The amount of inline
		expansion performed varies with the value of n as follows:
		0:  Disables inlining.
		1:  Enables (default) inlining of functions declared with the
		    __inline keyword. Also enables inlining according to the
		    C++ language.
		2:  Enables inlining of any function.  However, the 
		    compiler decides which functions to inline.  Enables 
		    interprocedural optimizations and has the same effect as 
		    -Qip.
		Default n=1.

-Og	
		Enables global optimizations.  Default ON.

-Ot	
		Enables all speed optimizations.  Overrides -Os

-Oi[-] 	
		Enables/disables inline expansion of intrinsic functions.  Default Enabled.

-Ow[-]	
		Assumes [does not assume] no aliasing within functions, but assumes
		aliasing across calls.

-Oy[-]	
		Enables [disables] the use of the EBP register in optimizations. When
		you disable with -Oy-, the EBP register is used as frame pointer. 
		Default Enabled.

-Gf	
		Enables string-pooling optimization.  Default ON.

-Gs[n]	
		Disables stack-checking for routines with n or more bytes of local
		variables and compiler temporaries. Default: n=4096

-Gy	
		Packages functions to enable linker optimization.  Default ON.

-Qax{K|W|N}
		Generates specialized code for processor specific codes 
		K, W, N while also generating generic IA-32 code. 
		K  = Intel Pentium III and compatible Intel processors
		W  = Intel Pentium 4 and compatible Intel processors
		N  = Intel Pentium 4 and compatible Intel processors. These options also enable
		     advanced data layout and code restructuring optimizations to improve memory
		     accesses for Intel processors.
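		The two code paths that -Qax produces can be pictured as a run-time
		dispatch. The Python sketch below is only an illustration of that idea,
		not what the compiler emits. The function name select_code_path and the
		mapping of processor codes to extension names are assumptions made for
		this example.

```python
# Assumed mapping from a -Qax processor code to the instruction-set extension
# its specialized path relies on (Pentium III: SSE, Pentium 4: SSE2).
REQUIRED_EXTENSION = {"K": "sse", "W": "sse2"}

def select_code_path(code, cpu_extensions):
    """Return 'specialized' if the CPU reports the needed extension, else 'generic'."""
    needed = REQUIRED_EXTENSION[code]
    return "specialized" if needed in cpu_extensions else "generic"

# A processor without SSE2 falls back to the generic IA-32 path.
print(select_code_path("W", {"sse"}))          # generic
print(select_code_path("W", {"sse", "sse2"}))  # specialized
```

		With -Qx there is no generic fallback, so only the specialized path exists.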
    
-Qx{K|W|N}	
		Generate specialized code to run exclusively on processors
		supporting the extensions indicated by <codes> as 
		described above.


-Qip        
		Enables interprocedural optimizations within a single file.
		Same as -Ob2.

-Qipo       
		Enables multi-file interprocedural (IP) optimizations, which include:
		- inline function expansion
		- interprocedural constant propagation
		- monitoring module-level static variables
		- dead code elimination
		- propagation of function characteristics
		- passing arguments in registers
		- loop-invariant code motion

-Qprof_gen       
		Instruments the program for profiling, to collect the execution
		count of each basic block.

-Qprof_use       
		Enables the use of profiling dynamic feedback information
		during optimization.  Turns on -Qfnsplit.

-Qrcd            
		Enables [disables] fast floating-point to integer conversions.
		This option does not guarantee that any particular rounding
		mode will be used.
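		Why the rounding mode matters: C and C++ define floating-point to integer
		conversion as truncation toward zero, while a conversion that simply uses
		the processor's current rounding mode (commonly round-to-nearest-even)
		can give a different integer. The Python sketch below only illustrates
		that difference; it does not invoke the compiler, and the sample values
		are arbitrary.

```python
import math

samples = [2.7, -2.7, 2.5, 3.5]

# Truncation toward zero, as the C/C++ language rules require.
truncated = [math.trunc(x) for x in samples]

# Round-to-nearest-even, the usual default hardware rounding mode.
nearest_even = [round(x) for x in samples]

print(truncated)     # [2, -2, 2, 3]
print(nearest_even)  # [3, -3, 2, 4]
```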

-Qansi_alias[-]  
		-Qansi_alias directs the compiler to assume [not assume] the following: 
		    - Arrays are not accessed out of bounds. 
		    - Pointers are not cast to non-pointer types, and vice-versa. 
		    - References to objects of two different scalar types cannot alias. 
		      For example, an object of type int cannot alias with an object 
		      of type float, or an object of type float cannot alias with an 
		      object of type double. 
		If your program satisfies the above conditions, setting the -Qansi_alias 
		flag will help the compiler better optimize the program. However, if your
		program does not satisfy one of the above conditions, the -Qansi_alias
		flag may lead the compiler to generate incorrect code.
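		To make the third assumption concrete: aliasing between an int and a
		float means the same four bytes are read through both types. The Python
		sketch below reinterprets the bytes of a float as an unsigned integer
		using the standard struct module. It only shows what such a
		reinterpretation yields; C or C++ code that does this through mismatched
		pointer types is the kind of code that is unsafe under -Qansi_alias. The
		function name float_bits is chosen for this example.

```python
import struct

def float_bits(value):
    """Reinterpret the 4 bytes of an IEEE 754 single-precision float as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

print(hex(float_bits(1.0)))  # 0x3f800000
```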
       		 

-GR[-]           
		Enables [disables] C++ Run Time Type Information (RTTI).

-GX[-]           
		Enables [disables] C++ Exception Handling.

-fast            
		Maximize speed across the entire program. Turns on -O3 and -Qipo.

-Qfp_port   	 
		Rounds floating-point results at assignments and casts (some speed impact).

-Qprefetch       
		Enable prefetch insertion.  Default ON.

-Qunroll[n]	 
		Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.
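		As an illustration of the transformation that -Qunroll limits, here is a
		hand-unrolled summation loop in Python. The unroll factor of 4 and the
		function names are arbitrary choices for this sketch; the compiler applies
		the equivalent rewrite to generated code and picks its own factor, bounded
		by n when n is given.

```python
def sum_rolled(values):
    """Reference loop: one addition per iteration."""
    total = 0
    for v in values:
        total += v
    return total

def sum_unrolled_by_4(values):
    """Same result with the loop body replicated four times per iteration."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled loop handles groups of four elements.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total

data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))  # 45 45
```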

-Qoption,tool,optlist 
		-Qoption passes an option specified by optlist to a tool, where
		optlist is a comma-separated list of options.

		tool		Description
		------------------------------------  
		cpp		Specifies the compiler front-end preprocessor 
		c		Specifies the C++ compiler
		asm		Specifies the assembler
		link		Specifies the linker
		optlist		Indicates one or more valid argument strings for the
				designated program. If the argument is a command-line
				option, you must include the hyphen. If the argument
				contains a space or tab character, you must enclose the
				entire argument in quotation characters (""). You must
				separate multiple arguments with commas
		      

		-Qoption can be used with the -Qipo flag to refine IPO. The valid options
		that can be used for this purpose are:

		-ip_args_in_regs=0		
				Disables the passing of arguments in registers.
		
		-ip_ninl_max_stats=n		
				Sets the valid max number of intermediate
				language statements for a function that is 
				expanded in line. The number n is a positive
				integer. The number of intermediate language
				statements usually exceeds the actual number of
				source language statements. The default value
				for n is 230. The compiler uses a larger limit
				for user inline functions. 
						
		-ip_ninl_min_stats=n      
				Sets the valid min number of intermediate 
				language statements for a function that is 
				expanded in line. The number n is a positive 
				integer. The default values for 
				ip_ninl_min_stats are: 
				IA-32 compiler: ip_ninl_min_stats = 7 
 
		-ip_ninl_max_total_stats=n 
				Sets the maximum increase in size of a function,
				measured in intermediate language statements, 
				due to inlining. n is a positive integer whose 
				default value is 2000. 
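		The -Qoption rules above can be sketched as a small helper that assembles
		the argument string. This is an illustration only: the function name
		qoption is hypothetical, and the values 500 and 3000 are arbitrary rather
		than recommended settings.

```python
def qoption(tool, options):
    """Build a -Qoption string: the tool name, then comma-separated options.

    Options containing a space or tab are wrapped in double quotes,
    as the description of optlist requires.
    """
    quoted = []
    for opt in options:
        if " " in opt or "\t" in opt:
            opt = '"' + opt + '"'
        quoted.append(opt)
    return ",".join(["-Qoption", tool] + quoted)

print(qoption("c", ["-ip_ninl_max_stats=500", "-ip_ninl_max_total_stats=3000"]))
# -Qoption,c,-ip_ninl_max_stats=500,-ip_ninl_max_total_stats=3000
```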

		      


shlW32M.lib:    
		MicroQuill SmartHeap Library 7.0 available from 
		http://www.microquill.com/

-Zp{1|2|4|8|16}	 
		Specifies the strictest alignment constraint for structure and union 
		types as 1, 2, 4, 8, or 16 bytes. Default is 16.
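		The effect of the -Zp value on structure layout can be sketched with a
		little arithmetic. The Python function below is a simplified model, not the
		compiler's actual algorithm: it assumes every member is a scalar whose
		natural alignment equals its size, and the name struct_layout is
		hypothetical.

```python
def struct_layout(member_sizes, pack):
    """Return (member offsets, total size) for scalar members under a packing limit.

    Assumption for this sketch: each scalar's natural alignment equals its size.
    """
    offset = 0
    max_align = 1
    offsets = []
    for size in member_sizes:
        align = min(size, pack)                         # -Zp caps the member's alignment
        offset = (offset + align - 1) // align * align  # round up to a multiple of align
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so the total size is a multiple of the strictest alignment used.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total

# Members: char (1 byte), double (8 bytes), int (4 bytes).
for pack in (1, 4, 8):
    print(pack, struct_layout([1, 8, 4], pack))
# 1 ([0, 1, 9], 13)
# 4 ([0, 4, 12], 16)
# 8 ([0, 8, 16], 24)
```

		In this model a packing value of 16 gives the same layout as 8, because no
		member is larger than 8 bytes.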


-arch:SSE        
		Enables the compiler to use SSE instructions.

-arch:SSE2
		Enables the compiler to use SSE2 instructions.

-EHc             
		Specifies that C functions do not throw exceptions. Default ON.

-G7              
		Target optimization to Intel Pentium 4 processors.  Default ON.

-ML              
		Compiles and links with the static, single-thread C run time library.  Default ON.

-QA              
		Enables all predefined macros and all assertions.  Default ON.

-Qfnsplit        
		Enables function splitting.  Default ON.

-Qms1            
		Instructs the compiler to enable most Microsoft compatibility bugs.  Default ON.

-Qmspp           
		Enables Microsoft C++ 6.0 Processor Pack binary compatibility.  Default ON.

-Qpc64           
		Enables floating-point significand precision control.  The value is used to round
		the significand to the correct number of bits.  The value must be either 32, 64, 
		or 80.  Default ON.

-Qpchi           
		Enables precompiled header files coexistence to reduce build time.  Default ON.

-Qsfalign8	 
		May align stack for functions with 8 or 16 byte vars.  Default ON.

-Qvc7            
		Enables compatibility with Visual C++ .NET.  Default ON.

-Qvec_report1    
		Indicate vectorized loops in diagnostic information.  Default ON.

-vmb             
		Selects the smallest representation for pointers to members. Use this
		option if you define each class before you declare a pointer to a member of the class.
		Default ON.




Description of compiler flags for Intel C++ Compiler 9.0
--------------------------------------------------------

-O2	
		Optimizes for speed. The -O2 option includes the following options: 
		-Og, -Oi-, -Os, -Oy, -Ob1, and -Gs.  This option defaults to ON.
		This option also enables:
		* inlining of intrinsics
		* Intra-file interprocedural optimizations including:
		  * inlining
		  * constant propagation
		  * forward substitution
		  * routine attribute propagation
		  * variable address-taken analysis
		  * dead static function elimination
		  * removal of unreferenced variables.
		* The following performance optimizations:
		  * copy propagation
		  * dead-code elimination
		  * global register allocation
		  * global instruction scheduling and control speculation
		  * loop unrolling
		  * optimized code selection
		  * partial redundancy elimination
		  * strength reduction/induction variable simplification
		  * variable renaming
		  * exception handling optimizations
		  * tail recursions
		  * peephole optimizations
		  * structure assignment lowering and optimizations
		  * dead store elimination

-O3    	
		Optimizes for speed. Enables high-level optimization. This level does 
		not guarantee higher performance. Using this option may increase the
		compilation time. Impact on performance is application dependent; some
		applications may not see a performance improvement.  The optimizations
		include:
		* All optimizations done with -O2
		* loop unrolling, including instruction scheduling
		* code replication to eliminate branches
		* padding the size of certain power-of-two arrays to allow more efficient
		  cache use.
		* When used with -Qax or -Qx, it causes the compiler to perform more aggressive
		  data dependency analysis than for -O2.

-Oa[-] 	
		Assume [not assume] no aliasing.  Default Disabled.

-Obn      	
		Controls the compiler's inline expansion. The amount of inline
		expansion performed varies with the value of n as follows:
		0:  Disables inlining.  Statement functions are always inlined.
		1:  Enables (default) inlining of functions declared with the
		    __inline keyword. Also enables inlining according to the
		    C++ language.
		2:  Enables inlining of any function.  However, the 
		    compiler decides which functions to inline.  Enables 
		    interprocedural optimizations and has the same effect as 
		    -Qip.
		Default n=2.

-Og	
		Enables global optimizations.  Default ON.

-Ot	
		Enables all speed optimizations.  Overrides -Os

-Oi[-] 	
		Enables/disables inline expansion of intrinsic functions.  Default Enabled.

-Ow[-]	
		Assumes [does not assume] no cross-function aliasing.

-Oy[-]	
		Enables [disables] the use of the EBP register in optimizations. When
		you disable with -Oy-, the EBP register is used as frame pointer.  -Oy- has
		the effect of reducing the number of available general-purpose registers by 1,
		and can produce slightly less efficient code.
		Default Enabled.

-Gf	
		Enables string-pooling optimization.

-Gs[n]	
		Disables stack-checking for routines with n or more bytes of local
		variables and compiler temporaries. Default: n=4096

-Gy	
		Packages functions to enable linker optimization.  Default ON.

-Qax{K|W|N}	
		Generates specialized code for processor specific codes 
		K, W, N while also generating generic IA-32 code. 
		K  = Intel Pentium III and compatible Intel processors
		W  = Intel Pentium 4 and compatible Intel processors
		N  = Intel Pentium 4 and compatible Intel processors. These options also enable
		     advanced data layout and code restructuring optimizations to improve memory
		     accesses for Intel processors.
    
-Qx{K|W|N}	
		Generate specialized code to run exclusively on processors
		supporting the extensions indicated by <codes> as 
		described above.


-Qip        
		Enables interprocedural optimizations within a single file.

-Qipo       
		Enables multi-file interprocedural (IP) optimizations, which allow inline function expansion for
		calls to functions defined in separate files.  The compiler decides whether to create
		one or more object files based on an estimate of the size of the application.  It
		generates one object file for small applications and two for large ones.

-Qprof_gen       
		Instruments the program for profiling, to collect the execution
		count of each basic block.

-Qprof_use       
		Enables the use of profiling dynamic feedback information
		during optimization.  Turns on -Qfnsplit.  Forces function grouping.

-Qrcd            
		Enables [disables] fast floating-point to integer conversions.
		This option does not guarantee that any particular rounding
		mode will be used.

-Qansi_alias[-]  
		-Qansi_alias directs the compiler to assume the following: 
		    - Arrays are not accessed out of bounds. 
		    - Pointers are not cast to non-pointer types, and vice-versa. 
		    - References to objects of two different scalar types cannot alias. 
		      For example, an object of type int cannot alias with an object 
		      of type float, or an object of type float cannot alias with an 
		      object of type double. 
		If your program satisfies the above conditions, setting the -Qansi_alias 
		flag will help the compiler better optimize the program. However, if your
		program does not satisfy one of the above conditions, the -Qansi_alias
		flag may lead the compiler to generate incorrect code.
       		 

-GR[-]           
		Enables[disables] C++ Run Time Type Information (RTTI).

-GX[-]           
		Enables[disables] C++ Exception Handling.  Default Disabled.

-fast            
		Maximize speed across the entire program. Turns on -O3, -Qipo,
		-Qprec-div-,  and -QxP.

-Qfp_port   	 
		Rounds floating-point results at assignments and casts (some speed impact).

-Qprefetch       
		Enable prefetch insertion.  Default ON.

-Qunroll[n]	 
		Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.  Default: the compiler uses default heuristics when 
		unrolling loops.

-Qoption,tool,optlist 
		-Qoption passes an option specified by optlist to a tool, where
		optlist is a comma-separated list of options.

		tool		Description
		------------------------------------  
		cpp		Specifies the compiler front-end preprocessor 
		c		Specifies the C++ compiler
		asm		Specifies the assembler
		link		Specifies the linker
		optlist		Indicates one or more valid argument strings for the
				designated program. If the argument is a command-line
				option, you must include the hyphen. If the argument
				contains a space or tab character, you must enclose the
				entire argument in quotation characters (""). You must
				separate multiple arguments with commas

		NOTE: If 'tool' is incorrectly specified, the compiler gives a
		warning and the option is ignored. For example, if
		-Qoption,f,...  is used with the Intel C++ compiler, the
		option is ignored with a warning.
		      
		-Qoption can be used with the -Qipo flag to refine IPO. The valid options
		that can be used for this purpose are:

		-ip_args_in_regs=0        
				Disables the passing of arguments in registers.

		-ip_ninl_max_stats=n      
				Sets the valid max number of intermediate
				language statements for a function that is 
				expanded in line. The number n is a positive
				integer. The number of intermediate language
				statements usually exceeds the actual number of
				source language statements. The default value
				for n is 230. The compiler uses a larger limit
				for user inline functions. 
						
		-ip_ninl_min_stats=n      
				Sets the valid min number of intermediate 
				language statements for a function that is 
				expanded in line. The number n is a positive 
				integer. The default values for 
				ip_ninl_min_stats are: 
				IA-32 compiler: ip_ninl_min_stats = 7 
 
		-ip_ninl_max_total_stats=n 
				Sets the maximum increase in size of a function,
				measured in intermediate language statements, 
				due to inlining. n is a positive integer whose 
				default value is 2000. 

		      


shlW32M.lib:    
		MicroQuill SmartHeap Library 7.0 available from 
		http://www.microquill.com/

-Zp{1|2|4|8|16}	 
		Specifies the strictest alignment constraint for structure and union 
		types as 1, 2, 4, 8, or 16 bytes. Default is 16.


-arch:SSE        
		Enables the compiler to use SSE instructions.

-arch:SSE2       
		Enables the compiler to use SSE2 instructions.

-Qprec-div[-]    
		Enables[disables] improved precision of floating-point divides.  Disabling may
		slightly improve speed.  Default Enabled.

-Qpc64           
		Enables floating-point significand precision control.  The value is used to round
		the significand to the correct number of bits.  The value must be either 32, 64, 
		or 80.  Default ON.






Description of compiler flags for Intel Fortran Compiler 9.0
------------------------------------------------------------

-O2	
		Optimizes for speed. The -O2 option includes the following options: 
		-Og, -Ot, -Oy, -Ob1, and -Gs.  This option defaults to ON.
		This option also enables:
		* inlining of intrinsics
		* Intra-file interprocedural optimizations including:
		  * inlining
		  * constant propagation
		  * forward substitution
		  * routine attribute propagation
		  * variable address-taken analysis
		  * dead static function elimination
		  * removal of unreferenced variables.
		* The following performance optimizations:
		  * copy propagation
		  * dead-code elimination
		  * global register allocation
		  * global instruction scheduling and control speculation
		  * loop unrolling
		  * optimized code selection
		  * partial redundancy elimination
		  * strength reduction/induction variable simplification
		  * variable renaming
		  * exception handling optimizations
		  * tail recursions
		  * peephole optimizations
		  * structure assignment lowering and optimizations
		  * dead store elimination

-O3    	
		Optimizes for speed. Enables high-level optimization. This level does 
		not guarantee higher performance. Using this option may increase the
		compilation time. Impact on performance is application dependent; some
		applications may not see a performance improvement.  The optimizations
		include:
		* All optimizations done with -O2
		* loop unrolling, including instruction scheduling
		* code replication to eliminate branches
		* padding the size of certain power-of-two arrays to allow more efficient
		  cache use.
		* When used with -Qax or -Qx, it causes the compiler to perform more aggressive
		  data dependency analysis than for -O2.

-Oa[-] 	
		Assumes [does not assume] no aliasing.

-Ob{0|1|2}	
		Controls the compiler's inline expansion. The amount of inline
		expansion performed varies as follows:
		-Ob0:  Disable inlining.
		-Ob1:  Disables (default) inlining unless -Qip or -Ob2 is
		       specified. Enables inlining of functions.
		-Ob2:  Enables inlining of any function.  However, the 
		       compiler decides which functions to inline.  Enables 
		       interprocedural optimizations and has the same effect as 
		       -Qip.

-Og	
		Enables global optimizations.

-Ot	
		Enables all speed optimizations.

-Oi[-] 	
		Enables/disables inline expansion of intrinsic functions

-Ow[-]	
		Assumes [does not assume] no cross-function aliasing.

-Ox   	
		Same as the -O2 option: enables -Gs, -Ob1, -Og, -Oy, and -Ot.

-Oy[-]	
		Enables [disables] the use of the EBP register in optimizations. When
		you disable with -Oy-, the EBP register is used as frame pointer.

-auto   
		Determines whether local variables are put on the run-time stack.

-Gf	
		Enables string-pooling optimization.

-Gs[n]	
		Disables stack-checking for routines with n or more bytes of local
		variables and compiler temporaries. Default: n=4096

-Gy	
		Packages functions to enable linker optimization.

-fast   
		Maximize speed across the entire program. Turns on -O3, -Qprec-div-, -QxP, and -Qipo.

-Qax{K|W|N|P} 
		Generates specialized code for processor specific codes 
		K, W, N, P while also generating generic IA-32 code. 
		K  = Intel Pentium III and compatible Intel processors
		W  = Intel Pentium 4 and compatible Intel processors
		N  = Intel Pentium 4 and compatible Intel processors. This option also enables
		     advanced data layout and code restructuring optimizations to improve memory
		     accesses for Intel processors.
		P  = Intel Pentium 4 processor with Streaming SIMD Extensions 3 (SSE3) support. This
		     option also enables advanced data layout and code restructuring optimizations to
		     improve memory accesses for Intel processors.
    
-Qx{K|W|N|P} 
		Generate specialized code to run exclusively on processors
		supporting the extensions indicated by <codes> as 
		described above.


-Qip        
		Enables interprocedural optimizations within a single file.

-Qipo      
		Enables multi-file interprocedural (IP) optimizations, which include:
		- inline function expansion
		- interprocedural constant propagation
		- monitoring module-level static variables
		- dead code elimination
		- propagation of function characteristics
		- passing arguments in registers
		- loop-invariant code motion

-Qprof_gen       
		Instruments the program for profiling, to collect the execution
		count of each basic block.

-Qprof_use       
		Enables the use of profiling dynamic feedback information
		during optimization.

-Qrcd            
		Enables [disables] fast floating-point to integer conversions.
		This option does not guarantee that any particular rounding
		mode will be used.

-Qansi_alias     
		Directs the compiler to assume (default) or not assume that the program 
		adheres to the ANSI Fortran type aliasability rules. For example, an object 
		of type real cannot be accessed as an integer. See the ANSI Standard for 
		the complete set of rules.


-Qscalar_rep[-]	 
		Enables [disables] scalar replacement performed during loop
		transformations (requires -O3).

-Qunroll[n]	
		Specifies the maximum number of times to unroll a loop. n=0 disables
		loop unrolling.

-Qprefetch[-]    
		Enables[disables] prefetch insertion (requires -O3).

-Qoption,tool,optlist 
		-Qoption passes an option specified by optlist to a tool, where
		optlist is a comma-separated list of options.

		tool		Description
		------------------------------------  
		fpp		Specifies the Fortran preprocessor 
		f		Specifies the Fortran compiler
		asm		Specifies the assembler
		link		Specifies the linker
		optlist		Indicates one or more valid argument strings for the
				designated tool. You must separate multiple arguments with commas.
		      
		-Qoption can be used with the -Qipo flag to refine IPO. The valid options
		that can be used for this purpose are:

		-ip_args_in_regs=0        
				Disables the passing of arguments in registers.

		-ip_ninl_max_stats=n      
				Sets the valid max number of intermediate
				language statements for a function that is 
				expanded in line. The number n is a positive
				integer. The number of intermediate language
				statements usually exceeds the actual number of
				source language statements. The default value
				for n is 230. The compiler uses a larger limit
				for user inline functions. 
						
		-ip_ninl_min_stats=n      
				Sets the valid min number of intermediate 
				language statements for a function that is 
				expanded in line. The number n is a positive 
				integer. The default values for 
				ip_ninl_min_stats are: 
				IA-32 compiler: ip_ninl_min_stats = 7 
 
		-ip_ninl_max_total_stats=n 
				Sets the maximum increase in size of a function,
				measured in intermediate language statements, 
				due to inlining. n is a positive integer whose 
				default value is 2000. 


shlW32M.lib:    
		MicroQuill SmartHeap Library 7.0 available from 
		http://www.microquill.com/

-Zp{1|2|4|8|16}	 
		Specifies the strictest alignment constraint for structure and union 
		types as 1, 2, 4, 8, or 16 bytes. Default is 16.

-Qprec-div[-]    
		Enables[disables] improved precision of floating-point divides.  Disabling may
		slightly improve speed.  Default Enabled.



Other Notes: 
------------
"/" and "-" are both allowable starting tokens for flags passed to the 
compiler; for example, -QxK and /QxK are identical switches. 


Compiler options for PGI Fortran compiler 6.0 for Windows XP IA32
-----------------------------------------------------------------

The optimization levels and their meanings are as follows:	

-lacml  
		Link with the AMD Core Math Library 2.5.3, packaged with the
		compiler. Also available at www.amd.com

-O0	
		A basic block is generated for each Fortran statement.  No scheduling 
		is done between statements.  No global optimizations are performed.

-O1	
		Scheduling within extended basic blocks is performed.  Some register 
		allocation is performed.  No global optimizations are performed.

-O2	
		All level 1 optimizations are performed.  In addition,  scalar
		optimizations such as induction recognition and loop invariant motion 
		are performed by the global optimizer. 
                
-O3	
		This level performs all level-one and level-two optimizations and 
		enables more aggressive hoisting and scalar replacement optimizations.

-fast	 
		Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre" 

-fastsse 
		Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 
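		The two equivalences above nest: -fastsse includes -fast, which in turn
		stands for four options. The Python sketch below expands them into the
		full option list. The table is copied from the -fast and -fastsse
		descriptions; the function name expand_flags is hypothetical.

```python
# Equivalences taken from the -fast and -fastsse descriptions.
MACRO_FLAGS = {
    "-fast": ["-O2", "-Munroll=c:1", "-Mnoframe", "-Mlre"],
    "-fastsse": ["-fast", "-Mscalarsse", "-Mvect=sse", "-Mcache_align", "-Mflushz"],
}

def expand_flags(flags):
    """Recursively replace macro flags with the options they stand for."""
    expanded = []
    for flag in flags:
        if flag in MACRO_FLAGS:
            expanded.extend(expand_flags(MACRO_FLAGS[flag]))
        else:
            expanded.append(flag)
    return expanded

print(" ".join(expand_flags(["-fastsse"])))
# -O2 -Munroll=c:1 -Mnoframe -Mlre -Mscalarsse -Mvect=sse -Mcache_align -Mflushz
```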

-Mpfi    
		Generate profile feedback instrumentation; this
		includes extra code to collect run-time statistics to
		be used in a subsequent compile; -Mpfi must also appear
		when the program is linked.  When the program is run, a
		profile feedback file pgfi.out will be generated; see
		-Mpfo.

-Mpfo    
		Enable profile feedback optimizations; there must be a
		profile feedback file pgfi.out in the current
		directory, which contains the result of an execution of
		the program compiled with -Mpfi.
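		Taken together, -Mpfi and -Mpfo describe a three-step build. The Python
		sketch below just lists the command lines in order. The driver name pgf90
		and the file names are assumptions made for this example; substitute the
		driver and sources actually in use.

```python
def profile_feedback_steps(source, exe):
    """Return the three command lines of a profile-feedback build, in order."""
    return [
        f"pgf90 -Mpfi -o {exe} {source}",  # 1. instrumented compile and link
        exe,                               # 2. training run writes pgfi.out
        f"pgf90 -Mpfo -o {exe} {source}",  # 3. rebuild using pgfi.out
    ]

for step in profile_feedback_steps("prog.f90", "prog.exe"):
    print(step)
```

		Step 3 must run in the directory that holds the pgfi.out file written by
		step 2.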

-Mcache_align    
		Align unconstrained objects of length greater than or equal to 16 bytes on
		cache-line boundaries. An unconstrained object is a data object that is not
		a member of an aggregate structure or common block. This option does
		not affect the alignment of allocatable or automatic arrays.

		Note: To effect cache-line alignment of stack-based local variables, the
		main program or function must be compiled with -Mcache_align.

-Mfixed 
		Process source using Fortran fixed-form specifications.

-Mflushz 	 
		Set SSE MXCSR register to flush-to-zero mode.
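		In flush-to-zero mode, an SSE operation whose result would be a denormal
		(subnormal) number returns zero instead. The Python sketch below models
		that behavior for double-precision values; it does not touch the MXCSR
		register, and the function name flush_to_zero is hypothetical.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normal double

def flush_to_zero(x):
    """Model flush-to-zero: values too small to be normal numbers become signed zero."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x

tiny = SMALLEST_NORMAL / 2   # a subnormal value, still greater than zero
print(tiny > 0.0)            # True
print(flush_to_zero(tiny))   # 0.0
```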

-Mipa=[option]  
		Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align  
		Instructs the IPA to recognize when pointer targets are all cache-line 
		aligned, allowing better SSE code generation.

-Mipa=arg  
		Instructs the IPA to remove arguments replaced by -Mipa=ptr,const 

-Mipa=const  
		Enable propagation of constants across procedure calls.

-Mipa=fast  
		Equivalent to: -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial 

-Mipa=f90ptr
		Enable Fortran 90 pointer disambiguation across procedure calls.
              	
-Mipa=globals  
		Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=inline
		Automatically determine which functions to inline

-Mipa=safe
		Assume unknown function references are safe

-Mipa=localarg  
		Externalizes local variables for use with -Mipa=arg

-Mipa=ptr  
		Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
		Instructs the IPA to eliminate functions that are not called.

-Mlre
		Enables loop-carried redundancy elimination.
	
-Mnoframe  
		Eliminate operations that set up a true stack frame pointer for functions.

-Mnovect
		Disables the vectorizer.

-Mscalarsse   
		Uses the SSE (Streaming SIMD Extensions; SIMD stands for Single Instruction,
		Multiple Data) and SSE2 instructions to perform the operations coded. 
		This implies -Mflushz.

-Munix   
		Use UNIX calling conventions, no trailing underscores.

-Munroll  
		Invokes the loop unroller.  This also sets the optimization level to 2 
		if the level is set to less than 2.
			
		c:m	Instructs the compiler to completely unroll loops with a
			constant loop count less than or equal to m, a supplied constant.
			If this value is not supplied, the m count is set to 4.

		n:u	Instructs the compiler to unroll u times, a loop which is
			not completely unrolled, or has a non-constant loop count.
			If u is not supplied, the unroller computes the number of times a
			candidate loop is unrolled.

-Mvect=sse  
		Instructs the vectorizer to search for loops, and where possible,
		use the SSE or SSE2 and prefetch instructions
		(depending on which processor is targeted).


Compiler options for PGI C compiler 6.0 for Windows XP
------------------------------------------------------

The optimization levels and their meanings are as follows:	

-lacml  
		Link with the AMD Core Math Library 2.5.3. Available from www.amd.com

-O0	
		A basic block is generated for each C statement.  No scheduling 
		is done between statements.  No global optimizations are performed.

-O1	
		Scheduling within extended basic blocks is performed.  Some register 
		allocation is performed.  No global optimizations are performed.

-O2	
		All level 1 optimizations are performed.  In addition,  scalar
		optimizations such as induction recognition and loop invariant motion 
		are performed by the global optimizer. 
                
-O3	
		This level performs all level-one and level-two optimizations and 
		enables more aggressive hoisting and scalar replacement optimizations.

-fast	 
		Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre" 

-fastsse 
		Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 

-Mpfi    
		Generate profile feedback instrumentation; this
		includes extra code to collect run-time statistics to
		be used in a subsequent compile; -Mpfi must also appear
		when the program is linked.  When the program is run, a
		profile feedback file pgfi.out will be generated; see
		-Mpfo.

-Mpfo    
		Enable profile feedback optimizations; there must be a
		profile feedback file pgfi.out in the current
		directory, which contains the result of an execution of
		the program compiled with -Mpfi.

-Mcache_align    
		Align unconstrained objects of length greater than or equal to 16 bytes on
		cache-line boundaries. An unconstrained object is a data object that is not
		a member of an aggregate structure or common block. This option does
		not affect the alignment of allocatable or automatic arrays.

		Note: To effect cache-line alignment of stack-based local variables, the
		main program or function must be compiled with -Mcache_align.


-Mflushz 	 
		Set SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]  
		Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align  
		Instructs the IPA to recognize when pointer targets are all cache-line 
		aligned, allowing better SSE code generation.

-Mipa=arg  
		Instructs the IPA to remove arguments replaced by -Mipa=ptr,const 

-Mipa=const  
		Enable propagation of constants across procedure calls.

-Mipa=fast  
		Equivalent to: -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial 

-Mipa=f90ptr
		Enable Fortran 90 pointer disambiguation across procedure calls.
              	
-Mipa=globals  
		Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=inline
		Automatically determine which functions to inline

-Mipa=safe
		Assume unknown function references are safe

-Mipa=localarg  
		Externalizes local variables for use with -Mipa=arg

-Mipa=ptr  
		Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
		Instructs the IPA to eliminate functions that are not called.

-Mlre
		Enables loop-carried redundancy elimination.
	
-Mnoframe  
		Eliminate operations that set up a true stack frame pointer for functions.

-Mnovect
		Disables the vectorizer.

-Mscalarsse   
		Use SSE (Streaming SIMD Extensions; SIMD stands for Single Instruction,
		Multiple Data) and SSE2 instructions to perform the operations coded.
		This implies -Mflushz.

-Munix   
		Use UNIX calling conventions, no trailing underscores.

-Munroll  
		Invokes the loop unroller. This also raises the optimization level to 2
		if it is set lower than 2. The following suboptions may be supplied:

		c:m	Instructs the compiler to completely unroll loops with a
			constant loop count less than or equal to m, a supplied constant.
			If m is not supplied, it defaults to 4.

		n:u	Instructs the compiler to unroll u times any loop that is
			not completely unrolled or that has a non-constant loop count.
			If u is not supplied, the unroller computes the number of times a
			candidate loop is unrolled.
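
		The effect of unrolling a loop with a non-constant count u times can
		be sketched by hand in C. The example below assumes u = 4 and uses
		hypothetical function names; the code the compiler generates will
		differ in detail.

```c
#include <stddef.h>

/* Original loop: one element per iteration. */
long sum_rolled(const int *x, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-unrolled four times: the main loop handles four elements per
   iteration, and a remainder loop handles the last n % 4 elements. */
long sum_unrolled4(const int *x, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)x[i] + x[i + 1] + x[i + 2] + x[i + 3];
    for (; i < n; i++)
        s += x[i];
    return s;
}
```

		Both functions return the same sum; the unrolled form executes fewer
		loop-control branches per element.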

-Mvect=sse  
		Instructs the vectorizer to search for loops, and where possible,
		use the SSE or SSE2 and prefetch instructions
		(depending on which processor is targeted).


Description of the 'start' command used for rate runs:
------------------------------------------------------
Starts a separate window to run a specified program or command.

START ["title"] [/D path] [/I] [/MIN] [/MAX] [/SEPARATE | /SHARED]
      [/LOW | /NORMAL | /HIGH | /REALTIME | /ABOVENORMAL | /BELOWNORMAL]
      [/AFFINITY <hex affinity>] [/WAIT] [/B] [command/program]
      [parameters]

    "title"     Title to display in  window title bar.
    path        Starting directory
    B           Start application without creating a new window. The
                application has ^C handling ignored. Unless the application
                enables ^C processing, ^Break is the only way to interrupt
                the application
    I           The new environment will be the original environment passed
                to the cmd.exe and not the current environment.
    MIN         Start window minimized
    MAX         Start window maximized
    SEPARATE    Start 16-bit Windows program in separate memory space
    SHARED      Start 16-bit Windows program in shared memory space
    LOW         Start application in the IDLE priority class
    NORMAL      Start application in the NORMAL priority class
    HIGH        Start application in the HIGH priority class
    REALTIME    Start application in the REALTIME priority class
    ABOVENORMAL Start application in the ABOVENORMAL priority class
    BELOWNORMAL Start application in the BELOWNORMAL priority class
    AFFINITY    The new application will have the specified processor
                affinity mask, expressed as a hexadecimal number.
    WAIT        Start application and wait for it to terminate
    command/program
                If it is an internal cmd command or a batch file then
                the command processor is run with the /K switch to cmd.exe.
                This means that the window will remain after the command
                has been run.

                If it is not an internal cmd command or batch file then
                it is a program and will run as either a windowed application
                or a console application.

    parameters  These are the parameters passed to the command/program
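
    The hexadecimal value given to /AFFINITY is a bit mask in which bit k
    selects logical processor k. The C sketch below builds such a mask from
    a list of processor indices; the function name affinity_mask is
    hypothetical and is not part of the START command.

```c
#include <limits.h>

/* Returns a mask with bit k set for each processor index k in cpus[].
   Indices that do not fit in an unsigned long are ignored. */
unsigned long affinity_mask(const int *cpus, int count)
{
    const int bits = (int)(sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 0;
    for (int i = 0; i < count; i++)
        if (cpus[i] >= 0 && cpus[i] < bits)
            mask |= 1UL << cpus[i];
    return mask;
}
```

    For processors 0 and 2 the mask is 0x5, which would be passed as
    /AFFINITY 5; a single copy pinned to processor 1 would use /AFFINITY 2.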





Portability options for CPU2000:
-------------------------------
176.gcc:     
         -Dalloca=_alloca : Use the built-in optimized alloca.
         /Fn              : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate n bytes of stack.
                            The default amount of stack allocated is not
                            enough, and without this option 176.gcc
                            crashes with a run-time error.

178.galgel: 
   -Mfixed                : Assume fixed-format source


186.crafty: 
   -DNT_i386              : Specifies a Windows NT system on an Intel
                            processor, so that "long long" is used as
                            the 64-bit type that 186.crafty needs.

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes needed for porting
                            to Windows.
   -DPERLDLL              : On Windows, we need a single perl.exe instead
                            of a perl.exe plus a perl.dll. This pre-define
                            enables the changes necessary to get a single,
                            UNIX-style executable, avoiding the indirect
                            calls that can cause a 10% performance
                            degradation. This allows the Windows-based
                            executable to be as close as possible to
                            the Unix-based one.
   /MT                    : Use the static multi-threaded library;
                            otherwise the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that prototypes
                            for calloc and malloc already exist.