########################################################################
#10/19/2006  
#Acer Incorporated.
#SPEC CPU2000 v1.3 Flag Descriptions
#Intel C/C++ Compiler 9.1 for Linux
#Intel Fortran Compiler 9.1 for Linux

########################################################################

========================================================================
General Options (C/C++/FORTRAN)
========================================================================

-fast		This option maximizes speed across the entire program.

		It sets the following options (IA-32 and EM64T):
		-O3 -ipo -no-prec-div -xP -static (Linux)

-O{1|2|3}	Optimization-level options:
		1: optimizes for speed, but disables some optimizations that
		   increase code size for a small speed benefit.  Includes
		   inline expansion for intrinsic functions, global
		   optimizations, string pooling optimizations.
		2: optimizes for speed (DEFAULT).  The -O2 option includes O1 
		   optimizations and in addition enables inlining of 
		   intrinsics and more speed optimizations.
		3: builds on -O1 and -O2 optimizations by enabling high-level 
		   optimization. This level does not guarantee higher performance 
		   unless loop and memory access transformation take place. In 
		   conjunction with -axK/-xK and -axW/-xW, this switch causes
		   the compiler to perform more aggressive data dependency 
		   analysis than for -O2. This may result in longer compilation 
		   times.

-Oa[-]	Assume [do not assume] no aliasing in program.


-ansi-alias[-] (Linux)
		Enable/disable use of ANSI aliasing rules in
		optimizations; the user asserts that the program adheres to
		these rules.  The default for C++ is -ansi-alias-, meaning
		that the aliasing rules are not assumed.  The default for
		the Fortran compiler is -ansi-alias.  For C++, the -ansi-alias
		flag enables optimizations that would otherwise be
		prevented by potential aliasing.


-ipo (Linux)
		Multi-file interprocedural (IP) optimizations, including:
		- inline function expansion
		- interprocedural constant propagation
		- dead code elimination
		- propagation of function characteristics
		- passing arguments in registers
		- loop-invariant code motion


-[no-]prec-div (Linux)
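		-prec-div improves the precision of floating point divides at
		a slight cost in speed.  -no-prec-div permits faster, slightly
		less precise division, for example computing A/B as A * (1/B).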



-prof_gen (Linux)
		Instrument program for profiling for the first phase of 
		two-phase profile guided optimization.

-prof_use (Linux)
		Instructs the compiler to produce a profile-optimized 
		executable and merges available dynamic information (.dyn) 
		files into a pgopti.dpi file. If you perform multiple 
		executions of the instrumented program, -prof_use merges 
		the dynamic information files again and overwrites the 
		previous pgopti.dpi file.  Without any other options,
		the current directory is searched for .dyn files.


-rcd (Linux)
		The Intel compiler uses the -rcd option to improve the
		performance of code that requires floating-point-to-integer
		conversions. 

		The system default floating point rounding mode is
		round-to-nearest. This means that values are rounded during 
		floating point calculations. However, the C language requires 
		floating point values to be truncated when a conversion to an
		integer is involved. To do this, the compiler must change the 
		rounding mode to truncation before each floating 
		point-to-integer conversion and change it back afterwards.

		The -rcd option disables the change to truncation of the 
		rounding mode for all floating point calculations, including   
		floating point-to-integer conversions. Turning on this option 
		can improve performance, but floating point conversions to 
		integer may not conform to C semantics.


-unroll[n] (Linux)
		Specifies the maximum number of times to unroll a loop. Omit n 
		to let the compiler decide whether to perform unrolling or not. 
		Use n = 0 to disable the unroller.


-ax<processor> (Linux)
		Directs the compiler to generate processor-specific code if there is a 
                performance benefit, while also generating generic IA-32 code.

		<processor> is the processor for which you want to target your program. 
		Possible values are: 

  		K: Code is optimized for Intel Pentium III and compatible Intel processors. 
  		W: Code is optimized for Intel Pentium 4 and compatible Intel processors. 
  		N: Code is optimized for Intel Pentium 4 and compatible Intel processors 
		   with Streaming SIMD Extensions 2. The resulting code may contain 
		   unconditional use of features that are not supported on other 
		   processors.
		   This option also enables new optimizations in addition to Intel 
		   processor-specific optimizations including advanced data layout and 
		   code restructuring optimizations to improve memory accesses for Intel 
		   processors. 
  		B: Code is optimized for Intel Pentium M and compatible Intel processors. 
		   This option also enables new optimizations in addition to Intel 
	 	   processor-specific optimizations. 
  		P: Code is optimized for Intel Core Duo processors, Intel Core Solo 
		   processors, Intel Pentium 4 processors with Streaming SIMD 
		   Extensions 3, and compatible Intel processors with Streaming SIMD 
		   Extensions 3. The resulting code may contain unconditional use of 
		   features that are not supported on other processors. 
		   This option also enables new optimizations in addition to Intel 
		   processor-specific optimizations including advanced data layout and 
		   code restructuring optimizations to improve memory accesses for Intel 
		   processors.


-x<processor> (Linux)
		Generate specialized code to run exclusively on the processor
		specified by <processor>.

		<processor> is the processor for which you want to target your program. 
		Possible values are: 

  		K: Code is optimized for Intel Pentium III and compatible Intel processors. 
  		W: Code is optimized for Intel Pentium 4 and compatible Intel processors. 
  		N: Code is optimized for Intel Pentium 4 and compatible Intel processors 
		   with Streaming SIMD Extensions 2. The resulting code may contain 
		   unconditional use of features that are not supported on other 
		   processors.
		   This option also enables new optimizations in addition to Intel 
		   processor-specific optimizations including advanced data layout and 
		   code restructuring optimizations to improve memory accesses for Intel 
		   processors. 
  		B: Code is optimized for Intel Pentium M and compatible Intel processors. 
		   This option also enables new optimizations in addition to Intel 
	 	   processor-specific optimizations. 
  		P: Code is optimized for Intel Core Duo processors, Intel Core Solo 
		   processors, Intel Pentium 4 processors with Streaming SIMD 
		   Extensions 3, and compatible Intel processors with Streaming SIMD 
		   Extensions 3. The resulting code may contain unconditional use of 
		   features that are not supported on other processors. 
		   This option also enables new optimizations in addition to Intel 
		   processor-specific optimizations including advanced data layout and 
		   code restructuring optimizations to improve memory accesses for Intel 
		   processors.  

		Additional Notes on <processor> values N and P:
		-----------------------------------------------
		The N and P options target your program to run on Intel Pentium 4
		and compatible Intel processors.  The resulting code might
		contain unconditional use of features that are not supported
		on other processors.  Programs in which the function main() is
		compiled with this option detect incompatible processors
		and generate an error message during execution. These options also 
                enable new optimizations in addition to Intel processor-specific 
                optimizations including advanced data layout and code restructuring 
                optimizations to improve memory accesses for Intel processors.
	
-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure and 
                union types as one of the following: 1, 2, 4, 8, or 16 (default)
                bytes.

	
-static (Linux)
                This option prevents linking with shared libraries. It causes 
                the executable to link all libraries statically.

-auto-ilp32 (Linux)
                This option instructs the compiler to analyze the program to 
                determine if there are 64-bit pointers which can be safely 
                shrunk into 32-bit pointers. In order for this option to be 
                effective the compiler must be able to optimize using the 
                -ipo/-Qipo option, and must be able to analyze all library/
                external calls the program makes. This option imposes the 
                following restriction on the program:

                The program cannot malloc any objects greater than 2**31 bytes 
                in size.

                If the program does not satisfy this restriction, unpredictable 
                behavior may occur.


=============================================================================
Flags Specific to C/C++
=============================================================================

 


=================================================================================
Flags Specific to FORTRAN
=================================================================================

 
-auto (Linux)
		Causes all variables to be allocated on the stack, rather than 
		in local static storage. 


-[no-]scalar-rep (Linux)
		Enables (DEFAULT) [disables] scalar replacement performed 
		during loop transformations.

==============================================================================
General Options and Libraries
==============================================================================
The starting tokens "/" and "-" are both equivalent for flags passed to the 
compiler.  For example, -QxW and /QxW are identical switches. 

+FDO		PASS1= -Qprof_gen  PASS2= -Qprof_use	
		Using feedback-directed optimization, a profile is generated 
		on the first pass of compilation and used on the second pass.

shlW32M.lib
		MicroQuill SmartHeap Library available from www.microquill.com


===============================================================================
Benchmark-Specific Portability Options
===============================================================================
	-DSPEC_CPU2000_LP64
				Compile using LP64 programming model.

176.gcc:
	-Dalloca=_alloca	So as to use the built-in optimized alloca.
	/F10000000		176.gcc uses alloca and this option tells the
				linker to pre-allocate 10MB of stack. The default
				amount of stack allocated is not enough and 176.gcc
				crashes with a run-time error.

178.galgel:
	-FI (Linux)
	/FI			Fixed-format F90 source code.
	/F32000000		Same as with 176.gcc, pre-allocates a 32MB stack

186.crafty:
	-DNT_i386		Specifies that it is a Windows NT Intel processor-based 
				system which makes the compiler use "_int64" 
				as the 64-bit variable that 186.crafty needs.        

253.perlbmk:
	-DSPEC_CPU2000_NTOS
				This causes the code changes for porting to
				Windows to be included.
	-DPERLDLL		On Windows, we need a single perl.exe instead of a
				perl.exe and perl.dll. This pre-define enables
				the changes necessary to get a single, UNIX-style
				executable without the indirect calls that
				can cause a 10% performance degradation. This allows
				the Windows-based executable to be as close as possible
				to the Unix-based one.
	/MT			Use the static multi-threaded library; otherwise
				the benchmark will not compile.

254.gap:
	-DSYS_HAS_CALLOC_PROTO
	-DSYS_HAS_MALLOC_PROTO
				These two pre-defines indicate that prototypes for
				malloc and calloc exist.




============================================================================================================
BIOS Setting Description
============================================================================================================
1. Hardware Prefetch: Default is enabled.
The NetBurst architecture introduced a hardware prefetch mechanism, which automatically prefetches data into the L2 cache in case of sequential data accesses with constant stride.


2. Adjacent Cache Line Prefetch: Default is enabled.
The adjacent cache line prefetch feature, when enabled, causes the CPU to fetch two adjacent cache lines when updating the cache rather than a single cache line at a time.