-------------------------------------------------------
Hewlett-Packard Company
SPEC CPU2000 FLAG DESCRIPTIONS

  - Portland Group International (PGI) FORTRAN COMPILERS 5.1-6

    - hp-20040621-PGI51-Windows.txt

----------------------------------------------------------------------------
Description of compiler flags for PGI Compiler 5.1 
----------------------------------------------------------------------------

The optimization levels and their meanings are as follows:	

-O0	A basic block is generated for each Fortran statement.  No scheduling 
	is done between statements.  No global optimizations are performed.

-O1	Scheduling within extended basic blocks is performed.  Some register 
	allocation is performed.  No global optimizations are performed.

-O2	All level 1 optimizations are performed.  In addition, scalar
	optimizations such as induction recognition and loop invariant motion
	are performed by the global optimizer.

-O3	All level 1 and level 2 optimizations are performed.  In addition,
	more aggressive hoisting and scalar replacement optimizations are
	enabled.
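
The scalar optimizations named under -O2 can be pictured with a small
hand-transformed example. This is only a sketch in C (the PGI flags above
apply to Fortran sources); the function names are hypothetical and the
second routine is written by hand to show the kind of rewrite a global
optimizer may perform, not actual PGI output.

```c
#include <stddef.h>

/* As written: a * b and i * stride are recomputed on every iteration. */
void fill_naive(double *out, size_t n, double a, double b, size_t stride)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a * b + (double)(i * stride);
    }
}

/* Hand-applied equivalent of two scalar optimizations. */
void fill_hoisted(double *out, size_t n, double a, double b, size_t stride)
{
    const double ab = a * b;  /* loop invariant motion: computed once */
    size_t offset = 0;        /* induction variable replaces i * stride */

    for (size_t i = 0; i < n; i++) {
        out[i] = ab + (double)offset;
        offset += stride;     /* an add per iteration instead of a multiply */
    }
}
```

Both routines fill the output array with the same values; only the amount
of work done inside the loop differs.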



-fast	 Equivalent to "-O2 -Munroll -Mnoframe -Mlre" 

-fastsse Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 


-Mcache_align    
     Align unconstrained objects of length greater than or equal to 16 bytes on
     cache-line boundaries. An unconstrained object is a data object that is not
     a member of an aggregate structure or common block. This option does
     not affect the alignment of allocatable or automatic arrays.

     Note: To effect cache-line alignment of stack-based local variables, the
     main program or function must be compiled with -Mcache_align.
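
To make "aligned on a cache-line boundary" concrete, the sketch below
requests the alignment explicitly in C11 and checks an address. The 64-byte
line size, the array, and the function names are assumptions made for this
illustration; with -Mcache_align the compiler chooses the alignment itself
and no source change is needed.

```c
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size, not taken from PGI documentation */

/* A stand-alone ("unconstrained") array longer than 16 bytes, aligned by hand. */
static _Alignas(CACHE_LINE) double samples[32];

/* Returns 1 when p lies on a cache-line boundary, 0 otherwise. */
int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Exposes the address of the aligned array for inspection. */
const void *samples_address(void)
{
    return samples;
}
```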

-Mfixed 
     Process source using Fortran 90 fixed-form specifications.

-Mflushz 	 
     Set SSE MXCSR register to flush-to-zero mode.
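
Flush-to-zero means that a result too small to be represented as a normal
floating-point number is replaced by zero instead of being kept as a
denormal (subnormal) value. The C sketch below is a portable software model
of that behavior only; it does not touch the MXCSR register, and the
function name is hypothetical.

```c
#include <math.h>

/* Software model of flush-to-zero: a subnormal input becomes a signed zero. */
float flush_to_zero(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL) {
        return signbit(x) ? -0.0f : 0.0f;
    }
    return x;  /* normal values, zeros, infinities and NaNs pass through */
}
```

With -Mflushz the hardware applies this rule to SSE results, so no such
helper appears in the program.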

-Mipa=[option]
     Enables interprocedural analysis (IPA) with the specified option. The
     valid options are:

-Mipa=align  
     Instructs the IPA to recognize when pointer targets are all cache-line 
     aligned, allowing better SSE code generation.

-Mipa=arg  
     Instructs the IPA to remove arguments that have been replaced by
     -Mipa=ptr,const.

-Mipa=const  
     Enable propagation of constants across procedure calls.

-Mipa=fast  
     Equivalent to: -Mipa=const,globals,localarg,ptr,vestigial 
              	
-Mipa=globals  
     Instructs the IPA to optimize references to globals when not used in procedure calls.		

-Mipa=localarg  
     Externalizes local variables for use with -Mipa=arg.

-Mipa=ptr  
     Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
     Instructs the IPA to eliminate functions that are not called.
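
As a rough picture of what constant propagation across procedure calls
(-Mipa=const) can achieve, consider the C sketch below. All names are
hypothetical, and the second routine is a hand-written equivalent shown for
illustration, not output of the PGI tools.

```c
/* General routine: factor is an ordinary run-time argument. */
static int scale(int x, int factor)
{
    return x * factor;
}

/* Every call site passes the same constant, which IPA can detect. */
int sum_scaled(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += scale(v[i], 4);
    }
    return total;
}

/* Hand-written equivalent once factor == 4 is known inside the callee. */
int sum_scaled_propagated(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += v[i] * 4;  /* the constant is used directly */
    }
    return total;
}
```

Once the constant is known inside the callee, the factor argument no longer
carries information, which is the kind of argument -Mipa=arg is described
as removing.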
	
-mp  Enable OpenMP directives.
	
-Mnoframe  
     Eliminate operations that set up a true stack frame pointer for functions.

-Mnosmart   
     Do not run the Smart assembly re-write tool, which performs
     post-compilation linear assembly scheduling and optimization.

-Mscalarsse   
     Use SSE (Streaming SIMD (Single Instruction, Multiple Data)
     Extensions) and SSE2 instructions to perform the scalar
     floating-point operations coded.  This assumes the user has an
     assembler capable of interpreting SSE/SSE2 instructions, as in later
     versions of Linux.  This implies -Mflushz.

-Munroll  
     Invokes the loop unroller.  This also sets the optimization level to 2 
     if the level is set to less than 2.
			
      c:m	Instructs the compiler to completely unroll loops with a
	constant loop count less than or equal to m, a supplied constant.
	If m is not supplied, it defaults to 4.

      n:u	Instructs the compiler to unroll, u times, any loop that is
	not completely unrolled or that has a non-constant loop count.
	If u is not supplied, the unroller computes the number of times a
	candidate loop is unrolled.
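
The effect of unrolling a loop u times can be sketched by hand. The C
example below uses u = 4 and a loop whose trip count is not known at compile
time; the function names are hypothetical and the unrolled form only
illustrates the transformation, since the compiler performs it internally.

```c
#include <stddef.h>

/* Rolled loop: one add and one loop test per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Unrolled 4 times, with a remainder loop for the leftover elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {  /* one loop test per four elements */
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++) {          /* handles n not divisible by 4 */
        total += v[i];
    }
    return total;
}
```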

-Mvect=sse  
     Instructs the vectorizer to search for vectorizable loops and, where
     possible, to use SSE or SSE2 and prefetch instructions
     (depending on which processor is targeted).
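
A typical candidate for such vectorization is a loop whose iterations are
independent of one another. The C sketch below shows one such loop; the
function name is chosen for this example, and the restrict qualifiers state
the no-overlap assumption that Fortran dummy arguments already carry.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]; restrict promises that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];  /* no iteration depends on another */
    }
}
```

Because each element is computed independently, a vectorizer is free to
process several elements per SSE instruction.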
     

Portability options for CPU2000:
-------------------------------
176.gcc:     
   -Dalloca=_alloca       : Use the built-in, optimized alloca.
   -F10000000             : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate 10000000 bytes of
                            stack. The default stack allocation is not
                            enough, and without this option 176.gcc
                            crashes with a run-time error.

178.galgel: 
   -Mfixed                : Fixed-format F90 source code. 
   
186.crafty: 
   -DNT_i386              : Specifies a Windows NT system with an Intel 
                            processor, which makes the benchmark use 
                            "long long" as the 64-bit type that 
                            186.crafty needs.
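
A pre-define such as -DNT_i386 typically works by selecting a branch of a
preprocessor conditional. The C sketch below shows the general pattern
only; the type name board64 and the helper are hypothetical and are not
taken from the 186.crafty sources.

```c
#include <limits.h>
#include <stdint.h>

#if defined(NT_i386)
typedef unsigned long long board64;  /* path selected by -DNT_i386 */
#else
typedef uint64_t board64;            /* assumed fallback for this sketch */
#endif

/* Reports the width of the selected type in bits. */
int board64_bits(void)
{
    return (int)(sizeof(board64) * CHAR_BIT);
}
```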

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes made for porting to 
                            Windows. 
   -DPERLDLL              : On Windows, we need a single perl.exe instead 
                            of a perl.exe and perl.dll. This pre-define 
                            enables the changes necessary to get a single, 
                            UNIX-style executable without the indirect 
                            calls that can cause a 10% performance 
                            degradation. This allows the Windows-based 
                            executable to be as close as possible to 
                            the UNIX-based one. 
   -MT                    : Use the static multi-threaded library; 
                            otherwise the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :  
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that prototypes 
                            for calloc and malloc already exist.