-------------------------------------------------------
Hewlett-Packard Company
SPEC CPU2000 FLAG DESCRIPTIONS

  - The Portland Group, Inc. (PGI) FORTRAN COMPILERS 6.0-4

    - hp-20050726-PGI60-Windows.txt

----------------------------------------------------------------------------
Description of compiler flags for PGI Compiler 6.0 
----------------------------------------------------------------------------

The optimization levels and their meanings are as follows:	

-O0	
     Creates a basic block for each Fortran statement. Neither scheduling nor 
     global optimization is done. 

-O1	
     Schedules within basic blocks and performs some register allocation, but
     does no global optimization.

-O2	
     Performs all level-one optimizations, and also performs global scalar
     optimizations such as induction variable elimination and loop-invariant
     code motion.
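
     As a rough illustration of the two level-two transformations named
     above, the sketch below applies them by hand to a small loop. It is
     written in Python only so that it is easy to run; the function names
     are hypothetical, and it does not show what the PGI compiler actually
     emits.

```python
def scaled_naive(a, k, n):
    """Loop as written: k * k and i * 4 are recomputed on every trip."""
    out = []
    for i in range(n):
        scale = k * k      # loop-invariant: does not depend on i
        offset = i * 4     # induction expression: grows by 4 each trip
        out.append(a[i] * scale + offset)
    return out


def scaled_optimized(a, k, n):
    """Same loop after hand-applied invariant motion and strength reduction."""
    out = []
    scale = k * k          # hoisted out of the loop
    offset = 0             # replaces the multiply i * 4
    for i in range(n):
        out.append(a[i] * scale + offset)
        offset += 4        # derived induction variable updated by addition
    return out
```

     Both functions return the same list for the same inputs; only the
     amount of work done inside the loop differs.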
                
-O3	
     Specifies aggressive global optimizations. This level performs all
     level-one and level-two optimizations and enables more aggressive
     hoisting and scalar replacement optimizations that may or may not be
     profitable.

-fast	 
     Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre" 

-fastsse 
     Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 
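
     Because -fastsse is defined in terms of -fast, which is itself an
     alias, it can help to see the fully expanded flag list. The sketch
     below copies the two alias definitions from the entries above into a
     table and expands them recursively. The names ALIASES and expand are
     hypothetical helpers for illustration, not part of the PGI tools.

```python
# Alias table copied from the -fast and -fastsse entries in this document.
ALIASES = {
    "-fast": ["-O2", "-Munroll=c:1", "-Mnoframe", "-Mlre"],
    "-fastsse": ["-fast", "-Mscalarsse", "-Mvect=sse", "-Mcache_align", "-Mflushz"],
}


def expand(flag):
    """Recursively replace an alias with the flags it stands for."""
    if flag not in ALIASES:
        return [flag]      # not an alias: keep the flag unchanged
    result = []
    for part in ALIASES[flag]:
        result.extend(expand(part))
    return result


if __name__ == "__main__":
    print(" ".join(expand("-fastsse")))
```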

-Munix  (Windows NT only) 
     Use UNIX argument passing and symbol naming conventions.

-Mcache_align    
     Align unconstrained data objects of size greater than or equal to 16 bytes
     on cache-line boundaries.  An unconstrained object is a variable or array 
     that is not a member of an aggregate structure or common block, is not 
     allocatable, and is not an automatic array.

     Note: To effect cache-line alignment of stack-based local variables, the
     main program or function must be compiled with -Mcache_align.

-Mfixed 
     Process Fortran source using fixed-form specifications. The -Mfree option
     specifies free-form formatting.  By default, files with a .f or .F
     extension use fixed-form formatting.


-Mflushz 	 
     Set SSE MXCSR register to flush-to-zero mode.
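
     In flush-to-zero mode, SSE operations whose results would be denormal
     (subnormal) return zero instead. The sketch below only emulates that
     effect in software to show which values are affected; it does not
     touch the MXCSR register, and the helper name flush_to_zero is
     hypothetical.

```python
import math
import sys

# Smallest positive normal double; anything smaller in magnitude is subnormal.
SMALLEST_NORMAL = sys.float_info.min


def flush_to_zero(x):
    """Emulate flush-to-zero: subnormal magnitudes become a signed zero."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


if __name__ == "__main__":
    tiny = SMALLEST_NORMAL / 2      # a subnormal value under gradual underflow
    print(tiny > 0.0, flush_to_zero(tiny))
```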

-M[no]ipa[=option[,option,...]] (-Mnoipa default)
     Enable and specify options for InterProcedural Analysis (IPA).  This also
     sets the optimization level to a minimum of 2; see -O.  If no option list 
     is specified, then it is equivalent to -Mipa=const.  The options are:

          [no]align (noalign default)
               Enable [disable] recognition of cases where pointer targets
               are all cache-line aligned, allowing better SSE code generation.

          [no]arg (noarg default)
               Remove [don't remove] arguments replaced by -Mipa=ptr,const.  
               -Mipa=noarg implies -Mipa=nolocalarg.

          [no]const (const default) 
               Enable [disable] propagation of constants across procedure calls.

          [no]f90ptr (nof90ptr default)
               Enable [disable] Fortran 90 pointer disambiguation across procedure
               calls.

          fast      
               Chooses generally optimal -Mipa flags for the target platform; 
               use pgf90 -Mipa -help to see the equivalent options.

          force     
               Force all objects to recompile regardless of whether IPA information
               has changed.

          [no]globals (noglobals default)
               Analyze [don't analyze] which globals are modified by procedure 
               calls.

          inline:n  
               Determine additional functions to inline, allowing up to n levels
               of inlining.

          ipofile   
               Save IPA information in a .ipo file instead of the default of 
               appending the information to the object file.

          [no]keepobj (keepobj default) 
               Keep [don't keep] the optimized object files, using file name 
               mangling, to reduce recompile time in subsequent application builds.

          [no]libinline (nolibinline default)
               Allow [don't allow] inlining from routines in libraries; 
               -Mipa=libinline implies -Mipa=inline.

          [no]libopt (nolibopt default)
               Allow [don't allow] recompiling and reoptimizing routines from 
               libraries with IPA information.

          [no]localarg (nolocalarg default)
               Enable [disable] feature to externalize local variables to allow
               arguments to be replaced by -Mipa=ptr.  -Mipa=localarg implies
               -Mipa=arg.

          main:func 
               Specify a function to serve as a global entry point; may appear
               multiple times; disables linking.

          [no]ptr (noptr default)
               Enable [disable] pointer disambiguation across procedure calls.

          [no]pure (nopure default)
               Detect [don't detect] pure functions.

          required  
               Return an error condition if IPA is inhibited for any reason, 
               rather than the default behavior of linking without IPA optimization.

          safe:[function|library]
               Declares that the named function, or all functions in the named
               library are safe; a safe procedure does not call back into the
               known procedures and does not change any known global variables.
               Without -Mipa=safe, any unknown procedures will cause IPA to fail.

          [no]safeall (nosafeall default)
               Declares that all unknown functions are safe [not safe]; see
               -Mipa=safe.

          [no]shape (noshape default)
               Perform [don't perform] Fortran 90 shape propagation.

          summary   
               Only collect IPA summary information when compiling; this prevents
               IPA optimization of this file, but allows optimization for other
               files linked with this file.

          [no]vestigial (novestigial default)
               Remove [don't remove] functions that are not called.

-M[no]lre[=assoc|noassoc] (-Mnolre default)
     Enable [disable] loop-carried redundancy elimination. The assoc option allows
     expression reassociation, and the noassoc option disallows expression 
     reassociation.
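
     A loop-carried redundancy is a value computed in one iteration that is
     computed again in the next. The hand-written sketch below shows the
     idea on a small loop, assuming b has at least n + 1 elements. The
     function names are hypothetical, and the code illustrates the general
     technique rather than the code PGI generates.

```python
def sum_squares_naive(b, n):
    """Each trip computes two products; one was already computed last trip."""
    a = [0.0] * n
    for i in range(n):
        a[i] = b[i] * b[i] + b[i + 1] * b[i + 1]
    return a


def sum_squares_lre(b, n):
    """Carry the redundant product forward in a temporary instead."""
    a = [0.0] * n
    carried = b[0] * b[0] if n > 0 else 0.0
    for i in range(n):
        nxt = b[i + 1] * b[i + 1]   # the only product computed this trip
        a[i] = carried + nxt
        carried = nxt               # reused as b[i] * b[i] on the next trip
    return a
```

     The second version performs one multiply per iteration instead of two
     and produces the same values, because the order of the arithmetic is
     unchanged.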
	
-M[no]frame (-Mnoframe default)
     Set up [don't set up] a true stack frame pointer for functions; -Mnoframe 
     allows slightly more efficient operation when a stack frame is not needed, 
     but some options override -Mnoframe.

-Mnosmart   
     Don't run the Smart assembly rewrite tool, which performs post-compilation
     linear assembly scheduling and optimization.

-M[no]scalarsse
     Use [don't use] SSE (Pentium 3, 4, AthlonXP/MP, Opteron) and SSE2
     (Pentium 4, Opteron) instructions to perform the operations coded. This
     requires the assembler to be capable of interpreting SSE/SSE2 instructions.
     The default is -Mscalarsse for Opteron in 64-bit mode, and -Mnoscalarsse 
     otherwise.


-M[no]unroll[=option[,option...]] (-Mnounroll default)
     Invoke [don't invoke] the loop unroller. This also sets the optimization 
     level to a minimum of 2; see -O.  The option is one of the following:

          c:m       Instructs the compiler to completely unroll
                    loops with a constant loop count less than or
                    equal to m, a supplied constant. If this
                    value is not supplied, the m count is set to
                    4.

          n:u       Instructs the compiler to unroll u times a
                    loop that is not completely unrolled or that
                    has a non-constant loop count. If u is not
                    supplied, the unroller computes the number of
                    times a candidate loop is unrolled.

          -Mnounroll instructs the compiler not to unroll loops.
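
     To make the n:u option concrete, the sketch below unrolls a dot-product
     loop four times by hand, roughly what n:4 asks for on a loop with a
     non-constant count. A remainder loop handles counts that are not a
     multiple of four. The function names are hypothetical and the code is
     an illustration only, not PGI output.

```python
def dot_rolled(x, y):
    """Original loop: one multiply-add per trip."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def dot_unrolled4(x, y):
    """Same loop unrolled four times, followed by a remainder loop."""
    n = len(x)
    total = 0.0
    main = n - n % 4                 # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        total += x[i] * y[i]
        total += x[i + 1] * y[i + 1]
        total += x[i + 2] * y[i + 2]
        total += x[i + 3] * y[i + 3]
    for i in range(main, n):         # leftover 0 to 3 iterations
        total += x[i] * y[i]
    return total
```

     The unrolled version tests the loop condition a quarter as often; the
     additions happen in the same order, so the result is unchanged.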



-Mvect[=option[,option,...]]
     Pass options to the internal vectorizer. This also sets the optimization level
     to a minimum of 2; see -O.  If no option list is specified, then the following
     vector optimizations are used:  assoc,cachesize:262144,nosse. The vect options 
     are:

          [no]altcode:n (noaltcode default)
               Generate (don't generate) alternate scalar code for vectorized loops.
               If altcode is specified without arguments, the vectorizer determines
               an appropriate cutoff length and generates scalar code to be executed
               whenever the loop count is less than or equal to that length. If 
               altcode:n is specified, the scalar altcode is executed whenever the
               loop count is less than or equal to n.

          [no]assoc (assoc default) 
               Enable (disable) certain associativity conversions that can change 
               the results of a computation due to floating point roundoff error
               differences.  A typical optimization is to change the order of
               additions, which is mathematically correct, but can be computationally
               different, due to roundoff error.

          cachesize:number (default=automatic)
               Instructs the vectorizer, when performing cache tiling optimizations,
               to assume a cache size of number.

          prefetch  
               Use prefetch instructions in loops where profitable.

          [no]sse (nosse default)
               Use (don't use) SSE, SSE2, 3DNow!, and prefetch instructions in
               loops where possible.

          -Mnovect disables the vectorizer, and is the default.
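
     The assoc option above notes that reordering additions can change a
     floating-point result. The sketch below shows this with four values
     summed in two different orders: strictly left to right, and as two
     partial sums that are combined at the end, which is similar in spirit
     to how a two-lane vector reduction might accumulate. The function
     names and the input values are chosen for illustration only.

```python
def sum_left_to_right(values):
    """Add the values in source order."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_two_partials(values):
    """Accumulate even- and odd-indexed elements separately, then combine."""
    even = 0.0
    odd = 0.0
    for i, v in enumerate(values):
        if i % 2 == 0:
            even += v
        else:
            odd += v
    return even + odd


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]
    print(sum_left_to_right(data), sum_two_partials(data))
```

     With IEEE double precision, the left-to-right sum loses the first 1.0
     when it is added to 1e16, so the two orders give different answers
     even though they are mathematically equivalent.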

     

Portability options for CPU2000:
-------------------------------
176.gcc:     
   -Dalloca=_alloca       : so as to use the built-in optimized alloca
   -F10000000             : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate 10000000 bytes of
                            stack. The default stack allocation is not
                            enough, and without this option 176.gcc
                            crashes with a run-time error.

178.galgel: 
   -Mfixed                : Fixed-format F90 source code. 
   
186.crafty: 
   -DNT_i386              : Specifies a Windows NT system with an Intel
                            processor, which makes the compiler use
                            "long long" as the 64-bit type that
                            186.crafty needs.

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes needed for porting
                            to Windows.
   -DPERLDLL              : On Windows, we need a single perl.exe instead
                            of a perl.exe and perl.dll. This pre-define
                            enables the changes necessary to get a single,
                            UNIX-style executable without the indirect
                            calls that can cause a 10% performance
                            degradation. This allows the Windows-based
                            executable to be as close as possible to
                            the UNIX-based one.
   -MT                    : Use the static multi-threaded library;
                            otherwise the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :  
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that calloc
                            and malloc prototypes exist.