-------------------------------------------------------
Hewlett-Packard Company
SPEC CPU2000 FLAG DESCRIPTIONS

  - INTEL C++ AND FORTRAN COMPILERS 8.1

    - hp-20041115-IC81-Windows.txt
-------------------------------------------------------

Description of compiler flags for Intel C++ Compiler 8.1
--------------------------------------------------------
-arch:SSE2	  Enables the compiler to use SSE2 instructions.

-fast   The -fast option enhances execution speed across the entire program by 
          including the following options that can improve run-time performance: 

               -O3 (maximum speed and high-level optimizations). 
               -Qipo (enables interprocedural optimizations across files). 
               -QxP (specific optimization for Intel Pentium 4 processor with Streaming 
                    SIMD Extensions 3). The -fast option does not include -QxP when 
                    compiling on Itanium(R)-based systems. 

          To override one of the options set by -fast, specify that option after the -fast 
          option on the command line. To target -fast optimizations for a specific 
          processor, use one of the -Qx options. For example:

               prompt>icl -fast -QxW source_file.cpp 

          The options set by -fast may change from release to release.

-GR   Enables C++ Runtime Type Information (RTTI). 

-Gs[n] Disables stack-checking for routines with n or more bytes of local variables 
          and compiler temporaries. Default: n=4096 

-GX   Enables the full C++ Exception Handling unwind semantics. 

-Qcxx_features  Enables both -GX and -GR as described above, so that C++ Runtime Type
                Information and Exception Handling are both enabled.

-O1    Optimizes to favor code size and code locality. Disables 
          loop unrolling. /O1 may improve performance for applications 
          with very large code size, many branches, and execution time 
          not dominated by code within loops. In most cases /O2 is 
          recommended over /O1. 

          IA-32 systems: Enables options /Og, /Oi-, /Os, /Oy, /Ob1, 
          and /Gs. Disables intrinsics inlining to reduce code size.

-O2    This is the default level of optimization.  
          Optimizes for code speed. This is the generally recommended 
          optimization level.

          IA-32 systems: Enables options /Og, /Os, /Oy, /Ob1, and /Gs.


-O3    Enables /O2 optimizations and more aggressive optimizations such 
          as loop and memory access transformation. The /O3 optimizations 
          may slow down code in some cases compared to /O2 optimizations. 
          Recommended for applications that have loops with heavy use of 
          floating-point calculations and process large data sets.

          IA-32 systems: In conjunction with /Qax{K|W|N|B|P} and /Qx{K|W|N|B|P} 
          options, this option causes the compiler to perform more aggressive data 
          dependency analysis than for /O2. This may result in longer compilation times.

-Oa[-] Assumes [does not assume] no aliasing in the program.
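
          The effect of a no-aliasing assumption can be sketched in C. The
          function names below are hypothetical and are not taken from the
          compiler or from any benchmark. The second function is a hand-written
          guess at what a compiler could generate once it is told that the two
          pointers never refer to the same object; the two versions agree only
          when that assumption holds.

```c
#include <assert.h>

/* Adds *b to *a twice. If a and b point to the same object, the first
   store changes *b, so *b has to be reloaded for the second addition. */
void add_twice(int *a, const int *b)
{
    *a += *b;
    *a += *b;
}

/* Hand-written model of code a compiler could produce under a
   no-aliasing assumption: *b is loaded once and reused. */
void add_twice_noalias(int *a, const int *b)
{
    int t = *b;
    *a += 2 * t;
}

/* Hypothetical self-check: returns 0 if the behaviour matches the
   description above. */
int demo_aliasing(void)
{
    int p = 1, q = 1;
    int x = 1, y = 1;

    add_twice(&p, &q);          /* distinct objects: 1 + 1 + 1 */
    assert(p == 3);

    add_twice(&x, &x);          /* aliased: 1 + 1 = 2, then 2 + 2 */
    assert(x == 4);

    add_twice_noalias(&y, &y);  /* aliased, but treated as distinct */
    assert(y == 3);
    return 0;
}
```

          When the pointers are distinct both versions give the same result, so
          the assumption is safe for programs that never pass overlapping
          objects to such routines.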

-Obn  Controls the compiler's inline expansion. The amount of inline expansion 
          performed varies with the value of n as follows: 
               0: Disables inlining. 
               1: Enables (default) inlining of functions declared with the __inline 
                   keyword. Also enables inlining according to the C++ language. 
               2: Enables inlining of any function. However, the compiler decides 
                   which functions to inline. Enables interprocedural optimizations and 
                   has the same effect as /Qip. 

-Og    Enables global optimizations.

-Oi[-] Enables [disables] inline expansion of intrinsic functions. 

-Os    Enables most speed optimizations, but disables optimizations that increase 
          code size for a small speed benefit.

-Oy[-] Enables [disables] the use of the EBP register in optimizations. When you 
          disable with /Oy-, the EBP register is used as frame pointer.

-Qansi_alias[-]    Directs the compiler to assume that the program adheres 
          to the type-based aliasing rules defined in the ISO C Standard.  If your 
          program adheres to these rules, this option will allow the compiler to 
          optimize more aggressively.  If it doesn't adhere to these rules, it can 
          cause the compiler to generate incorrect code.
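
          As an illustration of the type-based aliasing rules, the sketch below
          reads the bit pattern of a float in a conforming way. The function
          name float_bits is hypothetical, and the example assumes a 32-bit
          IEEE 754 float. The non-conforming pointer cast appears only inside a
          comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Conforming: copy the object representation with memcpy instead of
   reading the float through a pointer to an unrelated type. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Non-conforming (do not use when type-based aliasing is assumed):

       uint32_t bad = *(uint32_t *)&f;

   This reads a float object through a uint32_t lvalue, which the ISO C
   aliasing rules do not allow. */
```

          Code written in the first style should remain correct whether or not
          the compiler assumes that the program follows the aliasing rules.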

-Qax<codes> Generate code specialized for processor extensions 
          specified by <codes> while also generating generic IA-32 code. 
          <codes> includes one or more of the following characters:
               i  Pentium Pro and Pentium II processor instructions
               M  MMX(TM) instructions
               K  Streaming SIMD Extensions (implies i and M above)
               W  Pentium 4 processor with Streaming SIMD Extensions 2 
                    (implies i, M and K)
               N  Pentium 4 processor with Streaming SIMD Extensions 2 
               P  Pentium 4 processor with Streaming SIMD Extensions 3 
    
-Qx<codes>  Generate specialized code to run exclusively on processors
           supporting the extensions indicated by <codes> as 
           described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}   The /QxN and /QxP options target your program to run on Intel 
           Pentium 4 and compatible Intel processors.  The resulting code might 
           contain unconditional use of features that are not supported 
           on other processors.  Programs in which the function main() is 
           compiled with one of these options detect non-compatible processors 
           and generate an error message during execution.  These options 
           also enable new optimizations in addition to Intel 
           processor-specific optimizations.

           These options also enable advanced data layout and code restructuring
           optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Qip   Enables single-file interprocedural (IP) optimizations (within files; same as -Ob2).

-Qipo [value]    Enables interprocedural optimizations across files. The 
          optional value argument controls the maximum number of link-time 
          compilations (or number of object files) that are spawned. The default 
          for value is 1 when value is not specified. This option is not compatible 
          with the Microsoft /GL option.

          Multi-file IP optimizations include:
               - inline function expansion
               - interprocedural constant propagation
               - monitoring module-level static variables
               - dead code elimination
               - propagation of function characteristics
               - multifile optimization
               - passing arguments in registers
               - loop-invariant code motion
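
          Two of the items above, inline function expansion and interprocedural
          constant propagation, can be sketched by hand in C. All names are
          hypothetical. In a real multi-file build, scale() could be defined in
          a different source file from its caller. The last function is only a
          guess at the kind of code such optimizations might produce, not the
          compiler's actual output.

```c
/* As written by the programmer. */
int scale(int x, int factor)
{
    return x * factor;
}

int total_scaled(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale(v[i], 3);   /* factor is always the constant 3 */
    return sum;
}

/* Hand-written equivalent after inlining scale() and propagating the
   constant argument: no call remains and 3 is folded into the multiply. */
int total_scaled_after_ipo(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i] * 3;
    return sum;
}
```

          Both functions return the same value for any input; only the call
          overhead and the generality of scale() differ.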

-Qoption,tool,optlist   Passes an argument list to another program in the 
          compilation sequence, such as the assembler or linker. The parameter
          'tool' can be:

               cpp      Specifies the compiler front-end preprocessor
               c        Specifies the C++ compiler
               asm      Specifies the assembler
               link     Specifies the linker

          The parameter 'optlist' indicates one or more valid argument strings
          for the designated program. If the argument is a command-line option,
          you must include the hyphen. If the argument contains a space or tab
          character, you must enclose the entire argument in quotation marks ("").
          You must separate multiple arguments with commas.

          -Qoption can be used with the -Qipo flag to refine IPO. The valid options
          that can be used for this purpose are:

               -ip_args_in_regs=0   Disables the passing of arguments in registers.

               -ip_ninl_max_stats=n   Sets the valid maximum number of intermediate
                    language statements for a function that is expanded inline. The
                    number n is a positive integer. The number of intermediate language
                    statements usually exceeds the actual number of source language
                    statements. The default value for n is 230. The compiler uses a larger
                    limit for user inline functions.

               -ip_ninl_min_stats=n   Sets the valid minimum number of intermediate
                    language statements for a function that is expanded inline. The
                    number n is a positive integer. The default value of
                    ip_ninl_min_stats is:
                         IA-32 compiler: ip_ninl_min_stats = 7

               -ip_ninl_max_total_stats=n   Sets the maximum increase in size of a function,
                    measured in intermediate language statements, due to inlining. n is a
                    positive integer whose default value is 2000.

-Qprefetch[-]   Enables [disables] the insertion of software prefetching by the 
               compiler. Default is /Qprefetch. 

-Qprof_gen   Instruments the program for profiling, the first phase of two-phase
               profile-guided optimization.

-Qprof_use   Instructs the compiler to produce a profile-optimized executable
               and to merge available dynamic information (.dyn) files into a pgopti.dpi file. 
               If you perform multiple executions of the instrumented program, -Qprof_use 
               merges the dynamic information files again and overwrites the previous 
               pgopti.dpi file. Without any other options, the current directory is searched for 
               .dyn files.

-Qrcd      The Intel compiler uses the -Qrcd option to improve the
               performance of code that requires floating-point-to-integer                        
               conversions. 

               The system default floating point rounding mode is
               round-to-nearest. This means that values are rounded during 
               floating point calculations. However, the C language requires 
               floating point values to be truncated when a conversion to an                      
               integer is involved. To do this, the compiler must change the 
               rounding mode to truncation before each floating 
               point-to-integer conversion and change it back afterwards.

               The -Qrcd option disables the change to truncation of the 
               rounding mode for all floating point calculations, including                       
               floating point-to-integer conversions. Turning on this option 
               can improve performance, but floating point conversions to 
               integer will not conform to C semantics.
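
               The difference between the two conversion behaviours can be sketched
               in portable C. The function names are hypothetical. The second
               function is only a software model of a conversion performed under
               round-to-nearest (ties to even); it is not what the compiler emits,
               and it assumes the value fits in an int.

```c
#include <assert.h>

/* C semantics: conversion to int truncates toward zero. */
int to_int_truncate(double x)
{
    return (int)x;
}

/* Model of a conversion done under round-to-nearest, ties to even. */
int to_int_nearest(double x)
{
    int t = (int)x;                 /* truncated value */
    double frac = x - (double)t;    /* signed fractional part */

    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;
    return t;
}

/* Hypothetical self-check: returns 0 if the two conversions differ in
   the way described above. */
int demo_conversion(void)
{
    assert(to_int_truncate(2.7) == 2);
    assert(to_int_nearest(2.7) == 3);
    assert(to_int_truncate(-2.7) == -2);
    assert(to_int_nearest(-2.7) == -3);
    return 0;
}
```

               Values with a fractional part above one half are where the two
               behaviours disagree, so code that depends on truncation (for example
               when computing array indices) is the code most likely to be affected.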

-Qunroll[n]     Specifies the maximum number of times to unroll a loop.
               Use n = 0 to disable the unroller.
               If n is not specified, the compiler decides whether to unroll
               and automatically chooses the maximum number of times to unroll a loop.
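
               For readers unfamiliar with the transformation, loop unrolling can
               be sketched by hand in C. The function names and the unroll factor
               of four are hypothetical; the compiler chooses its own factor and
               may generate different code.

```c
/* Straightforward loop: one addition per iteration. */
long sum_simple(const int *v, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* The same loop unrolled four times by hand. The main loop handles four
   elements per iteration; the second loop handles the remainder. */
long sum_unrolled4(const int *v, int n)
{
    long sum = 0;
    int i = 0;

    for (; i + 3 < n; i += 4)
        sum += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];

    for (; i < n; i++)
        sum += v[i];

    return sum;
}
```

               The unrolled version executes fewer branch and counter updates per
               element at the cost of larger code, which is the trade-off that the
               n argument bounds.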


-Zp{1|2|4|8|16}   Specifies the strictest alignment constraint for structure and 
               union types as one of the following: 1, 2, 4, 8, or 16 (default)
               bytes.
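
               The effect of a stricter packing value on structure layout can be
               sketched with #pragma pack, a widely supported compiler extension
               that sets the packing for individual declarations. Treating it as
               equivalent to -Zp1 for the struct below is an assumption made for
               illustration, and the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after 'tag' so that
   'value' is aligned. */
struct record_natural {
    char   tag;
    double value;
};

/* 1-byte packing: no padding between members. */
#pragma pack(push, 1)
struct record_packed1 {
    char   tag;
    double value;
};
#pragma pack(pop)

size_t natural_size(void)        { return sizeof(struct record_natural); }
size_t packed_size(void)         { return sizeof(struct record_packed1); }
size_t packed_value_offset(void) { return offsetof(struct record_packed1, value); }
```

               With 1-byte packing the double starts at offset 1 and the struct
               occupies 1 + sizeof(double) bytes; the default layout is at least as
               large, and usually larger.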

Description of compiler flags for Intel FORTRAN Compiler 8.1
-------------------------------------------------------------
-fast   The -fast option enhances execution speed across the entire program by 
          including the following options that can improve run-time performance: 

               -O3 (maximum speed and high-level optimizations). 
               -Qipo (enables interprocedural optimizations across files). 
               -QxP (specific optimization for Intel Pentium 4 processor with Streaming 
                    SIMD Extensions 3). The -fast option does not include -QxP when 
                    compiling on Itanium(R)-based systems. 

          To override one of the options set by -fast, specify that option after the -fast 
          option on the command line. To target -fast optimizations for a specific 
          processor, use one of the -Qx options. For example:

               prompt>ifort -fast -QxW source_file.f90 

          The options set by -fast may change from release to release.

-Gs[n] Disables stack-checking for routines with n or more bytes of local variables 
          and compiler temporaries. Default: n=4096 

-inline:speed   Enable speed optimizations (same as -Ob2 -Ot)

-[no]f77rtl   Specifies whether the FORTRAN-77-specific run-time support should be
          used.

-[no]fpp   Determines whether the Fortran preprocessor is run on source files 
          prior to compilation.    

-O1    Optimizes to favor code size and code locality. Disables 
          loop unrolling. /O1 may improve performance for applications 
          with very large code size, many branches, and execution time 
          not dominated by code within loops. In most cases /O2 is 
          recommended over /O1. 

          IA-32 systems: Enables options /Og, /Oi-, /Os, /Oy, /Ob1, 
          and /Gs. Disables intrinsics inlining to reduce code size.

-O2    This is the default level of optimization.  
          Optimizes for code speed. This is the generally recommended 
          optimization level.

          IA-32 systems: Enables options /Og, /Os, /Oy, /Ob1, and /Gs.


-O3    Enables /O2 optimizations and more aggressive optimizations such 
          as loop and memory access transformation. The /O3 optimizations 
          may slow down code in some cases compared to /O2 optimizations. 
          Recommended for applications that have loops with heavy use of 
          floating-point calculations and process large data sets.

          IA-32 systems: In conjunction with /Qax{K|W|N|B|P} and /Qx{K|W|N|B|P} 
          options, this option causes the compiler to perform more aggressive data 
          dependency analysis than for /O2. This may result in longer compilation times.

-Oa[-] Assumes [does not assume] no aliasing in the program.

-Obn  Controls the compiler's inline expansion. The amount of inline expansion 
          performed varies with the value of n as follows: 
               0: Disables inlining. 
               1: Enables (default) inlining of functions declared with the __inline 
                   keyword. Also enables inlining according to the C++ language. 
               2: Enables inlining of any function. However, the compiler decides 
                   which functions to inline. Enables interprocedural optimizations and 
                   has the same effect as /Qip. 

-Og    Enables global optimizations.

-Oi[-] Enables [disables] inline expansion of intrinsic functions. 

-Os    Enables most speed optimizations, but disables optimizations that increase 
          code size for a small speed benefit.

-Ot    Enables all speed optimizations.

-Oy[-] Enables [disables] the use of the EBP register in optimizations. When you 
          disable with /Oy-, the EBP register is used as frame pointer.

-Qansi_alias     Enables (default) or disables the assumption that the
          program adheres to the ANSI Fortran type aliasability rules. For 
          example, an object of type real cannot be accessed as an integer. 
          See the ANSI standard for the complete set of rules.

-Qauto   Causes all variables to be allocated on the stack, rather than in local 
          static storage. Does not affect variables that have the SAVE attribute or 
          appear in an EQUIVALENCE statement or common block; same as 
          -automatic, -auto or -4Ya. Opposite of -Qsave.

          If -recursive or -Qopenmp is specified, the default is -Qauto.

-Qax<codes> Generate code specialized for processor extensions 
          specified by <codes> while also generating generic IA-32 code. 
          <codes> includes one or more of the following characters:
               i  Pentium Pro and Pentium II processor instructions
               M  MMX(TM) instructions
               K  Streaming SIMD Extensions (implies i and M above)
               W  Pentium 4 processor with Streaming SIMD Extensions 2 
                    (implies i, M and K)
               N  Pentium 4 processor with Streaming SIMD Extensions 2 
               P  Pentium 4 processor with Streaming SIMD Extensions 3 
    
-Qx<codes>  Generate specialized code to run exclusively on processors
           supporting the extensions indicated by <codes> as 
           described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}   The /QxN and /QxP options target your program to run on Intel Pentium
          4 and compatible Intel processors.  The resulting code might 
          contain unconditional use of features that are not supported 
          on other processors.  Programs in which the function main() is 
          compiled with one of these options detect non-compatible processors 
          and generate an error message during execution.  These options 
          also enable new optimizations in addition to Intel 
          processor-specific optimizations.

          These options also enable advanced data layout and code restructuring
          optimizations to improve memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Qip        Enables single-file interprocedural (IP) optimizations (within files; same as -Ob2).

-Qipo[n]    Enables multifile interprocedural (IP) optimizations (between files). 
          When you specify this option, the compiler performs inline function 
          expansion for calls to functions defined in separate files.

          n is an optional integer that specifies the number of object files the compiler 
          should create. Any integer greater than or equal to 0 is valid.

          If n is 0, the compiler decides whether to create one or more object files based 
          on an estimate of the size of the object file. It generates one object file for small 
          applications, and two or more object files for large applications.

          If n is greater than 0, the compiler generates n object files, unless n exceeds the 
          number of source files (m), in which case the compiler generates only m object files.

          If you do not specify n, the default is 1.

          Multi-file IP optimizations include:
               - inline function expansion
               - interprocedural constant propagation
               - monitoring module-level static variables
               - dead code elimination
               - propagation of function characteristics
               - multifile optimization
               - passing arguments in registers
               - loop-invariant code motion

-Qoption,tool,list   Passes an argument list to another program in the
          compilation sequence, such as the assembler or linker. The 
          parameter 'tool' can be:

               fpp    Specifies the Intel Fortran preprocessor
               f         Specifies the Fortran compiler
               asm    Specifies the assembler
               link    Specifies the linker

          -Qoption can be used with the -Qipo flag to refine IPO. The valid options
          that can be used for this purpose are:

               -ip_args_in_regs=0   Disables the passing of arguments in registers.

               -ip_ninl_max_stats=n   Sets the valid maximum number of intermediate
                    language statements for a function that is expanded inline. The
                    number n is a positive integer. The number of intermediate language
                    statements usually exceeds the actual number of source language
                    statements. The default value for n is 230. The compiler uses a larger
                    limit for user inline functions.

               -ip_ninl_min_stats=n   Sets the valid minimum number of intermediate
                    language statements for a function that is expanded inline. The
                    number n is a positive integer. The default value of
                    ip_ninl_min_stats is:
                         IA-32 compiler: ip_ninl_min_stats = 7

               -ip_ninl_max_total_stats=n   Sets the maximum increase in size of a function,
                    measured in intermediate language statements, due to inlining. n is a
                    positive integer whose default value is 2000.

-Qparallel   Automatically detects loops capable of being executed safely in parallel 
          and generates multithreaded code for these loops.

-Qprec   Improves floating-point precision. Some speed impact.

-Qprefetch[-]   Enables [disables] the insertion of software prefetching by the
          compiler (requires -O3). Default is /Qprefetch. 

-Qprof_gen   Instruments the program for profiling, the first phase of 
          two-phase profile-guided optimization.

-Qprof_use   Instructs the compiler to produce a profile-optimized 
          executable and to merge available dynamic information (.dyn) 
          files into a pgopti.dpi file. If you perform multiple 
          executions of the instrumented program, -Qprof_use merges 
          the dynamic information files again and overwrites the 
          previous pgopti.dpi file.
          Without any other options, the current directory is 
          searched for .dyn files.

-Qrcd   Enables fast float-to-int conversion.

-Qscalar_rep[-]   Enables [disables] scalar replacement performed during loop 
          transformations (requires -O3).

-Qunroll[n]     Specifies the maximum number of times to unroll a loop.
                Use n = 0 to disable the unroller.
                If n is not specified, the compiler decides whether to unroll
                and automatically chooses the maximum number of times to unroll a loop.

Additional Libraries Used 
-------------------------

Supplied by MicroQuill:

shlW32M.lib:    MicroQuill SmartHeap Library 7.0 available from 
                http://www.microquill.com/

Other Notes: 
------------
"/" and "-" are both allowable starting tokens for flags passed to the 
compiler; that is, -QxK and /QxK are identical switches. 


Portability options for CPU2000:
-------------------------------
176.gcc:     
   -Dalloca=_alloca       : so as to use the built-in optimized alloca
   -F10000000             : 176.gcc uses alloca, and this option tells
                            the linker to pre-allocate 10000000 bytes of 
                            stack. The default amount of stack allocated 
                            is not enough, and without this option 176.gcc 
                            crashes with a run-time error.

178.galgel: 
   -FI                    : Fixed-format F90 source code. 
   -F32000000             : As with 176.gcc, pre-allocates a 32 MB 
                            stack.

186.crafty: 
   -DNT_i386              : Specifies that it is a Windows NT Intel 
                            processor-based system which makes the compiler 
                            use "long long" as the 64-bit variable that 
                            186.crafty needs.        

253.perlbmk: 
   -DSPEC_CPU2000_NTOS    : Enables the code changes needed for porting 
                            to Windows. 
   -DPERLDLL              : On Windows, we need a perl.exe instead of a 
                            perl.exe and perl.dll. This pre-define enables 
                            the changes necessary to build a single, 
                            UNIX-style executable, avoiding the 
                            indirect calls that can cause a 10% performance 
                            degradation. This allows the Windows-based 
                            executable to be as close as possible to 
                            the UNIX-based one. 
   -MT                    : Use the static multi-threaded library; 
                            otherwise the benchmark will not compile.

254.gap:
   -DSYS_HAS_CALLOC_PROTO :  
   -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that prototypes 
                            for calloc and malloc exist.