===================================================
HP-UX Flag Descriptions for CPU2000 - June 2004
===================================================


-----------------------------------------------------------------
Common Flags for HP-UX F90 Compiler, C Compiler and aCC Compiler
   Compiler specific flags are mentioned below or in other notes
-----------------------------------------------------------------

+Olevel        Invoke optimizations selected by level.  Defined
               values for level are:

                    0    Perform minimal optimizations.
                    1    Perform optimizations within basic
                         blocks only.  This is the default.
                    2    Perform level 1 and global
                         optimizations.  Same as -O.
                    3    Perform level 2 as well as
                         interprocedural global optimizations
                         within translation units.
                    4    Perform level 3 as well as doing
                         interprocedural optimizations across
                         translation units (link time
                         optimizations).  Requires concurrent use
                         of the +Oprofile=use option.
                         NOTE: Object files generated at this
                         level contain an intermediate
                         representation of the user code and are
                         intended to be temporary files.  These
                         intermediate object files are not
                         guaranteed to be compatible from one
                         version of the compiler to the next.


+O[no]datalayout
               Enables [disables] profile-driven layout of global
               and static data items to improve cache memory
               utilization. This option is currently ignored in
               the absence of the dynamic profile feedback
               option, +Oprofile=use.  The default is
               +Onodatalayout.


+O[no]dataprefetch
               Enable [disable] optimizations to generate data
               prefetch instructions for data structures
               referenced within innermost loops.
               +Odataprefetch is the same as
               +Odataprefetch=indirect.
               +Onodataprefetch is the same as
               +Odataprefetch=none.


+Oentrysched   Perform instruction scheduling on
               a subprogram's entry and exit code sequences.
               This option can be used at optimization level 1
               and higher.  The default is +Onoentrysched.


+Ofast         This option selects a combination of compilation
               and link options for optimum execution speed and
               reasonable build times.  Currently: +O2,
               +Onolimit, +Olibcalls, +Ofltacc=relaxed,
               +DSnative, +FPD, -Wl,+pi,4M, -Wl,+pd,4M and
               -Wl,+mergeseg.  This option is a synonym for
               -fast.  Some of the linker settings above can be
               changed with chatr(1).


+Ofaster       This option selects +Ofast, but with an increased
               optimization level.  For f90 the level is +O4.  
               For aCC, if used with +Oprofile=use the optimization 
               level will be +O4. Otherwise it will be +O3.


+Ofltacc=level Controls the level of floating point optimizations
               that the compiler may perform.  The defined values
               for level are:

                    default
                         Allows contractions, such as fused
                         multiply-add (FMA), but disallows any
                         other floating point optimization that
                         can result in numerical differences.

                    limited
                         Like default, but also allows floating
                         point optimizations which may affect the
                         generation and propagation of
                         infinities, NaNs, and the sign of zero.

                    relaxed
                         In addition to the optimizations allowed
                         by limited, permits optimizations, such
                         as reordering of expressions, even if
                         parenthesized, that may affect rounding
                         error.  This is the same as +Onofltacc.

                    strict
                         Disallows any floating point
                         optimization that can result in
                         numerical differences.  This is the same
                         as +Ofltacc.

               The default is +Ofltacc=default.
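
               The two kinds of numerical difference named above
               (contraction and reordering) can be reproduced with
               ordinary IEEE double arithmetic.  The sketch below
               is only an illustration: fused_multiply_add is a
               hypothetical helper that emulates a single-rounding
               FMA with exact rational arithmetic, and nothing here
               invokes the HP compilers.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Model an FMA: form a*b + c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Contraction: a*b + c with two roundings versus one.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
unfused = a * b + c                  # product is rounded to 1.0 first
fused = fused_multiply_add(a, b, c)  # exact product 1 - 2**-60 survives

# Reordering: the same three terms summed in a different grouping.
x, y, z = 1e16, -1e16, 1.0
left_to_right = (x + y) + z          # 0.0 + 1.0
regrouped = x + (y + z)              # y + z rounds back to -1e16

print(unfused, fused)
print(left_to_right, regrouped)
```

               The first pair differs because of contraction, which
               +Ofltacc=default already permits; the second pair
               differs because of regrouping, which the text above
               attributes to +Ofltacc=relaxed.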


+Olibcalls     NOTE: This option is deprecated and may not be
               supported in future releases.  On Itanium(R)-based
               HP-UX, including a system header file will cause the
               functions declared therein to be eligible for libcalls
               transformations, regardless of the state of
               +O[no]libcalls.


+O[no]initcheck
               Enable [disable] initialization to zero of any
               local, scalar, non-static variable that is
               uninitialized with respect to at least one path
               leading to its use.  This optimization can occur
               at optimization levels 2, 3, and 4.  The default
               is to enable initialization if the variable is
               uninitialized with respect to every path leading
               to its use.


+O[no]inline   Request [disable] inlining and cloning.  This option can be
               used at optimization level 3 and higher.  The
               default is +Oinline.


+O[no]inline=function1[,function2...]
               Enable [disable] optimizer inlining for the named
               functions.  This optimization can occur at
               optimization levels 3 and 4.  The default is +Oinline.
			

+Oinlinebudget=n  aCC(1)/cc(1)  
+Oinline_budget=n f90(1)
               Perform more aggressive inlining, where n
               specifies the degree of aggressiveness, as
               follows:

                    100    Default level of inlining.

                    > 100  More aggressive inlining at the
                           expense of compilation time and code
                           size.  The maximum for n is 1000000.

                    2 - 99 Less aggressive inlining.  The
                           optimizer gives more weight to
                           compilation time and code size when
                           determining whether to inline.

                    1      Inline only if it reduces code size.

               This option can be used at optimization level 3 or
               higher.


+O[no]limit    Suppress [do not suppress] optimizations that
               significantly increase compile-time or consume
               enormous amounts of memory.
               +Olimit is the same as +Olimit=min.
               +Onolimit is the same as +Olimit=none.


+Olimit=level  Controls the amount of compile-time spent
               performing optimization.  The defined values for
               level are:

                    default
                         Based on tuning heuristics, the
                         optimizer will spend a reasonable amount
                         of time processing large procedures.

                    min  For large procedures, the optimizer will
                         avoid non-linear time optimizations.

                    none The optimizer will fully optimize large
                         procedures, possibly resulting in
                         significantly increased compile time.

+O[no]loop_unroll[=unroll_factor]
               Enable [disable] loop unrolling. This optimization
               can occur at optimization levels 2, 3, and 4.  The
               default is +Oloop_unroll.  The unroll_factor
               controls code expansion; its default is 4, that
               is, four copies of the loop body.
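
               As a rough picture of what an unroll factor of 4
               means, the sketch below writes the transformation out
               by hand.  The function names are hypothetical and the
               code only illustrates the shape of the result (four
               copies of the body plus a remainder loop); it is not
               the compiler's actual output.

```python
def dot_rolled(xs, ys):
    """Reference loop: one multiply-add per iteration."""
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_unrolled4(xs, ys):
    """Same loop with four copies of the body per iteration."""
    total = 0
    n = len(xs)
    main_end = n - n % 4
    for i in range(0, main_end, 4):
        total += xs[i] * ys[i]
        total += xs[i + 1] * ys[i + 1]
        total += xs[i + 2] * ys[i + 2]
        total += xs[i + 3] * ys[i + 3]
    for i in range(main_end, n):  # leftover iterations when n % 4 != 0
        total += xs[i] * ys[i]
    return total


xs = list(range(1, 11))
ys = list(range(11, 21))
print(dot_rolled(xs, ys), dot_unrolled4(xs, ys))
```

               Fewer loop-control operations are executed per
               element, at the cost of a larger loop body, which is
               the code expansion that unroll_factor controls.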


+Oprefetch_latency=cycles
               +Oprefetch_latency applies to loops for which the
               compiler generates data prefetch instructions.
               cycles must be in the range of 0 to 10000.  A
               value of zero instructs the compiler to use the
               default value, which is 480 cycles for loops
               containing floating-point accesses and 150 cycles
               for loops that do not contain any floating-point
               accesses.  See the HP aC++ Online Programmer's
               Guide or HP C Online Help.
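
               The rule for the value 0 can be restated as a small
               function.  This is a sketch of the selection logic as
               described above; effective_latency and its parameter
               names are hypothetical, not part of any HP tool.

```python
DEFAULT_FP_CYCLES = 480      # loops containing floating-point accesses
DEFAULT_NON_FP_CYCLES = 150  # loops without floating-point accesses


def effective_latency(cycles, loop_has_fp_access):
    """Return the latency assumed for one loop, per the text above."""
    if not 0 <= cycles <= 10000:
        raise ValueError("cycles must be in the range 0 to 10000")
    if cycles == 0:  # zero means "use the default"
        if loop_has_fp_access:
            return DEFAULT_FP_CYCLES
        return DEFAULT_NON_FP_CYCLES
    return cycles


print(effective_latency(0, True), effective_latency(0, False))
print(effective_latency(600, False))
```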


+O[no]procelim Enable [disable] the elimination of functions that
               are not referenced by the application.  Only
               functions with the hidden export class may be
               eliminated.  The default is +Onoprocelim at
               optimization levels 0 and 1; at levels 2, 3 and 4,
               the default is +Oprocelim.


+O[no]ptrs_to_globals[=name1,name2,...,nameN]
               Tell the optimizer whether global variables are
               modified [are not modified] through pointers.
               This optimization can occur at levels 2, 3, and 4.
               The default is +Optrs_to_globals.


+O[no]recovery Generate [do not generate] recovery code for
               control speculation.  The default is +Onorecovery.


+Oshortdata[=size]
               All objects of size size bytes or smaller will be
               placed in the short data area, and references to such
               data will assume it resides in the short data area.
               Valid values of size are 0, or a decimal number between 8
               and 4,194,304 (4MB).  If no size is specified, all data
               is placed in the short data area.  If size is 0, no
               data will be placed in the short data area, and all
               data references will use long offsets.  The default is
               +Oshortdata=8.
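
               The placement rule can be summarized in a few lines.
               The sketch below models only what the paragraph above
               states; in_short_data and is_valid_size are
               hypothetical names, and size=None stands for
               +Oshortdata given without a value.

```python
MAX_SHORTDATA = 4 * 1024 * 1024  # 4,194,304 bytes (4MB)


def is_valid_size(size):
    """Valid sizes are 0, or 8 through 4,194,304."""
    return size == 0 or 8 <= size <= MAX_SHORTDATA


def in_short_data(object_size, size=8):
    """True if an object of object_size bytes goes in short data.

    size=8 mirrors the documented default, +Oshortdata=8.
    """
    if size is None:       # no size given: all data is short data
        return True
    if not is_valid_size(size):
        raise ValueError("size must be 0 or between 8 and 4194304")
    return size != 0 and object_size <= size


print(in_short_data(8), in_short_data(16))
print(in_short_data(16, size=None), in_short_data(4, size=0))
```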


+O[no]type_safety=[off|limited|ansi|strong]
               Enable [disable] aliasing across types.

                    off  The default.  Specifies that aliasing
                         can occur freely across types.  This is
                         a synonym for the +Onoptrs_ansi and
                         +Onoptrs_strongly_typed options in cc.

                    limited
                         Code follows ANSI aliasing rules, and
                         unnamed objects should be treated as if
                         they had an unknown type.

                    ansi
                         Code follows ANSI aliasing rules, and
                         unnamed objects should be treated the
                         same as named objects.  This option is
                         a synonym for the +Optrs_ansi option in
                         cc.

                    strong
                         Code follows ANSI aliasing rules, except
                         that accesses through lvalues of a
                         character type are not permitted to
                         touch objects of other types. This
                         assumes that field addresses are not
                         taken.  This option is a synonym for the
                         +Optrs_strongly_typed option in cc.


-Bprotected[=symbol[,symbol...]]
               The named symbols, or all symbols if no symbols are
               specified, are assigned the protected export class.
               That means these symbols will not be preempted by
               symbols from other load modules, so the compiler may
               bypass the linkage table for both code and data
               references and bind them to locally defined code and
               data symbols.


-Bprotected_data
               Marks only data symbols as having the protected export
               class.


+DDdata_model  Generate code using either the ILP32 or LP64 data
e.g.           model.  Defined values for data_model are:
+DD32 or +DD64

                    32   Use the ILP32 data model.  The sizes of
                         the int, long and pointer data types are
                         32 bits.

                    64   Use the LP64 data model.  The size of
                         the int data type is 32 bits, and the
                         sizes of the long and pointer data types
                         are 64 bits.  Defines __LP64__ to the
                         preprocessor.

               The default is +DD32.
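
               The practical consequence of the two models is which
               integer types can hold a pointer.  The sketch below
               tabulates the widths given above and checks that
               property; the helper names and the sample address are
               hypothetical and serve only as illustration.

```python
# Type widths in bits, taken from the +DD32 and +DD64 descriptions.
DATA_MODELS = {
    "+DD32": {"int": 32, "long": 32, "pointer": 32},  # ILP32
    "+DD64": {"int": 32, "long": 64, "pointer": 64},  # LP64
}


def pointer_fits_in(type_name, flag):
    """True if a pointer can be stored in type_name without truncation."""
    widths = DATA_MODELS[flag]
    return widths["pointer"] <= widths[type_name]


def truncate(value, bits):
    """Keep only the low-order bits, as a narrowing store would."""
    return value & ((1 << bits) - 1)


address = 0x0000000123456780  # made-up 64-bit address for illustration
for flag in DATA_MODELS:
    print(flag, "pointer fits in int:", pointer_fits_in("int", flag))
print(hex(truncate(address, DATA_MODELS["+DD64"]["int"])))
```

               Code that stores pointers in int variables therefore
               works under +DD32 but loses the upper bits under
               +DD64, which is one reason benchmarks may need
               LP64-specific portability flags.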


+DSmodel       Perform instruction scheduling appropriate for a
e.g.           specific implementation of the architecture.
+DSnative
               On IPF (the Itanium Processor Family), the defined
               values for model are:

                   blended   Tune for best performance on a
                             combination of processors (i.e.,
                             Itanium or Itanium 2 processor).

                   itanium   Tune for best performance on an
                             Itanium processor.

                   itanium2  Tune for best performance on an
                             Itanium 2 processor.

                   native    Tune for best performance on the
                             processor on which the compiler is
                             running.

               The default model is blended.

+FPflag        Specify how the environment for floating-point
e.g.           operations should be initialized at program
+FPD           start-up.  By default, all behaviors are disabled.
               The following flags are supported (upper case flag
               enables; lower case flag disables):

               D (d)     Enable sudden underflow (flush to zero)
                         of denormalized values.
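
               The difference between gradual and sudden underflow
               can be seen on any IEEE double.  The sketch below
               assumes the host running it uses gradual underflow
               (the usual default); flush_to_zero is a hypothetical
               model of the D behavior, not the hardware mechanism,
               and returning a signed zero is an assumption of this
               sketch.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022 for IEEE double


def flush_to_zero(x):
    """Model of sudden underflow: denormal values become signed zero."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


gradual = SMALLEST_NORMAL / 4    # a denormal under gradual underflow
sudden = flush_to_zero(gradual)  # what the sudden-underflow model keeps

print(gradual, sudden)
```

               Sudden underflow gives up the tiny values between zero
               and the smallest normal number, which is why results
               can differ slightly from a run without +FPD.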


-Wl,-asearch
e.g.           (ld option -a search) Specifies library search order.  
-Wl,-aarchive_shared
               Specify whether shared or archive libraries are
               searched with the -l option.  The value of search
               should be one of archive, shared, archive_shared,
               shared_archive, or default.  This option can
               appear more than once, interspersed among -l
               options, to control the searching for each
               library.  The default is to use the shared version
               of a library if one is available, or the archive
               version if not.

               If either archive or shared is active, only the
               specified library type is accepted.
          
               If archive_shared is active, the archive form is
               preferred, but the shared form is allowed.
          
               If shared_archive is active, the shared form is
               preferred but the archive form is allowed.
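
               The five search modes reduce to an ordered preference
               list.  The sketch below models one -l lookup under
               that reading; select_library and its arguments are
               hypothetical, and the real linker considers more than
               this (search paths, for example).

```python
def select_library(mode, has_archive, has_shared):
    """Return "archive", "shared", or None for one -l lookup."""
    preference = {
        "archive": ["archive"],
        "shared": ["shared"],
        "archive_shared": ["archive", "shared"],
        "shared_archive": ["shared", "archive"],
        "default": ["shared", "archive"],  # shared if available
    }[mode]
    available = {"archive": has_archive, "shared": has_shared}
    for kind in preference:
        if available[kind]:
            return kind
    return None  # no acceptable form of the library exists


print(select_library("archive_shared", True, True))
print(select_library("archive", False, True))
```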

-dynamic       Produces dynamically bound executables.  See -minshared
               for partially statically bound executables.  The
               default behavior is dynamic.

-exec          Indicates that any object files created will be used to
               create an executable file.  Constants with a protected
               or hidden export class are placed in the read-only data
               section.  This option also implies -Bprotected_def.

-minshared     Indicates that the result of the current
               compilation is going into an executable file that
               will make minimal use of shared libraries. Equivalent 
               to -exec -Bprotected -Wl,-a,archive_shared.


[Profile Feedback Related Options]

+Oprofile=collect
+Oprofile=collect[:<qualifiers>]
+I             Instrument the application for profile based
               optimization.  The profile collection <qualifiers>
               are:

                    arc  Collect arc counts (equivalent to
                         +Oprofile=collect).  This is the
                         default.

                    stride
                         Collect stride data.

                    all  Collect all types of profile data.
                         Equivalent to
                         +Oprofile=collect:arc,stride.

               <qualifiers> is a comma-separated list of profile
               collection qualifiers.
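
               The qualifier syntax can be read as follows.  The
               sketch below parses the option string using the colon
               form shown in the synopsis; collected_profiles is a
               hypothetical helper, and treating "stride" alone as
               collecting stride data only is an assumption.

```python
KNOWN_QUALIFIERS = {"arc", "stride"}


def collected_profiles(option):
    """Return the set of profile kinds requested by the option."""
    prefix = "+Oprofile=collect"
    if not option.startswith(prefix):
        raise ValueError("not a profile collection option")
    rest = option[len(prefix):]
    if rest == "":
        return {"arc"}  # arc counts are the default
    if not rest.startswith(":"):
        raise ValueError("qualifiers must follow a colon")
    kinds = set()
    for qualifier in rest[1:].split(","):
        if qualifier == "all":
            kinds |= KNOWN_QUALIFIERS
        elif qualifier in KNOWN_QUALIFIERS:
            kinds.add(qualifier)
        else:
            raise ValueError("unknown qualifier: " + qualifier)
    return kinds


print(sorted(collected_profiles("+Oprofile=collect")))
print(sorted(collected_profiles("+Oprofile=collect:all")))
```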


+Oprofile=use
+P             Optimize the application based on profile data found in
               the database file flow.data, produced by compilation
               with +I.  +P is equivalent to +Oprofile=use or
               +Oprofile=use:filename.  See ld(1), +I, and +df, for
               more details. The +P option is incompatible with the +I
               and -S options.  It is incompatible with the -g option
               only during compile time.


-----------------------------------------------
Specific Flags for HP-UX F90 Compiler
-----------------------------------------------
+cat           Concatenates all source files of the same source
               form together, then compiles the concatenated
               source all at once. This enables inlining at +O3
               within the concatenated file.


-----------------------------------------------
Specific Flags for HP-UX C and aCC Compiler
-----------------------------------------------
-AOe           In addition to specifying the extended ANSI C language dialect
               as per -Ae (the default), allows the optimizer to aggressively
               exploit the assumption that the source code conforms to the ANSI
               programming language C standard ISO 9899:1990 plus the
               extensions.  At present, the effect is to make
               +Otype_safety=ansi the default (it can of course be overridden).
               As new independently-controllable optimizations are developed
               that depend on the extended ANSI C standard, the flags that
               enable those optimizations may also become the default under
               -AOe.


-Ae            Turns on ANSI C (C89) mode.  This option allows
               compilation of C89-compatible C source programs,
               just as the C compiler does.


+inline_level [i]num
               This option controls how C/C++ inlining hints influence
               aCC or cc.  Specify num as 0, 1, 2, or 3.

               num  Meaning

               0    No inlining is done (same effect as the +d option).
               1    Only small functions are inlined.
               2    Only large functions are not inlined.
               3    Inlining hints are respected in all cases,
                    except when the called function is recursive or
                    when it has a variable number of arguments.

               The default level depends on +Olevel as shown in the
               following table:

               level   num

               0       1
               1       1
               2       2
               3       2
               4       2

               If i is also specified, then implicit inlining is
               invoked for "small" functions without the inline
               keyword.

               NOTE: This option controls functions declared with the
               inline keyword or within the class declaration and is
               effective at all optimization levels.  The options
               +Oinline and +Oinlinebudget control the high level
               optimizer that recognizes other opportunities in the
               same source file (+O3) or amongst all source files
               (+O4).
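
               The argument form and the default table can be
               combined into one lookup.  The sketch below is a
               hypothetical model of the rules stated above, not
               code from either compiler.

```python
# Default num for each +Olevel, copied from the table above.
DEFAULT_NUM = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2}


def parse_inline_level(arg):
    """Split an "[i]num" argument into (implicit, num)."""
    implicit = arg.startswith("i")
    num = int(arg[1:] if implicit else arg)
    if num not in (0, 1, 2, 3):
        raise ValueError("num must be 0, 1, 2, or 3")
    return implicit, num


def inline_level(olevel, arg=None):
    """Effective (implicit, num) for a given +Olevel and optional arg."""
    if arg is None:
        return False, DEFAULT_NUM[olevel]
    return parse_inline_level(arg)


print(inline_level(0), inline_level(3))
print(inline_level(2, "i3"))
```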


-----------------------------------------------
Other descriptions
-----------------------------------------------

-llapack       Link in highly tuned math library functions found in the
               LAPACK library.  B6061AA (HP MLIB) is an optional HP 
               product which contains the LAPACK library.


effmem.o       Replacement for malloc/free that assumes ANSI compliance,
               improves spatial locality, and minimizes memory usage
               by not maintaining a free list.


fastmem.o      Replacement for malloc/free that assumes ANSI compliance.

-----------------------------------------------
Descriptions of Portability Flags
-----------------------------------------------

+source={fixed|free|default}
               Accept source files in fixed format
               (+source=fixed) or free format (+source=free).
               The default, +source=default, is free for .f90
               files and fixed for .f and .F source files.

176.gcc
  -DHOST_WORDS_BIG_ENDIAN : controls how bytes are numbered within 
                            a word. 

181.mcf
  -DWANT_STDC_PROTO : allows use of the designated prototype.

186.crafty
  -DHP : selects header files and code paths compatible with HP-UX.

252.eon
  -DFMAX_IS_DOUBLE    : function fmax returns a double
  -DNDEBUG            : do not include debug code
  -DSPEC_CPU2000_LP64 : use code to make longs and pointers 64 bit

253.perlbmk
  -DSPEC_CPU2000_HP : Compile the SPEC CPU2000 modified perl for an 
                      HP-UX system.

254.gap
  -DSYS_HAS_CALLOC_PROTO : allows use of the designated prototype
  -DSYS_HAS_IOCTL_PROTO  : allows use of the designated prototype
  -DSYS_HAS_TIME_PROTO   : allows use of the designated prototype
  -DSPEC_CPU2000_HP      : selects header files and code paths
                           compatible with HP-UX.
  -DSYS_IS_USG           : Compile for a USGish system.
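
The byte numbering that -DHOST_WORDS_BIG_ENDIAN (176.gcc above)
refers to can be shown with a single 32-bit word.  The sketch below
is only an illustration of the two byte orders; WORD is an arbitrary
sample value, and the last line reports the order of whatever host
runs the sketch, not of the HP-UX system under test.

```python
import struct
import sys

WORD = 0x0A0B0C0D  # arbitrary sample value

big = struct.pack(">I", WORD)     # most significant byte first
little = struct.pack("<I", WORD)  # least significant byte first

print("big-endian   :", big.hex())
print("little-endian:", little.hex())
print("this host    :", sys.byteorder)
```

On a big-endian host, byte 0 of the word holds 0x0A; on a
little-endian host it holds 0x0D.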

-----------------------------------------------
Descriptions of Kernel Tunables 
-----------------------------------------------
(Unless otherwise noted, units are in bytes)

dbc_max_pct    Maximum dynamic buffer cache size as a percent of system memory

dbc_min_pct    Minimum dynamic buffer cache size as a percent of system memory

maxdsiz        Maximum data size

maxdsiz_64bit  Maximum data size for 64 bit applications

maxssiz        Maximum stack size

maxssiz_64bit  Maximum stack size for 64 bit applications

maxtsiz        Maximum text segment size

maxtsiz_64bit  Maximum text segment size for 64 bit applications

vps_ceiling    Maximum System-Selected Page Size (in Kbytes)

vps_pagesize   Default user page size (in Kbytes)

swapmem_on     Swap to memory flag.

-----------------------------------------------
Descriptions of Other Options and Commands
-----------------------------------------------

mpsched        Control the processor or locality domain on which a
               specific process executes. (HP-UX command)

tmplog         In tmplog mode, the intent log is almost always delayed.
               This improves performance, but recent changes may
               disappear if the system crashes.  This mode is only
               recommended for temporary file systems.

nolog          nolog is an alias for tmplog.

convosync=delay
               Alters the caching behavior of the file system for O_SYNC
               and O_DSYNC I/O operations.
               The delay value delays O_SYNC or O_DSYNC writes so that
               they do not take effect immediately.  With this option,
               VxFS changes O_SYNC or O_DSYNC writes into delayed
               writes.  No special action is performed when closing a
               file.  This option effectively cancels data integrity
               guarantees normally provided by opening a file with
               O_SYNC or O_DSYNC.

mincache=tmpcache 
               Alter the caching behavior of the file system.
               The tmpcache value disables delayed extending writes,
               trading off integrity for performance.  When this option
               is chosen, VxFS does not zero out new extents allocated
               as files are sequentially written.  Uninitialized data
               may appear in files being written at the time of a system
               crash.