--------------------------------------------------------------------------------
Fujitsu PRIMEPOWER flags/tunables description                   (May.06 2003)
--------------------------------------------------------------------------------
Fujitsu Parallelnavi 1.0.2 compiler flag description            (May.06 2003)

Compiler options      Remark
--------------------------------------------------------------------------------
-dy/-dn               Specifies dynamic (-dy) or static (-dn) linkage of
                      libraries. -dy is the default, except that -dn is the
                      default when -Kfast_GP=n (n>=3) is specified and
                      -Klargepage is not.

-Kbcopy               Converts memory copy loops into calls to memmove or memcpy.

-Kcfunc               Uses the high-speed mathematical functions and library
                      functions (malloc, calloc, realloc, free) provided by
                      this compilation system.

-Kfast_GP[={0|1|2|3|4|5}]
                      Performs optimization for the SPARC64 GP series.

                      0: Performs optimization suitable for SPARC64 GP.

                      1: Generates multiply-and-add instructions in
                      addition to -Kfast_GP=0. (default)

                      2: Reorders expression evaluation in addition to
                      -Kfast_GP=1.

                      3: (C and C++ compilers)
                      Performs crossfile optimization and interprocedural
                      optimization in addition to -Kfast_GP=2.
                      (Fortran compiler)
                      Performs loop restructuring optimization in
                      addition to -Kfast_GP=2.

                      4: (C and C++ compilers only)
                      Performs advanced branch optimization in
                      addition to -Kfast_GP=3.

                      5: (C and C++ compilers only)
                      Performs global instruction scheduling
                      optimization suited to scientific applications in
                      addition to -Kfast_GP=3.

-KGREG                The global registers g2 through g7 are subject to register
                      allocation in the compile stage.

-Klargepage           Generates an executable program that uses the
                      Parallelnavi largepage facility.
                      This option expands the virtual memory page size
                      from the original 8KB to 4MB.

-Knounroll            Prevents loop unrolling optimizations.

-Kpg                  Generates instructions to produce a profile file for
                      subsequent optimization (global instruction scheduling
                      etc.).

-Kpopt                Optimizes accesses to data pointed to by pointers, under
                      the restricted assumption that areas referenced through
                      pointers are referenced only through pointers.

-Kpu[=file]           Performs optimization (global instruction scheduling,
                      etc.) using runtime profile information obtained with
                      the -Kpg option.

-Kprefetch[={1|2|3|4}]
                      Generates prefetch instructions according to the
                      specified prefetch level.

                      1: Basic-level prefetch, for array elements in the
                      innermost loop only.

                      2: In addition to -Kprefetch=1, generates prefetch
                      instructions in the loop pre-header for array
                      elements accessed in the first iteration of the
                      loop.

                      3: In addition to -Kprefetch=2, when the access
                      stride for array elements is larger than the cache
                      line size, the compiler generates a prefetch
                      instruction for each cache-line-sized access.

                      4: Maximum level of prefetch generation.
                      In addition to -Kprefetch=3, the compiler generates
                      prefetch instructions for array elements accessed
                      in outer loops.

-Kstaticclump         When this option is specified, all variables in the
                      source file which are declared as static or global
                      (except large arrays) are compiled together
                      (renamed to avoid conflicts if necessary) into one
                      big structure and can therefore be addressed
                      with the base address of the structure and an index
                      from this base address.

-Kuse_rodata          Specifies whether string constants, floating-point
                      constants, and initialization values of aggregate-type
                      local variables are allocated in the read-only data
                      section.

-Kxi=N                Performs inline expansion instead of function calls.
                      The functions to expand are selected from profiler
                      results. N is the permitted increase in object size,
                      as a percentage.

-O[level]             Specifies the optimization level.

                      0: No optimization.
                      1: Basic optimization.
                      2: Loop unrolling in addition to -O1.
                      3: Global instruction scheduling and restructuring of
                         nested loops in addition to -O2.
                      4: More aggressive loop restructuring optimization
                         than -O3.

--------------------------------------------------------------------------------
Fujitsu Parallelnavi 2.1 compiler flag description              (May.06 2003)

Compiler options      Remark
--------------------------------------------------------------------------------

-Am                   Required if a source file contains modules which will 
                      be referenced by USE statements in other source files
                      or if a source file contains USE statements that reference
                      modules in another source file.

-dy/-dn               Specifies dynamic (-dy) or static (-dn) linkage of
                      libraries. -dy is the default, except that -dn is the
                      default when -Kfast_GP=n (n>=3) is specified and
                      -Klargepage is not.

-f omitmsg            Sets the level of diagnostic messages that are output
                      and inhibits specific messages.

                      omitmsg is one of i, w, or s, and/or a list of msgnum.

                      i :   All messages are output. This is the default.

                      w :   i level messages are not output.

                      s :   i and w level messages are not output.

                 msgnum :   Message number msgnum is inhibited. msgnum must
                            be an i or w level message.

-Fixed                Specifies that Fortran source programs are written in
                      fixed source form.

-fs                   Do not print any warnings or diagnostic messages other
                      than fatal errors.

-Kalignc[=N]
                      Aligns global data entries on an N-byte boundary.
                      N can be specified from 1 to 32768.

-Kalignl[=N]
                      Aligns local data entries on an N-byte boundary.
                      N can be specified from 1 to 32768.

-Karraypad_const[=N]  Insert padding elements after each row of an array whose
                      size is declared with constants for efficient use of cache.

-Kauto                Local variables (without an initial value or the SAVE
                      attribute) are allocated on the runtime stack. Their
                      values are lost when the  procedure ends.

-Kcfunc               Uses the high-speed mathematical functions and library
                      functions (malloc, calloc, realloc, free) provided by
                      this compilation system.

-Kcommonpad[=N]       Insert padding elements in common blocks for efficient
                      use of cache.  N can be specified from 4 to 4096. 

-Kcrossfile           Enables crossfile optimization.
                      If a program consists of several files, the compiler
                      examines these files together and analyzes data
                      dependencies and control relationships across them.

-Kfast_GP2[={0|1|2}]
                      Performs optimization for the SPARC64 GP2 series.

                      0: Performs optimization suitable for SPARC64 GP2.

                      1: Generates multiply-and-add instructions in
                      addition to -Kfast_GP2=0. (default)

                      2: Reorders expression evaluation in addition to
                      -Kfast_GP2=1.

-KFMADD               Enables use of the combined multiply-add/subtract
                      floating-point instructions.

-Kfrecipro            Converts floating-point division into multiplication
                      by the reciprocal.

-Kfuse                Fuses neighboring loops.

-KGREG_SYSTEM         The global registers g5 through g7 are subject to register
                      allocation in the compile stage.

-Kgs                  Performs global instruction scheduling.

-Kilfunc              Replaces the single and double precision mathematical
                      functions sin, cos, log10, log and exp with compiler
                      builtin functions.

-Klargepage           Generates an executable program that uses the
                      Parallelnavi largepage facility.

-KNOFMADD             This option suppresses use of the combined multiply-
                      add/subtract floating-point instructions.

-Knoprefetch          Suppresses use of prefetch instructions.

-Knounroll            Prevents loop unrolling optimizations.

-Knovfunc             Suppresses conversion of intrinsic functions
                      (including power operations) to multi-operation functions.

-Kpreex               Optimizes by moving the evaluation of invariant
                      expressions across branches.

-Kpopt                Optimizes accesses to data pointed to by pointers, under
                      the restricted assumption that areas referenced through
                      pointers are referenced only through pointers.

-Kpg                  Generates instructions to produce a profile file for
                      subsequent optimization (global instruction scheduling
                      etc.).

-Kprefetch[={1|2|3|4}]
                      Generates prefetch instructions according to the
                      specified prefetch level.

                        1: Basic-level prefetch, for array elements in the
                           innermost loop only.

                        2: In addition to -Kprefetch=1, generates prefetch
                           instructions in the loop pre-header for array
                           elements accessed in the first iteration of the
                           loop.

                        3: In addition to -Kprefetch=2, when the access
                           stride for array elements is larger than the cache
                           line size, the compiler generates a prefetch
                           instruction for each cache-line-sized access.

                        4: Maximum level of prefetch generation.
                           In addition to -Kprefetch=3, the compiler generates
                           prefetch instructions for array elements accessed
                           in outer loops.

-Kprefetch_cache_level=[1/2/3]  LEVEL-1: Generate prefetch instructions in order
                                for the data to reside in the primary cache.

                                LEVEL-2: Generate prefetch instructions in order
                                for the data to reside only in the secondary cache.
                                
                                LEVEL-3: Generate both level-1 and -2 prefetch
                                instructions.

-Kprefetch_infer                For data prefetch control, in cases where 
                                the stride of array accesses is not clear from
                                static analysis of the source files, 
                                the compiler is told to use its internal 
                                heuristics for the addresses that have been
                                determined by the compiler as prefetch 
                                addresses.

-Kprefetch_iteration=N   Generates prefetch instructions for data that will be
                         referenced N iterations later.

-Kprefetch_line=N     Generates prefetch instructions for data located N
                      primary-data-cache lines (N times the line size in bytes)
                      ahead of the address referenced by a neighboring load or
                      store instruction.

-Kprefetch_line_L2=N  Same as -Kprefetch_line=N, except that the data is
                      brought only into the secondary cache.

-Kpreload             Moves load instructions across branches.

-Kpreschedule_length[=N]
                      When -O5 is used, the instruction scheduler runs
                      twice, before and after register allocation; the two
                      passes are called pre-pass and post-pass scheduling
                      respectively. This option controls how aggressively
                      pre-pass scheduling works. N is the upper limit on the
                      distance an instruction can move from its original
                      place.

-Kpu[=file]           Performs optimization (global instruction scheduling,
                      etc.) using runtime profile information obtained with
                      the -Kpg option.

-Kunroll[=N]          Performs loop unrolling. N is the upper limit on the
                      unrolling factor and must be from 2 to 9999.

-O[level]             Specifies the optimization level.

                       0: No optimization.
 
                       1: Basic optimization.
  
                       2: Loop unrolling in addition to -O1.
 
                       3: Global instruction scheduling and restructuring of
                          nested loops in addition to -O2.

                       4: More aggressive loop restructuring optimization
                          than -O3.

                       5: Creates an object program by applying further
                          optimizations of register allocation in addition    
                          to -O4.

-SSL2                  The whole set of routines from SSL II, SSL II Thread-
                       Parallel Capabilities and BLAS/LAPACK becomes part of
                       link-edit libraries.

-x-                   Inline expansion, instead of function calls, is performed
                      for all functions defined in the C source code.

-x stm_no             Performs inline expansion of user-defined external
                      procedures that have fewer executable statements than
                      the number given by the stm_no argument.

-x dir=directory_name  Performs inline expansion of procedures defined in the files
                       under the specified directory and in the file currently
                       being compiled.

--------------------------------------------------------------------------------
Sun C, C++ and Fortran Sun ONE Studio 8                         (May.06 2003)

Compiler options        Remark
--------------------------------------------------------------------------------

-array_pad_rows,<n>     Enable padding of arrays by n.
(Fortran)

autoup=<n>              When the file system flush daemon fsflush runs,
(Unix)                  it will write to disk all modified file buffers
                        that are more than n seconds old. 

cc                      Invoke the Sun ONE Studio 8 Compiler C 
(C compiler)

CC                      Invoke the Sun ONE Studio 8 Compiler C++ 
(C++ compiler)

-crit                   Enable optimization of critical control paths 
(optimizer)

-dalign                 Assume data is naturally aligned. 
(C, C++, Fortran)

-Dalloca=__builtin_alloca       Portability switch, used for 176.gcc:
(Portability: SPEC Tools)       allow use of compiler's internal builtin alloca.

-depend                 Synonym for -xdepend.
(Fortran)

-DHOST_WORDS_BIG_ENDIAN         Portability switch, used for 176.gcc:
(Portability: SPEC Tools)       controls how bytes are numbered within a word. 

-D__MATHERR_ERRNO_DONTCARE      Allows the compiler to assume that your code
(C)                             does not rely on setting of the errno variable.

-DSPEC_CPU2000_SOLARIS          Portability switch, used for 253.perlbmk:
(Portability: SPEC Tools)       selects header files and code paths compatible
                                with Solaris.

-DSUN                           Portability switch, used for 186.crafty:
(Portability: SPEC Tools)       selects header files and code paths
                                compatible with Solaris.

-DSYS_HAS_CALLOC_PROTO          Portability switch, used for 254.gap:
(Portability: SPEC Tools)       allows use of the designated prototype.

-DSYS_HAS_IOCTL_PROTO           Portability switch, used for 254.gap:
(Portability: SPEC Tools)       allows use of the designated prototype.

-DSYS_HAS_SIGNAL_PROTO          Portability switch, used for 254.gap: 
(Portability: SPEC Tools)       allows use of the designated prototype.

-DSYS_HAS_TIME_PROTO            Portability switch, used for 254.gap:
(Portability: SPEC Tools)       allows use of the designated prototype.

-DSYS_IS_USG                    Portability switch, used for 254.gap:
(Portability: SPEC Tools)       selects code compatible with USG-based systems. 

-e                              Portability switch, used for 178.galgel:
(Portability, Fortran)          allows source lines to be up to 132 characters long. 

f90                     Invoke the Sun ONE Studio 8 Compiler Fortran 90
(Fortran compiler)

-fast                   A convenience option, this switch selects the
(C)                     following switches that are defined elsewhere
                        in this page: 

        -D__MATHERR_ERRNO_DONTCARE
        -fns
        -fsimple=2
        -fsingle
        -ftrap=%none
        -xalias_level=basic
        -xarch=generic
        -xbuiltin=%all
        -xcache=generic
        -xchip=generic 
        -xdebugformat=stabs
        -xdepend
        -xlibmil
        -xmemalign=8s
        -xO5
        -xprefetch=auto,explicit

-fast                   A convenience option, this switch selects the
(C++)                   following switches that are defined elsewhere
                        in this page: 

        -dalign
        -fns
        -fsimple=2
        -ftrap=%none
        -xarch
        -xlibmil
        -xlibmopt
        -xmemalign
        -xO5
        -xtarget=native
        -xbuiltin=%all

-fast                   A convenience option, this switch selects the
(Fortran)               following switches that are defined elsewhere
                        in this page: 

        -dalign
        -depend
        -fns
        -fsimple=2 
        -ftrap=common 
        -xlibmil 
        -xlibmopt 
        -xO5 
        -xpad=local 
        -xprefetch=auto,explicit 
        -xtarget=native
        -xvector=yes       

-fixed                  Portability switch, used for 178.galgel:
(Portability, Fortran)  assume fixed-format source input.

-fns                    Selects faster (but nonstandard) handling of
(C, C++, Fortran)       floating point arithmetic exceptions and
                        gradual underflow.

-fsimple=<n>            Controls simplifying assumptions for
(C, C++, Fortran)       floating point arithmetic:

        -fsimple=0      Permits no simplifying assumptions.
                        Preserves strict IEEE 754 conformance. 

        -fsimple=1      Allows the optimizer to assume: 
                        The IEEE 754 default rounding/trapping
                        modes do not change after process initialization. 
                        Computations producing no visible result other
                        than potential floating-point exceptions may
                        be deleted. Computations with Infinity or NaNs
                        as operands need not propagate NaNs to their
                        results. For example, x*0 may be replaced by 0. 
                        Computations do not depend on sign of zero. 

        -fsimple=2      Permits more aggressive floating point
                        optimizations that may cause programs to
                        produce different numeric results due to
                        changes in rounding. Even with -fsimple=2,
                        the optimizer still is not permitted to
                        introduce a floating point exception in a
                        program that otherwise produces none. 

-fsingle                Evaluate float expressions as single precision. 
(C)

-ftrap=common           Sets the IEEE 754 trapping mode to common exceptions
(C, C++, Fortran)       (invalid, division by zero, and overflow).

-ftrap=%none            Turns off all IEEE 754 trapping modes.
(C, C++, Fortran)

-library=iostream       Portability switch, used for 252.eon:
(Portability, C++)      allow use of the classic iostream library.

-ll2amm                 Include a library containing chip specific
(linker)                memory routines.

-lm                     Include the math library.
(linker)

-lmopt                  Include the optimized math library. This option
(linker)                usually generates faster code, but may produce
                        slightly different results. Usually these results
                        will differ only in the last bit.

-lprism32               Library to enable Intimate Shared Memory (ISM)
(linker)                (4MB page) usage.

-noex                   Do not allow C++ exceptions. A throw specification
(C++)                   on a function is accepted but ignored; the compiler
                        does not generate exception code.

-O                      A synonym for -xO3.
(Fortran)

-Qoption <phase> <flags>        Pass flags along to compiler phase:

                                        f90comp Fortran first pass
                                        iropt   Global optimizer
                                        cg      Code generator

-Qoption cg <flags>             See -Wc,<flags> below. (The code generator
(code generator)                phase is addressed via -Qoption cg in
                                Fortran and C++, and via -Wc in C.)

-Qoption cg -Qeps:enabled=1             See -Wc,-Qeps:enabled=1
(code generator)

-Qoption cg -Qeps:ws=<n>                See -Wc,-Qeps:ws=<n>
(code generator)

-Qoption cg -Qgsched-T<n>               See -Wc,-Qgsched-T<n>
(code generator)

-Qoption cg -Qgsched-trace_late=1       See -Wc,-Qgsched-trace_late=1
(code generator)

-Qoption iropt <flags>          See -W2,<flags> below. (The optimizer is
(optimizer)                     addressed via -Qoption iropt in Fortran
                                and C++, and via -W2 in C.)

-Qoption iropt -Addint:sf=<n>           When considering whether to
(optimizer)                             interchange loops, set memory
                                        store operation weight to n.
                                        A higher value of n indicates
                                        a greater performance cost
                                        for stores.

-Qoption iropt -Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1]
(optimizer)                             See -W2,-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1]

-Qoption iropt -Apf:pdl=1               Do prefetching for one-level
(optimizer)                             indirect memory references. 

-Qoption iropt -Atile:skewp[:b<n>]      Perform loop tiling which is enabled
(optimizer)                             by loop skewing. Loop skewing is a
                                        transformation that transforms a
                                        non-fully interchangeable loop nest
                                        to a fully interchangeable loop
                                        nest. The optional b<n> sets the
                                        tiling block size to n. 

-Qoption iropt -Aujam:inner=g           Increase the probability that
(optimizer)                             small-trip-count inner loops
                                        will be fully unrolled.

-Qoption iropt -Mt<n>           See -W2,-Mt<n>

RM_SOURCES = lapak.f90          This option allows building the benchmark
(SPEC tools)                    178.galgel without its copy of the LAPACK
                                sources; instead, the LAPACK entry points
                                in the sunperf library are used.

rm -rf ./feedback.profile ./SunWS_cache         Remove any profile feedback
(Unix)                                          information from previous runs. 

-stackvar               Allocate routine local variables on the stack. 
(Fortran)

tune_t_fsflushr=<n>     Controls the number of seconds between runs of
(Unix)                  the file system flush daemon, fsflush.

ulimit -s unlimited     Allow stack size to grow without limit.
(Unix)

-W<phase>,<flags>       Pass flags along to compiler phase (2=optimizer,
                        c=code generator).

-W2,-Abcopy             Increase the probability that the compiler will
(optimizer)             perform memcpy/memset transformations. 

-W2,-Abopt              Enable aggressive optimizations of all branches.
(optimizer)

-W2,-Aheap              Allows the compiler to recognize malloc-like
(optimizer)             memory allocation functions.

-W2,-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1]
(optimizer)             Control the optimizer's loop inliner:

        cp=<n>          The minimum call site frequency counter in order to
                        consider a routine for inlining. 
    
        cs=<n>          Set inline callee size limit to n. The unit roughly
                        corresponds to the number of instructions. 
    
        inc=<n>         The inliner is allowed to increase the size of the
                        program by up to n%. 
    
        irs=<n>         Allow routines to increase by up to n. The unit
                        roughly corresponds to the number of instructions. 

        mi              Perform maximum inlining (without considering code
                        size increase). 

        recursion=1     Allow routines that are called recursively to still
                        be eligible for inlining. 

-W2,-crit               Enable optimization of critical control paths.
(optimizer)

-W2,-Amemopt:arrayloc   Reconstruct array subscripts during memory
(optimizer)             allocation merging and data layout program
                        transformation.

-W2,-Apf:llist=<n>:noinnerllist         Do speculative prefetching for
(optimizer)                             linked-list data structures:

        llist=<n>       Perform prefetching n iterations ahead.

        noinnerllist    Do not attempt prefetching for innermost
                        loops.

-W2,-Ashort_ldst        Convert multiple short memory operations into
(optimizer)             single long memory operations.

-W2,-Aunroll            Enables outer-loop unrolling.
(optimizer)

-W2,-Mr<n>              Maximum code increase due to inlining is limited
(optimizer)             to n triples.

-W2,-Ms<n>              Maximum level of recursive inlining.
(optimizer)

-W2,-Mt<n>              The maximum size of a routine body eligible for
(optimizer)             inlining is limited to n triples.

-W2,-reroll=1           Turns on loop rerolling.
(optimizer)

-W2,-whole              Do whole program optimizations.
(optimizer)

-Wc,-Qdepgraph-early_cross_call=1       There are several scheduling passes
(code generator)                        in the compiler. This option allows
                                        early passes to move instructions
                                        across call instructions. 

-Wc,-Qeps:do_spec_load=1        Allow generating speculative load during EPS.
(code generator)

-Wc,-Qeps:enabled=1             Use enhanced pipeline scheduling (EPS)
(code generator)                and selective scheduling algorithms for
                                instruction scheduling.

-Wc,-Qeps:rp_filtering_margin=100       Turn off register pressure
(code generator)                        heuristics in EPS.

-Wc,-Qeps:ws=<n>                Set the EPS window size, that is, the number
(code generator)                of instructions it will consider across all
                                paths when trying to find independent
                                instructions to schedule a parallel group.
                                Larger values may result in better run time,
                                at the cost of increased compile time.

-Wc,-Qgsched-T<n>               Sets the aggressiveness of the trace
(code generator)                formation, where n is 4, 5, or 6. 
                                The higher the value of n, the lower
                                the branch probability needed to include
                                a basic block in a trace. 

-Wc,-Qgsched-trace_late=1       Turns on the late trace scheduler.
(code generator)

-Wc,-Qicache-chbab=1            Turn on optimization to reduce branch
(code generator)                after branch penalty: nops will be inserted
                                to prevent one branch from occupying 
                                the delay slot of another branch. 

-Wc,-Qipa:valueprediction       Use profile feedback data to predict values
(code generator)                and attempt to generate faster code along
                                these control paths, even at the expense of
                                possibly slower code along paths leading to
                                different values. Correct code is generated
                                for all paths.

-Wc,-Qiselect-funcalign=<n>     Do function entry alignment at n-byte
(code generator)                boundaries. 


-Wc,-Qiselect-sw_pf_tbl_th=<n>  Peels the most frequent test branches/cases
(code generator)                off a switch until the branch probability
                                reaches less than 1/n. This is effective
                                only when profile feedback is used.

-Wc,-Qlp=<n>[-av=<n>][-t=<n>][-fa=<n>][-fl=<n>]
(code generator)                Control irregular loop prefetching:


        lp=<n>  Turns the module on (1) or off (0)
                (default is on for F90; off for C/C++) 

        -av=<n> Sets the prefetch look ahead distance, in bytes.
                Default is 256.

        -t=<n>  Sets the number of attempts at prefetching. If not
                specified, t=2 if -xprefetch_level=3 has been set;
                otherwise, defaults to t=1.

        -fa=<n> 1=Force user settings to override internally computed values. 
    
        -fl=<n> 1=Force the optimization to be turned on for all languages. 

-Wc,-Qms_pipe+intdivusefp       In pipelined loops, use floating point
(code generator)                divide instructions for signed integer
                                division. 
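
The idea can be sketched in C as follows. This is an illustration only: the
function name div_via_fp is hypothetical, and the instruction sequence the
compiler actually emits is not described in this document. The sketch relies
on the fact that a double represents every 32-bit integer exactly.

```c
#include <stdint.h>

/* Signed 32-bit division carried out with a floating-point divide.
 * Converting the double quotient back to an integer truncates toward
 * zero, which matches the C integer division operator. The caller must
 * exclude b == 0 and INT32_MIN / -1, as with the integer operator. */
int32_t div_via_fp(int32_t a, int32_t b)
{
    return (int32_t)((double)a / (double)b);
}
```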

-Wc,-Qms_pipe+prefolim=<n>      Set number of outstanding prefetches
(code generator)                in pipelined loops to <n>

-Wc,-Qms_pipe+unoovf            Assert (to the pipeliner) that unsigned
(code generator)                int computations will not overflow. 

-Wc,-Qms_pipe-prefst            Turn off prefetching for stores in the
(code generator)                pipeliner.

-Wc,-Qms_pipe-pref              Turn off prefetching within modulo scheduling.
(code generator)

-Wc,-Qpeep-Sh0                  Reduce the probability that the
(code generator)                compiler will hoist sethi
                                instructions out of loops. 

-xalias_level=[basic|std|strong]        Allows the compiler to perform
(C)                                     type-based alias analysis at the
                                        specified alias level:

                basic   Assume that memory references that involve 
                        different C basic types do not alias each other.

                std     Assume aliasing rules described in the ISO 1999 C
                        standard.

                strong  In addition to the restrictions at the std level,
                        assume that pointers of type char * are used only
                        to access an object of type char; and assume that
                        there are no interior pointers.
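
A minimal C sketch of the kind of code affected at the basic level. The
function name fill is hypothetical, and the comment describes what the
compiler is permitted to assume, not what it is guaranteed to do.

```c
/* out points to float and count points to int. These are different C
 * basic types, so at -xalias_level=basic the compiler may assume that
 * a store through out never changes *count, and may therefore load
 * *count once before the loop instead of on every iteration. */
void fill(float *out, const int *count, float value)
{
    int i;

    for (i = 0; i < *count; i++)
        out[i] = value;
}
```

The assumption is safe here as long as callers pass distinct objects of the
declared types, which is what the sketch takes for granted.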

-xalias_level=compatible        Allows the compiler to assume that
(C++)                           layout-incompatible types are not aliased.

-xarch=<a>              Limit the set of instructions the compiler may use
(C, C++, Fortran)       to generic, generic64, native, native64, v7, v8a,
                        v8, v8plus, v8plusa, v8plusb, v9, v9a, v9b.
                        Typical settings include:

                                UltraSPARC-II, 32-bit mode: v8plusa
                                UltraSPARC-II, 64-bit mode: v9a
                                UltraSPARC-III, 32-bit mode: v8plusb
                                UltraSPARC-III, 64-bit mode: v9b

                        For more information, see the Fortran User's Guide
                        at docs.sun.com

-xbuiltin=%all          Substitute intrinsic functions or inline system
(C, C++)                functions where profitable for performance. 

-xchip=<c>              Specifies the target processor for use by the
(C, C++, Fortran)       optimizer. c must be one of: generic, generic64,
                        native, native64, old, super, super2, micro, micro2,
                        hyper, hyper2, powerup, ultra, ultra2, ultra2i,
                        ultra3, ultra3cu, 386, 486, pentium, pentium_pro,
                        603, 604.

-xcache=<c>             Defines the cache properties for use by the
(C, C++, Fortran)       optimizer. c must be one of the following:

                                * native (set parameters for the host
                                  environment)
                                * s1/l1/a1
                                * s1/l1/a1:s2/l2/a2
                                * s1/l1/a1:s2/l2/a2:s3/l3/a3

                        The si/li/ai are defined as follows:

                                si The size of the data cache
                                at level i, in kilobytes.
                                li The line size of the data cache
                                at level i, in bytes.
                                ai The associativity of the data cache
                                at level i.
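
To make the units concrete, the C sketch below parses one si/li/ai triple and
derives the number of cache sets it implies. The helper names parse_level and
cache_sets and the sample value 64/32/4 are hypothetical, chosen only for
illustration; they do not describe any particular processor.

```c
#include <stdio.h>

/* Parse one "s/l/a" triple: size in kilobytes, line size in bytes,
 * associativity. Returns 1 on success and 0 otherwise (for example,
 * when the string is "native"). */
int parse_level(const char *spec, long *size_kb, long *line_bytes,
                long *assoc)
{
    return sscanf(spec, "%ld/%ld/%ld", size_kb, line_bytes, assoc) == 3;
}

/* Number of sets implied by the triple:
 * total bytes / (bytes per line * lines per set). */
long cache_sets(long size_kb, long line_bytes, long assoc)
{
    return (size_kb * 1024L) / (line_bytes * assoc);
}
```

For the sample triple 64/32/4, cache_sets(64, 32, 4) evaluates to 512.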

-xdepend                Analyze loops for inter-iteration data dependencies,
(C, Fortran)            and do loop restructuring.
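
The two C loops below sketch the distinction that dependence analysis draws.
The function names are hypothetical, and the comments assume that a and b do
not overlap.

```c
/* No inter-iteration dependence: iteration i reads b[i] and writes
 * a[i] only, so iterations may be reordered or restructured. */
void scale(double *a, const double *b, int n)
{
    int i;

    for (i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}

/* Inter-iteration dependence: iteration i reads a[i - 1], which was
 * written by iteration i - 1, so the iterations must run in order. */
void prefix_sum(double *a, int n)
{
    int i;

    for (i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```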

-xinline=               Turn off inlining.
(C, C++, Fortran)

-xipo[=2]               Perform optimizations across all object files in the
(C, C++, Fortran)       link step:

                                0=off
                                1=on
                                2=performs whole-program detection and analysis

-xlibmil                Use inline expansion for math library, libm.
(C, C++, Fortran)

-xlibmopt               Select the optimized math library.
(C++, Fortran)

-xlic_lib=sunperf       Link with Sun supplied licensed sunperf library.
(C, C++, Fortran)

-xlinkopt               Perform link-time optimizations, such as branch
(C, C++, Fortran)       optimization and cache coloring.

-xO<n>                  Specify optimization level n:
(C, C++, Fortran)

                -xO1    Does only basic local optimizations (peephole).

                -xO2    Do basic local and global optimizations, such as
                        induction variable elimination, common
                        subexpression elimination, constant propagation,
                        register allocation, and basic block merging. 

                -xO3    Add global optimizations at the function level,
                        loop unrolling, and software pipelining.

                -xO4    Adds automatic inlining of functions in the
                        same file.

                -xO5    Uses optimization algorithms that may take
                        significantly more compilation time or that
                        do not have as high a probability of improving
                        execution time, such as speculative code motion.

-xpad=common[:<n>]      If multiple same-sized arrays are placed in common,
(Fortran)               insert padding between them for better use of cache.
                        n specifies the amount of padding to apply, in units
                        that are the same size as the array elements. If no
                        parameter is specified then the compiler selects one
                        automatically.
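
The arithmetic behind this can be sketched in C, although the option itself
applies to Fortran. The sketch assumes a hypothetical direct-mapped cache of
512 lines of 32 bytes (16 KB) and two back-to-back arrays of 2048 doubles;
the helper names are likewise hypothetical.

```c
/* Cache set selected by a byte offset, for the given line size and
 * number of sets. */
unsigned long set_index(unsigned long offset, unsigned long line_bytes,
                        unsigned long sets)
{
    return (offset / line_bytes) % sets;
}

/* Byte offset of element i of the second of two consecutive arrays of
 * n doubles, with pad elements of padding inserted between them. */
unsigned long second_array_offset(unsigned long n, unsigned long pad,
                                  unsigned long i)
{
    return (n + pad + i) * sizeof(double);
}
```

Under these assumptions, and with 8-byte doubles, element 0 of the second
array maps to the same set as element 0 of the first array when pad is 0, and
to the neighboring set when pad is 4, so the two arrays no longer evict each
other's lines element by element.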

-xpad=local             Pad local variables, for better use of cache.
(Fortran)

-xprefetch=auto,explicit        Allow generation of prefetch instructions.
(C, C++, Fortran)               -xprefetch and -xprefetch=yes are synonyms
                                for -xprefetch=auto,explicit.

-xprefetch=latx:<n>             Adjust the compiler's assumptions about
(C, C++, Fortran)               prefetch latency by the specified factor.
                                Typically values in the range of 0.5 to 2.0
                                will be useful. A lower number might
                                indicate that data will usually be cache
                                resident; a higher number might indicate
                                a relatively larger gap between the
                                processor speed and the memory speed
                                (compared to the assumptions built into
                                the compiler).

-xprefetch=no%auto              Turn off prefetch instruction generation.
(C, C++, Fortran) 

-xprefetch_level=<n>            Control the level of searching that the
(C, C++, Fortran)               compiler does for prefetch opportunities
                                by setting n to 1, 2, or 3, where higher
                                numbers mean to do more searching.
                                The default is 2.

-xprofile=collect:./feedback    Collect profile data for feedback-directed
(C, C++, Fortran)               optimization, and store it in a subdirectory
                                of the current directory, named ./feedback.

-xprofile=use:./feedback        Use data collected for profile feedback.
(C, C++, Fortran)               Look for it in a subdirectory of the current
                                directory, named ./feedback.

-xrestrict                      Treat pointer-valued function parameters
(C)                             as restricted pointers.
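
For comparison, the C99 sketch below spells out the same promise explicitly
with the restrict qualifier. The function name axpy is hypothetical; the
point is only that the caller guarantees the two arrays do not overlap.

```c
/* y[i] = alpha * x[i] + y[i]. The restrict qualifiers state that x and
 * y never refer to overlapping storage, so the compiler may keep
 * loaded values in registers across the stores to y. */
void axpy(int n, double alpha, const double *restrict x,
          double *restrict y)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

Calling such a function with overlapping arrays breaks the promise, and the
same caution applies to code compiled with -xrestrict.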

-xsafe=mem                      Enables the use of non-faulting loads when
(C, C++, Fortran)               used in conjunction with -xarch=v8plus.
                                Assumes that no memory based traps will occur.

-xsfpconst                      Represents unsuffixed floating-point
(C, C++, Fortran)               constants as single precision.
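
The C sketch below shows the standard behavior that this option changes: an
unsuffixed constant has type double, while an f-suffixed constant has type
float. The helper names are hypothetical, and the code is meant to be
compiled without -xsfpconst.

```c
/* Size in bytes of an unsuffixed constant (type double). */
unsigned long size_unsuffixed(void)
{
    return (unsigned long)sizeof(0.1);
}

/* Size in bytes of an f-suffixed constant (type float). */
unsigned long size_suffixed(void)
{
    return (unsigned long)sizeof(0.1f);
}

/* Returns 1 if rounding 0.1 to single precision changes its value. */
int single_differs(void)
{
    return (double)0.1f != 0.1;
}
```

Because 0.1 is not exactly representable in binary, the single precision and
double precision constants differ, so the option can change numerical results
as well as speed.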

-xtarget=[system_name]          Selects options appropriate for the system
(C, C++, Fortran)               where the compile is taking place, including
                                architecture, chip, and cache sizes.
                                (These can also be controlled separately,
                                via -xarch, -xchip, and -xcache, respectively.) 

-xvector                        Allow the compiler to transform math library
(C, Fortran)                    calls within loops into calls to the vector
                                math library.

--------------------------------------------------------------------------------
Environment Variables

Flag                    Remark
--------------------------------------------------------------------------------

LD_LIBRARY_PATH=<p>     Specify the directories searched when resolving
                        dynamic link dependencies.

PRISM_HEAP=<n>          Set the heap size limit for large pages

PRISM_MODE=2            Large page mode: Attempt to put text, data and heap
                        all into large pages.

--------------------------------------------------------------------------------
/etc/system (system configuration information file) description

System Tunables         Remark
--------------------------------------------------------------------------------
consistent_coloring     Controls the page coloring policy. It can be set to
                        one of the following:

                                0: (default) dynamic (uses various vaddr bits)
                                1: static (virtual=paddr)

tune_t_fsflushr         The number of seconds between fsflush invocations for
                        checking dirty memory.

autoup                  The frequency of file system sync operations.

shmsys:shminfo_shmmin   Minimum size of a System V shared memory segment
                        that can be created.

shmsys:shminfo_shmmax   Maximum size of a System V shared memory segment
                        that can be created.

shmsys:shminfo_shmmni   System wide limit on number of shared memory
                        segments that can be created.

shmsys:shminfo_shmseg   Limit on the number of shared memory segments that
                        any one process can create.

--------------------------------------------------------------------------------
/etc/opt/FJSVpnrm/lpg.conf (Large page management information file) description

Tunables                Remark
--------------------------------------------------------------------------------
TSS=size[unit]          Total amount of memory to be used for large page
                        segments. At system start, this amount of memory
                        is reserved and initialized. "unit" can be M for
                        megabytes or G for gigabytes.

SHMSEGSIZE=size[unit]   Size of a large page segment. "unit" can be M for
                        megabytes or G for gigabytes.

JOB=size [unit]         Specify the large page memory resource size for
                        in-job processes. "unit", given after size, can be
                        T (terabytes), G (gigabytes), or M (megabytes).

LIMITPOLICY=job | proc  Define the memory allocation/limitation type.
                        Default is job.

                                job : Limits for each job in Node.
                                proc: Limits for each process resource set.

--------------------------------------------------------------------------------