IBM AIX Flag Disclosure
SPEC CPU2000 & OMP2001
For use with AIX submissions with the IBM XL compilers.
Last Revised 22 August, 2005


Source Level Portability Options
================================
-DHOST_WORDS_BIG_ENDIAN  (176.gcc)
               Host system is big-endian.

-DAIX  (186.crafty)
               Sets some basic parameters such as endianness, OS type, and ANSI
               language extensions to be compatible with an AIX system.

-DNDEBUG  (252.eon)
               SPEC default for C++ compiler but also needed explicitly by
               some linkers. Defining this disables any assert macros used for
               debugging.

-DNEED_EXPLICIT_SPECIALIZATION  (252.eon)
               Supply function definitions with explicit types in two cases
               where templatized versions fail to compile.

-DSPEC_CPU2000_AIX  (253.perlbmk)
               Compile the SPEC CPU2000 modified perl for an AIX system.

-DSYS_IS_BSD  (254.gap)
               Compile gap for a BSD-like system.

-DSYS_STRING_H  (254.gap) 
               Do not explicitly include string.h.
               
-DSYS_HAS_TIME_PROTO  (254.gap)
               Do not supply prototypes for the time(), times() and getrusage()
               functions.
               
-DSYS_HAS_MALLOC_PROTO  (254.gap)
               Do not supply prototypes for malloc() and free().
               
-DSYS_HAS_CALLOC_PROTO (254.gap)
               Do not supply a prototype for calloc().

-DHAVE_SIGNED_CHAR  (300.twolf)
               System allows signed char type.


Compiler Invocation
===================
xlc       Invokes the compiler for C source files with a default language level
          of ansi and specifies that it allow type-based aliasing.

cc        Invokes the compiler for C source files with a default language of
          extended and specifies that it provide compatibility with older IBM
          compilers and allow placement of string literals or constant values in
          read/write storage. cc does not conform to the ISO/ANSI C standard.

xlC       Invokes the compiler for C++ source files with a default language level
          of ansi and specifies that it allow type-based aliasing.

xlc_r     The same as "xlc" except that it generates a threadsafe executable,
          compliant with the POSIX pthreads API.

xlf       Invokes the compiler for Fortran source files with a default language
          of Fortran 77.

xlf90     Invokes the compiler for Fortran source files with a default language
          of Fortran 90.

xlf90_r   The same as "xlf90" except that it generates a threadsafe executable,
          compliant with the POSIX pthreads API.

cleanpdf  Erase the information in the PDF directory if any exists to ensure
          no feedback information is reused between compilations.


Compiler Options
================
-ma            Use built-in alloca() function.

-O             Performs optimizations that the compiler developers considered 
               the best combination for compilation speed and runtime 
               performance. 

-O3            Perform some memory and compile time intensive optimizations in 
               addition to those executed with -O.  The -O3 specific 
               optimizations have the potential to slightly alter the semantics 
               of a user's program.  Optimizations may include, but are not
               limited to: Aggressive code motion, and scheduling on 
               computations that have the potential to raise an exception, but
               no valid exceptions will be suppressed; Relaxed conformance to
               IEEE rules in cases where the difference in the results is not
               important to an application;  Rewriting of floating point expressions. 

-O4            Equivalent to -O3 -qipa -qhot with automatic generation of
               architecture ( -qarch= ) and tuning ( -qtune= ) options ideal for
               that platform. The -qipa level defaults to level=1.

-O5            Equivalent to -O3 -qipa=level=2 -qhot with automatic generation
               of architecture ( -qarch= ) and tuning ( -qtune= ) options ideal
               for that platform.

-D_ILS_MACROS  Defined in /usr/include/ctype.h to use the macro version of the 
               string classification functions (e.g. isupper()).
          
-Q, -qinline   The -Q option without any list inlines all appropriate 
               procedures, subject to limits on the number of inlined calls and 
               the amount of code size increase as a result. -qinline is an 
               alias for -Q.
          
-Q=xxx         Inline all functions that contain fewer than xxx abstract
               code units.

-q64           Selects 64-bit compiler mode.

-qalign=struct=natural        The compiler maps structure members to their
-qalign=natural               natural boundaries. The first form is used by
                              the Fortran compiler; the second form is used
                              by the C compiler and is a deprecated form for
                              the Fortran compiler.
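
As an illustration of what mapping members to their natural boundaries means,
the sketch below computes member offsets under the generic rule that each
member is aligned to a multiple of its own size. The function name and the
example member sizes are hypothetical, and the sketch does not claim to
reproduce every detail of the XL compilers' layout rules.

```python
def natural_layout(member_sizes):
    """Return (offsets, total_size) with each member aligned to its own size."""
    offsets = []
    position = 0
    for size in member_sizes:
        # Round the current position up to the next multiple of the member size.
        position = (position + size - 1) // size * size
        offsets.append(position)
        position += size
    # Pad the end so that the widest member stays aligned in arrays of structs.
    widest = max(member_sizes)
    total = (position + widest - 1) // widest * widest
    return offsets, total


# Hypothetical structure: a 1-byte char, an 8-byte double, a 4-byte int.
offsets, total = natural_layout([1, 8, 4])
print(offsets, total)
```

In this example the 8-byte member is pushed to offset 8, so seven bytes of
padding follow the 1-byte member.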

-qansialias    Use type-based aliasing during optimization.

-qarch=ppc     Produces object code containing instructions that will run on 
               any of the 32-bit PowerPC hardware platforms.

-qarch=pwr3    Produces object code containing instructions that will run on 
               power3 processors.

-qarch=pwr4    Produces object code containing instructions that will run on 
               power4/power4+ processors.

-qarch=pwr5    Produces object code containing instructions that will run on 
               power5 processors.

-qarch=rs64b   Produces object code containing instructions that will run on 
               RS64-II processors.

-qdatalocal    Changes the default to assume that all variables are local.
               
-qessl         Specifies that, if either -lessl or -lesslsmp are also
               specified, then Engineering and Scientific Subroutine Library
               (ESSL) routines should be used in place of some Fortran 90
               intrinsic procedures when there is a safe opportunity to do so.
               
-qlibessl      Specifies that all functions whose names match ESSL library
               functions are, in fact, the library functions.

-qfdpr         Collect information about programs for use with the AIX fdpr
               (Feedback Directed Program Restructuring) performance-tuning utility.

-qfixed        Indicates that the input source program is in fixed form.  Allows
               fixed format Fortran 77 programs to be compiled using the xlf90 
               compiler invocation.

-qfixed=<num>  States that Fortran code is in fixed source form, with optional
               argument specifying the maximum line length.

-qfloat=rsqrt  Changes a division by the result of a square root operation into a
               multiply by the reciprocal of the square root.
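
The rewrite performed under -qfloat=rsqrt can be pictured with the following
sketch. It only illustrates the algebra in ordinary Python floating point; the
function names are hypothetical, and it does not model the instructions the
compiler actually emits.

```python
import math


def divide_by_sqrt(x, y):
    """Original form: a division by the result of a square root."""
    return x / math.sqrt(y)


def multiply_by_rsqrt(x, y):
    """Rewritten form: a multiplication by the reciprocal of the square root."""
    reciprocal_sqrt = 1.0 / math.sqrt(y)
    return x * reciprocal_sqrt


a = divide_by_sqrt(3.0, 7.0)
b = multiply_by_rsqrt(3.0, 7.0)
# The two forms agree closely but are not guaranteed to match bit for bit.
print(a, b, math.isclose(a, b, rel_tol=1e-12))
```

Because the rewritten form rounds twice (once for the reciprocal, once for the
product), results may differ in the last bits from the original division.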

-qhot          Perform high-order transformations on loops during optimization.

-qhot=arraypad  Pad the sizes of arrays to align better in cache.
               
-qipa=level=1  Turns on interprocedural analysis with inlining, limited alias
               analysis, and limited call-site tailoring. This is the default
               level of -qipa.
               
-qipa=level=2  Turns on interprocedural analysis with inlining, cloning, full 
               alias analysis, constant propagation, call-site tailoring, and 
               dead code removal.

-qipa=noobject   Do not generate object files during the first stage of
                 interprocedural analysis.

-qinline       Alias for -Q. See -Q.

-qipa=partition=large Specifies the size of the regions within the program to 
                      analyze. Larger partitions contain more procedures,
                      which result in better interprocedural analysis but
                      require more storage to optimize. 

-qlanglvl=ansi Compilation conforms to the ANSI standard. 

-qlargepage    Indicates that a program, designed to execute in a large page
               memory environment, can take advantage of large 16 MB pages
               provided on POWER4 or better CPUs.

-qmaxmem=-1    Allows the compiler to use as much memory as it needs to execute.

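-qpdf1/pdf2    Profile-directed feedback (PDF) optimization. -qpdf1 builds an
               instrumented executable that records profile data during a
               training run; -qpdf2 rebuilds the program using that data.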

-qsave         Sets the default storage class for local variables to STATIC.

-qsmp=omp      Enable OpenMP parallelization directives.

-qsuffix=f=f90 Sets the suffix for source files to be .f90.  The .f90 suffix is
               required by xlf90 to compile Fortran 90 programs.

-qtune=604     Instruction selection, scheduling, and other implementation 
               dependent performance enhancements for the PowerPC 604/604e
               processor.

-qtune=pwr3    Instruction selection, scheduling, and other implementation 
               dependent performance enhancements for the Power3 processor.

-qtune=pwr4    Instruction selection, scheduling, and other implementation 
               dependent performance enhancements for the Power4/Power4+
               processors.

-qtune=rs64b   Instruction selection, scheduling, and other implementation 
               dependent performance enhancements for the RS64-II processor.

-qunroll=n     Unrolls inner loops in the program by a factor of n.
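
As a rough picture of unrolling by a factor of n, the sketch below writes out
by hand what a factor of 4 does to a simple summation loop. The function names
are hypothetical, and the compiler performs this transformation on generated
code rather than on source text.

```python
def sum_rolled(values):
    """Original loop: one element per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same loop unrolled by 4: four elements per iteration plus a remainder."""
    total = 0
    n = len(values)
    main_end = n - n % 4  # last index covered by full groups of four
    for i in range(0, main_end, 4):
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
    # Remainder loop handles the elements left over when n is not a multiple of 4.
    for i in range(main_end, n):
        total += values[i]
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

The unrolled version executes fewer loop-control steps per element, at the
cost of larger code and a separate remainder loop.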

-w             Suppress warning messages from the C, C++, and Fortran compilers.


Linker Options
==============
-Ldir          The linker searches for libraries in the directory specified
               by "dir".

-lblacssmp     Link the Parallel ESSL SMP BLACS Library.

-lessl         Link the Engineering and Scientific Subroutine Library (ESSL).
               
-lesslsmp      Link the threadsafe version of the ESSL library.
               
-lpesslsmp     Link the threadsafe, parallelized version of the ESSL library.

-lmass         Link the mathematical acceleration subsystem libraries (MASS),
               which contain libraries of tuned mathematical intrinsic 
               functions. See
               http://techsupport.services.ibm.com/server/mass?fetch=home.html

-lhmu          Link fast malloc libraries.  These libraries are part of the
               memdbg package that is included with IBM C compilers.

-lpdf          Routines used in the first pass of the profile directed 
               feedback process.  Routines from this library are not used in
               building the final executable.  In newer compilers, -qpdf1 does
               this automatically, so using this in conjunction with -qpdf1
               is redundant.

-blpdata       Sets the bit in the file's XCOFF header indicating that this
               executable will request the use of large pages when they are
               available on the system and when the user has an appropriate
               privilege.

-bmaxdata:0x........    Sets the maximum combined size of the program's stack
                        and data segments to this number of bytes, specified
                        in hexadecimal, when the default is too small.
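
Since the -bmaxdata argument is a byte count written in hexadecimal, a small
helper such as the one sketched below can convert a size into the flag text.
The helper name and the 2 GB example value are illustrative assumptions, not
values taken from any particular submission.

```python
def maxdata_flag(num_bytes):
    """Format a byte count as the hexadecimal argument of -bmaxdata."""
    if num_bytes <= 0:
        raise ValueError("byte count must be positive")
    # Eight hexadecimal digits, matching the 0x........ form shown above.
    return "-bmaxdata:0x{:08X}".format(num_bytes)


# Illustrative value only: 2 GB expressed in bytes.
print(maxdata_flag(2 * 1024**3))
```

For the example value this prints -bmaxdata:0x80000000.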

-bnso          Brings referenced library procedures into the object file.

-bI:/lib/syscalls.exp     Create statically linked object files (syscalls.exp
                          supplies the names of the routines that can be 
                          imported).


FDPR:
=====
The fdpr (feedback directed program restructuring) program optimizes the 
executable image of a program by collecting information on the behavior of 
the program while the program is used for some typical workload, and then 
creating a new version.  It is available on AIX Version 4 and 5 systems as part 
of the Performance Toolbox for AIX.

Options:
    -o OutFile        Specifies the name of the output file from the optimizer.

    -p ProgramName    The name of the executable program to optimize.

    -v                Selects verbose output during processing/compilation.

    -x Command        Specifies the command used for invoking the instrumented
                      program.  All the arguments after the -x flag are used
                      for the invocation. 

    -O2               Employ a program-reordering technique in which the
                      original structure of the program, including traceback
                      entries, is preserved. 

    -O3               Employ global reordering techniques that do not preserve
                      debug information. 

The compilers include an optional "-qfdpr" flag that assists FDPR analysis but
is not required for it.


Large Page Settings:
====================
vmo command options (AIX 5.2 & above):

    -r                   Apply changes at the system boot.
    
    -o lgpg_regions=#    Specifies the number of large pages to reserve.
                         Example: #=200 allows 200 large pages to be reserved.

    -o lgpg_size=#       Specifies the size in bytes of the hardware-supported 
                         large pages. Example:  #=16777216  is a 16M page size.
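
The arithmetic behind the two example values above can be checked with the
sketch below. It also assembles the listed options into a single argument
string; the helper names are hypothetical, and the string is built only from
the options described in this section and is not executed.

```python
LARGE_PAGE_BYTES = 16 * 1024 * 1024  # one 16 MB large page, in bytes


def reserved_bytes(lgpg_regions, lgpg_size=LARGE_PAGE_BYTES):
    """Total memory set aside for large pages, in bytes."""
    return lgpg_regions * lgpg_size


def vmo_arguments(lgpg_regions, lgpg_size=LARGE_PAGE_BYTES):
    """Join the vmo options listed above into one argument string."""
    return "-r -o lgpg_regions={} -o lgpg_size={}".format(lgpg_regions, lgpg_size)


print(LARGE_PAGE_BYTES)
print(reserved_bytes(200) // (1024 * 1024), "MB reserved")
print("vmo", vmo_arguments(200))
```

With 200 regions of 16777216 bytes each, 3200 MB of memory is reserved for
large pages.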


vmtune/vmtune64 command options (AIX 5.1 Only):

    -g                   Sets the page size for the large page.
                         Example: -g 16777216 for 16M page.

    -L                   Sets the number of large pages.
                         Example: -L 200 allows 200 large pages.

    -y1                  Enables memory affinity.

    
chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE $USER
   
                      Allows $USER (non-root ID) to access the large pages that are
                      available. 

bosboot -a            Creates a boot image used on the next system reboot.

shutdown -r           Halt the operating system and reboot.


Memory Affinity:
================
vmo command options (AIX 5.2):

    -r                   Apply changes at the system boot.

    -o memory_affinity=1 Enable the VMM to restrict the memory frames attached to
                         the executing MCM.

Note that the system needs to be rebooted to activate the Memory Affinity feature,
i.e. "bosboot -a; shutdown -r" as described above. The Large Page and Memory
Affinity options can be used together. Memory Affinity is active by default in
AIX Version 5.2 5765-H62 (05/2003) and above. In all cases the "MEMORY_AFFINITY"
environment variable, defined below, needs to be set for the job that is running.


AIX Environment Variables:
==========================
LDR_CNTRL=LARGE_PAGE_DATA=M    Asserts that there are sufficient large-pages available
                               for program data, allowing them to be allocated on first
                               reference, instead of allocating all of them at load time.

MEMORY_AFFINITY=MCM  Turn on Memory Affinity which has been enabled with the
                     vmo command.

MALLOCMULTIHEAP=1    Maintains multiple heaps in the process, for servicing simultaneous
                     "malloc" requests.

OMP_DYNAMIC=FALSE    Disables dynamic adjustment of the number of available threads.

OMP_NUM_THREADS=...  The exact number of threads available to be used, or if OMP_DYNAMIC
                     is TRUE, the upper limit on the number of available threads.

XLFRTEOPTS=NAMELIST=OLD   Allows a newly compiled program to read the namelist from a
                          binary compiled with the older namelist format.

XLSMPOPTS     A list of runtime settings affecting SMP execution. Here are
              some of the possibilities:

                     SCHEDULE=STATIC     Work is scheduled to threads round-robin.

                     SPINS=0             Allows work-requests to spin indefinitely without
                                         the thread having to yield the time-slice.

                     STACK=....          Specifies the largest allowable size of a thread's
                                         stack.

                     YIELDS=0            Allows the thread to yield an indefinite number
                                         of times without being driven into a sleep state.

                     STARTPROC=0         When assigning threads to CPUs, begin with thread
                                         0 on CPU 0.

                     STRIDE=X            When assigning the next thread to a CPU, add X to
                                         the current CPU index instead of using (CPU+1).
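
The combined effect of STARTPROC and STRIDE on thread placement can be
sketched as follows. The function name is hypothetical, and the sketch assumes
the simple rule stated above (start at STARTPROC, then add STRIDE for each
further thread); behavior once the CPU index exceeds the number of CPUs is not
described here and is not modeled.

```python
def thread_cpus(num_threads, startproc=0, stride=1):
    """CPU index for each thread: STARTPROC, then STRIDE more per thread."""
    return [startproc + thread * stride for thread in range(num_threads)]


# Four threads with STARTPROC=0 and STRIDE=2 land on every other CPU.
print(thread_cpus(4, startproc=0, stride=2))
```

With STRIDE=1 the placement reduces to the default (CPU+1) progression.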


Firmware Configuration Information:
===================================
PRIVATE mode L3-Caches: Enter L3 Mode Menu; if the "Private/Shared L3 Mode" item reads
                        "Currently Shared (/Private)", type "1" to toggle the mode to
                        PRIVATE, otherwise do nothing.



Process Management:
===================
The following commands are used to bind processes to processors in SPEC/CPU runs.
The SPEC/CPU harness uses the $SPECUSERNUM variable to enumerate the different
processes in a rate-run; in the text of the SPEC/CPU config-file, this is expressed
as "\$SPECUSERNUM" in order for the variable-name to be evaluated at runtime.

bindprocessor X Y                     AIX command, binding process X to CPU Y.

schedule.8 \$SPECUSERNUM $command    Shell script that binds processes 0,1,2,3 to even-numbered
                                     CPUs 0,2,4,6 and processes 4,5,6,7 to odd-numbered CPUs
                                     1,3,5,7. This is written using the "bindprocessor" primitive.

schedule.16  ...                     Analogous scripts, binding the first 8, 16, 32, or 64
schedule.32  ...                     processes to the even-numbered CPUs and the second
schedule.64  ...                     8, 16, 32, or 64 processes to the odd-numbered CPUs.
schedule.128 ...
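
The process-to-CPU mapping described for the schedule.N scripts can be
expressed as the sketch below. The scripts themselves are not reproduced in
this document, so the function name and structure are assumptions; only the
mapping (first half of the processes to even-numbered CPUs, second half to
odd-numbered CPUs) is taken from the description above.

```python
def cpu_for_process(process_index, num_processes):
    """Map a $SPECUSERNUM value to a CPU as the schedule.N description states."""
    if not 0 <= process_index < num_processes:
        raise ValueError("process index out of range")
    half = num_processes // 2
    if process_index < half:
        # First half of the processes: even-numbered CPUs 0, 2, 4, ...
        return 2 * process_index
    # Second half of the processes: odd-numbered CPUs 1, 3, 5, ...
    return 2 * (process_index - half) + 1


# Mapping for the schedule.8 case.
print([cpu_for_process(i, 8) for i in range(8)])
```

For eight processes this yields CPUs 0,2,4,6 for processes 0 through 3 and
CPUs 1,3,5,7 for processes 4 through 7, matching the schedule.8 entry.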