IBM AIX Flag Disclosure
SPEC CPU2000 & OMP2001
For use with AIX submissions with the IBM XL compilers.
Last Revised 13 October, 2005

Notes
=====
The IBM C/C++ & Fortran compilers produce 32-bit binaries by default. The -q64
flag, described below, causes the compilers to produce 64-bit binaries.


Source Level Portability Options
================================
-DHOST_WORDS_BIG_ENDIAN  (176.gcc)
               Host system is big-endian.

-DAIX  (186.crafty)
               Sets basic parameters such as endianness, OS type, and ANSI
               language extensions to be compatible with an AIX system.

-DNDEBUG  (252.eon)
               SPEC default for C++ compiler but also needed explicitly by
               some linkers. Defining this disables any assert macros used for
               debugging.

-DNEED_EXPLICIT_SPECIALIZATION  (252.eon)
               Supply function definitions with explicit types in two cases
               where templatized versions fail to compile.

-DSPEC_CPU2000_AIX  (253.perlbmk)
               Compile the SPEC CPU2000 modified perl for an AIX system.

-DSYS_IS_BSD  (254.gap)
               Compile gap for a BSD-like system.

-DSYS_STRING_H  (254.gap) 
               Do not explicitly include string.h.
               
-DSYS_HAS_TIME_PROTO  (254.gap)
               Do not supply prototypes for the time(), times() and getrusage()
               functions.
               
-DSYS_HAS_MALLOC_PROTO  (254.gap)
               Do not supply prototypes for malloc() and free().
               
-DSYS_HAS_CALLOC_PROTO (254.gap)
               Do not supply a prototype for calloc().

-DHAVE_SIGNED_CHAR  (300.twolf)
               System allows signed char type.


Compiler Invocation
===================
xlc       Invokes the compiler for C source files with a default language level
          of ansi and specifies that it allow type-based aliasing.

cc        Invokes the compiler for C source files with a default language of
          extended and specifies that it provide compatibility with older IBM
          compilers and allow placement of string literals or constant values in
          read/write storage. cc does not conform to the ISO/ANSI C standard.

xlC       Invokes the compiler for C++ source files with a default language level
          of ansi and specifies that it allow type-based aliasing.

xlc_r     The same as "xlc" except that it generates a threadsafe executable,
          compliant with the POSIX pthreads API.

xlf       Invokes the compiler for Fortran source files with a default language
          of Fortran 77.

xlf90     Invokes the compiler for Fortran source files with a default language
          of Fortran 90.

xlf90_r   The same as "xlf90" except that it generates a threadsafe executable,
          compliant with the POSIX pthreads API.

cleanpdf  Erase the information in the PDF directory if any exists to ensure
          no feedback information is reused between compilations.


Compiler Options
================
-ma              Use built-in alloca() function.

-O               Performs optimizations that the compiler developers considered 
                 the best combination for compilation speed and runtime 
                 performance. 

-O3              Perform some memory and compile time intensive optimizations in 
                 addition to those executed with -O.  The -O3 specific 
                 optimizations have the potential to slightly alter the semantics 
                 of a user's program.  Optimizations may include, but are not
                 limited to: aggressive code motion and scheduling of
                 computations that have the potential to raise an exception
                 (no valid exceptions will be suppressed); relaxed conformance
                 to IEEE rules in cases where the difference in the results is
                 not important to an application; and rewriting of floating
                 point expressions.

-O4              Equivalent to -O3 -qipa -qhot with automatic generation of
                 architecture ( -qarch= ) and tuning ( -qtune= ) options ideal for
                 that platform. The -qipa level defaults to level=1.

-O5              Equivalent to -O3 -qipa=level=2 -qhot with automatic generation
                 of architecture ( -qarch= ) and tuning ( -qtune= ) options ideal
                 for that platform.

-Dfloor=__floor  Causes the XL compiler to inline this function whenever possible.

-D_ILS_MACROS    Tested in /usr/include/ctype.h; when defined, the macro
                 versions of the character classification functions
                 (e.g. isupper()) are used.

-Q, -qinline     The -Q option without any list inlines all appropriate 
                 procedures, subject to limits on the number of inlined calls and 
                 the amount of code size increase as a result. -qinline is an 
                 alias for -Q.
          
-Q=xxx           Inline all functions that contain fewer than xxx abstract
                 code units.

-q64             Selects 64-bit compiler mode.

-qalign=struct=natural        The compiler maps structure members to their
-qalign=natural               natural boundaries. The first form is used by
                              the Fortran compiler; the second form is used
                              by the C compiler and is a deprecated form for
                              the Fortran compiler.

-qansialias      Use type-based aliasing during optimization.

-qarch=ppc       Produces object code containing instructions that will run on 
                 any of the 32-bit PowerPC hardware platforms.

-qarch=ppc970    Produces object code containing instructions that will run on 
                 PPC970 processors.

-qarch=pwr3      Produces object code containing instructions that will run on 
                 power3 processors.

-qarch=pwr4      Produces object code containing instructions that will run on 
                 power4/power4+ processors.

-qarch=pwr5      Produces object code containing instructions that will run on 
                 power5 processors.

-qarch=pwr5x     Produces object code containing instructions that will run on 
                 power5+ processors.

-qarch=rs64b     Produces object code containing instructions that will run on 
                 RS64-II processors.

-qarch=auto      Produces object code containing instructions that will run on
                 the hardware platform on which the program is compiled.

-qdatalocal      Changes the default to assume that all variables are local.

-qenablevmx      On PPC970 processors, binary can contain instructions for the
                 vector arithmetic (VMX) unit.

-qessl           Specifies that, if either -lessl or -lesslsmp are also
                 specified, then Engineering and Scientific Subroutine Library
                 (ESSL) routines should be used in place of some Fortran 90
                 intrinsic procedures when there is a safe opportunity to do so.
               
-qlibessl        Specifies that all functions whose names match ESSL library
                 functions are, in fact, the library functions.

-qfdpr           Collect information about programs for use with the AIX fdpr
                 (Feedback Directed Program Restructuring) performance-tuning utility.

-qfixed          Indicates that the input source program is in fixed form.  Allows
                 fixed format Fortran 77 programs to be compiled using the xlf90 
                 compiler invocation.

-qfixed=<num>    States that Fortran code is in fixed source form, with optional
                 argument specifying the maximum line length.

-qfloat=rsqrt    Changes a division by the result of a square root operation into a
                 multiply by the reciprocal of the square root.

-qhot            Perform high-order transformations on loops during optimization.

-qhot=arraypad   Pad the sizes of arrays to align better in cache.
               
-qipa=level=1    Turns on interprocedural analysis with inlining, limited alias
                 analysis, and limited call-site tailoring. This is the default
                 level of -qipa.

-qipa=level=2    Turns on interprocedural analysis with inlining, cloning, full 
                 alias analysis, constant propagation, call-site tailoring, and 
                 dead code removal.

-qipa=noobject   Do not generate object files during the first stage of
                 interprocedural analysis.

-qinline         Alias for -Q. See -Q.

-qipa=partition=large Specifies the size of the regions within the program to 
                      analyze. Larger partitions contain more procedures,
                      which result in better interprocedural analysis but
                      require more storage to optimize. 

-qlanglvl=ansi   Compilation conforms to the ANSI standard. 

-qlargepage      Indicates that a program, designed to execute in a large page
                 memory environment, can take advantage of large 16 MB pages
                 provided on POWER4 or better CPUs.

-qmaxmem=-1      Allows the compiler to use as much memory as it needs to execute.

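-qpdf1, -qpdf2   Profile-directed feedback optimization. -qpdf1 produces an
                 instrumented executable that records profile data during a
                 training run; -qpdf2 recompiles using that data.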

-qsave           Sets the default storage class for local variables to STATIC.

-qsmp=omp        Enable OpenMP parallelization directives.

-qsuffix=f=f90   Sets the suffix for source files to be .f90.  The .f90 suffix is
                 required by xlf90 to compile Fortran 90 programs.

-qtune=604       Instruction selection, scheduling, and other implementation 
                 dependent performance enhancements for the PowerPC 604/604e
                 processor.

-qtune=pwr3      Instruction selection, scheduling, and other implementation 
                 dependent performance enhancements for the Power3 processor.

-qtune=pwr4      Instruction selection, scheduling, and other implementation 
                 dependent performance enhancements for the Power4/Power4+
                 processors.

-qtune=pwr5      Instruction selection, scheduling, and other implementation 
                 dependent performance enhancements for the Power5 processors.

-qtune=pwr5x     Instruction selection, scheduling, and other implementation 
                 dependent performance enhancements for the Power5+ processors.

-qtune=rs64b     Instruction selection, scheduling, and other implementation 
                 dependent performance enhancements for the RS64-II processor.

-qtune=auto      Instruction selection, scheduling, and other implementation
                 dependent performance enhancements for the hardware platform
                 on which the program is compiled.

-qunroll=n       Unrolls inner loops in the program by a factor of n.

-w               Suppress warning messages from the C, C++, and Fortran compilers.
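
The profile-directed feedback options above are normally combined with
cleanpdf in a two-pass build. The following is a minimal dry-run sketch of that
sequence. The names bench.c, bench, and train.in are hypothetical, the -O3
level is chosen only for illustration, and the commands are printed rather
than executed.

```shell
#!/bin/sh
# Dry-run sketch of a two-pass PDF build. The names bench.c, bench, and
# train.in are hypothetical; nothing is executed, commands are only printed.
pdf_build_plan() {
    src=$1; exe=$2; input=$3
    echo "cleanpdf"                        # discard stale feedback data
    echo "xlc -O3 -qpdf1 -o $exe $src"     # pass 1: instrumented build
    echo "./$exe < $input"                 # training run records the profile
    echo "xlc -O3 -qpdf2 -o $exe $src"     # pass 2: rebuild using the profile
}

pdf_build_plan bench.c bench train.in
```

Running cleanpdf first matches its description above: it ensures no feedback
information from an earlier compilation is reused.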


Linker Options
==============
-Ldir          The linker searches for libraries in the directory specified
               by "dir".

-lblacssmp     Link the Parallel ESSL SMP BLACS Library.

-lessl         Link the Engineering and Scientific Subroutine Library (ESSL).
               
-lesslsmp      Link the threadsafe version of the ESSL library.
               
-lpesslsmp     Link the threadsafe, parallelized version of the ESSL library.

-lmass         Link the mathematical acceleration subsystem libraries (MASS),
               which contain libraries of tuned mathematical intrinsic 
               functions. See
               http://techsupport.services.ibm.com/server/mass?fetch=home.html

-lhmu          Link fast malloc libraries.  These libraries are part of the
               memdbg package that is included with IBM C compilers.

-lpdf          Routines used in the first pass of the profile directed 
               feedback process.  Routines from this library are not used in
               building the final executable.  In newer compilers, -qpdf1 does
               this automatically, so using this in conjunction with -qpdf1
               is redundant.

-blpdata       Sets the bit in the file's XCOFF header indicating that this
               executable will request the use of large pages when they are
               available on the system and when the user has an appropriate
               privilege.

-bdatapsize:64K         These flags set the page-sizes of the data, stack, and
-bstackpsize:64K        text segments to 64K.
-btextpsize:64K

-bmaxdata:0x........    Sets the maximum combined size of the program's stack
                        and data segments to this number of bytes, specified
                        in hexadecimal, when the default is too small.

-bnso          Brings referenced library procedures into the object file.

-bI:/lib/syscalls.exp     Create statically linked object files (syscalls.exp
                          supplies the names of the routines that can be 
                          imported).
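
Because -bmaxdata takes its size in hexadecimal, it can help to derive the
argument from a size in megabytes. The sketch below does only that conversion.
The function name maxdata_flag and the 1024 MB figure are hypothetical
illustrations, not recommended settings.

```shell
#!/bin/sh
# Sketch: format a size in megabytes as the hexadecimal byte count that
# -bmaxdata expects. The function name and the sizes used are hypothetical.
maxdata_flag() {
    # $1 = size in megabytes
    printf '%s0x%X\n' '-bmaxdata:' $(( $1 * 1024 * 1024 ))
}

maxdata_flag 1024    # prints -bmaxdata:0x40000000
```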


FDPR:
=====
The fdpr (feedback directed program restructuring) program optimizes the 
executable image of a program by collecting information on the behavior of 
the program while the program is used for some typical workload, and then 
creating a new version.  It is available on AIX Version 4 and 5 systems as part 
of the Performance Toolbox for AIX.

Options:
    -o OutFile        Specifies the name of the output file from the optimizer.

    -p ProgramName    The name of the executable program to optimize.

    -q                Processing/compilation produces no output to STDOUT.

    -v                Selects verbose output during processing/compilation.

    -x Command        Specifies the command used for invoking the instrumented
                      program.  All the arguments after the -x flag are used
                      for the invocation. 

    -O2               Employ a program-reordering technique in which the
                      original structure of the program, including traceback
                      entries, is preserved. 

    -O3               Employ global reordering techniques that do not preserve
                      debug information. 

The compilers include an optional "-qfdpr" flag that assists FDPR analysis but
is not required for it.
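
As an illustration of how the options above fit together, the following
dry-run sketch prints one possible fdpr command line. The program name bench,
its argument train.in, and the output name bench.fdpr are hypothetical; the
command is printed, not executed.

```shell
#!/bin/sh
# Dry-run sketch of an fdpr invocation built from the options listed above.
# The program name and its arguments are hypothetical.
fdpr_command() {
    prog=$1; shift
    # -x is placed last because every argument after it is taken as the
    # command line of the instrumented training run.
    echo "fdpr -q -O3 -p $prog -o $prog.fdpr -x ./$prog $*"
}

fdpr_command bench train.in
```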


Large Page Settings:
====================
vmo command options (AIX 5.2 & above):

    -r                   Apply changes at the system boot.
    
    -o lgpg_regions=#    Specifies the number of large pages to reserve.
                         Example: #=200 allows 200 large pages to be reserved.

    -o lgpg_size=#       Specifies the size in bytes of the hardware-supported 
                         large pages. Example:  #=16777216  is a 16M page size.


vmtune/vmtune64 command options (AIX 5.1 Only):

    -g                   Sets the page size for the large page.
                         Example: -g 16777216 for 16M page.

    -L                   Sets the number of large pages.
                         Example: -L 200 allows 200 large pages.

    -y1                  Enables the memory affinity.

    
chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE $USER
   
                      Allows $USER (non-root ID) to access the large pages that are
                      available. It takes effect on next login.

bosboot -a            Creates a boot image used on the next system reboot.

shutdown -rF          Halt the operating system and reboot.
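
The AIX 5.2 steps above can be sketched as one sequence. This is a dry run
that only prints the commands, using the example values from this section
(200 pages of 16M each). The user name specuser is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch for AIX 5.2 and above: reserve large pages, grant a user
# access to them, then rebuild the boot image and reboot.
# Commands are printed, not executed; the user name "specuser" is hypothetical.
LGPG_MB=16          # large page size in megabytes (the 16M example above)
LGPG_REGIONS=200    # number of large pages to reserve (the example above)
LGPG_SIZE=$(( $LGPG_MB * 1024 * 1024 ))    # lgpg_size is given in bytes

echo "vmo -r -o lgpg_regions=$LGPG_REGIONS -o lgpg_size=$LGPG_SIZE"
echo "chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE specuser"
echo "bosboot -a"
echo "shutdown -rF"
```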


Shared Memory Pinning:
======================
vmo command options (AIX 5.2 & above):

    -r                   Apply changes at the system boot.
        
    -o v_pinshm=1        Shared memory segments are "pinned" in the sense that the
                         allocated pages cannot be swapped out of memory.


Memory Affinity:
================
vmo command options (AIX 5.2 & above):

    -r                   Apply changes at the system boot.
        
    -o memory_affinity=1 Enable the VMM to restrict the memory frames attached to
                         the executing MCM.

Note that the system needs to be rebooted to activate the Memory Affinity feature,
i.e. "bosboot -a; shutdown -r" as described above, and the Large Page, Shared Memory
Pinning, and Memory Affinity options can be used together. Memory Affinity is active
by default in AIX Version 5.2 5765-H62 (05/2003) and above. In all cases the
"MEMORY_AFFINITY" environment variable, defined below, needs to be set for the job
that is running.


AIX Environment Variables:
==========================
LDR_CNTRL=LARGE_PAGE_DATA=M    Asserts that there are sufficient large-pages available
                               for program data, allowing them to be allocated on first
                               reference, instead of allocating all of them at load time.

MEMORY_AFFINITY=MCM  Turn on Memory Affinity which has been enabled with the
                     vmo command.

MALLOCMULTIHEAP=1    Maintains multiple heaps in the process, for servicing simultaneous
                     "malloc" requests.

OMP_DYNAMIC=FALSE    Disables dynamic adjustment of the number of available threads.

OMP_NUM_THREADS=...  The exact number of threads available to be used, or if OMP_DYNAMIC
                     is TRUE, the upper limit on the number of available threads.

XLFRTEOPTS=NAMELIST=OLD   Allows a newly compiled program to read the namelist from a
                          binary compiled with the older namelist format.

XLFRTEOPTS=intrinthds={num_threads}  Specifies the number of threads for parallel execution
                                     of the MATMUL and RANDOM_NUMBER intrinsic procedures.
                                     The default value for num_threads when using the MATMUL
                                     intrinsic equals the number of processors online. The
                                     default value for num_threads when using the
                                     RANDOM_NUMBER intrinsic is twice the number of
                                     processors online.

                                     Changing the number of threads available to the MATMUL
                                     and RANDOM_NUMBER intrinsic procedures can influence
                                     performance.

XLSMPOPTS     A list of runtime settings affecting SMP execution. Here are
              some of the possibilities:

                     SCHEDULE=STATIC     Work is scheduled to threads round-robin.

                     SPINS=0             Allows work-requests to spin indefinitely without
                                         the thread having to yield the time-slice.

                     STACK=....          Specifies the largest allowable size of a thread's
                                         stack, in bytes.

                     YIELDS=0            Allows the thread to yield an indefinite number
                                         of times without being driven into a sleep state.

                     STARTPROC=0         When assigning threads to CPUs, begin with thread
                                         0 on CPU 0.

                     STRIDE=X            When assigning the next thread to a CPU, add X to
                                         the current CPU index instead of using (CPU+1).
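
The STARTPROC and STRIDE descriptions above imply a simple placement rule:
thread i lands on CPU (STARTPROC + i * STRIDE). The sketch below computes that
mapping for a few threads. The helper name cpu_for_thread and the stride of 2
are hypothetical, wraparound at the last CPU is not modelled, and joining the
suboptions with colons in a single variable is an assumption not stated in
this disclosure.

```shell
#!/bin/sh
# Sketch of the thread-to-CPU mapping described by STARTPROC and STRIDE:
# thread i is placed on CPU (STARTPROC + i * STRIDE). Values are illustrative
# and wraparound past the last CPU is ignored.
cpu_for_thread() {
    # $1 = thread index, $2 = STARTPROC, $3 = STRIDE
    echo $(( $2 + $1 * $3 ))
}

# Assumption: suboptions are joined with colons in one variable.
XLSMPOPTS="SCHEDULE=STATIC:SPINS=0:YIELDS=0:STARTPROC=0:STRIDE=2"
export XLSMPOPTS

for t in 0 1 2 3; do
    echo "thread $t -> CPU $(cpu_for_thread "$t" 0 2)"
done
```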


System & Process Management:
============================
The following commands are used to bind processes to processors in SPEC/CPU runs.
The SPEC/CPU harness uses the $SPECUSERNUM variable to enumerate the different
processes in a rate-run; in the text of the SPEC/CPU config-file, this is expressed
as "\$SPECUSERNUM" in order for the variable-name to be evaluated at runtime.

bindprocessor X Y             AIX command, binding process X to CPU Y.

smtctl -m on  -w boot         AIX commands enabling & disabling SMT (Simultaneous
smtctl -m off -w boot         Multi-Threading) which allows a single CPU core to
                              process multiple execution threads simultaneously.
                              These forms of the command must be followed by a
                              "bosboot -a" command and a "shutdown -r" reboot.

drmgr -r -c cpu               AIX command, deallocating one processor from the
                              Operating System partition so it is not available
                              for computation. The processors are reallocated
                              on system reboot.
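
As a sketch of how bindprocessor and $SPECUSERNUM can be combined, the
following dry run prints a command that binds the current shell to the CPU
whose index equals the copy number of a rate run. The helper name bind_command
is hypothetical, and the one-to-one mapping of copy number to CPU is an
assumption; an actual config file may use a different mapping.

```shell
#!/bin/sh
# Dry-run sketch: bind the current shell ($$) to the CPU whose index matches
# the copy number of a rate run. SPECUSERNUM is normally set by the harness;
# the default of 0 here only keeps the sketch self-contained.
SPECUSERNUM=${SPECUSERNUM:-0}

bind_command() {
    # $1 = process ID, $2 = CPU number
    echo "bindprocessor $1 $2"
}

bind_command $$ "$SPECUSERNUM"
```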