-----------------------------------------------------------------------------------
Fujitsu PRIMEPOWER flags/tunables description			(Dec.22 2003)

(Each section is sorted in case-insensitive alphabetical order)

Table of Contents
[1] Fujitsu Parallelnavi 2.1 & 2.3 compiler flag description
[2] Sun C, C++ and Fortran Sun ONE Studio 8 flag description
[3] Environment Variables
[4] Kernel Parameters (/etc/system)
[5] Configuration file for large page manager (/etc/opt/FJSVpnrm/lpg.conf)
[6] Commands for feedback control

-----------------------------------------------------------------------------------
[1] Fujitsu Parallelnavi 2.1 & 2.3 compiler flag description		(Dec.22 2003)

Compiler options      Remark
-----------------------------------------------------------------------------------

-Am			Required if a source file contains modules which will 
			be referenced by USE statements in other source files
			or if a source file contains USE statements that reference
			modules in another source file.

-dy/-dn			Specifies dynamic (-dy) or static (-dn) linkage of
			libraries. -dy is the default, except when -Kfast_GP=n
			(n>=3) is specified without -Klargepage; in that case
			-dn is the default.

-f omitmsg		Set the level of diagnostic messages output and inhibit
			specific messages.

			omitmsg is one of i, w, or s, and/or a list of msgnum.

			     i: All messages are output (default).
			     w: i level messages are not output.
			     s: i and w level messages are not output.
			msgnum: Message number msgnum is inhibited. msgnum must
				be an i or w level message.

-Fixed			Specifies that Fortran source programs are written in
			fixed source form.

-fs			Do not print any warnings or diagnostic messages other
			than fatal errors.

-Kalignc[=N]		Aligns the start of global data on an N-byte boundary.
			N can be specified from 1 to 32768.

-Kalignl[=N]		Aligns the start of local data on an N-byte boundary.
			N can be specified from 1 to 32768.

-Karraypad_const[=N]	Insert padding elements after each row of an array whose
			size is declared with constants for efficient use of cache.

-Kauto			Local variables (without an initial value or the SAVE
			attribute) are allocated on the runtime stack. Their
			values are lost when the procedure ends.

-Kcfunc			Uses the high-speed mathematical functions and library
			functions (malloc, calloc, realloc, free) supplied with
			this compilation system.

-Kcommonpad[=N]		Insert padding elements in common blocks for efficient
			use of cache.  N can be specified from 4 to 4096. 

-Kcrossfile		Enables crossfile optimization. If the program consists
			of several files, the compiler reads these files together
			and analyzes data dependencies and control relationships
			across them.

-Kfast_GP2[={0|1|2|3}]  Performs optimization for the SPARC64 GP2 series.

                        0,1: Performs optimization suitable for SPARC64 GP2.

                        2: Enables the -Keval option in addition to
                           -Kfast_GP2=1.

                        3: <Fortran>
                           Enables the -Kpreex option in addition
                           to -Kfast_GP2=2.
                           <C>
                           Enables the -Kcrossfile option in
                           addition to -Kfast_GP2=1.

-KFMADD			This option specifies use of the combined multiply-
			add/subtract floating-point instructions.

-Kfrecipro		Converts floating-point division into multiplication
			by the reciprocal.
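
The effect of this kind of rewrite on numeric results can be sketched
in a few lines of Python. This is only an illustration of the arithmetic,
not of the code the compiler emits; the function names are hypothetical.
The two forms agree to within rounding but are not guaranteed to be
bit-for-bit identical, which is why the conversion is an explicit option.

```python
def divide(x: float, y: float) -> float:
    """Plain IEEE 754 division: one correctly rounded operation."""
    return x / y


def divide_by_reciprocal(x: float, y: float) -> float:
    """Rewritten form: a reciprocal followed by a multiply.

    Two rounding steps instead of one, so the last bit may differ
    from the result of divide().
    """
    return x * (1.0 / y)


def scale_all(values, divisor):
    """Where the rewrite pays off: the reciprocal is computed once,
    and the loop body then contains only multiplications."""
    reciprocal = 1.0 / divisor
    return [v * reciprocal for v in values]


if __name__ == "__main__":
    # Print both forms side by side for a few operand pairs.
    for x, y in [(8.0, 2.0), (1.0, 3.0), (10.0, 7.0)]:
        print(x, y, divide(x, y), divide_by_reciprocal(x, y))
```

When the divisor is a power of two the reciprocal is exact and both forms
give the same result; for other divisors a difference of about one unit in
the last place is possible.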

-Kfuse			Fuses neighboring loops.

-KGREG			The global registers g2 through g7 (when -KV9 option is
			available, g2,g3,g6,g7 are used) are subject to register
			allocation in the compile stage.

-KGREG_SYSTEM		The global registers g5 through g7 are subject to register
			allocation in the compile stage.

-Kgs			Performs global instruction scheduling.

-Kilfunc		Replaces the single and double precision mathematical
			functions sin, cos, log10, log and exp with compiler
			built-in functions.

-Klargepage		Generates an executable program that uses the
			Parallelnavi large page facility.

-Kmemalias		For indirect memory accesses through pointers, accesses
			of different types are assumed not to alias each other.

-KNOFMADD		This option suppresses use of the combined multiply-
			add/subtract floating-point instructions.

-Knoprefetch		Suppresses use of the prefetch instruction.

-Knounroll		Prevents loop unrolling optimizations.

-Knovfunc		Suppresses conversion of intrinsic functions (including
			the power operation) into multi-operation functions.

-Kpreex			Optimizes by moving the evaluation of invariant
			expressions across branches.

-Kpopt			Optimizes accesses to data pointed to by pointers, under
			the restricted assumption that areas referenced through
			pointers are referenced only through pointers.

-Kpg			Generates instructions to produce a profile file for
			subsequent optimization (global instruction scheduling
			etc.).

-Kprefetch[={1|2|3|4}]	Generates prefetch instructions according to the
			specified prefetch level.

			1: Basic prefetching, for array elements in innermost
			   loops only.

			2: In addition to -Kprefetch=1, generates prefetch
			   instructions in the loop pre-header for the array
			   elements accessed in the first iteration of the
			   loop.

			3: In addition to -Kprefetch=2, when the access stride
			   for array elements is larger than the cache line
			   size, the compiler generates a prefetch instruction
			   for each cache-line-sized access.

			4: Maximum level of prefetch generation. In addition
			   to -Kprefetch=3, the compiler generates prefetch
			   instructions for array elements accessed in outer
			   loops.

-Kprefetch_cache_level=[1/2/3]
			LEVEL-1: Generate prefetch instructions in order
				 for the data to reside in the primary cache.

			LEVEL-2: Generate prefetch instructions in order
				 for the data to reside only in the secondary cache.
                                
			LEVEL-3: Generate both LEVEL-1 and -2 prefetch
				 instructions.

-Kprefetch_infer	For data prefetch control, in cases where 
			the stride of array accesses is not clear from
			static analysis of the source files, 
			the compiler is told to use its internal 
			heuristics for the addresses that have been
			determined by the compiler as prefetch 
			addresses.

-Kprefetch_iteration=N	Generates prefetch instructions for data that will be
			referenced N iterations later.

-Kprefetch_line=N	Generates prefetch instructions for data located N
			primary data cache lines ahead of the address that a
			neighboring load or store instruction references.

-Kprefetch_line_L2=N	Same as -Kprefetch_line=N, except that the data resides
			only in the secondary cache.

-Kpreload		Moves load instructions across branches.

-Kpreschedule_length[=N]
			When -O5 is used, the instruction scheduler runs twice,
			before and after register allocation; these runs are
			called pre-pass and post-pass scheduling respectively.
			-Kpreschedule_length[=N] controls how aggressively
			pre-pass scheduling works. N is the upper limit on the
			distance an instruction can move from its original place.

-Kpu[=file]		This performs optimization (global instruction scheduling,
			etc.) using program runtime profile information obtained
			by specifying -Kpg option.

-Kunroll[=N]		Performs loop unrolling. N is the upper limit on the
			unrolling factor and must be from 2 to 9999.
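
As a rough picture of what loop unrolling does, the sketch below writes
out by hand a loop unrolled four times, with the remainder loop that
handles trip counts not divisible by four. It is an illustration in Python
only; the function names and the factor of four are assumptions chosen for
the example, and the compiler's own choice of factor is not shown here.

```python
def sum_rolled(data):
    """Original loop: one element per iteration."""
    total = 0.0
    for i in range(len(data)):
        total += data[i]
    return total


def sum_unrolled_by_4(data):
    """The same loop unrolled four times by hand."""
    n = len(data)
    total = 0.0
    i = 0
    # Main body: four elements per iteration, so the loop test and
    # the index update are executed a quarter as often.
    while i + 4 <= n:
        total += data[i]
        total += data[i + 1]
        total += data[i + 2]
        total += data[i + 3]
        i += 4
    # Remainder loop: picks up the last 0 to 3 elements.
    while i < n:
        total += data[i]
        i += 1
    return total
```

Because the additions are performed in the same order in both versions,
the two functions return identical results; only the loop overhead changes.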

-Kuse_rodata		String constants, floating-point constants and the
			initial values of aggregate-type local variables are
			allocated in the read-only data section.

-Kxi=N			Performs inline expansion instead of function calls.
			The functions to expand are selected from profiler results.
			N is the permitted increase in object size, as a percentage.

-O[level]		Specifies the optimization level.

			0: No optimization.

			1: Basic optimization.

			2: Loop unrolling in addition to -O1.
 
			3: Global instruction scheduling and restructuring of
			   nested loop in addition to -O2.

			4: More aggressive loop restructuring than -O3.

			5: Creates an object program by applying further
			   optimizations of register allocation in addition    
			   to -O4.

-SSL2			The whole set of routines from SSL II, SSL II Thread-
			Parallel Capabilities and BLAS/LAPACK becomes part of
			link-edit libraries.

-x-			Inline expansion, instead of function calls, is performed
			for all functions defined in the C source code.

-x stm_no		Performs inline expansion of user-defined external
			procedures that have fewer executable statements than
			the number given by stm_no.

-x dir=directory_name	Performs inline expansion of procedures defined in the files
			under the directory specified and in the file currently
			being compiled.

-----------------------------------------------------------------------------------
[2] Sun C, C++ and Fortran Sun ONE Studio 8 flag description    (Dec.22 2003)

Compiler options	Remark
-----------------------------------------------------------------------------------

-array_pad_rows,<n>	Enable padding of arrays by n.
(Fortran)

cc			Invoke the Sun ONE Studio 8 Compiler C 
(C compiler)

CC			Invoke the Sun ONE Studio 8 Compiler C++ 
(C++ compiler)

-crit			Enable optimization of critical control paths 
(optimizer)

-dalign			Assume data is naturally aligned. 
(C, C++, Fortran)

-Dalloca=__builtin_alloca
(Portability flag)	Portability switch, used for 176.gcc:
			allow use of compiler's internal builtin alloca.

-depend			Synonym for -xdepend.
(Fortran)

-DHOST_WORDS_BIG_ENDIAN	Portability switch, used for 176.gcc:
(Portability flag)	controls how bytes are numbered within a word. 

-D__MATHERR_ERRNO_DONTCARE	
(C)			Allows the compiler to assume that your code
			does not rely on setting of the errno variable.

-DSPEC_CPU2000_SOLARIS	Portability switch, used for 253.perlbmk:
(Portability flag)	selects header files and code paths compatible
			with Solaris.
			

-DSUN			Portability switch, used for 186.crafty:
(Portability flag)	selects header files and code paths
			compatible with Solaris. 

-DSYS_HAS_CALLOC_PROTO	Portability switch, used for 254.gap:
(Portability flag)	allows use of the designated prototype.

-DSYS_HAS_IOCTL_PROTO	Portability switch, used for 254.gap:
(Portability flag)	allows use of the designated prototype.

-DSYS_HAS_SIGNAL_PROTO	Portability switch, used for 254.gap: 
(Portability flag)	allows use of the designated prototype.

-DSYS_HAS_TIME_PROTO	Portability switch, used for 254.gap:
(Portability flag)	allows use of the designated prototype.

-DSYS_IS_USG		Portability switch, used for 254.gap:
(Portability flag)	selects code compatible with USG-based systems. 

-e			Portability switch, used for 178.galgel:
(Portability, Fortran)	allows source lines to be up to 132 characters long. 

f90			Invoke the Sun ONE Studio 8 Compiler Fortran 90
(Fortran compiler)

-fast			A convenience option, this switch selects the
(C)			following switches that are defined elsewhere
			in this page: 

			-D__MATHERR_ERRNO_DONTCARE
			-dalign
			-fns
			-fsimple=2
			-fsingle
			-ftrap=%none
			-xalias_level=basic
			-xbuiltin=%all
			-xdepend
			-xlibmil
			-xO5
			-xprefetch=auto,explicit
			-xtarget=native

-fast			A convenience option, this switch selects the
(C++)			following switches that are defined elsewhere
			in this page: 

			-dalign
			-fns
			-fsimple=2
			-ftrap=%none
			-xbuiltin=%all
			-xlibmil
			-xlibmopt
			-xO5
			-xtarget=native

-fast			A convenience option, this switch selects the
(Fortran)		following switches that are defined elsewhere
			in this page: 

			-dalign
			-depend
			-fns
			-fsimple=2
			-ftrap=common
			-xlibmil
			-xlibmopt
			-xO5
			-xpad=local
			-xprefetch=auto,explicit
			-xtarget=native
			-xvector=yes

-fixed			Portability switch, used for 178.galgel:
(Portability, Fortran)	assume fixed-format source input.

-fns			Selects faster (but nonstandard) handling of
(C, C++, Fortran)	floating point arithmetic exceptions and
			gradual underflow.

-fsimple=<n>		Controls simplifying assumptions for
(C, C++, Fortran)	floating point arithmetic:

	    -fsimple=0	Permits no simplifying assumptions.
			Preserves strict IEEE 754 conformance. 

	    -fsimple=1	Allows the optimizer to assume: 
			The IEEE 754 default rounding/trapping
			modes do not change after process initialization. 
			Computations producing no visible result other
			than potential floating-point exceptions may
			be deleted. Computations with Infinity or NaNs
			as operands need not propagate NaNs to their
			results. For example, x*0 may be replaced by 0. 
			Computations do not depend on sign of zero. 

	    -fsimple=2	Permits more aggressive floating point
			optimizations that may cause programs to
			produce different numeric results due to
			changes in rounding. Even with -fsimple=2,
			the optimizer still is not permitted to
			introduce a floating point exception in a
			program that otherwise produces none. 
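
The x*0 case mentioned under -fsimple=1 can be made concrete with a short
Python sketch. Python floats are IEEE 754 doubles, so the first function
shows the strict result and the second shows the result after the
simplification. The function names are hypothetical and the sketch says
nothing about which expressions the optimizer actually rewrites.

```python
import math


def times_zero_ieee(x: float) -> float:
    """Strict IEEE 754 evaluation of x * 0."""
    return x * 0.0


def times_zero_simplified(x: float) -> float:
    """What the expression becomes if x * 0 is replaced by 0."""
    return 0.0


if __name__ == "__main__":
    # Infinity and NaN operands give NaN under strict evaluation,
    # and a negative operand gives a negative zero. The simplified
    # form returns positive zero in every case.
    for x in (2.0, -1.0, math.inf, math.nan):
        print(repr(x), repr(times_zero_ieee(x)), repr(times_zero_simplified(x)))
```

For ordinary finite, positive operands the two forms agree; they differ
only for Infinity, NaN, and in the sign of zero, which are exactly the
cases the -fsimple=1 assumptions set aside.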

-fsingle		Evaluate float expressions as single precision. 
(C)

-ftrap=common		Sets the IEEE 754 trapping mode to common exceptions
(C, C++, Fortran)	(invalid, division by zero, and overflow).

-ftrap=%none		Turns off all IEEE 754 trapping modes.
(C, C++, Fortran)

-library=iostream	Portability switch, used for 252.eon:
(Portability, C++)	allow use of the classic iostream library.

-ll2amm			Include a library containing chip specific
(linker)		memory routines.

-lm			Include the math library.
(linker)

-lmopt			Include the optimized math library. This option
(linker)		usually generates faster code, but may produce
			slightly different results. Usually these results
			will differ only in the last bit.

-lprism32		Library to enable Intimate Shared Memory (ISM)
(linker)		(4MB page) usage.

-noex			Do not allow C++ exceptions. A throw specification
(C++)			on a function is accepted but ignored; the compiler
			does not generate exception code.

-O			A synonym for -xO3.
(Fortran)

-Qoption <phase> <flags>
			Pass flags along to compiler phase:

			f90comp	Fortran first pass
			iropt	Global optimizer
			cg	Code generator

-Qoption cg <flags>	See -Wc,<flags> below. (The code generator
(code generator)	phase is addressed via -Qoption cg in
			Fortran and C++; and via -Wc in C.)

-Qoption cg -Qeps:enabled=1
(code generator)	See -Wc,-Qeps:enabled=1


-Qoption cg -Qeps:ws=<n>
(code generator)	See -Wc,-Qeps:ws=<n>

-Qoption cg -Qgsched-T<n>
(code generator)	See -Wc,-Qgsched-T<n>

-Qoption cg -Qgsched-trace_late=1
(code generator)	See -Wc,-Qgsched-trace_late=1

-Qoption iropt <flags>	See -W2,<flags> below. (The optimizer is addressed
(optimizer)		via -Qoption iropt in Fortran and C++, and via -W2
			in C.)

-Qoption iropt -Addint:sf=<n>		
(optimizer)		When considering whether to interchange loops, set memory
			store operation weight to n. A higher value of n indicates
			a greater performance cost for stores.

-Qoption iropt -Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1]
(optimizer)		See -W2,-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1]

-Qoption iropt -Apf:pdl=1	
(optimizer)		Do prefetching for one-level indirect memory references. 

-Qoption iropt -Atile:skewp[:b<n>]
(optimizer)		Perform loop tiling which is enabled by loop skewing.
			Loop skewing is a transformation that transforms a
			non-fully interchangeable loop nest to a fully
			interchangeable loop nest. The optional b<n> sets the
			tiling block size to n.

-Qoption iropt -Aujam:inner=g		
(optimizer)		Increase the probability that small-trip-count inner
			loops will be fully unrolled.

-Qoption iropt -Mt<n>	See -W2,-Mt<n>

RM_SOURCES = lapak.f90	This option allows building the benchmark 178.galgel
(SPEC tools)		without its copy of the lapak sources; instead,
			the lapak entry points in the sunperf library are used.

rm -rf ./feedback.profile ./SunWS_cache		
(Unix)			Remove any profile feedback information from previous runs. 

-stackvar		Allocate routine local variables on the stack.
(Fortran)

submit=echo 'pbind -b...' > dobmk; sh dobmk
(SPEC tools, Unix)	When running multiple copies of benchmarks, the SPEC
			config file feature submit is sometimes used to
			cause individual jobs to be bound to specific processors:

	       submit=  causes the SPEC tools to use this line when
			submitting jobs.
       echo ...> dobmk  causes the generated commands to be written to a
			file, namely dobmk. 
	      pbind -b  causes this copy's processes to be bound to the CPU
			specified by the expression that follows it. See the
			config file used in the submission for the exact
			syntax, which tends to be cumbersome because of the
			need to carefully quote parts of the expression.
			When all expressions are evaluated, each CPU ends up
			with exactly one copy of each benchmark. The pbind
			expression may include: 
			$SPECUSERNUM:    the SPEC tools-assigned number for
					 this copy of the benchmark. 
			psrinfo:         find out what processors are available
			grep off-line:   search the psrinfo output for
					 information regarding off-line cpus 
			awk...print \$1: Pick out the line corresponding to
					 this copy of the benchmark and use
					 the CPU number mentioned at the
					start of this line. 
	      sh dobmk  actually runs the benchmark. 
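
The selection step of that pipeline can be sketched in Python. This is a
minimal sketch under stated assumptions, not the expression used in any
submission: it assumes that off-line processors are skipped, that copy
numbers start at 0, and that psrinfo prints the processor ID in the first
field and the state in the second. The sample text and the function names
are hypothetical.

```python
def online_cpus(psrinfo_output: str) -> list:
    """Return the IDs of on-line processors, in listing order.

    Assumes each line starts with the processor ID followed by its
    state, for example "2   on-line   since ...".
    """
    cpus = []
    for line in psrinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "on-line":
            cpus.append(int(fields[0]))
    return cpus


def cpu_for_copy(psrinfo_output: str, copy_number: int) -> int:
    """Pick the processor for one benchmark copy (0-based, assumed)."""
    return online_cpus(psrinfo_output)[copy_number]


def bind_command(cpu: int, pid: int) -> str:
    """Build the pbind invocation that binds process pid to cpu."""
    return f"pbind -b {cpu} {pid}"


# Hypothetical psrinfo listing, used only to exercise the functions.
SAMPLE = """\
0       on-line   since 12/22/2003 09:00:00
1       off-line  since 12/22/2003 09:00:00
2       on-line   since 12/22/2003 09:00:00
3       on-line   since 12/22/2003 09:00:00
"""
```

With this sample listing, copy 1 is assigned to processor 2 because
processor 1 is off-line, so each copy still ends up on its own on-line CPU.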

-W<phase>,<flags>	Pass flags along to compiler phase (2=optimizer,
			c=code generator).

-W2,-Abcopy		Increase the probability that the compiler will
(optimizer)		perform memcpy/memset transformations. 

-W2,-Abopt		Enable aggressive optimizations of all branches.
(optimizer)

-W2,-Aheap		Allows the compiler to recognize malloc-like
(optimizer)		memory allocation functions.

-W2,-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1]
(optimizer)		Control the optimizer's loop inliner:

     (without a value)	Perform Inter-Procedural Analysis (IPA) -based inlining.

		cp=<n>	The minimum call site frequency counter in order to
			consider a routine for inlining.

		cs=<n>	Set inline callee size limit to n. The unit roughly
			corresponds to the number of instructions.

	       inc=<n>	The inliner is allowed to increase the size of the
			program by up to n%.

	       irs=<n>	Allow routines to increase by up to n. The unit
			roughly corresponds to the number of instructions. 

		    mi	Perform maximum inlining (without considering code
			size increase). 

	   recursion=1	Allow routines that are called recursively to still
			be eligible for inlining. 

-W2,-crit		Enable optimization of critical control paths.
(optimizer)

-W2,-Amemopt:arrayloc	Reconstruct array subscripts during memory
(optimizer)		allocation merging and data layout program
			transformation.

-W2,-Apf:llist=<n>:noinnerllist		
(optimizer)		Do speculative prefetching for linked-list data structures:
			llist=<n>: perform prefetching n iterations ahead.
			noinnerllist: do not attempt this for innermost loops.

-W2,-Ashort_ldst	Convert multiple short memory operations into
(optimizer)		single long memory operations.

-W2,-Aunroll		Enables outer-loop unrolling.
(optimizer)

-W2,-Mr<n>		Maximum code increase due to inlining is limited
(optimizer)		to n triples.

-W2,-Ms<n>		Maximum level of recursive inlining.
(optimizer)

-W2,-Mt<n>		The maximum size of a routine body eligible for
(optimizer)		inlining is limited to n triples.

-W2,-reroll=1		Turns on loop rerolling.
(optimizer)

-W2,-whole		Do whole program optimizations.
(optimizer)

-Wc,-Qdepgraph-early_cross_call=1	
(code generator)	There are several scheduling passes in the compiler.
			This option allows early passes to move instructions
			across call instructions.

-Wc,-Qeps:do_spec_load=1	
(code generator)	Allow generating speculative load during EPS.

-Wc,-Qeps:enabled=1	Use enhanced pipeline scheduling (EPS)
(code generator)	and selective scheduling algorithms for
			instruction scheduling. 

-Wc,-Qeps:rp_filtering_margin=100	
(code generator)	Turn off register pressure heuristics in EPS.

-Wc,-Qeps:ws=<n>	Set the EPS window size, that is, the number
(code generator)	of instructions it will consider across all
			paths when trying to find independent
			instructions to schedule a parallel group.
			Larger values may result in better run time,
			at the cost of increased compile time.

-Wc,-Qgsched-T<n>	Sets the aggressiveness of the trace
(code generator)	formation, where n is 4, 5, or 6. 
			The higher the value of n, the lower
			the branch probability needed to include
			a basic block in a trace.

-Wc,-Qgsched-trace_late=1
(code generator)	Turns on the late trace scheduler.

-Wc,-Qicache-chbab=1	Turn on optimization to reduce branch
(code generator)	after branch penalty: nops will be inserted
			to prevent one branch from occupying 
			the delay slot of another branch. 

-Wc,-Qipa:valueprediction	
(code generator)	Use profile feedback data to predict values and attempt
			to generate faster code along these control paths,
			even at the expense of possibly slower code along paths
			leading to different values. Correct code is generated
			for all paths.

-Wc,-Qiselect-funcalign=<n>
(code generator)	Do function entry alignment at n-byte boundaries. 


-Wc,-Qiselect-sw_pf_tbl_th=<n>	
(code generator)	Peels the most frequent test branches/cases off a switch
			until the branch probability reaches less than 1/n.
			This is effective only when profile feedback is used.

-Wc,-Qlp=<n>[-av=<n>][-t=<n>][-fa=<n>][-fl=<n>]
(code generator) 	Control irregular loop prefetching:

		lp=<n>	Turns the module on (1) or off (0)
			(default is on for F90; off for C/C++)

	       -av=<n>	Sets the prefetch look ahead distance, in bytes.
			Default is 256.

		-t=<n>	Sets the number of attempts at prefetching. If not
			specified, t=2 if -xprefetch_level=3 has been set;
			otherwise, defaults to t=1.

	       -fa=<n>	1=Force user settings to override internally computed values. 
    
	       -fl=<n>	1=Force the optimization to be turned on for all languages. 

-Wc,-Qms_pipe+intdivusefp	
(code generator)	In pipelined loops, use floating point divide
			instructions for signed integer division.

-Wc,-Qms_pipe+prefolim=<n>	
(code generator)	Set number of outstanding prefetches
			in pipelined loops to <n>

-Wc,-Qms_pipe+unoovf	Assert (to the pipeliner) that unsigned
(code generator)	int computations will not overflow. 

-Wc,-Qms_pipe-prefst	Turn off prefetching for stores in the pipeliner.
(code generator)

-Wc,-Qms_pipe-pref	Turn off prefetching within modulo scheduling.
(code generator)

-Wc,-Qpeep-Sh0		Reduce the probability that the compiler will hoist sethi
(code generator)	instructions out of loops. 

-xalias_level=[basic|std|strong]
(C)			Allows the compiler to perform type-based alias analysis
			at the specified alias level:

		 basic	Assume that memory references that involve
			different C basic types do not alias each other.

		   std	Assume aliasing rules described in the ISO 1999 C
			standard.

		strong	In addition to the restrictions at the std level,
			assume that pointers of type char * are used only
			to access an object of type char; and assume that
			there are no interior pointers.

-xalias_level=compatible
(C++)			Allows the compiler to assume that
			layout-incompatible types are not aliased.

-xarch=<a>		Limit the set of instructions the compiler may use.
(C, C++, Fortran)	a must be one of: generic, generic64, native, native64,
			v7, v8a, v8, v8plus, v8plusa, v8plusb, v9, v9a, v9b.
			Typical settings include:

				UltraSPARC-II, 32-bit mode: v8plusa
				UltraSPARC-II, 64-bit mode: v9a
				UltraSPARC-III, 32-bit mode: v8plusb
				UltraSPARC-III, 64-bit mode: v9b

			For more information, see the Fortran User's Guide
			at docs.sun.com

-xbuiltin=%all		Substitute intrinsic functions or inline system
(C, C++)		functions where profitable for performance. 

-xchip=<c>		Specifies the target processor for use by the
(C, C++, Fortran)	optimizer. c must be one of: generic, generic64,
			native, native64, old, super, super2, micro, micro2,
			hyper, hyper2, powerup, ultra, ultra2, ultra2i,
			ultra3, ultra3cu, 386, 486, pentium, pentium_pro,
			603, 604.

-xcache=<c>		Defines the cache properties for use by the
(C, C++, Fortran)	optimizer. c must be one of the following:
			native (set parameters for the host environment)

				* s1/l1/a1
				* s1/l1/a1:s2/l2/a2
				* s1/l1/a1:s2/l2/a2:s3/l3/a3

			The si/li/ai are defined as follows:

				si The size of the data cache
				at level i, in kilobytes.
				li The line size of the data cache
				at level i, in bytes.
				ai The associativity of the data cache
				at level i.
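
To make the s/l/a notation concrete, the following Python sketch splits
such a string into one record per cache level. It is an illustration of
the syntax described above, not part of the compiler; the class and
function names are hypothetical, the sample values in the usage note are
invented, and the keyword native is deliberately not handled.

```python
from typing import List, NamedTuple


class CacheLevel(NamedTuple):
    size_kb: int        # si: size of the data cache, in kilobytes
    line_bytes: int     # li: line size, in bytes
    associativity: int  # ai: associativity


def parse_xcache(spec: str) -> List[CacheLevel]:
    """Parse 's1/l1/a1[:s2/l2/a2[:s3/l3/a3]]' into a list of levels."""
    levels = []
    for part in spec.split(":"):
        fields = part.split("/")
        if len(fields) != 3:
            raise ValueError(f"expected size/line/assoc, got {part!r}")
        size, line, assoc = (int(f) for f in fields)
        levels.append(CacheLevel(size, line, assoc))
    if not 1 <= len(levels) <= 3:
        raise ValueError("between one and three cache levels are allowed")
    return levels
```

For example, parse_xcache("64/32/4:8192/512/1") yields two levels: a
64 KB, 4-way cache with 32-byte lines and an 8192 KB, direct-mapped cache
with 512-byte lines.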

-xdepend		Analyze loops for inter-iteration data dependencies,
(C, Fortran)		and do loop restructuring.

-xinline=		Turn off inlining.
(C, C++, Fortran)

-xipo[=2]		Perform optimizations across all object files in the
(C, C++, Fortran)	link step:

			0=off
			1=on
			2=performs whole-program detection and analysis

-xlibmil		Use inline expansion for math library, libm.
(C, C++, Fortran)

-xlibmopt		Select the optimized math library.
(C++, Fortran)

-xlic_lib=sunperf	Link with Sun supplied licensed sunperf library.
(C, C++, Fortran)

-xlinkopt		Perform link-time optimizations, such as branch
(C, C++, Fortran)	optimization and cache coloring.

-xO<n>			Specify optimization level n:
(C, C++, Fortran)

		  -xO1	Does only basic local optimizations (peephole).

		  -xO2	Do basic local and global optimizations, such as
			induction variable elimination, common
			subexpression elimination, constant propagation,
			register allocation, and basic block merging. 

		  -xO3	Add global optimizations at the function level,
			loop unrolling, and software pipelining.

		  -xO4	Adds automatic inlining of functions in the
			same file.

		  -xO5	Uses optimization algorithms that may take
			significantly more compilation time or that
			do not have as high a probability of improving
			execution time, such as speculative code motion.

-xpad=common[:<n>]	If multiple same-sized arrays are placed in common,
(Fortran)		insert padding between them for better use of cache.
			n specifies the amount of padding to apply, in units
			that are the same size as the array elements. If no
			parameter is specified then the compiler selects one
			automatically.
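
Why padding between same-sized arrays can help is easiest to see with a
small model. The Python sketch below assumes a direct-mapped cache, which
is a simplification, and counts how many index positions i have a[i] and
b[i] competing for the same cache set. All sizes and names are
hypothetical values chosen for the illustration.

```python
def cache_set(address: int, cache_bytes: int, line_bytes: int) -> int:
    """Set index of a byte address in a direct-mapped cache."""
    return (address // line_bytes) % (cache_bytes // line_bytes)


def conflicts(base_a: int, base_b: int, n_elems: int, elem_bytes: int,
              cache_bytes: int, line_bytes: int) -> int:
    """Count indices i for which a[i] and b[i] map to the same set."""
    count = 0
    for i in range(n_elems):
        set_a = cache_set(base_a + i * elem_bytes, cache_bytes, line_bytes)
        set_b = cache_set(base_b + i * elem_bytes, cache_bytes, line_bytes)
        if set_a == set_b:
            count += 1
    return count


if __name__ == "__main__":
    CACHE, LINE, ELEM, N = 8192, 32, 8, 1024   # two 8 KB arrays
    # Arrays placed back to back: every pair of elements collides.
    print(conflicts(0, N * ELEM, N, ELEM, CACHE, LINE))
    # Four elements (one cache line) of padding between the arrays.
    print(conflicts(0, N * ELEM + 4 * ELEM, N, ELEM, CACHE, LINE))
```

In this model the unpadded layout makes every a[i] evict the matching
b[i], while a pad of four elements shifts the second array by one cache
line and removes the collisions.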

-xpad=local		Pad local variables, for better use of cache.
(Fortran)

-xprefetch=auto,explicit
(C, C++, Fortran)	Allow generation of prefetch instructions. -xprefetch and
			-xprefetch=yes are synonyms for -xprefetch=auto,explicit.

-xprefetch=latx:<n>	Adjust the compiler's assumptions about prefetch latency
(C, C++, Fortran)	by the specified factor. Typically values in the range of
			0.5 to 2.0 will be useful. A lower number might indicate
			that data will usually be cache resident; a higher number
			might indicate a relatively larger gap between the
			processor speed and the memory speed (compared to the
			assumptions built into the compiler).

-xprefetch=no%auto	Turn off prefetch instruction generation.
(C, C++, Fortran) 

-xprefetch_level=<n>	Control the level of searching that the compiler does
(C, C++, Fortran)	for prefetch opportunities by setting n to 1, 2, or 3,
			where higher numbers mean to do more searching.
			The default is 2.

-xprofile=collect:./feedback
(C, C++, Fortran)	Collect profile data for feedback-directed optimization,
			and store it in a subdirectory of the current directory,
			named ./feedback.

-xprofile=use:./feedback
(C, C++, Fortran)	Use data collected for profile feedback. Look for it in
			a subdirectory of the current directory, named ./feedback.

-xregs=syst		Allows use of the system reserved registers %g6 and
(C, C++, Fortran)	%g7, and %g5 if not already allowed by -xarch value.

-xrestrict		Treat pointer-valued function parameters as
(C)			restricted pointers.

-xsafe=mem		Enables the use of non-faulting loads when used in
(C, C++, Fortran)	conjunction with -xarch=v8plus. Assumes that no memory
			based traps will occur.

-xsfpconst		Represents unsuffixed floating-point constants
(C, C++, Fortran)	as single precision.

-xtarget=[system_name]	Selects options appropriate for the system where
(C, C++, Fortran)	the compile is taking place, including architecture,
			chip, and cache sizes. (These can also be controlled
			separately, via -xarch, -xchip, and -xcache, respectively.) 

-xunroll=n		Specifies whether or not the compiler optimizes
(C, C++, Fortran)	(unrolls) loops.  n is a positive integer. When n is
			1, it is a command and the compiler unrolls no loops.
			When n is greater than 1, -xunroll=n merely suggests
			to the compiler that it unroll loops n times.

-xvector		Allow the compiler to transform math library calls within
(C, Fortran)		loops into calls to the vector math library.

-----------------------------------------------------------------------------------
[3] Environment Variables

Flag			Remark
-----------------------------------------------------------------------------------
LD_LIBRARY_PATH=<p>	Specify the locations to resolve dynamic link dependencies.

PRISM_HEAP=<n>		Set the heap size limit for large pages.

PRISM_MODE=2		Large page mode: Attempt to put text, data and heap
			all into large pages.

ulimit -s unlimited	Allow stack size to grow without limit.

-----------------------------------------------------------------------------------
[4] Kernel Parameters (/etc/system)

System Tunable		Remark
-----------------------------------------------------------------------------------
autoup			The frequency of file system sync operations.

consistent_coloring	Controls the page coloring policy. It can be set to
			one of the following:

		     0	(default) dynamic (uses various vaddr bits)
		     1	static (virtual=paddr)

shmsys:shminfo_shmmax	Maximum size of system V shared memory segment that
			can be created.

shmsys:shminfo_shmmin	Minimum size of system V shared memory segment that 
			can be created.

shmsys:shminfo_shmmni	System wide limit on number of shared memory
			segments that can be created.

shmsys:shminfo_shmseg	Limit on the number of shared memory segments that
			any one process can create.

tune_t_fsflushr		The number of seconds between fsflush invocations for
			checking dirty memory.

-----------------------------------------------------------------------------------
[5] Configuration file for large page manager (/etc/opt/FJSVpnrm/lpg.conf)

Tunable 		Remark
-----------------------------------------------------------------------------------
JOB=size [unit]		Specify the large page memory resource size for
			in-job processes. "unit" can be T (terabyte),
			G (gigabyte), or M (megabyte) after size.

LIMITPOLICY=[job|proc]	Define the memory allocation/limitation type.
			Default is job.

		   job	Limits for each job in Node.
		  proc	Limits for each process resource set.

SHMSEGSIZE=size[unit]	Size of large page segment. "unit" can be M for
			mega-byte and G for giga-byte.

TSS=size[unit]		Size of total memory, to be used for large page
			segments. At start of the system, this amount of
			memory is reserved and initialized. "unit" can be M
			for mega-byte and G for giga-byte.

--------------------------------------------------------------------------------
[6] Commands for feedback control

Command			Remark
--------------------------------------------------------------------------------
Parallelnavi compiler:
fdo_pre0 = rm -rf `pwd`*.f.d
fdo_pre0 = rm -rf `pwd`*.fbk
			remove the profile data generated at the last
			feedback-optimized compilation.

Sun Studio 8 compiler:
fdo_pre0 = rm -rf `pwd`/..feedback.profile
fdo_pre0 = rm -rf `pwd`/SunWS_cache
			remove the profile data generated at the last
			feedback-optimized compilation.

--------------------------------------------------------------------------------