SPRU187U Optimizing Compiler
User's Guide
Preface
1 Introduction to the Software Development Tools
1.1 Software Development Tools Overview
1.2 C/C++ Compiler Overview
1.2.1 ANSI/ISO Standard
1.2.2 Output Files
1.2.3 Compiler Interface
1.2.4 Utilities
2 Using the C/C++ Compiler
2.1 About the Compiler
2.2 Invoking the C/C++ Compiler
2.3 Changing the Compiler's Behavior With Options
2.3.1 Frequently Used Options
2.3.2 Miscellaneous Useful Options
2.3.3 Run-Time Model Options
2.3.4 Selecting Target CPU Version (--silicon_version Option)
2.3.5 Symbolic Debugging and Profiling Options
2.3.6 Specifying Filenames
2.3.7 Changing How the Compiler Interprets Filenames
2.3.8 Changing How the Compiler Processes C Files
2.3.9 Changing How the Compiler Interprets and Names Extensions
2.3.10 Specifying Directories
2.3.11 Assembler Options
2.3.12 Dynamic Linking
2.3.13 Deprecated Options
2.4 Controlling the Compiler Through Environment Variables
2.4.1 Setting Default Compiler Options (C6X_C_OPTION)
2.4.2 Naming an Alternate Directory (C6X_C_DIR)
2.5 Precompiled Header Support
2.5.1 Automatic Precompiled Header
2.5.2 Manual Precompiled Header
2.5.3 Additional Precompiled Header Options
2.6 Controlling the Preprocessor
2.6.1 Predefined Macro Names
2.6.2 The Search Path for #include Files
2.6.3 Generating a Preprocessed Listing File (--preproc_only Option)
2.6.4 Continuing Compilation After Preprocessing (--preproc_with_compile Option)
2.6.5 Generating a Preprocessed Listing File With Comments (--preproc_with_comment Option)
2.6.6 Generating a Preprocessed Listing File With Line-Control Information (--preproc_with_line Option)
2.6.7 Generating Preprocessed Output for a Make Utility (--preproc_dependency Option)
2.6.8 Generating a List of Files Included With the #include Directive (--preproc_includes Option)
2.6.9 Generating a List of Macros in a File (--preproc_macros Option)
2.7 Understanding Diagnostic Messages
2.7.1 Controlling Diagnostics
List of Figures
1-1. TMS320C6000 Software Development Flow
3-1. Software-Pipelined Loop
4-1. 4-Bank Interleaved Memory
4-2. 4-Bank Interleaved Memory With Two Memory Spaces
7-1. Char and Short Data Storage Format
7-2. 32-Bit Data Storage Format
7-3. Single-Precision Floating-Point Char Data Storage Format
7-4. 40-Bit Data Storage Format Signed __int40_t or 40-bit long
7-5. Unsigned 40-bit __int40_t or long
7-6. 64-Bit Data Storage Format Signed 64-bit long
7-7. Unsigned 64-bit long
7-8. Double-Precision Floating-Point Data Storage Format
7-9. Bit-Field Packing in Big-Endian and Little-Endian Formats
7-10. Register Argument Conventions
7-11. Autoinitialization at Run Time
7-12. Initialization at Load Time
7-13. Autoinitialization at Run Time in EABI Mode
7-14. Initialization at Load Time in EABI Mode
7-15. Constructor Table for EABI Mode
7-16. Format of Initialization Records in the .cinit Section
7-17. Format of Initialization Records in the .pinit Section
List of Tables
2-1. Processor Options
2-2. Optimization Options
2-3. Debug Options
2-4. Include Options
2-5. Control Options
2-6. Advanced Debug Options
2-7. Language Options
2-8. Parser Preprocessing Options
2-9. Predefined Symbols Options
2-10. Diagnostics Options
2-11. Run-Time Model Options
2-12. Advanced Optimization Options
2-13. Entry/Exit Hook Options
2-14. Feedback Options
2-15. Library Function Assumptions Options
2-16. Assembler Options
2-17. File Type Specifier Options
2-18. Directory Specifier Options
2-19. Default File Extensions Options
2-20. Dynamic Linking Support Compiler Options
2-21. Command Files Options
2-22. MISRA-C:2004 Options
2-23. Performance Advisor Options
2-24. Linker Basic Options
2-25. File Search Path Options
2-26. Command File Preprocessing Options
2-27. Diagnostic Options
2-28. Linker Output Options
2-29. Symbol Management Options
2-30. Run-Time Environment Options
2-31. Link-Time Optimization Options
2-32. Miscellaneous Options
2-33. Dynamic Linking Support Options
2-34. Compiler Options For Dynamic Linking
2-35. Linker Options For Dynamic Linking
2-36. Compiler Backwards-Compatibility Options Summary
2-37. Predefined C6000 Macro Names
2-38. Raw Listing File Identifiers
2-39. Raw Listing File Diagnostic Identifiers
3-1. Options That You Can Use With --opt_level=3
3-2. Selecting a File-Level Optimization Option
3-3. Selecting a Level for the --gen_opt_info Option
3-4. Selecting a Level for the --call_assumptions Option
3-5. Special Considerations When Using the --call_assumptions Option
4-1. Options That Affect the Assembly Optimizer
4-2. Assembly Optimizer Directives Summary
5-1. Initialized Sections Created by the Compiler for COFFABI
5-2. Initialized Sections Created by the Compiler for EABI
5-3. Uninitialized Sections Created by the Compiler for Both ABIs
6-1. TMS320C6000 C/C++ COFF ABI Data Types
6-2. TMS320C6000 C/C++ EABI Data Types
6-3. Valid Control Registers
6-4. GCC Language Extensions
7-1. Data Representation in Registers and Memory
7-2. Register Usage
7-3. C6000 C/C++ Intrinsics Support by Device
7-4. TMS320C6000 C/C++ Compiler Intrinsics
7-5. TMS320C6400, C6400+, C6740, and C6600 C/C++ Compiler Intrinsics
7-6. TMS320C6400+, C6740, and C6600 C/C++ Compiler Intrinsics
7-7. TMS320C6700, C6700+, C6740, and C6600 C/C++ Compiler Intrinsics
7-8. TMS320C6600 C/C++ Compiler Intrinsics
7-9. Vector-in-Scalar Support C/C++ Compiler v7.2 Intrinsics
7-10. Summary of Run-Time-Support Arithmetic Functions
8-1. The mklib Program Options
Notational Conventions
This document uses the following conventions:
• Program listings, program examples, and interactive displays are shown in a special typeface.
Interactive displays use a bold version of the special typeface to distinguish commands that you enter
from items that the system displays (such as prompts, command output, error messages, etc.).
Here is a sample of C code:
#include <stdio.h>
int main(void)
{
    printf("hello, cruel world\n");
    return 0;
}
• In syntax descriptions, the instruction, command, or directive is in a bold typeface and parameters are
in an italic typeface. Portions of a syntax that are in bold should be entered as shown; portions of a
syntax that are in italics describe the type of information that should be entered.
• Square brackets ( [ and ] ) identify an optional parameter. If you use an optional parameter, you specify
the information within the brackets. Unless the square brackets are in the bold typeface, do not enter
the brackets themselves. The following is an example of a command that has an optional parameter:
• Braces ( { and } ) indicate that you must choose one of the parameters within the braces; you do not
enter the braces themselves. This is an example of a command with braces that are not included in the
actual syntax but indicate that you must specify either the --rom_model or --ram_model option:
• In assembler syntax statements, column 1 is reserved for the first character of a label or symbol. If the
label or symbol is optional, it is usually not shown. If it is a required parameter, it is shown starting
against the left margin of the box, as in the example below. No instruction, command, directive, or
parameter, other than a symbol or label, can begin in column 1.
• Some directives, such as the .byte directive, can have a varying number of parameters. This
syntax is shown as [, ..., parameter].
• The TMS320C6200™ core is referred to as C6200. The TMS320C6400 core is referred to as C6400.
The TMS320C6700 core is referred to as C6700. TMS320C6000 and C6000 can refer to any of
C6200, C6400, C6400+, C6700, C6700+, C6740, or C6600.
Related Documentation
You can use the following books to supplement this user's guide:
ANSI X3.159-1989, Programming Language - C (Alternate version of the 1989 C Standard), American
National Standards Institute
ISO/IEC 9899:1989, International Standard - Programming Languages - C (The 1989 C Standard),
International Organization for Standardization
ISO/IEC 9899:1999, International Standard - Programming Languages - C (The C Standard),
International Organization for Standardization
ISO/IEC 14882-1998, International Standard - Programming Languages - C++ (The C++ Standard),
International Organization for Standardization
The C Programming Language (second edition), by Brian W. Kernighan and Dennis M. Ritchie,
published by Prentice-Hall, Englewood Cliffs, New Jersey, 1988
The Annotated C++ Reference Manual, Margaret A. Ellis and Bjarne Stroustrup, published by Addison-
Wesley Publishing Company, Reading, Massachusetts, 1990
C: A Reference Manual (fourth edition), by Samuel P. Harbison and Guy L. Steele Jr., published by
Prentice Hall, Englewood Cliffs, New Jersey
Programming Embedded Systems in C and C++, by Michael Barr, Andy Oram (Editor), published by
O'Reilly & Associates; ISBN: 1565923545, February 1999
Programming in C, Steve G. Kochan, Hayden Book Company
The C++ Programming Language (second edition), Bjarne Stroustrup, published by Addison-Wesley
Publishing Company, Reading, Massachusetts, 1990
Tool Interface Standards (TIS) DWARF Debugging Information Format Specification Version 2.0,
TIS Committee, 1995
DWARF Debugging Information Format Version 3, DWARF Debugging Information Format Workgroup,
Free Standards Group, 2005 (http://dwarfstd.org)
The TMS320C6000™ is supported by a set of software development tools, which includes an optimizing
C/C++ compiler, an assembly optimizer, an assembler, a linker, and assorted utilities.
This chapter provides an overview of these tools and introduces the features of the optimizing C/C++
compiler. The assembly optimizer is discussed in Chapter 4. The assembler and linker are discussed in
detail in the TMS320C6000 Assembly Language Tools User's Guide.
[Figure 1-1. TMS320C6000 Software Development Flow. Components shown: C/C++ source files, macro
source files, linear assembly, C/C++ compiler, assembly optimizer, archiver, macro library,
assembler, assembly-optimized file, object files, library-build utility, run-time-support library,
library of object files, link step, executable object file, hex-conversion utility, and debugging tools.]
The following list describes the tools that are shown in Figure 1-1:
• The assembly optimizer allows you to write linear assembly code without being concerned with the
pipeline structure or with assigning registers. It accepts assembly code that has not been register-
allocated and is unscheduled. The assembly optimizer assigns registers and uses loop optimization to
turn linear assembly into highly parallel assembly that takes advantage of software pipelining. See
Chapter 4.
• The compiler accepts C/C++ source code and produces C6000 assembly language source code. See
Chapter 2.
• The assembler translates assembly language source files into machine language relocatable object
files. The TMS320C6000 Assembly Language Tools User's Guide explains how to use the assembler.
• The linker combines relocatable object files into a single absolute executable object file. As it creates
the executable file, it performs relocation and resolves external references. The linker accepts
relocatable object files and object libraries as input. See Chapter 5. The TMS320C6000 Assembly
Language Tools User's Guide provides a complete description of the linker.
• The archiver allows you to collect a group of files into a single archive file, called a library.
Additionally, the archiver allows you to modify a library by deleting, replacing, extracting, or adding
members. One of the most useful applications of the archiver is building a library of object files. The
TMS320C6000 Assembly Language Tools User's Guide explains how to use the archiver.
• The run-time-support libraries contain the standard ISO C and C++ library functions, compiler-utility
functions, floating-point arithmetic functions, and C I/O functions that are supported by the compiler.
See Chapter 8.
You can use the library-build utility to build your own customized run-time-support library. See
Section 8.5. Source code for the standard run-time-support library functions for C and C++ is
provided in the self-contained rtssrc.zip file.
• The hex conversion utility converts an object file into other object formats. You can download the
converted file to an EPROM programmer. The TMS320C6000 Assembly Language Tools User's Guide
explains how to use the hex conversion utility and describes all supported formats.
• The absolute lister accepts linked object files as input and creates .abs files as output. You can
assemble these .abs files to produce a listing that contains absolute, rather than relative, addresses.
Without the absolute lister, producing such a listing would be tedious and would require many manual
operations. The TMS320C6000 Assembly Language Tools User's Guide explains how to use the
absolute lister.
• The cross-reference lister uses object files to produce a cross-reference listing showing symbols,
their definitions, and their references in the linked source files. The TMS320C6000 Assembly
Language Tools User's Guide explains how to use the cross-reference utility.
• The C++ name demangler is a debugging aid that converts names mangled by the compiler back to
their original names as declared in the C++ source code. As shown in Figure 1-1, you can use the C++
name demangler on the assembly file that is output by the compiler; you can also use this utility on the
assembler listing file and the linker map file. See Chapter 9.
• The disassembler decodes object files to show the assembly instructions that they represent. The
TMS320C6000 Assembly Language Tools User's Guide explains how to use the disassembler.
• The main product of this development process is an executable object file that can be executed in a
TMS320C6000 device. You can use one of several debugging tools to refine and correct your code.
Available products include:
– An instruction-level and clock-accurate software simulator
– An XDS emulator
standard. The compiler also supports embedded C++. For a description of unsupported C++ features,
see Section 6.2.
• ISO-standard run-time support
The compiler tools come with an extensive run-time library. All library functions conform to the ISO
C/C++ library standard. The library includes functions for standard input and output, string
manipulation, dynamic memory allocation, data conversion, timekeeping, trigonometry, and exponential
and hyperbolic functions. Functions for signal handling are not included, because these are target-
system specific. For more information, see Chapter 8.
1.2.4 Utilities
These features are compiler utilities:
• Library-build utility
The library-build utility lets you custom-build object libraries from source for any combination of run-
time models. For more information, see Section 8.5.
• C++ name demangler
The C++ name demangler (dem6x) is a debugging aid that translates each mangled name it detects in
compiler-generated assembly code, disassembly output, or compiler diagnostic messages to its
original name found in the C++ source code. For more information, see Chapter 9.
• Hex conversion utility
For stand-alone embedded applications, the compiler has the ability to place all code and initialization
data into ROM, allowing C/C++ code to run from reset. The COFF or ELF files output by the compiler
can be converted to EPROM programmer data files by using the hex conversion utility, as described in
the TMS320C6000 Assembly Language Tools User's Guide.
The compiler translates your source program into machine language object code that the TMS320C6000
can execute. Source code must be compiled, assembled, and linked to create an executable object file. All
of these steps are executed at once by using the compiler.
For a complete description of the assembler and the linker, see the TMS320C6000 Assembly Language
Tools User's Guide.
(1)
Note: Machine-specific options (see Table 2-11) can also affect optimization.
(1)
See Section 2.3.12 for more information.
The following tables list the linker options. See the TMS320C6000 Assembly Language Tools User's
Guide for details on these options.
--c_src_interlist Invokes the interlist feature, which interweaves original C/C++ source
with compiler-generated assembly language. The interlisted C
statements may appear to be out of sequence. You can use the interlist
feature with the optimizer by combining the --optimizer_interlist and
--c_src_interlist options. See Section 3.14.2. The --c_src_interlist option
can have a negative performance and/or code size impact.
--cmd_file=filename Appends the contents of a file to the option set. You can use this option
to avoid limitations on command line length or C style comments
imposed by the host operating system. Use a # or ; at the beginning of a
line in the command file to include comments. You can also include
comments by delimiting them with /* and */. To specify options, surround
hyphens with quotation marks. For example, "--"quiet.
You can use the --cmd_file option multiple times to specify multiple files.
For instance, the following indicates that file3 should be compiled as
source and file1 and file2 are --cmd_file files:
cl6x --cmd_file=file1 --cmd_file=file2 file3
--compile_only Suppresses the linker and overrides the --run_linker option, which
specifies linking. The --compile_only option's short form is -c. Use this
option when you have --run_linker specified in the C6X_C_OPTION
environment variable and you do not want to link. See Section 5.1.3.
--define=name[=def] Predefines the constant name for the preprocessor. This is equivalent to
inserting #define name def at the top of each C source file. If the
optional [=def] is omitted, the name is set to 1. The --define option's short
form is -D.
If you want to define a quoted string and keep the quotation marks, do
one of the following:
• For Windows, use --define=name="\"string def\"". For example,
--define=car="\"sedan\""
• For UNIX, use --define=name='"string def"'. For example,
--define=car='"sedan"'
• For Code Composer Studio, enter the definition in a file and include
that file with the --cmd_file option.
--help Displays the syntax for invoking the compiler and lists available options.
If the --help option is followed by another option or phrase, detailed
information about the option or phrase is displayed. For example, to see
information about debugging options use --help debug.
--include_path=directory Adds directory to the list of directories that the compiler searches for
#include files. The --include_path option's short form is -I. You can use
this option several times to define several directories; be sure to
separate the --include_path options with spaces. If you do not specify a
directory name, the preprocessor ignores the --include_path option. See
Section 2.6.2.1.
--keep_asm Retains the assembly language output from the compiler or assembly
optimizer. Normally, the compiler deletes the output assembly language
file after assembly is complete. The --keep_asm option's short form is -k.
--quiet Suppresses banners and progress information from all the tools. Only
source filenames and error messages are output. The --quiet option's
short form is -q.
--run_linker Runs the linker on the specified object files. The --run_linker option and
its parameters follow all other options on the command line. All
arguments that follow --run_linker are passed to the linker. The
--run_linker option's short form is -z. See Section 5.1.
--skip_assembler Compiles only. The specified source files are compiled but not
assembled or linked. The --skip_assembler option's short form is -n. This
option overrides --run_linker. The output is assembly language output
from the compiler.
--src_interlist Invokes the interlist feature, which interweaves optimizer comments or
C/C++ source with assembly source. If the optimizer is invoked
(--opt_level=n option), optimizer comments are interlisted with the
assembly language output of the compiler, which may rearrange code
significantly. If the optimizer is not invoked, C/C++ source statements are
interlisted with the assembly language output of the compiler, which
allows you to inspect the code generated for each C/C++ statement. The
--src_interlist option implies the --keep_asm option. The --src_interlist
option's short form is -s.
--tool_version Prints the version number for each tool in the compiler. No compiling
occurs.
--undefine=name Undefines the predefined constant name. This option overrides any
--define options for the specified constant. The --undefine option's short
form is -U.
--verbose Displays progress information and toolset version while compiling.
Resets the --quiet option.
--keep_unneeded_statics Does not delete unreferenced static variables. The parser by default
remarks about and then removes any unreferenced static variables.
The --keep_unneeded_statics option keeps the parser from deleting
unreferenced static variables and any static functions that are
referenced by these variable definitions. Unreferenced static functions
will still be removed.
--no_const_clink Tells the compiler to not generate .clink directives for const global
arrays. By default, these arrays are placed in a .const subsection and
conditionally linked.
--misra_advisory={error|warning|remark|suppress} Sets the diagnostic severity for advisory MISRA-C:2004 rules.
--misra_required={error|warning|remark|suppress} Sets the diagnostic severity for required MISRA-C:2004 rules.
--preinclude=filename Includes the source code of filename at the beginning of the
compilation. This can be used to establish standard macro definitions.
The filename is searched for in the directories on the include search
list. The files are processed in the order in which they were specified.
--printf_support={full|nofloat|minimal} Enables support for smaller, limited versions of the printf and sprintf
run-time-support functions. The valid values are:
• full: Supports all format specifiers. This is the default.
• nofloat: Excludes support for printing and scanning floating-point
values. Supports all format specifiers except %f, %F, %g, %G, %e,
and %E.
• minimal: Supports the printing and scanning of integer, char, or
string values without width or precision flags. Specifically, only
the %%, %d, %o, %c, %s, and %x format specifiers are supported.
There is no run-time error checking to detect the use of a format
specifier for which support is not included. The --printf_support option
precedes the --run_linker option, and must be used when performing
the final link.
--sat_reassoc={on|off} Enables or disables the reassociation of saturating arithmetic.
--interrupt_threshold=n Specifies an interrupt threshold value n that sets the maximum cycles
the compiler can disable interrupts. See Section 2.12.
--mem_model:const=type Allows const objects to be made far independently of the
--mem_model:data option. The type can be data, far, or
far_aggregates. See Section 7.1.5.3.
--mem_model:data=type Specifies data access model as type far, far_aggregates, or near.
Default is far_aggregates. See Section 7.1.5.1.
--silicon_version=num Selects the target CPU version. See Section 2.3.4.
--small_enum By default, the C6000 compiler uses 32 bits for every enum. When
you use the --small_enum option, the smallest possible byte size for
the enumeration type is used. For example, enum example_enum
{first = -128, second = 0, third = 127} uses only one byte instead of 32
bits when the --small_enum option is used. Similarly, enum
a_short_enum {bottom = -32768, middle = 0, top = 32767} fits into
two bytes instead of four. Do not link object files compiled with the
--small_enum option with object files that have been compiled without
it. If you use the --small_enum option, you must use it with all of your
C/C++ files; otherwise, you will encounter errors that cannot be
detected until run time.
--speculate_loads=n Specifies speculative load byte count threshold. Allows speculative
execution of loads with bounded addresses. See Section 3.2.3.1.
--speculate_unknown_loads Allows speculative execution of loads with unbounded addresses.
--target_compatibility_6200 Compiles C6400 code that is compatible with array alignment
restrictions of version 4.0 tools or C6200/C6700 object code. This
option is deprecated. See Section 2.13.
--use_const_for_alias_analysis Uses const to disambiguate pointers.
--wchar_t={32|16} Sets the size (in bits) of the C/C++ type wchar_t. The --abi=eabi
option is required when --wchar_t=32 is used. By default the compiler
generates 16-bit wchar_t. In COFF ABI mode, a warning is generated
and --wchar_t=32 is ignored. 16-bit wchar_t objects are not
compatible with 32-bit wchar_t objects; an error is generated if
they are combined. When the --linux option is specified, it implies
--wchar_t=32 since Linux uses 32-bit extended characters.
--profile:breakpt Disables optimizations that would cause incorrect behavior when using a
breakpoint-based profiler.
--profile:power Enables power profiling support by inserting NOPs into the frame code.
These NOPs can then be instrumented by the power profiling tooling to
track the power usage of functions. If the power profiling tool is not used,
this option increases the cycle count of each function because of the
NOPs. The --profile:power option also disables optimizations that cannot
be handled by the power-profiler.
--symdebug:coff Enables symbolic debugging using the alternate STABS debugging
format. This may be necessary to allow debugging with older debuggers
or custom tools, which do not read the DWARF format. STABS format is
not supported for C6400+ or ELF.
--symdebug:dwarf Generates directives that are used by the C/C++ source-level debugger
and enables assembly source debugging in the assembler. The
--symdebug:dwarf option's short form is -g. The --symdebug:dwarf option
disables many code generator optimizations, because they disrupt the
debugger. You can use the --symdebug:dwarf option with the --opt_level
(aliased as -O) option to maximize the amount of optimization that is
compatible with debugging (see Section 3.14.3.1).
For more information on the DWARF debug format, see The DWARF
Debugging Standard.
--symdebug:keep_all_types Affects the ability to view unused types in the debugger that are from a
COFF executable. Use this option to view the details of a type that is
defined but not used to define any symbols. Such unused types are not
included in the debug information by default for COFF. However, in EABI
mode, all types are included in the debug information and this option has
no effect.
--symdebug:dwarf_version={2|3} Specifies the DWARF debugging format version (2 or 3) to be generated
when --symdebug:dwarf or --symdebug:skeletal is specified. By default,
the compiler generates DWARF version 3 debug information. For more
information on TI extensions to the DWARF language, see The Impact of
DWARF on TI Object Files (SPRAAB5).
--symdebug:none Disables all symbolic debugging output. This option is not recommended;
it prevents debugging and most performance analysis capabilities.
--symdebug:profile_coff Adds the necessary debug directives to the object file which are needed
by the profiler to allow function level profiling with minimal impact on
optimization (when used). Using --symdebug:coff may hinder some
optimizations to ensure that debug ability is maintained, while this option
will not hinder optimization. STABS format is not supported for C6400+
or ELF.
You can set breakpoints and profile on function-level boundaries in Code
Composer Studio, but you cannot single-step through code as with full
debug ability.
--symdebug:skeletal Generates as much symbolic debugging information as possible without
hindering optimization. Generally, this consists of global-scope
information only. This option reflects the default behavior of the compiler.
See Section 2.3.13 for a list of deprecated symbolic debugging options.
For information about how you can alter the way that the compiler interprets individual filenames, see
Section 2.3.7. For information about how you can alter the way that the compiler interprets and names the
extensions of assembly source and object files, see Section 2.3.10.
You can use wildcard characters to compile or assemble multiple files. Wildcard specifications vary by
system; use the appropriate form listed in your operating system manual. For example, to compile all of
the files in a directory with the extension .cpp, enter the following:
cl6x *.cpp
For example, if you have a C source file called file.s and an assembly language source file called assy,
use the --asm_file and --c_file options to force the correct interpretation:
cl6x --c_file=file.s --asm_file=assy
The following example assembles the file fit.rrr and creates an object file named fit.o:
cl6x --asm_extension=.rrr --obj_extension=.o fit.rrr
The period (.) in the extension is optional. You can also write the example above as:
cl6x --asm_extension=rrr --obj_extension=o fit.rrr
--abs_directory=directory Specifies the destination directory for absolute listing files. The default is
to use the same directory as the object file directory. For example:
cl6x --abs_directory=d:\abso_list
--asm_directory=directory Specifies a directory for assembly files. For example:
cl6x --asm_directory=d:\assembly
--list_directory=directory Specifies the destination directory for assembly listing files and cross-
reference listing files. The default is to use the same directory as the
object file directory. For example:
cl6x --list_directory=d:\listing
--obj_directory=directory Specifies a directory for object files. For example:
cl6x --obj_directory=d:\object
--output_file=filename Specifies a compilation output file name; can override --obj_directory . For
example:
cl6x --output_file=transfer
--pp_directory=directory Specifies a preprocessor file directory for object files (default is .). For
example:
cl6x --pp_directory=d:\preproc
--temp_directory=directory Specifies a directory for temporary intermediate files. For example:
cl6x --temp_directory=d:\temp
Additionally, the --symdebug:profile_coff option has been added to enable function-level profiling of
optimized code with symbolic debugging using the STABS debugging format (the --symdebug:coff or -gt
option).
Since C6400+, C6740, and C6600 produce only DWARF debug information, the -gp, -gt/--symdebug:coff,
and --symdebug:profile_coff options are not supported for C6400+, C6740, and C6600.
Environment variable options are specified in the same way and have the same meaning as they do on
the command line. For example, if you want to always run quietly (the --quiet option), enable C/C++
source interlisting (the --src_interlist option), and link (the --run_linker option) for Windows, set up the
C6X_C_OPTION environment variable as follows:
set C6X_C_OPTION=--quiet --src_interlist --run_linker
In the following examples, each time you run the compiler, it runs the linker. Any options following
--run_linker on the command line or in C6X_C_OPTION are passed to the linker. Thus, you can use the
C6X_C_OPTION environment variable to specify default compiler and linker options and then specify
additional compiler and linker options on the command line. If you have set --run_linker in the environment
variable and want to compile only, use the compiler --compile_only option. These additional examples
assume C6X_C_OPTION is set as shown above:
cl6x *.c ; compiles and links
cl6x --compile_only *.c ; only compiles
cl6x *.c --run_linker lnk.cmd ; compiles and links using a command file
cl6x --compile_only *.c --run_linker lnk.cmd
; only compiles (--compile_only overrides --run_linker)
For details on compiler options, see Section 2.3. For details on linker options, see the Linker Description
chapter in the TMS320C6000 Assembly Language Tools User's Guide.
The pathnames are directories that contain input files. The pathnames must follow these constraints:
• Pathnames must be separated with a semicolon.
• Spaces or tabs at the beginning or end of a path are ignored. For example, the space before and after
the semicolon in the following is ignored:
set C6X_C_DIR=c:\path\one\to\tools ; c:\path\two\to\tools
• Spaces and tabs are allowed within paths to accommodate Windows directories that contain spaces.
For example, the pathnames in the following are valid:
set C6X_C_DIR=c:\first path\to\tools;d:\second path\to\tools
The environment variable remains set until you reboot the system or reset the variable by entering:
Carefully organizing the include directives across multiple files so that their header files maximize common
usage can increase the compile time savings when using precompiled headers.
A precompiled header file is produced only if the header stop point and the code prior to it meet certain
requirements.
You can use the names listed in Table 2-37 in the same manner as any other defined name. For example,
printf ( "%s %s" , __TIME__ , __DATE__);
UNIX /tools/files/alt.h
Windows c:\tools\files\alt.h
The table below shows how to invoke the compiler. Select the command for your operating system:
2.6.8 Generating a List of Files Included With the #include Directive (--preproc_includes
Option)
The --preproc_includes option performs preprocessing only, but instead of writing preprocessed output,
writes a list of files included with the #include directive. If you do not supply an optional filename, the list is
written to a file with the same name as the source file but with a .pp extension.
By default, the source line is omitted. Use the --verbose_diagnostics compiler option to enable the display
of the source line and the error position. The above example makes use of this option.
The message identifies the file and line involved in the diagnostic, and the source line itself (with the
position indicated by the ^ character) follows the message. If several diagnostics apply to one source line,
each diagnostic has the form shown; the text of the source line is displayed several times, with an
appropriate position indicated each time.
Because an error is determined to be discretionary based on the error severity associated with a specific
context, an error can be discretionary in some cases and not in others. All warnings and remarks are
discretionary.
For some messages, a list of entities (functions, local variables, source files, etc.) is useful; the entities are
listed following the initial error message:
"test.c", line 4: error: more than one instance of overloaded function "f"
matches the argument list:
function "f(int)"
function "f(float)"
argument types are: (double)
f(1.5);
^
In some cases, additional context information is provided. Specifically, the context information is useful
when the front end issues a diagnostic while doing a template instantiation or while generating a
constructor, destructor, or assignment operator function. For example:
"test.c", line 7: error: "A::A()" is inaccessible
B x;
^
detected during implicit generation of "B::B()" at line 7
Without the context information, it is difficult to determine to what the error refers.
--display_error_number Displays a diagnostic's numeric identifier along with its text. Use this option in
determining which arguments you need to supply to the diagnostic
suppression options (--diag_suppress, --diag_error, --diag_remark, and
--diag_warning). This option also indicates whether a diagnostic is discretionary.
A discretionary diagnostic is one whose severity can be overridden. A
discretionary diagnostic includes the suffix -D; otherwise, no suffix is present.
See Section 2.7.
--emit_warnings_as_errors Treats all warnings as errors. This option cannot be used with the
--no_warnings option. The --diag_remark option takes precedence over this
option. This option takes precedence over the --diag_warning option.
--issue_remarks Issues remarks (nonserious warnings), which are suppressed by default.
--no_warnings Suppresses warning diagnostics (errors are still issued).
--set_error_limit=num Sets the error limit to num, which can be any decimal value. The compiler
abandons compiling after this number of errors. (The default is 100.)
--verbose_diagnostics Provides verbose diagnostics that display the original source with line-wrap
and indicate the position of the error in the source line.
--write_diagnostics_file Produces a diagnostics information file with the same source file name with an
.err extension. (The --write_diagnostics_file option is not supported by the
linker.)
If you invoke the compiler with the --quiet option, this is the result:
"err.c", line 9: warning: statement is unreachable
"err.c", line 12: warning: statement is unreachable
Because it is standard programming practice to include break statements at the end of each case arm to
avoid the fall-through condition, these warnings can be ignored. Using the --display_error_number option,
you can find out the diagnostic identifier for these warnings. Here is the result:
[err.c]
"err.c", line 9: warning #111-D: statement is unreachable
"err.c", line 12: warning #111-D: statement is unreachable
Next, you can use the diagnostic identifier of 111 as the argument to the --diag_remark option to treat this
warning as a remark. This compilation now produces no diagnostic messages (because remarks are
disabled by default).
Although this type of control is useful, it can also be extremely dangerous. The compiler often emits
messages that indicate a less than obvious problem. Be careful to analyze all diagnostics emitted before
using the suppression options.
The --gen_acp_raw option also includes diagnostic identifiers as defined in Table 2-39.
S One of the identifiers in Table 2-39 that indicates the severity of the diagnostic
filename The source file
line number The line number in the source file
column number The column number in the source file
diagnostic The message text for the diagnostic
Diagnostics after the end of file are indicated as the last line of the file with a column number of 0. When
diagnostic message text requires more than one line, each subsequent line contains the same file, line,
and column information but uses a lowercase version of the diagnostic identifier. For more information
about diagnostic messages, see Section 2.7.
/*****************************************************************************/
/* string.h vx.xx (Excerpted) */
/* Copyright (c) 1993-2011 Texas Instruments Incorporated */
/*****************************************************************************/
#ifdef _INLINE
#define _IDECL static inline
#else
#define _IDECL extern _CODE_ACCESS
#endif
#ifdef _INLINE
/****************************************************************************/
/* strlen */
/****************************************************************************/
static inline size_t strlen(const char *string)
{
size_t n = (size_t)-1;
const char *s = string - 1;
#endif
/****************************************************************************/
/* strlen */
/****************************************************************************/
#undef _INLINE
#include <string.h>
_CODE_ACCESS size_t strlen(const char * string)
{
size_t n = (size_t)-1;
const char *s = string - 1;
RTS Library Files Are Not Built With the --interrupt_threshold Option
NOTE: The run-time-support library files provided with the compiler are not built with the interrupt
flexibility option. Refer to the readme file to see how the run-time-support library files were
built for your release. See Section 8.5 to build your own run-time-support library files with the
interrupt flexibility option.
The alignment of top-level arrays for the C6600 family is 16 bytes to facilitate compatibility with future
C6600 family devices. This change in alignment does not have any compatibility issues with the
C6400/C6400+/C6740 device code as the C6600 can safely accept top-level arrays aligned to an 8-byte
boundary.
The --c_src_interlist option prevents the compiler from deleting the interlisted assembly language output
file. The output assembly file, function.asm, is assembled normally.
When you invoke the interlist feature without the optimizer, the interlist runs as a separate pass between
the code generator and the assembler. It reads both the assembly and C/C++ source files, merges them,
and writes the C/C++ statements into the assembly file as comments.
Using the --c_src_interlist option can cause performance and/or code size degradation.
Example 2-4 shows a typical interlisted assembly file.
For more information about using the interlist feature with the optimizer, see Section 3.14.2.
_main:
EABI mode before migrating your existing COFF ABI systems to C6000 EABI. See
http://tiexpressdsp.com/index.php/EABI_Support_in_C6000_Compiler for full details.
For more details on the different ABIs, see Section 6.11.
--entry_hook[=name] Enables entry hooks. If specified, the hook function is called name. Otherwise,
the default entry hook function name is __entry_hook.
--entry_parm{=name|address|none} Specifies the parameters to the hook function. The name parameter specifies
that the name of the calling function is passed to the hook function as an
argument. In this case the signature for the hook function is:
void hook(const char *name);
The address parameter specifies that the address of the calling function is
passed to the hook function. In this case the signature for the hook function is:
void hook(void (*addr)());
The none parameter specifies that the hook is called with no parameters. This
is the default. In this case the signature for the hook function is: void
hook(void);
--exit_hook[=name] Enables exit hooks. If specified, the hook function is called name. Otherwise,
the default exit hook function name is __exit_hook.
--exit_parm{=name|address|none} Specifies the parameters to the hook function. The name parameter specifies
that the name of the calling function is passed to the hook function as an
argument. In this case the signature for the hook function is:
void hook(const char *name);
The address parameter specifies that the address of the calling function is
passed to the hook function. In this case the signature for the hook function is:
void hook(void (*addr)());
The none parameter specifies that the hook is called with no parameters. This
is the default. In this case the signature for the hook function is: void
hook(void);
The presence of the hook options creates an implicit declaration of the hook function with the given
signature. If a declaration or definition of the hook function appears in the compilation unit compiled with
the options, it must agree with the signatures listed above.
In C++, the hooks are declared extern "C". Thus you can define them in C (or assembly) without being
concerned with name mangling.
Hooks can be declared inline, in which case the compiler tries to inline them using the same criteria as
other inline functions.
Entry hooks and exit hooks are independent. You can enable one but not the other, or both. The same
function can be used as both the entry and exit hook.
You must take care to avoid recursive calls to hook functions. The hook function should not call any
function which itself has hook calls inserted. To help prevent this, hooks are not generated for inline
functions, or for the hook functions themselves.
You can use the --remove_hooks_when_inlining option to remove entry/exit hooks for functions that are
auto-inlined by the optimizer.
See Section 6.9.23 for information about the NO_HOOKS pragma.
The compiler tools can perform many optimizations to improve the execution speed and reduce the size of
C and C++ programs by simplifying loops, software pipelining, rearranging statements and expressions,
and allocating variables into registers.
This chapter describes how to invoke different levels of optimization and describes which optimizations are
performed at each level. This chapter also describes how you can use the Interlist feature when
performing optimization and how you can profile or debug optimized code.
Pipelined-loop prolog:
A1
B1 A2
C1 B2 A3
D1 C2 B3 A4
Kernel:
E1 D2 C3 B4 A5
Pipelined-loop epilog:
E2 D3 C4 B5
E3 D4 C5
E4 D5
E5
If you enter comments on instructions in your linear assembly input file, the compiler moves the comments
to the output file along with additional information. It attaches a 2-tuple <x, y> to the comments to specify
the iteration and cycle of the loop an instruction is on in the software pipeline. The zero-based number x
represents the iteration the instruction is on during the first execution of the loop kernel. The zero-based
number y represents the cycle that the instruction is scheduled on within a single iteration of the loop.
For more information about software pipelining, see the TMS320C6000 Programmer's Guide.
*----------------------------------------------------------------------------*
The terms defined below appear in the software pipelining information. For more information on each
term, see the TMS320C6000 Programmer's Guide.
• Loop unroll factor. The number of times the loop was unrolled specifically to increase performance
based on the resource bound constraint in a software pipelined loop.
• Known minimum trip count. The minimum number of times the loop will be executed.
• Known maximum trip count. The maximum number of times the loop will be executed.
• Known max trip count factor. Factor that would always evenly divide the loop's trip count. This
information can be used to possibly unroll the loop.
• Loop label. The label you specified for the loop in the linear assembly input file. This field is not
present for C/C++ code.
• Loop carried dependency bound. The distance of the largest loop carry path. A loop carry path
occurs when one iteration of a loop writes a value that must be read in a future iteration. Instructions
that are part of the loop carry bound are marked with the ^ symbol.
• Initiation interval (ii). The number of cycles between the initiation of successive iterations of the loop.
The smaller the initiation interval, the fewer cycles it takes to execute a loop.
• Resource bound. The most used resource constrains the minimum initiation interval. If four
instructions require a .D unit, they require at least two cycles to execute (4 instructions/2 parallel .D
units).
• Unpartitioned resource bound. The best possible resource bound values before the instructions in
the loop are partitioned to a particular side.
• Partitioned resource bound (*). The resource bound values after the instructions are partitioned.
• Resource partition. This table summarizes how the instructions have been partitioned. This
information can be used to help assign functional units when writing linear assembly. Each table entry
has values for the A-side and B-side registers. An asterisk is used to mark those entries that determine
the resource bound value. The table entries represent the following terms:
– .L units is the total number of instructions that require .L units.
– .S units is the total number of instructions that require .S units.
– .D units is the total number of instructions that require .D units.
– .M units is the total number of instructions that require .M units.
– .X cross paths is the total number of .X cross paths.
– .T address paths is the total number of address paths.
– Long read path is the total number of long read port paths.
– Long write path is the total number of long write port paths.
– Logical ops (.LS) is the total number of instructions that can use either the .L or .S unit.
– Addition ops (.LSD) is the total number of instructions that can use any of the .L, .S, or .D units.
• Bound(.L .S .LS). The resource bound value as determined by the number of instructions that use the
.L and .S units. It is calculated with the following formula:
Bound(.L .S .LS ) = ceil((.L + .S + .LS) / 2)
• Bound(.L .S .D .LS .LSD). The resource bound value as determined by the number of instructions that
use the .D, .L, and .S units. It is calculated with the following formula:
Bound(.L .S .D .LS .LSD) = ceil((.L + .S + .D + .LS + .LSD) / 3)
• Minimum required memory pad. The number of bytes that are read if speculative execution is
enabled. See Section 3.2.3 for more information.
• Loop carried dependency bound too large. If the loop has complex loop control, try
--speculate_loads according to the recommendations in Section 3.2.3.2.
• Cannot identify trip counter. The loop trip counter could not be identified or was used incorrectly in
the loop body.
• Too many reads of one register. The same register can be read a maximum of four times per cycle
with the C6200 or C6700 core. The C6400 core can read the same register any number of times per
cycle.
• Trip variable used in loop - Cannot adjust trip count. The loop trip counter has a use in the loop
other than as a loop trip counter.
This example shows that on cycle 0 (first execute packet) of the loop kernel, registers A0, A1, A2, A6, A7,
A8, A9, B0, B1, B2, B4, B5, B6, B7, B8, and B9 are all live during this cycle.
3.2.3 Collapsing Prologs and Epilogs for Improved Performance and Code Size
When a loop is software pipelined, a prolog and epilog are generally required. The prolog is used to pipe
up the loop and the epilog is used to pipe down the loop.
In general, a loop must execute a minimum number of iterations before the software-pipelined version can
be safely executed. If the minimum known trip count is too small, either a redundant loop is added or
software pipelining is disabled. Collapsing the prolog and epilog of a loop can reduce the minimum trip
count necessary to safely execute the pipelined loop.
Collapsing can also substantially reduce code size. Some of the code size growth from software pipelining
is due to the redundant loop; the remainder is due to the prolog and epilog.
The prolog and epilog of a software-pipelined loop each consist of up to p-1 stages of length ii, where p is the
number of iterations that are executed in parallel during the steady state and ii is the cycle time for the
pipelined loop body. During prolog and epilog collapsing the compiler tries to collapse as many stages as
possible. However, over-collapsing can have a negative performance impact. Thus, by default, the
compiler attempts to collapse as many stages as possible without sacrificing performance. When the
--opt_for_space=0 or --opt_for_space=1 options are invoked, the compiler increasingly favors code size
over performance.
If the minimum safe trip count is greater than the minimum known trip count, use of --speculate_loads is
highly recommended, not only for code size, but for performance.
When using --speculate_loads, you must ensure that potentially speculated loads will not cause illegal
reads. This can be done by padding the data sections and/or stack, as needed, by the required memory
pad in both directions. The required memory pad for a given software-pipelined loop is also provided in the
comment block for that loop.
;* Minimum required memory pad : 5 bytes
For safety, the example loop requires that array data referenced within this loop be preceded and followed
by a pad of at least 5 bytes. This pad can consist of other program data. The pad will not be modified. In
many cases, the threshold value (namely, the minimum value of the argument to --speculate_loads that is
needed to achieve a particular schedule and level of collapsing) is the same as the pad. However, when it
is not, the comment block will also include the minimum threshold value. In the case of this loop, the
threshold value must be at least 7 to achieve this level of collapsing.
Beginning with v7.4.0 of the C6000 Code Generation Tools, the compiler and linker can provide automatic
load speculation via the auto argument to the --speculate_loads option (i.e. --speculate_loads=auto or
-mh=auto). Use of the auto argument makes it easier to use and benefit from speculative load
optimizations. This option can generate speculative loads of up to 256 bytes beyond memory that the
compiler can prove to be allocated.
In addition, the compiler communicates information to the linker to help automatically ensure the required
pre- and post-padding:
• If the symbol of the speculatively loaded buffer is known during compile time, the linker will ensure the
object pointed to by the symbol has the required padding to let the speculative load access legal
memory.
• If the symbol information is not known during compile time, the linker will ensure that the placement of
data sections will allow legal accessing beyond the boundaries of the data sections. The linker does
this by simply padding the start and end of the memory range(s) where the data sections are placed.
However, you can also specify the speculative loads threshold explicitly via the --speculate_loads=n
option, where n is at least the minimum required pad (as explained earlier), but you also need to consider
whether a larger threshold value would facilitate additional collapsing. This information is also provided, if
applicable. For example, in the above comment block, a threshold value of 14 might facilitate further
collapsing. If you choose the auto argument to --speculate_loads, the compiler will consider the larger
threshold value automatically.
A
B  A
C  B  A   ← Three iterations in parallel = minimum trip count
   C  B
      C
When the C6000 tools cannot determine the trip count for a loop, then by default two loops and control
logic are generated. The first loop is not pipelined, and it executes if the run-time trip count is less than the
loop's minimum trip count. The second loop is the software pipelined loop, and it executes when the run-
time trip count is greater than or equal to the minimum trip count. At any given time, one of the loops is a
redundant loop. For example:
foo(N)                        /* N is the trip count */
{
    for (I=0; I < N; I++)     /* I is the trip counter */
}
After finding a software pipeline for the loop, the compiler transforms foo() as below, assuming the
minimum trip count for the loop is 3. Two versions of the loop would be generated and the following
comparison would be used to determine which version should be executed:
foo(N)
{
    if (N < 3)
    {
        for (I=0; I < N; I++) /* Unpipelined version */
    }
    else
    {
        for (I=0; I < N; I++) /* Pipelined version */
    }
}
foo(50); /* Execute software pipelined loop */
foo(2); /* Execute loop (unpipelined)*/
You may be able to help the compiler avoid producing redundant loops by using
--program_level_compile --opt_level=3 (see Section 3.7) or the MUST_ITERATE pragma (see
Section 6.9.20).
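As a minimal sketch of the MUST_ITERATE approach, the following function promises the compiler a minimum trip count of 3, which matches the minimum trip count assumed in the example above. The function name, parameters, and loop body are hypothetical. Compilers that do not recognize the pragma ignore it.

```c
/* Sketch only: sum_at_least_3 and its parameters are hypothetical names. */
int sum_at_least_3(const int *data, int n)
{
    int i;
    int sum = 0;

    /* Promise to the compiler: the loop body executes at least 3 times.
       Calling this function with n < 3 breaks that promise. */
    #pragma MUST_ITERATE(3);
    for (i = 0; i < n; i++)
    {
        sum += data[i];
    }
    return sum;
}
```

With the minimum known, the compiler has the information it needs to decide whether the unpipelined version of the loop is still required.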
3.4 Utilizing the Loop Buffer Using SPLOOP on C6400+, C6740, and C6600
The C6400+, C6740, and C6600 ISAs have a loop buffer which improves performance and reduces code
size for software pipelined loops. The loop buffer provides the following benefits:
• Code size. A single iteration of the loop is stored in program memory.
• Interrupt latency. Loops executing out of the loop buffer are interruptible.
• Improves performance for loops with unknown trip counts and eliminates redundant loops.
• Reduces or eliminates the need for speculated loads.
• Reduces power usage.
You can tell that the compiler is using the loop buffer when you find SPLOOP(D/W) at the beginning of a
software pipelined loop followed by an SPKERNEL at the end. Refer to the TMS320C6400/C6400+ CPU
and Instruction Set Reference Guide for information on SPLOOP.
When the --opt_for_space option is not used, the compiler will not use the loop buffer if it can find a faster
software pipelined loop without it. When using the --opt_for_space option, the compiler will use the loop
buffer when it can.
The compiler does not generate code for the loop buffer (SPLOOP/D/W) when any of the following occur:
• ii (initiation interval) > 14 cycles
• Dynamic length (of a single iteration) > 48 cycles
• The optimizer completely unrolls the loop
• Code contains elements that disqualify normal software pipelining (call in loop, complex control code in
loop, etc.). See the TMS320C6000 Programmer's Guide for more information.
It is recommended that a code size flag not be used with the most performance-critical code. Using
--opt_for_space=0 or --opt_for_space=1 is recommended for all but the most performance-critical code.
Using --opt_for_space=2 or --opt_for_space=3 is recommended for seldom-executed code. Either
--opt_for_space=2 or --opt_for_space=3 should be used if you need minimum code size. It is generally
recommended that the code size flags be combined with --opt_level=2 or --opt_level=3.
In certain circumstances, the compiler reverts to a different --call_assumptions level from the one you
specified, or it might disable program-level optimization altogether. Table 3-5 lists the combinations of
--call_assumptions levels and conditions that cause the compiler to revert to other --call_assumptions
levels.
In some situations when you use --program_level_compile and --opt_level=3, you must use a
--call_assumptions option or the FUNC_EXT_CALLED pragma. See Section 3.7.2 for information about
these situations.
Situation — Your application consists of C/C++ source code and assembly source code. The assembly
functions are interrupt service routines that call C/C++ functions; the C/C++ functions that the
assembly functions call are never called from C/C++. These C/C++ functions act like main: they
function as entry points into C/C++.
Solution — Add the volatile keyword to the C/C++ variables that may be modified by the interrupts. Then,
you can optimize your code in one of these ways:
• You achieve the best optimization by applying the FUNC_EXT_CALLED pragma to all of the
entry-point functions called from the assembly language interrupts, and then compiling with
--program_level_compile --opt_level=3 --call_assumptions=2. Be sure that you use the pragma
with all of the entry-point functions. If you do not, the compiler might remove the entry-point
functions that are not preceded by the FUNC_EXT_CALLED pragma.
• Compile with --program_level_compile --opt_level=3 --call_assumptions=3. Because you do not
use the FUNC_EXT_CALLED pragma, you must use the --call_assumptions=3 option, which is
less aggressive than the --call_assumptions=2 option, and your optimization may not be as
effective.
Keep in mind that if you use --program_level_compile --opt_level=3 without additional options, the
compiler removes the C functions that the assembly functions call. Use the FUNC_EXT_CALLED
pragma to keep these functions.
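A minimal sketch of the first solution, assuming a single entry-point function that is called only from an assembly interrupt service routine. The names tick_count and timer_isr_entry are hypothetical. The pragma appears before the function definition.

```c
/* Hypothetical names: tick_count and timer_isr_entry are illustrative only. */

/* Modified in interrupt context, so it is declared volatile. */
volatile unsigned int tick_count = 0;

/* Called only from an assembly interrupt service routine, never from C.
   The pragma tells the optimizer that the function has an external caller,
   so program-level optimization must not remove it. */
#pragma FUNC_EXT_CALLED(timer_isr_entry);
void timer_isr_entry(void)
{
    tick_count++;
}
```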
--gen_profile_info tells the compiler to add instrumentation code to collect profile information. When
the program executes the run-time-support exit() function, the profile data is
written to a PDAT file. If the environment variable TI_PROFDATA on the host is
set, the data is written into the specified file name. Otherwise, it uses the default
filename: pprofout.pdat. The full pathname of the PDAT file (including the directory
name) can be specified using the TI_PROFDATA host environment variable.
By default, the RTS profile data output routine uses the C I/O mechanism to write
data to the PDAT file. You can install a device handler for the PPHNDL device that
enables you to re-direct the profile data to a custom device driver routine.
Feedback directed optimization requires you to turn on at least skeletal debug
information when using the --gen_profile_info option. This enables the compiler to
output debug information that allows pdd6x to correlate compiled functions and
their associated profile data.
--use_profile_info specifies the profile information file(s) to use for performing phase 2 of feedback
directed optimization. More than one profile information file can be specified on the
command line; the compiler uses all input data from multiple information files. The
syntax for the option is:
--use_profile_info=file1[, file2, ..., filen]
If no filename is specified, the compiler looks for a file named pprofout.prf in the
directory where the compiler is invoked.
-a Computes the average of the data values in the data sets instead of
accumulating data values
-e exec.out Specifies exec.out is the name of the application executable.
-o application.prf Specifies application.prf is the formatted profile feedback file that is used as the
argument to --use_profile_info during recompilation. If no output file is specified,
the default output filename is pprofout.prf.
filename.pdat Is the name of the profile data file generated by the run-time-support function.
This is the default name and it can be overridden by using the host environment
variable TI_PROFDATA.
The run-time-support function and pdd6x append to their respective output files and do not overwrite
them. This enables collection of data sets from multiple runs of the application.
--gen_profile_info Adds instrumentation to the compiled code. Execution of the code results in
profile data being emitted to a PDAT file.
--use_profile_info=file.prf Uses profile information for optimization and/or generating code coverage
information.
--analyze=codecov Generates a code coverage information file and continues with profile-based
compilation. Must be used with --use_profile_info.
--analyze_only Generates only a code coverage information file. Must be used with
--use_profile_info. You must specify both --analyze=codecov and
--analyze_only to do code coverage analysis of the instrumented application.
API
Files Created
*.pdat Profile data file, which is created by executing an instrumented program and
used as input to the profile data decoder
*.prf Profiling feedback file, which is created by the profile data decoder and
used as input to the re-compilation step
3.9 Using Profile Information to Get Better Program Cache Layout and Analyze Code
Coverage
There are two different types of analysis information you can get from the path profiler: code coverage
information and call graph information.
The program cache layout tool helps you to develop better program instruction cache efficiency into your
applications. Program cache layout is the process of controlling the relative placement of code sections
into memory to minimize the occurrence of conflict misses in the program instruction cache.
You can specify two environment variables to control the destination of the code-coverage information file.
• The TI_COVDIR environment variable specifies the directory where the code-coverage file should be
generated. The default is the directory where the compiler is invoked.
• The TI_COVDATA environment variable specifies the name of the code-coverage data file generated
by the compiler. The default is filename.csv where filename is the base-name of the file being compiled.
For example, if foo.c is being compiled, the default code-coverage data file name is foo.csv.
If the code-coverage data file already exists, the compiler appends the new dataset at the end of the file.
Code-coverage data is a comma-separated list of data items that can be conveniently handled by data-
processing tools and scripting languages. The following is the format of code-coverage data:
"filename-with-full-path","funcname",line#,column#,exec-frequency,"comments"
The full filename, function name, and comments appear within quotation marks ("). For example:
"/some_dir/zlib/c64p/deflate.c","_deflateInit2_",216,5,1,"( strm->zalloc )"
Other tools, such as a spreadsheet program, can be used to format and view the code coverage data.
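As an illustration of handling this format in a host-side tool, the following sketch reads the first five fields of one code-coverage record with sscanf(). It assumes that the file name and function name contain no embedded quotation marks. The trailing comments field is not parsed. The structure and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical record type for one line of code-coverage data. */
struct cov_record {
    char          file[256];
    char          func[128];
    int           line;
    int           column;
    unsigned long exec_freq;
};

/* Parses "file","func",line#,column#,exec-frequency from one record.
   Returns 1 on success, 0 if the text does not match that layout. */
int parse_cov_line(const char *text, struct cov_record *rec)
{
    int n = sscanf(text, "\"%255[^\"]\",\"%127[^\"]\",%d,%d,%lu",
                   rec->file, rec->func,
                   &rec->line, &rec->column, &rec->exec_freq);
    return n == 5;
}
```

Applied to the example record above, this yields line 216, column 5, and an execution frequency of 1.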
For further information about profile-based optimization and a more detailed description of the profiling
infrastructure within the C6000, see Section 3.8.
--analyze=callgraph Instructs the compiler to generate weighted call graph analysis information.
--analyze=codecov Instructs the compiler to generate code coverage analysis information. This
option replaces the previous --codecov option.
--analyze_only Halts compilation after generation of analysis information is completed.
TI_WCGDATA Allows you to specify a single output CSV file for all weighted call graph analysis
information. New information is appended to the CSV file identified by this
environment variable, if the file already exists.
TI_ANALYSIS_DIR Specifies the directory in which the output analysis file will be generated. The
same environment variable can be used for both code coverage information and
weighted call graph information (all analysis files generated by pprof6x will be
written to the directory specified by the TI_ANALYSIS_DIR environment variable).
3.9.4.5 Linker
The linker prioritizes the placement of a function relative to others based on the order in which
--preferred_order options are encountered during the linker invocation. The syntax is:
--preferred_order=function specification
unordered()
There are several ways in which this dynamic profile information can be collected. For example, if you are
running your application on hardware, you may have the capability to collect a PC discontinuity trace. The
discontinuity trace can then be post-processed to construct weighted call graph input information for the
clt6x.
The method for collecting dynamic profile information that is presented here relies on the path profiling
capabilities in the C6000 code generation tools. Here is how it works:
1. Build an instrumented application using the --gen_profile_info option.
Using --gen_profile_info instructs the compiler to embed counters into the code along the execution
paths of each function.
To compile only use:
Using pdd6x produces a .prf file which is then fed into the re-compile of the application that uses the
profile information to generate weighted call graph input data.
The use of -mo instructs the compiler to generate code for each function into its own subsection. This
option provides the linker with the means to directly control the placement of the code for a given
function.
The compiler generates a CSV file containing weighted call graph information for each source file that
is specified on the command line. If such a CSV file already exists, then new call graph analysis
information will be appended to the existing CSV file. These CSV files are then input to the cache
layout tool (clt6x) to produce a preferred function order command file for your application.
For more details on the content of the CSV files (containing weighted call graph information) generated
by the compiler, see Section 3.9.6.
The output of clt6x is a text file containing a sequence of --preferred_order=function specification options.
By default, the name of the output file is forder.cmd, but you can specify your own file name with the -o
option. The order in which functions appear in this file is their preferred function order as determined by
the clt6x.
In general, the proximity of one function to another in the preferred function order list is a reflection of how
often the two functions call each other. If two functions are very close to each other in the list, then the
linker interprets this as a suggestion that the two functions should be placed very near to one another.
Functions that are placed close together are less likely to create a cache conflict miss at run time when
both functions are active at the same time. The overall effect should be an improvement in program
instruction cache efficiency and performance.
The preferred function order command file, forder.cmd, contains a list of --preferred_order=function
specification options. The linker prioritizes the placement of functions relative to each other in the order
that the --preferred_order options are encountered during the linker invocation.
Each --preferred_order option contains a function specification. A function specification can describe
simply the name of the function for a global function, or it can provide the path name and source file name
where the function is defined. A function specification that contains path and file name information is used
to distinguish one static function from another that has the same function name.
The --preferred_order options are interpreted by the linker as suggestions to guide the placement of
functions relative to each other. They are not explicit placement instructions. If an object file or input
section is explicitly mentioned in a linker command file SECTIONS directive, then the placement
instruction specified in the linker command file takes precedence over any suggestion from a
--preferred_order option that is associated with a function that is defined in that object file or input section.
This precedence can be relaxed by applying the unordered() operator to an output specification as
described in Section 3.9.7.
3.9.6 Comma-Separated Values (CSV) Files with Weighted Call Graph (WCG) Information
The format of the CSV files generated by the compiler under the --analyze=callgraph --use_profile_info
option combination is as follows:
"caller","callee","weight" [CR][LF]
caller spec,callee spec,call frequency [CR][LF]
caller spec,callee spec,call frequency [CR][LF]
caller spec,callee spec,call frequency [CR][LF]
...
*(.text)
} > PMEM
...
}
In this SECTIONS directive, the specification of .text explicitly dictates the order in which functions are laid
out in the output section. Thus by default, the linker will lay out func_a through func_h in exactly the order
that they are specified, regardless of any other placement priority criteria (such as a preferred function
order list that is enumerated by --preferred_order options).
The unordered() operator can be used to relax this constraint on the placement of the functions in the
'.text' output section so that placement can be guided by other placement priority criteria.
The unordered() operator can be applied to an output section as in Example 3-2.
SECTIONS
{
.text: unordered()
{
file.obj(.text:func_a)
file.obj(.text:func_b)
file.obj(.text:func_c)
file.obj(.text:func_d)
file.obj(.text:func_e)
file.obj(.text:func_f)
file.obj(.text:func_g)
file.obj(.text:func_h)
*(.text)
} > PMEM
...
}
output attributes/
section page origin length input sections
-------- ---- ---------- ---------- ----------------
.text 0 00000020 00000120
00000020 00000020 file.obj (.text:func_g:func_g)
00000040 00000020 file.obj (.text:func_b:func_b)
00000060 00000020 file.obj (.text:func_d:func_d)
00000080 00000020 file.obj (.text:func_a:func_a)
000000a0 00000020 file.obj (.text:func_c:func_c)
000000c0 00000020 file.obj (.text:func_f:func_f)
000000e0 00000020 file.obj (.text:func_h:func_h)
00000100 00000020 file.obj (.text:func_e:func_e)
SECTIONS
{
.text: unordered()
{
file.obj(.text:func_a)
file.obj(.text:func_b)
file.obj(.text:func_c)
file.obj(.text:func_d)
. += 0x100;
file.obj(.text:func_e)
file.obj(.text:func_f)
file.obj(.text:func_g)
file.obj(.text:func_h)
*(.text)
} > PMEM
...
}
In Example 3-4, a dot (.) expression, ". += 0x100;", separates the explicit specification of two groups of
functions in the output section. In this case, the linker will honor the specified position of the dot (.)
expression with respect to the functions on either side of the expression. That is, the unordered() operator
will allow the preferred function order list to guide the placement of func_a through func_d relative to each
other, but none of those functions will be placed after the hole that is created by the dot (.) expression.
Likewise, the unordered() operator allows the preferred function order list to influence the placement of
func_e through func_h relative to each other, but none of those functions will be placed before the hole
that is created by the dot (.) expression.
SECTIONS
{
GROUP
{
.grp1:
{
file.obj(.grp1:func_a)
file.obj(.grp1:func_b)
file.obj(.grp1:func_c)
file.obj(.grp1:func_d)
} unordered()
.grp2:
{
file.obj(.grp2:func_e)
file.obj(.grp2:func_f)
file.obj(.grp2:func_g)
file.obj(.grp2:func_h)
}
} > PMEM
...
}
The SECTIONS directive in Example 3-5 applies the unordered() operator to the first member of the
GROUP. The .grp1 output section layout can then be influenced by other placement priority criteria (like
the preferred function order list), whereas the .grp2 output section will be laid out as explicitly specified.
The unordered() operator cannot be applied to an entire GROUP or UNION. Attempts to do so will result
in a linker command file syntax error and the link will be aborted.
3.10.1 Use the --aliased_variables Option When Certain Aliases are Used
The compiler, when invoked with optimization, assumes that any variable whose address is passed as an
argument to a function is not subsequently modified by an alias set up in the called function. Examples
include:
• Returning the address from a function
• Assigning the address to a global variable
If you use aliases like this in your code, you must use the --aliased_variables option when you are
optimizing your code. For example, if your code is similar to this, use the --aliased_variables option:
int *glob_ptr;
g()
{
int x = 1;
int *p = f(&x);
*p = 5; /* p aliases x */
*glob_ptr = 10; /* glob_ptr aliases x */
h(x);
}
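The example above does not show f(). The following sketch assumes one possible definition in which f() creates both kinds of alias: it assigns the address to a global variable and also returns it. The body of f() and the function g_demo() are hypothetical and are not part of the original example.

```c
int *glob_ptr;

/* Hypothetical f(): creates two aliases for the object its argument points to. */
int *f(int *addr)
{
    glob_ptr = addr;    /* assigning the address to a global variable */
    return addr;        /* returning the address from the function    */
}

/* Same pattern as g() above, returning x so the effect can be observed. */
int g_demo(void)
{
    int x = 1;
    int *p = f(&x);

    *p = 5;             /* p aliases x        */
    *glob_ptr = 10;     /* glob_ptr aliases x */
    return x;           /* 10 when both aliased stores are honored */
}
```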
3.10.2 Use the --no_bad_aliases Option to Indicate That These Techniques Are Not Used
The --no_bad_aliases option informs the compiler that it can make certain assumptions about how aliases
are used in your code. These assumptions allow the compiler to improve optimization. The
--no_bad_aliases option also specifies that loop-invariant counter increments and decrements are non-zero.
Loop invariant means the value of an expression does not change within the loop.
• The --no_bad_aliases option indicates that your code does not use the aliasing technique described in
Section 3.10.1. If your code uses that technique, do not use the --no_bad_aliases option. You must
compile with the --aliased_variables option.
Do not use the --aliased_variables option with the --no_bad_aliases option. If you do, the
--no_bad_aliases option overrides the --aliased_variables option.
• The --no_bad_aliases option indicates that a pointer to a character type does not alias (point to) an
object of another type. That is, the special exception to the general aliasing rule for these types given
in section 3.3 of the ISO specification is ignored. If you have code similar to the following example, do
not use the --no_bad_aliases option:
{
long l;
char *p = (char *) &l;
p[2] = 5;
}
• The --no_bad_aliases option indicates that indirect references on two pointers, P and Q, are not
aliases if P and Q are distinct parameters of the same function activated by the same call at run time.
If you have code similar to the following example, do not use the --no_bad_aliases option:
g(int j)
{
int a[20];
int g()
{
return f(5, -4); /* -4 is a negative index */
return f(0, 96); /* 96 exceeds 20 as an index */
return f(4, 16); /* This one is OK */
}
If an Advice file is requested, but there is no advice, the advice file will not be created;
rather the compiler prints a message to stdout :
"filename.c": advice #27004: No Performance Advice is generated.
Note that Advice to prevent Software Pipeline Disqualification (such as that presented above) will also be
printed in the .asm file. So, func.asm will contain :
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;* Disqualified loop: Loop contains a call
;* Loop at line 8 cannot be scheduled efficiently as it contains a
;* function call ("_init"). Try making "_init" an inline function.
;* Disqualified loop: Loop contains non-pipelinable instructions
;* Disqualified loop: Loop contains a call
;* Loop at line 8 cannot be scheduled efficiently as it contains a
;* function call ("_calculate"). Try to inline call or consider
;* rewriting loop.
;* Disqualified loop: Loop contains non-pipelinable instructions
;*----------------------------------------------------------------------------*
If --advice_dir option and full pathname are specified together, --advice:performance_dir option is ignored, and
the advice is generated in the full pathname advice file. Also, note that directory "mydir" must already exist for
an advice file to be created in there.
Your compilation is being done without any optimization options (-o0 and above). This prevents the
compiler from using its most powerful optimization techniques, since the -o (--opt_level) options are the
foundations for most other optimizations. You could get substantially better performance using -o2 (or
above) optimization. For C6000, optimization option -o2 is required for the software pipelining loop
optimization, which is crucial to getting good performance.
The C/C++ compiler is able to perform various optimizations, but you need to specify optimization options
on the command line so that these optimizations are performed. The easiest way to invoke optimization is
to specify the --opt_level=n option on the compiler command line. You can use -On to alias the --opt_level
option. The n denotes the level of optimization (0, 1, 2, and 3), which controls the type and degree of
optimization. See "Invoking Optimization" in Section 3.1 for more information on Optimization Options.
Your compilation uses low-level optimization options (-o1 and below), which prevents the compiler from
using its most powerful optimization techniques.
The C/C++ compiler is able to perform various optimizations, but you can control the level of these
optimizations. High-level optimizations are performed in the optimizer and low-level, target-specific
optimizations occur in the code generator. You must use high-level optimizations to achieve optimal code.
You can invoke optimization by specifying the --opt_level=n option on the compiler command line.
See "Invoking Optimization" in Section 3.1 for more information on Optimization Options. Also see
information for Advice #27000 in Section 3.14.1.1.
Your compilation is being done using -mu, which turns off software-pipelining. Software-pipelining is a key
optimization for achieving good performance. This Advice is issued to alert you to NOT use compiler
option -mu. -mu is a good option for debugging, but it is recommended that this option not be used for
production code because of the negative performance implications.
In general, to achieve maximal performance, avoid using the following in production code :
• -g: Supports full symbolic debug, and is a good option for debugging. This inhibits code reordering
across source line boundaries and limits optimizations around function boundaries. This results in less
parallelism, more nops and generally less efficient schedules. Using this option can cause a 30-50%
performance degradation for control code, generally somewhat less but still significant degradation for
performance critical code.
• -ss: Interlist source code into assembly file. As with -g, this option can negatively impact performance.
• -mu: Turns off software-pipelining, which is a key optimization for achieving good performance. This is
a good option for debugging, but is not recommended for use in production code due to negative
performance implications.
The compiler detects that your compilation is being done using -g, which limits software-pipelining.
Software-pipelining is a key optimization for achieving good performance. This Advice is issued to alert
you to NOT use compiler option -g. -g is a good option for debugging, but it is recommended that this
option not be used for production code because of the negative performance implications. Also see Advice
#27002 in Section 3.14.1.3.
The compiler detects that your compilation is being done using --advice:performance option, but the
compiler has no Advice to report. This Advice is issued to alert you to the fact that no Advice is being
emitted, and an Advice file will not be created (if one was requested).
The compiler attempts to perform the software pipeline loop optimization at optimization levels
--opt_level=2 (or -O2) and -O3. If there is a call in the loop, the compiler will attempt to completely inline the
called function, but sometimes this is not possible. If the compiler cannot inline the called function,
software pipelining cannot be performed. This can severely reduce the performance of the loop.
In the testcase below, the call to the function "func2" prevents software pipelining. Inlining function "func2"
or rewriting the loop to avoid a function call can avoid pipeline disqualification. If the loop pipelines
successfully you may see performance improvement.
void func1(int *p, int *q, int n)
p[i] = q[i] + t;
}
}
The compiler can insert calls to special functions in the run-time support library (RTS) to support
operations that are not natively supported by the ISA. For instance, while C6000 floating-point ISAs
support instructions to convert between floating-point and signed integer values, they don't support
conversion between floating-point and unsigned integer values. If you use unsigned variables in floating
point expressions, the compiler will generate a call to an RTS routine to carry out this function. Such a call
will disable software pipelining.
You can change the unsigned variables in your code to signed variables and prevent this from happening.
The compiler will then be able to use the native hardware instead of adding the special function call, so
you may get better performance.
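A minimal sketch of this change, assuming a loop that multiplies each element by its index. The function and parameter names are hypothetical. Because the counter is a signed int, the integer-to-float conversion does not involve an unsigned value.

```c
/* Hypothetical example: scale each element by its (signed) index. */
void scale_by_index(float *out, const float *in, int n)
{
    int i;   /* signed counter; "unsigned int i" would make (float)i an unsigned conversion */

    for (i = 0; i < n; i++)
    {
        out[i] = in[i] * (float)i;
    }
}
```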
An asm statement inserted in a C code loop will disqualify the loop for software pipelining. Software-
pipelining is a key optimization for achieving good performance. You may see reduced performance
without software pipelining.
Replace the asm() statement with native C, or an intrinsic function call to prevent this from happening.
Your code contains a complex conditional expression, possibly a large "if" clause, within a loop, which is
preventing optimization. The C6000 compiler will optimize small “if” statements (“if” statements with “if”
and “else” blocks that are short or empty). The compiler will not optimize large "if" statements, and such
large if statements within the loop body will disqualify the loop for software pipelining. Software-pipelining
is a key optimization; you may see reduced performance without it.
In the examples below, Example 1 will pipeline, but Example 2 won't :
Example 1:
for (i=0; i < N; i++)
{
if (!flag)
{
//statements
}
else
{
x[i] = y[i];
}
}
Example 2:
for (i = 0; i < n; i++)
{
if (!flag)
{
//statements
}
else
{
if (flag == 1) x[i] = y[i];
}
}
Example 1 will have significantly better performance than Example 2 because it pipelines successfully. But
Example 2 can be pipelined if the code is modified to eliminate the nested "if" :
for (i = 0; i < n; i++)
{
if (!flag)
{
//statements
}
else
{
p = (flag == 1);
x[i] = !p * x[i] + p * y[i] ;
}
}
There is a switch statement within the loop. A switch statement in a loop will disqualify the loop for
software pipelining. Software-pipelining is a key optimization; you may see reduced performance without
it.
Try to rewrite the loop without a switch statement.
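One possible rewrite, sketched under the assumption that each case of the switch only selects a constant, is to replace the switch with a table lookup. The table, the function name, and the four-entry mode range are hypothetical.

```c
/* Hypothetical example: a per-element gain selected by a mode value 0..3. */
static const int gain_table[4] = { 1, 2, 4, 8 };

void apply_gain(int *out, const int *in, const unsigned char *mode, int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        /* Replaces: switch (mode[i]) { case 0: ... case 1: ... }
           The mask keeps the index inside the 4-entry table. */
        out[i] = in[i] * gain_table[mode[i] & 3];
    }
}
```

Whether a table lookup is suitable depends on what the cases do. Cases with different side effects need a different restructuring.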
The compiler can insert calls to special functions in the run-time support library (RTS) to support
operations that are not natively supported by the ISA. For example, the compiler calls the __c6xabi_divi()
function (_divi() in COFF) to perform a 32-bit integer divide operation. Such functions are called compiler
helper functions, and result in a function call within the loop body. In the example below, the compiler will
accomplish the division operation by calling the compiler helper function "_divi" :
void func(float *p, float n)
{
int i;
For improved performance, at optimization levels --opt_level=2 (-O2) and --opt_level=3 (-O3), the compiler
attempts to software pipeline your loops. Sometimes the compiler may not be able to inline a function call
that is in a loop. Because the compiler could not inline the function call, the loop could not be software
pipelined, and the loop could not be efficiently scheduled.
For example, in the testcase below, call to function "func2" prevents software pipelining:
void func1(int *p, int *q, int n)
{
unsigned int i;
; other operations
}
}
int func2() { . . . }
However if function func2 is inlined, it saves the overhead of a function call. The compiler is free to
optimize the function in context with surrounding code. Automatic inlining is controlled by the "inline"
keyword; use it to allow inlining of specific functions :
inline int func2() { . . . }
The compiler inserts calls to special functions in the run-time support library (RTS) to support operations
that are not natively supported by the instruction set architecture (ISA). For example, C6000 fixed point
ISAs do not support floating-point instructions and the compiler will generate a call to an RTS routine to
carry out the floating point operation. In the testcase below, the floating-point multiplication is unavailable
for a fixed-point device such as C6200:
void func(float *p, float *q, int n)
{
unsigned int i;
If compiled for C6200 (compiler option -mv6200) the compiler will use an RTS call to carry out the
operation. Such a call will disable software pipelining. You can rewrite the operation, or use a fixed point
operation to prevent this.
Also see Advice #30001 in Section 3.14.1.7.
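As a sketch of the fixed-point alternative, the following code replaces a floating-point multiply with a Q15 multiply, where a 16-bit value v represents v / 32768. The Q15 format and the function names are assumptions made for this illustration. The code does not saturate, and it assumes that right-shifting a negative int is an arithmetic shift.

```c
/* Multiply two Q15 values using only integer operations.
   No saturation: -32768 * -32768 overflows the 16-bit result. */
short q15_mul(short a, short b)
{
    return (short)(((int)a * (int)b) >> 15);
}

/* Hypothetical loop: scale a buffer by a Q15 gain without floating point. */
void scale_q15(short *out, const short *in, short gain, int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        out[i] = q15_mul(in[i], gain);
    }
}
```

For example, 16384 represents 0.5 in Q15, and q15_mul(16384, 16384) returns 8192, which represents 0.25.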
To help the compiler determine memory dependencies, you can qualify a pointer, reference, or array with
the restrict keyword. The restrict keyword is a type qualifier that can be applied to pointers, references,
and arrays. Its use represents a guarantee by you, the programmer, that within the scope of the pointer
declaration the object pointed to can be accessed only by that pointer. Any violation of this guarantee
renders the program undefined.
For more information on using restrict, refer to Section 6.5.5.
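As a minimal sketch of the guarantee described above (the function name vec_add is hypothetical), the qualifiers below promise that the three arrays never overlap, so loads from a and b can be scheduled without regard to stores through out:

```c
/* The restrict qualifiers are a promise by the programmer that out, a, and b
   refer to non-overlapping memory for the duration of the call. */
void vec_add(int *restrict out, const int *restrict a,
             const int *restrict b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Calling vec_add with overlapping arrays would violate the guarantee and make the behavior undefined.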
The C6000 architecture is partitioned into two nearly symmetric halves. The resource breakdown
displayed in the software pipelining information in the asm file is computed after the compiler has
partitioned instructions to either the A-side or the B-side. If the resources are imbalanced (that is, some
resources on one side are used more than resources on the other), software pipelining is resource-bound,
and the loop cannot be efficiently scheduled. If the compiler has information about the trip count
for the loop, it can unroll the loop to balance resource usage and get better pipelining. You can give loop
trip-count information to the compiler using the "MUST_ITERATE" pragma.
For more information on using the MUST_ITERATE pragma, refer to Section 6.9.20.
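The following sketch shows one way the pragma might be applied. The function sum_array and the particular bounds are hypothetical; the pragma is guarded so that the code also builds with compilers that do not recognize it.

```c
/* Sums n elements. The caller is assumed to guarantee that n is between
   8 and 48 and is a multiple of 8; the pragma passes that promise on. */
int sum_array(const int *a, int n)
{
    int i;
    int sum = 0;
#ifdef __TI_COMPILER_VERSION__
    #pragma MUST_ITERATE(8, 48, 8)   /* minimum, maximum, multiple */
#endif
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

The pragma describes the loop that immediately follows it. If the caller breaks the promise, the generated code may be incorrect.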
Most loops have memory access instructions. The compiler attempts to use wider load instructions, and
aligned memory accesses instead of non-aligned memory accesses to reduce/balance out resources used
for the memory access instructions. One of the ways to let the compiler know that it is safe to use "wider"
loads is to use the keyword "_nassert".
To find out more on using the _nassert keyword, see Section 7.5.10.
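A minimal sketch of this use of _nassert follows. The function copy_words is hypothetical, 8-byte alignment is chosen only for illustration, and the fallback macro is an assumption that lets the code build with compilers that do not provide _nassert.

```c
#include <stdint.h>

#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)   /* stand-in where _nassert is unavailable */
#endif

/* Copies n words. The _nassert statements promise that both pointers are
   8-byte aligned; they generate no code and are not checked at run time. */
void copy_words(int *dst, const int *src, int n)
{
    int i;
    _nassert(((uintptr_t)dst & 7) == 0);
    _nassert(((uintptr_t)src & 7) == 0);
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The caller remains responsible for ensuring that the buffers really are aligned as promised.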
When you use the --optimizer_interlist option with optimization, the interlist feature does not run as a
separate pass. Instead, the compiler inserts comments into the code, indicating how the compiler has
rearranged and optimized the code. These comments appear in the assembly language file as comments
starting with ;**. The C/C++ source code is not interlisted, unless you use the --c_src_interlist option also.
The interlist feature can affect optimized code because it might prevent some optimization from crossing
C/C++ statement boundaries. Optimization makes normal source interlisting impractical, because the
compiler extensively rearranges your program. Therefore, when you use the --optimizer_interlist option,
the compiler writes reconstructed C/C++ statements.
Example 3-9 shows a function that has been compiled with optimization (--opt_level=2) and the
--optimizer_interlist option. The assembly file contains compiler comments interlisted with assembly code.
When you use the --c_src_interlist and --optimizer_interlist options with optimization, the compiler inserts
its comments and the interlist feature runs before the assembler, merging the original C/C++ source into
the assembly file.
Example 3-10 shows the function from Example 3-9 compiled with the optimization (--opt_level=2) and the
--c_src_interlist and --optimizer_interlist options. The assembly file contains compiler comments and C
source interlisted with assembly code.
Example 3-9. The Function From Example 2-4 Compiled With the -O2 and --optimizer_interlist Options
_main:
;** 5 ----------------------- printf("Hello, world\n");
;** 6 ----------------------- return 0;
STW .D2 B3,*SP--(12)
.line 3
B .S1 _printf
NOP 2
MVKL .S1 SL1+0,A0
MVKH .S1 SL1+0,A0
|| MVKL .S2 RL0,B3
STW .D2 A0,*+SP(4)
|| MVKH .S2 RL0,B3
RL0: ; CALL OCCURS
.line 4
ZERO .L1 A4
.line 5
LDW .D2 *++SP(12),B3
NOP 4
B .S2 B3
NOP 5
; BRANCH OCCURS
Example 3-10. The Function From Example 2-4 Compiled with the --opt_level=2, --optimizer_interlist, and
--c_src_interlist Options
_main:
;** 5 ----------------------- printf("Hello, world\n");
;** 6 ----------------------- return 0;
STW .D2 B3,*SP--(12)
;------------------------------------------------------------------------------
; 5 | printf("Hello, world\n");
;------------------------------------------------------------------------------
B .S1 _printf
NOP 2
MVKL .S1 SL1+0,A0
MVKH .S1 SL1+0,A0
|| MVKL .S2 RL0,B3
STW .D2 A0,*+SP(4)
|| MVKH .S2 RL0,B3
RL0: ; CALL OCCURS
;------------------------------------------------------------------------------
; 6 | return 0;
;------------------------------------------------------------------------------
ZERO .L1 A4
LDW .D2 *++SP(12),B3
NOP 4
B .S2 B3
NOP 5
; BRANCH OCCURS
Profile Points
NOTE: In Code Composer Studio, when symbolic debugging is not used, profile points can only be
set at the beginning and end of functions.
The initial mechanism for controlling code space, the --opt_for_space option, has the following
equivalences with the --opt_for_speed option:
--opt_for_space    --opt_for_speed
none               =4
=0                 =3
=1                 =2
=2                 =1
=3                 =0
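The equivalence table can be expressed as a small function. This is a sketch only: the function name is hypothetical, and "none" is interpreted here as the --opt_for_space option not being given at all, which is an assumption.

```c
/* Returns the --opt_for_speed level equivalent to an --opt_for_space setting.
   space_level is 0..3 for an explicit level, or negative to represent
   "none" (assumed to mean the option was not specified). */
int speed_level_for_space(int space_level)
{
    if (space_level < 0)
        return 4;
    return 3 - space_level;
}
```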
Optimization                                               See
Cost-based register allocation                             Section 3.14.5.1
Alias disambiguation                                       Section 3.14.5.1
Branch optimizations and control-flow simplification       Section 3.14.5.3
Data flow optimizations                                    Section 3.14.5.4
  • Copy propagation
  • Common subexpression elimination
  • Redundant assignment elimination
Expression simplification                                  Section 3.14.5.5
Inline expansion of functions                              Section 3.14.5.6
Function Symbol Aliasing                                   Section 3.14.5.7
Induction variable optimizations and strength reduction    Section 3.14.5.8
Loop-invariant code motion                                 Section 3.14.5.9
Loop rotation                                              Section 3.14.5.10
Instruction scheduling                                     Section 3.14.5.11
For this example, the compiler makes aaa an alias of bbb, so that at link time all calls to function aaa
should be redirected to bbb. If the linker can successfully redirect all references to aaa, then the body of
function aaa can be removed and the symbol aaa is defined at the same address as bbb.
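A function is a candidate for this kind of aliasing when it does nothing except pass its arguments on to another function. The following sketch is an assumed reconstruction of such a case; the body given to bbb is hypothetical and exists only to make the sketch complete.

```c
/* Hypothetical body; any function with this signature would do. */
int bbb(int arg1, char *arg2)
{
    return arg1 + (arg2 != 0);
}

/* aaa only forwards its arguments to bbb and returns the result, so every
   call to aaa behaves exactly like a call to bbb. */
int aaa(int n, char *str)
{
    return bbb(n, str);
}
```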
The assembly optimizer allows you to write assembly code without being concerned with the pipeline
structure of the C6000 or assigning registers. It accepts linear assembly code, which is assembly code
that may have had register-allocation performed and is unscheduled. The assembly optimizer assigns
registers and uses loop optimizations to turn linear assembly into highly parallel assembly.
[Figure: code development flowchart. Phase 2, refine C/C++ code: compile and profile, repeating while the code is not efficient enough and more C/C++ optimizations remain. Phase 3, write linear assembly: write or refine linear assembly, assembly optimize, and profile, repeating until the code is efficient enough.]
• TMS320C6000 instructions
When you are writing your linear assembly, your code does not need to indicate the following:
– Pipeline latency
– Register usage
– Which unit is being used
As with other code generation tools, you might need to modify your linear assembly code until you are
satisfied with its performance. When you do this, you will probably want to add more detail to your
linear assembly. For example, you might want to partition or assign some registers.
label[:] Labels are optional for all assembly language instructions and for most (but not all)
assembly optimizer directives. When used, a label must begin in column 1 of a source
statement. A label can be followed by a colon.
[ register ] Square brackets ([ ]) enclose conditional instructions. The machine-instruction
mnemonic is executed based on the value of the register within the brackets; valid
register names are A0 for C6400, C6400+, C6740, and C6600 only; A1, A2, B0, B1,
B2, or symbolic.
mnemonic The mnemonic is a machine instruction (such as ADDK, MVKH, B) or an assembly
optimizer directive (such as .proc, .trip).
unit specifier The optional unit specifier enables you to specify the functional unit operand. Only the
specified unit side is used; other specifications are ignored. The preferred method is
specifying register sides.
operand list The operand list is not required for all instructions or directives. The operands can be
symbols, constants, or expressions and must be separated by commas.
comment Comments are optional. Comments that begin in column 1 must begin with a
semicolon or an asterisk; comments that begin in any other column must begin with a
semicolon.
The C6000 assembly optimizer reads up to 200 characters per line. Any characters beyond 200 are
truncated. Keep the operational part of your source statements (that is, everything other than comments)
less than 200 characters in length for correct assembly. Your comments can extend beyond the character
limit, but the truncated portion is not included in the .asm file.
Follow these guidelines in writing linear assembly code:
• All statements must begin with a label, a blank, an asterisk, or a semicolon.
• Labels are optional; if used, they must begin in column 1.
• One or more blanks must separate each field. Tab characters are interpreted as blanks. You must
separate the operand list from the preceding field with a blank.
• Comments are optional. Comments that begin in column 1 can begin with an asterisk or a semicolon (*
or ;) but comments that begin in any other column must begin with a semicolon.
• If you set up a conditional instruction, the register must be surrounded by square brackets.
• A mnemonic cannot begin in column 1 or it is interpreted as a label.
Refer to the TMS320C6000 Assembly Language Tools User's Guide for information on the syntax of
C6000 instructions, including conditional instructions, labels, and operands.
loop: .trip 25
LDW *a_0++[2], val0 ; load a[0-1]
LDW *b_0++[2], val1 ; load b[0-1]
MPY val0, val1, prod1 ; a[0] * b[0]
MPYH val0, val1, prod2 ; a[1] * b[1]
ADD prod1, prod2, tmp0 ; sum0 += (a[0]*b[0]) +
ADD tmp0, sum0, sum0 ; (a[1]*b[1])
.return sum
.endproc
int sum, I;
The old method of partitioning registers indirectly by partitioning instructions can still be used. Side and
functional unit specifiers can still be used on instructions. However, functional unit specifiers (.L/.S/.D/.M)
are ignored. Side specifiers are translated into partitioning constraints on the corresponding symbolic
names, if any. For example:
MV .1 x, y ; translated to .REGA y
LDW .D2T2 *u, v:w ; translated to .REGB u, v, w
In the linear assembler, you can also specify register pairs using the .cproc and/or .reg directive as in
Example 4-3:
.global foopair
foopair: .cproc q1:q0,s0
.reg r1:r0
ADD q1:q0, s0, r1:r0
.return r1:r0
.endproc
In Example 4-3, the expression "q1:q0" means that the first argument to the linear assembly function is a
register pair. By the C calling conventions, the symbols "q1:q0" are mapped to register pair "a5:a4".
When register pair syntax is used as the argument to a .reg directive (as shown for "r1:r0"), the two
register symbols are constrained to an aligned register pair when the compiler processes the linear
assembly source and allocates actual registers.
The 7.2 Beta compiler supports a register quad syntax (C6600 only), in order to specify 128-bit operands
of 128-bit capable instructions in linear assembly and assembly source code. Example 4-4 illustrates how
you can specify register quads:
.global fooquad
fooquad: .cproc q3:q2:q1:q0, s3:s2:s1:s0
.reg r3:r2:r1:r0
QMPY32 s3:s2:s1:s0, q3:q2:q1:q0, r3:r2:r1:r0
.return r3:r2:r1:r0
.endproc
In Example 4-4, the expression "q3:q2:q1:q0" means that the first argument to the linear assembly
function is a register quad. By the C calling conventions, the symbols "q3:q2:q1:q0" are mapped to
register quad "a7:a6:a5:a4". When register quad syntax is used as the argument to a .reg directive (as
shown for "r3:r2:r1:r0"), the four register symbols are constrained to an aligned register quad when the
compiler processes the linear assembly source and allocates actual registers.
There are several ways to enter the unit specifier field in linear assembly. Of these, only the specific
register side information is recognized and used:
• You can specify the particular functional unit (for example, .D1).
• You can specify the .D1 or .D2 functional unit followed by T1 or T2 to specify that the nonmemory
operand is on a specific register side. T1 specifies side A and T2 specifies side B. For example:
LDW .D1T2 *A3[A4], B3
LDW .D1T2 *src, dst
• You can specify only the data path (for example, .1), and the assembly optimizer assigns the functional
type (for example, .L1).
For more information on functional units refer to the TMS320C6000 CPU and Instruction Set Reference
Guide.
.reg t0,t1,p,i,sh:sl
MVK 100,i
ZERO sh
ZERO sl
.return sh:sl
.endproc
To disable this format with symbolic names and display assembly instructions with actual registers
instead, compile with the --machine_regs option.
Description Use the .call directive to call a function. Optionally, you can specify a register that is
assigned the result of the call. The register can be a symbolic or machine register. The
.call directive adheres to the same register and function calling conventions as the
C/C++ compiler. For information, see Section 7.3 and Section 7.4. There is no support
for alternative register or function calling conventions.
You cannot call a function that has a variable number of arguments, such as printf. No
error checking is performed to ensure the correct number and/or type of arguments is
passed. You cannot pass or return structures through the .call directive.
Following is a description of the .call directive parameters:
By default, the compiler generates near calls and the linker utilizes trampolines if the
near call will not reach its destination. To force a far call, you must explicitly load the
address of the function into a register, and then issue an indirect call. For example:
MVK func,reg
MVKH func,reg
.call reg(op1) ; forcing a far call
If you want to use * for indirection, you must abide by C/C++ syntax rules, and use the
following alternate syntax:
.call [ret_reg =] (* ireg)([arg1, arg2,...])
For example:
.call (*driver)(op1, op2) ; indirect call
.reg driver
.call driver(op1, op2) ; also an indirect call
Here are other valid examples that use the .call syntax.
.call fir(x, h, y) ; void function
Since you can use machine register names anywhere you can use symbolic registers, it
may appear you can change the function calling convention. For example:
.call A6 = compute()
It appears that the result is returned in A6 instead of A4. This is incorrect. Using machine
registers does not override the calling convention. After returning from the compute
function with the returned result in A4, a MV instruction transfers the result to A6.
Description The .circ directive assigns a symbolic register name to a machine register and declares
the symbolic register as available for circular addressing. The compiler then assigns the
variable to the register and ensures that all code transformations are safe in this
situation. You must insert setup/teardown code for circular addressing.
The compiler assumes that it is safe to speculate any load using an explicitly declared
circular addressing variable as the address pointer and may exploit this assumption to
perform optimizations.
When a symbol is declared with the .circ directive, it is not necessary to declare that
symbol with the .reg directive.
The .circ directive is equivalent to using .map with a circular declaration.
Example Here the symbolic name Ri is assigned to actual machine register Mi and Ri is declared
as potentially being used for circular addressing.
.CIRC R1/M1, R2/M2 ...
Description Use the .cproc/.endproc directive pair to delimit a section of your code that you want
the assembly optimizer to optimize and treat as a C/C++ callable function. This section is
called a procedure. The .cproc directive is similar to the .proc directive in that you use
.cproc at the beginning of a section and .endproc at the end of a section. In this way, you
can set off sections of your assembly code that you want to be optimized, like functions.
The directives must be used in pairs; do not use .cproc without the corresponding
.endproc. Specify a label with the .cproc directive. You can have multiple procedures in a
linear assembly file.
The .cproc directive differs from the .proc directive in that the compiler treats the .cproc
region as a C/C++ callable function. The assembly optimizer performs some operations
automatically in a .cproc region in order to make the function conform to the C/C++
calling conventions and to C/C++ register usage conventions.
These operations include the following:
• When you use save-on-entry registers (A10 to A15 and B10 to B15), the assembly
optimizer saves the registers on the stack and restores their original values at the
end of the procedure.
• If the compiler cannot allocate machine registers to symbolic register names specified
with the .reg directive (see the .reg topic) it uses local temporary stack variables. With
.cproc, the compiler manages the stack pointer and ensures that space is allocated
on the stack for these variables.
For more information, see Section 7.3 and Section 7.4.
Use the optional argument to represent function parameters. The argument entries are
very similar to parameters declared in a C/C++ function. The arguments to the .cproc
directive can be of the following types:
• Machine-register names. If you specify a machine-register name, its position in the
argument list must correspond to the argument passing conventions for C (see
Section 7.4). For example, the C/C++ compiler passes the first argument to a
function in register A4. This means that the first argument in a .cproc directive must
be A4 or a symbolic name. Up to ten arguments can be used with the .cproc
directive.
• Variable names. If you specify a variable name, then the assembly optimizer ensures
that either the variable name is allocated to the appropriate argument passing
register or the argument passing register is copied to the register allocated for the
variable name. For example, the first argument in a C/C++ call is passed in register
A4, so if you specify the following .cproc directive:
frame .cproc arg1
The assembly optimizer either allocates arg1 to A4, or arg1 is allocated to a different
register (such as B7) and an MV A4, B7 is automatically generated.
• Register pairs. A register pair is specified as arghi:arglo and represents a 40-bit
argument or a 64-bit type double argument.
For example, the .cproc defined as follows:
_fcn: .cproc arg1, arg2hi:arg2lo, arg3, B6, arg5, B9:B8
...
.return res
...
.endproc
corresponds to a C function declared as:
int fcn(int arg1, long arg2, int arg3, int arg4, int arg5, long arg6);
In this example, the fourth argument of .cproc is register B6. This is allowed since the
fourth argument in the C/C++ calling conventions is passed in B6. The sixth
argument of .cproc is the actual register pair B9:B8. This is allowed since the sixth
argument in the C/C++ calling conventions is passed in B8 or B9:B8 for longs.
• Register quads (C6600 only). A register quad is specified as r3:r2:r1:r0 and
represents a 128-bit type, __x128_t. See Example 4-4.
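The argument positions described above can be modeled with a short function. The text states only that the first argument is passed in A4, the fourth in B6, and the sixth in B8 (or B9:B8 for a pair); the alternating A/B pattern used below for the remaining positions is an assumption that should be checked against Section 7.4. The names arg_reg and arg_register are hypothetical.

```c
/* Register assumed to carry an argument: side is 'A' or 'B', and number is
   the register index. For a pair, number is the low (even) register and the
   high half is number + 1, as in B9:B8. */
struct arg_reg {
    char side;
    int number;
};

/* position is 1-based and limited to the ten arguments .cproc accepts.
   An out-of-range position yields side '?' and number -1. */
struct arg_reg arg_register(int position)
{
    struct arg_reg r;
    if (position < 1 || position > 10) {
        r.side = '?';
        r.number = -1;
        return r;
    }
    r.side = (position % 2 == 1) ? 'A' : 'B';
    r.number = 4 + 2 * ((position - 1) / 2);
    return r;
}
```

Under this model, arg_register(4) reports side 'B' and number 6, matching the B6 entry in the .cproc example.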
If you are calling a procedure from C++ source, you must use the appropriate linkname
for the procedure label. Otherwise, you can force C naming conventions by using the
extern "C" declaration. See Section 6.12 and Section 7.5 for more information.
When .endproc is used with a .cproc directive, it cannot have arguments. The live out set
for a .cproc region is determined by any .return directives that appear in the .cproc
region. (A value is live out if it has been defined before or within the procedure and is
used as an output from the procedure.) Returning a value from a .cproc region is
handled by the .return directive. The return branch is automatically generated in a .cproc
region. See the .return topic for more information.
Only code within procedures is optimized. The assembly optimizer copies any code that
is outside of procedures to the output file and does not modify it. See Section 4.4.1 for a
list of instruction types that cannot appear in a .cproc region.
LOOP:
AND cword,mask,cond ; cond = codeword & mask
[cond] MVK 1,cond ; !(!(cond))
CMPEQ theta,cond,if ; (theta == !(!(cond)))
LDH *a++,ai ; a[i]
[if] ADD sum,ai,sum ; sum += a[i]
[!if] SUB sum,ai,sum ; sum -= a[i]
SHL mask,1,mask ; mask = mask << 1
[cntr] ADD -1,cntr,cntr ; decrement counter
[cntr] B LOOP ; for LOOP
.return sum
.endproc
Description The .map directive assigns symbol names to machine registers. Symbols are stored in
the substitution symbol table. The association between symbolic names and actual
registers is wiped out at the beginning and end of each linear assembly function. The
.map directive can be used in assembly and linear assembly files.
When a symbol is declared with the .map directive, it is not necessary to declare that
symbol with the .reg directive.
Example Here the .map directive is used to assign x to register A6 and y to register B7. The
symbols are used with a move statement.
.map x/A6, y/B7
MV x, y ; equivalent to MV A6, B7
The symbol used to name a memory reference has the same syntax restrictions as any
assembly symbol. (For more information about symbols, refer to the TMS320C6000
Assembly Language Tools User's Guide.) It is in the same space as the symbolic
registers. You cannot use the same name for a symbolic register and annotating a
memory reference.
The .mdep directive tells the assembly optimizer that there is a dependence between
two memory references.
The .mdep directive is valid only within procedures; that is, within occurrences of the
.proc and .endproc directive pair or the .cproc and .endproc directive pair.
Example Here is an example in which .mdep is used to indicate a dependence between two
memory references.
.mdep ld1, st1
Description The .mptr directive associates a register with the information that allows the assembly
optimizer to determine automatically whether two memory operations have a memory
bank conflict. If the assembly optimizer determines that two memory operations have a
memory bank conflict, then it does not schedule them in parallel.
A memory bank conflict occurs when two accesses to a single memory bank in a given
cycle result in a memory stall that halts all pipeline operation for one cycle while the
second value is read from memory. For more information on memory bank conflicts,
including how to use the .mptr directive to prevent them, see Section 4.5.
Following are descriptions of the .mptr directive parameters:
variable|memref The name of the register symbol or memory reference used to identify
a load or store involved in a dependence.
base A symbolic address that associates related memory accesses.
offset The offset in bytes from the starting base symbol. The offset is an
optional parameter and defaults to 0.
stride The register loop increment in bytes. The stride is an optional
parameter and defaults to 0.
The .mptr directive tells the assembly optimizer that when the symbol or memref is used
as a memory pointer in an LD(B/BU)(H/HU)(W) or ST(B/H/W) instruction, it is initialized
to point to base + offset and is incremented by stride each time through the loop.
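The base, offset, and stride information can be thought of as describing a stream of addresses, one per loop iteration. The sketch below models how two such streams relative to the same base could be compared for bank conflicts. It is a simplified illustration, not the assembly optimizer's algorithm: the interleaved-bank model, the bank width, and the number of banks are assumptions that vary by device, each access is treated as touching a single bank, and all names are hypothetical.

```c
/* Bank index of a byte offset under a simple interleaved-bank model. */
static unsigned bank_of(unsigned addr, unsigned bank_width, unsigned num_banks)
{
    return (addr / bank_width) % num_banks;
}

/* Returns 1 if the two access streams (offset + k * stride, relative to the
   same base) fall in the same bank during any of the first n iterations. */
int streams_conflict(unsigned offset1, unsigned stride1,
                     unsigned offset2, unsigned stride2,
                     unsigned n, unsigned bank_width, unsigned num_banks)
{
    unsigned k;
    for (k = 0; k < n; k++) {
        unsigned b1 = bank_of(offset1 + k * stride1, bank_width, num_banks);
        unsigned b2 = bank_of(offset2 + k * stride2, bank_width, num_banks);
        if (b1 == b2)
            return 1;
    }
    return 0;
}
```

With four banks that are each 2 bytes wide, offsets 0 and 8 map to the same bank, while offsets 0 and 4 advancing with the same stride of 4 never do.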
The .mptr directive is valid within procedures only; that is, within occurrences of the .proc
and .endproc directive pair or the .cproc and .endproc directive pair.
The symbolic addresses used for base symbol names are in a name space separate
from all other labels. This means that a symbolic register or assembly label can have the
same name as a memory bank base name. For example:
.mptr Darray,Darray
Example Here is an example in which .mptr is used to avoid memory bank conflicts.
_blkcp: .cproc I
loop: .trip 50
; potential conflict
LDW *ptr1++, tmp1 ; load *0, bank 0
STW tmp1, *ptr2++{foo} ; store *8, bank 0
.endproc
Syntax .no_mdep
Description The .no_mdep directive tells the assembly optimizer that no memory dependencies
occur within that function, with the exception of any dependencies pointed to with the
.mdep directive.
There is no guarantee that the symbol will be assigned to any register in the specified
group. The compiler may ignore the preference.
When a symbol is declared with the .pref directive, it is not necessary to declare that
variable with the .reg directive.
Description Use the .proc/.endproc directive pair to delimit a section of your code that you want the
assembly optimizer to optimize. This section is called a procedure. Use .proc at the
beginning of the section and .endproc at the end of the section. In this way, you can set
off sections of unscheduled assembly instructions that you want optimized by the
compiler. The directives must be used in pairs; do not use .proc without the
corresponding .endproc. Specify a label with the .proc directive. You can have multiple
procedures in a linear assembly file.
Use the optional variable parameter in the .proc directive to indicate which registers are
live in, and use the optional register parameter of the .endproc directive to indicate which
registers are live out for each procedure. The variable can be an actual register or a
symbolic name. For example:
.PROC x, A5, y, B7
...
.ENDPROC y
A value is live in if it has been defined before the procedure and is used as an input to
the procedure. A value is live out if it has been defined before or within the procedure
and is used as an output from the procedure. If you do not specify any registers with the
.endproc directive, it is assumed that no registers are live out.
Only code within procedures is optimized. The assembly optimizer copies any code that
is outside of procedures to the output file and does not modify it.
See Section 4.4.1 for a list of instruction types that cannot appear in a .proc region.
Example Here is a block move example in which .proc and .endproc are used:
move .proc A4, B4, B0
.no_mdep
loop:
LDW *B4++, A1
MV A1, B1
STW B1, *A4++
ADD -4, B0, B0
[B0] B loop
.endproc
Description The .reg directive allows you to use descriptive names for values that are stored in
registers. The assembly optimizer chooses a register for you such that its use agrees
with the functional units chosen for the instructions that operate on the value.
The .reg directive is valid within procedures only; that is, within occurrences of the .proc
and .endproc directive pair or the .cproc and .endproc directive pair.
Declaring register pairs (or register quads for C6600) explicitly is optional. Doing so is
only necessary if the registers should be allocated as a pair, but they are not used that
way. It is a best practice to declare register pairs and register quads with the pair/quad
syntax. Here is an example of declaring a register pair:
.reg A7:A6
Example 1 This example uses the same code as the block move example shown for .proc/.endproc
but the .reg directive is used:
move .cproc dst, src, cnt
Notice how this example differs from the .proc example: symbolic registers declared with
.reg are allocated as machine registers.
Example 2 The code in the following example is invalid, because a variable defined by the .reg
directive cannot be used outside of the defined procedure:
move .proc A4
.reg top
LDW *A4++, top
MV top, B5
.endproc
MV top, B6 ; WRONG: top is invalid outside of the procedure
Description Registers can be directly partitioned through two directives. The .rega directive is used
to constrain a symbol name to A-side registers. The .regb directive is used to constrain
a symbol name to B-side registers. For example:
.REGA y
.REGB u, v, w
MV x, y
LDW *u, v:w
The .rega and .regb directives are valid within procedures only; that is, within
occurrences of the .proc and .endproc directive pair or the .cproc and .endproc directive
pair.
When a symbol is declared with the .rega or .regb directive, it is not necessary to declare
that symbol with the .reg directive.
The old method of partitioning registers indirectly by partitioning instructions can still be
used. Side and functional unit specifiers can still be used on instructions. However,
functional unit specifiers (.L/.S/.D/.M) and crosspath information are ignored. Side
specifiers are translated into partitioning constraints on the corresponding symbol
names, if any. For example:
MV .1X z, y ; translated to .REGA y
LDW .D2T2 *u, v:w ; translated to .REGB u, v, w
Description The .reserve directive prevents the assembly optimizer from using the specified register
in a .proc or .cproc region.
If a .reserved register is explicitly assigned in a .proc or .cproc region, then the assembly
optimizer can also use that register. For example, the variable tmp1 can be allocated to
register A7, even though it is in the .reserve list, since A7 was explicitly defined in the
ADD instruction:
.cproc
.reserve a7
.reg tmp1
....
ADD a6, b4, a7
....
.endproc
Example 1 The .reserve in this example guarantees that the assembly optimizer does not use A10
to A13 or B10 to B13 for the variables tmp1 to tmp5:
test .proc a4, b4
.reg tmp1, tmp2, tmp3, tmp4, tmp5
.reserve a10, a11, a12, a13, b10, b11, b12, b13
.....
.endproc a4
Example 2 The assembly optimizer may generate less efficient code if the available register pool is
overly restricted. In addition, it is possible that the available register pool is constrained
such that allocation is not possible and an error message is generated. For example, the
following code generates an error since all of the conditional registers have been
reserved, but a conditional register is required for the variable tmp:
.cproc ...
.reserve a1,a2,b0,b1,b2
.reg tmp
....
[tmp] ....
....
.endproc
Description The .return directive function is equivalent to the return statement in C/C++ code. It
places the optional argument in the appropriate register for a return value as per the
C/C++ calling conventions (see Section 7.4).
The optional argument can have the following meanings:
• Zero arguments implies a .cproc region that has no return value, similar to a void
function in C/C++ code.
• An argument implies a .cproc region that has a 32-bit return value, similar to an int
function in C/C++ code.
• A register pair of the format hi:lo implies a .cproc region that has a 40-bit long, a 64-
bit long long, or a 64-bit type double return value; similar to a long/long long/double
function in C/C++ code.
Arguments to the .return directive can be either symbolic register names or machine-
register names.
All return statements in a .cproc region must be consistent in the type of the return value.
It is not legal to mix a .return arg with a .return hi:lo in the same .cproc region.
The .return directive is unconditional. To perform a conditional .return, simply use a
conditional branch around a .return. The assembly optimizer removes the branch and
generates the appropriate conditional code. For example, to return if condition cc is true,
code the return as:
[!cc] B around
.return
around:
Example This example uses a symbolic register, tmp, and a machine-register, A5, as .return
arguments:
.cproc ...
.reg tmp
...
.return tmp ; legal symbolic name
...
.return a5 ; legal actual name
Description The .trip directive specifies the value of the trip count. The trip count indicates how
many times a loop iterates. The .trip directive is valid within procedures only. Following
are descriptions of the .trip directive parameters:
label The label represents the beginning of the loop. This is a required
parameter.
minimum value The minimum number of times that the loop can iterate. This is a
required parameter. The default is 1.
maximum value The maximum number of times that the loop can iterate. The
maximum value is an optional parameter.
factor The factor used, along with minimum value and maximum value, to
determine the number of times that the loop can iterate. In the
following example, the loop executes some multiple of 8, between 8
and 48, times:
loop: .trip 8, 48, 8
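The meaning of the three parameters can be sketched as a predicate on the actual trip count. The function name is hypothetical; it simply restates the rule that the count lies between the minimum and maximum and is a multiple of the factor.

```c
/* Returns 1 if count is a trip count permitted by ".trip min, max, factor",
   and 0 otherwise. */
int trip_count_allowed(int count, int min, int max, int factor)
{
    return count >= min && count <= max && factor > 0 && count % factor == 0;
}
```

For .trip 8, 48, 8 the permitted counts are 8, 16, 24, 32, 40, and 48.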
If the assembly optimizer cannot ensure that the trip count is large enough to pipeline a
loop for maximum performance, a pipelined version and an unpipelined version of the
same loop are generated. This makes one of the loops a redundant loop. The pipelined
or the unpipelined loop is executed based on a comparison between the trip count and
the number of iterations of the loop that can execute in parallel. If the trip count is
greater than or equal to the number of parallel iterations, the pipelined loop is executed;
otherwise, the unpipelined loop is executed. For more information about redundant
loops, see Section 3.3.
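The run-time choice between the two loop versions can be pictured with the following C sketch. It is only an illustration of the comparison described above; the type and function names are assumptions, and the assembly optimizer generates this test in assembly, not through any such function.

```c
/* Illustrative model only: these names are not produced by the
   assembly optimizer. */
typedef enum { LOOP_UNPIPELINED, LOOP_PIPELINED } loop_version;

/* Pick the pipelined loop when the trip count is at least the number
   of loop iterations that execute in parallel. */
loop_version select_loop_version(unsigned trip_count,
                                 unsigned parallel_iterations)
{
    return (trip_count >= parallel_iterations) ? LOOP_PIPELINED
                                               : LOOP_UNPIPELINED;
}
```

When a .trip minimum value already guarantees that the comparison succeeds, the test and the unpipelined copy are unnecessary.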
You are not required to specify a .trip directive with every loop; however, you should use
.trip whenever you know that a loop iterates at least some minimum number of times. Redundant
loops are then generally not generated (unless the minimum value is very small), which saves
code size and execution time.
If you know that a loop always executes the same number of times whenever it is called,
define maximum value (where maximum value equals minimum value) as well. The
compiler may now be able to unroll your loop thereby increasing performance.
When you are compiling with the interrupt flexibility option (--interrupt_threshold=n),
using a .trip maximum value allows the compiler to determine the maximum number of
cycles that the loop can execute. Then, the compiler compares that value to the
threshold value given by the --interrupt_threshold option. See Section 2.12 for more
information.
Example The .trip directive states that the loop will execute 16, 24, 32, 40 or 48 times when the
w_vecsum routine is called.
w_vecsum: .cproc ptr_a, ptr_b, ptr_c, weight, cnt
.reg ai, bi, prod, scaled_prod, ci
.no_mdep
Description The .volatile directive allows you to designate memory references as volatile. Volatile
loads and stores are not deleted. Volatile loads and stores are not reordered with
respect to other volatile loads and stores.
If the .volatile directive references a memory location that may be modified during an
interrupt, compile with the --interrupt_threshold=1 option to ensure all code referencing
the volatile memory location can be interrupted.
.proc
.if
...
.endif
.endproc
Here are two examples of .if/.endif blocks that are partly inside and partly outside of a .cproc or .proc
region:
.if
.cproc
.endif
.endproc
.proc
.if
...
.else
.endproc
.endif
• The following assembly instructions cannot be used from linear assembly:
– EFI
– SPLOOP, SPLOOPD and SPLOOPW and all other loop-buffer related instructions
– C6700+ instructions
– ADDKSP and DP-relative addressing
Bank 0          Bank 1          Bank 2          Bank 3
0     1         2     3         4     5         6     7
8     9         10    11        12    13        14    15
...
8N    8N + 1    8N + 2  8N + 3  8N + 4  8N + 5  8N + 6  8N + 7
For devices that have more than one memory space (Figure 4-2), an access to bank 0 in one memory
space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
8N 8N + 1 8N + 2 8N + 3 8N + 4 8N + 5 8N + 6 8N + 7
Memory space 1: 8M 8M + 1 8M + 2 8M + 3 8M + 4 8M + 5 8M + 6 8M + 7
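The four-bank interleaving shown above (two consecutive bytes per bank, repeating every 8 bytes) can be expressed as simple address arithmetic. The C sketch below is a simplified model under that assumption; the names `bank_of` and `halfword_accesses_conflict` are hypothetical, and the model ignores wider accesses that span more than one bank.

```c
#include <stdbool.h>
#include <stdint.h>

#define BANK_COUNT 4u   /* four banks, as in the layout above        */
#define BANK_WIDTH 2u   /* each bank holds two consecutive bytes     */

/* Bank that holds the given byte address within one memory space. */
unsigned bank_of(uint32_t byte_address)
{
    return (byte_address / BANK_WIDTH) % BANK_COUNT;
}

/* Simplified model: two halfword accesses issued in the same cycle to
   the same memory space conflict when they fall in the same bank. */
bool halfword_accesses_conflict(uint32_t a, uint32_t b)
{
    return bank_of(a) == bank_of(b);
}
```

In this model, addresses 0 and 8 conflict because both map to bank 0, while addresses 0 and 2 map to banks 0 and 1 and do not.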
For example:
.mptr a_0,a+0,16
.mptr a_4,a+4,16
LDW *a_0++[4], val1 ; base=a, offset=0, stride=16
LDW *a_4++[4], val2 ; base=a, offset=4, stride=16
.mptr dptr,D+0,8
LDH *dptr++, d0 ; base=D, offset=0, stride=8
LDH *dptr++, d1 ; base=D, offset=2, stride=8
LDH *dptr++, d2 ; base=D, offset=4, stride=8
LDH *dptr++, d3 ; base=D, offset=6, stride=8
In this example, the offset for dptr is updated after every memory access. The offset is updated only when
the pointer is modified by a constant. This occurs for the pre/post increment/decrement addressing modes.
See the .mptr topic for more information.
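The offset tracking described above behaves like modular arithmetic on the stride. The following C sketch models it under two assumptions that are not stated in the text: the stride is taken as a positive byte count, and the increment is a compile-time constant number of bytes. The function name `mptr_next_offset` is hypothetical.

```c
/* Hypothetical model of .mptr offset tracking: after the pointer is
   modified by delta_bytes, the new offset is taken modulo the stride.
   Assumes stride > 0. */
int mptr_next_offset(int offset, int delta_bytes, int stride)
{
    int next = (offset + delta_bytes) % stride;
    if (next < 0)
        next += stride;   /* keep the result in 0..stride-1 for decrements */
    return next;
}
```

With stride 8 and a 2-byte post-increment (the LDH sequence on dptr), the offset steps through 0, 2, 4, 6 and then wraps to 0. With stride 16 and a 16-byte increment (LDW with ++[4]), the offset never changes, which matches the comments on a_0 and a_4.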
Example 4-6 shows loads and stores extracted from a loop that is being software pipelined.
Example 4-6. Load and Store Instructions That Specify Memory Bank Information
.mptr Ain,IN,-16
.mptr Bin,IN-4,-16
.mptr Aco,COEF,16
.mptr Bco,COEF+4,16
.mptr Aout,optr+0,4
.mptr Bout,optr+2,4
_dot: .cproc a, b
.reg sum0, sum1, I
.reg val1, val2, prod1, prod2
loop: .trip 50
LDW *a++,val1 ; load a[0-1] bank0
LDW *b++,val2 ; load b[0-1] bank2
MPY val1,val2,prod1 ; a[0] * b[0]
MPYH val1,val2,prod2 ; a[1] * b[1]
ADD prod1,sum0,sum0 ; sum0 += a[0] * b[0]
ADD prod2,sum1,sum1 ; sum1 += a[1] * b[1]
It is not always possible to control fully how arrays and other memory objects are aligned. This is
especially true when a pointer is passed into a function and that pointer may have different alignments
each time the function is called. A solution to this problem is to write a dot product routine that cannot
have memory bank conflicts. This would eliminate the need for the arrays to use different memory banks.
If the dot product loop kernel is unrolled once, then four LDW instructions execute in the loop kernel.
Assuming that nothing is known about the bank alignment of arrays a and b (except that they are word
aligned), the only safe assumptions that can be made about the array accesses are that a[0-1] cannot
conflict with a[2-3] and that b[0-1] cannot conflict with b[2-3]. Example 4-10 shows the unrolled loop
kernel.
Example 4-10. Dot Product From Example 4-8 Unrolled to Prevent Memory Bank Conflicts
ADD 4,a_0,a_4
ADD 4,b_0,b_4
MVK 25,i ; I = 100/4
ZERO sum0 ; multiply result = 0
ZERO sum1 ; multiply result = 0
.mptr a_0,a+0,8
.mptr a_4,a+4,8
.mptr b_0,b+0,8
.mptr b_4,b+4,8
loop: .trip 25
The goal is to find a software pipeline in which the following instructions are in parallel:
LDW *a0++[2],val1 ; load a[0-1] bankx
|| LDW *a2++[2],val2 ; load a[2-3] bankx+2
LDW *b0++[2],val1 ; load b[0-1] banky
|| LDW *b2++[2],val2 ; load b[2-3] banky+2
Without the .mptr directives in Example 4-10, the loads of a[0-1] and b[0-1] are scheduled in parallel, and
the loads of a[2-3] and b[2-3] might be scheduled in parallel. This results in a 50% chance that a memory
conflict will occur on every cycle. However, the loop kernel shown in Example 4-11 can never have a
memory bank conflict.
In Example 4-8, if .mptr directives had been used to specify that a and b point to different bases, then the
assembly optimizer would never find a schedule for a 1-cycle loop kernel, because there would always be
a memory bank conflict. However, it would find a schedule for a 2-cycle loop kernel.
.mptr a,RS
.mptr b,RS
.mptr c,XY
.mptr d,XY+2
LDW *a++[i0a],A0 ; a and b always conflict with each other
LDW *b++[i0b],B0 ;
STH A1,*c++[i1a] ; c and d never conflict with each other
STH B2,*d++[i1b] ;
The directive to indicate a specific memory dependence in the previous example is as follows:
.mdep ld1, st1
This means that whenever ld1 accesses memory at location X, st1 may, at some later time in code execution,
also access location X. This is equivalent to adding a dependence between these two instructions. In
terms of the software pipeline, these two instructions must remain in the same order. The ld1 reference
must always occur before the st1 reference; the instructions cannot even be scheduled in parallel.
It is important to note the directional sense of the directive from ld1 to st1. The opposite, from st1 to ld1, is
not implied. In terms of the software pipeline, while every ld1 must occur before every st1, it is still legal to
schedule the ld1 from iteration n+1 before the st1 from iteration n.
Example 4-14 is a picture of the software pipeline with the instructions from two different iterations in
different columns. In the actual instruction sequence, instructions on the same horizontal line are in
parallel.
STW { st1 }
If that schedule does not work because the iteration n st1 might write a value the iteration n+1 ld1 should
read, then you must note a dependence relationship from st1 to ld1.
.mdep st1, ld1
Both directives together force the software pipeline shown in Example 4-15.
Example 4-15. Software Pipeline Using .mdep st1, ld1 and .mdep ld1, st1
...
STW { st1 }
LDW { ld1 }
...
STW { st1 }
Indexed addressing, *+base[index], is a good example of an addressing mode where you typically do not
know anything about the relative sequence of the memory accesses, except they sometimes access the
same location. To correctly model this case, you need to note the dependence relation in both directions,
and you need to use both directives.
.mdep ld1, st1
.mdep st1, ld1
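A C analogue may help show why both directions matter for indexed addressing. In the sketch below (an illustration only; the function and the ld1/st1 labels in the comments are assumptions, not code from this guide), each iteration loads from and stores to a location selected by an index. If two consecutive indices are equal, the load of iteration n+1 must see the value stored by iteration n.

```c
#include <stddef.h>

/* Counts how often each index value occurs. Each iteration performs an
   indexed load followed by an indexed store to the same location. */
void count_indices(unsigned *counts, const unsigned char *idx, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned tmp = counts[idx[i]];   /* load,  plays the role of ld1 */
        counts[idx[i]] = tmp + 1;        /* store, plays the role of st1 */
    }
}
```

If the load from iteration n+1 were scheduled ahead of the store from iteration n, a repeated index would lose an increment. Declaring the dependence in both directions prevents that reordering.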
.return tmp
.endproc
• Example 2
Here, .mdep r2, r1 indicates that STW must occur before LDW. Since STW is after LDW in the code,
the dependence relation is across loop iterations. The STW instruction writes a value that may be read
by the LDW instruction on the next iteration. In this case, a 6-cycle recurrence is created.
fn: .cproc dst, src, cnt
.reg tmp
.no_mdep
.mdep r2, r1
.endproc
Volatile References
NOTE: For volatile references, use .volatile rather than .mdep.
The C/C++ compiler and assembly language tools provide two methods for linking your programs:
• You can compile individual modules and link them together. This method is especially useful when you
have multiple source files.
• You can compile and link in one step. This method is useful when you have a single source module.
This chapter describes how to invoke the linker with each method. It also discusses special requirements
of linking C/C++ code, including the run-time-support libraries, specifying the type of initialization, and
allocating the program into memory. For a complete description of the linker, see the TMS320C6000
Assembly Language Tools User's Guide.
5.1 Invoking the Linker Through the Compiler (-z Option)
5.2 Linker Code Optimizations
5.3 Controlling the Linking Process
When you specify a library as linker input, the linker includes and links only those library members that
resolve undefined references. The linker uses a default allocation algorithm to allocate your program into
memory. You can use the MEMORY and SECTIONS directives in the linker command file to customize
the allocation process. For information, see the TMS320C6000 Assembly Language Tools User's Guide.
You can link a C/C++ program consisting of object files prog1.obj, prog2.obj, and prog3.obj into an
executable object file named prog.out with the command:
cl6x --run_linker --rom_model prog1 prog2 prog3 --output_file=prog.out
--library=rts6200.lib
The --run_linker option divides the command line into the compiler options (the options before
--run_linker) and the linker options (the options following --run_linker). The --run_linker option must follow all
source files and compiler options on the command line.
All arguments that follow --run_linker on the command line are passed to the linker. These arguments can
be linker command files, additional object files, linker options, or libraries. These arguments are the same
as described in Section 5.1.1.
All arguments that precede --run_linker on the command line are compiler arguments. These arguments
can be C/C++ source files, assembly files, linear assembly files, or compiler options. These arguments are
described in Section 2.2.
You can compile and link a C/C++ program consisting of source files prog1.c, prog2.c, and prog3.c into an
executable object file named prog.out with the command:
cl6x prog1.c prog2.c prog3.c --run_linker --rom_model --output_file=prog.out --library=rts6200.lib
Table 5-3. Uninitialized Sections Created by the Compiler for Both ABIs
Name Contents
.bss Global and static variables
.far Global and static variables declared far
.stack Stack
.sysmem Memory for malloc functions (heap)
When you link your program, you must specify where to allocate the sections in memory. In general,
initialized sections are linked into ROM or RAM; uninitialized sections are linked into RAM. With the
exception of code sections, the initialized and uninitialized sections created by the compiler cannot be
allocated into internal program memory. See Section 7.1.1 for a complete description of how the compiler
uses these sections.
The linker provides MEMORY and SECTIONS directives for allocating sections. For more information
about allocating sections into memory, see the TMS320C6000 Assembly Language Tools User's Guide.
The MEMORY directive, and possibly the SECTIONS directive, might require modification to work with your
system. See the TMS320C6000 Assembly Language Tools User's Guide for more information on these
directives.
--rom_model
--heap_size=0x2000
--stack_size=0x0100
--library=rts6200.lib
MEMORY
{
VECS: o = 0x00000000 l = 0x000000400 /* reset & interrupt vectors */
PMEM: o = 0x00000400 l = 0x00000FC00 /* intended for initialization */
BMEM: o = 0x80000000 l = 0x000010000 /* .bss, .sysmem, .stack, .cinit */
}
SECTIONS
{
vectors > VECS
.text > PMEM
.data > BMEM
.stack > BMEM
.bss > BMEM
.sysmem > BMEM
.cinit > BMEM
.const > BMEM
.cio > BMEM
.far > BMEM
}
The C/C++ compiler supports the C/C++ language standard that was developed by a committee of the
American National Standards Institute (ANSI) and subsequently adopted by the International Standards
Organization (ISO).
The C++ language supported by the C6000 is defined by the ANSI/ISO/IEC 14882:1998 standard with
certain exceptions.
--check_misra={all|required|advisory|none|rulespec}
#pragma CHECK_MISRA ("{all|required|advisory|none|rulespec}");
#pragma RESET_MISRA ("{all|required|advisory|rulespec}");
Example: --check_misra=1-5,-1.1,7.2-4
• Checks topics 1 through 5
• Disables rule 1.1 (all other rules from topic 1 remain enabled)
• Checks rules 2 through 4 in topic 7
Two options control the severity of certain MISRA-C:2004 rules:
• The --misra_required option sets the diagnostic severity for required MISRA-C:2004 rules.
• The --misra_advisory option sets the diagnostic severity for advisory MISRA-C:2004 rules.
The syntax for these options is:
--misra_advisory={error|warning|remark|suppress}
--misra_required={error|warning|remark|suppress}
6.5 Keywords
The C6000 C/C++ compiler supports the standard const, register, restrict, and volatile keywords. In
addition, the C/C++ compiler extends the C/C++ language through the support of the cregister, interrupt,
near, and far keywords.
Using the const keyword, you can define large constant tables and allocate them into system ROM. For
example, to allocate a ROM table, you could use the following definition:
far const int digits[] = {0,1,2,3,4,5,6,7,8,9};
The cregister keyword can be used only in file scope. The cregister keyword is not allowed on any
declaration within the boundaries of a function. It can only be used on objects of type integer or pointer.
The cregister keyword is not allowed on objects of any floating-point type or on any structure or union
objects.
The cregister keyword does not imply that the object is volatile. If the control register being referenced is
volatile (that is, can be modified by some external control), then the object must be declared with the
volatile keyword also.
To use the control registers in Table 6-3, you must declare each register as follows. The c6x.h include file
defines all the control registers through this syntax:
Once you have declared the register, you can use the register name directly. See the TMS320C62x DSP
CPU and Instruction Set Reference Guide, TMS320C64x/C64x+ DSP CPU and Instruction Set Reference
Guide, the TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide, or TMS320C66x+ DSP
CPU and Instruction Set Reference Guide for detailed information on the control registers.
See Example 6-1 for an example that declares and uses control registers.
The name c_int00 is the C/C++ entry point. This name is reserved for the system reset interrupt. This
special interrupt routine initializes the system and calls the function main. Because it has no caller, c_int00
does not save any registers.
Use the alternate keyword, __interrupt, if you are writing code for strict ANSI/ISO mode (using the
--strict_ansi compiler option).
near keyword The compiler assumes that the data item can be accessed relative to the data page
pointer. For example:
LDW *+dp(_address),a0
far keyword The compiler cannot access the data item via the DP. This can be required if the
total amount of program data is larger than the offset allowed (32K) from the DP.
For example:
MVKL _address, a1
MVKH _address, a1
LDW *a1,a0
Once a variable has been defined to be far, all external references to this variable in other C files or
headers must also contain the far keyword. This is also true of the near keyword. However, you will get
compiler or linker errors when the far keyword is not used everywhere. Not using the near keyword
everywhere only leads to slower data access times.
If you use the DATA_SECTION pragma, the object is indicated as a far variable, and this cannot be
overridden. If you reference this object in another file, then you need to use extern far when declaring this
object in the other source file. This ensures access to the variable, since the variable might not be in the
.bss section. For details, see Section 6.9.6.
When data objects do not have the near or far keyword specified, the compiler will use far accesses to
aggregate data and near accesses to non-aggregate data. For more information on the data memory
model and ways to control accesses to data, see Section 7.1.5.1.
near keyword The compiler assumes that the destination of the call is within ± 1 M word of the caller.
Here the compiler uses the PC-relative branch instruction.
B _func
far keyword You tell the compiler that the call is not within ± 1 M word of the caller.
MVKL _func, a1
MVKH _func, a1
B a1
By default, the compiler generates small-memory model code, which means that every function call is
handled as if it were declared near, unless it is actually declared far.
For more information on function calls, see Section 7.1.6.
Example 6-3 illustrates using the restrict keyword when passing arrays to a function. Here, the arrays c
and d should not overlap, nor should c and d point to the same array.
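A minimal sketch of the idea, separate from Example 6-3, is shown below. The function name `add_arrays` and its parameters are illustrative assumptions. The restrict qualifiers are a promise by the caller that c and d refer to non-overlapping storage, which lets the compiler load from d without worrying that a store through c changed it.

```c
#include <stddef.h>

/* Adds d[i] into c[i]. The restrict qualifiers state that, inside this
   function, the memory reached through c is not also reached through d. */
void add_arrays(int * restrict c, const int * restrict d, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        c[i] += d[i];
}
```

Calling the function with overlapping arrays, for example add_arrays(buf, buf + 1, n), breaks the promise and gives undefined results.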
However, in this example, *ctrl is a loop-invariant expression, so the loop is optimized down to a
single-memory read. To get the desired result, define ctrl as:
volatile unsigned int *ctrl;
Here the *ctrl pointer is intended to reference a hardware location, such as an interrupt flag.
Volatile must also be used when accessing memory locations that represent memory-mapped peripheral
devices. Such memory locations might change value in ways that the compiler cannot predict. These
locations might change if accessed, or when some other memory location is accessed, or when some
signal occurs.
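The following sketch shows a bounded polling loop on such a location. It is an illustration under stated assumptions: the function name `poll_until` and the max_polls limit are hypothetical additions so that the loop always terminates. Because ctrl points to a volatile object, every pass through the loop performs a real read of *ctrl.

```c
/* Polls *ctrl until it equals expected or max_polls reads have been made.
   Returns 1 if the expected value was seen, 0 on timeout. */
int poll_until(volatile unsigned int *ctrl, unsigned int expected,
               unsigned int max_polls)
{
    unsigned int polls;
    for (polls = 0; polls < max_polls; polls++) {
        if (*ctrl == expected)   /* re-read on every iteration */
            return 1;
    }
    return 0;
}
```

Without the volatile qualifier on the pointed-to type, the compiler would be free to read *ctrl once and reuse the value for every comparison.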
Volatile must also be used for local variables in a function which calls setjmp, if the value of the local
variables needs to remain valid if a longjmp occurs.
#include <setjmp.h>
#include <stdio.h>
jmp_buf context;
extern void setup(void); /* assumed to be defined elsewhere */
void function()
{
    volatile int x = 3;
    switch(setjmp(context))
    {
        case 0: setup(); break;
        default:
        {
            printf("x == %d\n", x); /* We can only reach here if longjmp
                                       has occurred; because x's lifetime
                                       begins before the setjmp and lasts
                                       through the longjmp, the C standard
                                       requires x be declared "volatile" */
            break;
        }
    }
}
The compiler copies the argument string directly into your output file. The assembler text must be
enclosed in double quotes. All the usual character string escape codes retain their definitions. For
example, you can insert a .byte directive that contains quotes as follows:
asm("STR: .byte \"abc\"");
The inserted code must be a legal assembly language statement. Like all assembly language statements,
the line of code inside the quotes must begin with a label, a blank, a tab, or a comment (asterisk or
semicolon). The compiler performs no checking on the string; if there is an error, the assembler detects it.
For more information about the assembly language statements, see the TMS320C6000 Assembly
Language Tools User's Guide.
The asm statements do not follow the syntactic restrictions of normal C/C++ statements. Each can appear
as a statement or a declaration, even outside of blocks. This is useful for inserting directives at the very
beginning of a compiled module.
Use the alternate statement __asm("assembler text") if you are writing code for strict ANSI/ISO C mode
(using the --strict_ansi option).
The rulespec parameter is a comma-separated list of specifiers. See Section 6.3 for details.
The RESET_MISRA pragma can be used to reset any CHECK_MISRA pragmas; see Section 6.9.25.
The RETAIN pragma has the opposite effect of the CLINK pragma. See Section 6.9.26 for more details.
The CODE_SECTION pragma is useful if you have code objects that you want to link into an area
separate from the .text section.
The following examples demonstrate the use of the CODE_SECTION pragma.
int fn(int x)
{
return x;
}
.sect "my_sect"
.global _fn
;******************************************************************************
;* FUNCTION NAME: _fn *
;* *
;* Regs Modified : SP *
;* Regs Used : A4,B3,SP *
;* Local Frame Size : 0 Args + 4 Auto + 0 Save = 4 byte *
;******************************************************************************
_fn:
;** --------------------------------------------------------------------------*
RET .S2 B3 ; |6|
SUB .D2 SP,8,SP ; |4|
STW .D2T1 A4,*+SP(4) ; |4|
ADD .S2 8,SP,SP ; |6|
NOP 2
; BRANCH OCCURS ; |6|
C6200 The C6200 devices contain four memory banks (0, 1, 2, and 3); constant can be 0 or 2.
C6400 The C6400 devices contain 8 memory banks; constant can be 0, 2, 4, or 6.
C6400+ The C6400+ devices contain 8 memory banks; constant can be 0, 2, 4, or 6.
C6600 The C6600 devices contain 8 memory banks; constant can be 0, 2, 4, or 6.
C6700 The C6700 devices contain 8 memory banks; constant can be 0, 2, 4, or 6.
C6740 The C6740 devices contain 8 memory banks; constant can be 0, 2, 4, or 6.
Both global and local variables can be aligned with the DATA_MEM_BANK pragma. The
DATA_MEM_BANK pragma must reside inside the function that contains the local variable being aligned.
The symbol can also be used as a parameter in the DATA_SECTION pragma.
When optimization is enabled, the tools may or may not use the stack to store the values of local
variables.
The DATA_MEM_BANK pragma allows you to align data on any data memory bank that can hold data of
the type size of the symbol. This is useful if you need to align data in a particular way to avoid memory
bank conflicts in your hand-coded assembly code, rather than padding with zeros and having to account for the
padding in your code.
This pragma increases the amount of space used in data memory by a small amount as padding is used
to align data onto the correct bank.
For C6200, the code in Example 6-7 guarantees that array x begins at an address ending in 4 or c (in
hexadecimal), and that array y begins at an address ending in 4 or c. The alignment for array y affects its
stack placement. Array z is placed in the .z_sect section, and begins at an address ending in 0 or 8.
void main()
{
#pragma DATA_MEM_BANK (y, 2);
short y[100];
...
}
The DATA_SECTION pragma is useful if you have data objects that you want to link into an area separate
from the .bss section. If you allocate a global variable using a DATA_SECTION pragma and you want to
reference the variable in C code, you must declare the variable as extern far.
Example 6-8 through Example 6-10 demonstrate the use of the DATA_SECTION pragma.
char bufferA[512];
#pragma DATA_SECTION("my_sect")
char bufferB[512];
.global _bufferA
.bss _bufferA,512,4
.global _bufferB
The diagnostic affected (num) is specified using either an error number or an error tag name. The equal
sign (=) is optional. Any diagnostic can be overridden to be an error, but only diagnostics with a severity of
discretionary error or below can have their severity reduced to a warning or below, or be suppressed. The
diag_default pragma is used to return the severity of a diagnostic to the one that was in effect before any
pragmas were issued (i.e., the normal severity of the message as modified by any command-line options).
The diagnostic identifier number is output along with the message when the -pden command line option is
specified.
#pragma FUNC_ALWAYS_INLINE;
#pragma FUNC_CANNOT_INLINE;
#pragma FUNC_EXT_CALLED;
Except for _c_int00, which is the name reserved for the system reset interrupt for C/C++ programs, the
name of the interrupt (the func argument) does not need to conform to a naming convention.
When you use program-level optimization, you may need to use the FUNC_EXT_CALLED pragma with
certain options. See Section 3.7.2.
#pragma FUNC_IS_PURE;
#pragma FUNC_IS_SYSTEM;
#pragma FUNC_NEVER_RETURNS;
#pragma FUNC_NO_GLOBAL_ASG;
#pragma FUNC_NO_IND_ASG;
#pragma INTERRUPT;
The code for the function will return via the IRP (interrupt return pointer).
Except for _c_int00, which is the name reserved for the system reset interrupt for C programs, the name
of the interrupt (the func argument) does not need to conform to a naming convention.
#pragma LOCATION(address);
int x;
#pragma location=address
int x;
The noinit pragma may be used in conjunction with the location pragma to map variables to special
memory locations; see Section 6.9.22.
The arguments min and max are programmer-guaranteed minimum and maximum trip counts. The trip
count is the number of times a loop iterates. The trip count of the loop must be evenly divisible by multiple.
All arguments are optional. For example, if the trip count could be 5 or greater, you can specify the
argument list as follows:
#pragma MUST_ITERATE(5);
However, if the trip count could be any nonzero multiple of 5, the pragma would look like this:
#pragma MUST_ITERATE(5, , 5); /* Note the blank field for max */
It is sometimes necessary for you to provide min and multiple in order for the compiler to perform
unrolling. This is especially the case when the compiler cannot easily determine how many iterations the
loop will perform (that is, the loop has a complex exit condition).
When specifying a multiple via the MUST_ITERATE pragma, results of the program are undefined if the
trip count is not evenly divisible by multiple. Also, results of the program are undefined if the trip count is
less than the minimum or greater than the maximum specified.
If no min is specified, zero is used. If no max is specified, the largest possible number is used. If multiple
MUST_ITERATE pragmas are specified for the same loop, the smallest max and largest min are used.
In this example, the compiler attempts to generate a software pipelined loop even without the pragma.
However, if MUST_ITERATE is not specified for a loop such as this, the compiler generates code to
bypass the loop, to account for the possibility of 0 iterations. With the pragma specification, the compiler
knows that the loop iterates at least once and can eliminate the loop-bypassing code.
MUST_ITERATE can specify a range for the trip count as well as a factor of the trip count. For example:
#pragma MUST_ITERATE(8, 48, 8);
This example tells the compiler that the loop executes between 8 and 48 times and that the trip_count
variable is a multiple of 8 (8, 16, 24, 32, 40, 48). The multiple argument allows the compiler to unroll the
loop.
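A sketch of the pragma in context follows. The function `sum_samples` is a made-up example, and the guard on __TI_COMPILER_VERSION__ is an assumption added here so that the same source also builds with compilers that do not recognize the pragma.

```c
/* Sums n samples. The caller guarantees that n is 8, 16, 24, 32, 40,
   or 48; the pragma passes that guarantee on to the compiler. */
int sum_samples(const short *samples, int n)
{
    int total = 0;
    int i;
#ifdef __TI_COMPILER_VERSION__   /* pragma is specific to the TI compiler */
#pragma MUST_ITERATE(8, 48, 8);
#endif
    for (i = 0; i < n; i++)
        total += samples[i];
    return total;
}
```

The pragma must appear immediately before the loop it describes. Calling sum_samples with a count outside the stated set, such as 12, makes the results undefined when the pragma is in effect.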
You should also consider using MUST_ITERATE for loops with complicated bounds. In the following
example:
for(i2 = ipos[2]; i2 < 40; i2 += 5) { ...
The compiler would have to generate a divide function call to determine, at run time, the exact number of
iterations performed. The compiler will not do this. In this case, using MUST_ITERATE to specify that the
loop always executes eight times allows the compiler to attempt to generate a software pipelined loop:
#pragma MUST_ITERATE(8, 8);
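The division that the compiler would otherwise need at run time can be written out as a worked sketch. The helper name `trip_count` is hypothetical; it computes how many times a loop of the form for (i = start; i < limit; i += step) executes.

```c
/* Number of iterations of: for (i = start; i < limit; i += step).
   Uses ceiling division; returns 0 for an empty or ill-formed loop. */
int trip_count(int start, int limit, int step)
{
    if (step <= 0 || start >= limit)
        return 0;
    return (limit - start + step - 1) / step;
}
```

For the loop above (limit 40, step 5), the result is 8 only when the starting value ipos[2] lies between 0 and 4. This range is an inference from the arithmetic rather than a statement in this guide, but it is the condition under which MUST_ITERATE(8, 8) is a valid promise.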
#pragma NMI_INTERRUPT;
The code generated for the function returns via the NRP, rather than via the IRP as it does for a function
declared with the interrupt keyword or INTERRUPT pragma.
Except for _c_int00, which is the name reserved for the system reset interrupt for C programs, the name
of the interrupt (function) does not need to conform to a naming convention.
#pragma NOINIT (x);
int x;
#pragma PERSISTENT (x);
int x=10;
#pragma NOINIT;
int x;
#pragma PERSISTENT;
int x=10;
int x __attribute__((noinit));
int x __attribute__((persistent)) = 0;
#pragma NO_HOOKS;
Where min and max are the minimum and maximum trip counts of the loop in the common case. The trip
count is the number of times a loop iterates. Both arguments are optional.
For example, PROB_ITERATE could be applied to a loop that executes for eight iterations in the majority
of cases (but sometimes may execute more or fewer than eight iterations):
#pragma PROB_ITERATE(8, 8);
If only the minimum expected trip count is known (say it is 5), the pragma would look like this:
#pragma PROB_ITERATE(5);
If only the maximum expected trip count is known (say it is 10), the pragma would look like this:
#pragma PROB_ITERATE(, 10); /* Note the blank field for min */
The rulespec parameter is a comma-separated list of specifiers. See Section 6.3 for details.
The CLINK pragma has the opposite effect of the RETAIN pragma. See Section 6.9.2 for more details.
In Example 6-11, x and y are put in the section mydata. To reset the current section to the default used by
the compiler, a blank parameter should be passed to the pragma. An easy way to think of the pragma is
that it is like applying the CODE_SECTION or DATA_SECTION pragma to all symbols below it.
#pragma SET_DATA_SECTION("mydata")
int x;
int y;
#pragma SET_DATA_SECTION()
The pragmas apply to both declarations and definitions. If applied to a declaration and not the definition,
the pragma that is active at the declaration is used to set the section for that symbol. Here is an example:
#pragma SET_CODE_SECTION("func1")
extern void func1();
#pragma SET_CODE_SECTION()
...
void func1() { ... }
In Example 6-12 func1 is placed in section func1. If conflicting sections are specified at the declaration
and definition, a diagnostic is issued.
The current CODE_SECTION and DATA_SECTION pragmas and GCC attributes can be used to override
the SET_CODE_SECTION and SET_DATA_SECTION pragmas. For example:
In Example 6-13 x is placed in x_data and y is placed in mydata. No diagnostic is issued for this case.
The pragmas work for both C and C++. In C++, the pragmas are ignored for templates and for implicitly
created objects, such as implicit constructors and virtual function tables.
This pragma guarantees that the alignment of the named type or the base type of the named typedef is at
least equal to that of the expression. (The alignment may be greater as required by the compiler.) The
alignment must be a power of 2. The type must be a type or a typedef name. If a type, it must be either a
structure tag or a union tag. If a typedef, its base type must be either a structure tag or a union tag.
Since ANSI/ISO C declares that a typedef is simply an alias for a type (i.e. a struct) this pragma can be
applied to the struct, the typedef of the struct, or any typedef derived from them, and affects all aliases of
the base type.
This example aligns any st_tag structure variables on a page boundary:
typedef struct st_tag
{
int a;
short b;
} st_typedef;
Any use of STRUCT_ALIGN with a basic type (int, short, float) or a variable results in an error.
#pragma UNROLL( n );
If possible, the compiler unrolls the loop so there are n copies of the original loop. The compiler only
unrolls if it can determine that unrolling by a factor of n is safe. In order to increase the chances the loop is
unrolled, the compiler needs to know certain properties:
• The loop iterates a multiple of n times. This information can be specified to the compiler via the
multiple argument in the MUST_ITERATE pragma.
• The smallest possible number of iterations of the loop
• The largest possible number of iterations of the loop
The compiler can sometimes obtain this information itself by analyzing the code. However, sometimes the
compiler can be overly conservative in its assumptions and therefore generates more code than is
necessary when unrolling. This can also lead to not unrolling at all.
Furthermore, if the mechanism that determines when the loop should exit is complex, the compiler may
not be able to determine these properties of the loop. In these cases, you must tell the compiler the
properties of the loop by using the MUST_ITERATE pragma.
172 TMS320C6000C/C++ Language Implementation SPRU187U – July 2012
Copyright © 2012, Texas Instruments Incorporated
Specifying #pragma UNROLL(1); asks that the loop not be unrolled. Automatic loop unrolling also is not
performed in this case.
If multiple UNROLL pragmas are specified for the same loop, it is undefined which pragma is used, if any.
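As a minimal sketch of how the two pragmas combine, the following C function states that the loop runs at least 8 times, at most 256 times, and always a multiple of 4 times, and then requests unrolling by 4. The function name sum_samples and the trip-count bounds are hypothetical. The pragmas are guarded with __TI_COMPILER_VERSION__ (predefined by TI compilers) so that the code also builds with compilers that do not recognize them.

```c
/* Hypothetical example: sum n 16-bit samples.
 * Assumption: the caller guarantees 8 <= n <= 256 and n % 4 == 0. */
int sum_samples(const short *samples, int n)
{
    int i;
    int sum = 0;

#ifdef __TI_COMPILER_VERSION__          /* defined by TI compilers only */
#pragma MUST_ITERATE(8, 256, 4);        /* min, max, multiple of the trip count */
#pragma UNROLL(4);                      /* request 4 copies of the loop body */
#endif
    for (i = 0; i < n; i++)
        sum += samples[i];

    return sum;
}
```

If the caller cannot guarantee the stated bounds, the MUST_ITERATE line should be dropped or relaxed, because the compiler is entitled to rely on it.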
The argument string_literal is interpreted in the same way the tokens following a #pragma directive are
processed. The string_literal must be enclosed in quotes. A quotation mark that is part of the string_literal
must be preceded by a backslash.
You can use the _Pragma operator to express #pragma directives in macros. For example, the
DATA_SECTION syntax:
#pragma DATA_SECTION(func, "section");
is represented by the _Pragma() operator syntax:
_Pragma("DATA_SECTION(func, \"section\")")
The following code illustrates using _Pragma to specify the DATA_SECTION pragma in a macro:
...
COLLECT_DATA(x)
int x;
...
The EMIT_PRAGMA macro is needed to properly expand the quotes that are required to surround the
section argument to the DATA_SECTION pragma.
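The macro definitions are elided in the listing above, so the following is only a sketch of the usual two-level pattern, not necessarily the guide's exact code. It assumes that EMIT_PRAGMA stringizes its argument with the # operator, which escapes the inner quotes automatically. The section name "mysection" and the helper PRAGMA_TEXT are hypothetical; PRAGMA_TEXT exists only to make the generated string visible.

```c
/* EMIT_PRAGMA turns its argument into one string literal for _Pragma. */
#define EMIT_PRAGMA(x)     _Pragma(#x)
#define PRAGMA_TEXT(x)     #x   /* hypothetical helper: exposes the same string */

#ifdef __TI_COMPILER_VERSION__
#define COLLECT_DATA(var)  EMIT_PRAGMA(DATA_SECTION(var, "mysection"))
#else
#define COLLECT_DATA(var)  /* no-op where DATA_SECTION is not supported */
#endif

COLLECT_DATA(x)
int x;

/* Returns the text that EMIT_PRAGMA would hand to _Pragma for variable x. */
const char *collect_data_text(void)
{
    return PRAGMA_TEXT(DATA_SECTION(x, "mysection"));
}
```

Writing _Pragma("DATA_SECTION(var, "mysection")") directly inside a macro would end the string at the second quotation mark, which is why the stringizing step is used.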
6.11.2 EABI
EABI requires the ELF object file format, which enables support for modern language features such as early
template instantiation and exported inline functions.
TI-specific information on EABI mode is described in Section 7.8.2.
To generate object files compatible with EABI, you must use C6000 compiler version 7.2 or greater; see
Section 2.16. The __TI_EABI__ predefined symbol is defined and set to 1 if compiling for EABI and is not
defined otherwise.
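A short sketch of how source code might test the predefined symbol follows. The function name abi_name is hypothetical.

```c
/* Returns a short label for the ABI this file was compiled for.
 * __TI_EABI__ is defined (as 1) only when compiling for EABI. */
const char *abi_name(void)
{
#ifdef __TI_EABI__
    return "EABI";
#else
    return "not EABI";
#endif
}
```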
The linkname of foo is _foo__Fi, indicating that foo is a function that takes a single argument of type int.
To aid inspection and debugging, a name demangling utility is provided that demangles names into those
found in the original C++ source. See Chapter 9 for more information.
For EABI, the mangling algorithm follows that described in the Itanium C++ ABI
(http://www.codesourcery.com/cxx-abi/abi.html).
int foo(int i) { } would be mangled "_Z3fooi"
.bss: {} = 0x00;
...
}
Because the linker writes a complete load image of the zeroed .bss section into the output COFF file, this
method can have the unwanted effect of significantly increasing the size of the output file (but not the
program).
If you burn your application into ROM, you should explicitly initialize variables that require initialization.
The preceding method initializes .bss to 0 only at load time, not at system reset or power up. To make
these variables 0 at run time, explicitly define them in your code.
For more information about linker command files and the SECTIONS directive, see the linker description
information in the TMS320C6000 Assembly Language Tools User's Guide.
6.13.2 Initializing Static and Global Variables With the const Type Qualifier
Static and global variables of type const without explicit initializations are similar to other static and global
variables because they might not be preinitialized to 0 (for the same reasons discussed in Section 6.13).
For example:
const int zero; /* may not be initialized to 0 */
However, the initialization of const global and static variables is different because these variables are
declared and initialized in a section called .const. For example:
const int zero = 0; /* guaranteed to be 0 */
This feature is particularly useful for declaring a large table of constants, because neither time nor space
is wasted at system startup to initialize the table. Additionally, the linker can be used to place the .const
section in ROM.
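As an illustration of the constant-table case, the sketch below declares a small lookup table with explicit initializers. The table contents and the names square_table and square_lookup are hypothetical.

```c
/* Hypothetical lookup table: squares of 0..7.
 * It is const and explicitly initialized, so per the text above its
 * values come from the object file rather than from start-up code. */
static const unsigned char square_table[8] = { 0, 1, 4, 9, 16, 25, 36, 49 };

unsigned square_lookup(unsigned n)
{
    return square_table[n & 7u];   /* mask keeps the index in range */
}
```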
You can use the DATA_SECTION pragma to put the variable in a section other than .const. For example,
the following C code:
#pragma DATA_SECTION (zero, ".mysect");
const int zero=0;
• External declarations with no type or storage class (only an identifier) are illegal in ANSI/ISO but legal
in K&R:
a; /* illegal unless --kr_compatible used */
• ANSI/ISO interprets file scope definitions that have no initializers as tentative definitions. In a single
module, multiple definitions of this form are fused together into a single definition. Under K&R, each
definition is treated as a separate definition, resulting in multiple definitions of the same object and
usually an error. For example:
int a;
int a; /* illegal if --kr_compatible used, OK if not */
Under ANSI/ISO, the result of these two definitions is a single definition for the object a. For most K&R
compilers, this sequence is illegal, because int a is defined twice.
• ANSI/ISO prohibits, but K&R allows objects with external linkage to be redeclared as static:
extern int a;
static int a; /* illegal unless --kr_compatible used */
• Unrecognized escape sequences in string and character constants are explicitly illegal under ANSI/ISO
but ignored under K&R:
char c = '\q'; /* same as 'q' if --kr_compatible used, error if not */
• ANSI/ISO specifies that bit fields must be of type int or unsigned. With --kr_compatible, bit fields can
be legally defined with any integral type. For example:
struct s
{
short f : 2; /* illegal unless --kr_compatible used */
};
• K&R syntax allows a trailing comma in enumerator lists:
enum { a, b, c, }; /* illegal unless --kr_compatible used */
• K&R syntax allows trailing tokens on preprocessor directives:
#endif NAME /* illegal unless --kr_compatible used */
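For comparison, the sketch below shows ANSI/ISO-conforming counterparts of several constructs from the list above, which should compile without --kr_compatible. All identifiers are hypothetical.

```c
/* ANSI/ISO-conforming counterparts of the K&R-only constructs above. */

int count;                      /* explicit type; a bare "count;" is K&R only */

extern int limit;               /* external linkage is not redeclared static */
int limit = 2;

char letter = 'q';              /* no unrecognized escape such as '\q' */

struct s
{
    unsigned int f : 2;         /* bit field of type unsigned int */
};

enum color { RED, GREEN, BLUE };   /* no trailing comma */

#ifdef NAME
#endif /* NAME */               /* trailing text is written as a comment */

/* Stores v in the 2-bit field and returns what the field holds. */
unsigned store_two_bits(unsigned v)
{
    struct s x;
    x.f = v;                    /* the value is reduced modulo 4 */
    return x.f;
}
```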
6.14.2 Enabling Strict ANSI/ISO Mode and Relaxed ANSI/ISO Mode (--strict_ansi and --relaxed_ansi Options)
Use the --strict_ansi option when you want to compile under strict ANSI/ISO mode. In this mode, error
messages are provided when non-ANSI/ISO features are used, and language extensions that could
invalidate a strictly conforming program are disabled. Examples of such extensions are the inline and asm
keywords.
Use the --relaxed_ansi option when you want the compiler to ignore strict ANSI/ISO violations rather than
emit a warning (as occurs in normal ANSI/ISO mode) or an error message (as occurs in strict ANSI/ISO
mode). In relaxed ANSI/ISO mode, the compiler accepts extensions to the ANSI/ISO C standard, even
when they conflict with ANSI/ISO C. The GCC language extensions described in Section 6.15 are
available in relaxed ANSI/ISO mode.
6.15.1 Extensions
Most of the GCC language extensions are available in the TI compiler when compiling in relaxed ANSI
mode (--relaxed_ansi) or if the --gcc option is used.
The extensions that the TI compiler supports are listed in Table 6-4, which is based on the list of
extensions found at the GNU web site. The shaded rows describe extensions that are not supported.
(1) Feature defined for GCC 3.0; definition and examples at http://gcc.gnu.org/onlinedocs/gcc-3.0.4/gcc/C-Extensions.html
However, the members of a packed struct are byte-aligned. Thus the following does not have any bytes of
padding between or after members and totals 6 bytes:
struct __attribute__((__packed__)) packed_struct { char c1; int i; char c2; };
Consequently, packed structures in an array are packed together without trailing padding between array
elements.
Bit fields of a packed structure are bit-aligned. The byte alignment of adjacent struct members that are not
bit fields does not change. However, there are no bits of padding between adjacent bit fields.
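The layout claims above can be checked with sizeof and offsetof. The sketch below assumes a 1-byte char and a 4-byte int, and a compiler that accepts the GCC attribute syntax (for the TI compiler, that means --relaxed_ansi or --gcc). The helper function names are hypothetical.

```c
#include <stddef.h>   /* offsetof, size_t */

struct __attribute__((__packed__)) packed_struct { char c1; int i; char c2; };

/* Layout facts that follow from byte alignment of the members. */
size_t packed_size(void)       { return sizeof(struct packed_struct); }
size_t packed_offset_i(void)   { return offsetof(struct packed_struct, i); }
size_t packed_offset_c2(void)  { return offsetof(struct packed_struct, c2); }
size_t packed_array_size(void) { return sizeof(struct packed_struct[3]); }
```

With those assumptions the structure occupies 6 bytes, i starts at offset 1, c2 at offset 5, and an array of three elements occupies 18 bytes.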
The packed attribute can only be applied to the original definition of a structure or union type. It cannot be
applied with a typedef to a non-packed structure that has already been defined, nor can it be applied to
the declaration of a struct or union object. Therefore, any given structure or union type can only be packed
or non-packed, and all objects of that type will inherit its packed or non-packed attribute.
The packed attribute is not applied recursively to structure types that are contained within a packed
structure. Thus, in the following example the member s retains the same internal layout as in the first
example above. There is no padding between c and s, so s falls on an unaligned boundary:
struct __attribute__((__packed__)) outer_packed_struct { char c; struct unpacked_struct s; };
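The sketch below checks this behavior. The definition of unpacked_struct is not shown in this excerpt, so it is assumed here to be the non-packed counterpart of packed_struct; the helper function names are hypothetical.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Assumed definition: same members as packed_struct, but not packed. */
struct unpacked_struct { char c1; int i; char c2; };

struct __attribute__((__packed__)) outer_packed_struct
{
    char c;
    struct unpacked_struct s;
};

size_t inner_size(void)   { return sizeof(struct unpacked_struct); }
size_t inner_offset(void) { return offsetof(struct outer_packed_struct, s); }
size_t outer_size(void)   { return sizeof(struct outer_packed_struct); }
```

The member s keeps its own internal padding, so the outer structure is exactly one byte larger than unpacked_struct, and s begins at offset 1.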
It is illegal to implicitly or explicitly cast the address of a packed struct member as a pointer to any non-
packed type except an unsigned char. In the following example, p1, p2, and the call to foo are all illegal.
void foo(int *param);
struct packed_struct ps;
However, it is legal to explicitly cast the address of a packed struct member as a pointer to an unsigned
char:
unsigned char *pc = (unsigned char *)&ps.i;
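One way to read such a member without forming a pointer to a non-packed type is to copy its bytes through the permitted unsigned char pointer. This is a sketch under that assumption; the type packed_record and the function read_packed_value are hypothetical.

```c
#include <string.h>   /* memcpy */

/* Hypothetical packed type used only for this sketch. */
struct __attribute__((__packed__)) packed_record { char tag; int value; };

/* Reads the possibly unaligned member through an unsigned char pointer,
 * the one pointer type the text above permits. */
int read_packed_value(const struct packed_record *rec)
{
    const unsigned char *bytes = (const unsigned char *)&rec->value;
    int result;

    memcpy(&result, bytes, sizeof result);   /* byte-wise copy, no alignment assumed */
    return result;
}
```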
The TI compiler also supports an unpacked attribute for an enumeration type to allow you to indicate that
the representation is to be an integer type that is no smaller than int; in other words, it is not packed.
Run-Time Environment
This chapter describes the TMS320C6000 C/C++ run-time environment. To ensure successful execution
of C/C++ programs, it is critical that all run-time code maintain this environment. It is also important to
follow the guidelines in this chapter if you write assembly language functions that interface with C/C++
code.
7.1.1 Sections
The compiler produces relocatable blocks of code and data called sections. The sections are allocated
into memory in a variety of ways to conform to a variety of system configurations. For more information
about sections and allocating them, see the introductory object file information in the TMS320C6000
Assembly Language Tools User's Guide.
There are two basic types of sections:
• Initialized sections contain data or executable code. The C/C++ compiler creates the following
initialized sections:
– The .args section contains the command argument for a host-based loader. This section is read-
only. See the --arg_size option for details.
– For EABI only, the .binit section contains boot time copy tables. For details on BINIT, see the
TMS320C6000 Assembly Language Tools User's Guide for linker command file information.
– For COFF ABI only, the .cinit section contains tables for initializing variables and constants.
– The .pinit section for COFF ABI, or the .init_array section for EABI, contains the table of global
constructors to be called at startup.
– For EABI only, the .c6xabi.exidx section contains the index table for exception handling. The
.c6xabi.extab section contains unwinding instructions for exception handling. These sections are
read-only. See the --exceptions option for details.
– The .name.load section contains the compressed image of section name. This section is read-
only. See the TMS320C6000 Assembly Language Tools User's Guide for information on copy
tables.
– The .ppinfo section contains correlation tables and the .ppdata section contains data tables for
compiler-based profiling. See the --gen_profile_info option for details.
– The .const section contains string literals, floating-point constants, and data defined with the
C/C++ qualifiers far and const (provided the constant is not also defined as volatile).
– For EABI only, the .fardata section reserves space for non-const, initialized far global and static
variables.
– For EABI only, the .neardata section reserves space for non-const, initialized near global and
static variables.
– For EABI only, the .rodata section reserves space for const near global and static variables.
– The .switch section contains jump tables for large switch statements.
– The .text section contains all the executable code.
• Uninitialized sections reserve space in memory (usually RAM). A program can use this space at run
time to create and store variables. The compiler creates the following uninitialized sections:
– For COFF ABI only, the .bss section reserves space for global and static variables. When you
specify the --rom_model linker option, at program startup, the C boot routine copies data out of the
.cinit section (which can be in ROM) and stores it in the .bss section. The compiler defines the
global symbol $bss and assigns $bss the value of the starting address of the .bss section.
– For EABI only, the .bss section reserves space for uninitialized global and static variables.
– The .far section reserves space for global and static variables that are declared far.
– The .stack section reserves memory for the system stack.
– The .sysmem section reserves space for dynamic memory allocation. The reserved space is used
by dynamic memory allocation routines, such as malloc, calloc, realloc, or new. If a C/C++ program
does not use these functions, the compiler does not create the .sysmem section.
The assembler creates the default sections .text, .bss, and .data. The C/C++ compiler, however, does not
use the .data section. You can instruct the compiler to create additional sections by using the
CODE_SECTION and DATA_SECTION pragmas (see Section 6.9.3 and Section 6.9.6).
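A brief sketch of the two pragmas in C follows. The section names ".fast_code" and ".big_tables" and the identifiers are hypothetical, and a linker command file would still have to allocate those sections. The pragmas are guarded so the code also builds with other compilers.

```c
/* In C, each pragma names the symbol and must precede its definition. */
#ifdef __TI_COMPILER_VERSION__
#pragma CODE_SECTION(scale_sample, ".fast_code")
#pragma DATA_SECTION(gain_table, ".big_tables")
#endif

int gain_table[4] = { 1, 2, 4, 8 };

int scale_sample(int sample, unsigned index)
{
    return sample * gain_table[index & 3u];   /* mask keeps the index in range */
}
```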
The run-time stack grows from the high addresses to the low addresses. The compiler uses the B15
register to manage this stack. B15 is the stack pointer (SP), which points to the next unused location on
the stack.
The linker sets the stack size, creates a global symbol, __TI_STACK_SIZE, and assigns it a value equal
to the stack size in bytes. The default stack size is 1K bytes. You can change the stack size at link time by
using the --stack_size option with the linker command. For more information on the --stack_size option,
see the linker description chapter in the TMS320C6000 Assembly Language Tools User's Guide.
At system initialization, SP is set to the first 8-byte aligned address before the end (highest numerical
address) of the .stack section. For C6600, SP is set to the first 16-byte aligned address. Since the position
of the stack depends on where the .stack section is allocated, the actual address of the stack is
determined at link time.
The C/C++ environment automatically decrements SP at the entry to a function to reserve all the space
necessary for the execution of that function. The stack pointer is incremented at the exit of the function to
restore the stack to the state before the function was entered. If you interface assembly language routines
to C/C++ programs, be sure to restore the stack pointer to the same state it was in before the function
was entered.
For more information about the stack and stack pointer, see Section 7.4.
Stack Overflow
NOTE: The compiler provides no means to check for stack overflow during compilation or at run
time. A stack overflow disrupts the run-time environment, causing your program to fail. Be
sure to allow enough space for the stack to grow. You can use the --entry_hook option to
add code to the beginning of each function to check for stack overflow; see Section 2.17.
The --mem_model:data options do not affect the access to objects explicitly declared with the near or far
keyword.
By default, all run-time-support data is defined as far.
For more information on near and far accesses to data, see Section 6.5.4.
Consts that are declared far, either explicitly through the far keyword or implicitly using
--mem_model:const, are always placed in the .const section.
S S S S S S S S S S S S S S S S S S S S S S S S S S I I I I I I
31 7 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U U U U U U U
31 7 0
S S S S S S S S S S S S S S S S S I I I I I I I I I I I I I I I
31 15 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U U U U U U U U U U U U U U U U
31 15 0
LEGEND: S = sign, I = signed integer, U = unsigned integer, MS = most significant, LS = least significant
7.2.1.2 enum, int, and long (EABI) Data Types (signed and unsigned)
The int, unsigned int, and enum data types are stored in memory as 32-bit objects (see Figure 7-2).
Objects of these types are loaded to and stored from bits 0-31 of a register. In big-endian mode, 4-byte
objects are loaded to registers by moving the first byte (that is, the lower address) of memory to bits 24-31
of the register, moving the second byte of memory to bits 16-23, moving the third byte to bits 8-15, and
moving the fourth byte to bits 0-7. In little-endian mode, 4-byte objects are loaded to registers by moving
the first byte (that is, the lower address) of memory to bits 0-7 of the register, moving the second byte to
bits 8-15, moving the third byte to bits 16-23, and moving the fourth byte to bits 24-31.
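The two byte orders described above can be modeled in portable C as follows. This is an illustrative model of the register contents, not code the compiler generates; the function names are hypothetical.

```c
#include <stdint.h>

/* Big-endian: the first (lowest-address) byte goes to bits 24-31. */
uint32_t load_word_big_endian(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}

/* Little-endian: the first (lowest-address) byte goes to bits 0-7. */
uint32_t load_word_little_endian(const uint8_t mem[4])
{
    return  (uint32_t)mem[0]        | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

For the memory bytes 0x12, 0x34, 0x56, 0x78 (lowest address first), the big-endian load yields 0x12345678 and the little-endian load yields 0x78563412.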
7.2.1.4 __int40_t and COFF ABI long Data Types (signed and unsigned)
Long and unsigned long data types are stored in an odd/even pair of registers (see Figure 7-4) and are
always referenced as a pair in the format of odd register:even register (for example, A1:A0). In little-endian
mode, the lower address is loaded into the even register and the higher address is loaded into the odd
register; if data is loaded from location 0, then the byte at 0 is the lowest byte of the even register. In big-
endian mode, the higher address is loaded into the even register and the lower address is loaded into the
odd register; if data is loaded from location 0, then the byte at 0 is the highest byte of the odd register but
is ignored.
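The register-pair split can be modeled in portable C. The sketch below assumes the layout of Figure 7-4: the even register holds the least significant 32 bits, and the low 8 bits of the odd register hold bits 32-39, with the sign in bit 7 of the odd register. The type reg_pair and both function names are hypothetical.

```c
#include <stdint.h>

/* Model of an odd:even register pair holding a 40-bit value. */
struct reg_pair { uint32_t odd; uint32_t even; };

struct reg_pair split_40bit(int64_t value)
{
    uint64_t bits = (uint64_t)value & 0xFFFFFFFFFFull;  /* keep 40 bits */
    struct reg_pair pair;

    pair.even = (uint32_t)(bits & 0xFFFFFFFFu);   /* least significant 32 bits */
    pair.odd  = (uint32_t)(bits >> 32);           /* most significant 8 bits   */
    return pair;
}

int64_t join_40bit(struct reg_pair pair)
{
    uint64_t bits = ((uint64_t)(pair.odd & 0xFFu) << 32) | pair.even;

    /* Sign-extend from bit 39; the unused upper odd-register bits are ignored. */
    if (bits & (1ull << 39))
        return (int64_t)bits - ((int64_t)1 << 40);
    return (int64_t)bits;
}
```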
Figure 7-4. 40-Bit Data Storage Format Signed __int40_t or 40-bit long
Odd register
MS
X X X X X X X X X X X X X X X X X X X X X X X X X S I I I I I I
31 8 7 6 0
Even register
LS
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
31 0
LEGEND: S = sign, U = unsigned integer, I = signed integer, X = unused, MS = most significant, LS = least significant
Even register
LS
U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U
31 0
LEGEND: S = sign, U = unsigned integer, I = signed integer, X = unused, MS = most significant, LS = least significant
Even register
LS
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
31 0
LEGEND: S = sign, U = unsigned integer, I = signed integer, X = unused, MS = most significant, LS = least significant
Even register
LS
U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U
31 0
LEGEND: S = sign, U = unsigned integer, I = signed integer, X = unused, MS = most significant, LS = least significant
Even register
LS
M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
31 0
LEGEND: S = sign, M = mantissa, E = exponent, MS = most significant, LS = least significant
The parameter d is the offset to be added to the beginning of the class object for this pointer. The
parameter I is the index into the virtual function table, offset by 1. The index enables the NULL pointer to
be represented. Its value is -1 if the function is nonvirtual. The parameter f is the pointer to the member
function if it is nonvirtual, when I is 0. The 0 is the offset to the virtual function pointer within the class
object.
Top-level arrays are aligned on an 8-byte boundary for C6400, C6400+, C6740, and C6600, and either a
4-byte (for all element types of 32 bits or smaller) or an 8-byte boundary for C6200, C6700, or C6700+.
Top-level arrays are aligned on a 16-byte boundary for C6600. Elements of arrays are stored in the same
manner as if they were individual objects.
A0 represents the least significant bit of the field A; A1 represents the next least significant bit, etc. Again,
storage of bit fields in memory is done with a byte-by-byte, rather than bit-by-bit, transfer.
Big-endian memory
Byte 0 Byte 1 Byte 2 Byte 3
A A A A A A A B B B B B B B B B B C C C D D E E E E E E E E E X
6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 2 1 0 1 0 8 7 6 5 4 3 2 1 0 X
Little-endian register
MS LS
X E E E E E E E E E D D C C C B B B B B B B B B B A A A A A A A
X 8 7 6 5 4 3 2 1 0 1 0 2 1 0 9 8 7 6 5 4 3 2 1 0 6 5 4 3 2 1 0
31 0
Little-endian memory
Byte 0 Byte 1 Byte 2 Byte 3
B A A A A A A A B B B B B B B B E E D D C C C B X E E E E E E E
0 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 1 0 1 0 2 1 0 9 X 8 7 6 5 4 3 2
LEGEND: X = not used, MS = most significant, LS = least significant
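The little-endian register image in the figure can be reproduced by hand with shifts. The structure declaration below is an assumption inferred from the field widths in the figure (A is 7 bits, B is 10, C is 3, D is 2, E is 9); the function name is hypothetical, and the code models the layout rather than relying on how any particular compiler allocates bit fields.

```c
#include <stdint.h>

/* Assumed declaration matching the field widths in the figure. */
struct bit_fields
{
    unsigned int A : 7;
    unsigned int B : 10;
    unsigned int C : 3;
    unsigned int D : 2;
    unsigned int E : 9;
};

/* Builds the little-endian register image:
 * A in bits 0-6, B in 7-16, C in 17-19, D in 20-21, E in 22-30. */
uint32_t pack_little_endian_register(const struct bit_fields *x)
{
    return  (uint32_t)x->A
         | ((uint32_t)x->B << 7)
         | ((uint32_t)x->C << 17)
         | ((uint32_t)x->D << 20)
         | ((uint32_t)x->E << 22);
}
```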
The register conventions dictate how the compiler uses registers and how values are preserved across
function calls. Table 7-2 summarizes how the compiler uses the TMS320C6000 registers.
The registers in Table 7-2 are available to the compiler for allocation to register variables and temporary
expression results. If the compiler cannot allocate a register of a required type, spilling occurs. Spilling is
the process of moving a register's contents to memory to free the register for another purpose.
Objects of type double, long, long long, or long double are allocated into an odd/even register pair and are
always referenced as a register pair (for example, A1:A0). The odd register contains the sign bit, the
exponent, and the most significant part of the mantissa. The even register contains the least significant
part of the mantissa. The A4 register is used with A5 for passing the first argument if the first argument is
a double, long, long long, or long double. The same is true for B4 and B5 for the second parameter, and
so on. For more information about argument-passing registers and return registers, see Section 7.4.
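To make the split of a double across a register pair concrete, the following portable model extracts the two 32-bit halves. It assumes IEEE-754 64-bit doubles; the function names are hypothetical and the code does not use any TI intrinsic.

```c
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Upper 32 bits: what the odd register of the pair would hold
 * (sign bit, exponent, most significant part of the mantissa). */
uint32_t double_odd_half(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Lower 32 bits: what the even register of the pair would hold
 * (least significant part of the mantissa). */
uint32_t double_even_half(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)bits;
}
```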
All other control registers are not saved or restored by the compiler.
The compiler assumes that control registers not listed in Table 7-2 that can have an effect on compiled
code have default values. For example, the compiler assumes all circular addressing-enabled registers
are set for linear addressing (the AMR is used to enable circular addressing). Enabling circular addressing
and then calling a C/C++ function without restoring the AMR to a default setting violates the calling
convention. You must be certain that control registers which affect compiler-generated code have a default
value when calling a C/C++ function from assembly.
Assembly language programmers must be aware that the linker assumes B15 contains the stack pointer.
The linker needs to save and restore values on the stack in trampoline code that it generates. If you do
not use B15 as the stack pointer in assembly code, you should use the linker option that disables
trampolines, --trampolines=off. Otherwise, trampolines could corrupt memory and overwrite register
values.
A4 A4 B4 A6
int func2( int a, float b, int c, struct A d, float e, int f, int g);
A4 A4 B4 A6 B6 A8 B8 A10
int func3( int a, double b, float c, long double d);
A4 A4 B5:B4 A6 B7:B6
/* NOTE: The following function has a variable number of arguments. */
int vararg(int a, int b, int c, int d, ...);
A4 A4 B4 A6 stack
struct A func4( int y);
A3 A4
__x128_t func5( __x128_t a);
A7:A6:A5:A4 A7:A6:A5:A4
void func6(int a, int b, __x128_t c);
A4 B4 A11:A10:A9:A8
void func7(int a, int b, __x128_t c, int d, int e, int f, __x128_t g, int h);
You must be careful to declare functions properly that accept structure arguments, both at the point
where they are called (so that the structure argument is passed as an address) and at the point where
they are declared (so the function knows to copy the structure to a local copy).
5. The called function executes the code for the function.
6. If the called function returns any integer, pointer, or float type, the return value is placed in the A4
register. If the function returns a double, long double, long, or long long type, the value is placed in the
A5:A4 register pair. For C6600 if the function returns a __x128_t, the value is placed in A7:A6:A5:A4.
If the function returns a structure, the caller allocates space for the structure and passes the address of
the return space to the called function in A3. To return a structure, the called function copies the
structure to the memory block pointed to by the extra argument.
In this way, the caller can be smart about telling the called function where to return the structure. For
example, in the statement s = f(x), where s is a structure and f is a function that returns a structure, the
caller can actually make the call as f(&s, x). The function f then copies the return structure directly into
s, performing the assignment automatically.
If the caller does not use the return structure value, an address value of 0 can be passed as the first
argument. This directs the called function not to copy the return structure.
You must be careful to declare functions properly that return structures, both at the point where they
are called (so that the extra argument is passed) and at the point where they are declared (so the
function knows to copy the result).
7. Any register numbered A10 to A15 or B10 to B15 that the function saved earlier is restored.
8. If A15 was used as a frame pointer (FP), the old value of A15 is restored from the stack. The space
allocated for the function on entry is reclaimed at the end of the function by adding a constant to register B15
(SP).
9. The function returns by jumping to the value of the return register (B3) or the saved value of the return
register.
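The structure-return convention in step 6 can be pictured in C by writing the hidden argument explicitly. This is a sketch of the idea only; the structure point and both functions are hypothetical, and the real compiler passes the address in A3 rather than as a visible parameter.

```c
/* Hypothetical structure and functions for this sketch. */
struct point { int x; int y; };

/* What the programmer writes: s = make_point(x); */
struct point make_point(int x)
{
    struct point p;
    p.x = x;
    p.y = 2 * x;
    return p;
}

/* What the convention amounts to: the caller supplies the address of the
 * return space and the callee copies the result there. A null address
 * means the caller does not use the result, so no copy is made. */
void make_point_lowered(struct point *ret, int x)
{
    struct point p;
    p.x = x;
    p.y = 2 * x;
    if (ret != 0)
        *ret = p;
}
```

Calling make_point_lowered(&s, 3) leaves s with the same contents as s = make_point(3).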
convention. You must be certain that control registers that affect compiler-generated code have a
default value when calling a C/C++ function from assembly.
• Assembly language programmers must be aware that the linker assumes B15 contains the stack
pointer. The linker needs to save and restore values on the stack in trampoline code that it generates.
If you do not use B15 as the stack pointer in your assembly code, you should use the linker option that
disables trampolines, --trampolines=off. Otherwise, trampolines could corrupt memory and overwrite
register values.
• Assembly code that utilizes B14 and/or B15 for localized purposes other than the data-page pointer
and stack pointer may violate the calling convention. The assembly programmer needs to protect these
areas of non-standard use of B14 and B15 by turning off interrupts around this code. Because interrupt
handling routines need the stack (and thus assume the stack pointer is in B15) interrupts need to be
turned off around this code. Furthermore, because interrupt service routines may access global data
and may call other functions which access global data, this special treatment also applies to B14. After
the data-page pointer and stack pointer have been restored, interrupts may be turned back on.
Example 7-1 illustrates a C++ function called main, which calls an assembly language function called
asmfunc (Example 7-2). The asmfunc function takes its single argument, adds it to the C++ global variable
called gvar, and returns the result.
extern "C" {
extern int asmfunc(int a); /* declare external asm function */
int gvar = 0; /* define global variable */
}
void main()
{
int I = 5;
.global _asmfunc
.global _gvar
_asmfunc:
LDW *+b14(_gvar),A3
NOP 4
ADD a3,a4,a3
STW a3,*b14(_gvar)
MV a3,a4
B b3
NOP 5
In the C++ program in Example 7-1, the extern declaration of asmfunc is optional because the return type
is int. Like C/C++ functions, you need to declare assembly functions only if they return noninteger values
or pass noninteger parameters.
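For readers less familiar with C6000 assembly, the following C model shows what the routine in Example 7-2 computes. The names carry a _model suffix, which is hypothetical and chosen only to avoid clashing with the real symbols.

```c
/* C model of the assembly routine asmfunc. */
int gvar_model = 0;

int asmfunc_model(int a)
{
    gvar_model = gvar_model + a;   /* LDW, ADD, STW: gvar = gvar + a  */
    return gvar_model;             /* MV a3,a4: the result goes in A4 */
}
```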
NOTE: SP Semantics
The stack pointer must always be 8-byte aligned. For C6600 the stack pointer must always
be 16-byte aligned. This is automatically performed by the C compiler and system
initialization code in the run-time-support libraries. Any hand assembly code that has
interrupts enabled or calls a function defined in C or linear assembly source should also
reserve a multiple of 8 bytes (multiple of 16 bytes for C6600) on the stack.
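When sizing a hand-written stack frame, the required multiple can be computed by rounding up. The helper below is a hypothetical sketch of that arithmetic; the alignment argument must be a power of 2 (8 in general, 16 for C6600).

```c
#include <stddef.h>   /* size_t */

/* Rounds a requested frame size up to the next multiple of alignment. */
size_t round_up_frame_size(size_t bytes, size_t alignment)
{
    return (bytes + alignment - 1) & ~(alignment - 1);
}
```

For example, a routine that needs 20 bytes of local storage would reserve 24 bytes with 8-byte alignment and 32 bytes with 16-byte alignment.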
Because you are referencing only the symbol's value as stored in the symbol table, the symbol's declared
type is unimportant. In Example 7-5, int is used. You can reference linker-defined symbols in a similar
manner.
Table 7-4 provides a summary of the C6000 intrinsics clarifying which devices support which intrinsics.
The intrinsics listed in Table 7-4 can be used on all C6000 devices. They correspond to the indicated
C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference
Guide for more information.
See Table 7-5 for the listing of C6400-specific intrinsics, which are also compatible with C6400+, C6740,
and C6600. See Table 7-6 for the listing of C6400+-specific intrinsics, which are also compatible with
C6740 and C6600 devices. See Table 7-7 for the listing of C6700-specific intrinsics. See Table 7-8 for a
listing of C6600-specific intrinsics.
The intrinsics listed in Table 7-5 can be used for C6400, C6400+, C6740, and C6600 devices. The
intrinsics shown correspond to the indicated C6000 assembly language instruction(s). See the
TMS320C6000 CPU and Instruction Set Reference Guide for more information.
See Table 7-4 for the listing of generic C6000 intrinsics. See Table 7-6 for the listing of C6400+-, C6740-,
and C6600-specific intrinsics. See Table 7-7 for the listing of C6700-specific intrinsics. See Table 7-8 for
the listing of C6600-specific intrinsics.
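To make the packed-data descriptions in Table 7-5 concrete, the following portable C functions model four of the intrinsics (_add4, _ldotp2, _cmpeq2, and _bitc4). They are reference sketches for reasoning about results on a host machine, not the TI implementations. The model_ names are hypothetical, and the bit assignment in model_cmpeq2 (bit 0 for the lower pair, bit 1 for the upper pair) is an assumption that should be checked against the instruction set reference.

```c
#include <stdint.h>

/* Sign-extends the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int32_t)(v ^ 0x8000u) - 0x8000;
}

/* _add4: adds four pairs of packed 8-bit values; no carry between lanes. */
uint32_t model_add4(uint32_t src1, uint32_t src2)
{
    uint32_t result = 0;
    int lane;
    for (lane = 0; lane < 4; lane++) {
        uint32_t x = (src1 >> (8 * lane)) & 0xFFu;
        uint32_t y = (src2 >> (8 * lane)) & 0xFFu;
        result |= ((x + y) & 0xFFu) << (8 * lane);
    }
    return result;
}

/* _ldotp2: signed lower-half product plus signed upper-half product,
 * kept in a wide integer so the sum cannot overflow. */
int64_t model_ldotp2(uint32_t src1, uint32_t src2)
{
    int64_t lower = (int64_t)sext16(src1) * sext16(src2);
    int64_t upper = (int64_t)sext16(src1 >> 16) * sext16(src2 >> 16);
    return lower + upper;
}

/* _cmpeq2: equality of each 16-bit pair, packed into the two LSBs.
 * Assumption: bit 0 reports the lower pair, bit 1 the upper pair. */
uint32_t model_cmpeq2(uint32_t src1, uint32_t src2)
{
    uint32_t lower_equal = ((src1 & 0xFFFFu) == (src2 & 0xFFFFu));
    uint32_t upper_equal = ((src1 >> 16) == (src2 >> 16));
    return lower_equal | (upper_equal << 1);
}

/* _bitc4: counts the 1 bits in each byte and writes each count to the
 * corresponding byte of the result. */
uint32_t model_bitc4(uint32_t src)
{
    uint32_t result = 0;
    int lane, bit;
    for (lane = 0; lane < 4; lane++) {
        uint32_t byte = (src >> (8 * lane)) & 0xFFu;
        uint32_t count = 0;
        for (bit = 0; bit < 8; bit++)
            count += (byte >> bit) & 1u;
        result |= count << (8 * lane);
    }
    return result;
}
```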
Table 7-5. TMS320C6400, C6400+, C6740, and C6600 C/C++ Compiler Intrinsics
Assembly
C/C++ Compiler Intrinsic Instruction Description
int _abs2 (int src); ABS2 Calculates the absolute value for each 16-bit value
int _add4 (int src1, int src2); ADD4 Performs 2s-complement addition to pairs of packed 8-bit numbers
long long & _amem8 (void *ptr); LDDW/STDW Allows aligned loads and stores of 8 bytes to memory. The pointer
must be aligned to an eight-byte boundary.
const long long & _amem8_const (const void *ptr); LDDW Allows aligned loads of 8 bytes from memory. The pointer must be
aligned to an eight-byte boundary. (1)
__float2_t & _amem8_f2(void * ptr); LDDW/STDW Allows aligned loads and stores of 8 bytes to memory. The pointer
must be aligned to an eight-byte boundary. You must include
c6x.h. (1) (2)
const __float2_t & _amem8_f2_const(void * ptr); LDDW Allows aligned loads of 8 bytes from memory. The pointer must be
aligned to an eight-byte boundary. You must include c6x.h. (1) (2)
double & _amemd8 (void *ptr); LDDW/STDW Allows aligned loads and stores of 8 bytes to memory. The pointer
must be aligned to an eight-byte boundary. (1) (2)
For C6400 _amemd8 corresponds to different assembly instructions
than when used with other C6000 devices; see Table 7-4.
const double & _amemd8_const (const void *ptr); LDDW Allows aligned loads of 8 bytes from memory. The pointer must be
aligned to an eight-byte boundary. (1) (2)
int _avg2 (int src1, int src2); AVG2 Calculates the average for each pair of signed 16-bit values
unsigned _avgu4 (unsigned, unsigned); AVGU4 Calculates the average for each pair of unsigned 8-bit values
unsigned _bitc4 (unsigned src); BITC4 For each of the 8-bit quantities in src, the number of 1 bits is written
to the corresponding position in the return value
unsigned _bitr (unsigned src); BITR Reverses the order of the bits
int _cmpeq2 (int src1, int src2); CMPEQ2 Performs equality comparisons on each pair of 16-bit values.
Equality results are packed into the two least-significant bits of the
return value.
int _cmpeq4 (int src1, int src2); CMPEQ4 Performs equality comparisons on each pair of 8-bit values. Equality
results are packed into the four least-significant bits of the return
value.
int _cmpgt2 (int src1, int src2); CMPGT2 Compares each pair of signed 16-bit values. Results are packed
into the two least-significant bits of the return value.
unsigned _cmpgtu4 (unsigned src1, unsigned src2); CMPGTU4 Compares each pair of 8-bit values. Results are packed into the four
least-significant bits of the return value.
int _cmplt2 (int src1, int src2); CMPLT2 Swaps operands and calls _cmpgt2.
unsigned _cmpltu4 (unsigned src1, unsigned src2); CMPLTU4 Swaps operands and calls _cmpgtu4.
unsigned _deal (unsigned src ); DEAL The odd and even bits of src are extracted into two separate 16-bit
values.
int _dotp2 (int src1, int src2); DOTP2
__int40_t _ldotp2 (int src1, int src2); DOTP2
The product of the signed lower 16-bit values of src1 and src2 is added to the product of the signed upper
16-bit values of src1 and src2. The _lo and _hi intrinsics are needed to access each half of the 64-bit
integer result.
int _dotpn2 (int src1, int src2); DOTPN2 The product of the signed lower 16-bit values of src1 and src2 is
subtracted from the product of the signed upper 16-bit values of
src1 and src2.
int _dotpnrsu2 (int src1, unsigned src2); DOTPNRSU2 The product of the lower 16-bit values of src1 and src2 is subtracted
from the product of the upper 16-bit values of src1 and src2. The
values in src1 are treated as signed packed quantities; the values in
src2 are treated as unsigned packed quantities. 2^15 is added and
the result is sign shifted right by 16.
int _dotpnrus2 (unsigned src1, int src2); DOTPNRUS2 Swaps the operands and calls _dotpnrsu2.
int _dotprsu2 (int src1, unsigned src2); DOTPRSU2 The product of the lower 16-bit values of src1 and src2 is added to
the product of the upper 16-bit values of src1 and src2. The values
in src1 are treated as signed packed quantities; the values in src2
are treated as unsigned packed quantities. 2^15 is added and the
result is sign shifted right by 16.
(1)
See Section 7.5.9 for details on manipulating 8-byte data quantities.
(2)
See the TMS320C6000 Programmer's Guide for more information.
Table 7-5. TMS320C6400, C6400+, C6740, and C6600 C/C++ Compiler Intrinsics (continued)
C/C++ Compiler Intrinsic    Assembly Instruction    Description
int _dotpsu4 (int src1, unsigned src2); DOTPSU4 For each pair of 8-bit values in src1 and src2, the 8-bit value from
int _dotpus4 (unsigned src1, int src2); DOTPUS4 src1 is multiplied with the 8-bit value from src2. The four products
unsigned _dotpu4 (unsigned src1, unsigned src2); DOTPU4 are summed together.
int _gmpy4 (int src1, int src2); GMPY4 Performs the Galois Field multiply on four values in src1 with four
parallel values in src2. The four products are packed into the return
value.
int _max2 (int src1, int src2); MAX2 Places the larger/smaller of each pair of values in the corresponding
int _min2 (int src1, int src2); MIN2 position in the return value. Values can be 16-bit signed or 8-bit
unsigned _maxu4 (unsigned src1, unsigned src2); MAXU4 unsigned.
unsigned _minu4 (unsigned src1, unsigned src2); MINU4
ushort & _mem2 (void * ptr); LDB/LDB Allows unaligned loads and stores of 2 bytes to memory (2)
STB/STB
const ushort & _mem2_const (const void * ptr); LDB/LDB Allows unaligned loads of 2 bytes from memory (2)
unsigned & _mem4 (void * ptr); LDNW Allows unaligned loads and stores of 4 bytes to memory (2)
STNW
const unsigned & _mem4_const (const void * ptr); LDNW Allows unaligned loads of 4 bytes from memory (2)
long long & _mem8 (void * ptr); LDNDW Allows unaligned loads and stores of 8 bytes to memory (2)
STNDW
const long long & _mem8_const (const void * ptr); LDNDW Allows unaligned loads of 8 bytes from memory (3)
double & _memd8 (void * ptr); LDNDW Allows unaligned loads and stores of 8 bytes to memory (4) (3)
STNDW
const double & _memd8_const (const void * ptr); LDNDW Allows unaligned loads of 8 bytes from memory (4) (3)
long long _mpy2ll (int src1, int src2); MPY2 Returns the products of the lower and higher 16-bit values in src1
and src2
long long _mpyhill (int src1, int src2); MPYHI Produces a 16 by 32 multiply. The result is placed into the lower 48
long long _mpylill (int src1, int src2); MPYLI bits of the return type. Can use the upper or lower 16 bits of src1.
long long _mpyihll (int src1, int src2); MPYIH Swaps operands and calls _mpyhill.
long long _mpyilll (int src1, int src2); MPYIL Swaps operands and calls _mpylill.
int _mpyhir (int src1, int src2); MPYHIR Produces a signed 16 by 32 multiply. The result is shifted right by
int _mpylir (int src1, int src2); MPYLIR 15 bits. Can use the upper or lower 16 bits of src1.
int _mpyihr (int src1, int src2); MPYIHR Swaps operands and calls _mpyhir.
int _mpyilr (int src1, int src2); MPYILR Swaps operands and calls _mpylir.
long long _mpysu4ll (int src1, unsigned src2); MPYSU4 For each 8-bit quantity in src1 and src2, performs an 8-bit by 8-bit
long long _mpyus4ll (unsigned src1, int src2); MPYUS4 multiply. The four 16-bit results are packed into a 64-bit result. The
long long _mpyu4ll (unsigned src1, unsigned src2); MPYU4 results can be signed or unsigned.
int _mvd (int src2 ); MVD Moves the data from src2 to the return value over four cycles using
the multiplier pipeline
unsigned _pack2 (unsigned src1, unsigned src2); PACK2 The lower/upper halfwords of src1 and src2 are placed in the return
unsigned _packh2 (unsigned src1, unsigned src2); PACKH2 value.
unsigned _packh4 (unsigned src1, unsigned src2); PACKH4 Packs alternate bytes into return value. Can pack high or low bytes.
unsigned _packl4 (unsigned src1, unsigned src2); PACKL4
unsigned _packhl2 (unsigned src1, unsigned src2); PACKHL2 The upper/lower halfword of src1 is placed in the upper halfword of the
unsigned _packlh2 (unsigned src1, unsigned src2); PACKLH2 return value. The lower/upper halfword of src2 is placed in the lower
halfword of the return value.
unsigned _rotl (unsigned src1, unsigned src2); ROTL Rotates src2 to the left by the amount in src1
int _sadd2 (int src1, int src2); SADD2 Performs saturated addition between pairs of 16-bit values in src1
int _saddus2 (unsigned src1, int src2); SADDUS2 and src2. Values for src1 can be signed or unsigned.
int _saddsu2 (int src1, unsigned src2); SADDSU2
unsigned _saddu4 (unsigned src1, unsigned src2); SADDU4 Performs saturated addition between pairs of 8-bit unsigned values
in src1 and src2.
unsigned _shfl (unsigned src2); SHFL The lower 16 bits of src2 are placed in the even bit positions, and
the upper 16 bits of src2 are placed in the odd bit positions.
unsigned _shlmb (unsigned src1, unsigned src2); SHLMB Shifts src2 left/right by one byte, and the most/least significant byte
unsigned _shrmb (unsigned src1, unsigned src2); SHRMB of src1 is merged into the least/most significant byte position.
(3)
See the TMS320C6000 Programmer's Guide for more information.
(4)
See Section 7.5.9 for details on manipulating 8-byte data quantities.
int _shr2 (int src1, unsigned src2); SHR2 For each 16-bit quantity in src2, the quantity is arithmetically or
unsigned _shru2 (unsigned src1, unsigned src2); SHRU2 logically shifted right by src1 number of bits. src2 can contain signed
or unsigned values
long long _smpy2ll (int src1, int src2); SMPY2 Performs 16-bit multiplication between pairs of signed packed 16-bit
values, with an additional 1 bit left-shift and saturate into a 64-bit
result.
int _spack2 (int src1, int src2); SPACK2 Two signed 32-bit values are saturated to 16-bit values and packed
into the return value
unsigned _spacku4 (int src1 , int src2); SPACKU4 Four signed 16-bit values are saturated to 8-bit values and packed
into the return value
int _sshvl (int src2, int src1); SSHVL Shifts src2 to the left/right src1 bits. Saturates the result if the
int _sshvr (int src2, int src1); SSHVR shifted value is greater than MAX_INT or less than MIN_INT.
int _sub4 (int src1, int src2); SUB4 Performs 2s-complement subtraction between pairs of packed 8-bit
values
int _subabs4 (int src1, int src2); SUBABS4 Calculates the absolute value of the differences for each pair of
packed 8-bit values
unsigned _swap4 (unsigned src); SWAP4 Exchanges pairs of bytes (an endian swap) within each 16-bit value
unsigned _swap2 (unsigned src); SWAP2 Calls _packlh2.
unsigned _unpkhu4 (unsigned src); UNPKHU4 Unpacks the two high unsigned 8-bit values into unsigned packed
16-bit values
unsigned _unpklu4 (unsigned src); UNPKLU4 Unpacks the two low unsigned 8-bit values into unsigned packed
16-bit values
unsigned _xpnd2 (unsigned src); XPND2 Bits 1 and 0 of src are replicated to the upper and lower halfwords
of the result, respectively.
unsigned _xpnd4 (unsigned src); XPND4 Bits 3 and 0 of src are replicated to bytes 3 through 0 of the result.
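The packed-data descriptions in Table 7-5 can be checked off-target with a small portable model. The following is a minimal sketch, assuming only the semantics stated in the table for _dotp2, _dotpn2, and _swap4. The function names (dotp2_model, dotpn2_model, swap4_model) and the helper sext16 are hypothetical and are not part of c6x.h. The sums are returned as 64-bit values so that the model never overflows; the _dotp2 intrinsic itself returns a 32-bit int.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v (helper for this sketch only). */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Model of DOTP2: product of the signed lower halves plus
   product of the signed upper halves. */
int64_t dotp2_model(uint32_t src1, uint32_t src2)
{
    int64_t lo = (int64_t)sext16(src1) * sext16(src2);
    int64_t hi = (int64_t)sext16(src1 >> 16) * sext16(src2 >> 16);
    return hi + lo;
}

/* Model of DOTPN2: product of the lower halves is subtracted
   from the product of the upper halves. */
int64_t dotpn2_model(uint32_t src1, uint32_t src2)
{
    int64_t lo = (int64_t)sext16(src1) * sext16(src2);
    int64_t hi = (int64_t)sext16(src1 >> 16) * sext16(src2 >> 16);
    return hi - lo;
}

/* Model of SWAP4: exchange the two bytes within each 16-bit half. */
uint32_t swap4_model(uint32_t src)
{
    return ((src & 0x00FF00FFu) << 8) | ((src >> 8) & 0x00FF00FFu);
}
```

A model like this is useful for unit-testing algorithms on a host before replacing the model calls with the real intrinsics on the target.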
The intrinsics listed in Table 7-6 are included only for C6400+, C6740, and C6600 devices. The intrinsics
shown correspond to the indicated C6000 assembly language instruction(s). See the TMS320C6000 CPU
and Instruction Set Reference Guide for more information.
See Table 7-4 for the listing of generic C6000 intrinsics. See Table 7-5 for the general listing of intrinsics
for C6400 devices, which includes C6400, C6400+, C6740 and C6600. See Table 7-7 for the listing of
C6700-specific intrinsics. See Table 7-8 for a listing of additional intrinsics only for C6600.
Table 7-6. TMS320C6400+, C6740, and C6600 C/C++ Compiler Intrinsics (continued)
C/C++ Compiler Intrinsic    Assembly Instruction    Description
unsigned _gmpy (unsigned src1, unsigned src2); GMPY Performs the Galois Field multiply.
long long _mpy2ir (int src1, int src2); MPY2IR Performs two 16 by 32 multiplies. Both results are shifted right by
15 bits to produce a rounded result.
int _mpy32 (int src1, int src2); MPY32 Returns the 32 LSBs of a 32 by 32 multiply.
long long _mpy32ll (int src1, int src2); MPY32 Returns all 64 bits of a 32 by 32 multiply. Values can be signed or
long long _mpy32su (int src1, int src2); MPY32SU unsigned.
long long _mpy32us (unsigned src1, int src2); MPY32US
long long _mpy32u (unsigned src1, unsigned src2); MPY32U
int _rpack2 (int src1, int src2); RPACK2 Shifts src1 and src2 left by 1 with saturation. The 16 MSBs of the
shifted src1 are placed in the 16 MSBs of the return value. The 16
MSBs of the shifted src2 are placed in the 16 LSBs of the return
value.
long long _saddsub (unsigned src1, unsigned src2); SADDSUB Performs a saturated addition and a saturated subtraction in
parallel.
long long _saddsub2 (unsigned src1, unsigned src2); SADDSUB2 Performs a SADD2 and a SSUB2 in parallel.
long long _shfl3 (unsigned src1, unsigned src2); SHFL3 Takes two 16-bit values from src1 and 16 LSBs from src2 to
perform a 3-way interleave, creating a 48-bit result.
int _smpy32 (int src1, int src2); SMPY32 Returns the 32 MSBs of a 32 by 32 multiply shifted left by 1.
int _ssub2 (unsigned src1, unsigned src2); SSUB2 Subtracts the upper and lower halves of src2 from the upper and
lower halves of src1 and saturates each result.
unsigned _xormpy (unsigned src1, unsigned src2); XORMPY Performs a Galois Field multiply
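As an illustration of the saturating packed operations in Table 7-6, the following is a minimal host-side sketch of the behavior described for _ssub2. It assumes that each 16-bit half is treated as a signed value and saturated to the signed 16-bit range; see the instruction set reference for the authoritative definition. The names ssub2_model, sat16, and sx16 are hypothetical.

```c
#include <stdint.h>

/* Clamp v to the signed 16-bit range. */
static int32_t sat16(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return v;
}

/* Sign-extend the low 16 bits of v. */
static int32_t sx16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Model of SSUB2 (assumed signed saturation): subtract the upper and
   lower halves of src2 from the matching halves of src1. */
uint32_t ssub2_model(uint32_t src1, uint32_t src2)
{
    int32_t hi = sat16(sx16(src1 >> 16) - sx16(src2 >> 16));
    int32_t lo = sat16(sx16(src1) - sx16(src2));
    return ((uint32_t)hi << 16) | ((uint32_t)lo & 0xFFFFu);
}
```

In this sketch, 0x7FFF minus -1 in the upper half and -32768 minus 1 in the lower half both saturate instead of wrapping.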
The intrinsics listed in Table 7-7 can be used for C6700, C6700+, C6740, and C6600 devices. The
intrinsics shown correspond to the indicated C6000 assembly language instruction(s). See the
TMS320C6000 CPU and Instruction Set Reference Guide for more information.
See Table 7-4 for the listing of generic C6000 intrinsics. See Table 7-5 for the listing of C6400-specific
intrinsics, which are also compatible with C6400+, C6740 and C6600. See Table 7-6 for the listing of
C6400+-specific intrinsics, which are also compatible with C6740 and C6600 devices. See Table 7-8 for
the listing of C6600-specific intrinsics.
Table 7-7. TMS320C6700, C6700+, C6740, and C6600 C/C++ Compiler Intrinsics
C/C++ Compiler Intrinsic    Assembly Instruction    Description
int _dpint (double src); DPINT Converts 64-bit double to 32-bit signed integer, using the rounding
mode set by the CSR register
__int40_t _f2tol(__float2_t src); Reinterprets a __float2_t register pair src as an __int40_t (stored as a
register pair). You must include c6x.h.
__float2_t _f2toll(__float2_t src); Reinterprets a __float2_t register pair as a long long register pair. You
must include c6x.h.
double _fabs (double src); ABSDP Returns absolute value of src
float _fabsf (float src); ABSSP
__float2_t _lltof2(long long src); Reinterprets a long long register pair as a __float2_t register pair. You
must include c6x.h.
__float2_t _ltof2(__int40_t src); Reinterprets an __int40_t register pair as a __float2_t register pair. You
must include c6x.h.
__float2_t & _mem8_f2(void * ptr); LDNDW Allows unaligned loads and stores of 8 bytes to memory (1)
STNDW
const __float2_t & _mem8_f2_const(void * ptr); LDNDW Allows unaligned loads of 8 bytes from memory (1)
STNDW
long long _mpyidll (int src1, int src2); MPYID Produces a signed integer multiply. The result is placed in a register
pair.
(1)
See the TMS320C6000 Programmer's Guide for more information.
double _mpysp2dp (float src1, float src2); MPYSP2DP (C6700+, C6740, and C6600 only) Produces a double-precision
floating-point multiply. The result is placed in a register pair.
double _mpyspdp (float src1, double src2); MPYSPDP (C6700+, C6740, and C6600 only) Produces a double-precision
floating-point multiply. The result is placed in a register pair.
double _rcpdp (double src); RCPDP Computes the approximate 64-bit double reciprocal
float _rcpsp (float src); RCPSP Computes the approximate 32-bit float reciprocal
double _rsqrdp (double src); RSQRDP Computes the approximate 64-bit double square root reciprocal
float _rsqrsp (float src); RSQRSP Computes the approximate 32-bit float square root reciprocal
int _spint (float); SPINT Converts 32-bit float to 32-bit signed integer, using the rounding mode
set by the CSR register
The intrinsics listed in Table 7-8 are included only for C6600 devices. These intrinsics are in addition to
those listed in Table 7-5 and Table 7-6. The intrinsics shown correspond to the indicated C6000 assembly
language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference Guide for more
information.
See Table 7-4 for the listing of generic C6000 intrinsics. See Table 7-5 for the listing of C6400-specific
intrinsics, which are also compatible with C6400+, C6740 and C6600. See Table 7-6 for the listing of
C6400+-specific intrinsics, which are also compatible with C6740 and C6600 devices. See Table 7-7 for
the listing of C6700-specific intrinsics.
#include <c6x.h>
#include <stdio.h>
__x128_t mpy_four_way_example(__x128_t s, int a, int b, int c, int d)
{
    __x128_t t = _ito128(a, b, c, d);   // Pack values into a __x128_t
    __x128_t results = _qmpy32(s, t);   // Perform a four-way SIMD multiply
    return results;
}
The _disable_interrupts() and _enable_interrupts( ) intrinsics both return an unsigned int that can be
subsequently passed to _restore_interrupts( ) to restore the previous interrupt state. These intrinsics
provide a barrier to optimization and are therefore appropriate for implementing a critical (or atomic)
section. For example,
unsigned int restore_value;
restore_value = _disable_interrupts();
if (sem) sem--;
_restore_interrupts(restore_value);
The example code disables interrupts so that the value of sem read for the conditional clause does not
change before the modification of sem in the then clause. The intrinsics are barriers to optimization, so the
memory reads and writes of sem do not cross the _disable_interrupts or _restore_interrupts locations.
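The same pattern can be wrapped in a small helper. The following is a minimal sketch, not TI's implementation: on the target it uses the _disable_interrupts() and _restore_interrupts() intrinsics described above, and off-target it falls back to hypothetical stand-ins (host_crit_enter, host_crit_exit, host_csr) so that the logic can be compiled and tested on a host. The macro names CRIT_ENTER and CRIT_EXIT and the function sem_try_take are also assumptions made for illustration.

```c
#ifdef _TMS320C6X
#include <c6x.h>
#define CRIT_ENTER()      _disable_interrupts()
#define CRIT_EXIT(saved)  _restore_interrupts(saved)
#else
/* Host stand-ins (hypothetical): bit 0 models the global interrupt enable. */
static unsigned int host_csr = 1u;
static unsigned int host_crit_enter(void)
{
    unsigned int old = host_csr;
    host_csr &= ~1u;            /* "disable interrupts" */
    return old;                 /* previous state, to be restored later */
}
static void host_crit_exit(unsigned int saved)
{
    host_csr = saved;           /* restore the previous state */
}
#define CRIT_ENTER()      host_crit_enter()
#define CRIT_EXIT(saved)  host_crit_exit(saved)
#endif

static volatile int sem = 2;

/* Decrement sem if it is nonzero. Returns 1 if a unit was taken, else 0. */
int sem_try_take(void)
{
    unsigned int saved = CRIT_ENTER();
    int taken = 0;
    if (sem) {
        sem--;
        taken = 1;
    }
    CRIT_EXIT(saved);
    return taken;
}
```

Restoring the saved value, rather than unconditionally enabling interrupts, keeps the helper safe to call from code that already runs with interrupts disabled.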
Overwrites CSR
NOTE: The _restore_interrupts( ) intrinsic overwrites the CSR control register with the value in the
argument. Any CSR bits changed since the _disable_interrupts( ) intrinsic or
_enable_interrupts( ) intrinsic will be lost.
On C6400+, C6740, and C6600, the _restore_interrupts( ) intrinsic does not use the RINT instruction.
7.5.10 Using MUST_ITERATE and _nassert to Enable SIMD and Expand Compiler Knowledge of Loops
Through the use of MUST_ITERATE and _nassert, you can guarantee that a loop executes a certain
number of times.
This example tells the compiler that the loop is guaranteed to run exactly 10 times:
#pragma MUST_ITERATE(10,10);
for (I = 0; I < trip_count; I++) { ...
MUST_ITERATE can also be used to specify a range for the trip count as well as a factor of the trip count.
For example:
#pragma MUST_ITERATE(8,48,8);
for (I = 0; I < trip; I++) { ...
This example tells the compiler that the loop executes between 8 and 48 times and that the trip variable is
a multiple of 8 (8, 16, 24, 32, 40, 48). The compiler can use this information to generate a better loop,
for example by unrolling more effectively, even when the --interrupt_threshold=n option is used to specify
that interrupts occur every n cycles.
The TMS320C6000 Programmer's Guide states that one of the ways to refine C/C++ code is to use word
accesses to operate on 16-bit data stored in the high and low parts of a 32-bit register. Examples using
casts to int pointers are shown with the use of intrinsics to use certain instructions like _mpyh. This can be
automated by using the _nassert(); intrinsic to specify that 16-bit short arrays are aligned on a 32-bit
(word) boundary.
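As a rough illustration of that approach, the following sketch states the alignment and trip-count facts and leaves the choice of instructions to the compiler. It is an assumption-laden example, not the guide's own listing: the function name dot_product_sketch and the macro ASSUME_WORD_ALIGNED are hypothetical, and the MUST_ITERATE values (8, 48, 8) are simply reused from the earlier example. Off-target, the macro and the pragma are compiled out so that the function can be tested on a host.

```c
#ifdef _TMS320C6X
/* On the target, tell the compiler that p is aligned on a word boundary. */
#define ASSUME_WORD_ALIGNED(p) _nassert(((int)(p) & 0x3) == 0)
#else
/* On a host the assertion has no meaning, so it expands to nothing. */
#define ASSUME_WORD_ALIGNED(p) ((void)0)
#endif

int dot_product_sketch(const short *x, const short *y, int n)
{
    int sum = 0, i;

    ASSUME_WORD_ALIGNED(x);
    ASSUME_WORD_ALIGNED(y);

#ifdef _TMS320C6X
#pragma MUST_ITERATE(8, 48, 8)
#endif
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The loop body stays as plain 16-bit C; the alignment and trip-count facts are what give the compiler room to combine accesses.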
The following examples generate the same assembly code:
• Example 1
int dot_product(short *x, short *y, short z)
{
    int *w_x = (int *)x;
    int *w_y = (int *)y;
    int sum1 = 0, sum2 = 0, i;
    for (i = 0; i < z/2; i++)
    {
        sum1 += _mpy(w_x[i], w_y[i]);
        sum2 += _mpyh(w_x[i], w_y[i]);
    }
    return (sum1 + sum2);
}
• Example 2
int dot_product(short *x, short *y, short z)
{
int sum = 0, I;
The following subsections describe methods you can use to ensure the data referenced by ptr is aligned.
You have to employ one of these methods at every place in your code where f() is called.
When compiling for C6600 devices, such an array is automatically aligned to a 16-byte boundary. When
compiling for C6400, C6400+, C6740, and C6600 devices, such an array is automatically aligned to an
8-byte boundary. When compiling for C6200 or C6700, such an array is automatically aligned to a 4-byte
boundary, or, if the base type requires it, an 8-byte boundary. This is true whether the array is global,
static, or local. This automatic alignment is all that is required to achieve SIMD optimization on those
respective devices. You still need to include the _nassert because, in the general case, the compiler
cannot guarantee that ptr holds the address of a properly aligned array.
If you always pass the base address of an array to pointers like ptr, then you can use the following macro
to reflect that fact.
#if defined(_TMS320C6600)
#define ALIGNED_ARRAY(ptr) _nassert((int)(ptr) % 16 == 0)
#elif defined(_TMS320C6400)
#define ALIGNED_ARRAY(ptr) _nassert((int)(ptr) % 8 == 0)
#elif defined(_TMS320C6200) || defined(_TMS320C6700)
#define ALIGNED_ARRAY(ptr) _nassert((int)(ptr) % 4 == 0)
#else
#define ALIGNED_ARRAY(ptr) /* empty */
#endif
The macro works regardless of which C6000 device you build for, or if you port the code to another target.
This code passes an unaligned address to ptr, thus violating the presumption coded in the _nassert().
There is no direct remedy for this case. Avoid this practice whenever possible.
To get a stricter alignment, use the function memalign with the desired alignment. To get an alignment of
256 bytes for example:
buffer = memalign(256, 100 * sizeof(short));
If you are using BIOS memory allocation routines, be sure to pass the alignment factor as the last
argument using the syntax that follows:
See the TMS320C6000 DSP/BIOS Help for more information about BIOS memory allocation routines and
the segid parameter in particular.
struct s
{
    ...
    short buf1[50];
    ...
} g;
...
f(g.buf1);
class c
{
public:
    short buf1[50];
    void mfunc(void);
    ...
};
void c::mfunc()
{
    f(buf1);
    ...
}
To align an array in a structure, place it inside a union with a dummy object that has the desired
alignment. If you want 8 byte alignment, use a "long long" dummy field. For example:
struct s
{
    union u
    {
        long long dummy;    /* 8-byte alignment */
        short buffer[50];   /* also 8-byte alignment */
    } u;
    ...
};
If you want to declare several arrays contiguously, and maintain a given alignment, you can do so by
keeping the array size, measured in bytes, an even multiple of the desired alignment. For example:
struct s
{
    long long dummy;    /* 8-byte alignment */
    short buf1[50];     /* also 8-byte alignment */
    short buf2[50];     /* 4-byte alignment */
    ...
};
Because the size of buf1 is 50 * 2 bytes per short = 100 bytes, and 100 is a multiple of 4 but not of 8,
buf2 is only aligned on a 4-byte boundary. Padding buf1 out to 52 elements (104 bytes) makes buf2 8-byte aligned.
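The arithmetic above can be confirmed with offsetof. The following is a small sketch under the assumption that short is 2 bytes and long long is 8 bytes, which holds on C6000 and on common hosts. The struct and function names are hypothetical. The offsets are relative to the start of the structure, which is itself 8-byte aligned on C6000 because of the long long member.

```c
#include <stddef.h>

struct s_unpadded {
    long long dummy;    /* offset 0, 8 bytes            */
    short buf1[50];     /* offset 8, 100 bytes          */
    short buf2[50];     /* offset 108: 4-byte aligned   */
};

struct s_padded {
    long long dummy;    /* offset 0, 8 bytes            */
    short buf1[52];     /* offset 8, 104 bytes          */
    short buf2[50];     /* offset 112: 8-byte aligned   */
};

/* Remainder of buf2's offset modulo 8 in each layout. */
size_t buf2_offset_mod8_unpadded(void)
{
    return offsetof(struct s_unpadded, buf2) % 8;
}

size_t buf2_offset_mod8_padded(void)
{
    return offsetof(struct s_padded, buf2) % 8;
}
```

A check of this kind can also be placed in a build-time assertion so that a later change to an array size does not silently break the alignment assumption.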
Within a structure or class, there is no way to enforce an array alignment greater than 8. For the purposes
of SIMD optimization, this is not necessary.
If a C/C++ interrupt routine does not call any other functions, only those registers that the interrupt handler
attempts to define are saved and restored. However, if a C/C++ interrupt routine does call other functions,
these functions can modify unknown registers that the interrupt handler does not use. For this reason, the
routine saves all usable registers if any other functions are called. Interrupts branch to the interrupt return
pointer (IRP). Do not call interrupt handling functions directly.
Interrupts can be handled directly with C/C++ functions by using the interrupt pragma or the interrupt
keyword. For more information, see Section 6.9.18 and Section 6.5.3, respectively.
You are responsible for handling the AMR control register and the SAT bit in the CSR correctly inside an
interrupt. By default, the compiler does not do anything extra to save/restore the AMR and the SAT bit.
Macros for handling the SAT bit and the AMR register are included in the c6x.h header file.
For example, suppose you are using circular addressing in some hand assembly code (that is, the AMR does
not equal 0). This hand assembly code can be interrupted by a C interrupt service routine, which assumes
that the AMR is set to 0. Define a local unsigned int temporary variable and call the SAVE_AMR and
RESTORE_AMR macros at the beginning and end of your C interrupt service routine to correctly save and
restore the AMR.
#include <c6x.h>
/* restore the AMR for your hand assembly code before exiting */
RESTORE_AMR(temp_amr);
}
If you need to save/restore the SAT bit (i.e. you were performing saturated arithmetic when interrupted
into the C interrupt service routine which may also perform some saturated arithmetic) in your C interrupt
service routine, it can be done in a similar way as the above example using the SAVE_SAT and
RESTORE_SAT macros.
For C6400+, C6740, and C6600, the compiler saves and restores the ILC and RILC control registers if
needed.
For floating-point architectures, you are responsible for handling the floating-point control registers
FADCR, FAUCR and FMCR. If you are reading bits out of the floating-point control registers, and if the
interrupt service routine (or any called function) performs floating-point operations, then the relevant
floating-point control registers should be saved and restored. No macros are provided for these registers,
as simple assignment to and from an unsigned int temporary will suffice.
Initializing Variables
NOTE: In ANSI/ISO C, global and static variables that are not explicitly initialized must be set to 0
before program execution. The COFF ABI C/C++ compiler does not perform any
preinitialization of uninitialized variables. Explicitly initialize any variable that must have an
initial value of 0.
Global variables are either autoinitialized at run time or at load time; see Section 7.8.1.1 and
Section 7.8.1.2. Also see Section 6.13. In EABI mode, the compiler automatically zero initializes the
uninitialized variables. See Section 7.8.2 for details.
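[Figure: autoinitialization at run time. The loader places the .cinit section (initialization tables) in memory (EXT_MEM); the boot routine uses the tables to initialize the .bss section (D_MEM).]
[Figure: initialization at load time. The loader uses the .cinit section to initialize the .bss section.]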
Regardless of the use of the --rom_model or --ram_model options, the .pinit section is always loaded and
processed at run time.
The compiler allocates the variables i and a[] to the .data section and places the initial values there directly.
.global i
.data
.align 4
i:
.field 23,32 ; i @ 0
.global a
.data
.align 4
a:
.field 1,32 ; a[0] @ 0
.field 2,32 ; a[1] @ 32
.field 3,32 ; a[2] @ 64
Each compiled module that defines static or global variables contains these .data sections. The linker
treats the .data section like any other initialized section and creates an output section. In the load-time
initialization model, the sections are loaded into memory and used by the program. See Section 7.8.2.5.
In the run-time initialization model, the linker uses the data in these sections to create initialization data
and an additional initialization table. The boot routine processes the initialization table to copy data from
load addresses to run addresses. See Section 7.8.2.3.
[Figure: autoinitialization at run time in EABI mode. The loader places the C auto init table and data (.cinit section) in ROM; the boot routine uses them to initialize the uninitialized .data section in RAM.]
__TI_CINIT_Base:
32-bit load address | 32-bit run address
The linker defined symbols __TI_CINIT_Base and __TI_CINIT_Limit point to the start and end of the
table, respectively. Each entry in this table corresponds to one output section that needs to be initialized.
The initialization data for each output section can be encoded using a different encoding.
The load address in the C auto initialization record points to initialization data with the following format:
The first 8 bits of the initialization data are the handler index, which indexes into a handler table to get the
address of a handler function that knows how to decode the data that follows.
The handler table is a list of 32-bit function pointers.
__TI_Handler_Table_Base:
32-bit handler 1 address
The encoded data that follows the 8-bit index can be in one of the following format types. For clarity the 8-
bit index is also depicted for each format.
8-bit index | 24-bit padding | 32-bit length (N) | N-byte initialization data (not compressed)
The compiler uses 24-bit padding to align the length field to a 32-bit boundary. The 32-bit length field
encodes the length of the initialization data in bytes (N). N byte initialization data is not compressed and is
copied to the run address as is.
The run-time support library has a function __TI_decompress_none() to process this type of initialization data. The
first argument to this function is the address pointing to the byte after the 8-bit index. The second
argument is the run address from the C auto initialization record.
The compiler uses 24-bit padding to align the length field to a 32-bit boundary. The 32-bit length field
encodes the number of bytes to be zero initialized.
The run-time support library has a function __TI_zero_init() to process the zero initialization. The first
argument to this function is the address pointing to the byte after the 8-bit index. The second argument is
the run address from the C auto initialization record.
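To make the two uncompressed layouts concrete, the following is a minimal sketch of handlers for them. It is not the run-time support library's source. The function names (copy_uncompressed_sketch, zero_fill_sketch, read_u32_le) are hypothetical, and the 32-bit length field is assumed to be stored little-endian; on a big-endian target the byte order would differ. As described above, each handler receives the address of the byte after the 8-bit index, so it first skips the 24-bit padding.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value (byte order is an assumption). */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Uncompressed format: 24-bit padding, 32-bit length N, then N data bytes. */
void copy_uncompressed_sketch(const unsigned char *after_index,
                              unsigned char *run_addr)
{
    const unsigned char *len_field = after_index + 3;  /* skip padding */
    uint32_t n = read_u32_le(len_field);
    memcpy(run_addr, len_field + 4, n);                /* copy data as is */
}

/* Zero-initialization format: 24-bit padding, then a 32-bit byte count. */
void zero_fill_sketch(const unsigned char *after_index,
                      unsigned char *run_addr)
{
    uint32_t n = read_u32_le(after_index + 3);         /* skip padding */
    memset(run_addr, 0, n);
}
```

Both functions share the two-argument shape of the handlers described above, so either could be reached through a table of function pointers indexed by the 8-bit handler index.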
The data following the 8-bit index is compressed using Run Length Encoded (RLE) format. This format uses a
simple run length encoding that can be decompressed using the following algorithm:
1. Read the first byte, Delimiter (D).
2. Read the next byte (B).
3. If B != D, copy B to the output buffer and go to step 2.
4. Read the next byte (L).
   (a) If L == 0, then length is either a 16-bit value, a 24-bit value, or we've reached the end of the
       data; read the next byte (L).
       (i) If L == 0, length is a 24-bit value or the end of the data is reached; read the next byte (L).
           - If L == 0, the end of the data is reached; go to step 7.
           - Else L <<= 16; read the next two bytes into the lower 16 bits of L to complete the 24-bit value for L.
       (ii) Else L <<= 8; read the next byte into the lower 8 bits of L to complete the 16-bit value for L.
   (b) Else if L > 0 and L < 4, copy D to the output buffer L times. Go to step 2.
   (c) Else, length is an 8-bit value (L).
5. Read the next byte (C); C is the repeat character.
6. Write C to the output buffer L times; go to step 2.
7. End of processing.
The run-time support library has a routine __TI_decompress_rle24() to decompress data compressed
using RLE. The first argument to this function is the address pointing to the byte after the 8-bit index. The
second argument is the run address from the C auto initialization record.
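The decoding steps listed above can be sketched in portable C as follows. This is an illustrative decoder written from the listed steps, not the source of __TI_decompress_rle24(). The name rle_decode_sketch, the explicit input and output lengths, and the RLE_ERR return value are assumptions added so that the sketch can reject truncated input. It also assumes that the two extra bytes of a 24-bit length are stored most-significant byte first, which the steps do not specify.

```c
#include <stddef.h>
#include <stdint.h>

#define RLE_ERR ((size_t)-1)

/* Append count copies of value to out; returns -1 if out is too small. */
static int emit_run(uint8_t *out, size_t cap, size_t *op,
                    uint8_t value, size_t count)
{
    if (count > cap - *op) return -1;
    while (count--) out[(*op)++] = value;
    return 0;
}

/* Returns the number of bytes written, or RLE_ERR on bad input. */
size_t rle_decode_sketch(const uint8_t *in, size_t in_len,
                         uint8_t *out, size_t cap)
{
    size_t ip = 0, op = 0;
    uint8_t delim;

    if (in_len == 0) return RLE_ERR;
    delim = in[ip++];                                   /* step 1 */

    for (;;) {
        uint32_t len;
        uint8_t b;

        if (ip >= in_len) return RLE_ERR;
        b = in[ip++];                                   /* step 2 */
        if (b != delim) {                               /* step 3 */
            if (emit_run(out, cap, &op, b, 1)) return RLE_ERR;
            continue;
        }
        if (ip >= in_len) return RLE_ERR;
        len = in[ip++];                                 /* step 4 */
        if (len == 0) {                                 /* step 4(a) */
            if (ip >= in_len) return RLE_ERR;
            len = in[ip++];
            if (len == 0) {                             /* 24-bit or end */
                if (ip >= in_len) return RLE_ERR;
                len = in[ip++];
                if (len == 0) return op;                /* step 7 */
                if (in_len - ip < 2) return RLE_ERR;
                len = (len << 16) | ((uint32_t)in[ip] << 8) | in[ip + 1];
                ip += 2;
            } else {                                    /* 16-bit length */
                if (ip >= in_len) return RLE_ERR;
                len = (len << 8) | in[ip++];
            }
        } else if (len < 4) {                           /* step 4(b) */
            if (emit_run(out, cap, &op, delim, len)) return RLE_ERR;
            continue;
        }
        if (ip >= in_len) return RLE_ERR;               /* steps 5 and 6 */
        if (emit_run(out, cap, &op, in[ip++], len)) return RLE_ERR;
    }
}
```

Short run lengths (1 to 3) are reserved for emitting the delimiter itself, which is why a literal delimiter byte in the data never needs a separate escape mechanism.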
The data following the 8-bit index is compressed using LZSS compression. The run-time support library
has the routine __TI_decompress_lzss() to decompress the data compressed using LZSS. The first
argument to this function is the address pointing to the byte after the 8-bit index. The second argument is
the run address from the C auto initialization record.
void auto_initialize()
{
    unsigned char **table_ptr;
    unsigned char **table_limit;
    /*--------------------------------------------------------------*/
    /* Check if Handler table has entries.                          */
    /*--------------------------------------------------------------*/
    if (&__TI_Handler_Table_Base >= &__TI_Handler_Table_Limit)
        return;
    /*---------------------------------------------------------------*/
    /* Get the Start and End of the CINIT Table.                     */
    /*---------------------------------------------------------------*/
    table_ptr = (unsigned char **)&__TI_CINIT_Base;
    table_limit = (unsigned char **)&__TI_CINIT_Limit;
    while (table_ptr < table_limit)
    {
        /*-------------------------------------------------------------*/
        /* 1. Get the Load and Run address.                            */
        /* 2. Read the 8-bit index from the load address.              */
        /* 3. Get the handler function pointer using the index from    */
        /*    handler table.                                           */
        /*-------------------------------------------------------------*/
        unsigned char *load_addr = *table_ptr++;
        unsigned char *run_addr = *table_ptr++;
        unsigned char handler_idx = *load_addr++;
        handler_fptr handler =
            (handler_fptr)(&HANDLER_TABLE)[handler_idx];
        /*-------------------------------------------------------------*/
        /* 4. Call the handler and pass the pointer to the load data   */
        /*    after index and the run address.                         */
        /*-------------------------------------------------------------*/
        (*handler)((const unsigned char *)load_addr, run_addr);
    }
}
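[Figure: initialization at load time in EABI mode. The loader places the initialized .data section directly in RAM.]
[Figure: constructor table. Addresses of constructor 1 through constructor n, ending at __TI_INITARRAY_Limit.]
[Figure: format of initialization records in the .cinit section, through initialization record n.]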
int x;
short i = 23;
int *p = &x;
int a[5] = {1,2,3,4,5};
.global _x
.bss _x,4,4
.sect ".cinit:c"
.align 8
.field (CIR - $) - 8, 32
.field _i+0,32
.field 23,16 ; _i @ 0
.sect ".text"
.global _i
_i: .usect ".bss:c",2,2
.sect ".cinit:c"
.align 4
.field _x,32 ; _p @ 0
.sect ".text"
.global _p
_p: .usect ".bss:c",4,4
.sect ".cinit"
.align 8
.field IR_1,32
.field _a+0,32
.field 1,32 ; _a[0] @ 0
.field 2,32 ; _a[1] @ 32
.field 3,32 ; _a[2] @ 64
.field 4,32 ; _a[3] @ 96
.field 5,32 ; _a[4] @ 128
IR_1: .set 20
.sect ".text"
.global _a
.bss _a,20,4
;**********************************************************************
;* MARK THE END OF THE SCALAR INIT RECORD IN CINIT:C *
;**********************************************************************
The .cinit section must contain only initialization tables in this format. When interfacing assembly language
modules, do not use the .cinit section for any other purpose.
The table in the .pinit section simply consists of a list of addresses of constructors to be called (see
Figure 7-17). The constructors appear in the table after the .cinit initialization.
Address of constructor 1
Address of constructor 2
Address of constructor 3
•
•
Address of constructor n
When you use the --rom_model or --ram_model option, the linker combines the .cinit sections from all the
C/C++ modules and appends a null word to the end of the composite .cinit section. This terminating record
appears as a record with a size field of 0 and marks the end of the initialization tables.
Likewise, the --rom_model or --ram_model link option causes the linker to combine all of the .pinit sections
from all C/C++ modules and append a null word to the end of the composite .pinit section. The boot
routine knows the end of the global constructor table when it encounters a null constructor address.
The const-qualified variables are initialized differently; see Section 6.5.1.
Some of the features of C/C++ (such as I/O, dynamic memory allocation, string operations, and
trigonometric functions) are provided as an ANSI/ISO C/C++ standard library, rather than as part of the
compiler itself. The TI implementation of this library is the run-time-support library (RTS). The C/C++
compiler implements the complete ISO standard library except for those facilities that handle signal and
locale issues (properties that depend on local language, nationality, or culture). Using the ANSI/ISO
standard library ensures a consistent set of functions that provide for greater portability.
In addition to the ANSI/ISO-specified functions, the run-time-support library includes routines that give you
processor-specific commands and direct C language I/O requests. These are detailed in Section 8.1 and
Section 8.2.
A library-build utility is provided with the code generation tools that lets you create customized
run-time-support libraries. This process is described in Section 8.5.
SPRU187U – July 2012 Using Run-Time-Support Functions and Building Libraries 247
Submit Documentation Feedback
Copyright © 2012, Texas Instruments Incorporated
C and C++ Run-Time Support Libraries www.ti.com
The C I/O Functions www.ti.com
rtstrg[endian][abi][eh].lib
trg     The device family of the C6000 architecture that the library was built for. This can be one
        of the following: 6200, 6400, 64plus, 6600, 6700, 6740, 67plus.
endian  Indicates endianness:
          (blank)  Little-endian library
          e        Big-endian library
abi     Indicates the application binary interface (ABI) used:
          (blank)  COFF ABI
          _elf     EABI
eh      Indicates whether the library has exception handling support:
          (blank)  Exception handling not supported
          _eh      Exception handling supported
For information on the C6700 FastMath source library, fastmathc67x.src, see Section 8.4.
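The naming convention above can be expressed as a small helper that assembles a library file name from its attributes. This is only a sketch of the pattern; the function name rts_library_name and its parameters are hypothetical, and composing a name does not imply that a library with that combination of attributes is shipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Compose a library file name of the form rts<trg>[endian][abi][eh].lib.
 * Returns 0 on success, or -1 if the name does not fit in out.
 */
int rts_library_name(char *out, size_t out_size, const char *trg,
                     int big_endian, int eabi, int exceptions)
{
    int n;

    assert(out != NULL && trg != NULL && strlen(trg) > 0);
    n = snprintf(out, out_size, "rts%s%s%s%s.lib",
                 trg,
                 big_endian ? "e"    : "",   /* blank means little-endian */
                 eabi       ? "_elf" : "",   /* blank means COFF ABI      */
                 exceptions ? "_eh"  : "");  /* blank means no EH support */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For example, a trg of 6200 with no optional attributes yields rts6200.lib, the name used in the mklib examples later in this chapter.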
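#include <stdio.h>
int main(void)
{
    FILE *fid;
    fid = fopen("myfile","w");      /* create or truncate myfile for writing */
    if (fid == NULL)
        return 1;                   /* the file could not be opened */
    fprintf(fid,"Hello, world\n");
    fclose(fid);
    return 0;
}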
Issuing the following compiler command compiles, links, and creates the file main.out from the run-time-
support library:
cl6x main.c -z --heap_size=1000 --output_file=main.out
open — Open File for I/O www.ti.com
Description The open function opens the file specified by path and prepares it for I/O.
• The path is the filename of the file to be opened, including an optional directory path
and an optional device specifier (see Section 8.2.5).
• The flags are attributes that specify how the file is manipulated. The flags are
specified using the following symbols:
O_RDONLY (0x0000) /* open for reading */
O_WRONLY (0x0001) /* open for writing */
O_RDWR (0x0002) /* open for read & write */
O_APPEND (0x0008) /* append on each write */
O_CREAT (0x0200) /* open with file create */
O_TRUNC (0x0400) /* open with truncation */
O_BINARY (0x8000) /* open in binary mode */
Low-level I/O routines allow or disallow some operations depending on the flags used
when the file was opened. Some flags may not be meaningful for some devices,
depending on how the device implements files.
• The file_descriptor is assigned by open to an opened file.
The next available file descriptor is assigned to each new file opened.
www.ti.com close — Close File for I/O
Description The close function closes the file associated with file_descriptor.
The file_descriptor is the number assigned by open to an opened file.
Description The read function reads count characters into the buffer from the file associated with
file_descriptor.
• The file_descriptor is the number assigned by open to an opened file.
• The buffer is where the read characters are placed.
• The count is the number of characters to read from the file.
Description The write function writes the number of characters specified by count from the buffer to
the file associated with file_descriptor.
• The file_descriptor is the number assigned by open to an opened file.
• The buffer is where the characters to be written are located.
• The count is the number of characters to write to the file.
lseek — Set File Position Indicator www.ti.com
Description The lseek function sets the file position indicator for the given file to a location relative to
the specified origin. The file position indicator measures the position in characters from
the beginning of the file.
• The file_descriptor is the number assigned by open to an opened file.
• The offset indicates the relative offset from the origin in characters.
• The origin is used to indicate which of the base locations the offset is measured from.
The origin must be one of the following macros:
SEEK_SET (0x0000) Beginning of file
SEEK_CUR (0x0001) Current value of the file position indicator
SEEK_END (0x0002) End of file
Return Value The return value is one of the following:
# new value of the file position indicator if successful
(off_t)-1 on failure
Description The unlink function deletes the file specified by path. Depending on the device, a deleted
file may still remain until all file descriptors which have been opened for that file have
been closed. See Section 8.2.3.
The path is the filename of the file, including path information and optional device prefix.
(See Section 8.2.5.)
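The low-level functions described above are typically used together. The following is a minimal sketch that writes a message, seeks back to the start, reads it again, and then closes and deletes the file. It assumes a POSIX host, where the declarations come from fcntl.h and unistd.h; the examples in this chapter include file.h on the target instead. The function name lowlevel_round_trip is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg to path, seek back to the start, read it into out, then close
 * and delete the file. Returns the number of characters read, or -1 on error.
 */
int lowlevel_round_trip(const char *path, const char *msg,
                        char *out, unsigned out_size)
{
    int fd;
    int n = -1;
    unsigned len;

    assert(path != NULL && msg != NULL && out != NULL);
    len = (unsigned)strlen(msg);
    if (len > out_size)
        return -1;

    /* Create the file if needed, discard old contents, allow read and write. */
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The write moves the position indicator, so rewind before reading. */
    if (write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) != (off_t)-1)
        n = (int)read(fd, out, len);

    close(fd);
    unlink(path);
    return n;
}
```

Without the lseek call, the read would start at the end of the data just written and return 0.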
www.ti.com rename — Rename File
NOTE: The optional device specified in the new name must match the device of
the old name. If they do not match, a file copy would be required to
perform the rename, and rename is not capable of this action.
DEV_open — Open File for I/O www.ti.com
Syntax int DEV_open(const char *path, unsigned flags, int llv_fd);
Description This function finds a file matching path and opens it for I/O as requested by flags.
• The path is the filename of the file to be opened. If the name of a file passed to open
has a device prefix, the device prefix will be stripped by open, so DEV_open will not
see it. (See Section 8.2.5 for details on the device prefix.)
• The flags are attributes that specify how the file is manipulated. The flags are
specified using the following symbols:
O_RDONLY (0x0000) /* open for reading */
O_WRONLY (0x0001) /* open for writing */
O_RDWR (0x0002) /* open for read & write */
O_APPEND (0x0008) /* append on each write */
O_CREAT (0x0200) /* open with file create */
O_TRUNC (0x0400) /* open with truncation */
O_BINARY (0x8000) /* open in binary mode */
See POSIX for further explanation of the flags.
• The llv_fd is treated as a suggested low-level file descriptor. This is a historical
artifact; newly-defined device drivers should ignore this argument. This differs from
the low-level I/O open function.
This function must arrange for information to be saved for each file descriptor, typically
including a file position indicator and any significant flags. For the HOST version, all the
bookkeeping is handled by the debugger running on the host machine. If the device uses
an internal buffer, the buffer can be created when a file is opened, or the buffer can be
created during a read or write.
Return Value This function must return -1 to indicate an error if for some reason the file could not be
opened, such as when the file does not exist, could not be created, or there are too many files
open. The value of errno may optionally be set to indicate the exact error (the HOST
device does not set errno). Some devices might have special failure conditions; for
instance, if a device is read-only, a file cannot be opened O_WRONLY.
On success, this function must return a non-negative file descriptor unique among all
open files handled by the specific device. It need not be unique across devices. Only the
low-level I/O functions will see this device file descriptor; the low-level function open will
assign its own unique file descriptor.
www.ti.com DEV_close — Close File for I/O
Return Value This function should return -1 to indicate an error if the file descriptor is invalid in some
way, such as being out of range or already closed, but this is not required. The user
should not call close() with an invalid file descriptor.
Description The read function reads count bytes from the input file associated with dev_fd.
• The dev_fd is the number assigned by open to an opened file.
• The buf is where the read characters are placed.
• The count is the number of characters to read from the file.
Return Value This function must return -1 to indicate an error if for some reason no bytes could be
read from the file. This could be because of an attempt to read from an O_WRONLY file,
or for device-specific reasons.
If count is 0, no bytes are read and this function returns 0.
This function returns the number of bytes read, from 0 to count. 0 indicates that EOF
was reached before any bytes were read. It is not an error to read fewer than count bytes;
this is common if there are not enough bytes left in the file or the request was larger than
an internal device buffer size.
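Because a read may legitimately return fewer bytes than requested, a caller that needs a full buffer usually loops until the count is satisfied, EOF is reached, or an error occurs. The following is a minimal sketch of such a loop. It uses the POSIX read function on a host as a stand-in for the low-level read function, and the helper name read_fully is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Keep calling read() until count bytes have arrived, EOF is reached, or an
 * error occurs. Returns the total number of bytes read, or -1 on error.
 */
int read_fully(int fd, char *buf, unsigned count)
{
    unsigned total = 0;

    assert(buf != NULL || count == 0);
    while (total < count) {
        int n = (int)read(fd, buf + total, count - total);
        if (n < 0)
            return -1;          /* error reported by the device */
        if (n == 0)
            break;              /* EOF before count bytes were read */
        total += (unsigned)n;   /* short read: ask for the remainder */
    }
    return (int)total;
}
```

A return value smaller than count therefore means EOF was reached, not that the device returned a short read.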
Syntax int DEV_write(int dev_fd, const char *buf, unsigned count);
Return Value This function must return -1 to indicate an error if for some reason no bytes could be
written to the file. This could be because of an attempt to write to an O_RDONLY file,
or for device-specific reasons.
DEV_lseek — Set File Position Indicator www.ti.com
Description This function sets the file position indicator for this file descriptor in the same way as lseek.
If lseek is supported, it should not allow a seek to before the beginning of the file, but it
should support seeking past the end of the file. Such a seek does not change the size of
the file, but if it is followed by a write, the file size will increase.
Return Value If successful, this function returns the new value of the file position indicator.
This function must return -1 to indicate an error if for some reason the file position indicator
could not be set. For many devices, the lseek operation is nonsensical (e.g., a computer
monitor).
Description This function removes the association of the pathname with the file. This means that the file may no
longer be opened using this name, but the file may not actually be immediately removed.
Depending on the device, the file may be immediately removed, but for a device which
allows open file descriptors to point to unlinked files, the file will not actually be deleted
until the last file descriptor is closed. See Section 8.2.3.
Return Value This function must return -1 to indicate an error if for some reason the file could not be
unlinked (delayed removal does not count as a failure to unlink).
If successful, this function returns 0.
Description This function changes the name associated with the file.
• The old_name is the current name of the file.
• The new_name is the new name for the file.
Return Value This function must return -1 to indicate an error if for some reason the file could not be
renamed, such as when the file does not exist or the new name already exists.
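The return conventions described for the device-level functions can be illustrated with a small RAM-backed device that holds a single file. This is a sketch under stated assumptions, not a driver from the run-time-support library: the MEM_ names, the MEM_CAPACITY limit, and the single-file design are hypothetical, the flags and path are ignored, and the unlink and rename functions are omitted for brevity.

```c
#include <assert.h>
#include <stdio.h>       /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>
#include <sys/types.h>   /* off_t */

#define MEM_CAPACITY 256            /* hypothetical size limit of the file */

static char  mem_data[MEM_CAPACITY];
static off_t mem_size = 0;          /* bytes currently in the file */
static off_t mem_pos  = 0;          /* file position indicator */
static int   mem_is_open = 0;

int MEM_open(const char *path, unsigned flags, int llv_fd)
{
    (void)path; (void)flags; (void)llv_fd;   /* one anonymous file only */
    if (mem_is_open)
        return -1;                  /* too many files open */
    mem_is_open = 1;
    mem_pos = 0;
    return 0;                       /* device-level file descriptor */
}

int MEM_close(int dev_fd)
{
    if (dev_fd != 0 || !mem_is_open)
        return -1;                  /* invalid or already closed */
    mem_is_open = 0;
    return 0;
}

int MEM_read(int dev_fd, char *buf, unsigned count)
{
    off_t left;

    assert(buf != NULL);
    if (dev_fd != 0 || !mem_is_open)
        return -1;
    left = (mem_pos < mem_size) ? mem_size - mem_pos : 0;
    if ((off_t)count > left)
        count = (unsigned)left;     /* short read; 0 means EOF */
    memcpy(buf, mem_data + mem_pos, count);
    mem_pos += count;
    return (int)count;
}

int MEM_write(int dev_fd, const char *buf, unsigned count)
{
    off_t room;

    assert(buf != NULL);
    if (dev_fd != 0 || !mem_is_open)
        return -1;
    if (count == 0)
        return 0;
    room = MEM_CAPACITY - mem_pos;
    if (room <= 0)
        return -1;                  /* no bytes could be written */
    if ((off_t)count > room)
        count = (unsigned)room;
    if (mem_pos > mem_size)         /* zero-fill the gap left by a seek past the end */
        memset(mem_data + mem_size, 0, (size_t)(mem_pos - mem_size));
    memcpy(mem_data + mem_pos, buf, count);
    mem_pos += count;
    if (mem_pos > mem_size)
        mem_size = mem_pos;         /* the write extended the file */
    return (int)count;
}

off_t MEM_lseek(int dev_fd, off_t offset, int origin)
{
    off_t base;

    if (dev_fd != 0 || !mem_is_open)
        return (off_t)-1;
    switch (origin) {
    case SEEK_SET: base = 0;        break;
    case SEEK_CUR: base = mem_pos;  break;
    case SEEK_END: base = mem_size; break;
    default:       return (off_t)-1;
    }
    if (base + offset < 0)
        return (off_t)-1;           /* before the beginning of the file */
    mem_pos = base + offset;        /* past the end is allowed */
    return mem_pos;
}
```

Because the device holds only one file, it rejects a second open and would be registered with the _SSA flag. Seeking past the end does not change the file size until a write follows.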
www.ti.com DEV_rename — Rename File
#include <stdio.h>
#include <stdlib.h>   /* exit, EXIT_FAILURE */
#include <file.h>
#include "mydevice.h"
void main()
{
add_device("mydevice", _MSA,
MYDEVICE_open, MYDEVICE_close,
MYDEVICE_read, MYDEVICE_write,
MYDEVICE_lseek, MYDEVICE_unlink, MYDEVICE_rename);
/*-----------------------------------------------------------------------*/
/* Re-open stderr as a MYDEVICE file */
/*-----------------------------------------------------------------------*/
if (!freopen("mydevice:stderrfile", "w", stderr))
{
puts("Failed to freopen stderr");
exit(EXIT_FAILURE);
}
/*-----------------------------------------------------------------------*/
/* stderr should not be fully buffered; we want errors to be seen as */
/* soon as possible. Normally stderr is line-buffered, but this example */
/* doesn't buffer stderr at all. This means that there will be one call */
/* to write() for each character in the message. */
/*-----------------------------------------------------------------------*/
if (setvbuf(stderr, NULL, _IONBF, 0))
{
puts("Failed to setvbuf stderr");
exit(EXIT_FAILURE);
}
/*-----------------------------------------------------------------------*/
/* Try it out! */
/*-----------------------------------------------------------------------*/
printf("This goes to stdout\n");
fprintf(stderr, "This goes to stderr\n");
}
add_device — Add Device to Device Table www.ti.com
Use the low-level function add_device() to add your device to the device_table. The device table is a
statically defined array that supports n devices, where n is defined by the macro _NDEVICE found in
stdio.h/cstdio.
The first entry in the device table is predefined to be the host device on which the debugger is running.
The low-level routine add_device() finds the first empty position in the device table and initializes the
device fields with the passed-in arguments. For a complete description, see the add_device function.
If no device prefix is used, the HOST device will be used to open the file.
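The devicename:filename convention can be illustrated with a helper that separates the optional device prefix from the filename. This is a sketch only: how the run-time-support library parses the prefix internally is not described here, so the rule used below (everything before the first colon, up to the 8-character device name limit) is an assumption, and the name split_device_prefix is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEVICE_NAME_MAX 8   /* add_device limits device names to 8 characters */

/*
 * Split "devicename:filename" into its two parts. The device name is copied
 * into dev, which must hold DEVICE_NAME_MAX + 1 characters, and a pointer to
 * the filename part is returned. With no usable prefix, dev is set to the
 * empty string (meaning the HOST device) and path itself is returned.
 */
const char *split_device_prefix(const char *path, char *dev)
{
    const char *colon;
    size_t len;

    assert(path != NULL && dev != NULL);
    dev[0] = '\0';
    colon = strchr(path, ':');
    if (colon == NULL)
        return path;                /* no prefix: HOST device */
    len = (size_t)(colon - path);
    if (len == 0 || len > DEVICE_NAME_MAX)
        return path;                /* not treated as a device prefix here */
    memcpy(dev, path, len);
    dev[len] = '\0';
    return colon + 1;
}
```

With this rule, mydevice:test splits into the device mydevice and the filename test, while myfile has no prefix and falls through to the HOST device.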
Description The add_device function adds a device record to the device table allowing that device to
be used for I/O from C. The first entry in the device table is predefined to be the HOST
device on which the debugger is running. The function add_device() finds the first empty
position in the device table and initializes the fields of the structure that represent a
device.
To open a stream on a newly added device use fopen( ) with a string of the format
devicename : filename as the first argument.
• The name is a character string denoting the device name. The name is limited to 8
characters.
• The flags are device characteristics. The flags are as follows:
_SSA Denotes that the device supports only one open stream at a time
_MSA Denotes that the device supports multiple open streams
More flags can be added by defining them in file.h.
• The dopen, dclose, dread, dwrite, dlseek, dunlink, and drename specifiers are
function pointers to the functions in the device driver that are called by the low-level
functions to perform I/O on the specified device. You must declare these functions
with the interface specified in Section 8.2.2. The device drivers for the HOST that the
TMS320C6000 debugger is run on are included in the C I/O library.
#include <file.h>
#include <stdio.h>
/****************************************************************************/
/* Declarations of the user-defined device drivers */
/****************************************************************************/
extern int MYDEVICE_open(const char *path, unsigned flags, int fno);
extern int MYDEVICE_close(int fno);
extern int MYDEVICE_read(int fno, char *buffer, unsigned count);
extern int MYDEVICE_write(int fno, const char *buffer, unsigned count);
extern off_t MYDEVICE_lseek(int fno, off_t offset, int origin);
extern int MYDEVICE_unlink(const char *path);
extern int MYDEVICE_rename(const char *old_name, const char *new_name);
int main(void)
{
    FILE *fid;
    /* Register the driver, then open a file on it by using the device prefix */
    add_device("mydevice", _MSA, MYDEVICE_open, MYDEVICE_close, MYDEVICE_read,
               MYDEVICE_write, MYDEVICE_lseek, MYDEVICE_unlink, MYDEVICE_rename);
    fid = fopen("mydevice:test","w");
    fprintf(fid,"Hello, world\n");
    fclose(fid);
    return 0;
}
Handling Reentrancy (_register_lock() and _register_unlock() Functions) www.ti.com
The arguments to _register_lock() and _register_unlock() should be functions which take no arguments
and return no values, and which implement some sort of global semaphore locking:
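/* SHARED_SEMAPHORE_LOCATION, MY_UNIQUE_ID, ATOMIC_TEST_AND_SET, and      */
/* ATOMIC_CLEAR are not defined here; supply them for your system.        */
volatile sig_atomic_t *sema = SHARED_SEMAPHORE_LOCATION;
static int sema_depth = 0;      /* nesting level of my_lock() calls */
static void my_lock(void)
{
    /* Spin until the semaphore holds this context's ID */
    while (ATOMIC_TEST_AND_SET(sema, MY_UNIQUE_ID) != MY_UNIQUE_ID);
    sema_depth++;
}
static void my_unlock(void)
{
    /* Release the semaphore only when the outermost lock is undone */
    if (!--sema_depth) ATOMIC_CLEAR(sema);
}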
The run-time-support nests calls to _lock(), so the primitives must keep track of the nesting level.
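The nesting requirement can be made concrete with a host-side sketch that replaces the placeholder atomic macros with C11 atomics. This is an illustration only: it assumes a compiler that provides stdatomic.h, which may not be the case for the target compiler described in this guide, and NO_OWNER and the value of MY_UNIQUE_ID are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER     0
#define MY_UNIQUE_ID 1      /* hypothetical ID of this execution context */

static atomic_int sema = NO_OWNER;  /* stands in for the shared semaphore */
static int sema_depth = 0;          /* nesting level, touched only by the owner */

static void my_lock(void)
{
    /* A nested call finds that this context already owns the semaphore. */
    if (atomic_load(&sema) != MY_UNIQUE_ID) {
        int expected = NO_OWNER;
        /* Spin until the semaphore is free and has been claimed atomically. */
        while (!atomic_compare_exchange_weak(&sema, &expected, MY_UNIQUE_ID))
            expected = NO_OWNER;
    }
    sema_depth++;
}

static void my_unlock(void)
{
    assert(sema_depth > 0);
    /* Only the outermost unlock releases the semaphore. */
    if (--sema_depth == 0)
        atomic_store(&sema, NO_OWNER);
}

/* On the target, these would be passed to _register_lock() and      */
/* _register_unlock(); those calls are omitted so the sketch builds  */
/* on a host.                                                        */
```

Two calls to my_lock() followed by one call to my_unlock() leave the semaphore held; it is released only by the second my_unlock().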
www.ti.com C6700 FastMath Library
If you are using Code Composer Studio, include the C6700 FastMath library in your project, and ensure it
appears before the standard run-time-support library in the Link Order tab in the Build Options dialog box.
For details, refer to the TMS320C67x FastRTS Library Programmer's Reference (SPRU100).
Library-Build Process www.ti.com
All three of these programs are provided as a non-optional feature of CCS 5.1. They are also available as
part of the optional XDC Tools feature if you are using an earlier version of CCS.
The mklib program looks for these executables in the following order:
1. in your PATH
2. in the directory getenv("CCS_UTILS_DIR")/cygwin
3. in the directory getenv("CCS_UTILS_DIR")/bin
4. in the directory getenv("XDCROOT")
5. in the directory getenv("XDCROOT")/bin
If you are invoking mklib from the command line, and these executables are not in your path, you must set
the environment variable CCS_UTILS_DIR such that getenv("CCS_UTILS_DIR")/bin contains the correct
programs.
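The search order after PATH can be sketched as a function that lists the candidate directories in the order given above. This is not mklib's own code; the function name mklib_fallback_dirs and the MAX_DIRS and DIR_LEN limits are hypothetical. The two roots are passed in as arguments so that the caller decides where they come from, typically getenv("CCS_UTILS_DIR") and getenv("XDCROOT").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIRS 4
#define DIR_LEN  256

/*
 * Fill dirs with the directories searched after PATH, in order. Either root
 * may be NULL when the corresponding environment variable is not set.
 * Returns the number of entries written.
 */
int mklib_fallback_dirs(const char *ccs_utils_dir, const char *xdcroot,
                        char dirs[MAX_DIRS][DIR_LEN])
{
    int n = 0;

    assert(dirs != NULL);
    if (ccs_utils_dir != NULL) {
        snprintf(dirs[n++], DIR_LEN, "%s/cygwin", ccs_utils_dir);
        snprintf(dirs[n++], DIR_LEN, "%s/bin", ccs_utils_dir);
    }
    if (xdcroot != NULL) {
        snprintf(dirs[n++], DIR_LEN, "%s", xdcroot);
        snprintf(dirs[n++], DIR_LEN, "%s/bin", xdcroot);
    }
    return n;
}
```

When neither variable is set, the list is empty and the executables must be found through PATH.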
The index library describes a set of libraries with different build attributes. The linker will compare the build
attributes for each potential library with the build attributes of the application and will pick the best fit. For
details on the index library, see the archiver chapter in the TMS320C6000 Assembly Language Tools
User's Guide.
Now that the linker has decided which library to use, it checks whether the run-time-support library is
present in C6X_C_DIR. The library must be in exactly the same directory as the index library libc.a. If the
library is not present, the linker will invoke mklib to build it. This happens when the library is missing,
regardless of whether the user specified the name of the library directly or allowed the linker to pick the
best library from the index library.
The mklib program builds the requested library and places it in the 'lib' directory part of C6X_C_DIR, in the
same directory as the index library, so it is available for subsequent compilations.
Things to watch out for:
• The linker invokes mklib and waits for it to finish before finishing the link, so you will experience a one-
time delay when an uncommonly-used library is built for the first time. Build times of 1-5 minutes have
been observed. This depends on the power of the host (number of CPUs, etc).
• In a shared installation, where an installation of the compiler is shared among more than one user, it is
possible that two users might cause the linker to rebuild the same library at the same time. The mklib
program tries to minimize the race condition, but it is possible one build will corrupt the other. In a
shared environment, all libraries which might be needed should be built at install time; see
Section 8.5.2.2 for instructions on invoking mklib directly to avoid this problem.
• The index library must exist, or the linker is unable to rebuild libraries automatically.
• The index library must be in a user-writable directory, or the library is not built. If the compiler
installation must be installed read-only (a good practice for shared installation), any missing libraries
must be built at installation time by invoking mklib directly.
• The mklib program is specific to a certain version of a certain library; you cannot use one compiler
version's run-time support's mklib to build a different compiler version's run-time support library.
Some targets have many libraries, so this step can take a long time. To build a subset of the libraries,
invoke mklib individually for each desired library.
Examples:
To build all standard libraries and place them in the compiler's library directory:
mklib --all --index=$C_DIR/lib
To build one standard library and place it in the compiler's library directory:
mklib --pattern=rts6200.lib --index=$C_DIR/lib
To build a custom library that is just like rts6200.lib, but has symbolic debugging support enabled:
mklib --pattern=rts6200.lib --extra_options="-g" --index=$C_DIR/lib --install_to=$Project/Debug --name=rts6200_debug.lib
Chapter 9
The C++ compiler implements function overloading, operator overloading, and type-safe linking by
encoding a function's prototype and namespace in its link-level name. The process of encoding the
prototype into the linkname is often referred to as name mangling. When you inspect mangled names,
such as in assembly files, disassembler output, or compiler or linker diagnostics, it can be difficult to
associate a mangled name with its corresponding name in the C++ source code. The C++ name
demangler is a debugging aid that translates each mangled name it detects to its original name found in
the C++ source code.
These topics tell you how to invoke and use the C++ name demangler. The C++ name demangler reads
in input, looking for mangled names. All unmangled text is copied to output unaltered. All mangled names
are demangled before being copied to output.
By default, the C++ name demangler outputs to standard output. You can use the -o file option if you want
to output to a file.
class banana {
public:
int calories(void);
banana();
~banana();
};
int calories_in_a_banana(void)
{
banana x;
return x.calories();
}
_calories_in_a_banana__Fv:
;** ----------------------------------------------------------------------*
CALL .S1 ___ct__6bananaFv ; |10|
STW .D2T2 B3,*SP--(16) ; |9|
MVKL .S2 RL0,B3 ; |10|
MVKH .S2 RL0,B3 ; |10|
ADD .S1X 8,SP,A4 ; |10|
NOP 1
RL0: ; CALL OCCURS ; |10|
CALL .S1 _calories__6bananaFv ; |12|
MVKL .S2 RL1,B3 ; |12|
ADD .S1X 8,SP,A4 ; |12|
MVKH .S2 RL1,B3 ; |12|
NOP 2
RL1: ; CALL OCCURS ; |12|
CALL .S1 ___dt__6bananaFv ; |13|
STW .D2T1 A4,*+SP(4) ; |12|
ADD .S1X 8,SP,A4 ; |13|
MVKL .S2 RL2,B3 ; |13|
MVK .S2 0x2,B4 ; |13|
MVKH .S2 RL2,B3 ; |13|
RL2: ; CALL OCCURS ; |13|
LDW .D2T1 *+SP(4),A4 ; |12|
LDW .D2T2 *++SP(16),B3 ; |13|
NOP 4
RET .S2 B3 ; |13|
NOP 5
; BRANCH OCCURS ; |13|
Executing the C++ name demangler demangles all names that it believes to be mangled. Enter:
dem6x calories_in_a_banana.asm
The result is shown in Example 9-3. The linknames in Example 9-2 (___ct__6bananaFv,
_calories__6bananaFv, and ___dt__6bananaFv) are demangled.
calories_in_a_banana():
;** ----------------------------------------------------------------------*
CALL .S1 banana::banana() ; |10|
STW .D2T2 B3,*SP--(16) ; |9|
MVKL .S2 RL0,B3 ; |10|
MVKH .S2 RL0,B3 ; |10|
ADD .S1X 8,SP,A4 ; |10|
NOP 1
RL0: ; CALL OCCURS ; |10|
CALL .S1 banana::calories() ; |12|
MVKL .S2 RL1,B3 ; |12|
ADD .S1X 8,SP,A4 ; |12|
MVKH .S2 RL1,B3 ; |12|
NOP 2
RL1: ; CALL OCCURS ; |12|
CALL .S1 banana::~banana() ; |13|
STW .D2T1 A4,*+SP(4) ; |12|
ADD .S1X 8,SP,A4 ; |13|
MVKL .S2 RL2,B3 ; |13|
MVK .S2 0x2,B4 ; |13|
MVKH .S2 RL2,B3 ; |13|
RL2: ; CALL OCCURS ; |13|
LDW .D2T1 *+SP(4),A4 ; |12|
LDW .D2T2 *++SP(16),B3 ; |13|
NOP 4
RET .S2 B3 ; |13|
NOP 5
; BRANCH OCCURS ; |13|
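The relationship between the linknames and the demangled names in these examples can be illustrated with a toy decoder. This sketch handles only the narrow patterns visible above (a function or member function taking no arguments, with __ct and __dt standing for the constructor and destructor); it is not how dem6x is implemented, it does not cover the full mangling scheme, and the function name toy_demangle is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Demangle names shaped like the ones in the example:
 *   _<name>__Fv               ->  name()
 *   _<name>__<len><class>Fv   ->  class::name()
 * Returns 0 on success, -1 if the name does not fit this narrow pattern.
 */
int toy_demangle(const char *link, char *out, size_t out_size)
{
    const char *p = link;
    const char *sep;
    char *end;
    char name[64], cls[64];
    size_t name_len;
    unsigned long cls_len;
    int n;

    assert(link != NULL && out != NULL);
    if (*p == '_')
        p++;                            /* leading underscore on the linkname */
    if (*p == '\0')
        return -1;
    sep = strstr(p + 1, "__");          /* separates the name from the signature */
    if (sep == NULL)
        return -1;
    name_len = (size_t)(sep - p);
    if (name_len >= sizeof name)
        return -1;
    memcpy(name, p, name_len);
    name[name_len] = '\0';
    sep += 2;

    if (strcmp(sep, "Fv") == 0) {       /* free function taking no arguments */
        n = snprintf(out, out_size, "%s()", name);
        return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
    }
    if (!isdigit((unsigned char)*sep))
        return -1;
    cls_len = strtoul(sep, &end, 10);   /* length prefix of the class name */
    if (cls_len == 0 || cls_len >= sizeof cls || strlen(end) != cls_len + 2)
        return -1;
    memcpy(cls, end, cls_len);
    cls[cls_len] = '\0';
    if (strcmp(end + cls_len, "Fv") != 0)
        return -1;

    if (strcmp(name, "__ct") == 0)      /* constructor */
        n = snprintf(out, out_size, "%s::%s()", cls, cls);
    else if (strcmp(name, "__dt") == 0) /* destructor */
        n = snprintf(out, out_size, "%s::~%s()", cls, cls);
    else
        n = snprintf(out, out_size, "%s::%s()", cls, name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Text that does not match the pattern, such as the label RL0, is rejected, which mirrors the way the demangler copies unmangled text to the output unaltered.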
Glossary
absolute lister— A debugging tool that allows you to create assembler listings that contain absolute
addresses.
alias disambiguation— A technique that determines when two pointer expressions cannot point to the
same location, allowing the compiler to freely optimize such expressions.
aliasing— The ability for a single object to be accessed in more than one way, such as when two
pointers point to a single object. It can disrupt optimization, because any indirect reference could
refer to any other object.
allocation— A process in which the linker calculates the final memory addresses of output sections.
ANSI— American National Standards Institute; an organization that establishes standards voluntarily
followed by industries.
archive library— A collection of individual files grouped into a single file by the archiver.
archiver— A software program that collects several individual files into a single file called an archive
library. With the archiver, you can add, delete, extract, or replace members of the archive library.
assembler— A software program that creates a machine-language program from a source file that
contains assembly language instructions, directives, and macro definitions. The assembler
substitutes absolute operation codes for symbolic operation codes and absolute or relocatable
addresses for symbolic addresses.
assignment statement— A statement that initializes a variable with a value.
autoinitialization— The process of initializing global C variables (contained in the .cinit section) before
program execution begins.
autoinitialization at run time— An autoinitialization method used by the linker when linking C code. The
linker uses this method when you invoke it with the --rom_model link option. The linker loads the
.cinit section of data tables into memory, and variables are initialized at run time.
big endian— An addressing protocol in which bytes are numbered from left to right within a word. More
significant bytes in a word have lower numbered addresses. Endian ordering is hardware-specific
and is determined at reset. See also little endian.
block— A set of statements that are grouped together within braces and treated as an entity.
.bss section— One of the default object file sections. You use the assembler .bss directive to reserve a
specified amount of space in the memory map that you can use later for storing data. The .bss
section is uninitialized.
byte— Per ANSI/ISO C, the smallest addressable unit that can hold a character.
C/C++ compiler— A software program that translates C source statements into assembly language
source statements.
code generator— A compiler tool that takes the file produced by the parser or the optimizer and
produces an assembly language source file.
COFF— Common object file format; a system of object files configured according to a standard
developed by AT&T. These files are relocatable in memory space.
command file— A file that contains options, filenames, directives, or commands for the linker or hex
conversion utility.
comment— A source statement (or portion of a source statement) that documents or improves
readability of a source file. Comments are not compiled, assembled, or linked; they have no effect
on the object file.
compiler program— A utility that lets you compile, assemble, and optionally link in one step. The
compiler runs one or more source modules through the compiler (including the parser, optimizer,
and code generator), the assembler, and the linker.
configured memory— Memory that the linker has specified for allocation.
constant— A type whose value cannot change.
cross-reference listing— An output file created by the assembler that lists the symbols that were
defined, what line they were defined on, which lines referenced them, and their final values.
.data section— One of the default object file sections. The .data section is an initialized section that
contains initialized data. You can use the .data directive to assemble code into the .data section.
direct call— A function call where one function calls another using the function's name.
directives— Special-purpose commands that control the actions and functions of a software tool (as
opposed to assembly language instructions, which control the actions of a device).
disambiguation— See alias disambiguation.
dynamic memory allocation— A technique used by several functions (such as malloc, calloc, and
realloc) to dynamically allocate memory for variables at run time. This is accomplished by defining a
large memory pool (heap) and using the functions to allocate memory from the heap.
ELF— Executable and Linkable Format; a system of object files configured according to the System V
Application Binary Interface specification.
emulator— A hardware development system that duplicates the TMS320C6000 operation.
entry point— A point in target memory where execution starts.
environment variable— A system symbol that you define and assign to a string. Environment variables
are often included in Windows batch files or UNIX shell scripts such as .cshrc or .profile.
epilog— The portion of code in a function that restores the stack and returns. See also pipelined-loop
epilog.
executable object file— A linked, executable object file that is downloaded and executed on a target
system.
expression— A constant, a symbol, or a series of constants and symbols separated by arithmetic
operators.
external symbol— A symbol that is used in the current program module but defined or declared in a
different program module.
file-level optimization— A level of optimization where the compiler uses the information that it has about
the entire file to optimize your code (as opposed to program-level optimization, where the compiler
uses information that it has about the entire program to optimize your code).
function inlining— The process of inserting code for a function at the point of call. This saves the
overhead of a function call and allows the optimizer to optimize the function in the context of the
surrounding code.
global symbol— A symbol that is either defined in the current module and accessed in another, or
accessed in the current module but defined in another.
high-level language debugging— The ability of a compiler to retain symbolic and high-level language
information (such as type and function definitions) so that a debugging tool can use this
information.
indirect call— A function call where one function calls another function by giving the address of the
called function.
initialization at load time— An autoinitialization method used by the linker when linking C/C++ code. The
linker uses this method when you invoke it with the --ram_model link option. This method initializes
variables at load time instead of run time.
initialized section— A section from an object file that will be linked into an executable object file.
input section— A section from an object file that will be linked into an executable object file.
integrated preprocessor— A C/C++ preprocessor that is merged with the parser, allowing for faster
compilation. Stand-alone preprocessing or preprocessed listing is also available.
interlist feature— A feature that inserts your original C/C++ source statements, as comments, into the
assembly language output from the assembler. The C/C++ statements are inserted next to the
equivalent assembly instructions.
intrinsics— Operators that are used like functions and produce assembly language code that would
otherwise be inexpressible in C, or would take greater time and effort to code.
ISO— International Organization for Standardization; a worldwide federation of national standards
bodies, which establishes international standards voluntarily followed by industries.
kernel— The body of a software-pipelined loop between the pipelined-loop prolog and the pipelined-loop
epilog.
K&R C— Kernighan and Ritchie C, the de facto standard as defined in the first edition of The C
Programming Language (K&R). Most K&R C programs written for earlier, non-ISO C compilers
should correctly compile and run without modification.
label— A symbol that begins in column 1 of an assembler source statement and corresponds to the
address of that statement. A label is the only assembler statement that can begin in column 1.
linker— A software program that combines object files to form an executable object file that can be
allocated into system memory and executed by the device.
listing file— An output file, created by the assembler, that lists source statements, their line numbers,
and their effects on the section program counter (SPC).
little endian— An addressing protocol in which bytes are numbered from right to left within a word. More
significant bytes in a word have higher numbered addresses. Endian ordering is hardware-specific
and is determined at reset. See also big endian.
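One way to observe byte ordering from C is to copy a word into a byte array and look at the byte stored at the lowest address. This is a sketch only; the function names lowest_address_byte and is_little_endian are hypothetical, and the result reflects the machine the code runs on.

```c
#include <stdint.h>
#include <string.h>

/* Return the byte of 'word' that is stored at the lowest address. */
int lowest_address_byte(uint32_t word)
{
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);
    return bytes[0];
}

/* In little-endian ordering the least significant byte (0x44 here)
   has the lowest address; more significant bytes follow it. */
int is_little_endian(void)
{
    return lowest_address_byte(0x11223344u) == 0x44;
}
```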
loader— A device that places an executable object file into system memory.
loop unrolling— An optimization that expands small loops so that each iteration of the loop appears in
your code. Although loop unrolling increases code size, it can improve the performance of your
code.
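The following sketch contrasts a simple loop with a hand-unrolled version, assuming an unroll factor of 4 chosen purely for illustration. The function names sum_rolled and sum_unrolled4 are hypothetical; an optimizer chooses its own unroll factor, if it unrolls at all.

```c
/* Original loop: one element per iteration. */
int sum_rolled(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Unrolled by 4: each pass of the main loop handles four elements,
   so the loop overhead is paid once per four additions. */
int sum_unrolled4(const int *a, int n)
{
    int sum = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    /* Cleanup loop for the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

The unrolled version is larger, which is the code-size cost mentioned in the definition.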
run-time environment— The run-time parameters in which your program must function. These
parameters are defined by the memory and register conventions, stack organization, function call
conventions, and system initialization.
run-time-support functions— Standard ISO functions that perform tasks that are not part of the C
language (such as memory allocation, string conversion, and string searches).
run-time-support library— A library file, rts.src, that contains the source for the run-time-support
functions.
section— A relocatable block of code or data that ultimately will be contiguous with other sections in the
memory map.
sign extend— A process that fills the unused MSBs of a value with the value's sign bit.
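As a sketch of sign extension, the function below widens a value held in the low bits of a 32-bit word to a full signed 32-bit result. The name sign_extend and its width parameter are hypothetical; the code assumes width is between 1 and 32.

```c
#include <stdint.h>

/* Sign-extend the low 'width' bits of 'value' to 32 bits.
   Assumes 1 <= width <= 32. */
int32_t sign_extend(uint32_t value, unsigned width)
{
    uint32_t sign_bit = (uint32_t)1 << (width - 1);
    /* Shifting a 32-bit value by 32 is undefined, so handle width 32 apart. */
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << width) - 1u);

    value &= mask;  /* discard anything above the field */

    /* Flipping the sign bit and then subtracting its weight gives the
       signed value; the unused MSBs end up filled with the sign bit. */
    return (int32_t)((int64_t)(value ^ sign_bit) - (int64_t)sign_bit);
}
```

For example, the 4-bit pattern 0xF has its sign bit set, so it extends to -1, while 0x7 extends to 7.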
simulator— A software development system that simulates TMS320C6000 operation.
source file— A file that contains C/C++ code or assembly language code that is compiled or assembled
to form an object file.
stand-alone preprocessor— A software tool that expands macros, #include files, and conditional
compilation as an independent program. It also performs integrated preprocessing, which includes
parsing of instructions.
static variable— A variable whose scope is confined to a function or a program. The values of static
variables are not discarded when the function or program is exited; their previous value is resumed
when the function or program is reentered.
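A minimal sketch of a static variable inside a function follows. The name next_id is hypothetical; the point is only that counter keeps its value between calls.

```c
/* Returns 1 on the first call, 2 on the second, and so on. */
int next_id(void)
{
    /* Initialized once; the value is retained after the function returns
       and is picked up again the next time the function is entered. */
    static int counter = 0;

    counter++;
    return counter;
}
```

Without the static keyword, counter would be re-created on every call and the function would always return 1.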
storage class— An entry in the symbol table that indicates how to access a symbol.
string table— A table that stores symbol names that are longer than eight characters (symbol names
longer than eight characters cannot be stored in the symbol table; instead they are stored in the string
table). The name portion of the symbol's entry points to the location of the string in the string table.
structure— A collection of one or more variables grouped together under a single name.
subsection— A relocatable block of code or data that ultimately will occupy contiguous space in the
memory map. Subsections are smaller sections within larger sections. Subsections give you tighter
control of the memory map.
symbol— A string of alphanumeric characters that represents an address or a value.
symbolic debugging— The ability of a software tool to retain symbolic information that can be used by a
debugging tool such as a simulator or an emulator.
target system— The system on which the object code you have developed is executed.
.text section— One of the default object file sections. The .text section is initialized and contains
executable code. You can use the .text directive to assemble code into the .text section.
trigraph sequence— A 3-character sequence that has a meaning (as defined by the ISO 646-1983
Invariant Code Set). These characters cannot be represented in the C character set and are
expanded to one character. For example, the trigraph ??' is expanded to ^.
trip count— The number of times that a loop executes before it terminates.
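For a simple counted loop, the trip count can be worked out from the loop bounds. The sketch below assumes a loop of the form for (i = start; i < end; i += step) with step greater than zero; the names count_iterations and trip_count are hypothetical.

```c
/* Count the iterations by actually running the loop. */
int count_iterations(int start, int end, int step)
{
    int trips = 0;
    for (int i = start; i < end; i += step) {
        trips++;
    }
    return trips;
}

/* Compute the same number directly: the distance to cover,
   divided by the step and rounded up. Assumes step > 0. */
int trip_count(int start, int end, int step)
{
    if (end <= start) {
        return 0;  /* the loop body never executes */
    }
    return (end - start + step - 1) / step;
}
```

A loop from 0 up to 10 in steps of 3 visits i = 0, 3, 6, and 9, so its trip count is 4.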
unconfigured memory— Memory that is not defined as part of the memory map and cannot be loaded
with code or data.
uninitialized section— An object file section that reserves space in the memory map but that has no
actual contents. These sections are built with the .bss and .usect directives.
unsigned value— A value that is treated as a nonnegative number, regardless of its actual sign.
variable— A symbol representing a quantity that can assume any of a set of values.
Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2012, Texas Instruments Incorporated