
Application Note 3. Issue 2
November 2008
Copyright © 2008 Embecosm Limited
![]() | The document entitled " Howto: Porting the GNU Debugger " by Jeremy Bennett of Embecosm is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License. See the Legal Notice for details. |
Table of Contents
This document complements the existing documentation for GDB ([3], [4], [5]). It is intended to help software engineers porting GDB to a new architecture for the first time.
This application note is based on the author's experience to date. It will be updated in future issues. Suggestions for improvements are always welcome.
Although the GDB project includes a 100 page guide to its internals, that document is aimed primarily at those wishing to develop GDB itself. The document also suffers from three limitations.
It tends to document at a detailed level. Individual functions are described well, but it is hard to get the big picture.
It is incomplete. Many of the most useful sections (for example on frame interpretation) are yet to be written.
Is tends to be out of date. For example the documentation of the UI-Independent output describes a number of functions which no longer exist.
Consequently the engineer faced with their first port of GDB to a new architecture is faced with discovering how GDB works by reading the source code and looking at how other architectures have been ported.
The author of this application note went through that process when porting the OpenRISC 1000 architecture to GDB. This document captures the learning experience, with the intention of helping others.
If you are about to start a port of GDB to a new architecture, this document is for you. If at the end of your endeavors you are better informed, please help by adding to this document.
If you have already been through the porting process, please help others by adding to this document.
The main user guide for GDB [3] provides a great deal of context about how GDB is intended to work.
The GDB Internals document [4] is essential reading before and during any porting exercise. It is not complete, nor is it always up to date, but it provides the first place to look for explanation of what a particular function does.
GDB relies on a separate specification of the Binary file format; for each architecture. That has its own comprehensive user guide [5].
The main GDB code base is generally well commented, particularly in the headers for the major interfaces. Inevitably this must be the definitive place to find out exactly how a particular function behaves.
The files making up the port for the OpenRISC 1000 are comprehensively commented, and can be processed with Doxygen [7]. Each function's behavior, its parameters and any return value is described.
The main GDB website is at sourceware.org/gdb/. It is supplemented by the less formal GDB Wiki at sourceware.org/gdb/wiki/.
The GDB developer community communicate through the GDB mailing lists and using IRC chat. These are always good places to find solutions to problems.
The main mailing list for discussion is gdb@sourceware.org, although for detailed understanding, the patches mailing list, gdb-patches@sourceware.org. See the main GDB website for details of subscribing to these mailing lists.
IRC is channel #gdb on
irc.freenode.net.
Embecosm is a consultancy specializing in open source tools, models and training for the embedded software community. All Embecosm products are freely available under open source licenses.
Embecosm offers a range of commercial services.
Customization of open source tools and software, including porting to new architectures.
Support, tutorials and training for open source tools and software.
Custom software development for the embedded market, including bespoke software models of hardware.
Independent evaluation of software tools.
For further information, visit the Embecosm website at www.embecosm.com.
There are three major areas to GDB:
GDB has a very simple view of a processor. It has a block of memory and a block of registers. Executing code contains its state in the registers and in memory. GDB maps that information to the source level program being debugged.
Porting a new architecture to GDB means providing a way to read executable files, a description of the ABI, a description of the physical architecture and operations to access the target being debugged.
Probably the most common use of GDB is to debug the architecture on which it is actually running. This is native debugging where the architecture of the host and target are the same.
For the OpenRISC 1000 GDB is normally run on a host separate to the target (typically a workstation) connecting to the OpenRISC 1000 target via JTAG, using the OpenRISC 1000 Remote JTAG Protocol. Remote debugging in this way is the most common method of working for embedded systems.
A full Glossary is provided at the end of this document. However a number of key concepts are worth explaining up front.
Exec or program. An executable program, i.e. a binary file which may be run independently of other programs. Commonly the term program is found in user documentation, and exec in comments and GDB internal documentation.
Inferior. A GDB entity representing a program or exec which has run, is running, or will run in the future. An inferior corresponds to a process or a core dump file.
Address space. A GDB entity which can interpret addresses (that is values of type CORE_ADDR). Inferiors must have at least one address space and inferiors may share an address space.
Thread. A single thread of control within an inferior.
The OpenRISC 1000 port for GDB is designed for "bare metal" debugging, so will have only a single address space and inferiors with a single thread.
BFD is a package which allows applications to use the same routines to operate on object files whatever the object file format. A new object file format can be supported simply by creating a new BFD back end and adding it to the library.
The BFD library back end creates a number of data structures describing the data held in a particular type of object file. Ultimately a unique enumerated constant (of type enum bfd_architecture) is defined for each individual architecture. This constant is then used to access the various data structures associated with the BFD of the particular architecture.
In the case of the OpenRISC 1000, 32-bit implementation (which may
be a COFF or ELF binary), the
enumerated constant is bfd_arch_or32.
BFD is part of the binutils package. A binutils implementation must be provided for any architecture intending to support the GNU tool chain.
The OpenRISC 1000 is supported by the GNU tool chain. BFD back ends already exist which are suitable for use with 32-bit OpenRISC 1000 images in ELF or COFF format as used with either the RTEMS or Linux operating systems.
Any architecture to be debugged by GDB is described in a struct gdbarch. When an object file is to be debugged, GDB will select the correct struct gdbarch using information about the object file captured in its BFD.
The data in struct gdbarch facilitates both the symbol side processing (for which it also uses the BFD information) and the target side processing (in combination with the frame and target operation information).
struct gdbarch is a mixture of data values (number of bytes in an integer for example) and functions to perform standard operations (e.g. to print the registers). The major functional groups are:
Data values capturing details of the hardware architecture. For example the endianism and the number of bits in an address and in a word. Some of this data is captured in the BFD, to which there is a reference in the struct gdbarch. There is also a structure, struct gdbarch_tdep to capture additional target specific data, beyond that which is covered by the standard struct gdbarch.
Data values describing how all the standard high level scalar data structures are represented (char, int, double etc).
Functions to access and display registers. GDB includes the concept of "pseudo-registers", those registers which do not physically exist, but which have a meaning within the architecture. For example in the OpenRISC 1000, floating point registers are actually the same as the General Purpose Registers. However a set of floating point pseudo-registers could be defined, to allow the GPRs to be displayed in floating point format.
Functions to access information on stack frames. This includes setting up "dummy" frames to allow GDB to evaluate functions (for example using the call command).
An architecture will need to specify most of the contents of
struct gdbarch, for which a set of functions (all starting
set_gdbarch_) are provided. Defaults are provided
for all entries, and in a small number of cases these will be
suitable.
Analysis of the stack frames of executing programs is complex with different approaches needed for different circumstances. A set of functions to identify stack frames and analyze their contents is associated with each struct gdbarch.
A set of utility functions are provided to access the members of
struct gdbarch. Element xyz of a struct gdbarch pointed to
by g may be accessed by using
gdbarch_xyz (g, ...). This will check,
using gdb_assert that g is
defined, and in the case of functions that g->x
is not NULL and return either the value g->xyz
(for values) or the result of calling
g->xyz (...) (for functions). This saves the
user testing for existence before each function call, and ensures
any errors are handled cleanly.
A set of operations is required to access a program using the target architecture described by struct gdbarch in order to implement the target side functionality. For any given architecture there may be multiple ways of connecting to the target, specified using the GDB target command. For example with the OpenRISC 1000 architecture, the connection may be directly to a JTAG interface connected through the host computer's parallel port, or through the OpenRISC 1000 Remote JTAG Protocol over TCP/IP.
These target operations are described in a struct target_ops. As with struct gdbarch this comprises a mixture of data and functions. The major functional groups are:
Functions to establish and close down a connection to the target.
Functions to access registers and memory on the target.
Functions to insert and remote breakpoints and watchpoints on the target.
Functions to start and stop programs running on the target.
A set of data describing the features of the target, and hence what operations can be applied. For example when examining a core dump, the data can be inspected, but the program cannot be executed.
As with struct gdbarch, defaults are provided for the struct target_ops values. In many cases these are sufficient, so need not be provided.
GDB's command handling is intended to be extensible. A set of
functions (defined in cli-decode.h) provide
that extensibility.
GDB groups its commands into a number of command lists (of
struct cmd_list_element), pointed to by
a number of global variables (defined in
cli-cmds.h). Of these, cmdlist
is the list of all defined commands. Separate lists define
sub-commands of various top level commands. For example
infolist is the list of all info
sub-commands.
Commands are also classified according the the area they address, for
example commands that provide support, commands that examine data,
commands for file handling etc. These classes are specified by
enum command_class, defined in
command.h. These classes provide the top level
categories in which help will be given.
A GDB description for a new architecture, arch is created by
defining a global function
_initialize_arch_tdep, by
convention in the source file arch-tdep.c. In the
case of the OpenRISC 1000, this function is called
_initialize_or1k_tdep and is found in the file
or1k-tdep.c.
The resulting object files containing the implementation of the
_initialize_arch_tdep function are specified in
the GDB configure.tgt file, which includes a
large case statement pattern matching against the
--target option of the configure
command.
The new struct gdbarch is created within the
_initialize_arch_tdep function by calling
gdbarch_register:
void gdbarch_register (enum bfd_architecture architecture,
gdbarch_init_ftype *init_func,
gdbarch_dump_tdep_ftype *tdep_dump_func);
For example the _initialize_or1k_tdep creates
its architecture for 32-bit OpenRISC 1000 architectures by calling.
gdbarch_register (bfd_arch_or32, or1k_gdbarch_init, or1k_dump_tdep);
The architecture enumeration will identify the
unique BFD for this architecture (see Section 2.2.1). The init_func is called to create and return
the new struct gdbarch (see Section 2.3). The
tdep_dump_func is a function which will dump the
target specific details associated with this architecture (also
described in Section 2.3).
The call to gdbarch_register (see Section 2.2) specifies a function which will
define a struct gdbarch for a particular BFD architecture.
struct gdbarch gdbarch_init_func (struct gdbarch_info info,
struct gdbarch_list *arches);
For example, in the case of the OpenRISC 1000 architecture, the
initialization function is or1k_gdbarch_init.
The first argument to the architecture initialization function is a
struct gdbarch_info containing all the known information about this
architecture (deduced from the BFD enumeration provided to
gdbarch_register). The second argument is a list
of the currently defined architectures within GDB.
The lookup is done using
gdbarch_list_lookup_by_info. It is passed the
list of existing architectures and the struct gdbarch_info (possibly
updated) and returns the first matching architecture it finds, or
NULL if none are found. If an architecture is found, the
initialization function can finish, returning the found architecture
as result.
The struct gdbarch_info has the following components:
struct gdbarch_info
{
const struct bfd_arch_info *bfd_arch_info;
int byte_order;
bfd *abfd;
struct gdbarch_tdep_info *tdep_info;
enum gdb_osabi osabi;
const struct target_desc *target_desc;
};
bfd_arch_info holds the key details about the
architecture. byte_order is an enumeration
indicating the endianism. abfd is a pointer to
the full BFD, tdep_info is additional custom
target specific information, gdb_osabi is an
enumeration identifying which (if any) of a number of operating
specific ABIs are used by this architecture and
target_desc is a set of name-value pairs with
information about register usage in this target.
When the struct gdbarch initialization function is called, not all the
fields are provided—only those which can be deduced from the
BFD. The struct gdbarch_info is used as a look-up key with the list of
existing architectures (the second argument to the initialization
function) to see if a suitable architecture already exists. The
tdep_info osabi and
target_desc fields may be added before this
lookup to refine the search.
If no architecture is found, then a new architecture must be
created, by calling gdbarch_alloc using the
supplied struct gdbarch_info and and any additional custom target specific
information in a struct gdbarch_tdep.
The newly created struct gdbarch must then be populated. Although there are
default values, in most cases they are not what is required. For each
element, X, there is a corresponding accessor
function to set the value of that element,
set_gdbarch_X.
The following sections identify the main elements that should be set
in this way. This is not the complete list, but represents the
functions and elements that must commonly be specified for a new
architecture. Many of the functions are described in the header file,
gdbarch.h and many may be found in the GDB
Internals document [4].
struct gdbarch *gdbarch_alloc (const struct gdbarch_info *info,
struct gdbarch_tdep *tdep);
struct gdbarch_tdep is not defined within GDB—it is up to the
user to define this struct if it is needed to
hold custom target information that is not covered by the standard
struct gdbarch. For example with the OpenRISC 1000 architecture it is
used to hold the number of matchpoints available in the target
(along with other information). If there is no additional target
specific information, it can be set to NULL.
A set of values in struct gdbarch define how different data types are represented within the architecture.
short_bit. Number of bits in a C/C++
short variable. Default is
2*TARGET_CHAR_BIT.
TARGET_CHAR_BIT is a defined constant, which
if not set explicitly defaults to 8.
int_bit, long_bit,
long_long_bit, float_bit,
double_bit,
long_double_bit. These are analogous to
short and are the number of bits in a C/C++
variable of the corresponding time. Defaults are
4*TARGET_CHAR_BIT for int,
long and float and
4*TARGET_CHAR_BIT for long long,
double and long double.
ptr_bit. Number of bits in a C/C++
pointer. Default is 4*TARGET_CHAR_BIT.
addr_bit. Number of bits in a C/C++
address. Almost always this is the same as the number of bits in
a pointer, but there are a small number of architectures for
which pointers cannot reach all addresses. Default is
4*TARGET_CHAR_BIT.
float_format,
double_format and
long_double_format. These point to an array
of C structs (one for each endianism),
defining the format for each of the floating point types. A
number of these arrays are predefined. They in turn are built on
top of a set of standard types defined by the library
libiberty.
char_signed. 1 if char to be
treated as signed, 0 if char is to be treated as
unsigned. The default is -1 (undefined), so this should always
be set.
A set of function members of struct gdbarch define aspects of the architecture and its ABI. For some of these functions, defaults are provided which will be suitable for most architectures.
return_value. This function determines the
return convention for a given data type. For example on the
OpenRISC 1000, structs/unions and large (>32 bit) scalars are
returned as references, while small scalars are returned in
GPR 11. This function should always be defined.
breakpoint_from_pc. Returns the breakpoint
instruction to be used when the PC is at a particular location
in memory. For architectures with variable length instructions,
the choice of breakpoint instruction may depend on the length of
the instruction at the program counter. Returns the instruction
sequence and its length.
The default value is NULL (undefined). This function should
always be defined if GDB is to support breakpointing for this
architecture.
adjust_breakpoint_address. Some
architectures do not allow breakpoints to be placed at all
points. Given a program counter, this function returns an
address where a breakpoint can be
placed. Default value is NULL (undefined). The function need
only be defined for architectures which cannot accept a
breakpoint at all program counter locations.
memory_insert_breakpoint and
memory_remove_breakpoint. These functions
insert or remove memory based (a.k.a. soft) breakpoints. The
default values
default_memory_insert_breakpoint and
default_memory_remove_breakpoint are
suitable for most architectures, so in most cases these
functions need not be defined.
decr_pc_after_break. Some architectures
require the program counter to be decremented after a break, to
allow the broken instruction to be executed on resumption. This
function returns the number of bytes by which to decrement the
address. The default value is NULL (undefined) which means the
program counter is left unchanged. This function need only be
defined if the functionality is required.
In practice this function is only of use for the very simplest
architectures. It applies only to software breakpoints, not
watchpoints or hardware breakpoints. It is more usual to adjust
the program counter as required in the target
to_wait and to_resume
functions (see Section 2.4).
single_step_through_delay. Returns 1 if the
target is executing a delay slot and a further single step is
needed before the instruction finishes. The default value is
NULL (not defined). This function should be implemented if the
target has delay slots.
print_insn. Disassemble an instruction and
print it. Default value is NULL (undefined). This function
should be defined if disassembly of code is to be supported.
Disassembly is a function required by the
binutils library. This function is
defined in the opcodes sub-directory. A
suitable implementation may already exist if
binutils has already been ported.
GDB considers registers to be a set with members numbered linearly from 0 upwards. The first part of that set corresponds to real physical registers, the second part to any "pseudo-registers". Pseudo-registers have no independent physical existence, but are useful representations of information within the architecture. For example the OpenRISC 1000 architecture has up to 32 general purpose registers, which are typically represented as 32-bit (or 64-bit) integers. However it could be convenient to define a set of pseudo-registers, to show the GPRs represented as floating point registers.
For any architecture, the implementer will decide on a mapping from hardware to GDB register numbers. The registers corresponding to real hardware are referred to as raw registers, the remaining registers are pseudo-registers. The total register set (raw and pseudo) is called the cooked register set.
These functions specify the number and type of registers in the architecture.
read_pc and
write_pc. Functions to read the program
counter. The default value is NULL (no function
available). However, if the program counter is just an ordinary
register, it can be specified in struct gdbarch instead (see
pc_regnum below) and it will be read or
written using the standard routines to access registers. Thus
this function need only be specified if the
program counter is not an ordinary register.
pseudo_register_read and
pseudo_register_write. These functions
should be defined if there are any pseudo-registers (see Section 2.2.2 and Section 2.3.5.3 for more information on
pseudo-registers). The default value is NULL.
num_regs and
num_pseudo_regs. These define the number of
real and pseudo-registers. They default to -1 (undefined) and
should always be explicitly defined.
sp_regnum, pc_regnum,
ps_regnum and
fp0_regnum. These specify the register
holding the stack pointer, program counter, processor status
and first floating point register. All except the first
floating-point register (which defaults to 0) default to -1
(not defined). They may be real or
pseudo-registers. sp_regnum must always be
defined. If pc_regnum is not defined, then
the functions read_pc and
write_pc (see above) must be defined. If
ps_regnum is not defined, then the
$ps variable will not be available to the
GDB user. fp0_regnum is not needed unless
the target offers support for floating point.
These functions return information about registers.
register_name. This function should
convert a register number (raw or pseudo) to a register name
(as a C char *). This is used both to
determine the name of a register for output and to work out
the meaning of any register names used as input. For example
with the OpenRISC 1000, GDB registers 0-31 are the General Purpose
Registers, register 32 is the program counter and register 33
is the supervision register, which map to the strings
"gpr00" through "gpr31",
"pc" and "sr"
respectively. This means that the GDB command print
$gpr5 should print the value of the OR1K general
purpose register 5. The default value for this function is
NULL. It should always be defined.
Historically, GDB always had a concept of a frame
pointer register, which could be accessed via the
GDB variable, $fp. That concept is now
deprecated, recognizing that not all architectures have a
frame pointer. However if an architecture does have a frame
pointer register, and defines a register or pseudo-register
with the name "fp", then that register will
be used as the value of the $fp variable.
register_type. Given a register number,
this function identifies the type of data it may be holding,
specified as a
struct type. GDB allows
creation of arbitrary types, but a number of built in types
are provided (builtin_type_void,
builtin_type_int32 etc), together with
functions to derive types from these. Typically the program
counter will have a type of "pointer to function" (it points
to code), the frame pointer and stack pointer will have types
of "pointer to void" (they point to data on the stack) and all
other integer registers will have a type of 32-bit integer or
64-bit integer. This information guides the formatting when
displaying out register information. The default value is
NULL meaning no information is available to guide formatting
when displaying registers.
print_registers_info. Define this function
to print out one or all of the registers for the GDB
info registers command. The default
value is the function
default_print_registers_info which uses the
type information (see register_type above)
to determine how each register should be printed. Define this
function for fuller control over how the registers are
displayed.
print_float_info and
print_vector_info. Define this function to
provide output for the GDB info float
and info vector commands
respectively. The default value is NULL (not defined), meaning
no information will be provided. Define each function if the
target supports floating point or vector operations
respectively.
register_reggroup_p. GDB groups
registers into different categories (general, vector, floating
point etc). This function given a register and group returns 1
(true) if the register is in the group and 0 otherwise. The
default value is the function
default_register_reggroup_p which will do
a reasonable job based on the type of the register (see the
function register_type above), with
groups for general purpose registers, floating point
registers, vector registers and raw (i.e not pseudo)
registers.
Caching of registers is used, so that the target does not need to be accessed and reanalyzed multiple times for each register in circumstances where the register value cannot have changed.
GDB provides struct regcache, associated with a particular struct gdbarch to
hold the cached values of the raw registers. A set of functions is
provided to access both the raw registers (with
raw in their name) and the full set of cooked
registers (with cooked in their name). Functions
are provided to ensure the register cache is kept synchronized with
the values of the actual registers in the target.
Accessing registers through the struct regcache routines will ensure that the appropriate struct gdbarch functions are called when necessary to access the underlying target architecture. In general users should use the "cooked" functions, since these will map to the "raw" functions automatically as appropriate.
The two key functions are regcache_cooked_read
and regcache_cooked_write which read or write a
register to or from a byte buffer (type
gdb_byte *). For convenience the wrapper functions
regcache_cooked_read_signed,
regcache_cooked_read_unsigned,
regcache_cooked_write_signed and
regcache_cooked_write_unsigned are provided,
which read or write the value and convert to or from a value as
appropriate.
GDB needs to understand the stack on which local (automatic) variables are stored. The area of the stack containing all the local variables for a function invocation is known as the stack frame for that function (or colloquially just as the "frame"). In turn the function that called the function will have its stack frame, and so on back through the chain of functions that have been called.
Almost all architectures have one register dedicated to point to the end of the stack (the stack pointer). Many have a second register which points to the start of the currently active stack frame (the frame pointer). The specific arrangements for an architecture are a key part of the ABI.
A diagram helps to explain this. Here is a simple program to compute factorials:
1: #include <stdio.h>
2:
3: int fact( int n )
4: {
5: if( 0 == n ) {
6: return 1;
7: }
8: else {
9: return n * fact( n - 1 );
10: }
11: }
12:
13: main()
14: {
15: int i;
16:
17: for( i = 0 ; i < 10 ; i++ ) {
18: int f = fact( i );
19: printf( "%d! = %d\n", i, f );
20: }
21: }
Consider the state of the stack when the code reaches line 6 after
the main program has called fact (3). The
chain of function calls will be main,
fact (3),
fact (2),
fact (1) and
fact (0). In this example the stack is
falling (as used by the OpenRISC 1000 ABI). The stack pointer
(SP) is at the end of the stack (lowest address) and the frame
pointer (FP) is at the highest address in the current stack
frame. Figure 2.1 shows how the stack looks.
In each stack frame, offset 0 from the stack pointer is the frame
pointer of the previous frame and offset 4
(this is illustrating a 32-bit architecture) from the stack pointer
is the return address. Local variables are indexed from the frame
pointer, with negative indexes. In the function
fact, offset -4 from the frame pointer is the
argument n. In the main
function, offset -4 from the frame pointer is the local variable
i and offset -8 from the frame pointer is the
local variable f.
![]() | Note |
|---|---|
This is a simplified example for illustrative purposes only. Good optimizing compilers would not put anything on the stack for such simple functions. Indeed they might eliminate the recursion and use of the stack entirely! |
It is very easy to get confused when examining stacks. GDB has
terminology it uses rigorously throughout. The stack frame of the
function currently executing, or where execution stopped is numbered
zero. In this example frame #0 is the stack frame of the call to
fact (0). The stack frame of its calling
function (fact(1) in this case) is numbered #1
and so on back through the chain of calls.
The main GDB data structure describing frames is struct frame_info. It is not used directly, but only via its accessor functions. struct frame_info includes information about the registers in the frame and a pointer to the code of the function with which the frame is associated. The entire stack is represented as a linked list of struct frame_info.
It is easy to get confused when referencing stack frames. GDB uses some precise terminology.
THIS frame is the frame currently under consideration.
The NEXT frame, also sometimes called the inner or newer frame is the frame of the function called by the function of THIS frame.
The PREVIOUS frame, also sometimes called the outer or older frame is the frame of the function which called the function of THIS frame.
So in the example of Figure 2.1, if THIS
frame is #3 (the call to fact (3)), the
NEXT frame is frame #2 (the call to
fact (2)) and the PREVIOUS frame is frame
#4 (the call to main ()).
The innermost frame is the frame of the current
executing function, or where the program stopped, in this example,
in the middle of the call to fact (0)). It
is always numbered frame #0.
The base of a frame is the address immediately before the start of the NEXT frame. For a falling stack this will be the lowest address and for a rising stack this will be the highest address in the frame.
GDB functions to analyze the stack are typically given a pointer to the NEXT frame to determine information about THIS frame. Information about THIS frame includes data on where the registers of the PREVIOUS frame are stored in this stack frame. In this example the frame pointer of the PREVIOUS frame is stored at offset 0 from the stack pointer of THIS frame.
The process whereby a function is given a pointer to the NEXT
frame to work out information about THIS frame is referred to as
unwinding. The GDB functions involved in this
typically include unwind in their name.
The process of analyzing a target to determine the information that
should go in struct frame_info is called
sniffing. The functions that carry this out are
called sniffers and typically include
sniffer in their name. More than one sniffer may
be required to extract all the information for a particular frame.
Because so many functions work using the NEXT frame, there is an issue about addressing the innermost frame—it has no NEXT frame. To solve this GDB creates a dummy frame #-1, known as the sentinel frame.
All the frame sniffing functions typically examine the code at the start of the corresponding function, to determine the state of registers. The ABI will save old values and set new values of key registers at the start of each function in what is known as the function prologue.
For any particular stack frame this data does not change, so all the standard unwinding functions, in addition to receiving a pointer to the NEXT frame as their first argument, receive a pointer to a prologue cache as their second argument. This can be used to store values associated with a particular frame, for reuse on subsequent calls involving the same frame.
It is up to the user to define the structure used (it is a void * pointer) and arrange allocation and deallocation of storage. However for general use, GDB provides struct trad_frame_cache, with a set of accessor routines. This structure holds the stack and code address of THIS frame, the base address of the frame, a pointer to the struct frame_info for the NEXT frame and details of where the registers of the PREVIOUS frame may be found in THIS frame.
Typically the first time any sniffer function is called with
NEXT frame, the prologue sniffer for THIS frame will be
NULL. The sniffer will analyze the frame, allocate a prologue
cache structure and populate it. Subsequent calls using the same
NEXT frame will pass in this prologue cache, so the data can be
returned with no additional analysis.
These struct gdbarch functions and value provide analysis of the stack frame and allow it to be adjusted as required.
skip_prologue. The prologue of a function
is the code at the beginning of the function which sets up the
stack frame, saves the return address etc. The code
representing the behavior of the function starts after the
prologue.
This function skips past the prologue of a function if the program counter is within the prologue of a function. With modern optimizing compilers, this may be a far from trivial exercise. However the required information may be within the binary as DWARF2 debugging information, making the job much easier.
The default value is NULL (not defined). This function
should always be provided, but can take advantage of DWARF2
debugging information, if that is available.
inner_than. Given two frame or stack
pointers, return 1 (true) if the first represents the "inner"
stack frame and 0 (false) otherwise. This is used to determine
whether the target has a rising or a falling stack frame. See
Section 2.3.6 for an explanation of "inner"
frames.
The default value of this function is NULL and it should
always be defined. However for almost all architectures one of
the built-in functions can be used:
core_addr_lessthan (for falling stacks)
or core_addr_greaterthan (for rising
stacks).
frame_align. The architecture may have
constraints on how its frames are aligned. Given a proposed
address for the stack pointer, this function returns a
suitably aligned address (by expanding the stack frame). The
default value is NULL (undefined). This function should be
defined for any architecture where it is possible the stack
could become misaligned. The utility functions
align_down (for falling stacks) and
align_up (for rising stacks) will
facilitate the implementation of this function.
frame_red_zone_size. Some ABIs reserve
space beyond the end of the stack for use by leaf functions
without prologue or epilogue or by exception handlers
(OpenRISC 1000 is in this category). This is known as a
red zone (AMD
terminology). The default value is 0. Set this field if the
architecture has such a red zone.
These functions provide access to key registers and arguments in the stack frame.
unwind_pc and
unwind_sp. These functions are given a
pointer to THIS stack frame (see Section 2.3.6 for how frames are represented) and return the value of the
program counter and stack pointer respectively in the PREVIOUS
frame (i.e. the frame of the function that called this one).
frame_num_args. Given a pointer to THIS
stack frame (see Section 2.3.6 for how frames
are represented), return the number of arguments that are
being passed, or -1 if not known. The default value is NULL
(undefined), in which case the number of arguments passed on
any stack frame is always unknown. For many architectures this
will be a suitable default.
GDB can call functions in the target code (for example by using the call or print commands). These functions may be breakpointed, and it is essential that if a function does hit a breakpoint, commands like backtrace work correctly.
This is achieved by making the stack look as though the function had been called from the point where GDB had previously stopped. This requires that GDB can set up stack frames appropriate for such function calls.
The following functions provide the functionality to set up such "dummy" stack frames.
push_dummy_call. This function sets up a
dummy stack frame for the function about to be
called. push_dummy_call is given the
arguments to be passed and must copy them into registers or
push them on to the stack as appropriate for the ABI. GDB
will then pass control to the target at the address of the
function, and it will find the stack and registers set up just
as expected.
The default value of this function is NULL (undefined). If the
function is not defined, then GDB will not allow the user to
call functions within the target being debugged.
unwind_dummy_id. This is the inverse of
push_dummy_call which restores the stack
and frame pointers after a call to evaluate a function using a
dummy stack frame. The default value is NULL (undefined). If
push_dummy_call is defined, then this
function should also be defined.
push_dummy_code. If this function is not
defined (its default value is NULL), a dummy call will use the
entry point of the target as its return address. A temporary
breakpoint will be set there, so the location must be writable
and have room for a breakpoint.
It is possible that this default is not suitable. It might not be writable (in ROM possibly), or the ABI might require code to be executed on return from a call to unwind the stack before the breakpoint is encountered.
If either of these is the case, then
push_dummy_code should be defined to push
an instruction sequence onto the end of the stack to which the
dummy call should return.
When a program stops, GDB needs to construct the chain of struct frame_info representing the state of the stack using appropriate sniffers.
Each architecture requires appropriate sniffers, but they do not form entries in struct gdbarch, since more than one sniffer may be required and a sniffer may be suitable for more than one struct gdbarch. Instead sniffers are associated with architectures using the following functions.
frame_unwind_append_sniffer is used to
add a new sniffer to analyze THIS frame when given a pointer
to the NEXT frame.
frame_base_append_sniffer is used to add
a new sniffer which can determine information about the base
of a stack frame.
frame_base_set_default is used to specify
the default base sniffer.
These functions all take a reference to struct gdbarch, so they are associated with a specific architecture. They are usually called in the struct gdbarch initialization function, after the struct gdbarch has been set up. Unless a default has been set, the most recently appended sniffer will be tried first.
The main frame unwinding sniffer (as set by
frame_unwind_append_sniffer) returns a
structure specifying a set of sniffing functions:
struct frame_unwind
{
enum frame_type type;
frame_this_id_ftype *this_id;
frame_prev_register_ftype *prev_register;
const struct frame_data *unwind_data;
frame_sniffer_ftype *sniffer;
frame_prev_pc_ftype *prev_pc;
frame_dealloc_cache_ftype *dealloc_cache;
};
The type field indicates the type of frame this
sniffer can handle: normal, dummy (see
push_dummy_call in Section 2.3), signal handler or sentinel. Signal
handlers sometimes have their own simplified stack structure for
efficiency, so may need their own handlers.
unwind_data holds additional information which
may be relevant to particular types of frame. For example it may
hold additional information for signal handler frames.
The remaining fields define functions that yield different types
of information when given a pointer to the NEXT stack frame. Not
all functions need be provided. If an entry is NULL, the next
sniffer will be tried instead.
this_id determines the stack pointer and
function (code entry point) for THIS stack frame.
prev_register determines where the values
of registers for the PREVIOUS stack frame are stored in THIS
stack frame.
sniffer takes a look at THIS frame's
registers to determine if this is the appropriate unwinder.
prev_pc determines the program counter
for THIS frame. Only needed if the program counter is not an
ordinary register (see prev_pc in Section 2.3).
dealloc_cache frees any additional memory
associated with the prologue cache for this frame (see Section 2.3.6.2).
In general it is only the this_id and
prev_register functions that need be defined
for custom sniffers.
The frame base sniffer is much simpler. It is a struct frame_base, which refers to the corresponding struct frame_unwind and provides functions yielding various addresses within the frame.
struct frame_base
{
const struct frame_unwind *unwind;
frame_this_base_ftype *this_base;
frame_this_locals_ftype *this_locals;
frame_this_args_ftype *this_args;
};
All these functions take a pointer to the NEXT frame as
argument. this_base returns the base address
of THIS frame, this_locals returns the base
address of local variables in THIS frame and
this_args returns the base address of the
function arguments in this frame.
As described above the base address of a frame is the address immediately before the start of the NEXT frame. For a falling stack, this is the lowest address in the frame and for a rising stack it is the highest address in the frame. For most architectures the same address is also the base address for local variables and arguments, in which case the same function can be used for all three entries.
It is worth noting that if it cannot be determined in any other
way (for example by there being a register with the name
"fp"), then the result of the
this_base function will be used as the value
of the frame pointer variable $fp in GDB
The communication with the target is down to a set of target
operations. These operations are held in a struct target_ops,
together with flags describing the behavior of the target. The
struct target_ops elements are defined and documented in
target.h. The sections following describe the
most important of these functions.
GDB has several different types of target: executable files, core dumps, executing processes etc. At any time, GDB may have several sets of target operations in use. For example target operations for use with an executing process (which can run code) might be different from the operations used when inspecting a core dump.
All the targets GDB knows about are held in a stack. GDB walks down the stack to find the set of target operations suitable for use. The stack is organized as a series of strata of decreasing importance: target operations for threads, then target operations suitable for processes, target operations to download remote targets, target operations for core dumps, target operations for executable files and at the bottom target operations for dummy targets. So GDB when debugging a running process will always select target operations from the process_stratum if available, over target operations from the file stratum, even if the target operations from the file stratum were pushed onto the stack more recently.
At any particular time, there is a current
target, held in the global variable
current_target. This can never be NULL—if
there is no other target available, it will point to the dummy target.
target.h defines a set of convenience macros to
access functions and values in the
current_target. Thus
current_target->to_xyz can be accessed as
target_xyz.
Some targets (sets of target operations in a struct target_ops) are set up automatically by GDB—these include the operations to drive simulators (see Section 2.6 and the operations to drive the GDB Remote Serial Protocol (RSP) (see Section 2.7).
Other targets must be set up explicitly by the implementer, using
the add_target function. By far the most common
is the native target for native debugging of the
host. Less common is to set up a non-native target, such as the
JTAG target used with the OpenRISC 1000[1].
A new native target is created by defining a function
_initialize_arch_os_nat for the
architecture, arch and operating system os, in the source file
arch-os-nat.c. A fragment of a makefile
to create the binary from the source is created in the file
config/arch/os.mh with a header giving
any macro definitions etc in
config/arch/nm-os.h (which will be linked
to nm.h at build time).
The _initialize_ function should create a new
struct target_ops and call add_target to add this
target to the list of available targets.
For new native targets there are standard implementations which
can be reused, with just one or two changes. For example the
function linux_trad_target returns a
struct target_ops suitable for most Linux native targets. It may prove
necessary only to alter the description field and the functions to
fetch and store registers.
For a new remote target, the procedure is a little simpler. The
source files should be added to configure.tgt,
just as for the architectural description (see Section 2.3). Within the source file, define a new
function
_initialize_remote_arch to
implement a new remote target, arch.
For new remote targets, the definitions in
remote.c used to implement the RSP provide a
good starting point.
These functions and variables provide information about the target. The first group identifies the name of the target and provides help information for the user.
to_shortname. This string is the name of
target, for use with GDBs target. Setting
to_shortname to foo
means that
target foo will
connect to the target, invoking to_open for
this target (see below).
to_longname. A string giving a brief
description of the type of target. This is printed with the
info target information (see also
to_files_info below).
to_doc. The help text for this target. If the
short name of the target is foo, then the
command help target will print
target foo followed by
the first sentence of this help text. The command
help target foo
will print out the complete text.
to_files_info. This function provides
additional information for the info target
command.
The second group of variables provides information about the current state of the target.
to_stratum. An enumerated constant indicating
to which stratum this struct target_ops belongs
to_has_all_memory. Boolean indicating if the
target includes all of memory, or only part of it. If only part,
then a failed memory request may be able to be satisfied by a
different target in the stack.
to_has_memory. Boolean indicating if the
target has memory (dummy targets do not)
to_has_stack. Boolean indicating if the
target has a stack. Object files do not, core dumps and
executable threads/processes do.
to_has_registers. Boolean indicating if the
target has registers. Object files do not, core dumps and
executable threads/processes do.
to_has_execution. Boolean indicating if the
target is currently executing. For some targets that is the same
as if they are capable of execution. However some remote targets
can be in the position where they are not executing until
create_inferior or
attach is called.
These functions control the connection to the target. For remote targets this may mean establishing and tearing down links using protocols such as TCP/IP. For native targets, these functions will be more concerned with setting flags describing the state.
to_open. This function is invoked by the
GDB target command. Any additional
arguments (beyond the name of the target being invoked) are
passed to this function. to_open should
establish the communications with the target. It should
establish the state of the target (is it already running for
example), and initialize data structures appropriately.
This function should not start the target
running if it is not currently running—that is the job of
the functions (to_create_inferior and
to_resume) invoked by the GDB
run command.
to_xclose and
to_close. Both these functions should close
the remote connection. to_close is the
legacy function. New implementations should use
to_xclose which should also free any memory
allocated for this target.
to_attach. For targets which can run
without a debugger connected, this function attaches the
debugger to a running target (which should first have been
opened).
to_detach. Function to detach from a
target, leaving it running.
to_disconnect. This is similar to
to_detach, but makes no effort to inform
the target that the debugger is detaching. It should just drop
the connection to the target.
to_terminal_inferior. This function
connects the target's terminal I/O to the local terminal. This
functionality is not always available with remote targets.
to_rcmd. If the target is capable of
running commands, then this function requests that command to be
run on the target. This is of most relevance to remote targets.
These functions transfer data to and from the target registers and memory.
to_fetch_registers and
to_store_registers. Functions to populate
the register cache with values from the target and to set target
registers with values in the register cache.
to_prepare_to_store. This function is
called prior to storing registers to set up any additional
information required. In most cases it will be an empty function.
to_load. Load a file into the target. For
most implementations, the generic function,
generic_load, which is reuses the other
target operations for memory access is suitable.
to_xfer_partial. This function is a generic
function to transfer data to and from the target. Its most
important function (often the only one actually implemented) is
to load and store data from and to target memory.
For all targets, GDB can implement breakpoints and write access watchpoints in software, by inserting code in the target. However many targets provide hardware assistance for these functions which is far more efficient, and in addition may implement read access watchpoints.
These functions in struct target_ops provide a mechanism to access such functionality if it is available.
to_insert_breakpoint and
to_remove_breakpoint. These functions
insert and remove breakpoints on the target. They can choose to
use either hardware or software breakpoints. However if the
insert function allows use of hardware breakpoints, then the
GDB command
set breakpoint auto-hw off
will have no effect.
to_can_use_hw_breakpoint. This function
should return 1 (true) if the target can set a hardware
breakpoint or watchpoint and 0 otherwise. The function is passed
an enumeration to indicate whether watchpoints or breakpoints
are being queried, and should use information about the number
of hardware breakpoints/watchpoints currently in use to
determine if a breakpoint/watchpoint can be set.
to_insert_hw_breakpoint and
to_remove_hw_breakpoint. Functions to
insert and remove hardware breakpoints. Return a failure result
if no hardware breakpoint is available.
to_insert_watchpoint and
to_remove_watchpoint. Functions to insert
and remove watchpoints.
to_stopped_by_watchpoint. Function returns
1 (true) if the last stop was due to a watchpoint.
to_stopped_data_address. If the last stop
was due to a watchpoint, this function returns the address of
the data which triggered the watchpoint.
for targets capable of execution, these functions provide the mechanisms to start and stop execution.
to_resume. Function to tell the target to
start running again (or for the first time).
to_wait. Function to wait for the target to
return control to the debugger. Typically control returns when
the target finishes execution or hits a breakpoint. It could
also occur if the connection is interrupted (for example by
ctrl-C).
to_stop. Function to stop the
target—used whenever the target is to be interrupted (for
example by ctrl-C).
to_kill. Kill the connection to the
target. This should work, even if the connection to the target
is broken.
to_create_inferior. For targets which can
execute, this initializes a program to run, ready for it to
start executing. It is invoked by the GDB
run command, which will subsequently call
to_resume to start execution.
to_mourn_inferior. Tidy up after execution
of the target has finished (for example after it has exited or
been killed). Most implementations call the generic function,
generic_mourn_inferior, but may do some
additional tidying up.
[1] For a new remote target of any kind, the recommended approach is to use the standard GDB Remote Serial Protocol (RSP) and have the target implement the server side of this interface. The only remote targets remaining are historic legacy interfaces, such as the OpenRISC 1000 Remote JTAG Protocol.
As noted in Section 2.2, GDB's command
handling is extensible. Commands are grouped into a number of command
lists (of type struct cmd_list_element),
pointed to by a number of global variables (defined in
cli-cmds.h). Of these, cmdlist
is the list of all defined commands, with separate lists defined for
sub-commands of various top level commands. For example
infolist is the list of all info
sub-commands.