
Howto: Porting the GNU Debugger

Practical Experience with the OpenRISC 1000 Architecture

Jeremy Bennett

Embecosm

Application Note 3. Issue 2

November 2008


Table of Contents

1. Introduction
1.1. Rationale
1.2. Target Audience
1.3. Further Sources of Information
1.3.1. Written Documentation
1.3.2. Other Information Channels
1.4. About Embecosm
2. Overview of GDB Internals
2.1. GDB Nomenclature
2.2. Main Functional Areas and Data Structures
2.2.1. Binary File Description (BFD)
2.2.2. Architecture Description
2.2.3. Target Operations
2.2.4. Adding Commands to GDB
2.3. GDB Architecture Specification
2.3.1. Looking up an Existing Architecture
2.3.2. Creating a New Architecture
2.3.3. Specifying the Hardware Data Representation
2.3.4. Specifying the Hardware Architecture and ABI
2.3.5. Specifying the Register Architecture
2.3.6. Specifying Frame Handling
2.4. Target Operations
2.4.1. Target Strata
2.4.2. Specifying a New Target
2.4.3. struct target_ops Functions and Variables Providing Information
2.4.4. struct target_ops Functions Controlling the Target Connection
2.4.5. struct target_ops Functions to Access Memory and Registers
2.4.6. struct target_ops Functions to Handle Breakpoints and Watchpoints
2.4.7. struct target_ops Functions to Control Execution
2.5. Adding Commands to GDB
2.6. Simulators
2.7. Remote Serial Protocol (RSP)
2.7.1. RSP Client Implementation
2.7.2. RSP Server Implementation
2.8. GDB File Organization
2.9. Testing GDB
2.10. Documentation
2.11. Example Procedure Flows in GDB
2.11.1. Initial Start Up
2.11.2. The GDB target Command
2.11.3. The GDB load Command
2.11.4. The GDB break Command
2.11.5. The GDB run Command
2.11.6. The GDB backtrace Command
2.11.7. The GDB continue Command after a Breakpoint
2.12. Summary: Steps to Port a New Architecture to GDB
3. The OpenRISC 1000 Architecture
3.1. The OpenRISC 1000 JTAG Interface
3.2. The OpenRISC 1000 Remote JTAG Protocol
3.3. Application Binary Interface (ABI)
3.4. Or1ksim: the OpenRISC 1000 Architectural Simulator
4. Porting the OpenRISC 1000 Architecture
4.1. BFD Specification
4.2. OpenRISC 1000 Architecture Specification
4.2.1. Creating struct gdbarch
4.2.2. OpenRISC 1000 Hardware Data Representation
4.2.3. Information Functions for the OpenRISC 1000 Architecture
4.2.4. OpenRISC 1000 Register Architecture
4.2.5. OpenRISC 1000 Frame Handling
4.3. OpenRISC 1000 JTAG Remote Target Specification
4.3.1. Creating struct target_ops for OpenRISC 1000
4.3.2. OpenRISC 1000 Target Functions and Variables Providing Information
4.3.3. OpenRISC 1000 Target Functions Controlling the Connection
4.3.4. OpenRISC 1000 Target Functions to Access Memory and Registers
4.3.5. OpenRISC 1000 Target Functions to Handle Breakpoints and Watchpoints
4.3.6. OpenRISC 1000 Target Functions to Control Execution
4.3.7. OpenRISC 1000 Target Functions to Execute Commands
4.3.8. The Low Level JTAG Interface
4.4. The OpenRISC 1000 Disassembler
4.5. OpenRISC 1000 Specific Commands for GDB
4.5.1. The info spr Command
4.5.2. The spr Command
5. Summary
Glossary
References
Index

Chapter 1.  Introduction

This document complements the existing documentation for GDB ([3], [4], [5]). It is intended to help software engineers porting GDB to a new architecture for the first time.

This application note is based on the author's experience to date. It will be updated in future issues. Suggestions for improvements are always welcome.

1.1.  Rationale

Although the GDB project includes a 100 page guide to its internals, that document is aimed primarily at those wishing to develop GDB itself. The document also suffers from three limitations.

  1. It tends to document at a detailed level. Individual functions are described well, but it is hard to get the big picture.

  2. It is incomplete. Many of the most useful sections (for example on frame interpretation) are yet to be written.

  3. It tends to be out of date. For example the documentation of the UI-Independent output describes a number of functions which no longer exist.

Consequently the engineer undertaking a first port of GDB to a new architecture must discover how GDB works by reading the source code and looking at how other architectures have been ported.

The author of this application note went through that process when porting the OpenRISC 1000 architecture to GDB. This document captures the learning experience, with the intention of helping others.

1.2.  Target Audience

If you are about to start a port of GDB to a new architecture, this document is for you. If at the end of your endeavors you are better informed, please help by adding to this document.

If you have already been through the porting process, please help others by adding to this document.

1.3.  Further Sources of Information

1.3.1.  Written Documentation

The main user guide for GDB [3] provides a great deal of context about how GDB is intended to work.

The GDB Internals document [4] is essential reading before and during any porting exercise. It is not complete, nor is it always up to date, but it provides the first place to look for explanation of what a particular function does.

GDB relies on a separate specification of the binary file format for each architecture. That has its own comprehensive user guide [5].

The main GDB code base is generally well commented, particularly in the headers for the major interfaces. Inevitably this must be the definitive place to find out exactly how a particular function behaves.

The files making up the port for the OpenRISC 1000 are comprehensively commented, and can be processed with Doxygen [7]. Each function's behavior, its parameters and any return value are described.

1.3.2.  Other Information Channels

The main GDB website is at sourceware.org/gdb/. It is supplemented by the less formal GDB Wiki at sourceware.org/gdb/wiki/.

The GDB developer community communicate through the GDB mailing lists and using IRC chat. These are always good places to find solutions to problems.

The main mailing list for discussion is gdb@sourceware.org, although for detailed understanding, the patches mailing list, gdb-patches@sourceware.org, is also useful. See the main GDB website for details of subscribing to these mailing lists.

IRC is channel #gdb on irc.freenode.net.

1.4.  About Embecosm

Embecosm is a consultancy specializing in open source tools, models and training for the embedded software community. All Embecosm products are freely available under open source licenses.

Embecosm offers a range of commercial services.

  • Customization of open source tools and software, including porting to new architectures.

  • Support, tutorials and training for open source tools and software.

  • Custom software development for the embedded market, including bespoke software models of hardware.

  • Independent evaluation of software tools.

For further information, visit the Embecosm website at www.embecosm.com.

Chapter 2.  Overview of GDB Internals

There are three major areas to GDB:

  1. The user interface. How GDB communicates with the user.

  2. The symbol side. The analysis of object files, and the mapping of the information contained to the corresponding source files.

  3. The target side. Executing programs and analyzing their data.

GDB has a very simple view of a processor. It has a block of memory and a block of registers. Executing code holds its state in the registers and in memory. GDB maps that information to the source level program being debugged.

Porting a new architecture to GDB means providing a way to read executable files, a description of the ABI, a description of the physical architecture and operations to access the target being debugged.

Probably the most common use of GDB is to debug the architecture on which it is actually running. This is native debugging where the architecture of the host and target are the same.

For the OpenRISC 1000, GDB is normally run on a host (typically a workstation) separate from the target, connecting to the OpenRISC 1000 target via JTAG, using the OpenRISC 1000 Remote JTAG Protocol. Remote debugging in this way is the most common method of working for embedded systems.

2.1.  GDB Nomenclature

A full Glossary is provided at the end of this document. However a number of key concepts are worth explaining up front.

  • Exec or program. An executable program, i.e. a binary file which may be run independently of other programs. Commonly the term program is found in user documentation, and exec in comments and GDB internal documentation.

  • Inferior. A GDB entity representing a program or exec which has run, is running, or will run in the future. An inferior corresponds to a process or a core dump file.

  • Address space. A GDB entity which can interpret addresses (that is values of type CORE_ADDR). Inferiors must have at least one address space and inferiors may share an address space.

  • Thread. A single thread of control within an inferior.

The OpenRISC 1000 port for GDB is designed for "bare metal" debugging, so will have only a single address space and inferiors with a single thread.

2.2.  Main Functional Areas and Data Structures

2.2.1.  Binary File Description (BFD)

BFD is a package which allows applications to use the same routines to operate on object files whatever the object file format. A new object file format can be supported simply by creating a new BFD back end and adding it to the library.

The BFD library back end creates a number of data structures describing the data held in a particular type of object file. Ultimately a unique enumerated constant (of type enum bfd_architecture) is defined for each individual architecture. This constant is then used to access the various data structures associated with the BFD of the particular architecture.

In the case of the 32-bit OpenRISC 1000 implementation (which may use COFF or ELF binaries), the enumerated constant is bfd_arch_or32.

BFD is part of the binutils package. A binutils implementation must be provided for any architecture intending to support the GNU tool chain.

The OpenRISC 1000 is supported by the GNU tool chain. BFD back ends already exist which are suitable for use with 32-bit OpenRISC 1000 images in ELF or COFF format as used with either the RTEMS or Linux operating systems.

2.2.2.  Architecture Description

Any architecture to be debugged by GDB is described in a struct gdbarch. When an object file is to be debugged, GDB will select the correct struct gdbarch using information about the object file captured in its BFD.

The data in struct gdbarch facilitates both the symbol side processing (for which it also uses the BFD information) and the target side processing (in combination with the frame and target operation information).

struct gdbarch is a mixture of data values (number of bytes in an integer for example) and functions to perform standard operations (e.g. to print the registers). The major functional groups are:

  • Data values capturing details of the hardware architecture. For example the endianism and the number of bits in an address and in a word. Some of this data is captured in the BFD, to which there is a reference in the struct gdbarch. There is also a structure, struct gdbarch_tdep to capture additional target specific data, beyond that which is covered by the standard struct gdbarch.

  • Data values describing how all the standard high level scalar data structures are represented (char, int, double etc).

  • Functions to access and display registers. GDB includes the concept of "pseudo-registers", those registers which do not physically exist, but which have a meaning within the architecture. For example in the OpenRISC 1000, floating point registers are actually the same as the General Purpose Registers. However a set of floating point pseudo-registers could be defined, to allow the GPRs to be displayed in floating point format.

  • Functions to access information on stack frames. This includes setting up "dummy" frames to allow GDB to evaluate functions (for example using the call command).

An architecture will need to specify most of the contents of struct gdbarch, for which a set of functions (all starting set_gdbarch_) are provided. Defaults are provided for all entries, and in a small number of cases these will be suitable.

Analysis of the stack frames of executing programs is complex with different approaches needed for different circumstances. A set of functions to identify stack frames and analyze their contents is associated with each struct gdbarch.

A set of utility functions is provided to access the members of struct gdbarch. Element xyz of a struct gdbarch pointed to by g may be accessed by using gdbarch_xyz (g, ...). This will check, using gdb_assert, that g is defined and, in the case of functions, that g->xyz is not NULL. It then returns either the value g->xyz (for values) or the result of calling g->xyz (...) (for functions). This saves the user testing for existence before each function call, and ensures any errors are handled cleanly.
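The accessor pattern described above can be sketched in a few lines of standalone C. This is only an illustration: struct toy_gdbarch and every toy_ function are hypothetical names invented here, and the standard assert macro stands in for gdb_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of struct gdbarch: one value and one function.  */
struct toy_gdbarch
{
  int int_bit;                                /* a data value */
  const char *(*register_name) (int regnum);  /* a function member */
};

/* Value accessor: check the architecture exists, then return the value.  */
int
toy_gdbarch_int_bit (const struct toy_gdbarch *g)
{
  assert (g != NULL);
  return g->int_bit;
}

/* Function accessor: also check the member is set, then call it.  */
const char *
toy_gdbarch_register_name (const struct toy_gdbarch *g, int regnum)
{
  assert (g != NULL);
  assert (g->register_name != NULL);
  return g->register_name (regnum);
}

/* Example function member for a made-up two register machine.  */
const char *
toy_register_name (int regnum)
{
  return regnum == 0 ? "r0" : "r1";
}
```

A caller never dereferences the structure directly, so a missing function is caught at the point of use rather than causing a crash through a NULL pointer.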

2.2.3.  Target Operations

A set of operations is required to access a program using the target architecture described by struct gdbarch in order to implement the target side functionality. For any given architecture there may be multiple ways of connecting to the target, specified using the GDB target command. For example with the OpenRISC 1000 architecture, the connection may be directly to a JTAG interface connected through the host computer's parallel port, or through the OpenRISC 1000 Remote JTAG Protocol over TCP/IP.

These target operations are described in a struct target_ops. As with struct gdbarch this comprises a mixture of data and functions. The major functional groups are:

  • Functions to establish and close down a connection to the target.

  • Functions to access registers and memory on the target.

  • Functions to insert and remove breakpoints and watchpoints on the target.

  • Functions to start and stop programs running on the target.

  • A set of data describing the features of the target, and hence what operations can be applied. For example when examining a core dump, the data can be inspected, but the program cannot be executed.

As with struct gdbarch, defaults are provided for the struct target_ops values. In many cases these are sufficient, so a new target need not supply its own.
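As a rough model of this arrangement, the following standalone sketch shows a table of operations with a capability flag and defaults filled in for anything the target leaves unset. All names (struct toy_target_ops, toy_complete_target and so on) are hypothetical, and the real struct target_ops has many more members.

```c
#include <stddef.h>

/* Hypothetical miniature of struct target_ops.  */
struct toy_target_ops
{
  const char *shortname;            /* name given to the target command */
  int has_execution;                /* can programs run on this target? */
  int (*open) (const char *args);   /* returns 0 on success */
  int (*resume) (void);             /* returns 0 on success */
};

/* Defaults used when a target supplies no operation of its own.  */
int
toy_default_open (const char *args)
{
  (void) args;
  return -1;
}

int
toy_default_resume (void)
{
  return -1;
}

/* Fill in a default for every operation the target left unset.  */
void
toy_complete_target (struct toy_target_ops *t)
{
  if (t->open == NULL)
    t->open = toy_default_open;
  if (t->resume == NULL)
    t->resume = toy_default_resume;
}

/* Resume only if the target's data says execution is possible.  */
int
toy_target_resume (const struct toy_target_ops *t)
{
  if (!t->has_execution)
    return -1;
  return t->resume ();
}

/* A made-up JTAG target operation that always succeeds.  */
int
toy_jtag_resume (void)
{
  return 0;
}
```

A core dump target in this model would set has_execution to 0, so an attempt to resume it is refused before any target function is called.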

2.2.4.  Adding Commands to GDB

GDB's command handling is intended to be extensible. A set of functions (defined in cli-decode.h) provide that extensibility.

GDB groups its commands into a number of command lists (of struct cmd_list_element), pointed to by a number of global variables (defined in cli-cmds.h). Of these, cmdlist is the list of all defined commands. Separate lists define sub-commands of various top level commands. For example infolist is the list of all info sub-commands.

Commands are also classified according to the area they address, for example commands that provide support, commands that examine data, commands for file handling etc. These classes are specified by enum command_class, defined in command.h. These classes provide the top level categories in which help will be given.
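The idea of several command lists, each reached through its own global pointer, can be modeled as follows. This is a simplified sketch with hypothetical names (struct toy_cmd, toy_add_cmd, toy_lookup_cmd); it does not reproduce the real struct cmd_list_element or the functions in cli-decode.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command classes, loosely following enum command_class.  */
enum toy_command_class { toy_class_support, toy_class_data, toy_class_files };

/* Hypothetical miniature of struct cmd_list_element.  */
struct toy_cmd
{
  const char *name;
  enum toy_command_class cls;
  void (*func) (const char *args);
  struct toy_cmd *next;
};

struct toy_cmd *toy_cmdlist = NULL;   /* all top level commands */
struct toy_cmd *toy_infolist = NULL;  /* sub-commands of "info" */

/* Add command C to the front of LIST.  */
void
toy_add_cmd (struct toy_cmd *c, struct toy_cmd **list)
{
  c->next = *list;
  *list = c;
}

/* Find the command called NAME in LIST, or return NULL.  */
struct toy_cmd *
toy_lookup_cmd (const char *name, struct toy_cmd *list)
{
  for (; list != NULL; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}
```

Adding a command to toy_infolist rather than toy_cmdlist is what makes it a sub-command of info in this model.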

2.3.  GDB Architecture Specification

A GDB description for a new architecture, arch, is created by defining a global function _initialize_arch_tdep, by convention in the source file arch-tdep.c. In the case of the OpenRISC 1000, this function is called _initialize_or1k_tdep and is found in the file or1k-tdep.c.

The resulting object files containing the implementation of the _initialize_arch_tdep function are specified in the GDB configure.tgt file, which includes a large case statement pattern matching against the --target option of the configure command.

The new struct gdbarch is created within the _initialize_arch_tdep function by calling gdbarch_register:

void gdbarch_register (enum bfd_architecture    architecture,
                       gdbarch_init_ftype      *init_func,
                       gdbarch_dump_tdep_ftype *tdep_dump_func);
	

For example the _initialize_or1k_tdep function creates its architecture for 32-bit OpenRISC 1000 architectures by calling:

gdbarch_register (bfd_arch_or32, or1k_gdbarch_init, or1k_dump_tdep);
	

The architecture enumeration will identify the unique BFD for this architecture (see Section 2.2.1). The init_func is called to create and return the new struct gdbarch (see Section 2.3). The tdep_dump_func is a function which will dump the target specific details associated with this architecture (also described in Section 2.3).

The call to gdbarch_register (see Section 2.2) specifies a function which will define a struct gdbarch for a particular BFD architecture.

struct gdbarch *gdbarch_init_func (struct gdbarch_info  info,
                                   struct gdbarch_list *arches);
	

For example, in the case of the OpenRISC 1000 architecture, the initialization function is or1k_gdbarch_init.

Tip

By convention all target specific functions and global variables in GDB begin with a string unique to that architecture. This helps to avoid namespace pollution when using C. Thus all the MIPS specific functions begin mips_, the ARM specific functions begin arm_ etc.

For the OpenRISC 1000 all target specific functions and global variables begin with or1k_.
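The registration step can be pictured with a small standalone model: a table indexed by architecture enumeration that remembers which initialization function to call. Everything here is hypothetical (the toy_ names, the single-slot registry, the simplified initialization function with no arguments); it is intended only to show the flow from registration to creation.

```c
#include <stddef.h>

/* Hypothetical stand-in for enum bfd_architecture.  */
enum toy_bfd_architecture
{
  toy_bfd_arch_unknown,
  toy_bfd_arch_or32,
  toy_bfd_arch_last
};

struct toy_gdbarch
{
  enum toy_bfd_architecture arch;
};

/* Simplified initialization function type (the real one takes arguments).  */
typedef struct toy_gdbarch *(toy_init_ftype) (void);

/* One slot per architecture; all NULL until something registers.  */
static toy_init_ftype *toy_registry[toy_bfd_arch_last];

void
toy_gdbarch_register (enum toy_bfd_architecture arch, toy_init_ftype *init)
{
  toy_registry[arch] = init;
}

/* Plays the role of or1k_gdbarch_init.  */
struct toy_gdbarch *
toy_or1k_gdbarch_init (void)
{
  static struct toy_gdbarch or1k = { toy_bfd_arch_or32 };
  return &or1k;
}

/* Plays the role of _initialize_or1k_tdep.  */
void
toy_initialize_or1k_tdep (void)
{
  toy_gdbarch_register (toy_bfd_arch_or32, toy_or1k_gdbarch_init);
}

/* Select an architecture from its enumeration; NULL if none registered.  */
struct toy_gdbarch *
toy_select (enum toy_bfd_architecture arch)
{
  return toy_registry[arch] != NULL ? toy_registry[arch] () : NULL;
}
```

Note that the initialization function is not run at registration time. It is only called later, when an object file with the matching architecture needs a description.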

2.3.1.  Looking up an Existing Architecture

The first argument to the architecture initialization function is a struct gdbarch_info containing all the known information about this architecture (deduced from the BFD enumeration provided to gdbarch_register). The second argument is a list of the currently defined architectures within GDB.

Before creating anything, the initialization function should check whether a suitable architecture already exists. The lookup is done using gdbarch_list_lookup_by_info. It is passed the list of existing architectures and the struct gdbarch_info (possibly updated) and returns the first matching architecture it finds, or NULL if none is found. If an architecture is found, the initialization function can finish, returning the found architecture as its result.

2.3.1.1.  struct gdbarch_info

The struct gdbarch_info has the following components:

struct gdbarch_info
{
  const struct bfd_arch_info *bfd_arch_info;
  int                         byte_order;
  bfd                        *abfd;
  struct gdbarch_tdep_info   *tdep_info;
  enum gdb_osabi              osabi;
  const struct target_desc   *target_desc;
};
	    

bfd_arch_info holds the key details about the architecture. byte_order is an enumeration indicating the endianism. abfd is a pointer to the full BFD, tdep_info is additional custom target specific information, osabi is an enumeration identifying which (if any) of a number of operating system specific ABIs are used by this architecture and target_desc is a set of name-value pairs with information about register usage in this target.

When the struct gdbarch initialization function is called, not all the fields are provided, only those which can be deduced from the BFD. The struct gdbarch_info is used as a look-up key with the list of existing architectures (the second argument to the initialization function) to see if a suitable architecture already exists. The tdep_info, osabi and target_desc fields may be added before this lookup to refine the search.
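The use of the information structure as a look-up key can be sketched as a simple list search. The structures and the function toy_lookup_by_info below are hypothetical simplifications, matching on only two fields; the real lookup compares more of struct gdbarch_info.

```c
#include <stddef.h>

enum toy_byte_order { TOY_BIG_ENDIAN, TOY_LITTLE_ENDIAN };

/* Hypothetical cut-down struct gdbarch_info: just two key fields.  */
struct toy_info
{
  int bfd_arch;
  enum toy_byte_order byte_order;
};

struct toy_gdbarch
{
  struct toy_info info;   /* the key this architecture was created from */
};

/* List of architectures created so far.  */
struct toy_gdbarch_list
{
  struct toy_gdbarch *gdbarch;
  struct toy_gdbarch_list *next;
};

/* Return the first architecture matching INFO, or NULL if none does.  */
struct toy_gdbarch *
toy_lookup_by_info (struct toy_gdbarch_list *arches,
                    const struct toy_info *info)
{
  for (; arches != NULL; arches = arches->next)
    if (arches->gdbarch->info.bfd_arch == info->bfd_arch
        && arches->gdbarch->info.byte_order == info->byte_order)
      return arches->gdbarch;
  return NULL;
}
```

Refining the key before the search (here, by setting byte_order) is what lets two variants of one architecture coexist in the list.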

2.3.2.  Creating a New Architecture

If no architecture is found, then a new architecture must be created, by calling gdbarch_alloc using the supplied struct gdbarch_info and any additional custom target specific information in a struct gdbarch_tdep.

The newly created struct gdbarch must then be populated. Although there are default values, in most cases they are not what is required. For each element, X, there is a corresponding accessor function to set the value of that element, set_gdbarch_X.

The following sections identify the main elements that should be set in this way. This is not the complete list, but represents the functions and elements that must commonly be specified for a new architecture. Many of the functions are described in the header file, gdbarch.h and many may be found in the GDB Internals document [4].

2.3.2.1.  struct gdbarch_tdep

struct gdbarch *gdbarch_alloc (const struct gdbarch_info *info,
                               struct gdbarch_tdep       *tdep);
	    

struct gdbarch_tdep is not defined within GDB—it is up to the user to define this struct if it is needed to hold custom target information that is not covered by the standard struct gdbarch. For example with the OpenRISC 1000 architecture it is used to hold the number of matchpoints available in the target (along with other information). If there is no additional target specific information, it can be set to NULL.
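A minimal sketch of attaching custom target data at allocation time follows. The structure layout, the field num_matchpoints and the toy_ functions are assumptions made for illustration; they model the idea rather than the real gdbarch_alloc.

```c
#include <stdlib.h>

/* Hypothetical target specific data: only a matchpoint count.  */
struct toy_gdbarch_tdep
{
  unsigned int num_matchpoints;
};

struct toy_gdbarch
{
  int bfd_arch;
  struct toy_gdbarch_tdep *tdep;   /* may be NULL if nothing extra needed */
};

/* Allocate a new architecture and attach the target specific data.  */
struct toy_gdbarch *
toy_gdbarch_alloc (int bfd_arch, struct toy_gdbarch_tdep *tdep)
{
  struct toy_gdbarch *g = malloc (sizeof *g);

  if (g != NULL)
    {
      g->bfd_arch = bfd_arch;
      g->tdep = tdep;
    }
  return g;
}

/* Read the custom data, treating a missing tdep as "no matchpoints".  */
unsigned int
toy_num_matchpoints (const struct toy_gdbarch *g)
{
  return g->tdep != NULL ? g->tdep->num_matchpoints : 0;
}
```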

2.3.3.  Specifying the Hardware Data Representation

A set of values in struct gdbarch define how different data types are represented within the architecture.

  • short_bit. Number of bits in a C/C++ short variable. Default is 2*TARGET_CHAR_BIT. TARGET_CHAR_BIT is a defined constant, which if not set explicitly defaults to 8.

  • int_bit, long_bit, long_long_bit, float_bit, double_bit, long_double_bit. These are analogous to short_bit and are the number of bits in a C/C++ variable of the corresponding type. Defaults are 4*TARGET_CHAR_BIT for int, long and float and 8*TARGET_CHAR_BIT for long long, double and long double.

  • ptr_bit. Number of bits in a C/C++ pointer. Default is 4*TARGET_CHAR_BIT.

  • addr_bit. Number of bits in a C/C++ address. Almost always this is the same as the number of bits in a pointer, but there are a small number of architectures for which pointers cannot reach all addresses. Default is 4*TARGET_CHAR_BIT.

  • float_format, double_format and long_double_format. These point to an array of C structs (one for each endianism), defining the format for each of the floating point types. A number of these arrays are predefined. They in turn are built on top of a set of standard types defined by the library libiberty.

  • char_signed. 1 if char is to be treated as signed, 0 if char is to be treated as unsigned. The default is -1 (undefined), so this should always be set.
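The arithmetic behind some of these defaults can be worked through in a short standalone sketch. The structure and function names are hypothetical, only a subset of the values is modeled, and TARGET_CHAR_BIT is assumed to take its default of 8.

```c
/* Assumed default; the text notes it is 8 unless set explicitly.  */
#define TARGET_CHAR_BIT 8

/* Hypothetical subset of the data representation values.  */
struct toy_data_repr
{
  int short_bit;
  int int_bit;
  int ptr_bit;
  int addr_bit;
  int char_signed;
};

/* Apply the defaults listed above.  */
void
toy_default_data_repr (struct toy_data_repr *r)
{
  r->short_bit = 2 * TARGET_CHAR_BIT;   /* 16 */
  r->int_bit = 4 * TARGET_CHAR_BIT;     /* 32 */
  r->ptr_bit = 4 * TARGET_CHAR_BIT;     /* 32 */
  r->addr_bit = 4 * TARGET_CHAR_BIT;    /* 32 */
  r->char_signed = -1;                  /* undefined: must be set */
}

/* True once every value without a usable default has been set.  */
int
toy_data_repr_complete (const struct toy_data_repr *r)
{
  return r->char_signed != -1;
}
```

For a 32-bit target most of the defaults are already correct, which leaves char_signed as the one value in this subset that a port must always supply.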

2.3.4.  Specifying the Hardware Architecture and ABI

A set of function members of struct gdbarch define aspects of the architecture and its ABI. For some of these functions, defaults are provided which will be suitable for most architectures.

  • return_value. This function determines the return convention for a given data type. For example on the OpenRISC 1000, structs/unions and large (>32 bit) scalars are returned as references, while small scalars are returned in GPR 11. This function should always be defined.

  • breakpoint_from_pc. Returns the breakpoint instruction to be used when the PC is at a particular location in memory. For architectures with variable length instructions, the choice of breakpoint instruction may depend on the length of the instruction at the program counter. Returns the instruction sequence and its length.

    The default value is NULL (undefined). This function should always be defined if GDB is to support breakpointing for this architecture.

  • adjust_breakpoint_address. Some architectures do not allow breakpoints to be placed at all points. Given a program counter, this function returns an address where a breakpoint can be placed. Default value is NULL (undefined). The function need only be defined for architectures which cannot accept a breakpoint at all program counter locations.

  • memory_insert_breakpoint and memory_remove_breakpoint. These functions insert or remove memory based (a.k.a. soft) breakpoints. The default values default_memory_insert_breakpoint and default_memory_remove_breakpoint are suitable for most architectures, so in most cases these functions need not be defined.

  • decr_pc_after_break. Some architectures require the program counter to be decremented after a break, to allow the broken instruction to be executed on resumption. This element gives the number of bytes by which to decrement the address. The default value is 0, which means the program counter is left unchanged. It need only be set if the functionality is required.

    In practice this function is only of use for the very simplest architectures. It applies only to software breakpoints, not watchpoints or hardware breakpoints. It is more usual to adjust the program counter as required in the target to_wait and to_resume functions (see Section 2.4).

  • single_step_through_delay. Returns 1 if the target is executing a delay slot and a further single step is needed before the instruction finishes. The default value is NULL (not defined). This function should be implemented if the target has delay slots.

  • print_insn. Disassemble an instruction and print it. Default value is NULL (undefined). This function should be defined if disassembly of code is to be supported.

    Disassembly is a function required by the binutils library. This function is defined in the opcodes sub-directory. A suitable implementation may already exist if binutils has already been ported.
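To illustrate how a breakpoint_from_pc style function might choose between breakpoint instructions of different lengths, here is a standalone sketch for an imaginary variable length architecture. The encodings, the rule that the top bit of the first byte marks a 4 byte instruction, and the simplified signature are all invented for this example and do not describe the OpenRISC 1000 or the real GDB interface.

```c
#include <stddef.h>

typedef unsigned char toy_byte;
typedef unsigned long toy_addr;

/* Invented breakpoint encodings for an imaginary architecture.  */
static const toy_byte toy_short_break[] = { 0xde, 0xad };
static const toy_byte toy_long_break[] = { 0xde, 0xad, 0xbe, 0xef };

/* Invented rule: top bit of the first byte set means 4 bytes, else 2.  */
int
toy_insn_length (const toy_byte *mem, toy_addr pc)
{
  return (mem[pc] & 0x80) ? 4 : 2;
}

/* Return the breakpoint sequence matching the instruction at PC and
   store its length in *LENPTR.  */
const toy_byte *
toy_breakpoint_from_pc (const toy_byte *mem, toy_addr pc, int *lenptr)
{
  if (toy_insn_length (mem, pc) == 4)
    {
      *lenptr = (int) sizeof toy_long_break;
      return toy_long_break;
    }
  *lenptr = (int) sizeof toy_short_break;
  return toy_short_break;
}
```

Matching the breakpoint length to the instruction it replaces means the following instruction is never overwritten.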

2.3.5.  Specifying the Register Architecture

GDB considers registers to be a set with members numbered linearly from 0 upwards. The first part of that set corresponds to real physical registers, the second part to any "pseudo-registers". Pseudo-registers have no independent physical existence, but are useful representations of information within the architecture. For example the OpenRISC 1000 architecture has up to 32 general purpose registers, which are typically represented as 32-bit (or 64-bit) integers. However it could be convenient to define a set of pseudo-registers, to show the GPRs represented as floating point registers.

For any architecture, the implementer will decide on a mapping from hardware to GDB register numbers. The registers corresponding to real hardware are referred to as raw registers, the remaining registers are pseudo-registers. The total register set (raw and pseudo) is called the cooked register set.

2.3.5.1.  struct gdbarch Functions Specifying the Register Architecture

These functions specify the number and type of registers in the architecture.

  • read_pc and write_pc. Functions to read and write the program counter. The default value is NULL (no function available). However, if the program counter is just an ordinary register, it can be specified in struct gdbarch instead (see pc_regnum below) and it will be read or written using the standard routines to access registers. Thus these functions need only be specified if the program counter is not an ordinary register.

  • pseudo_register_read and pseudo_register_write. These functions should be defined if there are any pseudo-registers (see Section 2.2.2 and Section 2.3.5.3 for more information on pseudo-registers). The default value is NULL.

  • num_regs and num_pseudo_regs. These define the number of real and pseudo-registers. They default to -1 (undefined) and should always be explicitly defined.

  • sp_regnum, pc_regnum, ps_regnum and fp0_regnum. These specify the register holding the stack pointer, program counter, processor status and first floating point register. All except the first floating-point register (which defaults to 0) default to -1 (not defined). They may be real or pseudo-registers. sp_regnum must always be defined. If pc_regnum is not defined, then the functions read_pc and write_pc (see above) must be defined. If ps_regnum is not defined, then the $ps variable will not be available to the GDB user. fp0_regnum is not needed unless the target offers support for floating point.
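The relationship between raw and pseudo-register numbers can be sketched as follows, using the floating point view of the GPRs suggested earlier as the example. The numbering (34 raw registers followed by 32 pseudo-registers), the toy_ names and the simulated register array are assumptions for illustration, and the host is assumed to use 32-bit IEEE 754 single precision for float.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: 32 GPRs, PC and SR are raw; 32 float views are pseudo.  */
#define TOY_NUM_REGS        34
#define TOY_NUM_PSEUDO_REGS 32

/* Simulated raw register contents.  */
uint32_t toy_raw_regs[TOY_NUM_REGS];

/* Pseudo-registers are numbered directly after the raw registers.  */
int
toy_is_pseudo (int regnum)
{
  return regnum >= TOY_NUM_REGS
         && regnum < TOY_NUM_REGS + TOY_NUM_PSEUDO_REGS;
}

/* Read pseudo-register REGNUM: the bits of the underlying GPR,
   reinterpreted as a single precision float.  */
float
toy_pseudo_register_read (int regnum)
{
  uint32_t bits = toy_raw_regs[regnum - TOY_NUM_REGS];
  float f;

  memcpy (&f, &bits, sizeof f);
  return f;
}
```

No extra storage is needed for the pseudo-registers in this model: each one is computed from a raw register on demand.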

2.3.5.2.  struct gdbarch Functions Giving Register Information

These functions return information about registers.

  • register_name. This function should convert a register number (raw or pseudo) to a register name (as a C char *). This is used both to determine the name of a register for output and to work out the meaning of any register names used as input. For example with the OpenRISC 1000, GDB registers 0-31 are the General Purpose Registers, register 32 is the program counter and register 33 is the supervision register, which map to the strings "gpr00" through "gpr31", "pc" and "sr" respectively. This means that the GDB command print $gpr5 should print the value of the OR1K general purpose register 5. The default value for this function is NULL. It should always be defined.

    Historically, GDB always had a concept of a frame pointer register, which could be accessed via the GDB variable, $fp. That concept is now deprecated, recognizing that not all architectures have a frame pointer. However if an architecture does have a frame pointer register, and defines a register or pseudo-register with the name "fp", then that register will be used as the value of the $fp variable.

  • register_type. Given a register number, this function identifies the type of data it may be holding, specified as a struct type. GDB allows creation of arbitrary types, but a number of built in types are provided (builtin_type_void, builtin_type_int32 etc), together with functions to derive types from these. Typically the program counter will have a type of "pointer to function" (it points to code), the frame pointer and stack pointer will have types of "pointer to void" (they point to data on the stack) and all other integer registers will have a type of 32-bit integer or 64-bit integer. This information guides the formatting when displaying register information. The default value is NULL meaning no information is available to guide formatting when displaying registers.

  • print_registers_info. Define this function to print out one or all of the registers for the GDB info registers command. The default value is the function default_print_registers_info which uses the type information (see register_type above) to determine how each register should be printed. Define this function for fuller control over how the registers are displayed.

  • print_float_info and print_vector_info. Define these functions to provide output for the GDB info float and info vector commands respectively. The default value is NULL (not defined), meaning no information will be provided. Define each function if the target supports floating point or vector operations respectively.

  • register_reggroup_p. GDB groups registers into different categories (general, vector, floating point etc). Given a register and a group, this function returns 1 (true) if the register is in the group and 0 otherwise. The default value is the function default_register_reggroup_p which will do a reasonable job based on the type of the register (see the function register_type above), with groups for general purpose registers, floating point registers, vector registers and raw (i.e. not pseudo) registers.
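The register naming described above for the OpenRISC 1000 can be sketched as a table lookup. The function names toy_or1k_register_name and toy_or1k_register_number are hypothetical, and the signatures are simplified; only the numbering and the name strings are taken from the description of register_name.

```c
#include <stddef.h>
#include <string.h>

/* Names indexed by GDB register number: 0-31 GPRs, 32 PC, 33 SR.  */
static const char *const toy_or1k_names[] = {
  "gpr00", "gpr01", "gpr02", "gpr03", "gpr04", "gpr05", "gpr06", "gpr07",
  "gpr08", "gpr09", "gpr10", "gpr11", "gpr12", "gpr13", "gpr14", "gpr15",
  "gpr16", "gpr17", "gpr18", "gpr19", "gpr20", "gpr21", "gpr22", "gpr23",
  "gpr24", "gpr25", "gpr26", "gpr27", "gpr28", "gpr29", "gpr30", "gpr31",
  "pc", "sr"
};

#define TOY_OR1K_NUM_NAMES \
  ((int) (sizeof toy_or1k_names / sizeof toy_or1k_names[0]))

/* Register number to name; NULL for a number with no register.  */
const char *
toy_or1k_register_name (int regnum)
{
  if (regnum < 0 || regnum >= TOY_OR1K_NUM_NAMES)
    return NULL;
  return toy_or1k_names[regnum];
}

/* Name to register number, as needed to interpret user input;
   -1 if the name is not known.  */
int
toy_or1k_register_number (const char *name)
{
  int i;

  for (i = 0; i < TOY_OR1K_NUM_NAMES; i++)
    if (strcmp (toy_or1k_names[i], name) == 0)
      return i;
  return -1;
}
```

The same table serves both directions, which is why a single register_name function is enough for output and for interpreting names typed by the user.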

2.3.5.3.  Register Caching

Caching of registers is used, so that the target does not need to be accessed and reanalyzed multiple times for each register in circumstances where the register value cannot have changed.

GDB provides struct regcache, associated with a particular struct gdbarch to hold the cached values of the raw registers. A set of functions is provided to access both the raw registers (with raw in their name) and the full set of cooked registers (with cooked in their name). Functions are provided to ensure the register cache is kept synchronized with the values of the actual registers in the target.

Accessing registers through the struct regcache routines will ensure that the appropriate struct gdbarch functions are called when necessary to access the underlying target architecture. In general users should use the "cooked" functions, since these will map to the "raw" functions automatically as appropriate.

The two key functions are regcache_cooked_read and regcache_cooked_write which read or write a register to or from a byte buffer (type gdb_byte *). For convenience the wrapper functions regcache_cooked_read_signed, regcache_cooked_read_unsigned, regcache_cooked_write_signed and regcache_cooked_write_unsigned are provided, which read or write the value and convert to or from a value as appropriate.
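The caching behavior can be modeled with a small standalone sketch: a buffer per register, a validity flag, and a wrapper that converts the byte buffer to an unsigned value. All names are hypothetical, the target is simulated by an array, and big endian byte order is assumed for the conversion.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NUM_REGS 4
#define TOY_REG_SIZE 4

/* Simulated target registers (big endian bytes) and an access counter.  */
unsigned char toy_target_regs[TOY_NUM_REGS][TOY_REG_SIZE];
int toy_target_fetches;

/* Hypothetical miniature of struct regcache.  */
struct toy_regcache
{
  unsigned char buf[TOY_NUM_REGS][TOY_REG_SIZE];
  int valid[TOY_NUM_REGS];
};

/* Read a register into OUT, fetching from the target only if needed.  */
void
toy_regcache_read (struct toy_regcache *rc, int regnum, unsigned char *out)
{
  if (!rc->valid[regnum])
    {
      memcpy (rc->buf[regnum], toy_target_regs[regnum], TOY_REG_SIZE);
      rc->valid[regnum] = 1;
      toy_target_fetches++;
    }
  memcpy (out, rc->buf[regnum], TOY_REG_SIZE);
}

/* Convenience wrapper: read and convert from big endian bytes.  */
uint32_t
toy_regcache_read_unsigned (struct toy_regcache *rc, int regnum)
{
  unsigned char b[TOY_REG_SIZE];

  toy_regcache_read (rc, regnum, b);
  return (uint32_t) b[0] << 24 | (uint32_t) b[1] << 16
         | (uint32_t) b[2] << 8 | (uint32_t) b[3];
}

/* Discard all cached values, for example after the target has run.  */
void
toy_regcache_invalidate (struct toy_regcache *rc)
{
  memset (rc->valid, 0, sizeof rc->valid);
}
```

Repeated reads of the same register cost one target access until the cache is invalidated, which is the saving the cache exists to provide.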

2.3.6.  Specifying Frame Handling

GDB needs to understand the stack on which local (automatic) variables are stored. The area of the stack containing all the local variables for a function invocation is known as the stack frame for that function (or colloquially just as the "frame"). In turn the function that called the function will have its stack frame, and so on back through the chain of functions that have been called.

Almost all architectures have one register dedicated to point to the end of the stack (the stack pointer). Many have a second register which points to the start of the currently active stack frame (the frame pointer). The specific arrangements for an architecture are a key part of the ABI.

A diagram helps to explain this. Here is a simple program to compute factorials:

 1:   #include <stdio.h>
 2:   
 3:   int fact( int  n )
 4:   {
 5:     if( 0 == n ) {
 6:       return 1;
 7:     }
 8:     else {
 9:       return n * fact( n - 1 );
10:     }
11:   }
12:   
13:   int main()
14:   {
15:     int  i;
16:   
17:     for( i = 0 ; i < 10 ; i++ ) {
18:       int   f = fact( i );
19:       printf( "%d! = %d\n", i, f );
20:     }
21:   }
	  

Consider the state of the stack when the code reaches line 6 after the main program has called fact (3). The chain of function calls will be main, fact (3), fact (2), fact (1) and fact (0). In this example the stack is falling (as used by the OpenRISC 1000 ABI). The stack pointer (SP) is at the end of the stack (lowest address) and the frame pointer (FP) is at the highest address in the current stack frame. Figure 2.1 shows how the stack looks.

Figure 2.1.  An example stack frame


In each stack frame, offset 0 from the stack pointer is the frame pointer of the previous frame and offset 4 (this is illustrating a 32-bit architecture) from the stack pointer is the return address. Local variables are indexed from the frame pointer, with negative indexes. In the function fact, offset -4 from the frame pointer is the argument n. In the main function, offset -4 from the frame pointer is the local variable i and offset -8 from the frame pointer is the local variable f.

[Note]Note

This is a simplified example for illustrative purposes only. Good optimizing compilers would not put anything on the stack for such simple functions. Indeed they might eliminate the recursion and use of the stack entirely!
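
The layout just described is enough to walk from one frame to its caller. The following standalone sketch models a falling stack as an array of 32-bit words. It is not GDB code: all names are invented, and two conventions are assumptions of the sketch rather than statements from the ABI, namely that the caller's stack pointer equals THIS frame's frame pointer, and that a saved frame pointer of 0 marks the outermost frame.

```c
#include <stdint.h>

enum { TOY_STACK_WORDS = 64 };   /* simulated memory, one word = 4 bytes */

struct toy_frame { uint32_t sp, fp; };

/* Read the 32-bit word at byte address addr in the simulated memory. */
static uint32_t toy_read_word(const uint32_t *mem, uint32_t addr)
{
  return mem[addr / 4];
}

/* Given one frame, compute the frame of its caller.  The saved frame
   pointer is at SP+0; the caller's SP is assumed to be this frame's FP. */
struct toy_frame toy_caller_frame(const uint32_t *mem, struct toy_frame f)
{
  struct toy_frame prev;
  prev.fp = toy_read_word(mem, f.sp + 0);
  prev.sp = f.fp;
  return prev;
}

/* Return address of a frame, stored at SP+4. */
uint32_t toy_return_address(const uint32_t *mem, struct toy_frame f)
{
  return toy_read_word(mem, f.sp + 4);
}

/* Count the frames from f outwards.  A saved FP of 0 marks the
   outermost frame (an invented convention for this sketch). */
int toy_depth(const uint32_t *mem, struct toy_frame f)
{
  int n = 1;
  while (toy_read_word(mem, f.sp) != 0) {
    f = toy_caller_frame(mem, f);
    n++;
  }
  return n;
}
```

A real debugger cannot rely on such a fixed layout, which is why the frame analysis functions described in the following sections exist.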

It is very easy to get confused when examining stacks. GDB has terminology it uses rigorously throughout. The stack frame of the function currently executing, or where execution stopped, is numbered zero. In this example frame #0 is the stack frame of the call to fact (0). The stack frame of its calling function (fact (1) in this case) is numbered #1 and so on back through the chain of calls.

The main GDB data structure describing frames is struct frame_info. It is not used directly, but only via its accessor functions. struct frame_info includes information about the registers in the frame and a pointer to the code of the function with which the frame is associated. The entire stack is represented as a linked list of struct frame_info.

2.3.6.1.  Frame Handling Terminology

It is easy to get confused when referencing stack frames. GDB uses some precise terminology.

  • THIS frame is the frame currently under consideration.

  • The NEXT frame, also sometimes called the inner or newer frame, is the frame of the function called by the function of THIS frame.

  • The PREVIOUS frame, also sometimes called the outer or older frame, is the frame of the function which called the function of THIS frame.

So in the example of Figure 2.1, if THIS frame is #3 (the call to fact (3)), the NEXT frame is frame #2 (the call to fact (2)) and the PREVIOUS frame is frame #4 (the call to main ()).

The innermost frame is the frame of the currently executing function, or where the program stopped (in this example, in the middle of the call to fact (0)). It is always numbered frame #0.

The base of a frame is the address immediately before the start of the NEXT frame. For a falling stack this will be the lowest address and for a rising stack this will be the highest address in the frame.

GDB functions to analyze the stack are typically given a pointer to the NEXT frame to determine information about THIS frame. Information about THIS frame includes data on where the registers of the PREVIOUS frame are stored in this stack frame. In this example the frame pointer of the PREVIOUS frame is stored at offset 0 from the stack pointer of THIS frame.

The process whereby a function is given a pointer to the NEXT frame to work out information about THIS frame is referred to as unwinding. The GDB functions involved in this typically include unwind in their name.

The process of analyzing a target to determine the information that should go in struct frame_info is called sniffing. The functions that carry this out are called sniffers and typically include sniffer in their name. More than one sniffer may be required to extract all the information for a particular frame.

Because so many functions work using the NEXT frame, there is an issue about addressing the innermost frame—it has no NEXT frame. To solve this GDB creates a dummy frame #-1, known as the sentinel frame.
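
The relationship between THIS, NEXT and PREVIOUS frames, and the role of the sentinel, can be pictured as a doubly linked list. The sketch below is a standalone illustration with invented names (toy_frame_info, toy_link_frames, toy_find_level). It is not the layout of GDB's struct frame_info, which is only accessed through its accessor functions.

```c
#include <stddef.h>

struct toy_frame_info {
  int level;                      /* -1 for the sentinel, 0 for innermost */
  struct toy_frame_info *next;    /* inner (newer) frame: the callee */
  struct toy_frame_info *prev;    /* outer (older) frame: the caller */
};

/* Link an array of n frames so that frames[0] is the sentinel (#-1),
   frames[1] is frame #0, and so on outwards. */
void toy_link_frames(struct toy_frame_info *frames, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    frames[i].level = (int) i - 1;
    frames[i].next = (i > 0) ? &frames[i - 1] : NULL;
    frames[i].prev = (i + 1 < n) ? &frames[i + 1] : NULL;
  }
}

/* Find the frame with a given level by walking outwards from the
   sentinel.  Returns NULL if there is no such frame. */
struct toy_frame_info *toy_find_level(struct toy_frame_info *sentinel,
                                      int level)
{
  struct toy_frame_info *f = sentinel;
  while (f != NULL && f->level != level)
    f = f->prev;
  return f;
}
```

With six entries (the sentinel plus frames #0 to #4, as in the factorial example), frame #3 has frame #2 as its NEXT frame and frame #4 as its PREVIOUS frame, while the NEXT frame of frame #0 is the sentinel rather than NULL.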

2.3.6.2.  Prologue Caches

All the frame sniffing functions typically examine the code at the start of the corresponding function, to determine the state of registers. The ABI will save old values and set new values of key registers at the start of each function in what is known as the function prologue.

For any particular stack frame this data does not change, so all the standard unwinding functions, in addition to receiving a pointer to the NEXT frame as their first argument, receive a pointer to a prologue cache as their second argument. This can be used to store values associated with a particular frame, for reuse on subsequent calls involving the same frame.

It is up to the user to define the structure used (it is a void * pointer) and arrange allocation and deallocation of storage. However for general use, GDB provides struct trad_frame_cache, with a set of accessor routines. This structure holds the stack and code address of THIS frame, the base address of the frame, a pointer to the struct frame_info for the NEXT frame and details of where the registers of the PREVIOUS frame may be found in THIS frame.

Typically the first time any sniffer function is called with the NEXT frame, the prologue cache for THIS frame will be NULL. The sniffer will analyze the frame, allocate a prologue cache structure and populate it. Subsequent calls using the same NEXT frame will pass in this prologue cache, so the data can be returned with no additional analysis.
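
The pattern of a cache that is created on first use and then shared by several functions can be sketched as follows. This is a standalone illustration and not GDB code: the structure toy_prologue_cache, the function names, the use of malloc and the values produced by the "analysis" are all assumptions made for the example. A real port might use struct trad_frame_cache and GDB's own allocation routines instead.

```c
#include <stdint.h>
#include <stdlib.h>

struct toy_prologue_cache {
  uint32_t frame_base;   /* base address of THIS frame */
  uint32_t func_start;   /* code address of the function of THIS frame */
};

static int toy_analysis_count = 0;   /* how often the costly analysis ran */

/* Stand-in for analyzing the function prologue (values are invented). */
static void toy_analyze_prologue(uint32_t next_sp,
                                 struct toy_prologue_cache *c)
{
  c->frame_base = next_sp + 16;
  c->func_start = 0x2000;
  toy_analysis_count++;
}

/* Return the cache for THIS frame, creating it on first use.  this_cache
   points at a slot owned by the caller, which is initially NULL. */
struct toy_prologue_cache *toy_frame_cache(uint32_t next_sp,
                                           void **this_cache)
{
  if (*this_cache == NULL) {
    struct toy_prologue_cache *c = malloc(sizeof *c);
    if (c == NULL)
      return NULL;
    toy_analyze_prologue(next_sp, c);
    *this_cache = c;
  }
  return *this_cache;
}

/* Two "unwinding functions" that share the same cache. */
uint32_t toy_this_base(uint32_t next_sp, void **this_cache)
{
  struct toy_prologue_cache *c = toy_frame_cache(next_sp, this_cache);
  return c ? c->frame_base : 0;
}

uint32_t toy_this_func(uint32_t next_sp, void **this_cache)
{
  struct toy_prologue_cache *c = toy_frame_cache(next_sp, this_cache);
  return c ? c->func_start : 0;
}

int toy_analysis_runs(void) { return toy_analysis_count; }
```

Calling toy_this_base and then toy_this_func with the same cache slot runs the analysis once only. The second call finds the slot already populated.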

2.3.6.3.  struct gdbarch Functions to Analyze Frames

These struct gdbarch functions and value provide analysis of the stack frame and allow it to be adjusted as required.

  • skip_prologue. The prologue of a function is the code at the beginning of the function which sets up the stack frame, saves the return address etc. The code representing the behavior of the function starts after the prologue.

    This function skips past the prologue of a function if the program counter is within the prologue of a function. With modern optimizing compilers, this may be a far from trivial exercise. However the required information may be within the binary as DWARF2 debugging information, making the job much easier.

    The default value is NULL (not defined). This function should always be provided, but can take advantage of DWARF2 debugging information, if that is available.

  • inner_than. Given two frame or stack pointers, return 1 (true) if the first represents the "inner" stack frame and 0 (false) otherwise. This is used to determine whether the target has a rising or a falling stack frame. See Section 2.3.6 for an explanation of "inner" frames.

    The default value of this function is NULL and it should always be defined. However for almost all architectures one of the built-in functions can be used: core_addr_lessthan (for falling stacks) or core_addr_greaterthan (for rising stacks).

  • frame_align. The architecture may have constraints on how its frames are aligned. Given a proposed address for the stack pointer, this function returns a suitably aligned address (by expanding the stack frame). The default value is NULL (undefined). This function should be defined for any architecture where it is possible the stack could become misaligned. The utility functions align_down (for falling stacks) and align_up (for rising stacks) will facilitate the implementation of this function.

  • frame_red_zone_size. Some ABIs reserve space beyond the end of the stack for use by leaf functions without prologue or epilogue or by exception handlers (OpenRISC 1000 is in this category). This is known as a red zone (AMD terminology). The default value is 0. Set this field if the architecture has such a red zone.
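
The comparison and alignment helpers mentioned above are simple to picture. The following standalone sketch reimplements their behavior under invented names (toy_addr_lessthan, toy_align_down and so on) so that it compiles without the GDB sources. The type toy_addr and the 4-byte alignment used in toy_frame_align are assumptions for illustration, not values taken from any ABI.

```c
#include <stdint.h>

typedef uint32_t toy_addr;   /* stand-in for GDB's address type */

/* Falling stack: the inner frame has the lower address. */
int toy_addr_lessthan(toy_addr lhs, toy_addr rhs)
{
  return lhs < rhs;
}

/* Rising stack: the inner frame has the higher address. */
int toy_addr_greaterthan(toy_addr lhs, toy_addr rhs)
{
  return lhs > rhs;
}

/* Round addr down to a multiple of align (a power of two). */
toy_addr toy_align_down(toy_addr addr, toy_addr align)
{
  return addr & ~(align - 1);
}

/* Round addr up to a multiple of align (a power of two). */
toy_addr toy_align_up(toy_addr addr, toy_addr align)
{
  return (addr + align - 1) & ~(align - 1);
}

/* A frame_align style function for a falling stack with an assumed
   4-byte alignment: moving the stack pointer down expands the frame. */
toy_addr toy_frame_align(toy_addr sp)
{
  return toy_align_down(sp, 4);
}
```

For a falling stack the proposed stack pointer is rounded down, so a misaligned value such as 0x7ffe becomes 0x7ffc and the frame grows by two bytes rather than shrinking.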

2.3.6.4.  struct gdbarch Functions to Access Frame Data

These functions provide access to key registers and arguments in the stack frame.

  • unwind_pc and unwind_sp. These functions are given a pointer to THIS stack frame (see Section 2.3.6 for how frames are represented) and return the value of the program counter and stack pointer respectively in the PREVIOUS frame (i.e. the frame of the function that called this one).

  • frame_num_args. Given a pointer to THIS stack frame (see Section 2.3.6 for how frames are represented), return the number of arguments that are being passed, or -1 if not known. The default value is NULL (undefined), in which case the number of arguments passed on any stack frame is always unknown. For many architectures this will be a suitable default.

2.3.6.5.  struct gdbarch Functions Creating Dummy Frames

GDB can call functions in the target code (for example by using the call or print commands). These functions may be breakpointed, and it is essential that if a function does hit a breakpoint, commands like backtrace work correctly.

This is achieved by making the stack look as though the function had been called from the point where GDB had previously stopped. This requires that GDB can set up stack frames appropriate for such function calls.

The following functions provide the functionality to set up such "dummy" stack frames.

  • push_dummy_call. This function sets up a dummy stack frame for the function about to be called. push_dummy_call is given the arguments to be passed and must copy them into registers or push them on to the stack as appropriate for the ABI. GDB will then pass control to the target at the address of the function, and it will find the stack and registers set up just as expected.

    The default value of this function is NULL (undefined). If the function is not defined, then GDB will not allow the user to call functions within the target being debugged.

  • unwind_dummy_id. This is the inverse of push_dummy_call. It restores the stack and frame pointers after a call to evaluate a function using a dummy stack frame. The default value is NULL (undefined). If push_dummy_call is defined, then this function should also be defined.

  • push_dummy_code. If this function is not defined (its default value is NULL), a dummy call will use the entry point of the target as its return address. A temporary breakpoint will be set there, so the location must be writable and have room for a breakpoint.

    It is possible that this default is not suitable. It might not be writable (in ROM possibly), or the ABI might require code to be executed on return from a call to unwind the stack before the breakpoint is encountered.

    If either of these is the case, then push_dummy_code should be defined to push an instruction sequence onto the end of the stack to which the dummy call should return.

    [Note]Note

    This does require that code in the stack can be executed. Some Harvard architectures may not allow this.

2.3.6.6.  Analyzing Stacks: Frame Sniffers

When a program stops, GDB needs to construct the chain of struct frame_info representing the state of the stack using appropriate sniffers.

Each architecture requires appropriate sniffers, but they do not form entries in struct gdbarch, since more than one sniffer may be required and a sniffer may be suitable for more than one struct gdbarch. Instead sniffers are associated with architectures using the following functions.

  • frame_unwind_append_sniffer is used to add a new sniffer to analyze THIS frame when given a pointer to the NEXT frame.

  • frame_base_append_sniffer is used to add a new sniffer which can determine information about the base of a stack frame.

  • frame_base_set_default is used to specify the default base sniffer.

These functions all take a reference to struct gdbarch, so they are associated with a specific architecture. They are usually called in the struct gdbarch initialization function, after the struct gdbarch has been set up. Unless a default has been set, the most recently appended sniffer will be tried first.

The main frame unwinding sniffer (as set by frame_unwind_append_sniffer) returns a structure specifying a set of sniffing functions:

struct frame_unwind
{
  enum frame_type            type;
  frame_this_id_ftype       *this_id;
  frame_prev_register_ftype *prev_register;
  const struct frame_data   *unwind_data;
  frame_sniffer_ftype       *sniffer;
  frame_prev_pc_ftype       *prev_pc;
  frame_dealloc_cache_ftype *dealloc_cache;
};
	    

The type field indicates the type of frame this sniffer can handle: normal, dummy (see push_dummy_call in Section 2.3), signal handler or sentinel. Signal handlers sometimes have their own simplified stack structure for efficiency, so may need their own handlers.

unwind_data holds additional information which may be relevant to particular types of frame. For example it may hold additional information for signal handler frames.

The remaining fields define functions that yield different types of information when given a pointer to the NEXT stack frame. Not all functions need be provided. If an entry is NULL, the next sniffer will be tried instead.

  • this_id determines the stack pointer and function (code entry point) for THIS stack frame.

  • prev_register determines where the values of registers for the PREVIOUS stack frame are stored in THIS stack frame.

  • sniffer takes a look at THIS frame's registers to determine if this is the appropriate unwinder.

  • prev_pc determines the program counter for THIS frame. Only needed if the program counter is not an ordinary register (see prev_pc in Section 2.3).

  • dealloc_cache frees any additional memory associated with the prologue cache for this frame (see Section 2.3.6.2).

In general it is only the this_id and prev_register functions that need be defined for custom sniffers.

The frame base sniffer is much simpler. It is a struct frame_base, which refers to the corresponding struct frame_unwind and provides functions yielding various addresses within the frame.

struct frame_base
{
  const struct frame_unwind *unwind;
  frame_this_base_ftype     *this_base;
  frame_this_locals_ftype   *this_locals;
  frame_this_args_ftype     *this_args;
};
	    

All these functions take a pointer to the NEXT frame as argument. this_base returns the base address of THIS frame, this_locals returns the base address of local variables in THIS frame and this_args returns the base address of the function arguments in this frame.

As described above the base address of a frame is the address immediately before the start of the NEXT frame. For a falling stack, this is the lowest address in the frame and for a rising stack it is the highest address in the frame. For most architectures the same address is also the base address for local variables and arguments, in which case the same function can be used for all three entries.

It is worth noting that if it cannot be determined in any other way (for example by there being a register with the name "fp"), then the result of the this_base function will be used as the value of the frame pointer variable $fp in GDB.
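
The way several sniffers are tried in turn until one accepts a frame can be modelled in a few lines. The sketch below is standalone and uses invented names throughout (toy_unwind, toy_arch, toy_append_sniffer, toy_find_unwinder). The real struct frame_unwind has more fields, as shown above, and the address ranges used to recognize each kind of frame are made up. The search order follows the description above: most recently appended first.

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAX_SNIFFERS = 8 };

struct toy_unwind {
  const char *name;
  uint32_t (*this_id)(uint32_t next_sp);  /* stack address of THIS frame */
};

/* A sniffer inspects the PC and returns an unwinder, or NULL to decline. */
typedef const struct toy_unwind *(*toy_sniffer)(uint32_t pc);

struct toy_arch {
  toy_sniffer sniffers[TOY_MAX_SNIFFERS];
  size_t      count;
};

/* Add a sniffer to the architecture; 0 on success, -1 if the table is full. */
int toy_append_sniffer(struct toy_arch *arch, toy_sniffer s)
{
  if (arch->count >= TOY_MAX_SNIFFERS)
    return -1;
  arch->sniffers[arch->count++] = s;
  return 0;
}

/* Try the sniffers, most recently appended first, until one accepts. */
const struct toy_unwind *toy_find_unwinder(const struct toy_arch *arch,
                                           uint32_t pc)
{
  for (size_t i = arch->count; i-- > 0; ) {
    const struct toy_unwind *u = arch->sniffers[i](pc);
    if (u != NULL)
      return u;
  }
  return NULL;
}

/* Two example unwinders with invented frame layouts. */
static uint32_t toy_normal_id(uint32_t next_sp)   { return next_sp + 8; }
static uint32_t toy_sigtramp_id(uint32_t next_sp) { return next_sp + 64; }

static const struct toy_unwind toy_normal   = { "normal", toy_normal_id };
static const struct toy_unwind toy_sigtramp = { "sigtramp", toy_sigtramp_id };

/* General purpose sniffer: accepts any frame. */
const struct toy_unwind *toy_normal_sniffer(uint32_t pc)
{
  (void) pc;
  return &toy_normal;
}

/* Accepts only frames whose PC lies in an invented trampoline area. */
const struct toy_unwind *toy_sigtramp_sniffer(uint32_t pc)
{
  return (pc >= 0xf000) ? &toy_sigtramp : NULL;
}
```

Appending the general purpose sniffer first and the signal handler sniffer second means the more specific sniffer gets the first look at each frame, with the general one acting as a fallback.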

2.4.  Target Operations

Communication with the target is handled by a set of target operations. These operations are held in a struct target_ops, together with flags describing the behavior of the target. The struct target_ops elements are defined and documented in target.h. The following sections describe the most important of these functions.

2.4.1.  Target Strata

GDB has several different types of target: executable files, core dumps, executing processes etc. At any time, GDB may have several sets of target operations in use. For example target operations for use with an executing process (which can run code) might be different from the operations used when inspecting a core dump.

All the targets GDB knows about are held in a stack. GDB walks down the stack to find the set of target operations suitable for use. The stack is organized as a series of strata of decreasing importance: target operations for threads, then target operations suitable for processes, target operations to download remote targets, target operations for core dumps, target operations for executable files and at the bottom target operations for dummy targets. So when debugging a running process, GDB will always select target operations from the process_stratum if available, in preference to target operations from the file stratum, even if the target operations from the file stratum were pushed onto the stack more recently.

At any particular time, there is a current target, held in the global variable current_target. This can never be NULL—if there is no other target available, it will point to the dummy target.

target.h defines a set of convenience macros to access functions and values in the current_target. Thus current_target->to_xyz can be accessed as target_xyz.
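
The effect of the strata can be shown with a small standalone model. The names below (toy_stratum, toy_target, toy_push_target, toy_find_reader) are invented for this sketch and do not correspond to GDB's declarations. Replacing an existing target in the same stratum is likewise an assumption of the sketch.

```c
#include <stddef.h>

/* Strata in increasing order of priority, following the list above. */
enum toy_stratum {
  TOY_DUMMY, TOY_FILE, TOY_CORE, TOY_DOWNLOAD, TOY_PROCESS, TOY_THREAD
};

struct toy_target {
  const char        *shortname;
  enum toy_stratum   stratum;
  int              (*read_memory)(unsigned addr);  /* NULL if unsupported */
  struct toy_target *beneath;                      /* next target down */
};

/* Push t so that the stack stays sorted, highest stratum on top.  A
   target already present in the same stratum is replaced.  Returns the
   new top of the stack. */
struct toy_target *toy_push_target(struct toy_target *top,
                                   struct toy_target *t)
{
  struct toy_target **link = &top;
  while (*link != NULL && (*link)->stratum > t->stratum)
    link = &(*link)->beneath;
  if (*link != NULL && (*link)->stratum == t->stratum)
    t->beneath = (*link)->beneath;
  else
    t->beneath = *link;
  *link = t;
  return top;
}

/* Walk down from the top to the first target that can read memory. */
struct toy_target *toy_find_reader(struct toy_target *top)
{
  while (top != NULL && top->read_memory == NULL)
    top = top->beneath;
  return top;
}

/* Trivial memory reader used to populate example targets. */
int toy_read_stub(unsigned addr)
{
  return (int) (addr & 0xff);
}
```

If a process target is pushed first and a file target afterwards, the process target still ends up on top, so a memory read is directed to the running process even though the file target was pushed more recently.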

2.4.2.  Specifying a New Target

Some targets (sets of target operations in a struct target_ops) are set up automatically by GDB. These include the operations to drive simulators (see Section 2.6) and the operations to drive the GDB Remote Serial Protocol (RSP) (see Section 2.7).

Other targets must be set up explicitly by the implementer, using the add_target function. By far the most common is the native target for native debugging of the host. Less common is to set up a non-native target, such as the JTAG target used with the OpenRISC 1000[1].

2.4.2.1.  Native Targets

A new native target is created by defining a function _initialize_arch_os_nat for the architecture, arch and operating system os, in the source file arch-os-nat.c. A fragment of a makefile to create the binary from the source is created in the file config/arch/os.mh with a header giving any macro definitions etc in config/arch/nm-os.h (which will be linked to nm.h at build time).

The _initialize_ function should create a new struct target_ops and call add_target to add this target to the list of available targets.

For new native targets there are standard implementations which can be reused, with just one or two changes. For example the function linux_trad_target returns a struct target_ops suitable for most Linux native targets. It may prove necessary only to alter the description field and the functions to fetch and store registers.

2.4.2.2.  Remote Targets

For a new remote target, the procedure is a little simpler. The source files should be added to configure.tgt, just as for the architectural description (see Section 2.3). Within the source file, define a new function _initialize_remote_arch to implement a new remote target, arch.

For new remote targets, the definitions in remote.c used to implement the RSP provide a good starting point.

2.4.3.  struct target_ops Functions and Variables Providing Information

These functions and variables provide information about the target. The first group identifies the name of the target and provides help information for the user.

  • to_shortname. This string is the name of the target, for use with GDB's target command. Setting to_shortname to foo means that target foo will connect to the target, invoking to_open for this target (see below).

  • to_longname. A string giving a brief description of the type of target. This is printed with the info target information (see also to_files_info below).

  • to_doc. The help text for this target. If the short name of the target is foo, then the command help target will print target foo followed by the first sentence of this help text. The command help target foo will print out the complete text.

  • to_files_info. This function provides additional information for the info target command.

The second group of variables provides information about the current state of the target.

  • to_stratum. An enumerated constant indicating to which stratum this struct target_ops belongs.

  • to_has_all_memory. Boolean indicating if the target includes all of memory, or only part of it. If only part, then a failed memory request may be able to be satisfied by a different target in the stack.

  • to_has_memory. Boolean indicating if the target has memory (dummy targets do not).

  • to_has_stack. Boolean indicating if the target has a stack. Object files do not, core dumps and executable threads/processes do.

  • to_has_registers. Boolean indicating if the target has registers. Object files do not, core dumps and executable threads/processes do.

  • to_has_execution. Boolean indicating if the target is currently executing. For some targets that is the same as if they are capable of execution. However some remote targets can be in the position where they are not executing until create_inferior or attach is called.

2.4.4.  struct target_ops Functions Controlling the Target Connection

These functions control the connection to the target. For remote targets this may mean establishing and tearing down links using protocols such as TCP/IP. For native targets, these functions will be more concerned with setting flags describing the state.

  • to_open. This function is invoked by the GDB target command. Any additional arguments (beyond the name of the target being invoked) are passed to this function. to_open should establish the communications with the target. It should establish the state of the target (is it already running for example), and initialize data structures appropriately.

    This function should not start the target running if it is not currently running—that is the job of the functions (to_create_inferior and to_resume) invoked by the GDB run command.

  • to_xclose and to_close. Both these functions should close the remote connection. to_close is the legacy function. New implementations should use to_xclose which should also free any memory allocated for this target.

  • to_attach. For targets which can run without a debugger connected, this function attaches the debugger to a running target (which should first have been opened).

  • to_detach. Function to detach from a target, leaving it running.

  • to_disconnect. This is similar to to_detach, but makes no effort to inform the target that the debugger is detaching. It should just drop the connection to the target.

  • to_terminal_inferior. This function connects the target's terminal I/O to the local terminal. This functionality is not always available with remote targets.

  • to_rcmd. If the target is capable of running commands, then this function requests that command to be run on the target. This is of most relevance to remote targets.
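
The link between a target's short name, its registration and its to_open function can be sketched in isolation. The code below is a standalone model and not GDB's implementation: the structure toy_target_ops keeps only a handful of fields, and the function signatures, the registry and the hypothetical target foo are all invented for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { TOY_MAX_TARGETS = 8 };

struct toy_target_ops {
  const char *to_shortname;            /* name used with "target NAME" */
  const char *to_longname;             /* brief description */
  int  (*to_open)(const char *args);   /* 0 on success, -1 on failure */
  void (*to_close)(void);
};

static struct toy_target_ops *toy_targets[TOY_MAX_TARGETS];
static size_t toy_target_count = 0;

/* Register a target so that it can later be selected by name. */
int toy_add_target(struct toy_target_ops *t)
{
  if (toy_target_count >= TOY_MAX_TARGETS)
    return -1;
  toy_targets[toy_target_count++] = t;
  return 0;
}

/* Model of "target NAME ARGS": find the target and invoke its to_open. */
int toy_target_command(const char *name, const char *args)
{
  for (size_t i = 0; i < toy_target_count; i++)
    if (strcmp(toy_targets[i]->to_shortname, name) == 0)
      return toy_targets[i]->to_open(args);
  return -1;   /* no such target */
}

/* A hypothetical remote target "foo" that records its connection string. */
static char toy_foo_port[32];
static int  toy_foo_connected = 0;

static int toy_foo_open(const char *args)
{
  if (args == NULL || *args == '\0')
    return -1;                          /* a port argument is required */
  strncpy(toy_foo_port, args, sizeof toy_foo_port - 1);
  toy_foo_port[sizeof toy_foo_port - 1] = '\0';
  toy_foo_connected = 1;                /* connected, but not started */
  return 0;
}

static void toy_foo_close(void) { toy_foo_connected = 0; }

struct toy_target_ops toy_foo_ops = {
  "foo", "Hypothetical foo target", toy_foo_open, toy_foo_close
};

int toy_foo_is_connected(void)     { return toy_foo_connected; }
const char *toy_foo_port_name(void) { return toy_foo_port; }
```

Note that toy_foo_open only establishes the connection and records state. In keeping with the description of to_open above, it does not start the target running.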

2.4.5.  struct target_ops Functions to Access Memory and Registers

These functions transfer data to and from the target registers and memory.

  • to_fetch_registers and to_store_registers. Functions to populate the register cache with values from the target and to set target registers with values in the register cache.

  • to_prepare_to_store. This function is called prior to storing registers to set up any additional information required. In most cases it will be an empty function.

  • to_load. Load a file into the target. For most implementations, the generic function generic_load, which reuses the other target operations for memory access, is suitable.

  • to_xfer_partial. This function is a generic function to transfer data to and from the target. Its most important function (often the only one actually implemented) is to load and store data from and to target memory.
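
The word "partial" is significant: a transfer function of this kind may move fewer bytes than requested, leaving the caller to repeat the request. The following standalone sketch shows that calling pattern against a simulated memory. The function names, the argument list, the 16-byte chunk limit and the return conventions are assumptions of the sketch and are not the actual to_xfer_partial interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_MEM_SIZE = 64, TOY_MAX_CHUNK = 16 };

static uint8_t toy_memory[TOY_MEM_SIZE];   /* simulated target memory */

/* Transfer at most TOY_MAX_CHUNK bytes.  Exactly one of readbuf and
   writebuf is non-NULL.  Returns the number of bytes moved, or -1 if
   offset lies outside the memory. */
long toy_xfer_partial(uint8_t *readbuf, const uint8_t *writebuf,
                      size_t offset, size_t len)
{
  if (offset >= TOY_MEM_SIZE)
    return -1;
  if (len > TOY_MAX_CHUNK)
    len = TOY_MAX_CHUNK;
  if (len > TOY_MEM_SIZE - offset)
    len = TOY_MEM_SIZE - offset;
  if (readbuf != NULL)
    memcpy(readbuf, toy_memory + offset, len);
  else
    memcpy(toy_memory + offset, writebuf, len);
  return (long) len;
}

/* Caller-side loop: repeat partial reads until len bytes have arrived.
   Returns 0 on success and -1 on failure. */
int toy_read_memory(size_t offset, uint8_t *buf, size_t len)
{
  size_t done = 0;
  while (done < len) {
    long n = toy_xfer_partial(buf + done, NULL, offset + done, len - done);
    if (n <= 0)
      return -1;
    done += (size_t) n;
  }
  return 0;
}

/* The same loop for writes. */
int toy_write_memory(size_t offset, const uint8_t *buf, size_t len)
{
  size_t done = 0;
  while (done < len) {
    long n = toy_xfer_partial(NULL, buf + done, offset + done, len - done);
    if (n <= 0)
      return -1;
    done += (size_t) n;
  }
  return 0;
}
```

Keeping the chunking in the low-level function and the looping in the caller lets a target with a small packet size present the same interface as one that can move any amount of data in a single request.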

2.4.6.  struct target_ops Functions to Handle Breakpoints and Watchpoints

For all targets, GDB can implement breakpoints and write access watchpoints in software, by inserting code in the target. However many targets provide hardware assistance for these functions which is far more efficient, and in addition may implement read access watchpoints.

These functions in struct target_ops provide a mechanism to access such functionality if it is available.

  • to_insert_breakpoint and to_remove_breakpoint. These functions insert and remove breakpoints on the target. They can choose to use either hardware or software breakpoints. However if the insert function allows use of hardware breakpoints, then the GDB command set breakpoint auto-hw off will have no effect.

  • to_can_use_hw_breakpoint. This function should return 1 (true) if the target can set a hardware breakpoint or watchpoint and 0 otherwise. The function is passed an enumeration to indicate whether watchpoints or breakpoints are being queried, and should use information about the number of hardware breakpoints/watchpoints currently in use to determine if a breakpoint/watchpoint can be set.

  • to_insert_hw_breakpoint and to_remove_hw_breakpoint. Functions to insert and remove hardware breakpoints. Return a failure result if no hardware breakpoint is available.

  • to_insert_watchpoint and to_remove_watchpoint. Functions to insert and remove watchpoints.

  • to_stopped_by_watchpoint. Function returns 1 (true) if the last stop was due to a watchpoint.

  • to_stopped_data_address. If the last stop was due to a watchpoint, this function returns the address of the data which triggered the watchpoint.
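
Hardware breakpoint support is largely a matter of tracking a scarce resource. The sketch below models a target with two address comparators shared between breakpoints and watchpoints. The number of comparators, the sharing arrangement, the function names and the return conventions are all invented for illustration and do not describe any particular target or the real struct target_ops signatures.

```c
#include <stdint.h>

enum toy_bp_kind { TOY_HW_BREAKPOINT, TOY_HW_WATCHPOINT };

enum { TOY_NUM_HW_UNITS = 2 };   /* invented: two shared comparators */

static uint32_t toy_hw_addr[TOY_NUM_HW_UNITS];
static int      toy_hw_used[TOY_NUM_HW_UNITS];

/* Number of comparators currently in use. */
static int toy_hw_in_use(void)
{
  int n = 0;
  for (int i = 0; i < TOY_NUM_HW_UNITS; i++)
    n += toy_hw_used[i];
  return n;
}

/* Return 1 if another hardware breakpoint or watchpoint can be set. */
int toy_can_use_hw_breakpoint(enum toy_bp_kind kind)
{
  (void) kind;   /* both kinds share the comparators in this sketch */
  return toy_hw_in_use() < TOY_NUM_HW_UNITS;
}

/* Claim a free comparator; 0 on success, -1 if none is available. */
int toy_insert_hw_breakpoint(uint32_t addr)
{
  for (int i = 0; i < TOY_NUM_HW_UNITS; i++)
    if (!toy_hw_used[i]) {
      toy_hw_used[i] = 1;
      toy_hw_addr[i] = addr;
      return 0;
    }
  return -1;
}

/* Release the comparator for addr; 0 on success, -1 if not found. */
int toy_remove_hw_breakpoint(uint32_t addr)
{
  for (int i = 0; i < TOY_NUM_HW_UNITS; i++)
    if (toy_hw_used[i] && toy_hw_addr[i] == addr) {
      toy_hw_used[i] = 0;
      return 0;
    }
  return -1;
}
```

Once both comparators are claimed, the query function reports that no further hardware breakpoint can be set and the insert function fails, at which point a debugger could fall back to a software breakpoint.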

2.4.7.  struct target_ops Functions to Control Execution

For targets capable of execution, these functions provide the mechanisms to start and stop execution.

  • to_resume. Function to tell the target to start running again (or for the first time).

  • to_wait. Function to wait for the target to return control to the debugger. Typically control returns when the target finishes execution or hits a breakpoint. It could also occur if the connection is interrupted (for example by ctrl-C).

  • to_stop. Function to stop the target—used whenever the target is to be interrupted (for example by ctrl-C).

  • to_kill. Kill the connection to the target. This should work, even if the connection to the target is broken.

  • to_create_inferior. For targets which can execute, this initializes a program to run, ready for it to start executing. It is invoked by the GDB run command, which will subsequently call to_resume to start execution.

  • to_mourn_inferior. Tidy up after execution of the target has finished (for example after it has exited or been killed). Most implementations call the generic function, generic_mourn_inferior, but may do some additional tidying up.
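
The division of labor between creating an inferior, resuming it and waiting for it can be seen in a toy simulation. The code below is standalone and not GDB code: struct toy_inferior, the function names, the single breakpoint slot and the rule that each instruction is 4 bytes long are assumptions made for the example.

```c
#include <stdint.h>

enum toy_stop { TOY_STOP_BREAKPOINT, TOY_STOP_EXITED };

struct toy_inferior {
  uint32_t pc;          /* current program counter */
  uint32_t end;         /* address at which the program exits */
  uint32_t breakpoint;  /* single breakpoint address, 0 for none */
  int      running;     /* 1 while the target is executing */
};

/* Model of to_create_inferior: prepare the program, but do not start it. */
void toy_create_inferior(struct toy_inferior *inf,
                         uint32_t entry, uint32_t end)
{
  inf->pc = entry;
  inf->end = end;
  inf->breakpoint = 0;
  inf->running = 0;
}

/* Model of to_resume: let the target run. */
void toy_resume(struct toy_inferior *inf)
{
  inf->running = 1;
}

/* Model of to_wait: return when the target stops, and report why.
   Execution is simulated by stepping one 4-byte instruction at a time. */
enum toy_stop toy_wait(struct toy_inferior *inf)
{
  while (inf->running) {
    inf->pc += 4;
    if (inf->pc == inf->breakpoint || inf->pc >= inf->end)
      inf->running = 0;
  }
  return (inf->pc >= inf->end) ? TOY_STOP_EXITED : TOY_STOP_BREAKPOINT;
}
```

A run command in this model is toy_create_inferior followed by toy_resume and toy_wait. A continue command after a breakpoint is just toy_resume and toy_wait again.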



[1] For a new remote target of any kind, the recommended approach is to use the standard GDB Remote Serial Protocol (RSP) and have the target implement the server side of this interface. The only remote targets remaining are historic legacy interfaces, such as the OpenRISC 1000 Remote JTAG Protocol.

2.5.  Adding Commands to GDB

As noted in Section 2.2, GDB's command handling is extensible. Commands are grouped into a number of command lists (of type struct cmd_list_element), pointed to by a number of global variables (defined in cli-cmds.h). Of these, cmdlist is the list of all defined commands, with separate lists defined for sub-commands of various top level commands. For example infolist is the list of all info sub-commands.