4.4. Encoding Instructions

Instructions are encoded through an implementation of the MCCodeEmitter class, which uses information about the target and encodes instructions through the EncodeInstruction function, streaming encoded bytes through a provided output stream.

	Note
	Information about this class can be found in LLVM's documentation at llvm.org/docs/doxygen/html/classllvm_1_1MCCodeEmitter.html

This class requires only the EncodeInstruction function to be defined. However other functions should also be defined to assist in encoding instructions.

The key function here is getBinaryCodeForInstr, which is generated by TableGen. It takes a provided instruction and with the instruction encoding information defined with the instructions to generate the encoded instruction.

  // getBinaryCodeForInstr - TableGen'erated function for getting the
  // binary encoding for an instruction.
  uint64_t getBinaryCodeForInstr(const MCInst &MI) const;

Other functions that are defined in this class (though not necessary), are support functions for outputting a number of bytes and constants (encoded instruction) with the correct endianness for the target. Examples of these functions are provided below.

In addition, functions are required to assist in encoding custom operand types, should their encoding not be known from their TableGen definition.

  // Emit one byte through output stream (from MCBlazeMCCodeEmitter)
  void EmitByte(unsigned char C, unsigned &CurByte, raw_ostream &OS) const {
    OS << (char)C;
    ++CurByte;
  }

  // Emit a series of bytes (from MCBlazeMCCodeEmitter)
  void EmitConstant(uint64_t Val, unsigned Size, unsigned &CurByte,
                    raw_ostream &OS) const {
    assert(Size <= 8 && "size too big in emit constant");

    for (unsigned i = 0; i != Size; ++i) {
      EmitByte(Val & 255, CurByte, OS);
      Val >>= 8;
    }
  }

The generated getBinaryCodeForInstr function requires one other function to be defined, getMachineOpValue, which provides the encoding for the default operand types (registers and immediates) where no relocation is required.

This function first checks the type of operand. If it is a register, then the custom function for retrieving register numbers defined above is called to get the encoding of this register. If instead it is an immediate, then it is cast to an unsigned value which is then returned.

Finally, if the operand is an expression (which is the case where relaxation is required), then information about this relocation is stored in a fixup, with 0 being returned as the encoding at this point.

unsigned OR1KMCCodeEmitter::
getMachineOpValue(const MCInst &MI, const MCOperand &MO,
                  SmallVectorImpl<MCFixup> &Fixups) const {
  if (MO.isReg())
    return getOR1KRegisterNumbering(MO.getReg());
  if (MO.isImm())
    return static_cast<unsigned>(MO.getImm());
  
  // MO must be an expression
  assert(MO.isExpr());

  const MCExpr *Expr = MO.getExpr();
  MCExpr::ExprKind Kind = Expr->getKind();

  if (Kind == MCExpr::Binary) {
    Expr = static_cast<const MCBinaryExpr*>(Expr)->getLHS();
    Kind = Expr->getKind();
  }

  assert (Kind == MCExpr::SymbolRef);

  OR1K::Fixups FixupKind = OR1K::Fixups(0);

  switch(cast<MCSymbolRefExpr>(Expr)->getKind()) {
    default: llvm_unreachable("Unknown fixup kind!");
      break;
    case MCSymbolRefExpr::VK_OR1K_PLT:
      FixupKind = OR1K::fixup_OR1K_PLT26;
      break;
    ... other cases not shown ...
  }

  // Push fixup (all info is contained within)
  Fixups.push_back(MCFixup::Create(0, MO.getExpr(), MCFixupKind(FixupKind)));
  return 0;
}

The main EncodeInstruction function simply takes a provided instruction, passing it to the TableGen getBinaryCodeForInstr function, returning the encoded version of the instruction, which is then emitted via the support functions for outputting constants which were defined above.

	Note
	In the case of OpenRISC 1000 , all instructions are 32 bits in length, therefore the same amount of data is outputted in all cases. Where an instruction set has variable length instructions, then testing the opcode would be required to determine the length of instruction to emit.

void OR1KMCCodeEmitter::
EncodeInstruction(const MCInst &MI, raw_ostream &OS,
                         SmallVectorImpl<MCFixup> &Fixups) const {
  // Keep track of the current byte being emitted
  unsigned CurByte = 0;
  // Get instruction encoding and emit it
  ++MCNumEmitted;       // Keep track of the number of emitted insns.
  unsigned Value = getBinaryCodeForInstr(MI);
  EmitConstant(Value, 4, CurByte, OS);
}