Utilizing TableGen for Non-Compiling Processes

When porting any compiler, one of the large pieces of code is the machine description, which defines instructions, registers, calling conventions, and so on. In LLVM, this is done via TableGen, a simple record-based DSL that allows all information about an architecture to be described rapidly and concisely. However, its generality and simple design appear to naturally allow expansion beyond the compilation space into other areas.

So which areas are ripe for simplification, where lots of information is inherited but the result is ultimately a set of records? SSH configurations, for one. I therefore forked the llvm-tblgen tool, as llvm-othergen, for other purposes. This blog post explains how I use TableGen for this, and how the backend is implemented.

Most TableGen backends work by searching for all records that are derived from a particular class. My SSH backend is no exception, and uses the following base class. It does not (yet) support all possible SSH configuration directives, but it covers the common ones that I usually find in a user’s SSH configuration:

class SSH {
  string User = "";
  string HostName = "";
  int Port = -1;
  string IdentityFile = "";
  string ProxyCommand = "";
  string PreferredAuthentications = "";
  int ServerAliveInterval = -1;
  string Compression = "";
  list<string> Aliases = [];
}

As SSH allows you to specify settings for multiple connections at once, I use the Aliases field to specify these as shorthand aliases for the main connection. When creating a config in llvm-othergen, the default name for a connection is the builtin NAME variable, which is the name of the record (what is provided after the def keyword).

Many SSH configs have a Host * block to set options for all connections, including those without a corresponding Host block. My SSH backend supports this through the specially named Common record. This can be used, for example, to specify my preferred authentication methods and ServerAliveInterval:

def Common : SSH {
  let Compression = "yes";
  let ServerAliveInterval = 60;
  let Port = 22;
  let PreferredAuthentications = "publickey,password";
}
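How a backend might choose the Host line for each record can be sketched in plain C++, without any LLVM dependency. The HostEntry struct and hostLine function below are hypothetical stand-ins for a TableGen record and the emitter, and mapping the Common record to the * pattern is an assumption based on the description above; the real llvm-othergen code may do this differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a TableGen record derived from SSH.
struct HostEntry {
  std::string Name;                 // record name (what follows `def`)
  std::vector<std::string> Aliases; // extra host patterns for this record
};

// Builds the "Host ..." line for one entry. The record named "Common"
// is assumed to become the wildcard block that applies to every host.
std::string hostLine(const HostEntry &E) {
  assert(!E.Name.empty() && "every record has a name");
  if (E.Name == "Common")
    return "Host *";
  std::ostringstream OS;
  OS << "Host " << E.Name;
  for (const std::string &A : E.Aliases)
    OS << " " << A;
  return OS.str();
}
```

With this sketch, an entry named Common yields Host *, while any other entry yields its record name followed by its aliases.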

Now where TableGen’s power comes in is defining multiple records at once. I do this for instance where I have multiple machines in one network that I want to connect to, but only one which has its SSH port exposed. At the same time I do not want to proxy my connection when I’m on the network locally.

To achieve this, I first declare a multiclass that defines two records, one for the local and one for the remote case, such as the following:

multiclass mymachine<string IP, string alias> {
  def local : SSH {
    let User = "simon";
    let HostName = IP;
    let IdentityFile = "~/.ssh/idmykey";
    let Aliases = [NAME, alias];
  }
  def remote : SSH {
    let User = "simon";
    let HostName = IP;
    let IdentityFile = "~/.ssh/idmykey";
    let ProxyCommand = "ssh -q proxyserver.example.net nc %h %p";
    let Aliases = [!strconcat(NAME, "r"), !strconcat(alias, "r")];
  }
}

Now every time I instantiate this multiclass, two records are created: the first (NAME followed by local, for example alicelocal) connects to the machine directly via IP, and the second runs netcat on the proxy server to give us connectivity to IP. I use the alias parameter to give shorthand aliases to the machine, appending an r to indicate the remote version.

To create a new block of connections, I just write a defm command for each machine I want to SSH into via this method:

defm alice : mymachine<"192.168.2.2", "a">;
defm bob : mymachine<"192.168.2.3", "b">;

Running this through llvm-othergen -ssh-config then generates the following blocks:

Host alicelocal alice a
  User simon
  HostName 192.168.2.2
  IdentityFile ~/.ssh/idmykey

Host aliceremote alicer ar
  User simon
  HostName 192.168.2.2
  IdentityFile ~/.ssh/idmykey
  ProxyCommand ssh -q proxyserver.example.net nc %h %p

Host boblocal bob b
  User simon
  HostName 192.168.2.3
  IdentityFile ~/.ssh/idmykey

Host bobremote bobr br
  User simon
  HostName 192.168.2.3
  IdentityFile ~/.ssh/idmykey
  ProxyCommand ssh -q proxyserver.example.net nc %h %p

Now I can SSH into alice or bob using the shorthand aliases a, b, ar and br, and more importantly, I can easily expand this list without copying and pasting large blocks of SSH config.
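To make the expansion concrete, here is a minimal plain-C++ sketch of what one defm line amounts to. It is not how TableGen itself is implemented; MachineEntry and expandMyMachine are hypothetical names that simply mirror the two records the mymachine multiclass defines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plain-C++ stand-in for a record derived from SSH.
struct MachineEntry {
  std::string Name;
  std::string User;
  std::string HostName;
  std::string IdentityFile;
  std::string ProxyCommand; // empty means "not set"
  std::vector<std::string> Aliases;
};

// Mimics `defm Name : mymachine<IP, Alias>;` by building the two records
// the multiclass defines: a direct one and a proxied one.
std::vector<MachineEntry> expandMyMachine(const std::string &Name,
                                          const std::string &IP,
                                          const std::string &Alias) {
  assert(!Name.empty() && "defm needs a name");

  MachineEntry Local;
  Local.Name = Name + "local";
  Local.User = "simon";
  Local.HostName = IP;
  Local.IdentityFile = "~/.ssh/idmykey";
  Local.Aliases = {Name, Alias};

  MachineEntry Remote = Local; // same settings, plus the proxy
  Remote.Name = Name + "remote";
  Remote.ProxyCommand = "ssh -q proxyserver.example.net nc %h %p";
  Remote.Aliases = {Name + "r", Alias + "r"};

  return {Local, Remote};
}
```

Calling expandMyMachine("alice", "192.168.2.2", "a") gives entries named alicelocal and aliceremote, matching the first two Host blocks above. The point of the multiclass is that this boilerplate is written once, in the record language, rather than once per machine.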

So how is this implemented in TableGen? Most backends work on all records of a particular class, in this case SSH, so we look at all records of this type:

std::vector<Record*> Cfgs = Records.getAllDerivedDefinitions("SSH");

For each of these records, I then print the various defined fields. All data stored in the records by my backend are either integers or strings, and each directive in the SSH config shares its name with the corresponding TableGen field. To keep the output file small, I also only want to print fields if they are defined, that is, set to something other than the default value of "" or -1. With a little help from the preprocessor, I can write the EmitSSHConfig method as follows:

#define EmitConditionalInt(x) do { \
    auto x = Cfg->getValueAsInt(#x);\
    if (x != -1) \
      O << "  " << #x << " " << x << "\n"; \
  } while(0)

#define EmitConditionalString(x) do { \
    auto x = Cfg->getValueAsString(#x);\
    if (x != "") \
      O << "  " << #x << " " << x << "\n"; \
  } while(0)

void SSHConfigEmitter::EmitSSHConfig(Record *Cfg, raw_ostream &O,
                                     bool PrintHost) {
  if (PrintHost) {
    auto aliases = Cfg->getValueAsListOfStrings("Aliases");
    O << "Host " << Cfg->getName();
    for (auto I = aliases.begin(), E = aliases.end(); I != E; ++I)
      O << " " << *I;
    O << "\n";
  }
  EmitConditionalString(User);
  EmitConditionalString(HostName);
  EmitConditionalInt(Port);
  EmitConditionalString(IdentityFile);
  EmitConditionalString(ProxyCommand);
  EmitConditionalString(PreferredAuthentications);
  EmitConditionalString(Compression);
  EmitConditionalInt(ServerAliveInterval);
  O << "\n";
}
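The same "print only what is set" rule can also be sketched without macros or LLVM types, which makes it easy to try out in isolation. This is an illustrative alternative, not the backend's code: SSHFields, emitIfSet and emitBody are hypothetical names, and only a few of the fields are shown.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Emits "  Key Value" unless the string still holds its default of "".
void emitIfSet(std::ostream &OS, const std::string &Key,
               const std::string &Value) {
  assert(!Key.empty() && "directive name must not be empty");
  if (!Value.empty())
    OS << "  " << Key << " " << Value << "\n";
}

// Emits "  Key Value" unless the integer still holds its default of -1.
void emitIfSet(std::ostream &OS, const std::string &Key, int Value) {
  assert(!Key.empty() && "directive name must not be empty");
  if (Value != -1)
    OS << "  " << Key << " " << Value << "\n";
}

// Hypothetical subset of the SSH class, using the same defaults.
struct SSHFields {
  std::string User;
  std::string HostName;
  int Port = -1;
  std::string ProxyCommand;
};

// Returns the indented body of one Host block, skipping unset fields.
std::string emitBody(const SSHFields &F) {
  std::ostringstream OS;
  emitIfSet(OS, "User", F.User);
  emitIfSet(OS, "HostName", F.HostName);
  emitIfSet(OS, "Port", F.Port);
  emitIfSet(OS, "ProxyCommand", F.ProxyCommand);
  return OS.str();
}
```

The trade-off is that each directive name is now written twice, once as a string and once as a member, which is exactly the repetition the stringizing macros in the backend avoid.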

After adding the code needed for llvm-othergen to use this backend, I can now maintain a much simpler SSH configuration format and just re-run the backend when I have made any changes.

Now SSH is just one case where an inheriting record language like TableGen comes in useful, as does the ability to write new backends for it. In the future, I plan to expand my tblgen fork with a more script-friendly expansion mechanism (for example via some Python bindings). In the meantime, llvm-othergen exists for expanding TableGen into the non-compilation space.

The code for llvm-othergen can be found on GitHub. Note that you only need to build this tool, statically linked, rather than the whole compiler.