View on GitHub

jmmut

Personal blog

Comparison of C++ Command Line Interface (CLI) parsers

So given how many years C++ has on its back, you would say it should have well-thought libraries, built with the pain-hardened experience of failed attemps, right? As much as a C++ fanboy I am, I’m afraid that’s not always (or not often?) the case.

Here I’m going to summarize the good and bad design ideas of some CLI parsers I got my hands on.

Getopt

I guess this is the good old cli parser that came from C era.

Complete example taken from official docs:

#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

/* Flag set by ‘--verbose’. */
static int verbose_flag;

int
main (int argc, char **argv)
{
  int c;

  while (1)
    {
      static struct option long_options[] =
        {
          /* These options set a flag. */
          {"verbose", no_argument,       &verbose_flag, 1},
          {"brief",   no_argument,       &verbose_flag, 0},
          /* These options don’t set a flag.
             We distinguish them by their indices. */
          {"add",     no_argument,       0, 'a'},
          {"append",  no_argument,       0, 'b'},
          {"delete",  required_argument, 0, 'd'},
          {"create",  required_argument, 0, 'c'},
          {"file",    required_argument, 0, 'f'},
          {0, 0, 0, 0}
        };
      /* getopt_long stores the option index here. */
      int option_index = 0;

      c = getopt_long (argc, argv, "abc:d:f:",
                       long_options, &option_index);

      /* Detect the end of the options. */
      if (c == -1)
        break;

      switch (c)
        {
        case 0:
          /* If this option set a flag, do nothing else now. */
          if (long_options[option_index].flag != 0)
            break;
          printf ("option %s", long_options[option_index].name);
          if (optarg)
            printf (" with arg %s", optarg);
          printf ("\n");
          break;

        case 'a':
          puts ("option -a\n");
          break;

        case 'b':
          puts ("option -b\n");
          break;

        case 'c':
          printf ("option -c with value `%s'\n", optarg);
          break;

        case 'd':
          printf ("option -d with value `%s'\n", optarg);
          break;

        case 'f':
          printf ("option -f with value `%s'\n", optarg);
          break;

        case '?':
          /* getopt_long already printed an error message. */
          break;

        default:
          abort ();
        }
    }

  /* Instead of reporting ‘--verbose’
     and ‘--brief’ as they are encountered,
     we report the final status resulting from them. */
  if (verbose_flag)
    puts ("verbose flag is set");

  /* Print any remaining command line arguments (not options). */
  if (optind < argc)
    {
      printf ("non-option ARGV-elements: ");
      while (optind < argc)
        printf ("%s ", argv[optind++]);
      putchar ('\n');
    }

  exit (0);
}

Well, that was not beautiful…

Good ideas

Interesting flag handling

{"verbose", no_argument, &verbose_flag, 1}, means that if --verbose is provided, it will “automatically” fill the variable verbose_flag with that last parameter 1.

Powerful (?)

When you are the one on charge of doing things, only your laziness is the limit! It allows flags, parameters with arguments (required or not) and standalone arguments; you can do whatever you want with them.

Teaching good practices (?)

Some hours of trial and error can save you some minutes of reading documentation.

This is also a downside, but this library teaches you to look for the documentation as if it were air to breathe. When you first encounter an example of usage of this library without reading the documentation, there are so many questions, like:

Those are not complex concepts once you read the docs, and noob programmers should learn soon that docs are essential to programming, but honestly, good code should be low on WTFs per minute, even when you didn’t start reading the docs yet.

Bad ideas

That fat-ass switch.

You know, switch is not the best feature of programming languages, especially old ones.

Global hidden state.

It’s a switch within a while(1), so each iteration will go through one of the cases of the switch. The parameter you can handle in each iteration is provided by c = getopt_long (argc, argv, "abc:d:f:", long_options, &option_index);. c is the one-char parameter name, and option_index is modified to point to a row in long_options. So each time you call that function (with the same parameters) you will get a different result.

Moreover, this is not all the information needed. The library provides some global variables to pass parameter values, unexpected input, further details and error codes.

Getopt Summary

To be fair, with plain C these issues are quite unavoidable, it’s an OK solution for those early days, but nowadays we should know better.

TCLAP

Again, official example:

#include <string>
#include <iostream>
#include <algorithm>
#include <tclap/CmdLine.h>

int main(int argc, char** argv)
{

	// Wrap everything in a try block.  Do this every time, 
	// because exceptions will be thrown for problems.
	try {  

	// Define the command line object, and insert a message
	// that describes the program. The "Command description message" 
	// is printed last in the help text. The second argument is the 
	// delimiter (usually space) and the last one is the version number. 
	// The CmdLine object parses the argv array based on the Arg objects
	// that it contains. 
	TCLAP::CmdLine cmd("Command description message", ' ', "0.9");

	// Define a value argument and add it to the command line.
	// A value arg defines a flag and a type of value that it expects,
	// such as "-n Bishop".
	TCLAP::ValueArg<std::string> nameArg("n","name","Name to print",true,"homer","string");

	// Add the argument nameArg to the CmdLine object. The CmdLine object
	// uses this Arg to parse the command line.
	cmd.add( nameArg );

	// Define a switch and add it to the command line.
	// A switch arg is a boolean argument and only defines a flag that
	// indicates true or false.  In this example the SwitchArg adds itself
	// to the CmdLine object as part of the constructor.  This eliminates
	// the need to call the cmd.add() method.  All args have support in
	// their constructors to add themselves directly to the CmdLine object.
	// It doesn't matter which idiom you choose, they accomplish the same thing.
	TCLAP::SwitchArg reverseSwitch("r","reverse","Print name backwards", cmd, false);

	// Parse the argv array.
	cmd.parse( argc, argv );

	// Get the value parsed by each arg. 
	std::string name = nameArg.getValue();
	bool reverseName = reverseSwitch.getValue();

	// Do what you intend. 
	if ( reverseName )
	{
		std::reverse(name.begin(),name.end());
		std::cout << "My name (spelled backwards) is: " << name << std::endl;
	}
	else
		std::cout << "My name is: " << name << std::endl;


	} catch (TCLAP::ArgException &e)  // catch any exceptions
	{ std::cerr << "error: " << e.error() << " for arg " << e.argId() << std::endl; }
}

Good ideas

Easier handling

The API seems easier to understand. We declare a cmd that will hold the parameters, we declare a <variable>Arg with the configuration of each parameter, we parse the cmd, and then we get the results into the corresponding variables. Also catching any error with exceptions.

In fact, the next lines are doing the same as the previous switch in getopt:

	std::string name = nameArg.getValue();
	bool reverseName = reverseSwitch.getValue();

Type safety

With getopt all we had were strings, but now we have templates and for basic types the parsing is done automatically.

Good automatic help

The example doesn’t provide the --help parameter, but it will work as expected if it’s used. This is a good design in my opinion. Providing sensible defaults so that you only need to add code when you want to do uncommon things. I would say python and the spring framework follow this idea.

It even asks for a human readable string to explain what type is expected, which I have used sometimes to put the units and/or range of the parameter, e.g. “seconds (up to 3600)” in parameter:

    TCLAP::ValueArg<double> durationArg("d", names[DURATION], "duration of the song",
            false, defaultDuration, "seconds (up to 3600)", cmd);

which is shown in the help as:

-d <seconds (up to 3600)>, --duration <seconds (up to 3600)>
    duration of the song

Not bad as default help message.

Bad ideas

Not showing default values

The reason why I stopped using TCLAP was not being able to show the default value I provided. In the previous durationArg parameter example, I used double defaultDuration = 20;, but I found no easy way to show in the help message this “20”.

I could build the description string with the variable like "duration of the song (default " + std::to_string(defaultDuration) + ")", but it seems unreasonably complex for such a relevant feature in a CLI parser.

I also could have hardcoded the value in the description string, but I have a condition that prevents me from that: my stomach hurts when one conceptual change involves changing several parts of the code.

Too many parameters

For most parameter definitions, it feels that some variables are just boilerplate. Have you ever used a command where all the parameters were required? Then, why not making them optional by default? Instead, TCLAP requires that false (or true) in all parameters.

Also, adding the cmd parameter seems like could be avoided with other design, although this might be more difficult, as it was clearly in favour of direct access to the Arg variables (such as durationArg in my example).

TCLAP summary

Not bad, not perfect, easy to integrate (only one header) and fast to learn and use. You can tweak some things, like how the help message is written, but it’s not so easy and non-trivial work is needed.

Boost.Program_options

Official example, complete code in github:

// Copyright Vladimir Prus 2002-2004.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt
// or copy at http://www.boost.org/LICENSE_1_0.txt)

/* The simplest usage of the library.
 */

#include <boost/program_options.hpp>
namespace po = boost::program_options;

#include <iostream>
#include <iterator>
using namespace std;

int main(int ac, char* av[])
{
    try {

        po::options_description desc("Allowed options");
        desc.add_options()
            ("help", "produce help message")
            ("compression", po::value<double>(), "set compression level")
        ;

        po::variables_map vm;        
        po::store(po::parse_command_line(ac, av, desc), vm);
        po::notify(vm);    

        if (vm.count("help")) {
            cout << desc << "\n";
            return 0;
        }

        if (vm.count("compression")) {
            cout << "Compression level was set to " 
                 << vm["compression"].as<double>() << ".\n";
        } else {
            cout << "Compression level was not set.\n";
        }
    }
    catch(exception& e) {
        cerr << "error: " << e.what() << "\n";
        return 1;
    }
    catch(...) {
        cerr << "Exception of unknown type!\n";
    }

    return 0;
}

Good ideas

Terse parameter configuration

Here we can see basic configuration in action, working with sensible defaults.

If we want to make a parameter required, we can add it po::value<double>()->required().

If we want to add default values, we can add it: po::value<double>()->default_value(defaultCompression).

If we want to fill our variable directly, we can add it: po::value<double>(&compression).

Flexibility

With Program Options we don’t need a dedicated variable, we can use vm[].count() and/or vm[].as<>(), but we are also able to use a variable directly as po::value<double>(&compression) and not use the count nor as. I saw some other minor flexibility features.

Showing default values (yes!)

The next parameter:

desc.add_options()
        ("duration,d",
                po::value<double>(&duration)->default_value(defaultDuration),
                "duration in seconds (up to 3600)")

is shown as:

  -d [ --duration ] arg (=20)           duration in seconds (up to 3600)

I didn’t see an easy way to put the value description (such as units) instead of that not-so-clear “arg”, so I had to put that in the description, but it did show that “20” was the default value, without configuring anything.

More features

Although I haven’t dug in the details, Program Options seems more complete than TCLAP. For instance, PO allows taking parameters from config files and merge the parameters with the CLI ones.

Bad ideas

Expert friendly

Most Boost libraries err a bit on the side of making the library too difficult to use even in the basic use case. Compare the 4 straightforward steps of TCLAP:

    TCLAP::CmdLine cmd(...);
    TCLAP::SwitchArg reverseSwitch(..., cmd);
    cmd.parse( argc, argv );
    bool reverseName = reverseSwitch.getValue();

with the PO workflow:

    po::options_description desc(...);
    desc.add_options()
        ("compression", po::value...)
    ;

    po::variables_map vm;
    po::store(po::parse_command_line(ac, av, desc), vm);
    po::notify(vm);

    // use vm["compression"], or the variable compression if set with po::value<double>(&compression)

On the edge of abusing operator overload

Operator overloading is like a “pay what you can” store. It’s cool and less cumbersome, but it’s too easy to abuse the system.

Once you understand that the operator() in desc.add_options()(...) is only constructing an option and adding it to the options_description, it’s not a big deal, but wouldn’t it be more intuitive a desc.add_options().add(...)? it’s 4 characters, and it would fit common ideas like the design pattern builder.

Encourages char arrays instead of std::string

I agree this is not a big issue, but it’s been 2 projects already when I found this a bit annoying. The operator() asks only for char* and not std::string, so if you need to compose the strings for the parameter names or the description, you will have to use either char arrays or lots of .c_str(). This is how it might look.

desc.add_options()
        ((names[DURATION] + ",d").c_str(),
                po::value<double>(&duration)->default_value(defaultDuration),
                ("duration in seconds (up to " + std::to_string(MAX_DURATION) + ")").c_str())

Program Options summary

The bazooka solution. It will require other parts of Boost. Read the documentation with no hurries and hope you don’t need to tweak it.