Compile-Time and Runtime-Safe Replacement for “printf”

January 29, 2015

C++ 11 is truly beautiful. And one of the ways in which it is beautiful is how you can implement a compile-time and runtime-safe version of the popular C runtime function printf.

Originally, printf was implemented as a variadic function that parses its first argument (a format string) to determine how many additional arguments it should read from the stack. printf has no way of knowing how many parameters were actually provided; any mismatch is undefined behavior, which in practice means a nasty crash at runtime or, worse, garbage output. For example:

printf("%d");
printf("%s");
printf("%d %d", 42);
printf("hmpf", 42);

The first version will probably work and print some arbitrary stack value that happens to be after the format string parameter. The second version will probably crash because it tries to interpret that arbitrary stack value as a pointer to a string, which it is unlikely to be. The third version will probably print 42 followed by some stack garbage. And finally, the fourth version will probably print “hmpf”, ignoring its argument completely.

But that’s not all. printf doesn’t know the types of the arguments — it relies on the format string to determine what they are. This can lead to undesired results again. For example:

printf("%d", static_cast<unsigned long long>(-1));
printf("%f", 39812);
printf("%s", std::string("hello"));
printf("%s", 42);

In this case, the first version will probably print the lower or upper half (32 bits) of the 64-bit value provided. The second version will print something which barely resembles 39,812 (I’ve just checked and my clang 3.5 prints 0.000000). The third version might actually work (!) if std::string has a pointer to the string as its first field, but otherwise it would fail spectacularly. And finally, the fourth version will try to interpret the number 42 as a pointer to a string, which will likely crash.
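To make the contrast concrete, here is a minimal sketch of the alternative: letting the compiler pick the formatting routine from the argument's static type instead of from a format string. The helper name to_text is an assumption made up for this illustration; it simply relies on the standard operator<< overloads.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the post): formats a single value using the
// operator<< overload that the compiler selects from the argument's type.
template <typename T>
std::string to_text(T const& value)
{
  std::ostringstream out;
  out << value;
  return out.str();
}

// Small self-check mirroring the problematic printf calls above.
inline void to_text_demo()
{
  // An int is formatted as an int, never reinterpreted as a double.
  assert(to_text(39812) == "39812");
  // A std::string is formatted as text, regardless of its internal layout.
  assert(to_text(std::string("hello")) == "hello");
  // The full unsigned long long value is formatted, not half of it.
  assert(to_text(static_cast<unsigned long long>(-1))
         == std::to_string(static_cast<unsigned long long>(-1)));
}
```

Because the type travels with the argument, there is no format specifier that could disagree with it.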

Now that we’re all agreed printf is a horrible function, let’s see if we can make it any better. First, C++ 11 has variadic templates, so we can build a template function that takes an arbitrary number of arguments without losing type safety, and while still keeping track of how many arguments were provided! For example:

template <typename... Types>
unsigned count_args(Types&&...)
{
  return sizeof...(Types);
}

This function clearly knows how many arguments were provided. Plus, because it is a template, the arguments aren’t coerced to arbitrary stack locations — we can preserve their type. Let’s see how we can make use of this feature to safely print a list of arguments provided by the user. Our API is going to be the following:
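As a quick sanity check, the sketch below marks count_args as constexpr (my assumption, the version above is a plain function) so that the argument count can be verified with static_assert. The head_is helper is hypothetical and only demonstrates that the argument types survive the call.

```cpp
#include <type_traits>

// Same function as above, but constexpr so the result is usable at compile time.
template <typename... Types>
constexpr unsigned count_args(Types&&...)
{
  return sizeof...(Types);
}

// Hypothetical helper: true when the first argument has type Expected
// (after stripping references and cv-qualifiers).
template <typename Expected, typename Head, typename... Tail>
constexpr bool head_is(Head&&, Tail&&...)
{
  return std::is_same<typename std::decay<Head>::type, Expected>::value;
}

static_assert(count_args() == 0, "no arguments");
static_assert(count_args(1, 2.5, 'c') == 3, "three arguments of different types");
static_assert(head_is<double>(2.5, 1), "the double is still a double");
static_assert(!head_is<int>(2.5, 1), "and it was not converted to int");
```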

safe_printf("x = %\n", x);

For the sake of sticking to convention, I am going to use ‘%’ as the format specifier. The ‘%’ indicator can be escaped by putting two of them in a row: safe_printf("You scored % %% on the test", 100) prints “You scored 100 % on the test”. Now, to actually print the arguments we are going to use the standard variadic template trick: the function takes the head argument explicitly, and the rest of the parameter pack is a tail that is forwarded to a “recursive” invocation. (I’m putting “recursive” in quotes here because, strictly speaking, we are calling a different instantiation of the same function template; so it’s not really recursion because a function isn’t calling itself.) Here goes:

constexpr char FORMAT_SPECIFIER = '%';

unsigned count_format_specifiers(std::string const& format)
{
  unsigned count = 0;
  for (size_t i = 0; i < format.size(); ++i)
  {
    if (format[i] == FORMAT_SPECIFIER)
    {
      if ((i != format.size() - 1) && (format[i+1] == FORMAT_SPECIFIER))
        ++i; // skip next % as well, it was consumed by this one
      else
        ++count;
    }
  }
  return count;
}

void safe_printf(std::string const& format)
{
  if (count_format_specifiers(format) != 0)
    throw std::invalid_argument("number of arguments doesn't match the format string");
  std::cout << format; // TODO: take care of escaped format specifiers
}

template <typename Head, typename... Tail>
void safe_printf(std::string const& format, Head&& head, Tail&&... tail)
{
  if (count_format_specifiers(format) != sizeof...(Tail) + 1)
    throw std::invalid_argument("number of arguments doesn't match the format string");
  // TODO: take care of escaped format specifiers
  auto first_format_pos = format.find_first_of(FORMAT_SPECIFIER);
  std::cout << format.substr(0, first_format_pos);
  std::cout << head;
  safe_printf(format.substr(first_format_pos+1), std::forward<Tail>(tail)...);
}

OK, so we have a base case that isn’t even a template. It simply takes a string and prints it out, given that it doesn’t contain any leftover format specifiers. Next, the template version takes a head and a tail, prints the head, and forwards the rest of the arguments to the “recursive” invocation. We also verify that the number of format specifiers is exactly equal to the number of arguments provided (1 for head plus the size of the tail).
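The TODO comments above leave escaped specifiers unhandled. One possible way to fill that gap is sketched below; it is not the code from the post. The names emit_literal_text, print_from and safe_print_to are assumptions made up for this sketch, and it writes to any std::ostream rather than std::cout so the result can be inspected. Unlike the version above, it detects a mismatch while printing, so some output may already have been written when the exception is thrown.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

constexpr char FORMAT_SPECIFIER = '%';

// Writes literal text starting at pos, turning "%%" into "%". Returns the index
// just past the first unescaped specifier, or npos if the string ended first.
inline std::size_t emit_literal_text(std::ostream& out, std::string const& format,
                                     std::size_t pos)
{
  while (pos < format.size())
  {
    if (format[pos] != FORMAT_SPECIFIER)
    {
      out << format[pos++];
    }
    else if (pos + 1 < format.size() && format[pos + 1] == FORMAT_SPECIFIER)
    {
      out << FORMAT_SPECIFIER; // "%%" is an escaped percent sign
      pos += 2;
    }
    else
    {
      return pos + 1; // unescaped specifier: the caller prints an argument here
    }
  }
  return std::string::npos;
}

// Base case: no arguments are left, so no unescaped specifier may remain.
inline void print_from(std::ostream& out, std::string const& format, std::size_t pos)
{
  if (emit_literal_text(out, format, pos) != std::string::npos)
    throw std::invalid_argument("more format specifiers than arguments");
}

// "Recursive" case: print text up to the next specifier, then the head argument.
template <typename Head, typename... Tail>
void print_from(std::ostream& out, std::string const& format, std::size_t pos,
                Head&& head, Tail&&... tail)
{
  std::size_t const next = emit_literal_text(out, format, pos);
  if (next == std::string::npos)
    throw std::invalid_argument("more arguments than format specifiers");
  out << head;
  print_from(out, format, next, std::forward<Tail>(tail)...);
}

// Entry point of the sketch: starts scanning at the beginning of the format string.
template <typename... Args>
void safe_print_to(std::ostream& out, std::string const& format, Args&&... args)
{
  print_from(out, format, std::size_t{0}, std::forward<Args>(args)...);
}

inline void safe_print_to_demo()
{
  std::ostringstream out;
  safe_print_to(out, "You scored % %% on the test", 100);
  assert(out.str() == "You scored 100 % on the test");
}
```

Tracking a position in the original string, instead of calling substr, also avoids copying the remainder of the format string on every step.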

At this point, we have a functional version that does parameter checking at runtime. For example, the following calls will throw an exception:

safe_printf("%");
safe_printf("% %", 42);
safe_printf("", 42);

But still, we’re not providing any help at compile-time. Wouldn’t it be awesome if we could generate a compile-time error if safe_printf was called with the wrong number of arguments? It requires compile-time string parsing, which sounds infeasible, but it is actually possible with constexpr functions. Take a look at this:

struct constexpr_string
{
  unsigned size_;
  char const* string_;

  template <unsigned N>
  constexpr constexpr_string(char const(&str)[N])
    : size_(N), string_(str)
  {
  }
};

This beautiful struct captures a compile-time string literal and stores it along with its size. The constructor treats the string literal as an array of characters, which lets it determine the size of the string (including the terminating null character, since that is part of the array). Armed with this tool, we can add a member function that counts the number of times a particular character (such as a format specifier!) appears in the string:
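A short sketch of what the struct records, checked entirely at compile time. The struct is repeated here only so that the snippet compiles on its own; the static_assert lines are my addition.

```cpp
struct constexpr_string
{
  unsigned size_;
  char const* string_;

  // N is deduced from the array type of the literal, terminating NUL included.
  template <unsigned N>
  constexpr constexpr_string(char const(&str)[N])
    : size_(N), string_(str)
  {
  }
};

// "% %" is an array of four chars: '%', ' ', '%' and the terminating '\0'.
static_assert(constexpr_string("% %").size_ == 4, "size counts the terminator");
static_assert(constexpr_string("% %").string_[1] == ' ', "characters are readable");
static_assert(constexpr_string("").size_ == 1, "an empty literal still has a NUL");
```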

constexpr unsigned count_format_specifiers() const
{
  unsigned count = 0;
  for (unsigned i = 0; i < size_; ++i) // unsigned, to match the type of size_
  {
    if (string_[i] == FORMAT_SPECIFIER)
    {
      if ((i != size_ - 1) && (string_[i+1] == FORMAT_SPECIFIER))
        ++i; // skip next % as well, it was consumed by this one
      else
        ++count;
    }
  }
  return count;
}

Note that I’m relying on C++14, which allows variable declarations, loops, and mutable variables in constexpr functions. This function can obviously be rewritten using recursion and a single statement, but it would be a lot harder to read.
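For the curious, here is roughly what a C++11-compatible version might look like: a single return statement that recurses over the index. This is a sketch with free functions rather than a member of constexpr_string, and the name count_from is made up for the illustration.

```cpp
constexpr char FORMAT_SPECIFIER = '%';

// Counts unescaped specifiers in str[i..size) with one conditional expression.
constexpr unsigned count_from(char const* str, unsigned size, unsigned i)
{
  return i >= size ? 0                                         // end of the array
       : str[i] != FORMAT_SPECIFIER ? count_from(str, size, i + 1) // ordinary char
       : (i + 1 < size && str[i + 1] == FORMAT_SPECIFIER)
           ? count_from(str, size, i + 2)                      // "%%": skip both
           : 1 + count_from(str, size, i + 1);                 // a real specifier
}

// Convenience wrapper that deduces the array size from the literal.
template <unsigned N>
constexpr unsigned count_format_specifiers(char const (&str)[N])
{
  return count_from(str, N, 0);
}

static_assert(count_format_specifiers("% %") == 2, "two specifiers");
static_assert(count_format_specifiers("100 %%") == 0, "escaped percent sign");
static_assert(count_format_specifiers("%%%") == 1, "an escape followed by a specifier");
```

The last assertion shows how this rule resolves three percent signs in a row: the first two form an escape and the third is a specifier, which is the same reading the loop version produces.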

Now that we have this tool, we need to somehow generate statements of the form:

static_assert(2 == constexpr_string("% %").count_format_specifiers(), "error");

This will make sure that the format string provided by the user has the right number of format specifiers. At compile-time. All we need is… variadic macros.

If you’re thinking: Macros? In C++11? Yuck! — you’re absolutely right. But I don’t see a clean solution to this problem that doesn’t involve a macro. And it’s a relatively harmless one:

template <typename... Types>
constexpr unsigned sizeof_args(Types&&...)
{
  return sizeof...(Types);
}

#define SAFE_PRINTF(format, ...) \
  static_assert(constexpr_string(format).count_format_specifiers() \
    == sizeof_args(__VA_ARGS__), \
    "number of arguments doesn't match the format string"); \
  safe_printf(format, ##__VA_ARGS__);

Look at this beauty! SAFE_PRINTF takes a literal format string followed by an arbitrary number of arguments. That format string is forwarded to constexpr_string::count_format_specifiers, which returns the number of format specifiers in the string. The sizeof_args helper gives us the number of arguments provided by the user, as a compile-time value. To forward the arguments to sizeof_args and safe_printf, we use __VA_ARGS__ (standard since C99 and C++11), which expands to the comma-separated list of the macro’s variadic arguments. Note that for the safe_printf call we use ##__VA_ARGS__, a non-standard extension (supported by GCC and clang) that erases the preceding comma if the variadic argument list is empty.
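If the non-standard ##__VA_ARGS__ is a concern, one possible variation is to pass the format string as part of __VA_ARGS__ and let a constexpr helper do the comparison. The sketch below assumes C++14; format_matches is a hypothetical helper, and safe_printf is replaced by an empty stand-in so the snippet compiles on its own. Wrapping the body in do { ... } while (0) also makes the macro behave as a single statement inside an unbraced if/else.

```cpp
#include <cstddef>

constexpr char FORMAT_SPECIFIER = '%';

// C++14 constexpr loop using the same counting rule ("%%" is an escape).
template <std::size_t N>
constexpr std::size_t count_format_specifiers(char const (&format)[N])
{
  std::size_t count = 0;
  for (std::size_t i = 0; i < N; ++i)
  {
    if (format[i] != FORMAT_SPECIFIER)
      continue;
    if (i + 1 < N && format[i + 1] == FORMAT_SPECIFIER)
      ++i; // skip the second % of an escape
    else
      ++count;
  }
  return count;
}

// Hypothetical helper: true when the literal's specifier count equals the
// number of arguments that follow it.
template <std::size_t N, typename... Types>
constexpr bool format_matches(char const (&format)[N], Types&&...)
{
  return count_format_specifiers(format) == sizeof...(Types);
}

// Empty stand-in for the runtime safe_printf, only here to keep the sketch
// self-contained.
template <typename... Types>
void safe_printf(char const*, Types&&...)
{
}

// The format string travels inside __VA_ARGS__, so no comma needs erasing.
#define SAFE_PRINTF(...) \
  do { \
    static_assert(format_matches(__VA_ARGS__), \
      "number of arguments doesn't match the format string"); \
    safe_printf(__VA_ARGS__); \
  } while (0)
```

The trade-off is that the macro no longer names its format parameter, so calling it with no arguments at all produces a less readable error.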

The end result is that we get compile-time errors from invalid uses of SAFE_PRINTF. For example (error messages by clang 3.5):

SAFE_PRINTF("hello, %", 42, 17);

main.cpp:109:5: error: static_assert failed "number of arguments doesn't match the format string"
  SAFE_PRINTF("hello, %", 42, 17);
<edited for brevity> 
1 error generated.

I think this is a pretty incredible result. We can parse literal strings at compile-time and generate detailed errors when a format string doesn’t match; and when passed a string that isn’t a compile-time constant, we do runtime verification anyway. And we don’t need any compiler magic or support — just a combination of variadic templates, constexpr functions, and variadic macros.

14 comments

  1. nikitablack, January 30, 2015 at 3:00 PM

    Thank you. That was very informative. But there’re problems with escaped ‘%’ – it doesn’t work. There’s no logic to handle it in non-template function. And template function handles it incorrectly.

    1. Sasha Goldshtein, January 30, 2015 at 3:39 PM

      Thanks, I forgot to take care of it and frankly the additional code required to handle it isn’t really important for the purpose of the post. Therefore, I’ve added a TODO comment 🙂

  2. Petr, January 30, 2015 at 8:10 PM

    An additional problem with the % is that the escaping doesn’t really work. %%% could be interpreted two ways, and meanwhile what I might have had in mind was “format three arguments without spaces”. But I understand that’s completely tangential to the main point 🙂

  3. Jon, February 2, 2015 at 2:43 AM

    How does it perform? Previous attempts at type-safe replacements for sprintf have been disappointing in terms of performance (boost::format, iostreams).

    I note you’re using std::cout here which may have the same issue.

    1. Sasha Goldshtein, February 3, 2015 at 11:50 AM

      If you don’t like cout, you’re free to use something else for character output. The core doesn’t have any runtime performance cost at all — the “recursive” invocations of safe_printf will be flattened by any decent compiler. And obviously the SAFE_PRINTF macro has no runtime cost as well.

    2. Alex, February 5, 2015 at 10:04 AM

      Check this article and a library which provides both good performance and type safety:

      http://www.codeproject.com/Articles/159910/Extremely-Efficient-Type-safe-printf-Library

  4. adiq_ambro, February 3, 2015 at 5:43 PM

    Ironically, you’ve got a typo in “safe_printf("You scored % %% on the test", 100)” 🙂

    1. adiq_ambro, February 3, 2015 at 5:48 PM

      Oh, I didn’t catch at first that it’s your function. Please ignore that comment 🙂

  5. Dorin, February 5, 2015 at 6:36 AM

    It’s nice, valiant effort, but being C++, I think that people should not shy away from cout and stream functions in general.

    There are a few problems with this: you can’t really format the output (how would %08x look like?) and it’s “a bit harder” to have a customizable format that is not a constexpr. But other than that, it’s a nice effort.

  6. ks, February 5, 2015 at 9:51 AM

    In my opinion this is a non-issue when using a compiler that supports an appropriate warning option (like -Wformat for gcc, clang). gcc even provides function attributes to introduce printf-like custom functions to the compiler. Paired with “-Werror” this is pretty safe. We have used it for years. Is there anything like it in VC++?

    1. Sasha Goldshtein, February 13, 2015 at 3:46 PM

      Static code analysis is good, but it doesn’t generalize. Suppose tomorrow you need something similar for your own format string format, or something else entirely. You’re not going to build static analysis tools just for enforcing your format, right? But you can easily do so with the constexpr approach shown in the post.

  7. qwe, February 5, 2015 at 2:19 PM

    What about gcc’s -Wformat option? It provides same functionality, I believe.

  8. Alex Sh, February 6, 2015 at 12:12 AM

    You could go one step further and convert argument packs to POD structs and store them in a binary file. This would give you a fast logging with a compact (binary) layout. You basically store every format string once and then store binary structs along with a format string id.

  9. Pingback: Materials From This Year’s First SDP | All Your Base Are Belong To Us