One thing that has always worried me about writing C samples in the documentation for my single-header libraries is that you can’t be 100% sure they will compile. You can always extract them and run them by hand, but you might later change them and forget to re-test. Having this happen automatically is so powerful. Rust has this feature built into its ecosystem, so code samples in documentation are compiled and tested automatically.

So I wondered - is there any way to do something similar for my single-header libraries? I decided to try with json.h.

The Code Samples

I want to have more and more code samples to show how easy the library is to use, but the big one there currently is:

 const char json[] = "{\"a\" : true, \"b\" : [false, null, \"foo\"]}";
 struct json_value_s* root = json_parse(json, strlen(json));
 assert(root->type == json_type_object);

 struct json_object_s* object = (struct json_object_s*)root->payload;
 assert(object->length == 2);

 struct json_object_element_s* a = object->start;

 struct json_string_s* a_name = a->name;
 assert(0 == strcmp(a_name->string, "a"));
 assert(a_name->string_size == strlen("a"));

 struct json_value_s* a_value = a->value;
 assert(a_value->type == json_type_true);
 assert(a_value->payload == NULL);

 struct json_object_element_s* b = a->next;
 assert(b->next == NULL);

 struct json_string_s* b_name = b->name;
 assert(0 == strcmp(b_name->string, "b"));
 assert(b_name->string_size == strlen("b"));

 struct json_value_s* b_value = b->value;
 assert(b_value->type == json_type_array);

 struct json_array_s* array = (struct json_array_s*)b_value->payload;
 assert(array->length == 3);

 struct json_array_element_s* b_1st = array->start;

 struct json_value_s* b_1st_value = b_1st->value;
 assert(b_1st_value->type == json_type_false);
 assert(b_1st_value->payload == NULL);

 struct json_array_element_s* b_2nd = b_1st->next;

 struct json_value_s* b_2nd_value = b_2nd->value;
 assert(b_2nd_value->type == json_type_null);
 assert(b_2nd_value->payload == NULL);

 struct json_array_element_s* b_3rd = b_2nd->next;
 assert(b_3rd->next == NULL);

 struct json_value_s* b_3rd_value = b_3rd->value;
 assert(b_3rd_value->type == json_type_string);

 struct json_string_s* string = (struct json_string_s*)b_3rd_value->payload;
 assert(0 == strcmp(string->string, "foo"));
 assert(string->string_size == strlen("foo"));

 /* Don't forget to free the one allocation! */
 free(root);

I want to be able to parse the README.md, extract the code samples, and turn them into a test that my utest.h library will run. I already use CMake to build the unit tests, and since I’m pretty familiar with it (despite its glaring flaws), I wondered if I could use it to do the extraction.

CMake of Horrors

CMake has built-in regex support for strings, so I wondered whether I could use that to do the extraction. The one big issue is that CMake only supports greedy regex matching - meaning that I have to be super careful when choosing the start/end tokens to match.

First of all we need to read the whole file into a CMake variable:

file(READ ${CMAKE_CURRENT_SOURCE_DIR}/../README.md readme_md)

CMake has this wonderfully messed up method for differentiating between strings and lists - where a list in CMake terminology is just a string that has semi-colons within it. The problem is that code samples in languages like C use semi-colons as end of statement terminators - which will cause us issues. The best way I’ve found around this is to change the semi-colons to some symbol that wouldn’t appear in the original source. I used the ‘@’ symbol for this since there isn’t an operator in C for it:

string(REPLACE ";" "@" readme_md "${readme_md}")
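The placeholder round trip can be sketched in plain C to make its one assumption visible: the placeholder must not already occur in the input. Both `replace_char` and `survives_round_trip` are hypothetical helpers written for this illustration; they are not part of CMake or json.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace every `from` with `to` in place and
   return how many characters were changed. */
size_t replace_char(char *text, char from, char to) {
  size_t count = 0;
  for (; *text != '\0'; text++) {
    if (*text == from) {
      *text = to;
      count++;
    }
  }
  return count;
}

/* Hide ';' behind '@', then restore it.
   Returns 1 if the text is unchanged after the round trip, 0 if it was
   altered, and -1 if the input does not fit the fixed-size buffer. */
int survives_round_trip(const char *original) {
  char buffer[128];
  if (strlen(original) >= sizeof(buffer)) {
    return -1;
  }
  strcpy(buffer, original);
  replace_char(buffer, ';', '@'); /* what the first REPLACE does */
  replace_char(buffer, '@', ';'); /* what the final REPLACE does */
  return 0 == strcmp(buffer, original);
}
```

A line such as `free(root);` survives, but any text that already contains an ‘@’ (an e-mail address in a comment, say) would come back with a semi-colon in its place, so the trick relies on the README never using that character inside a code sample.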

Now that we have the string as a real string (non-list), we can extract the code samples themselves. You’ll notice that in the README.md all code samples begin with “```c” and end with “```”, so we can use these markers to find our code.

As I said earlier, CMake is greedy when it comes to regex, which means that if we used the more natural “```c.*```” we’d match from the start of the very first code sample to the end of the very last one in the file. Not ideal. Instead we need to use the more constrained search of “```c[^`]*```” - search for the start pattern, and then all symbols except a “`” until we get to our end. This stores each match as a list entry in the variable snippets - meaning we have introduced some semi-colons into the string too:

string(REGEX MATCHALL "```c[^`]*```" snippets "${readme_md}")
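To see what the constrained pattern accepts and rejects, here is a small hand-written C scanner that mimics it. This is only a sketch of the matching rule, not how CMake implements its regex engine, and `count_c_snippets` is a name made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts matches the way "```c[^`]*```" would: an opening fence, then a
   run of characters that are not backticks, then a closing fence. */
size_t count_c_snippets(const char *markdown) {
  const char *open_fence = "```c";
  const char *cursor = markdown;
  size_t count = 0;

  while ((cursor = strstr(cursor, open_fence)) != NULL) {
    const char *body = cursor + strlen(open_fence);
    const char *end = body;

    /* [^`]* : consume everything up to the next backtick. */
    while (*end != '\0' && *end != '`') {
      end++;
    }

    if (0 == strncmp(end, "```", 3)) {
      count++;
      cursor = end + 3; /* resume after the closing fence */
    } else {
      cursor = body; /* no match starting here, keep looking */
    }
  }
  return count;
}
```

Two consequences fall out of the rule. A snippet whose body contains a backtick (for example the character literal '`') is not matched at all, and an opening fence such as “```cmake” is matched too, because the pattern only checks that the fence starts with “```c”.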

To test the examples, we want to compile each of the code snippets in isolation from the others. I first attempted to create a UTEST(foo, bar) wrapper around each snippet, but I could not figure out how to generate these wrappers such that each would be unique. What I mean is that the first snippet would be UTEST(generated, snippet0), the next UTEST(generated, snippet1), etc. For the life of me I couldn’t work out how this was possible. So instead I just wrapped each snippet in its own braced region, which guarantees their isolation.

string(REPLACE "```c" "{" snippets ${snippets})
string(REPLACE "```" "}\n\n" snippets ${snippets})
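The reason braces are enough is that each pair opens a new block scope in C, so two snippets can declare the same names without colliding. A minimal sketch, with `run_isolated_snippets` as a made-up wrapper standing in for the test body:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two "snippets" that both declare `json` and `length`. Each braced
   region is its own scope, so the second pair of declarations does not
   conflict with the first. Returns how many snippets ran. */
int run_isolated_snippets(void) {
  int snippets_run = 0;

  {
    const char json[] = "{\"a\" : true}";
    size_t length = strlen(json);
    assert(length == 12);
    snippets_run++;
  }

  {
    const char json[] = "[false, null]";
    size_t length = strlen(json);
    assert(length == 13);
    snippets_run++;
  }

  return snippets_run;
}
```

Declarations such as function prototypes, structs and enums are also legal at block scope, but a snippet that defines a function would not compile in this position in standard C.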

Now all we need to do is remove the semi-colons that were added for the lists, and then turn all “@” symbols we introduced before back into semi-colons:

string(REPLACE ";" "" snippets "${snippets}")
string(REPLACE "@" ";" snippets "${snippets}")

And then we just need to write out the file into some location for inclusion:

file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/generated.h "${snippets}")

Sample generated.h

For the current master json.h, the generated.h file is:

{
struct json_value_s *json_parse(
    const void *src,
    size_t src_size);
}

{
struct json_value_s {
  void *payload;
  size_t type;
};
}

{
struct json_value_s *json_parse_ex(
    const void *src,
    size_t src_size,
    size_t flags_bitset,
    void*(*alloc_func_ptr)(void *, size_t),
    void *user_data,
    struct json_parse_result_s *result);
}

{
enum json_parse_flags_e {
  json_parse_flags_default = 0,
  json_parse_flags_allow_trailing_comma = 0x1,
  json_parse_flags_allow_unquoted_keys = 0x2,
  json_parse_flags_allow_global_object = 0x4,
  json_parse_flags_allow_equals_in_object = 0x8,
  json_parse_flags_allow_no_commas = 0x10,
  json_parse_flags_allow_c_style_comments = 0x20,
  json_parse_flags_deprecated = 0x40,
  json_parse_flags_allow_location_information = 0x80,
  json_parse_flags_allow_single_quoted_strings = 0x100,
  json_parse_flags_allow_hexadecimal_numbers = 0x200,
  json_parse_flags_allow_leading_plus_sign = 0x400,
  json_parse_flags_allow_leading_or_trailing_decimal_point = 0x800,
  json_parse_flags_allow_inf_and_nan = 0x1000,
  json_parse_flags_allow_multi_line_strings = 0x2000,
  json_parse_flags_allow_simplified_json =
      (json_parse_flags_allow_trailing_comma |
       json_parse_flags_allow_unquoted_keys |
       json_parse_flags_allow_global_object |
       json_parse_flags_allow_equals_in_object |
       json_parse_flags_allow_no_commas),
  json_parse_flags_allow_json5 =
      (json_parse_flags_allow_trailing_comma |
       json_parse_flags_allow_unquoted_keys |
       json_parse_flags_allow_c_style_comments |
       json_parse_flags_allow_single_quoted_strings |
       json_parse_flags_allow_hexadecimal_numbers |
       json_parse_flags_allow_leading_plus_sign |
       json_parse_flags_allow_leading_or_trailing_decimal_point |
       json_parse_flags_allow_inf_and_nan |
       json_parse_flags_allow_multi_line_strings)
};
}

{
const char json[] = "{\"a\" : true, \"b\" : [false, null, \"foo\"]}";
struct json_value_s* root = json_parse(json, strlen(json));
assert(root->type == json_type_object);

struct json_object_s* object = (struct json_object_s*)root->payload;
assert(object->length == 2);

struct json_object_element_s* a = object->start;

struct json_string_s* a_name = a->name;
assert(0 == strcmp(a_name->string, "a"));
assert(a_name->string_size == strlen("a"));

struct json_value_s* a_value = a->value;
assert(a_value->type == json_type_true);
assert(a_value->payload == NULL);

struct json_object_element_s* b = a->next;
assert(b->next == NULL);

struct json_string_s* b_name = b->name;
assert(0 == strcmp(b_name->string, "b"));
assert(b_name->string_size == strlen("b"));

struct json_value_s* b_value = b->value;
assert(b_value->type == json_type_array);

struct json_array_s* array = (struct json_array_s*)b_value->payload;
assert(array->length == 3);

struct json_array_element_s* b_1st = array->start;

struct json_value_s* b_1st_value = b_1st->value;
assert(b_1st_value->type == json_type_false);
assert(b_1st_value->payload == NULL);

struct json_array_element_s* b_2nd = b_1st->next;

struct json_value_s* b_2nd_value = b_2nd->value;
assert(b_2nd_value->type == json_type_null);
assert(b_2nd_value->payload == NULL);

struct json_array_element_s* b_3rd = b_2nd->next;
assert(b_3rd->next == NULL);

struct json_value_s* b_3rd_value = b_3rd->value;
assert(b_3rd_value->type == json_type_string);

struct json_string_s* string = (struct json_string_s*)b_3rd_value->payload;
assert(0 == strcmp(string->string, "foo"));
assert(string->string_size == strlen("foo"));

/* Don't forget to free the one allocation! */
free(root);
}

{
const char json[] = "{\"a\" : true, \"b\" : [false, null, \"foo\"]}";
struct json_value_s* root = json_parse(json, strlen(json));

struct json_object_s* object = json_value_as_object(root);
assert(object != NULL);
assert(object->length == 2);

struct json_object_element_s* a = object->start;

struct json_string_s* a_name = a->name;
assert(0 == strcmp(a_name->string, "a"));
assert(a_name->string_size == strlen("a"));

struct json_value_s* a_value = a->value;
assert(json_value_is_true(a_value));

struct json_object_element_s* b = a->next;
assert(b->next == NULL);

struct json_string_s* b_name = b->name;
assert(0 == strcmp(b_name->string, "b"));
assert(b_name->string_size == strlen("b"));

struct json_array_s* array = json_value_as_array(b->value);
assert(array->length == 3);

struct json_array_element_s* b_1st = array->start;

struct json_value_s* b_1st_value = b_1st->value;
assert(json_value_is_false(b_1st_value));

struct json_array_element_s* b_2nd = b_1st->next;

struct json_value_s* b_2nd_value = b_2nd->value;
assert(json_value_is_null(b_2nd_value));

struct json_array_element_s* b_3rd = b_2nd->next;
assert(b_3rd->next == NULL);

struct json_string_s* string = json_value_as_string(b_3rd->value);
assert(string != NULL);
assert(0 == strcmp(string->string, "foo"));
assert(string->string_size == strlen("foo"));

/* Don't forget to free the one allocation! */
free(root);
}

I wanted to keep normal assert.h asserts in the sample source, but I also want them to be turned into my utest.h ASSERT_TRUE macros, so I just use the preprocessor to redefine assert, and include the generated source in the test:

#define assert(x) ASSERT_TRUE(x)

UTEST(generated, readme) {
#include "generated.h"
}

And the output when I run?

[ RUN      ] generated.readme
[       OK ] generated.readme (9715ns)

A pass!
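The assert remapping used above can be tried without utest.h. In this sketch CHECK_TRUE is a hypothetical stand-in for ASSERT_TRUE that merely counts failures and keeps going (the real macro may behave differently on failure), and `run_remapped_samples` plays the role of the UTEST body.

```c
#include <stdio.h>
#include <string.h>

int failed_checks = 0;

/* Hypothetical stand-in for a test framework's assertion macro: record
   the failure and carry on, so the caller can inspect the count. */
#define CHECK_TRUE(x)                              \
  do {                                             \
    if (!(x)) {                                    \
      failed_checks++;                             \
      fprintf(stderr, "check failed: %s\n", #x);   \
    }                                              \
  } while (0)

/* Route the samples' plain assert() calls to the test macro. The #undef
   avoids a redefinition warning if <assert.h> was included earlier. */
#undef assert
#define assert(x) CHECK_TRUE(x)

int run_remapped_samples(void) {
  failed_checks = 0;
  {
    const char text[] = "foo";
    assert(strlen(text) == 3); /* passes */
  }
  {
    const char text[] = "foo";
    assert(0 == strcmp(text, "bar")); /* fails on purpose */
  }
  return failed_checks;
}

/* Restore the standard assert for any code that follows. */
#undef assert
#include <assert.h>
```

The samples themselves stay written against the familiar assert, which is what keeps the README readable while the test harness decides what a failed check should do.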

Conclusion

OK - it is not as nice as what Rust has built in, but it works! I can now modify the README.md and be sure that the code still compiles and runs correctly. I even found a bug in the sample in the process, so it was well worth the work. Just a shame I had to invest in proper demonology to support this within the C ecosystem.