I’ve had a lot of people using my Unicode library utf8.h since its release; thanks to everyone who has found the library useful and provided feedback!

One area I’d previously scrimped on was support for case-insensitive comparisons of letters beyond ASCII. I knew little about this, but when a user requested that my library also support accented Latin characters, and later Greek letters, I jumped at the occasion to add it.

The utf8 functions utf8casecmp, utf8ncasecmp, utf8casestr, utf8isupper, utf8islower, utf8lwr, and utf8upr have been modified to support the Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Greek and Coptic Unicode blocks. I’ve also added two new functions, utf8lwrcodepoint and utf8uprcodepoint, which convert a single codepoint to lower or upper case respectively.
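To illustrate how a per-codepoint case mapping feeds a comparison such as utf8casecmp, here is a minimal sketch in C. It is not the library’s implementation: the names sketch_lower, sketch_decode, and sketch_casecmp are hypothetical, the decoder assumes well-formed UTF-8 and does no validation, and the lowering only covers the ranges where upper and lower case are 32 codepoints apart.

```c
#include <stdint.h>

/* Hypothetical per-codepoint lowering: covers only the 32-apart ranges in
 * ASCII, Latin-1 Supplement, and Greek; other codepoints pass through. */
static uint32_t sketch_lower(uint32_t cp) {
  if ((cp >= 0x0041 && cp <= 0x005A) ||                 /* A-Z */
      (cp >= 0x00C0 && cp <= 0x00DE && cp != 0x00D7) || /* À-Þ, skipping × */
      (cp >= 0x0391 && cp <= 0x03AB && cp != 0x03A2)) { /* Α-Ϋ, skipping unassigned U+03A2 */
    return cp + 32;
  }
  return cp;
}

/* Decodes one codepoint from well-formed UTF-8 (no validation) and
 * returns a pointer just past it. */
static const char *sketch_decode(const char *s, uint32_t *out) {
  const unsigned char *p = (const unsigned char *)s;
  if (p[0] < 0x80) {
    *out = p[0];
    return s + 1;
  }
  if ((p[0] & 0xE0) == 0xC0) {
    *out = ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
    return s + 2;
  }
  if ((p[0] & 0xF0) == 0xE0) {
    *out = ((uint32_t)(p[0] & 0x0F) << 12) | ((uint32_t)(p[1] & 0x3F) << 6) |
           (uint32_t)(p[2] & 0x3F);
    return s + 3;
  }
  *out = ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12) |
         ((uint32_t)(p[2] & 0x3F) << 6) | (uint32_t)(p[3] & 0x3F);
  return s + 4;
}

/* Compares two NUL-terminated UTF-8 strings codepoint by codepoint after
 * lowering each one; returns -1, 0, or 1. */
int sketch_casecmp(const char *a, const char *b) {
  for (;;) {
    uint32_t ca, cb;
    a = sketch_decode(a, &ca);
    b = sketch_decode(b, &cb);
    ca = sketch_lower(ca);
    cb = sketch_lower(cb);
    if (ca != cb) {
      return ca < cb ? -1 : 1;
    }
    if (ca == 0) {
      return 0; /* both strings ended at the same point */
    }
  }
}
```

For example, comparing "CAFÉ" with "café" gives 0, because É (U+00C9) lowers to é (U+00E9). Since the comparison only ever sees lowered codepoints, widening the lowering function widens the comparison along with it.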

The main logic for converting between lower and upper case is both fairly concise and utterly disgusting. Let’s take a look at the code that converts a lower case codepoint to upper case.

For ASCII characters, and also for some Latin and Greek letters, the upper case codepoint is simply 32 places below the lower case one (for example, 'a' is U+0061 and 'A' is U+0041).
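A minimal sketch of this first tier in C, using a hypothetical helper named upr_offset32 rather than the library’s own code. The gaps in the ranges matter: ÷ (U+00F7) sits inside the Latin-1 run but is not a letter, and final sigma ς (U+03C2) has no capital 32 places below it, since U+03A2 is unassigned.

```c
#include <stdint.h>

/* Hypothetical helper: returns the upper case form for codepoints whose
 * upper case sits exactly 32 places below the lower case, and returns
 * every other codepoint unchanged. */
uint32_t upr_offset32(uint32_t cp) {
  if ((cp >= 0x0061 && cp <= 0x007A) ||                 /* ASCII a-z */
      (cp >= 0x00E0 && cp <= 0x00FE && cp != 0x00F7) || /* Latin-1 à-þ, skipping ÷ */
      (cp >= 0x03B1 && cp <= 0x03CB && cp != 0x03C2)) { /* Greek α-ϋ, skipping final sigma */
    return cp - 32;
  }
  return cp;
}
```

Upper case input falls outside every range and is returned as-is, so the helper is safe to call on any codepoint.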

In the next set of ranges, each lower and upper case pair sits in adjacent codepoints, offset by 1. Depending on whether the lower case codepoints in a range are odd or even, we have two if statements, one for each layout.
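A sketch of the two parity cases in C. It assumes a hypothetical function upr_offset1 with hypothetical helpers, and lists only runs from Latin Extended-A; Latin Extended-B and Greek contain similar alternating runs that are left out here.

```c
#include <stdint.h>

/* Latin Extended-A runs where each upper case letter is even and its
 * lower case partner is the next (odd) codepoint, e.g. U+0100 Ā / U+0101 ā. */
static int upper_is_even(uint32_t cp) {
  return (cp >= 0x0100 && cp <= 0x012F) ||
         (cp >= 0x0132 && cp <= 0x0137) ||
         (cp >= 0x014A && cp <= 0x0177);
}

/* Latin Extended-A runs where each upper case letter is odd and its
 * lower case partner is the next (even) codepoint, e.g. U+0139 Ĺ / U+013A ĺ. */
static int upper_is_odd(uint32_t cp) {
  return (cp >= 0x0139 && cp <= 0x0148) ||
         (cp >= 0x0179 && cp <= 0x017E);
}

uint32_t upr_offset1(uint32_t cp) {
  if (upper_is_even(cp)) {
    /* Clearing the low bit maps an odd lower case codepoint to its even
     * upper case partner and leaves an even upper case codepoint alone. */
    return cp & ~(uint32_t)1;
  }
  if (upper_is_odd(cp) && (cp & 1) == 0) {
    /* Even codepoints in these runs are lower case; upper case is one below. */
    return cp - 1;
  }
  return cp;
}
```

Clearing the low bit is enough for the first layout because every upper case letter there is even; in the second layout the upper case letters are odd, so the lower case ones need an explicit parity check before stepping down by one.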

And lastly, all other codepoints in these blocks, which follow no sane pattern whatsoever, get fired into a single big switch statement.
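A sketch of such a fallback switch in C, with a hypothetical name upr_irregular and only a sample of the irregular mappings, each taken from the Unicode simple uppercase mappings. Note how some targets land in a different block entirely: ÿ uppercases to Ÿ at U+0178 in Latin Extended-A, and the micro sign µ uppercases to Greek capital mu.

```c
#include <stdint.h>

/* Hypothetical fallback for lower case codepoints whose upper case form
 * follows no arithmetic pattern. Only a sample of cases is listed. */
uint32_t upr_irregular(uint32_t cp) {
  switch (cp) {
  case 0x00B5: return 0x039C; /* µ micro sign -> Μ Greek capital mu */
  case 0x00FF: return 0x0178; /* ÿ -> Ÿ */
  case 0x0131: return 0x0049; /* ı dotless i -> ASCII I */
  case 0x017F: return 0x0053; /* ſ long s -> ASCII S */
  case 0x0180: return 0x0243; /* ƀ -> Ƀ */
  case 0x03C2: return 0x03A3; /* ς final sigma -> Σ */
  case 0x03AC: return 0x0386; /* ά -> Ά */
  case 0x03AD: return 0x0388; /* έ -> Έ */
  case 0x03AE: return 0x0389; /* ή -> Ή */
  case 0x03AF: return 0x038A; /* ί -> Ί */
  case 0x03CC: return 0x038C; /* ό -> Ό */
  case 0x03CD: return 0x038E; /* ύ -> Ύ */
  case 0x03CE: return 0x038F; /* ώ -> Ώ */
  default: return cp;         /* not an irregular case: leave unchanged */
  }
}
```

In a full conversion routine, a switch like this would serve as the last resort after the arithmetic range checks have failed to match.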

With the above, we can handle all the lower/upper case variants for the Latin and Greek characters requested!

I hope these additions prove useful to my users, and if you’ve got any requests yourself, feel free to file them here.