
I am printing some ASCII art to the Serial Monitor from an Arduino UNO, with some success. However, string literals use more memory than I would like. I wanted to try constructing the strings/chars from other data types so that I can manipulate the data and print the ASCII art without storing it in string literals.

However, I have found that there seems to be no way to print UTF-8 characters other than from a string literal. Is this the case? Is there no way to construct a string containing characters that have numeric values that are too big for char?

As an example, printing "▓" works fine from a string literal but, it seems, in no other way.

Serial.println("▓");                         // works fine  
Serial.println('▓');                         // char can't store value
Serial.println(String('▓'));                 // char can't store value
Serial.println(0x2593);                      // just prints the numeric value
Serial.println((char)0x2593);                // char can't store value
Serial.println((wchar_t)0x2593);             // doesn't work
Serial.println(String(0x2593));              // doesn't work
Serial.println(String((wchar_t)0x2593));     // doesn't work

Similarly with write() instead of print():

Serial.write("▓");                  // Works fine
Serial.write('▓');                  // char can't store value
Serial.write(0x2593);               // just prints the numeric value
Serial.write((char)0x2593);         // char can't store value
Serial.write((wchar_t)0x2593);      // doesn't work

I also tried deriving new component strings from a string literal using substring() and charAt(). Neither works. Both produce � in the output.

String((char)65) constructs the string "A" from the numeric value 65. Neither String((char)0x2593) nor String((wchar_t)0x2593) produces the desired result. Is there a way to construct a string from numeric values that are too big to store in a char?

Hubert B



As you have noticed, Serial doesn't know how to deal with wchar_t. If you are building your strings algorithmically from Unicode code points, you need to convert those code points to UTF-8 for printing. I am not aware of any built-in function that does that. You may want to search the library manager for a library providing this functionality.
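To make the conversion concrete: U+2593 (DARK SHADE) is encoded in UTF-8 as the three bytes E2 96 93, which is what the literal "▓" stores in a UTF-8 source file. This also explains the � from substring() and charAt(): Arduino's String indexes bytes, so they split the character in the middle. Below is a minimal, host-compilable sketch of the bit arithmetic; the constant names are illustrative, not from any library.

```cpp
#include <stdint.h>

// U+2593 DARK SHADE: binary 0010 0101 1001 0011
constexpr uint16_t kDarkShade = 0x2593;

// Code points U+0800..U+FFFF take three bytes: 1110xxxx 10xxxxxx 10xxxxxx
constexpr uint8_t kLead  = 0xE0 | (kDarkShade >> 12);          // top 4 bits
constexpr uint8_t kCont1 = 0x80 | ((kDarkShade >> 6) & 0x3F);  // middle 6 bits
constexpr uint8_t kCont2 = 0x80 | (kDarkShade & 0x3F);         // low 6 bits

// Checked at compile time.
static_assert(kLead == 0xE2 && kCont1 == 0x96 && kCont2 == 0x93,
              "U+2593 encodes as E2 96 93");
```

On the board, writing the same bytes one at a time (Serial.write(0xE2); Serial.write(0x96); Serial.write(0x93);) should show ▓ in the Serial Monitor, since Serial.write() sends raw bytes and the monitor already decodes the literal's UTF-8 correctly.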

Alternatively, you could write the conversion yourself: it is not that complicated. For example, here is a function that converts any code point from the BMP (i.e. < 2^16) to a UTF-8 string:

// A BMP multi-byte character, with terminating NUL byte.
struct Mbchar {
    char utf8[4];
};

// Convert a wide character to UTF-8. Only works within the BMP.
Mbchar wchar_to_utf8(wchar_t wc) {
    // Work on an unsigned copy: on AVR, wchar_t is a signed 16-bit int,
    // so code points from U+8000 up would otherwise compare as negative.
    uint16_t c = wc;
    Mbchar result;
    if (c < 128) {  // 0xxx.xxxx
        result.utf8[0] = c;
        result.utf8[1] = 0;
    } else if (c < 2048) {  // 110x.xxxx 10xx.xxxx
        result.utf8[0] = 0xc0 | (c >> 6);
        result.utf8[1] = 0x80 | (c & 0x3f);
        result.utf8[2] = 0;
    } else {  // 1110.xxxx 10xx.xxxx 10xx.xxxx
        result.utf8[0] = 0xe0 | (c >> 12);
        result.utf8[1] = 0x80 | (c >> 6 & 0x3f);
        result.utf8[2] = 0x80 | (c & 0x3f);
        result.utf8[3] = 0;
    }
    return result;
}

Keep in mind that, depending on the Arduino you are using, wchar_t may not support characters outside the BMP. The AVR-based Arduinos definitely don't.

This function can be used like this:

Serial.println(wchar_to_utf8(0x2593).utf8);  // prints “▓”
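Because wchar_t is only 16 bits on AVR, one way to reach characters outside the BMP is to skip wchar_t entirely and carry code points in a uint32_t (32 bits on every Arduino), adding UTF-8's four-byte form. This is a sketch under stated assumptions, not a library API: Utf8Char and codepoint_to_utf8 are hypothetical names, and replacing surrogates and values above U+10FFFF with U+FFFD is a design choice of this example.

```cpp
#include <stdint.h>

// Up to four UTF-8 bytes plus a terminating NUL (hypothetical type).
struct Utf8Char {
    char utf8[5];
};

// Convert any Unicode code point to UTF-8 (hypothetical helper).
// Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not valid
// scalar values, so this sketch substitutes U+FFFD REPLACEMENT CHARACTER.
Utf8Char codepoint_to_utf8(uint32_t cp) {
    Utf8Char r = {};  // zero-filled, so the result is always NUL-terminated
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        cp = 0xFFFD;
    }
    if (cp < 0x80) {            // 0xxxxxxx
        r.utf8[0] = (char)cp;
    } else if (cp < 0x800) {    // 110xxxxx 10xxxxxx
        r.utf8[0] = (char)(0xC0 | (cp >> 6));
        r.utf8[1] = (char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx
        r.utf8[0] = (char)(0xE0 | (cp >> 12));
        r.utf8[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        r.utf8[2] = (char)(0x80 | (cp & 0x3F));
    } else {                    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        r.utf8[0] = (char)(0xF0 | (cp >> 18));
        r.utf8[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        r.utf8[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        r.utf8[3] = (char)(0x80 | (cp & 0x3F));
    }
    return r;
}
```

It is used the same way as wchar_to_utf8(), e.g. Serial.println(codepoint_to_utf8(0x2593).utf8);. Since the argument is a plain integer, art can be generated from numbers, for instance by choosing among U+2591, U+2592 and U+2593 (light, medium and dark shade) in a loop.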
Edgar Bonet