LC_CTYPE data block
The LC_CTYPE data block configures character classification and conversion.
When defining a locale data block in the C library, the macros that define an LC_CTYPE data block are as follows:
Call LC_CTYPE_begin with a symbol name and a locale name.

Call LC_CTYPE_table repeatedly to specify 256 table entries. LC_CTYPE_table takes a single argument in quotes. This must be a comma-separated list of table entries. Each table entry describes one of the 256 possible characters, and can be either an illegal character (IL) or the bitwise OR of one or more of the following flags:

    __S    whitespace characters
    __P    punctuation characters
    __B    printable space characters
    __L    lowercase letters
    __U    uppercase letters
    __N    decimal digits
    __C    control characters
    __X    hexadecimal digit letters A-F and a-f
    __A    alphabetic but neither uppercase nor lowercase, such as Japanese katakana

Note
A printable space character is defined as any character where the result of both isprint() and isspace() is true. __A must not be specified for the same character as either __N or __X.

If required, call one or both of the following optional macros:
LC_CTYPE_full_wctype. Calling this macro without arguments causes the C99 wide-character ctype functions (iswalpha(), iswupper(), ...) to return useful values across the full range of Unicode when this LC_CTYPE locale is active. If this macro is not specified, the wide ctype functions treat the first 256 wchar_t values as the same as the 256 char values, and the rest of the wchar_t range as containing illegal characters.

LC_CTYPE_multibyte defines this locale to be a multibyte character set. Call this macro with three arguments. The first two arguments are the names of functions that perform conversion between the multibyte character set and Unicode wide characters. The last argument is the value that must be taken by the C macro MB_CUR_MAX for the respective character set. The two function arguments have the following prototypes:

    size_t internal_mbrtowc(char32_t *pwc, char c, mbstate_t *pstate, int wchar32);
    size_t internal_wcrtomb(char *s, char32_t w, mbstate_t *pstate, int wchar32);

internal_mbrtowc() takes one byte, c, as input, and updates the mbstate_t pointed to by pstate as a result of reading that byte. If the byte completes the encoding of a multibyte character, it writes the corresponding wide character into the location pointed to by pwc, and returns 1 to indicate that it has done so. If the byte is valid but does not complete a character, it returns -2 to indicate that the state held in the mbstate_t has changed and that no character is output. If the encoded input is invalid, it returns -1.

internal_wcrtomb() takes one wide character, w, as input, and writes some number of bytes into the memory pointed to by s. It returns the number of bytes output, or -1 to indicate that the input character has no valid representation in the multibyte character set.

The wchar32 parameter specifies whether the wide character is 32-bit (1) or 16-bit (0). If your code does not use the C11/C++11 headers <uchar.h> or <cuchar>, the wchar32 parameter can be ignored because it defaults to the current definition of wchar_t.
Call LC_CTYPE_end, without arguments, to finish the locale block definition.
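The return-value contract of the two conversion functions can be illustrated with a minimal C sketch for UTF-8. This is not the library's implementation. The names demo_mbrtowc, demo_wcrtomb, and demo_state are hypothetical, and demo_state stands in for mbstate_t because the layout of mbstate_t is implementation-defined. The sketch assumes sequences of at most four bytes, does not reject overlong encodings or decoded surrogates, and ignores wchar32. Because the functions return size_t, the -1 and -2 results are written as (size_t)-1 and (size_t)-2.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical stand-in for mbstate_t (its real layout is
   implementation-defined). */
typedef struct {
    char32_t partial;   /* code point bits accumulated so far */
    int remaining;      /* continuation bytes still expected */
} demo_state;

/* Consume one byte. Returns 1 when a character is completed,
   (size_t)-2 when more bytes are needed, (size_t)-1 on invalid input. */
size_t demo_mbrtowc(char32_t *pwc, char c, demo_state *ps, int wchar32)
{
    unsigned char b = (unsigned char)c;
    (void)wchar32;                      /* sketch assumes 32-bit output */

    if (ps->remaining == 0) {
        if (b < 0x80) {                 /* single-byte character */
            *pwc = b;
            return 1;
        }
        if (b >= 0xC2 && b <= 0xDF) {
            ps->partial = b & 0x1Fu;
            ps->remaining = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            ps->partial = b & 0x0Fu;
            ps->remaining = 2;
        } else if (b >= 0xF0 && b <= 0xF4) {
            ps->partial = b & 0x07u;
            ps->remaining = 3;
        } else {
            return (size_t)-1;          /* not a valid lead byte */
        }
        return (size_t)-2;              /* state changed, no output yet */
    }

    if ((b & 0xC0) != 0x80) {           /* expected a continuation byte */
        ps->remaining = 0;
        return (size_t)-1;
    }
    ps->partial = (ps->partial << 6) | (b & 0x3Fu);
    if (--ps->remaining > 0)
        return (size_t)-2;
    *pwc = ps->partial;
    return 1;
}

/* Encode one wide character. Returns the number of bytes written,
   or (size_t)-1 if w has no representation. */
size_t demo_wcrtomb(char *s, char32_t w, demo_state *ps, int wchar32)
{
    (void)ps;                           /* UTF-8 output needs no shift state */
    (void)wchar32;

    if (w < 0x80) {
        s[0] = (char)w;
        return 1;
    }
    if (w < 0x800) {
        s[0] = (char)(0xC0 | (w >> 6));
        s[1] = (char)(0x80 | (w & 0x3F));
        return 2;
    }
    if (w >= 0xD800 && w <= 0xDFFF)
        return (size_t)-1;              /* surrogates are not characters */
    if (w < 0x10000) {
        s[0] = (char)(0xE0 | (w >> 12));
        s[1] = (char)(0x80 | ((w >> 6) & 0x3F));
        s[2] = (char)(0x80 | (w & 0x3F));
        return 3;
    }
    if (w < 0x110000) {
        s[0] = (char)(0xF0 | (w >> 18));
        s[1] = (char)(0x80 | ((w >> 12) & 0x3F));
        s[2] = (char)(0x80 | ((w >> 6) & 0x3F));
        s[3] = (char)(0x80 | (w & 0x3F));
        return 4;
    }
    return (size_t)-1;
}
```

A caller feeds demo_mbrtowc() one byte at a time, starting from a zero-initialized state, and reads *pwc only when the result is 1. A real conversion function passed to LC_CTYPE_multibyte would keep the same information inside the library's own mbstate_t.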
Example LC_CTYPE data block
    LC_CTYPE_begin utf8_ctype, "UTF-8"
    ;
    ; Single-byte characters in the low half of UTF-8 are exactly
    ; the same as in the normal "C" locale.
    LC_CTYPE_table "__C, __C, __C, __C, __C, __C, __C, __C, __C" ; 0x00-0x08
    LC_CTYPE_table "__C|__S, __C|__S, __C|__S, __C|__S, __C|__S" ; 0x09-0x0D (HT,LF,VT,FF,CR)
    LC_CTYPE_table "__C, __C, __C, __C, __C, __C, __C, __C, __C" ; 0x0E-0x16
    LC_CTYPE_table "__C, __C, __C, __C, __C, __C, __C, __C, __C" ; 0x17-0x1F
    LC_CTYPE_table "__B|__S" ; space
    LC_CTYPE_table "__P, __P, __P, __P, __P, __P, __P, __P" ; !"#$%&'(
    LC_CTYPE_table "__P, __P, __P, __P, __P, __P, __P" ; )*+,-./
    LC_CTYPE_table "__N, __N, __N, __N, __N, __N, __N, __N, __N, __N" ; 0-9
    LC_CTYPE_table "__P, __P, __P, __P, __P, __P, __P" ; :;<=>?@
    LC_CTYPE_table "__U|__X, __U|__X, __U|__X, __U|__X, __U|__X, __U|__X" ; A-F
    LC_CTYPE_table "__U, __U, __U, __U, __U, __U, __U, __U, __U, __U" ; G-P
    LC_CTYPE_table "__U, __U, __U, __U, __U, __U, __U, __U, __U, __U" ; Q-Z
    LC_CTYPE_table "__P, __P, __P, __P, __P, __P" ; [\]^_`
    LC_CTYPE_table "__L|__X, __L|__X, __L|__X, __L|__X, __L|__X, __L|__X" ; a-f
    LC_CTYPE_table "__L, __L, __L, __L, __L, __L, __L, __L, __L, __L" ; g-p
    LC_CTYPE_table "__L, __L, __L, __L, __L, __L, __L, __L, __L, __L" ; q-z
    LC_CTYPE_table "__P, __P, __P, __P" ; {|}~
    LC_CTYPE_table "__C" ; 0x7F
    ;
    ; Nothing in the top half of UTF-8 is valid on its own as a
    ; single-byte character, so they are all illegal characters (IL).
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
    ;
    ; The UTF-8 ctype locale wants the full version of wctype.
    LC_CTYPE_full_wctype
    ;
    ; UTF-8 is a multibyte locale, so we must specify some
    ; conversion functions. MB_CUR_MAX is 6 for UTF-8 (the lead
    ; bytes 0xFC and 0xFD are each followed by five continuation
    ; bytes).
    ;
    ; The implementations of the conversion functions are not
    ; provided in this example.
    ;
    IMPORT utf8_mbrtowc
    IMPORT utf8_wcrtomb
    LC_CTYPE_multibyte utf8_mbrtowc, utf8_wcrtomb, 6
    LC_CTYPE_end
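The table rules from this section (exactly 256 entries, each either IL or a bitwise OR of flags, and __A never combined with __N or __X) can be checked mechanically before a table is written out as LC_CTYPE_table calls. The following C sketch shows one possible check. The DEMO_ bit values and the use of 0 for an illegal character are assumptions made only for this illustration; the real values of __S, __P, and the other flags are defined by the C library and are not given here.

```c
#include <stddef.h>

/* Hypothetical flag values for illustration only. The real __S, __P, ...
   definitions belong to the C library and may differ. */
enum {
    DEMO_IL = 0,        /* illegal character (assumed to be "no flags") */
    DEMO_S  = 1 << 0,   /* whitespace */
    DEMO_P  = 1 << 1,   /* punctuation */
    DEMO_B  = 1 << 2,   /* printable space */
    DEMO_L  = 1 << 3,   /* lowercase letter */
    DEMO_U  = 1 << 4,   /* uppercase letter */
    DEMO_N  = 1 << 5,   /* decimal digit */
    DEMO_C  = 1 << 6,   /* control character */
    DEMO_X  = 1 << 7,   /* hexadecimal digit letter */
    DEMO_A  = 1 << 8    /* alphabetic, neither upper nor lower */
};

/* Returns 1 if one entry obeys the rule that __A must not be
   combined with __N or __X, otherwise 0. */
int demo_entry_ok(unsigned flags)
{
    if ((flags & DEMO_A) && (flags & (DEMO_N | DEMO_X)))
        return 0;
    return 1;
}

/* Returns 1 if the table has exactly 256 entries and every entry
   passes demo_entry_ok(), otherwise 0. */
int demo_table_ok(const unsigned *table, size_t count)
{
    size_t i;

    if (count != 256)
        return 0;
    for (i = 0; i < count; i++) {
        if (!demo_entry_ok(table[i]))
            return 0;
    }
    return 1;
}
```

In this sketch an entry such as DEMO_U | DEMO_X passes, matching the A-F entries in the example, while DEMO_A | DEMO_N is rejected.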