• Locale & C Library

    From apam@21:3/197 to tenser on Wed Sep 24 02:47:04 2025
    Hi

    I was wondering if you know about setting the locale in a C library? I'm
    thinking maybe I should do that in crt0.o?

    At present on my OS, one must set the locale using setlocale in the
    program, but if I want my programs to have a default locale, I am
    thinking maybe I should set it before jumping to main() from an
    environment variable.

    Does this sound reasonable? Locales and things have always been a bit of
    a mystery to me :)

    Andrew


    --- envy/0.1-6dee535
    * Origin: Quinn - Random Things - bbs.quinnos.com:2323 (21:3/197)
  • From tenser@21:1/101 to apam on Fri Sep 26 01:45:29 2025
    On 24 Sep 2025 at 02:47a, apam pondered and said...

    I was wondering if you know about setting the locale in a C library? I'm
    thinking maybe I should do that in crt0.o?

    At present on my OS, one must set the locale using setlocale in the
    program, but if I want my programs to have a default locale, I am
    thinking maybe I should set it before jumping to main() from an
    environment variable.

    Does this sound reasonable? Locales and things have always been a bit of
    a mystery to me :)

    Whoo boy. This opens up a can of worms. But let me try to
    address your specific question first. The short answer is no,
    you probably don't want to do that.

    More specifically, I'm going by what POSIX, C, and existing
    libc implementations do. POSIX says there's a global default,
    called "POSIX" (aliased as "C") that gives you sort of the
    minimum baseline for running C programs.

    The current version of POSIX, POSIX 2024, includes the 2018
    revision of the ISO C standard as a normative reference, and
    kicks the specifics of what's done when here over to C.

    C, in turn, is quite clear about this; section 7.11.1.1 of C 2018
    covers `setlocale`, and para (4) of that section states:
    "At program startup, the equivalent of `setlocale(LC_ALL, "C");`
    is executed."

    This strongly implies that one wouldn't do `setlocale()` for
    a non-default locale in crt0 before calling `main`. Looking
    at a smattering of `libc` implementations, I don't see any
    that touch locales in the pre-main C runtime code.

    So this suggests to me that your OS should arrange things so
    that, on entry to a program, the default has been selected.
    It is up to individual programs to call `setlocale()` as
    appropriate, if they need to care.
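
    As a rough sketch of that division of labor (not taken from any
    particular libc): the startup code leaves the locale alone, and a
    program that cares opts in itself. The helper names
    `current_locale_is_c` and `adopt_environment_locale` are made up for
    illustration; only `setlocale` is the real interface. The check
    assumes the implementation reports the default locale's name as "C",
    which glibc and musl do.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the program's current LC_ALL
 * locale is the standard default ("C"), else 0. Passing NULL only
 * queries the locale; it does not change it.
 */
int current_locale_is_c(void)
{
    const char *name = setlocale(LC_ALL, NULL);

    assert(name != NULL);
    return strcmp(name, "C") == 0;
}

/*
 * Hypothetical helper: opt in to whatever the environment (LC_ALL,
 * the LC_* variables, LANG) asks for. Returns the new locale name,
 * or NULL if the request could not be honored, in which case the
 * locale is left unchanged.
 */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

    A program that wants the user's conventions would call
    `adopt_environment_locale()` near the top of `main`; one that never
    calls it keeps running in "C".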

    Ok, so why is this stuff troublesome?

    Bluntly, the C/POSIX locale stuff isn't very good; it was
    designed to solve a problem that was, and is, very real: how
    do we write a single program that can work with the myriad
    different human languages and notations for similar concepts.

    An obvious example is, "how do we write dates?" Here in
    North America, we often write the numeric month first,
    and then the day of the month and then the year. But in
    other parts of the world, folks write the day of the month
    first, then the month, then the year. ISO-8601 date times
    write dates as 'year-month-day' (which has the considerable
    advantage of being sortable trivially, btw). Or consider
    the formatting of large numbers: again, in the US, we tend
    to write these with a comma between groups of three digits
    (that is, at each power 10^(3k) for k>0), and use a period
    to separate the integral part of a number from the
    fractional part, such as 10,000.02. But
    over in Europe, they often use '.' to separate powers of
    thousands, and ',' to separate the integral and fractional
    parts. E.g., 10.000,02. To make things even more confusing,
    in India, they group things beyond a thousand ("hazar")
    into "lakh" (a hundred thousand) and "crore" (ten million, or
    100 lakh), so one hundred million (10 crore) might be written
    as, "10,00,00,000". And we haven't even started to talk about
    currency....
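
    To make the grouping differences concrete, here is a small sketch
    in C that does not touch the locale machinery at all.
    `format_grouped` is a hypothetical helper, and its `grouping`
    argument is a simplified take on the `grouping` member of
    `struct lconv`: a string of group sizes counted from the right,
    where the last size repeats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `value` into `out` with `sep` between
 * digit groups. `grouping` lists group sizes from the right as small
 * integers (e.g. "\3" or "\3\2"); the last size repeats.
 * Returns 0 on success, -1 if `out` is too small.
 */
int format_grouped(unsigned long long value, const char *grouping,
                   char sep, char *out, size_t outsz)
{
    char digits[32];            /* enough for a 64-bit value */
    char rev[64];               /* digits plus separators, reversed */
    size_t n, r = 0, in_group = 0;
    const char *g = grouping;

    assert(grouping != NULL && grouping[0] > 0 && out != NULL);

    snprintf(digits, sizeof digits, "%llu", value);
    n = strlen(digits);

    /* Walk the digits right to left, emitting a separator whenever
     * the current group is full. */
    for (size_t i = n; i-- > 0; ) {
        if (in_group == (size_t)*g) {
            rev[r++] = sep;
            in_group = 0;
            if (g[1] != '\0')
                g++;            /* move on; the final size repeats */
        }
        rev[r++] = digits[i];
        in_group++;
    }

    if (r + 1 > outsz)
        return -1;
    for (size_t i = 0; i < r; i++)
        out[i] = rev[r - 1 - i];
    out[r] = '\0';
    return 0;
}
```

    With `"\3"` and `','` the value 10000 comes out as 10,000; with
    `'.'` it comes out as 10.000; and with `"\3\2"` and `','` one
    hundred million comes out as 10,00,00,000.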

    C and early Unix systems were invented in the US, so C
    programs and Unix systems tended to use US-centric conventions
    for such things, and the vast bulk of documentation, comments,
    etc, were written in (American) English. That's not
    unreasonable given the history, but folks elsewhere in the
    world wanted to use their own conventions and languages;
    locales were introduced to solve this.

    Except that they solve the wrong problems: in particular,
    they conflate things like the character encodings and
    collating sequences used for textual data (important for
    ensuring that things like `strcoll` give the expected
    results for a given locale) with how dates, times, and
    currency are formatted.
    But the former is now a solved problem: we should just use
    UTF-8 and Unicode everywhere. And the latter is a lot
    more general than what's in locale stuff in C and POSIX,
    and the locale stuff is not flexible enough to accommodate
    all of that generality. As a result, few people actually
    use it, preferring instead to use special-purpose libraries
    for handling these sorts of things. Sure, it's kind of
    neat that I can `export LC_ALL=fr_FR.UTF-8` and `ls -lh`
    will show me dates in French and use `,` for the decimal
    separator, but if I really want to do something in French,
    I'm not going to rely only on that support, Oui? Non.
    (For the record, I don't know French.)
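
    For what it's worth, the decimal-separator behavior is easy to
    poke at from C. The sketch below is illustrative only:
    `format_in_locale` is a made-up helper, and whether a locale named
    `fr_FR.UTF-8` exists depends entirely on what the system has
    installed. Note the save-and-restore dance around `setlocale`,
    which is needed because the locale is process-global state.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: switch LC_NUMERIC to `name`, format `x` with
 * two decimals into `out`, then restore the previous LC_NUMERIC
 * locale. Returns 0 on success, -1 if the locale is unavailable.
 */
int format_in_locale(const char *name, double x, char *out, size_t outsz)
{
    char saved[128];
    const char *cur;

    assert(name != NULL && out != NULL);

    /* Copy the current name: the string setlocale returns may be
     * overwritten by a later call. */
    cur = setlocale(LC_NUMERIC, NULL);
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_NUMERIC, name) == NULL)
        return -1;              /* not installed; locale unchanged */

    /* printf uses LC_NUMERIC's decimal-point character here. */
    snprintf(out, outsz, "%.2f", x);

    setlocale(LC_NUMERIC, saved);
    return 0;
}
```

    With "C" this yields 10000.02; where `fr_FR.UTF-8` is installed it
    should yield 10000,02, and where it is not, the function just
    returns -1.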

    Anyway, that's my 2c on it: don't call `setlocale()` from
    the C startup code (CSU), and only call it in programs that
    actually need to care for some reason. In general, it's a
    pretty bad interface.

    --- Mystic BBS v1.12 A48 (Linux/64)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)