On 24 Sep 2025 at 02:47a, apam pondered and said...
I was wondering if you know about setting the locale in a c library? I'm thinking maybe I should do that in crt0.o?
At present on my OS, one must set the locale using setlocale in the program, but if I want my programs to have a default locale, I am
thinking maybe I should set it before jumping to main() from an environment variable.
Does this sound reasonable? Locales and things have always been a bit of
a mystery to me :)
Whoo boy. This opens up a can of worms. But let me try to
address your specific question first. The short answer is no,
you probably don't want to do that.
More specifically, I'm going by what POSIX, C, and existing
libc implementations do. POSIX says there's a global default,
called "POSIX" (aliased as "C") that gives you sort of the
minimum baseline for running C programs.
The current version of POSIX, POSIX 2024, includes the 2018
revision of the ISO C standard as a normative reference, and kicks
the specifics of what's done when here over to C.
C, in turn, is quite clear about this; section 7.11.1.1 of C 2018
(the `setlocale` function) covers the details, and para (4) of
that section states:
"At program startup, the equivalent of `setlocale(LC_ALL, "C");`
is executed."
This strongly implies that one wouldn't do `setlocale()` for
a non-default locale in crt0 before calling `main`. Looking
at a smattering of `libc` implementations, I don't see any
that touch locales in the pre-main C runtime code.
So this suggests to me that your OS should arrange things so
that, on entry to a program, the default has been selected.
It is up to individual programs to call `setlocale()` as
appropriate, if they need to care.
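As a minimal sketch of that division of labor (the helper names
`current_locale` and `use_environment_locale` are made up for
illustration, not part of any libc): a program starts out in the
"C" locale and opts in to the user's environment itself, using
only the standard `setlocale()` call.

```c
#include <locale.h>
#include <stddef.h>

/* Name of the locale currently in effect for all categories. */
const char *current_locale(void)
{
    /* A NULL locale argument queries without changing anything. */
    return setlocale(LC_ALL, NULL);
}

/*
 * Opt in to whatever the environment requests (LANG, LC_ALL, LC_*).
 * Returns 1 on success, or 0 if the environment names a locale that
 * is not available, in which case the locale is left unchanged.
 */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}
```

A program that cares would call `use_environment_locale()` near
the top of `main()`; one that never calls it just keeps running
in "C".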
Ok, so why is this stuff troublesome?
Bluntly, the C/POSIX locale stuff isn't very good; it was
designed to solve a problem that was, and is, very real: how
do we write a single program that can work with the myriad
different human languages and notations for similar concepts.
An obvious example is, "how do we write dates?" Here in
North America, we often write the numeric month first,
and then the day of the month and then the year. But in
other parts of the world, folks write the day of the month
first, then the month, then the year. ISO-8601 date times
write dates as 'year-month-day' (which has the considerable
advantage of being sortable trivially, btw). Or consider
the formatting of large numbers: again, in the US, we tend
to write these with a comma separating multiples of powers
of a thousand (that is, a comma between factors of 10^(k*3)
for k>0), and use a period to separate the integral part of
a number from the fractional part, such as 10,000.02. But
over in Europe, they often use '.' to separate powers of
thousands, and ',' to separate the integral and fractional
parts. E.g., 10.000,02. To make things even more confusing,
in India, they group things beyond a thousand ("hazar")
into "lakh" (a hundred thousand) and "crore" (ten million, or
100 lakh), so one hundred million (10 crore) might be written
as, "10,00,00,000". And we haven't even started to talk about
currency....
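To make the grouping differences concrete, here is a rough
sketch. `format_grouped` is a hypothetical helper, not a libc
function; its `grouping` argument follows the same convention as
the `grouping` member of `struct lconv` (group sizes counted from
the right, last size repeating, `CHAR_MAX` meaning "stop
grouping"). Real code would take the separator and group sizes
from `localeconv()` rather than hard-coding them.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Format `value` into `out`, inserting `sep` between digit groups.
 * Each byte of `grouping` is the size of the next group, counting
 * from the right; the last size repeats. An empty string or a
 * CHAR_MAX entry means no (further) grouping.
 * Returns 0 on success, -1 if a buffer is too small.
 */
int format_grouped(unsigned long value, const char *sep,
                   const char *grouping, char *out, size_t outsz)
{
    char digits[32];
    char rev[128];
    size_t seplen = strlen(sep);
    size_t n = 0;
    int len = snprintf(digits, sizeof digits, "%lu", value);
    int group = (*grouping != '\0') ? *grouping : CHAR_MAX;
    int in_group = 0;

    /* Walk the digits right to left, building the result reversed. */
    for (int i = len - 1; i >= 0; i--) {
        if (group != CHAR_MAX && in_group == group) {
            /* Group is full: emit the separator, itself reversed. */
            if (n + seplen >= sizeof rev)
                return -1;
            for (size_t j = seplen; j > 0; j--)
                rev[n++] = sep[j - 1];
            in_group = 0;
            /* Move to the next group size if one is given. */
            if (grouping[1] != '\0')
                group = *++grouping;
        }
        if (n + 1 >= sizeof rev)
            return -1;
        rev[n++] = digits[i];
        in_group++;
    }

    if (n + 1 > outsz)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = rev[n - 1 - i];
    out[n] = '\0';
    return 0;
}
```

With this, `format_grouped(100000000UL, ",", "\3\2", ...)` gives
the Indian-style "10,00,00,000", while a grouping of "\3" gives
"100,000,000" for the same value.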
C and early Unix systems were invented in the US, so C
programs and Unix systems tended to use US-centric conventions
for such things, and the vast bulk of documentation, comments,
etc, were written in (American) English. That's not
unreasonable given the history, but folks elsewhere in the
world wanted to use their own conventions and languages;
locales were introduced to solve this.
Except that they solve the wrong problems: in particular,
they conflate things like the character encodings and
collating sequences used to represent and order textual data
(important for ensuring that things like "strcoll" give the
expected results for a given locale) with how dates, times,
and currency are formatted.
But the former is now a solved problem: we should just use
UTF-8 and Unicode everywhere. And the latter is a lot
more general than what's in locale stuff in C and POSIX,
and the locale stuff is not flexible enough to accommodate
all of that generality. As a result, few people actually
use it, preferring instead to use special-purpose libraries
for handling these sorts of things. Sure, it's kind of
neat that I can `export LC_ALL=fr_FR.UTF-8` and `ls -lh`
will show me dates in French and use `,` for the decimal
separator, but if I really want to do something in French,
I'm not going to rely only on that support, Oui? Non.
(For the record, I don't know French.)
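For reference, the part of the interface behind that kind of
date output looks roughly like the sketch below. `format_date_in`
is a hypothetical helper name; `setlocale()` and `strftime()` are
the standard calls. Note that the locale is process-global
state, so the helper has to save and restore it, which is part
of why the interface is awkward. Whether a name like
"fr_FR.UTF-8" works depends on what is installed on the system.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/*
 * Format `tm` using the preferred date representation ("%x") of
 * the named locale, then restore the previous LC_TIME setting.
 * Returns 0 on success, -1 if the locale is unavailable or `out`
 * is too small.
 */
int format_date_in(const char *locale, const struct tm *tm,
                   char *out, size_t outsz)
{
    char saved[128];
    const char *cur = setlocale(LC_TIME, NULL);

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    /* Copy the name: later setlocale() calls may overwrite it. */
    strcpy(saved, cur);

    if (setlocale(LC_TIME, locale) == NULL)
        return -1;              /* locale not available here */

    size_t n = strftime(out, outsz, "%x", tm);

    setlocale(LC_TIME, saved);  /* put the global state back */
    return n > 0 ? 0 : -1;
}
```

In the "C" locale, `%x` is defined to be the same as `%m/%d/%y`,
so 24 Sep 2025 comes out as "09/24/25"; other locales are free
to order and punctuate the fields differently.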
Anyway, that's my 2c on it: don't call `setlocale()` from
the C startup code (CSU), and only call it in programs that
actually need to
care for some reason. In general, it's a pretty bad
interface.
--- Mystic BBS v1.12 A48 (Linux/64)
* Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)