Internationalization for non-OS error messages #77
alright this turned into a nearly three-hour rabbit hole but i've crawled back out of it with something that might hopefully be of some use:
no clue how idiomatic any of this is, never used macros before either so there are almost certainly issues here but it at least seems like a reasonable enough starting point. thoughts, praise, criticism, personal attacks, etc are all welcome.
and yes i'm aware of how disgusting it is to reuse the identifier like that, but it works and i don't really see any practical issue with it?
one potential issue that i do see is that, because of my choice to do `str_id.lang` instead of `lang.str_id`, there's seemingly no simple way to get and store a usable language identifier once at program init and then not incur additional overhead with language selection each time a string is used later on in the program. i made that choice for the sake of keeping related things in the same place, and i stand by it. i believe that this can be elegantly worked around with yet another macro, but unfortunately my only clear idea on that front so far would involve bringing in Syn for its ability to turn strings into identifiers. it's a hefty dependency for smth this minor, but the only alternative i can see at this point is using a hashmap so that we can use strings as identifiers, which i feel only moves that heftiness into runtime.
upon closer inspection, it appears that i can do what i want without Syn, but it will still require a proc macro. not a huge deal, but it does add some complexity to the repo
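For illustration, here is a minimal sketch of one way the `str_id.lang` layout could coexist with a language that is resolved once at init, without Syn, a proc macro, or a hashmap. It assumes the set of languages is fixed at compile time. Every name in it (`Lang`, `Tstr`, `NO_SUCH_FILE`, `resolve_lang`) and the example German string are hypothetical and not taken from the code in this thread.

```rust
/// Languages this sketch knows about (hypothetical set).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lang {
    En,
    De,
}

/// One translatable string with all of its translations side by side,
/// i.e. the `str_id.lang` layout: related text stays in one place.
pub struct Tstr {
    pub en: &'static str,
    pub de: &'static str,
}

impl Tstr {
    /// Pick the translation for an already-resolved language.
    /// The per-use cost is a single match on a small enum.
    pub const fn get(&self, lang: Lang) -> &'static str {
        match lang {
            Lang::En => self.en,
            Lang::De => self.de,
        }
    }
}

/// Hypothetical entry in the string table.
pub const NO_SUCH_FILE: Tstr = Tstr {
    en: "no such file",
    de: "Datei nicht gefunden",
};

/// Resolve a language name once, e.g. at program init.
/// Anything that does not start with "de" falls back to English here.
pub fn resolve_lang(name: &str) -> Lang {
    if name.starts_with("de") {
        Lang::De
    } else {
        Lang::En
    }
}
```

With this shape, a program would call `resolve_lang` once, keep the resulting `Lang` value around, and write `NO_SUCH_FILE.get(lang)` wherever the string is needed. The trade-off is that adding a language means touching both the enum and the struct, which a declarative macro could probably generate.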
I've been interested in GNU gettext(3) which is the GNU solution to localization. The underscore macro as a shortcut to gettext(3) is sprinkled throughout the coreutils (here it is in their true(1) abomination).
gettext(3) looks up the given message in a catalog and returns localizations. We could use UNIX locales and implement a gettext(3), mindful of the classic complaints about locales (reimplementing some C standard library functions, at least `<ctype.h>`). Or have an environment variable `BONSAI_LOCALE` or something.
The `tstrutils` solution is a lot easier for programmers but a massive source file with all the combinations of locale and string might be somewhat difficult for non-programmer translators and it's harder to audit big files. I would prefer at least a bonsai/i18n/lang database or something.
Another thought I have is somehow adding errnos and strerrors, but that wouldn't be useful for non-errors (e.g. `Usage:`).
I haven't considered efficiency for any of this and am out of time to write this comment but will come back to this thread later with ideas.
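As a rough illustration of the catalog idea, the sketch below shows a gettext-style lookup in Rust: the untranslated message is the key, and a missing entry falls back to the message itself, which is how gettext(3) behaves. The `Catalog` type, its methods, and `locale_name` are hypothetical names, `BONSAI_LOCALE` is only the variable name floated above, and falling back to `LANG` and then `"C"` is an assumption of this sketch.

```rust
use std::collections::HashMap;
use std::env;

/// A message catalog for one locale: maps the original (untranslated)
/// message to its localized form.
pub struct Catalog {
    messages: HashMap<&'static str, &'static str>,
}

impl Catalog {
    /// Build a catalog from (original, translation) pairs.
    pub fn new(entries: &[(&'static str, &'static str)]) -> Self {
        Catalog {
            messages: entries.iter().copied().collect(),
        }
    }

    /// gettext-style lookup: return the translation if there is one,
    /// otherwise hand back the original message unchanged.
    pub fn gettext<'a>(&self, msgid: &'a str) -> &'a str {
        match self.messages.get(msgid) {
            Some(&translated) => translated,
            None => msgid,
        }
    }
}

/// Pick the locale name from the environment.
/// Assumption: BONSAI_LOCALE wins over LANG, and "C" is the default.
pub fn locale_name() -> String {
    env::var("BONSAI_LOCALE")
        .or_else(|_| env::var("LANG"))
        .unwrap_or_else(|_| "C".to_string())
}
```

Because non-error strings such as `Usage:` are just more keys in the catalog, this approach covers them in a way an errno/strerror scheme would not. The catalogs themselves could live in separate per-language files so translators never see the combined table.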
after banging my head against proc macros for the past 8 or so hours, i've decided to call it quits on that front.
here's a more reasonable solution with similar ease-of-use to GNU's `_()`:
it isn't very good or optimized code, but it does work!
this assumes that `LANG` is set to a standard RFC 5646 language tag (or the common-but-nonstandard variant where subtags are instead delimited by an underscore to work around programming language syntax), but it does gracefully ignore any `.encoding` specifiers in order to maintain compatibility with normal Linux (*nix?) systems
i've been pondering this issue since you mentioned it, and the conclusion i've come to is this:
ultimately, the goal of this isn't translation—it's localization. localization isn't just blindly mapping words onto other words. it requires context, not only for the locale for which the text is being translated, but for the actual text itself. non-technical people who get scared off by fairly lightweight syntax (fwiw i could make the macro even lighter on this; the quotes are not strictly necessary) aren't going to provide localized strings that are of any more use than what google translate could give us.
i've seen the localization files for other large projects, and they aren't any better. since these are rust source files, as long as everything is imported under the `s` module, splitting them up should not actually be an issue.
The standard is a combination of ISO 639 and ISO 3166 language and country identifiers, not IETF BCP 47 language tags.
oops, tysm for the correction :D
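For reference, a minimal sketch of the `LANG` handling discussed above, assuming the usual POSIX-style layout `language[_TERRITORY][.codeset][@modifier]` with an ISO 639 language and an ISO 3166 territory. `LocaleId` and `parse_locale` are hypothetical names, and treating `C` and `POSIX` as "no language selected" is an assumption of this sketch. It accepts both `_` and `-` as the separator but makes no attempt to handle BCP 47 script subtags.

```rust
/// Language and optional territory extracted from a locale name.
#[derive(Debug, PartialEq, Eq)]
pub struct LocaleId {
    /// ISO 639 language code, lowercased (e.g. "en").
    pub language: String,
    /// ISO 3166 territory code, uppercased (e.g. "US"), if present.
    pub territory: Option<String>,
}

/// Parse a value such as "en_US.UTF-8" into its language and territory.
/// Returns None for an empty value and for the "C" / "POSIX" locales.
pub fn parse_locale(raw: &str) -> Option<LocaleId> {
    // Drop any ".codeset" and "@modifier" suffix.
    let base = raw
        .split(|c: char| c == '.' || c == '@')
        .next()
        .unwrap_or("");

    // Accept both "en_US" and "en-US".
    let mut parts = base.splitn(2, |c: char| c == '_' || c == '-');

    let language = parts
        .next()
        .filter(|s| !s.is_empty())?
        .to_ascii_lowercase();
    if language == "c" || language == "posix" {
        return None;
    }

    let territory = parts
        .next()
        .filter(|s| !s.is_empty())
        .map(|s| s.to_ascii_uppercase());

    Some(LocaleId { language, territory })
}
```

A caller would typically try the full `language` plus `territory` pair first and fall back to the bare `language` when no territory-specific strings exist.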