
Strange interaction with locale and late loading Encode module #21746

Closed
ailin-nemui opened this issue Dec 22, 2023 · 8 comments

@ailin-nemui

Module:

Description

Loading Encode after the locale has been switched to Polish with POSIX::setlocale causes the following error:

Constants from lexical variables potentially modified elsewhere are no longer permitted at constant.pm line 41.
BEGIN failed--compilation aborted at constant.pm line 45.
Compilation failed in require at Encode.pm line 7.
BEGIN failed--compilation aborted at Encode.pm line 7.

Steps to Reproduce

(in case you have any of these set)

unset LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE
LANG=pl_PL.UTF-8 perl -E'
   BEGIN{
      use POSIX;
      setlocale(LC_ALL,"")
   }
   use Encode
'

You can also get the same error with:

LANG=pl_PL.utf8 perl -MPOSIX -E'
   setlocale(LC_ALL,"");
   eval q{
      use constant X => $]
   };
   print $@
'

Expected behavior

use Encode should not fail, whatever the locale.

Perl configuration

Summary of my perl5 (revision 5 version 39 subversion 6) configuration:
  Snapshot of: c5f88297e0985be19f66e995b527ee4a001fc028
  Platform:
    osname=linux
    osvers=6.6.3-1-default
    archname=x86_64-linux-thread-multi
    uname='linux d5421s.localdomain 6.6.3-1-default #1 smp preempt_dynamic wed nov 29 05:06:07 utc 2023 (d766c57) x86_64 x86_64 x86_64 gnulinux '
    config_args='-de -Dprefix=~/perl5/perlbrew/perls/perl-blead -Duseshrplib -Dusethreads -Dusemultiplicity -Dusedevel -Aeval:scriptdir=~/perl5/perlbrew/perls/perl-blead/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='13.2.1 20231130 [revision 741743c028dc00f27b9c8b1d5211c1f602f2fddd]'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/x86_64-suse-linux/lib /usr/lib /usr/lib64 /usr/local/lib64
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib64/libc.so.6
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.38'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,~/perl5/perlbrew/perls/perl-blead/lib/5.39.6/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_DEVEL
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Built under linux
  Compiled at Dec 22 2023 14:39:29
  %ENV:
    PERLBREW_HOME="~/.perlbrew"
    PERLBREW_MANPATH="~/perl5/perlbrew/perls/perl-blead/man"
    PERLBREW_PATH="~/perl5/perlbrew/bin:~/perl5/perlbrew/perls/perl-blead/bin"
    PERLBREW_PERL="perl-blead"
    PERLBREW_ROOT="~/perl5/perlbrew"
    PERLBREW_SHELLRC_VERSION="0.98"
    PERLBREW_VERSION="0.98"
    PERLDOC_PAGER="less -s"
    PERL_LOCAL_LIB_ROOT="~/perl5/5.38.2"
    PERL_MB_OPT="--install_base "~/perl5/5.38.2""
    PERL_MM_OPT="INSTALL_BASE=~/perl5/5.38.2"
  @INC:
    ~/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.39.6/x86_64-linux-thread-multi
    ~/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.39.6
    ~/perl5/perlbrew/perls/perl-blead/lib/5.39.6/x86_64-linux-thread-multi
    ~/perl5/perlbrew/perls/perl-blead/lib/5.39.6
@jkeenan
Contributor

jkeenan commented Dec 22, 2023

Steps to Reproduce

(in case you have any of that set)

unset LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE

It's not clear that that unset has the effect you think it does. For example, on Linux:

$ uname -mrs
Linux 5.10.0-25-amd64 x86_64
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

$ unset LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

@mauke
Contributor

mauke commented Dec 22, 2023

@jkeenan That looks as expected to me. The locale system comes with various categories that can be configured individually, but they don't have to be: LANG is the fallback value used for all of them; LC_foo can be set to override a specific category foo individually; and LC_ALL overrides everything else.

By unsetting everything else, we make sure that the value specified in LANG is used for every category. (The output of locale reflects this.)
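To make that lookup order concrete, here is a minimal sketch in Perl. `effective_locale` is a hypothetical helper, not part of POSIX.pm; it follows the order POSIX specifies for `setlocale(category, "")` (LC_ALL, then the category's own variable, then LANG, with an empty value treated as unset) and ignores LANGUAGE, which glibc only consults for message translations.

```perl
use strict;
use warnings;

# Hypothetical helper: which locale name would setlocale(CATEGORY, "")
# pick for one category, given a hash of environment variables?
sub effective_locale {
    my ($category, $env) = @_;    # e.g. ('LC_NUMERIC', \%ENV)
    for my $var ('LC_ALL', $category, 'LANG') {
        my $value = $env->{$var};
        return $value if defined $value && length $value;   # empty counts as unset
    }
    return 'C';    # implementation-defined default; glibc uses "C"
}

# After the `unset` in the issue, only LANG is left, so it covers every category:
my %after_unset = (LANG => 'pl_PL.UTF-8');
print effective_locale('LC_NUMERIC', \%after_unset), "\n";    # pl_PL.UTF-8

# A single category can still be overridden individually:
my %override = (LANG => 'pl_PL.UTF-8', LC_NUMERIC => 'C');
print effective_locale('LC_NUMERIC', \%override), "\n";       # C
print effective_locale('LC_TIME', \%override), "\n";          # pl_PL.UTF-8

# ... and a non-empty LC_ALL overrides everything else:
my %all = (%override, LC_ALL => 'en_US.UTF-8');
print effective_locale('LC_NUMERIC', \%all), "\n";            # en_US.UTF-8
```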

@jkeenan
Contributor

jkeenan commented Dec 22, 2023 via email

@jkeenan
Contributor

jkeenan commented Dec 22, 2023

Bisection with the following invocation:

$ export LANG=es_ES.UTF-8
$ perl Porting/bisect.pl --start=0c33882a943825845dde164b60900bf224b131cc --end=6cd93f2af51b2c45bd9e1c180d10bea1bfa226c7 -- ./perl -Ilib -e 'BEGIN { use POSIX; setlocale(LC_ALL,"") } use Encode'

... points to this commit:

commit 818cdb7aa9f85227c1c7313257c6204c872beb94
Author:     Karl Williamson <[email protected]>
AuthorDate: Sun Apr 11 05:57:07 2021 -0600
Commit:     Karl Williamson <[email protected]>
CommitDate: Thu Sep 1 09:02:04 2022 -0600

    locale.c: Skip code if will be a no-op
    
    The previous commits have fixed things up so that at this point in the
    code nothing has changed, and if nothing will change, we can just return

 locale.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

@khwilliamson, can you take a look? Thanks.

@jkeenan
Contributor

jkeenan commented Dec 22, 2023

One of the notorious problems with the POSIX module is that it exports nearly all of its functions and constants by default. Its documentation states:

CAVEATS
*Everything is exported by default* (with a handful of exceptions). This
is an unfortunate backwards compatibility feature and its use is
strongly discouraged. You should either prevent the exporting (by saying
"use POSIX ();", as usual) and then use fully qualified names (e.g.
"POSIX::SEEK_END"), or give an explicit import list. If you do neither
and opt for the default (as in "use POSIX;"), you will import *hundreds
and hundreds* of symbols into your namespace.

The example provided by the OP in this ticket imports those hundreds of symbols into the namespace.
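A minimal sketch of the explicit-import style recommended in the CAVEATS above. Note that the import list must name the LC_ALL constant as well as the setlocale function; with `use strict` in effect, forgetting the constant becomes a compile-time error instead of a silent bareword. The one-argument call only queries the current locale, so this changes nothing.

```perl
use strict;
use warnings;

# Import exactly the two names needed, instead of POSIX's hundreds of defaults.
use POSIX qw(setlocale LC_ALL);

# With one argument, setlocale() only reports the current setting.
my $current = setlocale(LC_ALL);
print "LC_ALL is currently: $current\n";

# The fully qualified alternative, with nothing imported at all:
#   use POSIX ();
#   POSIX::setlocale(POSIX::LC_ALL(), "");
```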

$ perl -v | head -2 | tail -1
This is perl 5, version 38, subversion 0 (v5.38.0) built for x86_64-linux

$ unset LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE

$ LANG=es_ES.UTF-8 perl -e 'BEGIN { use POSIX; setlocale(LC_ALL,"") } use Encode'
Constants from lexical variables potentially modified elsewhere are no longer permitted at /home/jkeenan/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/constant.pm line 41.
BEGIN failed--compilation aborted at /home/jkeenan/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/constant.pm line 45.
Compilation failed in require at /home/jkeenan/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/x86_64-linux/Encode.pm line 7.
BEGIN failed--compilation aborted at /home/jkeenan/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/x86_64-linux/Encode.pm line 7.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

However, when I import only the setlocale function from POSIX, I appear to get better results.

$ unset LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE

$ LANG=es_ES.UTF-8 perl -e 'BEGIN { use POSIX qw(setlocale); setlocale(LC_ALL,"") } use Encode'
[ no output ]

Let's also try an empty argument list.

$ unset LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE

$ LANG=es_ES.UTF-8 perl -e 'BEGIN { use POSIX (); POSIX::setlocale(LC_ALL,"") } use Encode'
[ no output ]

(I'm using the es_ES.UTF-8 locale because it's already installed on my machine, where the Polish locale was not.)

@ailin-nemui
Author

In that case try &POSIX::LC_ALL to get the correct constant!
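A likely reading of this remark: in the `qw(setlocale)` and `()` variants above, LC_ALL is never imported, and a plain `perl -e` one-liner runs without `use strict`, so LC_ALL is just the string "LC_ALL". Passed as the integer category, that string numifies to 0, which on glibc is LC_CTYPE (LC_ALL is 6), so those commands would have changed only LC_CTYPE and left LC_NUMERIC alone. The sketch below shows the numification without touching the process locale; the printed value of the real constant is platform-dependent.

```perl
use strict;
use warnings;
use POSIX ();    # nothing imported, as in the `use POSIX ();` one-liner

# Without strict subs (as in a plain `perl -e` one-liner) and without
# LC_ALL imported, the bare word LC_ALL is just the string "LC_ALL".
my $bareword = do { no strict 'subs'; LC_ALL };

# Where an integer category is expected, that string numifies to 0.
my $category = do { no warnings 'numeric'; 0 + $bareword };

# The real constant, fully qualified (equivalent to &POSIX::LC_ALL).
my $real = POSIX::LC_ALL();

printf "bareword %s -> category %d, real LC_ALL = %d\n",
    $bareword, $category, $real;
```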

khwilliamson added a commit that referenced this issue Dec 31, 2023
khwilliamson added a commit to khwilliamson/perl5 that referenced this issue Dec 31, 2023
This fixes GH #21746

Perl keeps the LC_NUMERIC category in a locale where the radix character
is a dot, regardless of what the user has requested.  This is because
much XS code has been written with the dot assumption.  When the user's
actual radix character is desired, the locale is briefly toggled to that
one for the duration of the operation.

When the user changes the LC_NUMERIC locale, the new one is noted, but
the attempted change is otherwise ignored unless its radix is a dot.
Perl briefly toggles into the new locale when appropriate.

The blamed commit contains a logic error:

commit 818cdb7
Author:     Karl Williamson <[email protected]>
AuthorDate: Sun Apr 11 05:57:07 2021 -0600
Commit:     Karl Williamson <[email protected]>
CommitDate: Thu Sep 1 09:02:04 2022 -0600

    locale.c: Skip code if will be a no-op

It decided the call was a no-op if the locale the user is changing to
is the same as the previous one.  But it didn't consider that the new
locale does in fact get set, and this code is supposed to make sure
that a dot radix locale is in effect before control returns to the
user.

If the new locale is a dot radix locale, then no harm is done by
skipping the code, but otherwise things can go wrong.

I am chagrined that I made this logic error without noticing before it
got pushed, and am surprised that it took this long for the error to
surface.  There must be something else intervening to make this not a
problem in most circumstances, but I haven't analyzed what it might be.

The details of why it happened in this test case are pretty obscure.
The locale in effect expects a comma radix, but the string being
converted is a Perl version number, like 5.0936.  When that is converted
to a floating point number, the dot is not recognized, so only the
initial '5' is parsed.  The failing module code behaves differently
depending on the version of perl it is running under, and the
conditional got the answer wrong because 5 is less than 5.0936,
whereas the actual version is above that.  So it took the wrong branch
and raised an error.
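The mechanism described in the last paragraph can be simulated without a comma-radix locale installed. `leading_number` below is a hypothetical, simplified stand-in for a locale-dependent C conversion such as strtod(): it reads only the longest numeric prefix of a string, given the radix character of the locale in effect. The threshold 5.009002 is purely illustrative and not taken from the failing module.

```perl
use strict;
use warnings;

# Hypothetical helper: a simplified imitation of a locale-aware C conversion
# that reads only the longest numeric prefix, given the locale's radix char.
sub leading_number {
    my ($str, $radix) = @_;
    my ($int, $frac) = $str =~ /\A\s*([+-]?\d+)(?:\Q$radix\E(\d+))?/
        or return 0;
    # Perl's own string-to-number conversion (outside `use locale`) uses a dot.
    return defined $frac ? "$int.$frac" + 0 : $int + 0;
}

my $version = '5.039006';    # $] as reported by perl 5.39.6

my $with_dot   = leading_number($version, '.');    # 5.039006
my $with_comma = leading_number($version, ',');    # 5: parsing stops at the '.'

# Illustrative "is this perl new enough?" check (hypothetical threshold).
my $threshold = 5.009002;
for ([dot => $with_dot], [comma => $with_comma]) {
    my ($name, $value) = @$_;
    printf "%-5s radix: %s > %s ? %s\n", $name, $value, $threshold,
        $value > $threshold ? 'yes' : 'no';
}
```

With a dot radix the check passes; with a comma radix the same version string reads as 5 and the check fails, which is the kind of wrong branch the commit message describes.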
@jkeenan
Contributor

jkeenan commented Dec 31, 2023

@ailin-nemui, does #21786 address your problem?

khwilliamson added a commit that referenced this issue Jan 3, 2024
This fixes GH #21746

@ailin-nemui
Author

thanks

ashutosh108 pushed a commit to ashutosh108/perl5 that referenced this issue Apr 29, 2024
This fixes GH #21746

steve-m-hay pushed a commit that referenced this issue Dec 30, 2024
This fixes GH #21746

Projects
None yet
Development

No branches or pull requests

3 participants