RT #113536: GetEnvironmentStrings() mess up the values for non-ascii strings

# Karthik Rajagopalan <kart...@sch...>

Wed, 06 Jun 2012 14:59:49 -0700
Hi, Before launching a sub-process using win32_spawnvp of win32.c, we try to get the environment block of the parent process by PerlEnv_get_children. This is defined to call CreateChildEnv(.)-> CreateLocalEnvironmentStrings(..), which reside in perlhost.h file of win32 directory. This in turn call kernel32.dll's GetEnvironmentStrings(..). The ansi version of the function return the strings in OEM code page. So this mess up the string while passing to the subprocess if it has non-ascii characters. The patch attached ( wrt main branch 5.17 ) address the issue by calling GetEnvironmentStringsW(..) and convert the strings to ANSI encoding in a single shot. It is then used to construct the new environment block for the sub-process. You will observe this issue in all versions of perl including 5.17 of your main branch. To reproduce the issue, following the below recipe - 0) Create a user account in windows with non-ascii character, say b�rbar 1) Open a cmd shell with this special usename account. 2) Download test1.c, test.bat.txt, test.pl to a directory, say X 3) cd to X. 4) rename test.bat.txt test.bat 5) Run vcvcars32.bat or vcvars64.bat to get Visual Studio environment ( I used VS 2010 to compile it ) 6) Run test.bat. This will give test1.exe binary. 7) Set your perl environment in PATH and run perl test.pl The program should successfully launch test1.exe and get the CSIDL_LOCAL_APPDATA of the user. Since the USERNAME, USERPROFILE env get messed up at perl side, you won't be able to retrieve the CSIDL_LOCAL_APPDATA of the user. With the patch, the program should successfully retrieve the value. The ANSI code page of my system while doing this test is 'windows-1252' and OEM code page is 'IBM437'. My System Details ---------------------------- --- Flags: category=core severity=critical --- Site configuration information for perl 5.14.2: Configured by rajagopa at Wed May 23 12:48:29 2012. Summary of my perl5 (revision 5 version 14 subversion 2) configuration: Platform: osname=MSWin32, osvers=6.1, archname=MSWin32-x64-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -Ox -GL -fp:precise -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO', optimize='-MD -Zi -DNDEBUG -Ox -GL -fp:precise', cppflags='-DWIN32' ccversion='16.00.30319.01', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='__int64', ivsize=8, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\lib\CORE" -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"' libpth=\lib libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl514.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\lib\CORE" -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"' Locally applied patches: --- @INC for perl 5.14.2: c:/msys/1.0/software/lib/Windows-x64/VS2010/perl-5.14.2/site/lib c:/msys/1.0/software/lib/Windows-x64/VS2010/perl-5.14.2/lib . --- Environment for perl 5.14.2: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=c:\msys\1.0\software\lib\Windows-x64\VS2010\perl-5.14.2\bin;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Windows\Microsoft.NET\Framework64\v3.5;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\VCPackages;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools;C:\Program Files (x86)\HTML Help Workshop;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools\x64;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\x64;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\x64;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin;C:\Program Files (x86)\Windows Resource Kits\Tools\;C:\Program Files (x86)\Intel\Compiler\11.1\065\lib\Intel64;C:\Program Files (x86)\Intel\Compiler\11.1\065\lib\ia32;C:\Program Files (x86)\Intel\Compiler\11.1\065\mkl\em64t\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;c:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn\;C:\Program Files\Microsoft Windows Performance Toolkit\;C:\Program Files (x86)\AMD\CodeAnalyst\bin PERL_BADLANG (unset) SHELL (unset)
cl.exe /c /MD test1.c link.exe /NODEFAULTLIB test1.obj msvcrt.lib msvcprt.lib shell32.lib kernel32.lib
test1.c (480 bytes)
test.pl (389 bytes)
From 011f192c113c53705f2bab641e12e37c747a8146 Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <[email protected]> Date: Wed, 6 Jun 2012 17:05:11 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..) GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if if contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. --- win32/perlhost.h | 45 ++++++++++++++++++++++++++++++++++++++------- 1 files changed, 38 insertions(+), 7 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..244c4ac 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2258,11 +2258,26 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) { dTHX; LPSTR lpStr, lpPtr, lpEnvPtr, lpTmp, lpLocalEnv, lpAllocPtr; - DWORD dwSize, dwEnvIndex; + LPWSTR lpWAllocPtr, lpWTmp; + DWORD dwSize, dwEnvIndex, ansienv_size; int nLength, compVal; - // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + // Get the process environment strings + lpWAllocPtr = lpWTmp = (LPWSTR) GetEnvironmentStringsW(); + for(dwSize = 1; *lpWTmp != '\0'; lpWTmp += wcslen(lpWTmp) + 1) { + // calculate the size of the environment strings + dwSize += wcslen(lpWTmp) + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + ansienv_size = WideCharToMultiByte(CP_ACP, 0, lpWAllocPtr, dwSize, NULL, 0, 0, 0); + lpAllocPtr = lpTmp = (char *)calloc(ansienv_size, sizeof(char)); + + /* Convert the string from UTF-16 encoding to ANSI encoding */ + WideCharToMultiByte(CP_ACP, 0, lpWAllocPtr, dwSize, lpTmp, ansienv_size, 0, 0); + + // release the process environment strings + FreeEnvironmentStringsW(lpWAllocPtr); // step over current directory stuff while(*lpTmp == '=') @@ -2337,8 +2352,6 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) *lpStr = '\0'; } - // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); return lpPtr; } @@ -2364,6 +2377,10 @@ CPerlHost::Clearenv(void) dTHX; char ch; LPSTR lpPtr, lpStr, lpEnvPtr; + LPWSTR lpWStr, lpWEnvPtr; + DWORD ansienv_size = 0; + DWORD dwSize = 0; + if (m_lppEnvList != NULL) { /* set every entry to an empty string */ for(DWORD index = 0; index < m_dwEnvCount; ++index) { @@ -2374,8 +2391,22 @@ CPerlHost::Clearenv(void) } } - /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + /* Get the process environment strings */ + lpWStr = lpWEnvPtr = (LPWSTR) GetEnvironmentStringsW(); + for(dwSize = 1; *lpWEnvPtr != '\0'; lpWEnvPtr += wcslen(lpWEnvPtr) + 1) { + // calculate the size of the environment strings + dwSize += wcslen(lpWEnvPtr) + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + ansienv_size = WideCharToMultiByte(CP_ACP, 0, lpWStr, dwSize, NULL, 0, 0, 0); + lpStr = lpEnvPtr = (char *)calloc(ansienv_size, sizeof(char)); + + /* Convert the string from UTF-16 encoding to ANSI encoding */ + WideCharToMultiByte(CP_ACP, 0, lpWStr, dwSize, lpStr, ansienv_size, 0, 0); + + /* Free environment strings */ + FreeEnvironmentStringsW(lpWStr); /* step over current directory stuff */ while(*lpStr == '=') -- 1.7.7.1

# bulk88 <bulk...@hot...>

Thu, 07 Jun 2012 08:41:48 -0700
First issue, why are there 2 wcslens for the same data in the loop? Second issue, I'm not sure what the Perl Interp's standards are for the last 2 parameters of WideCharToMultiByte. Since Perl is UTF8 aware/internally, maybe we should be using lpUsedDefaultChar and return a UTF8 string if it the flag is true after the length run of WideCharToMultiByte.

# The RT System itself <>

Thu, 07 Jun 2012 08:41:49 -0700
Status changed from new to open.

# Karthik Rajagopalan <kart...@sch...>

Thu, 07 Jun 2012 08:49:05 -0700
On Thu Jun 07 08:41:48 2012, bulk88. wrote: > First issue, why are there 2 wcslens for the same data in the loop? > I just maintained the same way it was handled earlier for GetEnvironmentStrings(..). The value could be saved and used for the next one just to avoid the two operations. > Second issue, I'm not sure what the Perl Interp's standards are for the > last 2 parameters of WideCharToMultiByte. Since Perl is UTF8 > aware/internally, maybe we should be using lpUsedDefaultChar and return > a UTF8 string if it the flag is true after the length run of > WideCharToMultiByte. That would require a huge change in other functions AFAIK. This is the least change I can think of without disturbing other implementations. -Karthik

# Karthik Rajagopalan <kart...@sch...>

Thu, 07 Jun 2012 08:50:38 -0700
BTW, I need free ascii encdoing strings once they are processed. Once I hear from perl developers about my change, I will be able to make this change quickly. -Karthik On Thu Jun 07 08:49:05 2012, kartlee05 wrote: > On Thu Jun 07 08:41:48 2012, bulk88. wrote: > > First issue, why are there 2 wcslens for the same data in the loop? > > > > I just maintained the same way it was handled earlier for > GetEnvironmentStrings(..). The value could be saved and used for the > next one just to avoid the two operations. > > > Second issue, I'm not sure what the Perl Interp's standards are for > the > > last 2 parameters of WideCharToMultiByte. Since Perl is UTF8 > > aware/internally, maybe we should be using lpUsedDefaultChar and > return > > a UTF8 string if it the flag is true after the length run of > > WideCharToMultiByte. > > That would require a huge change in other functions AFAIK. This is the > least change I can think of without disturbing other implementations. > > -Karthik >

# Steve Hay <stev...@goo...>

Thu, 07 Jun 2012 14:34:08 -0700
On 6 June 2012 22:59, Karthik Rajagopalan <[email protected]> wrote: > # New Ticket Created by  Karthik Rajagopalan > # Please include the string:  [perl #113536] > # in the subject line of all future correspondence about this issue. > # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=113536 > > > > Hi, > > Before launching a sub-process using win32_spawnvp of win32.c, we try > to get the environment block of the parent process by > PerlEnv_get_children. This is defined to call CreateChildEnv(.)-> > CreateLocalEnvironmentStrings(..), which reside in perlhost.h file of > win32 directory. This in turn call kernel32.dll's > GetEnvironmentStrings(..). The ansi version of the function return the > strings > in OEM code page. So this mess up the string while passing to the > subprocess if it has non-ascii characters. The patch attached ( wrt > main branch 5.17  ) address the issue by calling > GetEnvironmentStringsW(..) and convert the strings to ANSI encoding in > a single shot. It is then used to construct the new environment block > for the sub-process. > > You will observe this issue in all versions of perl including 5.17 of > your main branch. > > To reproduce the issue, following the below recipe - > Thanks for the report. I have reproduced this with bleadperl and am interested in your fix, but I have several concerns: - You've added a large duplicated block of code in place of what was originally a GetEnvironmentStrings() call. You should factor out your new code into its own function and simply call that from each place concerned. - Where do you free() the memory that was calloc()ed? If you factor out your code into its own function then you could have a corresponding 'free' function for the callers to use when they are done with the return value, much like they currently call FreeEnvironmentStrings(). - Is calloc() definitely the right thing to call? I'm wondering if it should be win32_calloc(), or are they the same thing? (I always get lost in this maze of #defines...) - Perhaps your WideCharToMultiByte() calls should make use of the WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does (which I also notice calls win32_malloc() / win32_realloc() / win32_free() rather than just malloc() / realloc() / free()...).

# Karthik Rajagopalan <kart...@sch...>

Thu, 07 Jun 2012 15:20:49 -0700
Hi Steve, > > Thanks for the report. I have reproduced this with bleadperl and am > interested in your fix, but I have several concerns: > > - You've added a large duplicated block of code in place of what was > originally a GetEnvironmentStrings() call. You should factor out your > new code into its own function and simply call that from each place > concerned. Sounds good. > > - Where do you free() the memory that was calloc()ed? If you factor > out your code into its own function then you could have a > corresponding 'free' function for the callers to use when they are > done with the return value, much like they currently call > FreeEnvironmentStrings(). Just before the function CreateLocalEnvironmentStrings(..) and Cleanenv(..) return, I planned to have code - // release the ANSI encoding process environment strings // for CreateLocalEnvironmentStrings(..) if(lpAllocPtr) free((char*)lpAllocPtr); // for Clearenv(..) if(lpEnvPtr) free((char*)lpEnvPtr); You suggestion of moving to a separate function for caller to use, sounds good to me. > > - Is calloc() definitely the right thing to call? I'm wondering if it > should be win32_calloc(), or are they the same thing? (I always get > lost in this maze of #defines...) win32_calloc(..) in turn call calloc(..) ( see win32.c ). I can make use of win32_calloc(..) in new function. > > - Perhaps your WideCharToMultiByte() calls should make use of the > WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does > (which I also notice calls win32_malloc() / win32_realloc() / > win32_free() rather than just malloc() / realloc() / free()...). Since we are not going to deal just with PATH in environment strings, do you think this flag make sense in this case? I will submit a patch based on your suggestion by today. BTW, did you get a chance to try with my patch? Did it solve the issue? -Karthik

# Steve Hay <stev...@goo...>

Fri, 08 Jun 2012 13:17:34 -0700
On 7 June 2012 23:20, Karthik Rajagopalan via RT <[email protected]> wrote: >> >> - Perhaps your WideCharToMultiByte() calls should make use of the >> WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does >> (which I also notice calls win32_malloc() / win32_realloc() / >> win32_free() rather than just malloc() / realloc() / free()...). > > Since we are not going to deal just with PATH in environment strings, do > you think this flag make sense in this case? I was thinking about things in the environment (like usernames!...) which contain characters that are not representable in the current ANSI codepage. What is the best thing to do with them, given that WideCharToMultiByte() cannot do exactly what is asked of it in such cases? The win32_ansipath() function uses that flag, and MSDN suggests using it for usernames too: "For strings that require validation, such as file, resource, and user names, the application should always use the WC_NO_BEST_FIT_CHARS flag." > > I will submit a patch based on your suggestion by today. > > > BTW, did you get a chance to try with my patch? Did it solve the issue? Yes, your patch fixed the problem for me. Looking forward to a revised version :-)

# Karthik Rajagopalan <kart...@sch...>

Mon, 11 Jun 2012 12:57:53 -0700
Hi Shay, On Fri Jun 08 13:17:34 2012, shay wrote: > On 7 June 2012 23:20, Karthik Rajagopalan via RT > <[email protected]> wrote: > >> > >> - Perhaps your WideCharToMultiByte() calls should make use of the > >> WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does > >> (which I also notice calls win32_malloc() / win32_realloc() / > >> win32_free() rather than just malloc() / realloc() / free()...). > > > > Since we are not going to deal just with PATH in environment strings, do > > you think this flag make sense in this case? > > I was thinking about things in the environment (like usernames!...) > which contain characters that are not representable in the current > ANSI codepage. What is the best thing to do with them, given that > WideCharToMultiByte() cannot do exactly what is asked of it in such > cases? The win32_ansipath() function uses that flag, and MSDN suggests > using it for usernames too: "For strings that require validation, such > as file, resource, and user names, the application should always use > the WC_NO_BEST_FIT_CHARS flag." I was thinking for env other than username. This could have been taken by default by MS folks if it is really necessary to consider. Anyway, I now used the same in my patch. > > > > > > I will submit a patch based on your suggestion by today. > > > > > > BTW, did you get a chance to try with my patch? Did it solve the issue? > > Yes, your patch fixed the problem for me. Looking forward to a revised > version :-) > I now have your comments incorporated in my attached patch. Please go- through it and let me know your opinion. -Karthik
From bb7c5e61c26c2634be8e508bf0514f48056b54a9 Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <[email protected]> Date: Mon, 11 Jun 2012 15:50:34 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..). GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if it contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..). To free the block, you can use win32_freeenvironmentstrings(..). --- win32/perlhost.h | 8 ++++---- win32/win32.c | 34 ++++++++++++++++++++++++++++++++++ win32/win32iop.h | 2 ++ 3 files changed, 40 insertions(+), 4 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..ae422ef 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) int nLength, compVal; // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings(); // step over current directory stuff while(*lpTmp == '=') @@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) } // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); + win32_freeenvironmentstrings(lpAllocPtr); return lpPtr; } @@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void) } /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings(); /* step over current directory stuff */ while(*lpStr == '=') @@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void) lpStr += strlen(lpStr) + 1; } - FreeEnvironmentStrings(lpEnvPtr); + win32_freeenvironmentstrings(lpEnvPtr); } diff --git a/win32/win32.c b/win32/win32.c index 7f2444b..1c4c05c 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1650,6 +1650,40 @@ win32_ansipath(const WCHAR *widename) } DllExport char * +win32_getenvironmentstrings(void) +{ + LPWSTR lpWStr, lpWTmp; + LPSTR lpStr, lpTmp; + DWORD size, env_len, wenvstrings_len = 0, aenvstrings_len = 0; + + // Get the process environment strings + lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW(); + for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { + env_len = wcslen(lpWTmp); + // calculate the size of the environment strings + wenvstrings_len += env_len + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, + lpWStr, wenvstrings_len, NULL, 0, 0, 0); + lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char)); + if(!lpTmp) + out_of_memory(); + + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ + WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, aenvstrings_len, 0, 0); + + return(lpStr); +} + +DllExport void +win32_freeenvironmentstrings(void* block) +{ + win32_free(block); +} + +DllExport char * win32_getenv(const char *name) { dTHX; diff --git a/win32/win32iop.h b/win32/win32iop.h index 373e3e3..cbc9716 100644 --- a/win32/win32iop.h +++ b/win32/win32iop.h @@ -126,6 +126,8 @@ DllExport void win32_rewinddir(DIR *dirp); DllExport int win32_closedir(DIR *dirp); DllExport DIR* win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param); +DllExport char* win32_getenvironmentstrings(void); +DllExport void win32_freeenvironmentstrings(void *block); DllExport char* win32_getenv(const char *name); DllExport int win32_putenv(const char *name); -- 1.7.7.1

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 08:13:09 -0700
On Mon Jun 11 12:57:53 2012, kartlee05 wrote: > I now have your comments incorporated in my attached patch. Please go- > through it and let me know your opinion. > > -Karthik > > __________________________ + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ + WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, aenvstrings_len, 0, 0); __________________________ I didn't run your code, but you didn't convert to UTF8. You converted to legacy high ASCII, and the conversion was done with character approximation since the 2nd WCTMB didn't get the no best fit flag and since you didn't pass the UTF8 CP, the function will fail if no best fit is passed. I'll post a known good perl to UTF16 windows converter for comparison (its from Win32 module), but its an apples oranges comparison. ________________________________ SV * wstr_to_sv(pTHX_ WCHAR *wstr) { int wlen = (int)wcslen(wstr)+1; BOOL use_default = FALSE; int len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, wlen, NULL, 0, NULL, NULL); SV *sv = sv_2mortal(newSV(len)); len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, wlen, SvPVX(sv), len, NULL, &use_default); if (use_default) { len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0, NULL, NULL); sv_grow(sv, len); len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, SvPVX(sv), len, NULL, NULL); SvUTF8_on(sv); } /* Shouldn't really ever fail since we ask for the required length first, but who knows... */ if (len) { SvPOK_on(sv); SvCUR_set(sv, len-1); } return sv; } ______________________________________

# Karthik Rajagopalan <kart...@sch...>

Tue, 12 Jun 2012 08:29:31 -0700
Hi, On Tue Jun 12 08:13:09 2012, bulk88. wrote: > On Mon Jun 11 12:57:53 2012, kartlee05 wrote: > > I now have your comments incorporated in my attached patch. Please go- > > through it and let me know your opinion. > > > > -Karthik > > > > > __________________________ > + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ > + WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, > aenvstrings_len, 0, 0); > __________________________ > I didn't run your code, but you didn't convert to UTF8. You converted to > legacy high ASCII, and the conversion was done with character > approximation since the 2nd WCTMB didn't get the no best fit flag and > since you didn't pass the UTF8 CP, the function will fail if no best fit > is passed. That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal conversion call from UTF-16 to ACP. I will revise the patch and sent it now. BTW, I don't think the win32 code would really work if UTF-8 is internally used for conversion. I see we use lot of calls toUpper, tolower which would not really work with utf-8 directly. So it is good if ACP is used. -Karthik >I'll post a known good perl to UTF16 windows converter for > comparison (its from Win32 module), but its an apples oranges comparison. > > ________________________________ > SV * > wstr_to_sv(pTHX_ WCHAR *wstr) > { > int wlen = (int)wcslen(wstr)+1; > BOOL use_default = FALSE; > int len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, > wlen, NULL, 0, NULL, NULL); > SV *sv = sv_2mortal(newSV(len)); > > len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, wlen, > SvPVX(sv), len, NULL, &use_default); > if (use_default) { > len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0, NULL, > NULL); > sv_grow(sv, len); > len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, SvPVX(sv), > len, NULL, NULL); > SvUTF8_on(sv); > } > /* Shouldn't really ever fail since we ask for the required length > first, but who knows... */ > if (len) { > SvPOK_on(sv); > SvCUR_set(sv, len-1); > } > return sv; > } > ______________________________________

# Karthik Rajagopalan <kart...@sch...>

Tue, 12 Jun 2012 08:41:15 -0700
> That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal > conversion call from UTF-16 to ACP. I will revise the patch and sent it > now. > > BTW, I don't think the win32 code would really work if UTF-8 is > internally used for conversion. I see we use lot of calls toUpper, > tolower which would not really work with utf-8 directly. So it is good > if ACP is used. > > -Karthik I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual conversion call. Please give a shot. -Karthik
From b77244181d5feea493dfc861434ea27a7b4825ef Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <[email protected]> Date: Tue, 12 Jun 2012 11:37:29 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..). GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if it contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..). To free the block, you can use win32_freeenvironmentstrings(..). --- win32/perlhost.h | 8 ++++---- win32/win32.c | 35 +++++++++++++++++++++++++++++++++++ win32/win32iop.h | 2 ++ 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..ae422ef 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) int nLength, compVal; // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings(); // step over current directory stuff while(*lpTmp == '=') @@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) } // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); + win32_freeenvironmentstrings(lpAllocPtr); return lpPtr; } @@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void) } /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings(); /* step over current directory stuff */ while(*lpStr == '=') @@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void) lpStr += strlen(lpStr) + 1; } - FreeEnvironmentStrings(lpEnvPtr); + win32_freeenvironmentstrings(lpEnvPtr); } diff --git a/win32/win32.c b/win32/win32.c index 7f2444b..de7b9aa 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1650,6 +1650,41 @@ win32_ansipath(const WCHAR *widename) } DllExport char * +win32_getenvironmentstrings(void) +{ + LPWSTR lpWStr, lpWTmp; + LPSTR lpStr, lpTmp; + DWORD size, env_len, wenvstrings_len = 0, aenvstrings_len = 0; + + // Get the process environment strings + lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW(); + for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { + env_len = wcslen(lpWTmp); + // calculate the size of the environment strings + wenvstrings_len += env_len + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, + lpWStr, wenvstrings_len, NULL, 0, 0, 0); + lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char)); + if(!lpTmp) + out_of_memory(); + + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ + WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, lpWStr, wenvstrings_len, lpStr, + aenvstrings_len, 0, 0); + + return(lpStr); +} + +DllExport void +win32_freeenvironmentstrings(void* block) +{ + win32_free(block); +} + +DllExport char * win32_getenv(const char *name) { dTHX; diff --git a/win32/win32iop.h b/win32/win32iop.h index 373e3e3..cbc9716 100644 --- a/win32/win32iop.h +++ b/win32/win32iop.h @@ -126,6 +126,8 @@ DllExport void win32_rewinddir(DIR *dirp); DllExport int win32_closedir(DIR *dirp); DllExport DIR* win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param); +DllExport char* win32_getenvironmentstrings(void); +DllExport void win32_freeenvironmentstrings(void *block); DllExport char* win32_getenv(const char *name); DllExport int win32_putenv(const char *name); -- 1.7.7.1

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 16:48:28 -0700
I've got a problem. I think your patch's code is unreachable. I compiled a blead 5.17.0 and applied your latest patch. win32_getenvironmentstrings is called by CPerlHost::CreateLocalEnvironmentStrings which is called by CreateChildEnv which is called by PerlEnvGetChildenv which is called by nothing and isn't in any file except perlhost.h.

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 16:51:32 -0700
On Tue Jun 12 16:48:28 2012, bulk88. wrote: > I've got a problem. I think your patch's code is unreachable. I compiled > a blead 5.17.0 and applied your latest patch. > win32_getenvironmentstrings is called by > CPerlHost::CreateLocalEnvironmentStrings which is called by > CreateChildEnv which is called by PerlEnvGetChildenv which is called by > nothing and isn't in any file except perlhost.h. _______________________________________ C:\p517\perl\win32>perl -V Summary of my perl5 (revision 5 version 17 subversion 0) configuration: Platform: osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -Od -MD -Zi -DDEBUGGING -DWIN32 -D_CONSOL E -DNO_STRICT -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_S YS -DUSE_PERLIO -D_USE_32BIT_TIME_T', optimize='-Od -MD -Zi -DDEBUGGING', cppflags='-DWIN32' ccversion='13.10.6030', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksi ze=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -libpath:"c:\perl517\lib\ CORE" -machine:x86' libpth="C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\lib" libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.l ib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32 .lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt. lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg 32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws 2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msv crt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl517.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -libpath:"c:\p erl517\lib\CORE" -machine:x86' Characteristics of this binary (from libperl): Compile-time options: DEBUGGING HAS_TIMES HAVE_INTERP_INTERN MULTIPLICITY PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_TRACK_MEMPOOL PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF Built under MSWin32 Compiled at Jun 12 2012 19:06:01 %ENV: PERL_JSON_BACKEND="JSON::XS" PERL_YAML_BACKEND="YAML" @INC: C:/perl517/site/lib C:/perl517/lib . C:\p517\perl\win32> ___________________________

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 17:23:51 -0700
disregard the last 2 posts, i found the place where its called

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 22:54:32 -0700
You can't use CP_ACP unless your intending for your patch to only work with utf16 characters which are representable in the current system CP's high ascii. Is this what you want? From lpWStr ___________________________________________ 0x00010D40 5c 00 57 00 49 00 4e 00 44 00 4f 00 57 00 53 00 5c 00 54 00 45 00 4d 00 50 00 00 00 75 00 6e 00 \WINDOWS\TEMP␀un 0x00010D60 69 00 63 00 6f 00 64 00 65 00 3d 00 31 04 43 04 3b 04 3a 04 00 00 55 00 53 00 45 00 52 00 44 00 icode=булк␀USERD ___________________________________________ After conversion WideCharToMultiByte ___________________________________________ 0x00346B04 44 4f 57 53 5c 54 45 4d 50 00 75 6e 69 63 6f 64 DOWS\TEMP.unicod 0x00346B14 65 3d 3f 3f 3f 3f 00 55 53 45 52 44 4f 4d 41 49 e=????.USERDOMAI ___________________________________________ those "?" are real "?", 0x3f. The unicode does NOT pass through to the child process for me. I'll try more tomorrow.
env.pl (1685 bytes)

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 22:57:14 -0700
The attachment is a accidentally posted work in progress debugging scripts for the patch, it uses a release of Win32::API. I'll have better test scripts tomorrow.

# bulk88 <bulk...@hot...>

Tue, 12 Jun 2012 22:57:57 -0700
On Tue Jun 12 22:57:14 2012, bulk88. wrote: > The attachment is a accidentally posted work in progress debugging > scripts for the patch, it uses a FUTURE release of Win32::API. I'll have better > test scripts tomorrow.

# Steve Hay <stev...@goo...>

Wed, 13 Jun 2012 00:22:43 -0700
On Tue Jun 12 08:41:15 2012, kartlee05 wrote: > > > That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal > > conversion call from UTF-16 to ACP. I will revise the patch and sent > it > > now. > > > > BTW, I don't think the win32 code would really work if UTF-8 is > > internally used for conversion. I see we use lot of calls toUpper, > > tolower which would not really work with utf-8 directly. So it is good > > if ACP is used. > > > > -Karthik > > I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual > conversion call. Please give a shot. > I think there is a bug in this and the previous patch (but not your original one): the calculation of 'wenvstrings_len' wrongly sets 'size' to 1 at the start of the for-loop, but never uses 'size' again. It should have set 'wenvstrings_len' to 1 instead, otherwise that ends up with the wrong value and causes many, many tests to fail when running the full regression test suite. After changing for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { to for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { the patch looks good to me code-wise, but I will await any further input from bulk88 before committing. Could you also fix up a couple of the comments, though, please? The comment about converting from UTF-16 to UTF-8 should say *to the ANSI codepage* instead. You also use C++ style comments in two places, but perl always uses C style comments since most files are C and not all C compilers accept C++ style comments.

# bulk88 <bulk...@hot...>

Wed, 13 Jun 2012 12:37:40 -0700
Okay, here is the problem with the patch, the env var is NOT corrupted, only, if the env var is a utf8 char, that can be represented in the current system high ascii CP. If the utf8 char can not be represented in the current system high ascii CP, it is converted to 0x3F. On pre-patch Perls, the previous case results in a different letter, but not "?" appearing in the child process. I can't explain the letter Z from WE A. I made 2 perl files a parent and a reader and ran it on Perl 5.12 and 5.17 with OP's latest patch (shays code not included). The parent has a option to switch between a western europe char, and a cyrillic char. My system is CP 1252 normally. Make sure your console is chcp 65001 and Lucinda font is set. The parent (env2) was run as console as "perl -C7 env2.pl 1> resp512.txt 2>&1". All the files attached are in UTF8 not high ascii. Must open in a editor and manual switch to UTF8. Notepad and Write dont do it right. Note, that with WE A, 5.17 correctly passed through the WE A, but CYR b turned to ? substitution in the child process. Is this the correct behavior or the behavior we want or should the WideCharToMultiBytes be done to UTF8 and not system default CP? Then the issue is, do we want to pass a non UTF-16 narrow-ish env array with UTF8 through the ANSI/narrow CreateProcess that Perl uses? I decided NOT to study how %ENV works just to post this. I saw some strange behavior with %ENV and I wasn't sure whats correct and whats wrong behavior so I dropped the %ENV issue. If someone knows how %ENV is "supposed" to behave on windows with perl, or writes a "known good" script (on Unix or Windows) that uses %ENV for me to compare my results to, I can investigate the %ENV behavior I saw.
env2.pl (2705 bytes)
Active code page: 65001 SV = PV(0xb17c5c) at 0xac764c REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa9138c "AAA\320\261AAA"\0 [UTF8 "AAA\x{431}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141d0b1414141" to console bytes "AAAбAAA" $VAR1 = "AAA\320\261AAA"; to console Perl "AAAбAAA" $VAR1 = "AAA\x{431}AAA"; perl ver parent v5.17.0 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAA?AAA perl ver reader v5.17.0
Active code page: 65001 SV = PV(0xb17c5c) at 0xac764c REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa9138c "AAA\303\204AAA"\0 [UTF8 "AAA\x{c4}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141c384414141" to console bytes "AAAÄAAA" $VAR1 = "AAA\303\204AAA"; to console Perl "AAAÄAAA" $VAR1 = "AAA\x{c4}AAA"; perl ver parent v5.17.0 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAAÄAAA perl ver reader v5.17.0
Active code page: 65001 SV = PV(0x9d820c) at 0xa39a14 REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa46574 "AAA\320\261AAA"\0 [UTF8 "AAA\x{431}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141d0b1414141" to console bytes "AAAбAAA" $VAR1 = "AAA\320\261AAA"; to console Perl "AAAбAAA" $VAR1 = "AAA\x{431}AAA"; perl ver parent v5.12.2 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAA?AAA perl ver reader v5.12.2
envread.pl (2216 bytes)
Active code page: 65001 SV = PV(0x9d820c) at 0xa39a14 REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa46574 "AAA\303\204AAA"\0 [UTF8 "AAA\x{c4}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141c384414141" to console bytes "AAAÄAAA" $VAR1 = "AAA\303\204AAA"; to console Perl "AAAÄAAA" $VAR1 = "AAA\x{c4}AAA"; perl ver parent v5.12.2 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAAZAAA perl ver reader v5.12.2

# "Steve Hay" <Stev...@ver...>

Thu, 14 Jun 2012 00:56:57 -0700
bulk88 via RT wrote on 2012-06-13: > Okay, here is the problem with the patch, the env var is NOT > corrupted, only, if the env var is a utf8 char, that can be > represented in the current system high ascii CP. If the utf8 char can > not be represented in the current system high ascii CP, it is > converted to 0x3F. On pre- patch Perls, the previous case results in a > different letter, but not "?" > appearing in the child process. Thanks for the sample programs. I intend to have a look through them when I have more time, but if I understand you correctly the problem here is only that things are still not working if the environment contains characters which cannot be represented in the current ANSI codepage? That is to be expected, given the widechar (UTF-16) -> ANSI codepage conversion being done, and the ? character, of course, comes from the WideCharToMultiByte() calls when they fail to map a unicode character to the target encoding. On pre-patch perls something different happens because GetEnvironmentStringsA() was being used which returns things in the OEM codepage and must presumably have some algorithm of its own for handling unicode characters that cannot be mapped to that encoding, hence different characters (but still the wrong ones!) appearing instead of ?. I still think the patch is worthwhile (with the fix I posted previously) because it fixes things for the common case where the environment contains characters which *are* in the current ANSI codepage but either not in the current OEM codepage (because the GetEnvironmentStringsA() conversion from unicode to OEM would have failed) or else (more commonly) in the OEM codepage but at a different byte position (because perl then treated the OEM bytes as if they were the ANSI bytes which are normally used). So the only problem remaining is what happens in the case when characters outside of the current ANSI codepage appear in the environment, and that would be worth logging as a separate bug. It isn't such a common case, and there are likely to be problems in other areas of the code in that case anyway, e.g. you can't open a file whose name contains characters outside of the ANSI codepage without jumping through some hoops rather than using the built-in open() function. If you agree and have no other issues with the patch then I'll apply it and raise the non-ANSI characters problem as a separate bug (and attach your scripts etc).

# bulk88 <bulk...@hot...>

Thu, 14 Jun 2012 09:02:31 -0700
On Thu Jun 14 00:56:57 2012, [email protected] wrote: > So the > only problem remaining is what happens in the case when characters > outside of the current ANSI codepage appear in the environment, and > that would be worth logging as a separate bug. It isn't such a > common case, and there are likely to be problems in other areas of > the code in that case anyway, e.g. you can't open a file whose name > contains characters outside of the ANSI codepage without jumping > through some hoops rather than using the built-in open() function. > If you agree and have no other issues with the patch then I'll > apply it and raise the non-ANSI characters problem as a separate > bug (and attach your scripts etc). I agree. You can apply the patch. Ultimately one day all of %INC support in perlhost.h will have have to be made to use the wide apis so all wide char env vars roundtrip correctly between processes and through %INC. Note, to test this patch I used the raw wide win API since %INC was unusable. Currently perlhost.h (which implements %INC) either queries its own cache of env, or goes out to windows API. http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/perlhost.h#l2402 http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/perlhost.h#l2084 http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/perlhost.h#l2134 http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/win32.c#l1653 Plus hv.c will need some fixing to add the utf8 flag to the %INC if the env var isnt low ascii. Anyways, I'm done. You can apply the patch, and there should be enough notes in this ticket for someone in the future to work on the remainder of the problems.

# Karthik Rajagopalan <kart...@sch...>

Thu, 14 Jun 2012 09:20:33 -0700
On Wed Jun 13 00:22:43 2012, shay wrote: > On Tue Jun 12 08:41:15 2012, kartlee05 wrote: > > > > > That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal > > > conversion call from UTF-16 to ACP. I will revise the patch and sent > > it > > > now. > > > > > > BTW, I don't think the win32 code would really work if UTF-8 is > > > internally used for conversion. I see we use lot of calls toUpper, > > > tolower which would not really work with utf-8 directly. So it is good > > > if ACP is used. > > > > > > -Karthik > > > > I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual > > conversion call. Please give a shot. > > > > I think there is a bug in this and the previous patch (but not your > original one): the calculation of 'wenvstrings_len' wrongly sets 'size' > to 1 at the start of the for-loop, but never uses 'size' again. It > should have set 'wenvstrings_len' to 1 instead, otherwise that ends up > with the wrong value and causes many, many tests to fail when running > the full regression test suite. > > After changing > > for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { > > to > > for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { > > the patch looks good to me code-wise, but I will await any further input > from bulk88 before committing. I fixed the bug in my patch which failed to account for last '\0' char in the environment strings. > > Could you also fix up a couple of the comments, though, please? The > comment about converting from UTF-16 to UTF-8 should say *to the ANSI > codepage* instead. You also use C++ style comments in two places, but > perl always uses C style comments since most files are C and not all C > compilers accept C++ style comments. This is also fixed now. The revised patch is attached now. Please take a look. -Karthik
From f5bbd57635ff725971d736aed43832a57845491b Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <[email protected]> Date: Thu, 14 Jun 2012 12:16:15 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..). GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if it contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..). To free the block, you can use win32_freeenvironmentstrings(..). --- win32/perlhost.h | 8 ++++---- win32/win32.c | 35 +++++++++++++++++++++++++++++++++++ win32/win32iop.h | 2 ++ 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..ae422ef 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) int nLength, compVal; // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings(); // step over current directory stuff while(*lpTmp == '=') @@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) } // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); + win32_freeenvironmentstrings(lpAllocPtr); return lpPtr; } @@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void) } /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings(); /* step over current directory stuff */ while(*lpStr == '=') @@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void) lpStr += strlen(lpStr) + 1; } - FreeEnvironmentStrings(lpEnvPtr); + win32_freeenvironmentstrings(lpEnvPtr); } diff --git a/win32/win32.c b/win32/win32.c index 7f2444b..e496837 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1650,6 +1650,41 @@ win32_ansipath(const WCHAR *widename) } DllExport char * +win32_getenvironmentstrings(void) +{ + LPWSTR lpWStr, lpWTmp; + LPSTR lpStr, lpTmp; + DWORD env_len, wenvstrings_len = 0, aenvstrings_len = 0; + + /* Get the process environment strings */ + lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW(); + for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { + env_len = wcslen(lpWTmp); + /* calculate the size of the environment strings */ + wenvstrings_len += env_len + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, + lpWStr, wenvstrings_len, NULL, 0, 0, 0); + lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char)); + if(!lpTmp) + out_of_memory(); + + /* Convert the string from UTF-16 encoding to ACP encoding */ + WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, lpWStr, wenvstrings_len, lpStr, + aenvstrings_len, 0, 0); + + return(lpStr); +} + +DllExport void +win32_freeenvironmentstrings(void* block) +{ + win32_free(block); +} + +DllExport char * win32_getenv(const char *name) { dTHX; diff --git a/win32/win32iop.h b/win32/win32iop.h index 373e3e3..cbc9716 100644 --- a/win32/win32iop.h +++ b/win32/win32iop.h @@ -126,6 +126,8 @@ DllExport void win32_rewinddir(DIR *dirp); DllExport int win32_closedir(DIR *dirp); DllExport DIR* win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param); +DllExport char* win32_getenvironmentstrings(void); +DllExport void win32_freeenvironmentstrings(void *block); DllExport char* win32_getenv(const char *name); DllExport int win32_putenv(const char *name); -- 1.7.7.1

# Karthik Rajagopalan <kart...@sch...>

Thu, 14 Jun 2012 09:22:20 -0700
On Tue Jun 12 22:54:32 2012, bulk88. wrote: > You can't use CP_ACP unless your intending for your patch to only work > with utf16 characters which are representable in the current system CP's > high ascii. Is this what you want? Yes, there are lot of places in win32 which still depend on ACP calls. So I just went ahead to do the same. -Karthik

# Steve Hay <stev...@goo...>

Fri, 15 Jun 2012 00:56:02 -0700
On Thu Jun 14 09:02:31 2012, bulk88. wrote: > On Thu Jun 14 00:56:57 2012, [email protected] wrote: > > So the > > only problem remaining is what happens in the case when > characters > > outside of the current ANSI codepage appear in the environment, > and > > that would be worth logging as a separate bug. > > I agree. [...] Anyways, I'm done. You can apply the patch, > and > there should be enough notes in this ticket for someone in the future > to > work on the remainder of the problems. I added a note to perltodo.pod in commit 799c141b0d referring to this ticket since the problem of Unicode in the environment (and in filenames etc) is already a known problem.

# Steve Hay <stev...@goo...>

Fri, 15 Jun 2012 00:57:16 -0700
On Thu Jun 14 09:20:33 2012, kartlee05 wrote: > The revised patch is attached now. Please take a look. > Thanks, your patch is now applied as commit 4f46e52b00.

# Steve Hay <stev...@goo...>

Fri, 15 Jun 2012 00:57:16 -0700
Status changed from open to resolved.

# Karthik Rajagopalan <kart...@sch...>

Fri, 15 Jun 2012 07:13:34 -0700
Hi Steve, On Fri Jun 15 00:57:16 2012, shay wrote: > On Thu Jun 14 09:20:33 2012, kartlee05 wrote: > > The revised patch is attached now. Please take a look. > > > > Thanks, your patch is now applied as commit 4f46e52b00. Thanks for applying this change. Will this be in 5.17? BTW, do you want me to take a look and suggest changes that we would need to fully support wide version for win32? -Karthik

# Karthik Rajagopalan <kart...@sch...>

Fri, 15 Jun 2012 07:15:52 -0700
On Fri Jun 15 07:13:34 2012, kartlee05 wrote: > Hi Steve, > > On Fri Jun 15 00:57:16 2012, shay wrote: > > On Thu Jun 14 09:20:33 2012, kartlee05 wrote: > > > The revised patch is attached now. Please take a look. > > > > > > > Thanks, your patch is now applied as commit 4f46e52b00. > > Thanks for applying this change. Will this be in 5.17? I meant in next minor version for it? >BTW, do you want > me to take a look and suggest changes that we would need to fully > support wide version for win32? > > -Karthik >

# Karthik Rajagopalan <kart...@sch...>

Sat, 16 Jun 2012 21:37:52 -0700
Sent from my iPhone On Jun 13, 2012, at 3:22 AM, "Steve Hay via RT" <[email protected]> wrote: > On Tue Jun 12 08:41:15 2012, kartlee05 wrote: >> >>> That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal >>> conversion call from UTF-16 to ACP. I will revise the patch and sent >> it >>> now. >>> >>> BTW, I don't think the win32 code would really work if UTF-8 is >>> internally used for conversion. I see we use lot of calls toUpper, >>> tolower which would not really work with utf-8 directly. So it is good >>> if ACP is used. >>> >>> -Karthik >> >> I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual >> conversion call. Please give a shot. >> > > I think there is a bug in this and the previous patch (but not your > original one): the calculation of 'wenvstrings_len' wrongly sets 'size' > to 1 at the start of the for-loop, but never uses 'size' again. It > should have set 'wenvstrings_len' to 1 instead, otherwise that ends up > with the wrong value and causes many, many tests to fail when running > the full regression test suite. > > After changing > > for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { > > to > > for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { I agree. I have not taken care for the additional null char to indicate the end of environment strings in my last two patches. I will add this now and resend. > > the patch looks good to me code-wise, but I will await any further input > from bulk88 before committing. > > Could you also fix up a couple of the comments, though, please? The > comment about converting from UTF-16 to UTF-8 should say *to the ANSI > codepage* instead. You also use C++ style comments in two places, but > perl always uses C style comments since most files are C and not all C > compilers accept C++ style comments. Sure, I will do that and send you in an hour. -Karthik

# Steve Hay <stev...@goo...>

Mon, 18 Jun 2012 00:26:30 -0700
On Fri Jun 15 07:13:34 2012, kartlee05 wrote: > Hi Steve, > > On Fri Jun 15 00:57:16 2012, shay wrote: > > On Thu Jun 14 09:20:33 2012, kartlee05 wrote: > > > The revised patch is attached now. Please take a look. > > > > > > > Thanks, your patch is now applied as commit 4f46e52b00. > > Thanks for applying this change. Will this be in 5.17? BTW, do you want > me to take a look and suggest changes that we would need to fully > support wide version for win32? > The change will indeed be in 5.17.1, due out in a couple of days. Support for wide api calls on Win32 was actually removed some time ago (5.8.1) since it was not working correctly and it was deemed, IIRC, that it would be more worthwhile to address the wider issue of supporting Unicode in any filesystems that support it rather than fixing the broken support for just Windows NTFS, hence the items in Porting/todo.pod about "Unicode in Filenames", "Unicode in %ENV" and "Unicode and glob()". For sure, the solution on Windows must involve the wide apis, but what we really need first is a framework for hooking those calls into as described in the item "Virtualize operating system access" in Porting/todo.pod.

# bulk88 <bulk...@hot...>

Tue, 13 May 2014 04:22:09 -0700
On Mon Jun 18 00:26:30 2012, shay wrote: > > The change will indeed be in 5.17.1, due out in a couple of days. > This ticket probably caused the leak reported at https://rt.perl.org/Ticket/Display.html?id=121676 -- bulk88 ~ bulk88 at hotmail.com