summaryrefslogtreecommitdiff
path: root/src/backend/utils/mb
AgeCommit message (Collapse)Author
2013-07-19Add checks for valid multibyte character length in UtfToLocal, LocalToUtf.Tom Lane
This is mainly to suppress "uninitialized variable" warnings from very recent versions of gcc. But it seems like a good robustness thing anyway, not to mention that we might someday decide to support 6-byte UTF8. Per report from Karol Trzcionka. No back-patch since there's no reason at the moment to think this is more than cosmetic.
2013-06-26Renovate display of non-ASCII messages on Windows.Noah Misch
GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch
2013-05-29pgindent run for release 9.3Bruce Momjian
This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-12-16Tidy up from frontend Assert change.Andrew Dunstan
Quiet compiler warnings noted by Peter Eisentraut.
2012-08-30Remove configure flag --disable-shared, as it is no longer used by anyBruce Momjian
port. The last use was QNX, per Peter Eisentraut.
2012-07-10Fix ASCII case in pg_wchar2mule_with_len.Tom Lane
Also some cosmetic improvements for wchar-to-mblen patch.
2012-07-06Fix failure of new wchar->mb functions to advance from pointer.Robert Haas
Bug spotted by Tom Lane.
2012-07-05Run newly-configured perltidy script on Perl files.Bruce Momjian
Run on HEAD and 9.2.
2012-07-04Add wchar -> mb conversion routines.Robert Haas
This is infrastructure for Alexander Korotkov's work on indexing regular expression searches. Alexander Korotkov, with a bit of further hackery on the MULE conversion by me
2012-07-04Improve documentation about MULE encoding.Tom Lane
This commit improves the comments in pg_wchar.h and creates #define symbols for some formerly hard-coded values. No substantive code changes. Tatsuo Ishii and Tom Lane
2012-06-10Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian
commit-fest.
2012-04-24Lots of doc corrections.Robert Haas
Josh Kupershmidt
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-10-30Further improvement of make_greater_string.Tom Lane
Make sure that it considers all the possibilities that the old code did, instead of trying only one possibility per character position. To keep the runtime in bounds, instead tweak the character incrementers to not try every possible multibyte character code. Remove unnecessary logic to restore the old character value on failure. Additional comment and formatting cleanup.
2011-10-29Improve make_greater_string() with encoding-specific incrementers.Robert Haas
This infrastructure doesn't in any way guarantee that the character we produce will sort before the one we incremented; but it does at least make it much more likely that we'll end up with something that is a valid character, which improves our chances. Kyotaro Horiguchi, with various adjustments by me.
2011-09-11Remove many -Wcast-qual warningsPeter Eisentraut
This addresses only those cases that are easy to fix by adding or moving a const qualifier or removing an unnecessary cast. There are many more complicated cases remaining.
2011-09-06Avoid possibly accessing off the end of memory in SJIS2004 conversion.Tom Lane
The code in shift_jis_20042euc_jis_2004() would fetch two bytes even when only one remained in the string. Since conversion functions aren't supposed to assume null-terminated input, this poses a small risk of fetching past the end of memory and incurring SIGSEGV. No such crash has been identified in the field, but we've certainly seen the equivalent happen in other code paths, so patch this one all the way back. Report and patch by Noah Misch.
2011-09-05Improve "invalid byte sequence for encoding" messagePeter Eisentraut
It used to say ERROR: invalid byte sequence for encoding "UTF8": 0xdb24 Change this to ERROR: invalid byte sequence for encoding "UTF8": 0xdb 0x24 to make it clear that this is a byte sequence and not a code point. Also fix the adjacent "character has no equivalent" message that has the same issue.
2011-04-23Fix char2wchar/wchar2char to support collations properly.Tom Lane
These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale(). This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8. I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.
2011-04-17foreach() and list_delete() don't mix.Tom Lane
Fix crash when releasing duplicate entries in the encoding conversion cache list, caused by releasing the current entry of the list being chased by foreach(). We have a standard idiom for handling such cases, but this loop wasn't using it. This got broken in my recent rewrite of GUC assign hooks. Not sure how I missed this when testing the modified code, but I did. Per report from Peter.
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-04-07Revise the API for GUC variable assign hooks.Tom Lane
The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".
2011-02-21Fix pg_server_to_client, that was broken in the previous commit.Itagaki Takahiro
2011-02-21Add ENCODING option to COPY TO/FROM and file_fdw.Itagaki Takahiro
File encodings can be specified separately from client encoding. If not specified, client encoding is used for backward compatibility. Cases when the encoding doesn't match client encoding are slower than matched cases because we don't have conversion procs for other encodings. Performance improvement would be be a future work. Original patch by Hitoshi Harada, and modified by me.
2011-02-08Per-column collation supportPeter Eisentraut
This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-12-03Remove unnecessary string null-termination in pg_convert.Itagaki Takahiro
We can directly verify the unterminated input with pg_verify_mbstr_len.
2010-11-23Remove useless whitespace at end of linesPeter Eisentraut
2010-11-12Improved parallel make supportPeter Eisentraut
Replace for loops in makefiles with proper dependencies. Parallel make can now span across directories. Also, make -k and make -q work properly. GNU make 3.80 or newer is now required.
2010-09-22Convert cvsignore to gitignore, and add .gitignore for build targets.Magnus Hagander
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-09-19Replace last remaining $Id$ with $PostgreSQL$.Tom Lane
2010-08-19Remove extra newlines at end and beginning of files, add missing newlinesPeter Eisentraut
at end of files.
2010-08-18Rename utf2ucs() to utf8_to_unicode(), and export it so it can be usedTom Lane
elsewhere. Similarly rename the version in mbprint.c, not because this affects anything but just to keep the two copies in exact sync. There was some discussion of having only one copy in src/port/ instead, but this function is so small and unlikely to change that that seems like overkill. Slightly editorialized version of a patch by Joseph Adams. (The bug-fix aspect of his patch was applied separately, and back-patched.)
2010-07-07Adjust mbutils.c so it won't get broken by future pgindent runs.Tom Lane
To do that, replace L'\0' by (WCHAR) 0. Perhaps someday we should teach pgindent about wide-character literals, but so long as this is the only use-case in the entire Postgres sources, a workaround seems easier.
2010-07-06Undo pgindent breakage (again). Per buildfarm.Tom Lane
2010-07-06pgindent run for 9.0, second runBruce Momjian
2010-02-27Undo some more pgindent breakage. Per buildfarm.Tom Lane
2010-02-26pgindent run for 9.0Bruce Momjian
2010-02-16Remove personal copyright now that file has been rewritten usingBruce Momjian
existing *.pl conversion script. Andreas 'ads' Scherbaum
2010-02-14Wrap calls to SearchSysCache and related functions using macros.Robert Haas
The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
2010-01-04Remove sometimes inaccurate error hint about source of wrongly encoded data.Andrew Dunstan
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-11-12Make initdb behave sanely when the selected locale has codeset "US-ASCII".Tom Lane
Per discussion, this should result in defaulting to SQL_ASCII encoding. The original coding could not support that because it conflated selection of SQL_ASCII encoding with not being able to determine the encoding. Adjust pg_get_encoding_from_locale()'s API to distinguish these cases, and fix callers appropriately. Only initdb actually changes behavior, since the other callers were perfectly content to consider these cases equivalent. Per bug #5178 from Boh Yap. Not going to bother back-patching, since no one has complained before and there's an easy workaround (namely, specify the encoding you want).
2009-11-04Rename some encoding conversion modules to keep pathnames in our sourceTom Lane
tarballs under 100 characters. This should avoid failures with certain untarring tools (WinZip and Midnight Commander have been mentioned as likely suspects). Per my proposal of yesterday. catversion bumped since the initial contents of pg_proc change.
2009-10-17Fix typo in previous release as reported by Itagaki Takahiro, but missedMagnus Hagander
by me.
2009-10-17Write to the Windows eventlog in UTF16, converting the message encodingMagnus Hagander
as necessary. Itagaki Takahiro with some changes from me
2009-08-26Update of install-sh, mkinstalldirs, and associated configuryPeter Eisentraut
Update install-sh to that from Autoconf 2.63, plus our Darwin-specific changes (which I simplified a bit). install-sh is now able to install multiple files in one run, so we could simplify our makefiles sometime. install-sh also now has a -d option to create directories, so we don't need mkinstalldirs anymore. Use AC_PROG_MKDIR_P in configure.in, so we can use mkdir -p when available instead of install-sh -d. For consistency with the rest of the world, the corresponding make variable has been renamed from $(mkinstalldirs) to $(MKDIR_P).
2009-08-07Expand test coverage support to entire treePeter Eisentraut
Test coverage support now covers the entire source tree, including contrib, instead of just src/backend. In a related but independent development, the commands make coverage and make coverage-html can be run in any directory. This turned out to be much easier than feared. Besides a few ad hoc fixes to pass the make target down the tree, change all affected makefiles to list their directories in the SUBDIRS variable, changed from variants like DIRS and WANTED_DIRS. MSVC build fix was attempted as well.