From: "alanwu (Alan Wu)"
Date: 2022-06-27T16:40:47+00:00
Subject: [ruby-core:109073] [Ruby master Bug#18882] File.read cuts off a text file with special characters when reading it on MS Windows

Issue #18882 has been updated by alanwu (Alan Wu).

Huh, Ruby might be exposing the C runtime behavior here. Assuming I'm reading the call graph right, and there isn't some Ruby-level defaulting going on, a `File.read()` with just a path and no mode maps to a call to `_wopen()` with neither `_O_TEXT` nor `_O_BINARY` set. In this case the runtime uses the global `_fmode` to decide the actual mode, according to the docs here:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/text-and-binary-mode-file-i-o?view=msvc-170

So potentially, someone could use Fiddle or some C extension to call `_set_fmode(_O_BINARY)` to change the behavior of plain `File.read()` calls.

Anyway, I'm clearly out of my depth here, as you can tell from how I got the issue wrong initially. I agree that it's surprising that `File.read` doesn't read every byte, though. I will add this to the next dev meeting because I'm interested in what other people think.
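To make the difference concrete, here is a minimal sketch that compares a plain `File.read` against reads that request binary mode explicitly. It assumes the truncation is triggered by a Ctrl-Z (`0x1A`) byte, which the linked Microsoft page describes as end-of-file in text mode; the sample bytes, the `read_sizes` helper, and the file name are made up for illustration and are not taken from the report.

```ruby
require "tmpdir"

# Hypothetical helper: write the given bytes to a scratch file, then
# report how many bytes each way of reading returns.
def read_sizes(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sample.bin")
    File.binwrite(path, bytes) # write in binary mode so nothing is translated

    {
      default: File.read(path).bytesize,             # mode left to the runtime
      binread: File.binread(path).bytesize,          # always binary
      rb_mode: File.read(path, mode: "rb").bytesize  # binary requested explicitly
    }
  end
end

# Made-up sample: 6 bytes, a Ctrl-Z (0x1A), then 5 more bytes.
sizes = read_sizes("before\x1Aafter".b)
puts sizes.inspect
```

On Linux I would expect all three sizes to be 12. If the analysis above is right, the `default` entry could come out smaller on Windows, while `binread` and `rb_mode` should still see every byte.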
---
The call graph from `File.read()` to `_wopen()` (again, it might be wrong):

```
rb_io_s_read
open_key_args
rb_io_open
rb_io_open_generic
rb_sysopen
rb_sysopen_internal
sysopen_func
rb_cloexec_open
open [(rb_w32_uopen when _WIN32)](https://github.com/ruby/ruby/blob/aba804ef91a5b2aa88efdd74205026aca3f943b2/io.c#L178-L181)
_wopen
```

----------------------------------------
Bug #18882: File.read cuts off a text file with special characters when reading it on MS Windows
https://bugs.ruby-lang.org/issues/18882#change-98218

* Author: magynhard (Matthäus Johannes Beyrle)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x64-mingw-ucrt]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
When using File.read to read a text file (in this case a JavaScript file) with special characters, the content is cut off at the special characters.

It occurs only when running Ruby on Windows; I tried several versions, including the latest. It does not occur on Linux or WSL (Windows Subsystem for Linux).

I created a GitHub repo that includes a test script and the source file, as well as the result written to a file:
https://github.com/grob-net4industry/ruby_win_file_bug

---Files--------------------------------
copy_pdfmake.min.js (582 KB)
pdfmake.min.js (1.29 MB)
diff.png (55.9 KB)

-- 
https://bugs.ruby-lang.org/