-
-
Notifications
You must be signed in to change notification settings - Fork 8.1k
tests/extmod/vfs_fat_finaliser.py: Fix on unix qemu-arm (WIP) #7659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests/extmod/vfs_fat_finaliser.py: Fix on unix qemu-arm (WIP) #7659
Conversation
Since #7649 was merged the following (unrelated?) test now fails on unix qemu-arm:
And the following (also unrelated?) on Windows (AppVeyor) release x86:
|
Happened to me before that seemingly unrelated changes cause a finaliser test to fail. Assuming that's because binary layout changed just enough for certain pieces of code to now require more on the Python/C stack to push everything off, or a similar principle. But that MemoryError is something different. |
I can replicate this locally, but only when run via run-test.py (extra logging output showing the command line).
fails, with
If I run
(which is the correct output). Disabling N = 4
def foo():
for i in range(N):
n = "x%d" % i
f = vfs.open(n, "w")
f.write(n)
f = None # release f without closing
[0, 1, 2, 3] # use up Python stack so f is really gone
foo() makes it fail in the same way (only in run-test.py, but fine if run directly). Adding some more locals makes it pass. So this agrees with what @stinos is saying... will require a bit more work to trace the gc collection and figure out exactly what's making the GC think the block is still live.. Different alignments result in different files not being finalised. @dpgeorge Given that I can make this test fail with f-strings disabled, I don't think there's a new regression here. It might be worth just disabling this test for now on unix_qemu_arm. |
Had a look into what extreme_exc.py actually does (only reproduces on Appveyor VMs, not on my machine), and after all it isn't a different problem: when
instead of the current Note: the test code says 'should use the heap and truncate the message' but that's a bit confusing; what is being tested there is that when the MemoryError gets allocated it actually doesn't use the heap (because that is exhausted) but instead falls back to using the emergency exception buffer instead, right? |
Yes. I am going to fix this today...didn't get any time over the weekend. |
(Note: For consistently, I'm going to refer to the stack "backwards", i.e. by memory layout -- the bottom of the stack is the lowest memory address and the most recently pushed items, and the top is MP_STATE_THREAD(stack_top) and the oldest items). So the exact cause of the This is of course a known limation of the MicroPython garbage collector -- it has false positives. The problem goes away with very small changes to the layout, it's just unlucky that the change to enable f-strings made the lexer struct bigger etc. The simplest fix is to just explicitly NULL this pointer in So I think the best way to fix this is to ensure that the allocation for the file is at an address in the heap that will not have been used for a previous allocation, and the easiest way to do that is to allocate more blocks than were used in the parse/compile stage. Note that these must be made of lots of single-block allocations rather than e.g. Adding for i in range(1024):
[] fixes the test. But wait a minute... that's pretty unlucky to have the stack grow back down and not manage to overwrite that value. And... what was that about a stack depth of 1400 words. And how come it takes 20 seconds of CPU time to objdump this binary. And why is the stack pointer at It seems that So the second fix is to disable I checked the arm-linux-gnueabi-gcc for ubuntu:bionic, ubuntu:focus (ubuntu:latest), ubuntu:hirsuite (ubuntu:rolling) and they all miss this optimisation:
Note that ubuntu:bionic's arm-none-eabi-gcc, even at a much older gcc version, has no trouble with it.
|
This compiler is unable to optimise out the giant strcmp match generated by MP_MATCH_COMPRESSED. See github.com/micropython/pull/7659#issuecomment-899479793 Signed-off-by: Jim Mussared <[email protected]>
Prevents the finaliser from being missed if there's a dangling reference on the stack to one of the blocks for the files (that this test checks that they get finalised). See github.com/micropython/pull/7659#issuecomment-899479793 Signed-off-by: Jim Mussared <[email protected]>
Hmm, that's peculiar. But good find! |
Great find! Is there a way we can detect if the compiler cannot optimise away the strcmp? |
I knew you'd ask that. :) The simplest thing I can think of is to pick a symbol (e.g. in this case |
More specifically
gives |
Is this due to a recent change in Micropython or something to do with Ubuntu gcc configuration, because I'm on 10.2.1 and it can optimize away the strcmp tree. Maybe this can be helpful: Configured with: /mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641
/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-
48_20201124_1606180641/install-native --libexecdir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-
pipeline-48_20201124_1606180641/install-native/lib --infodir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-
GCC-10-pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace
/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-
eabi/man --htmldir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641
/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-
pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-
plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp
--disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib
--with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-10-
pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/arm-none-eabi --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-
48_20201124_1606180641/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-
GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-10-
pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr --with-isl=/mnt/workspace
/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr --with-
libelf=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native
/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Arm Embedded Toolchain 10-2020-q4-major' --with-multilib-list=rmprofile,aprofile |
@iabdalkader Is this arm-linux-gnueabi-gcc or arm-none-eabi-gcc ? I only see the issue on the former. |
Ah I see, it's arm-none-eabi-gcc sorry. |
So this issue would be fixed also by #4723. |
Ok. Do you think it's worth adding this check and printing a warning? If you're building uPy to actually use then it's most likely you will notice it being very large, but only if you are familiar with what its "usual" size is (which is not going to be well known on a new target, eg RISC-V). |
This compiler is unable to optimise out the giant strcmp match generated by MP_MATCH_COMPRESSED. See github.com//pull/7659#issuecomment-899479793 Signed-off-by: Jim Mussared <[email protected]>
Prevents the finaliser from being missed if there's a dangling reference on the stack to one of the blocks for the files (that this test checks that they get finalised). See github.com//pull/7659#issuecomment-899479793 Signed-off-by: Jim Mussared <[email protected]>
Yes. I believe so.
Yes, I think so. The main concern is that it would have to run on every build, but that's probably OK. I will send a PR. |
With the Windows/AppVeyor failure, it's the same issue (dangling stack pointer into the heap). I was suprised (given how the test works where it creates a linked list filling the entire heap) that a single pointer could pin the entire list. However what happens is that the vast majority of the list is allocated starting at the current heap mark, then a handful more get allocated after a collection, right in the area where previous allocations have been. This maximises the likelyhood that one of the most recent allocations will be where a potential dangling pointer might be.
In the above diagram, A is the dangling pointer, B is the current So the fix suggested by @stinos is good. Also just adding a gc.collect() before the first list element is allocated will also solve it, as it will ensure that the most recent allocation is at the end of the heap. However, I was curious as to why the f-string PR causes this (i.e. why is the heap in a different enough state to make this problematic), and realised that it leaves the fstring_args vstr for the GC, when all the other lexer & parser state is explcitly free'd. So #7679 fixes this, and also fixes this test in the process. I think the test should also be fixed but @dpgeorge noticed that this test isn't actually testing the code path that it's supposed to (because it's triggering the emergency exception buf, not heap allocation), so for now we'll leave it for and re-write the test. Thanks to @stinos for the advice on how to debug this on AppVeyor via RDP. |
Signed-off-by: Damien George <[email protected]>
000ae2d
to
04f11b0
Compare
Update gifio and displayio documentation
No description provided.