Skip to content

Fix GH-10398: Fibers are broken on alpine x86 with clang #10407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Jan 21, 2023

Fixes GH-10398, maybe fixes GH-9357 too.

@nielsdos nielsdos changed the base branch from master to PHP-8.1 January 21, 2023 19:51
@nielsdos nielsdos marked this pull request as ready for review January 21, 2023 21:16
@devnexen devnexen requested a review from trowski January 22, 2023 08:07
@devnexen
Copy link
Member

devnexen commented Jan 22, 2023

Note that the upstream project for those assembly codes come from here

@nielsdos
Copy link
Member Author

Note that the upstream project for those assembly codes come from here

Unless I'm reading over it, it seems that the trampoline part isn't in the original?

@nielsdos
Copy link
Member Author

Note that the upstream project for those assembly codes come from here

Unless I'm reading over it, it seems that the trampoline part isn't in the original?

Nvm I found it.

@nielsdos
Copy link
Member Author

nielsdos commented Jan 22, 2023

Just to be clear:
It also works if you reserve additional stack space, e.g. by adding subl $0x8, %esp at the start of the trampoline: code.
So the trampoline code was overwriting a part of the data passed to the function, or the C code actually does some kind of alignment thing in its machine code. By changing to pushes like I did in this PR, we don't risk overwriting part of the data.

Edit: I originally wrote addl above instead of subl, mistake when typing over the code

@devnexen
Copy link
Member

Your alternative solution has my personal preference, but nevermind you can perfectly defend your case upstream in due time, I have only contributed once a while ago but they re pretty nice as far as I remember.

@trowski
Copy link
Member

trowski commented Jan 25, 2023

@nielsdos Could you please submit this fix upstream to boost/context first? I prefer to get their insight and approval before applying changes to the bundled assembly files here.

@nielsdos
Copy link
Member Author

@trowski I spent some more time figuring out this issue, and it finally clicked. The patch in this PR only fixes the issue by accident. There is no overwrite issue, it's a stack alignment issue...
Upon entering trampoline, the stack isn't 16-byte aligned. This alignment is requires by the SYSV calling convention. The crash then occurs in zend_fiber_trampoline where a SSE instruction is used: movaps %xmm0, 0x10(%esp) but because the stack isn't aligned and SSE requires alignment, the CPU traps.

The fix is simple, this code:

leal -0x28(%eax), %eax

Should actually become:

leal  -0x30(%eax), %eax

This works because 0x30 is a multiple of 16 and the stack was already 16-byte aligned at this point. Other assembly files for i386 may require a modification as well.

I'll make PR to the boost context repo soon-ish.

@nielsdos
Copy link
Member Author

My fix got merged in the boost/context repo.
I can update this PR in one of two possible ways:
(1) We only update the i386 asm files
(2) We update all asm files
I guess we should do (2) as that will bring us up to date with all the changes of upstream?

Fixes phpGH-10398

The stack was misaligned upon entering the trampoline function [1], this
causes a CPU trap when the SSE instruction is executed to copy data from
the stack. This was fixed upstream [2]. This commit syncs all upstream
changes from the boost/context assembly files to our copy.

[1] php#10407 (comment)
[2] boostorg/context#219
@nielsdos
Copy link
Member Author

I've gone for strategy (2) and updated all assembly files.

@trowski
Copy link
Member

trowski commented Jan 28, 2023

Merging in all the changes from boost/context has brought in the the changes made for shadow stack support in #9283, but this is fine I think because the flag isn't set in 8.1 and 8.2, so those sections are skipped anyway.

There's also changes adding a BOOST_CONTEXT_TLS_STACK_PROTECTOR flag from this PR boostorg/context#205. Do we have support for -fstack-protector? If so, some changes may be needed to add this flag (probably not in this PR, maybe only in master).

@nielsdos
Copy link
Member Author

I don't see an option for stack protection in the configure script. I think that the stack protector is off in standard GCC/Clang, but some distros turn it on by default (and also the fortify options iirc). But yeah that can probably go in a separate PR.

Copy link
Member

@trowski trowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Let's merge this then and look at what needs to be done for stack protection and defining BOOST_CONTEXT_TLS_STACK_PROTECTOR in another PR.

@andypost
Copy link
Contributor

andypost commented Feb 3, 2023

FYI I've applied PR as patch and x86-CI no loner reports failed fiber tests https://gitlab.alpinelinux.org/alpine/aports/-/jobs/965476

@andypost
Copy link
Contributor

andypost commented Feb 3, 2023

btw uncommented 3 tests that failed on ppc64le https://gitlab.alpinelinux.org/alpine/aports/-/jobs/965496


Zend/tests/fibers/no-switch-force-close-finally.phpt
Zend/tests/fibers/suspend-in-force-close-fiber-after-shutdown.phpt
Zend/tests/fibers/throw-in-multiple-destroyed-fibers-after-shutdown.phpt

@andypost
Copy link
Contributor

andypost commented Feb 3, 2023

bit they are fails

TEST 2620/15985 [Zend/tests/fibers/no-switch-force-close-finally.phpt]
========DIFF========
     finally
002- 
003- Fatal error: Uncaught FiberError: Cannot switch fibers in current execution context in %sno-switch-force-close-finally.php:%d
004- Stack trace:
005- #0 %sno-switch-force-close-finally.php(%d): Fiber->start()
006- #1 [internal function]: {closure}()
007- #2 {main}
008-   thrown in %sno-switch-force-close-finally.php on line %d
========DONE========
FAIL Cannot start a new fiber in a finally block in a force-closed fiber [Zend/tests/fibers/no-switch-force-close-finally.phpt] 
...
TEST 2636/15985 [Zend/tests/fibers/suspend-in-force-close-fiber-after-shutdown.phpt]
========DIFF========
     done
002- 
003- Fatal error: Uncaught FiberError: Cannot suspend in a force-closed fiber in %ssuspend-in-force-close-fiber-after-shutdown.php:%d
004- Stack trace:
005- #0 %ssuspend-in-force-close-fiber-after-shutdown.php(%d): Fiber::suspend()
006- #1 [internal function]: {closure}()
007- #2 {main}
008-   thrown in %ssuspend-in-force-close-fiber-after-shutdown.php on line %d
========DONE========
FAIL Suspend in force-closed fiber after shutdown [Zend/tests/fibers/suspend-in-force-close-fiber-after-shutdown.phpt] 
...
TEST 2642/15985 [Zend/tests/fibers/throw-in-multiple-destroyed-fibers-after-shutdown.phpt]
========DIFF========
     done
002- 
003- Fatal error: Uncaught Exception: test1 in %sthrow-in-multiple-destroyed-fibers-after-shutdown.php:%d
004- Stack trace:
005- #0 [internal function]: {closure}()
006- #1 {main}
007- 
008- Next Exception: test2 in %sthrow-in-multiple-destroyed-fibers-after-shutdown.php:%d
009- Stack trace:
010- #0 [internal function]: {closure}()
011- #1 {main}
012-   thrown in %sthrow-in-multiple-destroyed-fibers-after-shutdown.php on line %d
========DONE========
FAIL Throw in multiple destroyed fibers after shutdown [Zend/tests/fibers/throw-in-multiple-destroyed-fibers-after-shutdown.phpt] 

@nielsdos
Copy link
Member Author

nielsdos commented Feb 3, 2023

Yeah I only spent time debugging and fixing x86, I might look at the PPC cases sometime later, but that's unrelated to this PR though.

@devnexen devnexen closed this in 49551d7 Feb 5, 2023
@andypost
Copy link
Contributor

andypost commented Feb 5, 2023

Thank you!

@nielsdos
Copy link
Member Author

nielsdos commented Feb 5, 2023

Thank you!

My pleasure :)

@devnexen Thanks for merging. This also fixes GH-9357 as confirmed here: #10407 (comment). Could you please close that issue too? :)

@andypost
Copy link
Contributor

andypost commented Feb 5, 2023

Follow-up for fibers on ppc64le #10512

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fibers are broken on alpine x86 with clang
4 participants