-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Improvements to _php_stream_copy_to_stream_ex() #10603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The initialization is not necessary, and the function is already called in the if condition below.
copy_file_range can return early without copying all the data. This is not necessarily an error, the copy could've just been interrupted. In that case it makes sense to keep using the file_copy_range syscall to have the best performance. Since the copy is now repeated, I had to make some changes to our file_copy_range interposer such that we can limit the amount of times it is called. I also added two variants such that both the fallback and repeated copy are tested now.
54b0213
to
866120a
Compare
… a too large length Some filesystems will cause failures if the max length is greater than the file length. We therefore use a stat call to get the actual file size. Since stat calls are cached this shouldn't have much impact on performance. This addresses a performance concern for CIFS filesystems and is described in phpGH-10548. In case the cached result is less than the actual size, the code will just copy more in the next iteration. In case it is too large, it will at worst fall into the EIO error case and therefore fall back to the mmap or regular copy. In case of races the reasoning is the same as for a wrong cached result. Therefore, this won't cause problems.
cc @knowsshit I also pushed the performance improvement described in #10548 (comment). I thought I'd tag you because I know you care about the performance for your usecase. |
@@ -1563,73 +1563,93 @@ PHPAPI zend_result _php_stream_copy_to_stream_ex(php_stream *src, php_stream *de | |||
src->writepos == src->readpos) { | |||
/* both php_stream instances are backed by a file descriptor, are not filtered and the | |||
* read buffer is empty: we can use copy_file_range() */ | |||
int src_fd, dest_fd, dest_open_flags = 0; | |||
int src_fd, dest_fd, dest_open_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might have been some warning in of the tool about potentially uninitialised memory but not 100% sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Should be cheap anyway, so I'll restore this.
|
||
/* get dest open flags to check if the stream is open in append mode */ | ||
php_stream_parse_fopen_modes(dest->mode, &dest_open_flags); | ||
/* Some filesystems will cause failures if the max length is greater than the file length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just a minor use case? Does it make sense to optimize for that and penalise (even though it might be minor penalisation but it's still extra syscall) majority of use cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the stat would be pretty cheap in comparison to the file IO, but you might be right about this. I'll think about it a bit more
Kind of obsolete, and the complexity after bailing out of copy_file_range doesn't seem realy worth it given that it should be a rare occurrence. |
As I promised in #10440 and #10551, here are some improvements to _php_stream_copy_to_stream_ex()
It removes some redundant code, and it also places copy_file_range in a loop.
copy_file_range can return early without copying all the data. This is
not necessarily an error, the copy could've just been interrupted. In
that case it makes sense to keep using the file_copy_range syscall to
have the best performance.
Since the copy is now repeated, I had to make some changes to our
file_copy_range interposer such that we can limit the amount of times it
is called. I also added two variants such that both the fallback and
repeated copy are tested now.
This implements the performance improvement described in copy() fails on cifs mounts because of incorrect length (cfr_max) specified in streams.c:1584 copy_file_range() #10548 (comment)
/cc @arnaud-lb Since he reviewed my previous 2 stream-related PRs.