Skip to content

Improvements to _php_stream_copy_to_stream_ex() #10603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Feb 15, 2023

As I promised in #10440 and #10551, here are some improvements to _php_stream_copy_to_stream_ex()

  • It removes some redundant code, and it also places copy_file_range in a loop.

  • copy_file_range can return early without copying all the data. This is
    not necessarily an error, the copy could've just been interrupted. In
    that case it makes sense to keep using the file_copy_range syscall to
    have the best performance.
    Since the copy is now repeated, I had to make some changes to our
    file_copy_range interposer such that we can limit the amount of times it
    is called. I also added two variants such that both the fallback and
    repeated copy are tested now.

  • This implements the performance improvement described in copy() fails on cifs mounts because of incorrect length (cfr_max) specified in streams.c:1584 copy_file_range() #10548 (comment)

/cc @arnaud-lb Since he reviewed my previous 2 stream-related PRs.

The initialization is not necessary, and the function is already called
in the if condition below.
copy_file_range can return early without copying all the data. This is
not necessarily an error, the copy could've just been interrupted. In
that case it makes sense to keep using the file_copy_range syscall to
have the best performance.
Since the copy is now repeated, I had to make some changes to our
file_copy_range interposer such that we can limit the amount of times it
is called. I also added two variants such that both the fallback and
repeated copy are tested now.
… a too large length

Some filesystems will cause failures if the max length is greater than
the file length. We therefore use a stat call to get the actual file size.
Since stat calls are cached this shouldn't have much impact on performance.
This addresses a performance concern for CIFS filesystems and is described in phpGH-10548.
In case the cached result is less than the actual size, the code will just copy more
in the next iteration. In case it is too large, it will at worst fall into the EIO error
case and therefore fall back to the mmap or regular copy. In case of races the reasoning
is the same as for a wrong cached result. Therefore, this won't cause problems.
@nielsdos
Copy link
Member Author

cc @knowsshit I also pushed the performance improvement described in #10548 (comment). I thought I'd tag you because I know you care about the performance for your usecase.

@Girgias Girgias requested a review from arnaud-lb February 23, 2023 15:10
@@ -1563,73 +1563,93 @@ PHPAPI zend_result _php_stream_copy_to_stream_ex(php_stream *src, php_stream *de
src->writepos == src->readpos) {
/* both php_stream instances are backed by a file descriptor, are not filtered and the
* read buffer is empty: we can use copy_file_range() */
int src_fd, dest_fd, dest_open_flags = 0;
int src_fd, dest_fd, dest_open_flags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might have been some warning in of the tool about potentially uninitialised memory but not 100% sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Should be cheap anyway, so I'll restore this.


/* get dest open flags to check if the stream is open in append mode */
php_stream_parse_fopen_modes(dest->mode, &dest_open_flags);
/* Some filesystems will cause failures if the max length is greater than the file length.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just a minor use case? Does it make sense to optimize for that and penalise (even though it might be minor penalisation but it's still extra syscall) majority of use cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the stat would be pretty cheap in comparison to the file IO, but you might be right about this. I'll think about it a bit more

@nielsdos
Copy link
Member Author

nielsdos commented Apr 2, 2025

Kind of obsolete, and the complexity after bailing out of copy_file_range doesn't seem realy worth it given that it should be a rare occurrence.

@nielsdos nielsdos closed this Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants