Skip to content

Fix #69181: READ_CSV|DROP_NEW_LINE drops newlines within fields #7618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Oct 28, 2021

One may argue that DROP_NEW_LINE does not make sense in combination
with READ_CSV, but without DROP_NEW_LINE, SKIP_EMPTY does not
skip empty lines at all. We could fix that, but do not for BC reasons.
Instead we no longer drop newlines in spl_filesystem_file_read() when
reading CSV, but handle that in spl_filesystem_file_read_csv() by
treating lines with only (CR)LF as being empty as well.


@nyamsprod, would that cause any BC issues? Also, @Girgias may want to have a look at this.

@Girgias
Copy link
Member

Girgias commented Oct 28, 2021

Sounds reasonable to me.

@g105b
Copy link

g105b commented Oct 28, 2021

Great, it works perfectly as far as I can tell. BTW, I tried fixing this via a contribution to php-src 6 years ago, but I didn't know what I was doing and ended up having a lot of breaking changes/other side effects.

@cmb69
Copy link
Member Author

cmb69 commented Oct 28, 2021

@g105b, yeah, I've re-checked PR #1148, and your idea was actually quite close to the solution; the missing point was to "handle that in spl_filesystem_file_read_csv() by treating lines with only (CR)LF as being empty as well". :)

@g105b
Copy link

g105b commented Oct 28, 2021

That makes sense. Thanks for the explanation and great work.

@@ -2045,7 +2045,7 @@ static int spl_filesystem_file_read(spl_filesystem_object *intern, int silent) /
intern->u.file.current_line = estrdup("");
intern->u.file.current_line_len = 0;
} else {
if (SPL_HAS_FLAG(intern->flags, SPL_FILE_OBJECT_DROP_NEW_LINE)) {
if (SPL_HAS_FLAG(intern->flags, SPL_FILE_OBJECT_DROP_NEW_LINE) && !SPL_HAS_FLAG(intern->flags, SPL_FILE_OBJECT_READ_CSV)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this also affect the behavior of the fgets() method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! (in case ::READ_CSV is set)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the !SplFileObject::READ_CSV case is an issue, since ::fgetcsv() exhibits the same broken behavior (namely, that it drops the first linebreak in a quoted field): https://3v4l.org/OEiFq. This should now be fixed.

@nyamsprod
Copy link
Contributor

nyamsprod commented Nov 3, 2021

@cmb69 sorry for the late reply. From League CSV stand point I removed the use of DROP_NEW_LINE a while ago I think around v8 release because the behaviour was buggy so I would approve removing that flag for CSV usage. If that removal is silent (ie the code keep using the rest of the flag) then the BC break will only be found in weird set up I think or if someone copy/paste a code on the internet suggesting to use the flag while it fact this is an incorrect suggestion.

So TL;DR:

  • overall a 👍🏾 from me even if I can not comment on the actual implementation.
  • BC break should remain low if the flag is silently no longer taken into account or if the first time used an E_DEPRECATED is issued with the flag being silently ignored

Hope that comment will help

@cmb69 cmb69 marked this pull request as draft November 9, 2021 17:30
@Girgias
Copy link
Member

Girgias commented Jun 20, 2022

@cmb69 is this bug still present?

@cmb69
Copy link
Member Author

cmb69 commented Jun 20, 2022

Yes, the bug is still present, but this PR needs more work, and I'm reluctant to fix this in any of the stable branches. I'll have a closer look soonish.

One may argue that `DROP_NEW_LINE` does not make sense in combination
with `READ_CSV`, but without `DROP_NEW_LINE`, `SKIP_EMPTY` does not
skip empty lines at all.  We could fix that, but do not for BC reasons.
Instead we no longer drop newlines in `spl_filesystem_file_read_ex()`
when reading CSV, but handle that in `spl_filesystem_file_read_csv()`
by treating lines with only (CR)LF as being empty as well.
@cmb69 cmb69 changed the base branch from PHP-7.4 to master July 24, 2022 15:11
@cmb69
Copy link
Member Author

cmb69 commented Jul 24, 2022

I've rebased onto "master" (after all, this is SPL), and fixed the issue for SplFileObject::fgetcsv() as well.

@cmb69 cmb69 marked this pull request as ready for review July 24, 2022 16:01
@cmb69 cmb69 requested a review from Girgias July 24, 2022 16:01
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output of the test looks correct to me. Just a couple of comment to clarify stuff

@@ -1887,7 +1887,7 @@ static zend_result spl_filesystem_object_cast(zend_object *readobj, zval *writeo
}
/* }}} */

static zend_result spl_filesystem_file_read_ex(spl_filesystem_object *intern, bool silent, zend_long line_add) /* {{{ */
static zend_result spl_filesystem_file_read_ex(spl_filesystem_object *intern, bool silent, zend_long line_add, bool csv) /* {{{ */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I was thinking was instead of a csv bool arg, to pass the SPL flags as an argument and spl_filesystem_file_read_csv() to add the SPL_FILE_OBJECT_READ_CSV flag when calling this function. Not sure if this is cleaner tho

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess we could even temporarily add the SPL_FILE_OBJECT_READ_CSV flag. But I'm afraid it's somewhat messy either way.

@cmb69 cmb69 closed this in 5d52d47 Jul 26, 2022
@cmb69 cmb69 deleted the cmb/69181 branch July 26, 2022 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants