Skip to content

Fix PDO OCI Bug #60994 (Reading a multibyte CLOB caps at 8192 charact… #5233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

gschc1
Copy link

@gschc1 gschc1 commented Mar 4, 2020

…ers)

@cmb69
Copy link
Member

cmb69 commented Mar 4, 2020

@cjbj could you have a look at this, please?

@cmb69 cmb69 added the Bug label Mar 4, 2020
@cjbj
Copy link
Contributor

cjbj commented Mar 4, 2020

@cmb69 I'll take a look

@cjbj
Copy link
Contributor

cjbj commented Mar 18, 2020

I've only had a brief chance to look at this. It would be useful to know the intent and whether NCLOB testing was done. Also, it looked like self->csfrm could be unitialized.

@gschc1
Copy link
Author

gschc1 commented Mar 18, 2020

The intent is to fix an issue we encountered in a production environment reading data from CLOB fields. We store Unicode character-based data in CLOB fields. When the data contains Unicode characters other than single byte characters, partial data may be returned from the read (OCILobRead). After determining the underlying problem and fixing it for our build of PHP, I saw that bug60994 had been opened for this same issue years ago and thought others may benefit if this fix were upstreamed. The underlying problem is that OCI calls like OCILobRead can't distinguish bytes vs. characters for scenarios like this, which may be one of the reasons Oracle added the OCILobRead2 function (and other '2' functions).

We don't use NCLOBs and I did not test that. We use CLOBs with data encoded in AL32UTF8 (the database character set). It would take me a few days to get to it, but I can test NCLOBs if you think that should be part of this issue.

You're right about self->csfrm not being initialized. According to the Oracle documentation that field should not be used for BLOBs (i.e. self->csid=0, which is also why I didn't make the OCILobCharSetForm query in that case) in the associated OCI calls, but you are right that I should initialize it to zero, particularly if it were to be used in other calls in the future, and I will submit another PR. Let me know your thoughts on whether NCLOBs should be a part of this issue, and if so I'll change the tests to also create NCLOB fields and associated tests. Thanks

@gschc1
Copy link
Author

gschc1 commented Mar 21, 2020

I looked at NCLOB processing. For NCLOBs to work, the last parameter to Read/Read2 needs to be SQLCS_NCHAR instead of SQL_IMPLICIT. I tested that out (hardcoded) and it works, at least with our setup, database char set AL32UTF8 and national char set AL16UTF16. But there isn't currently a clean way to toggle that dynamically that I know of, so I just set it to SQLCS_IMPLICIT like the original code had. This also pointed out that I didn't need self->csfrm or the call to set it, and in fact shouldn't have done that. I've added a new commit. I believe it would take a larger change to set the IMPLICIT/NCHAR flag dynamically but you may know of some way to be able to determine this dynamically, and if so I'll look into that. Thanks

@gschc1
Copy link
Author

gschc1 commented Mar 21, 2020

I think I now have it working for both CLOBs and NCLOBs but need to do more testing. I'll likely push another commit after I have completed the testing (may have to be early next week).

@cjbj
Copy link
Contributor

cjbj commented Mar 23, 2020

Thanks for digging more into this. One thing I wanted to review in context of this change was when/why pdo_oci_fread_1.phpt started failing; this seems to pre-date your PR.

@gschc1
Copy link
Author

gschc1 commented Mar 24, 2020

This fix should now be working for nclobs as well. I tested with instantclient 10.2.0.5, 11.2.04, and 12.1.0.2. I also tested with no charset on the DSN but setting NLS_LANG in the environment.

Also I ran the test you mentioned above (fread_1) with and without this fix against each of those three clients, changing the DSN to have charset=WE8MSWIN1252. In the environment I tested with (Linux, Oracle EE 11.2 with database charset AL32UTF8, national character set AL16UTF16) the fread_1 test passed each time. Not that that helps you too much but just a datapoint. If you can let me know your setup where you get this failure I can try to help debug.

@mvorisek
Copy link
Contributor

I can confirm this bug still present. Can someone review and merge?

@cjbj cjbj self-assigned this Nov 22, 2021
@mvorisek
Copy link
Contributor

@cjbj can you please prioritize this fix? We need this for atk4/data framework, currently all 8+KB tests are failing and it seems there is no workaround on the PDO fetch once data are selected. Thanks.

@cjbj
Copy link
Contributor

cjbj commented Dec 10, 2021

@mvorisek I'll see if I can find a developer who can allocate time.

mvorisek added a commit to mvorisek/php-src that referenced this pull request Feb 1, 2022
mvorisek added a commit to mvorisek/php-src that referenced this pull request Feb 1, 2022
mvorisek added a commit to mvorisek/php-src that referenced this pull request Feb 24, 2022
@ramsey
Copy link
Member

ramsey commented May 31, 2022

@gschc1, please update this to target PHP-8.0.

@cjbj, reminder to have someone review this 😃

@mvorisek
Copy link
Contributor

mvorisek commented Jun 1, 2022

@ramsey you can close in favor of #8018

@cmb69
Copy link
Member

cmb69 commented Sep 16, 2022

Indeed, closing in favor of #8018.

@cmb69 cmb69 closed this Sep 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants