AttributeError: 'numpy.ndarray' object has no attribute '_get_repr' with np.abs on a DatetimeIndex #2948
Comments
Try updating to master. The printing is just a red herring; it's an exception on the subtraction.
I downloaded the latest version from http://pandas.pydata.org/pandas-build/dev/ yesterday. Looking at the times of that pull request, it looks like it should have been included. Is there a way to check from within pandas which version I am running?
print pandas.version which file?
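For reference, a minimal sketch of checking the installed versions from within Python. It assumes a current pandas and NumPy install where the `__version__` attribute is available; the comment above prints `pandas.version` instead, which is what the thread used at the time.

```python
import numpy as np
import pandas as pd

# __version__ is the conventional version string on both packages.
pandas_version = pd.__version__
numpy_version = np.__version__

print("pandas:", pandas_version)
print("numpy:", numpy_version)
```

Printing both is useful here because several of the problems discussed in this thread depend on the NumPy version as well as the pandas build.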
all of the windows versions look pretty recent, so should be good. if you are on windows, then this should work,
The output of Yes, I am using Windows with Python 2.7.
yep...that looks right
Does it seem that this wasn't fixed in #2899 then? Is there any other information you'd like me to provide to help fix this issue?
I would need a test case that reproduces the error
Hmm...I'm not sure entirely where the problem lies - but it definitely seems to be related to the length of the DataFrame. I've tried creating a generic sequence of dates and running
Also, the results look fairly sensible - differences are a maximum of 2 days. However, if I run the same code with Results where n = 100 Results where n = 101 Something strange definitely seems to be going on there! Going back to the example that I provided when opening this issue (shown in the gist at https://gist.github.com/robintw/631b53fa0cd9dbdabb36), the results seem to be sensible (a max difference of 3 days) when run with the
That makes me suspect that the results given by the Does that help?
You're subtracting a Short Answer: use Other than that, I think you've got your date format wrong: most of the data is on consecutive days. You need to specify a date format or store your date strings as ISO 8601, which is preferable, exactly
this is actually a very subtle error this 'works', but the final answer you are looking for 'a' I think numpy 1.6.2 does the wrong thing on an np.abs(of a timedelta64).... what should your final answer look like?
together with #2888, this is pointing at low test coverage for large integer cases. is there a way
Thanks for all the investigation of this. @y-p: Using aeronet.time - pd.Timestamp(image_time) gives the same error for me. I hadn't noticed that after doing Basically, the results given by @jreback in the final line of his example seem right (apart from the fact that the datatype has been switched to us from ns). Once I've got the Interestingly, there also seems to be a problem with the final bit of my requirement: getting the minimum value. I suspect it is related to the other problems that you've identified:
On a separate note: once we've sorted these problems, would it be worth adding a
@robintw the np.abs is a numpy bug, it's prob fixed in 1.7. I have a workaround for you, min/max should work, will check argmin/max as well..... I'll update master later today and let you know
ah yes, pd.Timestamp has no effect, my bad.
I'm running 1.6.1 it seems. I wasn't aware there was a newer version actually. Shall I try updating to 1.7 and see if that solves some of these problems? (Or do you want me to stay on 1.6.1 or go to 1.6.2 for debugging purposes?). I have a suspicion that
it won't solve most of these, but you should use at least 1.6.2. in any event you'll have to wait for me to merge to master and then the dev builds...(prob tomorrow or next day)...they are auto-generated. I get windows builds from here for all python stuff
Travis provides numpy preinstalled, 1.6.1 is apparently what's bundled with precise.
README says 1.6.1 is supported. if that needs to be changed, you need to ask wes.
no...numpy 1.6.1 is fine, it's good that we test with this (but only on 3.2)...prob not worth having a 2.7 with 1.6.1. thxs, all good now
@robintw ok...i pushed to master....but as you are on windows, check the latest dev builds for updates. It should have this commit: 819e0ad (or later) in it. give it a whirl and report back. also http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas has an update (I think after 5pm EST) with some more supported ops
Unfortunately I can't test this on my Windows machine at the moment as the builds available at http://pandas.pydata.org/pandas-build/dev/ haven't been updated since the 25th Feb. Any ideas what's going on there? I'll try and test on my Mac in a bit and get back to you.
these builds are updated once a day, not exactly sure when...check back later for the windows updates
From the list of commits it looks like there have been changes on the 26th and 28th Feb, but no new builds since the 25th. It also says at http://pandas.pydata.org/getpandas.html: "Stable windows binaries are built on a rotating basis every hour, as long as there have been code changes on github since the previous build. You can find the latest builds here." Is there someone I should contact to either (a) change the text on the Get Pandas page or (b) see if the automated Windows builds have stopped being built for some reason?
I'll check the build box when I get home (it's sitting in my apartment)
Thanks @wesm. @jreback: I've installed master from source on my Mac and the code sample I gave originally works without errors - which is great :-) Using the original code sample, I can't get Using the workaround in #2957 I can get
use idxmin instead (same idea but handles numpy bugs)
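A small sketch of what idxmin does on a Series of time differences. The data and variable names here are made up for illustration and are not taken from the gist; the sketch also assumes a current pandas, not the development build discussed in this thread.

```python
import pandas as pd

# Hypothetical data: absolute time differences, labelled by timestamp.
index = pd.to_datetime(["2013-02-01", "2013-02-02", "2013-02-03"])
diffs = pd.Series(pd.to_timedelta(["3 days", "6 hours", "1 day"]), index=index)

# idxmin returns the index label of the smallest value,
# so the result is the timestamp of the closest row.
closest_label = diffs.idxmin()
print(closest_label)
```

Because idxmin returns a label and not a position, the result can be passed straight to `.loc` on the original DataFrame.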
also post your code again pls
Ahh brilliant - Which code did you want to see?
where u use min I think should work directly rather than u have to do a conversion
Gives an error saying: Replacing the (Just FYI, this is with the latest master of pandas, and numpy 1.6.2.)
ahh....that is correct, you need my workaround for that....fyi min should work on a timedelta series in any event. thanks for the debug... let me know if anything else not working, or updated docs (prob available later today)...
Thank you all for the help - as soon as I can get hold of the Windows builds all my problems will have been solved (well...all of my Pandas-related problems anyway). Thank you very much to everyone who contributed for helping with the debugging and investigation. Is it worth suggesting a Once I've checked this all works on Windows, I will close this issue.
yes...pls post a new issue (with your code as revised)
I'm trying to install pandas from source on Windows to test this, as the windows builds online don't seem to have been updated yet, but am running into a lot of errors like:
Do you have any idea what might be causing this problem? It looks like it isn't able to link into the basic Python stuff properly, but I'm not sure why. Any ideas?
FYI looks like windows builds are updated
Thanks :-) I've installed the Windows builds and everything seems to be working fine - it seems to have fixed all of the problems that I've reported, and upgrading to NumPy 1.7 seems to have solved some of the NumPy issues. Thanks for all your help everyone - it's wonderful to find a great open-source community like this.
np....also good to have users who can troubleshoot :)
Just one more quick question: A Python library I have released now depends on this bugfix in Pandas. Do you have any idea when this fix will appear in a formal release? I found the development roadmap, but it didn't have any approximate dates for the next formal release - any ideas?
should be this month sometime
I am trying to find the row of a DataFrame that is closest to a certain datetime. I have my data in a DataFrame which has a DatetimeIndex, and I can't use functions like asof because I need to get the closest row (either before or after the time), so I am trying to implement the standard way of doing this, which is getting the absolute difference between each index time and the time I want to find, then finding the row with the minimum difference.
However, there seems to be a strange bug with using np.abs with the pandas index, and it seems to be in the printing of the results stage (relating to __get_repr). Pandas prints things differently depending on how many rows the DataFrame has, and I think this is related here, as running the code on a 200 row DataFrame fails, but on a 100 row DataFrame it works! I've checked that it's not some strange value in the DataFrame which is causing the problem, as a 100 row segment from anywhere in my original input file works, whereas a 200 row segment from anywhere fails.
Two input files (aeronet_large.txt with 200 rows and aeronet.txt with 100 rows) and the code I've been using are available at this gist: https://gist.github.com/robintw/631b53fa0cd9dbdabb36, and I asked a StackOverflow question about this (http://stackoverflow.com/questions/15115547/find-closest-row-of-dataframe-to-given-time-in-pandas) where I was recommended to raise an issue.
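The approach described above (absolute difference to every index time, then the minimum) can be sketched roughly as follows. This is an illustration written against a current pandas, not the code from the gist; the function name `closest_row` and the sample data are hypothetical.

```python
import pandas as pd

def closest_row(df, when):
    """Return the row of df whose DatetimeIndex entry is nearest to `when`."""
    target = pd.Timestamp(when)
    # Absolute time difference between every index entry and the target.
    deltas = abs(df.index - target)
    # argmin gives the integer position of the smallest difference.
    return df.iloc[deltas.argmin()]

# Made-up daily data, for illustration only.
df = pd.DataFrame({"value": range(10)},
                  index=pd.date_range("2013-02-01", periods=10, freq="D"))

row = closest_row(df, "2013-02-04 13:00")
print(row.name)
```

Unlike asof, this looks in both directions: 13:00 on the 4th is 13 hours after one index entry and 11 hours before the next, so the later row is returned.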