Problem with pandas.Series.std() #2888
Comments
Hi, I am new to GitHub as a whole. Thanks,
I believe this is the usual 1/sqrt(N) vs. 1/sqrt(N-1) issue: Series.std() defaults to ddof=1, while the NumPy std() defaults to ddof=0.
a = pd.Series(np.array(range(10)))
In [6]: a.std()
Out[6]: 3.0276503540974917
In [7]: a.values.std(ddof=1)
Out[7]: 3.0276503540974917
In [8]: a.values.std()
Out[8]: 2.8722813232690143
In [9]: a.std(ddof=0)
Out[9]: 2.8722813232690143
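To make the ddof difference above concrete, here is a minimal sketch that computes both normalizations by hand for the same ten-element series. The variable names (`sample_std`, `population_std`) are illustrative only and do not come from pandas or NumPy.

```python
import numpy as np
import pandas as pd

a = pd.Series(np.arange(10))
n = len(a)

# Sum of squared deviations from the mean, shared by both estimators.
squared_dev = ((a - a.mean()) ** 2).sum()

# ddof=1 divides by N-1 (what Series.std() does by default).
sample_std = np.sqrt(squared_dev / (n - 1))

# ddof=0 divides by N (what ndarray.std() does by default).
population_std = np.sqrt(squared_dev / n)

print(sample_std, a.std())
print(population_std, a.to_numpy().std())
```

Each printed pair should agree, which is why passing ddof explicitly makes the pandas and NumPy results match.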
Hey, actually I loaded his dataset, it's apparently an
In [5]: S.astype('float64').std()
Out[5]: 17570.097551905212
In [6]: S.astype('int64').std()
Out[6]: 39192.660667875185
Still figuring out exactly where the overflow/roundoff is happening.
Got it, in "nanops.py" with some print statements added:
X = _ensure_numeric(values.sum(axis))
XX = _ensure_numeric((values ** 2).sum(axis))
print type(X)
print X
print X ** 2
yields
This would work if I can do a patch, but I'm not sure if the correct solution is to cast
Anyone have any thoughts?
Actually... I think the only safe thing to do is to cast everything to
In [6]: np.asarray([4063418664], dtype='int64')
Out[6]: array([4063418664], dtype=int64)
In [7]: np.asarray([4063418664], dtype='int64') ** 2
Out[7]: array([-1935372834766006720], dtype=int64)
In [8]: np.asarray([2**63-1,1], dtype='int64')
Out[8]: array([9223372036854775807, 1], dtype=int64)
In [9]: np.asarray([2**63-1,1], dtype='int64').sum()
Out[9]: -9223372036854775808
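The wraparound in the session above can be checked against exact arithmetic. The sketch below reuses the value 4063418664 from that session and compares the int64 array result with a Python int (arbitrary precision) and with a float64 cast. The variable names are illustrative.

```python
import numpy as np

x = np.asarray([4063418664], dtype="int64")

# Squaring an int64 array wraps silently modulo 2**64.
wrapped = x ** 2

# Python ints never overflow, so this is the exact square.
exact = int(x[0]) ** 2

# float64 loses some low-order digits but keeps the magnitude.
as_float = x.astype("float64") ** 2

print(wrapped[0])   # negative: the true value does not fit in int64
print(exact)
print(as_float[0])
```

The wrapped value differs from the exact square by exactly 2**64, which is how a sum of squares of non-negative numbers can come out negative.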
(or we can just avoid calling
Upcast to float is fine because stdev should always yield a floating point number.
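As a rough sketch of the upcast idea, the helper below casts to float64 before forming the sum and the sum of squares, mirroring the X and XX quantities printed from nanops.py earlier in the thread. The function `std_from_sums` is hypothetical and is not the actual pandas patch. The sample data is made up so that both X ** 2 and XX would exceed the int64 range.

```python
import numpy as np

def std_from_sums(values, ddof=1):
    """Hypothetical helper: std via sum and sum of squares, in float64."""
    # Upcast first so neither the sum nor the squares can wrap around.
    vals = np.asarray(values).astype("float64")
    n = vals.size
    X = vals.sum()
    XX = (vals ** 2).sum()
    # Variance from the two running sums, normalized by N - ddof.
    variance = (XX - X ** 2 / n) / (n - ddof)
    return np.sqrt(variance)

# Made-up int64 data whose squares overflow int64 when summed.
data = np.array([0, 3_000_000_000, 4_000_000_000], dtype="int64")

print(std_from_sums(data))
print(np.std(data.astype("float64"), ddof=1))
```

The two printed values should agree closely. Note that this sum-of-squares formula can still lose precision in float64 when the spread is tiny relative to the mean, so the cast addresses the overflow, not every roundoff concern.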
Okay, I apparently
Thanks, I totally missed his intention with the examples.
Wow, thanks a lot for such a quick response! Thank you,
Hi,
I ran the following commands on "pandas.Series" object named "S":
Correct standard deviation is 17570.02589
I should have got the same value in all 4 cases.
This seems to be a bug in pandas.Series
Pickle file for "S" object is here:
https://gist.github.com/yashoteja/4971970
Thank you,
Yashoteja