
Title:
BUG: np.mean(pd.Series) != np.mean(pd.Series.values) · Issue #42878 · pandas-dev/pandas
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] (optional) I have confirmed this bug exists on the master branch of pandas.
---
#### Code Sample, a copy-pastable example
```python
import pandas as pd
import numpy as np
a = pd.Series(np.random.normal(scale=0.1, size=(1_000_000,)).astype(np.float32)).pow(2)
assert isinstance(np.mean(a), float)
assert isinstance(np.mean(a.values), np.float32)
assert abs(1 - np.mean(a)/np.mean(a.values)) > 4e-4
```
#### Problem description
1. `pd.DataFrame.mean`/`pd.Series.mean`/`np.mean(pd.Series)` returns a Python `float` instead of a NumPy float. Since `np.mean(pd.Series.values)` does return a NumPy float, I'm assuming for now that this should be fixed in pandas.
2. If `dtype == np.float32`, calling `mean` on a pandas object gives a significantly different result from calling `mean` on the underlying NumPy ndarray.
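To illustrate how two float32 means of the same data can disagree at this scale, the sketch below compares three ways of accumulating the sum using NumPy alone. It is an assumption, not something this report establishes, that the pandas result comes from a different summation routine; the sketch only shows that summation order and accumulator width matter for this kind of input. The seed and variable names are illustrative.

```python
import numpy as np

# Illustrative data shaped like the report's example (the seed is an assumption).
rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=1_000_000).astype(np.float32) ** 2

# Reference: accumulate in float64.
reference = x.astype(np.float64).mean()

# NumPy's own float32 mean (its reduction uses pairwise summation here).
pairwise = x.mean()

# Strict left-to-right accumulation in float32, read off the last cumsum entry.
sequential = np.cumsum(x)[-1] / np.float32(x.size)

err_pairwise = abs(1 - float(pairwise) / reference)
err_sequential = abs(1 - float(sequential) / reference)

print(f"relative error, pairwise float32:   {err_pairwise:.3e}")
print(f"relative error, sequential float32: {err_sequential:.3e}")
```

With a million small positive addends, the running float32 total soon becomes much larger than each new term, so a strictly sequential sum tends to lose far more low-order bits than a pairwise one.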
#### Expected Output
The output of `np.mean(a)` should be the same as `np.mean(a.values)`.
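Whether exact equality is a reasonable expectation for float32 input is debatable, since any two summation orders can round differently. As a hedged caller-side sketch, one can request a float64 accumulator explicitly and compare results with a tolerance instead of `==`. The helper name `mean_f64` is hypothetical; `np.mean(..., dtype=...)` and `np.isclose` are standard NumPy calls.

```python
import numpy as np

def mean_f64(values):
    """Mean with a float64 accumulator (hypothetical helper, returns np.float64)."""
    return np.mean(np.asarray(values), dtype=np.float64)

rng = np.random.default_rng(0)  # illustrative seed
x = rng.normal(scale=0.1, size=1_000_000).astype(np.float32) ** 2

m64 = mean_f64(x)   # float64 accumulation and float64 result
m32 = x.mean()      # float32 accumulation and float32 result

print(type(m64).__name__, type(m32).__name__)
# Compare with a tolerance suited to float32 rather than with ==.
print(np.isclose(m32, m64, rtol=1e-5))
```

For a pandas Series `a`, the equivalent call would be `mean_f64(a.to_numpy())`.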
#### Additional tests
```python
# both b and c ~1e-2
b = a.mean() # the pandas impl of mean
assert isinstance(b, float) # PYTHON float, not numpy float? Ergo implicit f64
h = np.mean(a)
assert isinstance(h, float)
assert h == b
c = a.values.mean() # the numpy impl of mean
assert isinstance(c, np.float32) # as expected
print('\nerrors between pandas mean and numpy mean')
print(f'relative error: {abs(1-b/c):.3e}') # ~ 5e-4
print(f'absolute error: {abs(b -c):.3e}') # ~ 5e-6
print(f'relative error after casting: {abs(1-np.float32(b)/c):.3e}') # ~ 5e-4
print(f'absolute error after casting: {abs(np.float32(b) -c):.3e}') # ~ 5e-6
d = a.sum() / len(a)
assert isinstance(d, np.float64) # expected, because division. Note `sum` returns an np.float32
e = a.values.sum() / len(a)
assert isinstance(e, np.float64) # expected, because division
# these methods are equivalent
assert d==e
# and up to f32 precision equal to the numpy impl
assert d.astype(np.float32) == c
# the cherry on the cake
f = a.astype(np.float64).mean()
assert isinstance(f, float) # still not ideal, should be np.float64
g = a.astype(np.float64).values.mean()
print('\nrelative error between pandas f64 mean and numpy f64 mean')
print(f'relative error numpy f64/pandas f64: {abs(1-g/f):.3e}') # ~ 1e-14 -- 1e-16, not bad but I would have expected equality
print('\nerrors between pandas f64 mean and numpy/pandas f32 mean')
print(f'relative error pandas f32/pandas f64: {abs(1-b/f):.3e}') # ~ 5e-4
print(f'absolute error numpy f32/pandas f64: {abs(1-c/f):.3e}') # ~ 1e-7 -- 1e-9; note: despite the label, this expression is a relative error
# finally...
h = np.mean(a)
assert isinstance(h, float)
assert h == b
```
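The `# expected, because division` comments rely on NumPy's type promotion: combining a float32 with a 64-bit integer yields float64. The sketch below checks that rule with explicit NumPy integer types. It is an assumption that `len(a)` was treated like a 64-bit integer in the reported environment (NumPy 1.21 on 64-bit Linux). How a plain Python `int` takes part in promotion can differ between NumPy versions, so the example avoids it.

```python
import numpy as np

# Promotion rule for the dtypes themselves.
promoted = np.result_type(np.float32, np.int64)
print(promoted)

# The same rule applied to scalar division.
total = np.float32(10000.0)   # stand-in for a float32 sum
count = np.int64(1_000_000)   # stand-in for len(a) as a 64-bit integer
mean = total / count
print(type(mean).__name__)

# Casting back gives a float32 scalar, as in `d.astype(np.float32)` above.
mean32 = mean.astype(np.float32)
print(type(mean32).__name__)
```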
#### Output
```python
errors between pandas mean and numpy mean
relative error: 5.210e-04
absolute error: 5.204e-06
relative error after casting: 5.210e-04
absolute error after casting: 5.204e-06
relative error between pandas f64 mean and numpy f64 mean
relative error numpy f64/pandas f64: 1.066e-14
errors between pandas f64 mean and numpy/pandas f32 mean
relative error pandas f32/pandas f64: 5.214e-04
absolute error numpy f32/pandas f64: 2.399e-07
```
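To put these numbers in context, they can be expressed in units of machine epsilon for the relevant precision. The sketch below only restates values printed above; the variable names are illustrative.

```python
import numpy as np

eps32 = float(np.finfo(np.float32).eps)   # 2**-23, about 1.19e-07
eps64 = float(np.finfo(np.float64).eps)   # 2**-52, about 2.22e-16

# Relative errors copied from the output above (the last output line is
# labelled "absolute" but is computed as a relative error).
pandas_f32_vs_f64 = 5.214e-04
numpy_f32_vs_f64 = 2.399e-07
numpy_f64_vs_pandas_f64 = 1.066e-14

ratio_pandas32 = pandas_f32_vs_f64 / eps32
ratio_numpy32 = numpy_f32_vs_f64 / eps32
ratio_f64 = numpy_f64_vs_pandas_f64 / eps64

print(f"pandas f32 mean:  {ratio_pandas32:,.0f} float32 eps")
print(f"numpy f32 mean:   {ratio_numpy32:.1f} float32 eps")
print(f"f64 disagreement: {ratio_f64:.0f} float64 eps")
```

By these figures the NumPy float32 mean sits within a few epsilon of the float64 reference, while the pandas float32 mean is off by several thousand epsilon.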
#### Output of ``pd.show_versions()``
<details>
INSTALLED VERSIONS
------------------
commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-80-generic
Version : #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.1
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 4.0.1
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 3.0.0
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.9.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.51.2
</details>
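The version list shows `bottleneck : 1.3.2` installed. Assuming, which the report itself does not state, that pandas may route some reductions through bottleneck when it is available, one way to test whether it is involved is to disable it with the `compute.use_bottleneck` option and recompute. This is a diagnostic sketch rather than a confirmed fix; the seed and variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # illustrative seed
a = pd.Series(rng.normal(scale=0.1, size=1_000_000).astype(np.float32)).pow(2)

# float64 reference computed by NumPy on the underlying array.
reference = a.to_numpy().astype(np.float64).mean()

default_mean = a.mean()
# Temporarily tell pandas not to use bottleneck, even if it is installed.
with pd.option_context("compute.use_bottleneck", False):
    no_bn_mean = a.mean()

err_default = abs(1 - float(default_mean) / reference)
err_no_bn = abs(1 - float(no_bn_mean) / reference)

print(f"relative error, default settings: {err_default:.3e}")
print(f"relative error, bottleneck off:   {err_no_bn:.3e}")
```

If the two printed errors differ by orders of magnitude, that points at the optional dependency rather than at pandas' own reduction code. If they match, the installed versions probably do not show the behaviour described here.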
Author: sebasv (https://github.com/sebasv)
Date published: 2021-08-04T09:37:14.000Z
Comments: 14
URL: https://github.com/pandas-dev/pandas/issues/42878