
DEPR: DataFrameGroupBy numeric_only defaulting to True · Issue #46072 · pandas-dev/pandas
### Context
A summary of this behavior and the consensus thus far that DataFrameGroupBy will have `numeric_only` default to False in 2.0 can be found here: https://github.com/pandas-dev/pandas/issues/42395#issuecomment-908690066.
In #41475, the silent dropping of nuisance columns was deprecated.
In #43154, the behavior was changed so that when `numeric_only` is unspecified and subsetting to numeric-only columns would leave the DataFrame empty, pandas internally treats `numeric_only` as `False`.
Even though there is consensus that `numeric_only` should default to False, because of the above changes I wanted to make sure there is a consensus on how to go about doing so before proceeding.
For the discussion below, it is useful to have three types of columns in mind:
- Numeric: Columns that remain in the input when `numeric_only=True`.
- Nonnumeric, can agg: Columns that do not remain in the input when `numeric_only=True` but can still be successfully aggregated; e.g. strings with `sum`.
- Nonnumeric, can't agg: Columns that do not remain in the input when `numeric_only=True` and cannot be successfully aggregated; e.g. a column holding the `object` type itself, as in the code below.
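To make the three categories concrete, here is a minimal sketch. The frame and the column names `B`, `C`, `D` mirror the ones used in the code below; the explicit `dtype=object` is an assumption added so the sketch behaves the same across pandas versions. The outcome for `D` depends on the installed pandas version, so the sketch only reports it rather than asserting one result.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1],                                     # grouping key
        "B": [1, 1],                                     # numeric
        "C": pd.Series(["2", "2"], dtype=object),        # nonnumeric, can agg
        "D": pd.Series([object, object], dtype=object),  # nonnumeric, can't agg
    }
)

# Numeric: only B remains when numeric_only=True.
numeric_cols = df.groupby("A").sum(numeric_only=True).columns.tolist()
print(numeric_cols)

# Nonnumeric, can agg: summing strings concatenates them.
c_sum = df.groupby("A")[["C"]].sum()
print(c_sum)

# Nonnumeric, can't agg: adding two `object` types is undefined, so this
# either raises or (on older versions) drops the column with a warning.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        d_outcome = df.groupby("A")[["D"]].sum().columns.tolist()
except TypeError:
    d_outcome = "TypeError"
print(d_outcome)
```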
### Code
To investigate this on 1.4.x, I have been using the following code. In this code, I am using `.sum()`. However, any reduction or transform, whether specified as a string or a callable, should behave the same way (though that is not the case today). This includes `apply` and using `axis=1` (for which you may want to tilt your head 90 degrees to the left).
<details>
<summary>Code</summary>
```python
import itertools as it
import warnings

import pandas as pd
from pandas._libs import lib

numeric = [1, 1]
nonnumeric_noagg = [object, object]  # the `object` type itself; cannot be summed
nonnumeric_agg = ["2", "2"]  # strings; sum concatenates them

# Try every combination of column types against every numeric_only setting.
for has_numeric, has_nonnumeric_agg, has_nonnumeric_noagg in it.product([True, False], repeat=3):
    for numeric_only in [True, False, lib.no_default]:
        print(has_numeric, has_nonnumeric_agg, has_nonnumeric_noagg, numeric_only)
        df = pd.DataFrame({"A": [1, 1]})
        if has_numeric:
            df["B"] = numeric
        if has_nonnumeric_agg:
            df["C"] = nonnumeric_agg
        if has_nonnumeric_noagg:
            df["D"] = nonnumeric_noagg
        warning_msg = ""
        try:
            with warnings.catch_warnings(record=True) as w:
                result = df.groupby("A").sum(numeric_only=numeric_only)
            # At most one warning is expected, and it must be a FutureWarning.
            if len(w) > 0:
                assert len(w) == 1
                assert issubclass(w[-1].category, FutureWarning)
                warning_msg = str(w[-1].message)
        except TypeError:
            print(" TypeError")
        else:
            print(" Columns:", result.columns.tolist(), "Warning:", warning_msg[:20])
```
</details>
### Current and Future behavior
#### `numeric_only=True`
Current behavior appears entirely correct and will remain unchanged in 1.5/2.0. In particular, when there are no numeric columns in the input, the output is empty as well.
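As a small illustration of the empty-output case, the sketch below groups a frame that has no numeric columns besides the key. The frame itself is a hypothetical example, not taken from the pandas test suite.

```python
import pandas as pd

# Hypothetical frame: grouping key A plus one string column, no numeric columns.
df = pd.DataFrame({"A": [1, 1], "C": pd.Series(["2", "2"], dtype=object)})

# C is excluded by numeric_only=True, so no value columns are left to aggregate.
result = df.groupby("A").sum(numeric_only=True)
print(result.columns.tolist())
```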
#### `numeric_only=False`
Current behavior appears entirely correct, in that if there are to be any behavior changes in 2.0, we already emit the appropriate FutureWarning today. The only case where there will be a behavior change from 1.4.x to 2.0 is if the frame contains a nonnumeric column that can't be aggregated. 1.4.x drops the column whereas 2.0 will raise a TypeError.
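The single behavior change described above can be checked with a version-tolerant sketch. It assumes a frame with one numeric column `B` and one un-aggregatable column `D`, and it records whichever outcome the installed pandas produces instead of asserting one.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1],
        "B": [1, 1],
        "D": pd.Series([object, object], dtype=object),  # cannot be summed
    }
)

try:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = df.groupby("A").sum(numeric_only=False)
except TypeError as err:
    # Behavior described for 2.0: the un-aggregatable column raises.
    outcome = ("TypeError", str(err))
else:
    # Behavior described for 1.4.x: D is dropped and a warning is emitted.
    outcome = (result.columns.tolist(), [w.category.__name__ for w in caught])
print(outcome)
```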
#### `numeric_only` unspecified (`lib.no_default`)
I'll refer to the columns as in the code above:
- B: Numeric column
- C: Nonnumeric column that can be aggregated
- D: Nonnumeric column that cannot be aggregated
1. Columns `['B', 'C', 'D']`
   - In 1.4.x we get column B and no warning is raised. In 2.0 this will raise a TypeError.
   - We should emit a warning in 1.5 about `numeric_only` defaulting to False in 2.0.
2. Columns `['B', 'C']`
   - In 1.4.x we get column B and no warning is raised. In 2.0 we will get both B and C in the result.
   - We should emit a warning in 1.5 about `numeric_only` defaulting to False in 2.0.
3. Columns `['B', 'D']`
   - In 1.4.x we get column B and no warning is raised. In 2.0 we will raise a TypeError.
   - We should emit a warning in 1.5 about `numeric_only` defaulting to False in 2.0.
4. Columns `['C', 'D']`
   - In 2.0 we will raise a TypeError, and 1.4.x currently warns that this will happen.
   - No change.
5. Columns `['C']`
   - In 1.4.x we get column C and no warning is raised. This is the correct result on 2.0, but in my opinion it is not the correct result on 1.4.x, where we should be treating `numeric_only` as True.
   - No change. It is not worth it to change behavior and raise a FutureWarning that the behavior will go back to what it is now.
6. Columns `['D']`
   - On 1.4.x this returns an empty result and warns that it will raise a TypeError. In 2.0, it will raise a TypeError.
   - No change.
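The six cases reduce to a simple rule for 2.0: with `numeric_only` unspecified, the result contains every non-key column unless an un-aggregatable column is present, in which case a TypeError is raised. The sketch below encodes that rule and compares it with what the installed pandas does. The helper names `expected_in_2_0` and `observed` are hypothetical and used only for illustration; on 1.4.x the observed column differs in the cases listed above.

```python
import warnings

import pandas as pd

VALUES = {
    "B": [1, 1],            # numeric
    "C": ["2", "2"],        # nonnumeric, can agg
    "D": [object, object],  # nonnumeric, can't agg
}


def expected_in_2_0(columns):
    """Outcome described above for 2.0 when numeric_only is unspecified."""
    return "TypeError" if "D" in columns else list(columns)


def observed(columns):
    """Outcome on the installed pandas when numeric_only is left unspecified."""
    df = pd.DataFrame({"A": [1, 1]})
    for name in columns:
        dtype = "int64" if name == "B" else object
        df[name] = pd.Series(VALUES[name], dtype=dtype)
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", FutureWarning)
            return df.groupby("A").sum().columns.tolist()
    except TypeError:
        return "TypeError"


CASES = [["B", "C", "D"], ["B", "C"], ["B", "D"], ["C", "D"], ["C"], ["D"]]
for case in CASES:
    print(case, "expected in 2.0:", expected_in_2_0(case), "observed:", observed(case))
```

Code that should not depend on the default can pass `numeric_only` explicitly or select the columns to aggregate before calling the reduction.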
cc @jreback @jbrockmendel @jorisvandenbossche @simonjayhawkins @Dr-Irv
Author: rhshadrach (https://github.com/rhshadrach)
Published: 2022-02-19T16:30:13.000Z
Comments: 7
URL: https://github.com/pandas-dev/pandas/issues/46072