
Title:
BUG: groupby transform doesn't respect Series index anymore · Issue #45648 · pandas-dev/pandas
### Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pandas.pydata.org/docs/whatsnew/index.html) of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
### Reproducible Example
```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'bar', 'foo', 'foo'],
    'C': [2.1, 1.9, 3.6, 4.0, 1.9, 7.8, 2.8]
}).convert_dtypes()
print(df)
#      A    C
# 0  foo  2.1
# 1  bar  1.9
# 2  foo  3.6
# 3  bar  4.0
# 4  bar  1.9
# 5  foo  7.8
# 6  foo  2.8

# sort C within groups defined by A
df.groupby('A')['C'].transform(pd.Series.sort_values)
# Output on pandas 1.4.0 (values come back in their original order):
#      C
# 0  2.1
# 1  1.9
# 2  3.6
# 3  4.0
# 4  1.9
# 5  7.8
# 6  2.8
```
### Issue Description
Suppose I have the following DataFrame:
```
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'bar', 'foo', 'foo'],
    'C': [2.1, 1.9, 3.6, 4.0, 1.9, 7.8, 2.8]
}).convert_dtypes()
print(df)
#      A    C
# 0  foo  2.1
# 1  bar  1.9
# 2  foo  3.6
# 3  bar  4.0
# 4  bar  1.9
# 5  foo  7.8
# 6  foo  2.8
```
I used to be able to sort `C`'s values within the groups defined by `A` like this ([proof](https://youtu.be/n681ajrAuVE?t=547)):
```
df.groupby('A')['C'].transform(pd.Series.sort_values)
#      C
# 0  2.1
# 1  1.9
# 2  2.8
# 3  1.9
# 4  4.0
# 5  3.6
# 6  7.8
```
(This was sometime around pandas 1.0.0.)
However, pandas 1.4.0 produces the following:
```
df.groupby('A')['C'].transform(pd.Series.sort_values)
#      C
# 0  2.1
# 1  1.9
# 2  3.6
# 3  4.0
# 4  1.9
# 5  7.8
# 6  2.8
```
It's as if the `sort_values()` function isn't even being applied.
Note that `df.groupby('A')[['C']].transform(pd.Series.sort_values)` works properly.
```
df.groupby('A')[['C']].transform(pd.Series.sort_values)
#      C
# 0  2.1
# 1  1.9
# 2  2.8
# 3  1.9
# 4  4.0
# 5  3.6
# 6  7.8
```
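A possible reading of the unchanged output, stated as an assumption rather than a confirmed diagnosis: `sort_values()` reorders the values but each value keeps its original index label, so if the Series path of `transform` aligns each group's result back to the group's index, the sort is undone. The sketch below reproduces only that alignment step with plain Series operations. It does not call `transform`, and the variable names are illustrative.

```python
import pandas as pd

# The 'foo' group of column C, with its original row labels
s = pd.Series([2.1, 3.6, 7.8, 2.8], index=[0, 2, 5, 6])

# Sorting moves the values, and each label travels with its value
sorted_s = s.sort_values()
print(sorted_s.index.tolist())  # [0, 6, 2, 5]

# Aligning the sorted result back to the original labels restores
# the original order, which makes the sort look like a no-op
realigned = sorted_s.reindex(s.index)
print(realigned.equals(s))  # True
```

If this is what happens internally, it would match the symptom of the values coming back in their original order.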
### Expected Behavior
I would expect `df.groupby('A')['C'].transform(pd.Series.sort_values)` to actually sort C's values within the groups defined by A (as it once did). My expected output in this example would be a Series like this.
```
C
0 2.1
1 1.9
2 2.8
3 1.9
4 4.0
5 3.6
6 7.8
```
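In the meantime, a sketch of one way to get this output without relying on how `transform` treats the index of the returned Series: return a NumPy array, which carries no labels and is therefore placed positionally within each group. The helper name `sort_positionally` is hypothetical, and the example uses a plain `float64` column (without `convert_dtypes()`) to keep the dtype handling simple.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'bar', 'foo', 'foo'],
    'C': [2.1, 1.9, 3.6, 4.0, 1.9, 7.8, 2.8],
})  # plain float64 column; convert_dtypes() omitted in this sketch


def sort_positionally(group):
    # Hypothetical helper: drop the index labels so the sorted values
    # are assigned by position within the group, not by label
    return group.sort_values().to_numpy()


result = df.groupby('A')['C'].transform(sort_positionally)
print(result.tolist())  # [2.1, 1.9, 2.8, 1.9, 4.0, 3.6, 7.8]
```

The result keeps the original row index, with each group's values sorted in place across the rows that group occupies.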
### Installed Versions
<details>
INSTALLED VERSIONS
------------------
commit : bb1f651536508cdfef8550f93ace7849b00046ee
python : 3.10.1.final.0
python-bits : 64
OS : Darwin
OS-release : 21.2.0
Version : Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.4.0
numpy : 1.22.1
pytz : 2021.3
dateutil : 2.8.2
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
</details>
author:
url:https://github.com/ben519
type:Person
name:ben519
datePublished:2022-01-26T22:58:16.000Z
interactionStatistic:
type:InteractionCounter
interactionType:https://schema.org/CommentAction
userInteractionCount:15
url:https://github.com/pandas-dev/pandas/issues/45648