
GITHUB.COM
Detected CMS Systems:
- Wordpress (2 occurrences)
Title:
API/PERF: Don't reorder categoricals when grouping by an unordered categorical and `sort=False` · Issue #48749 · pandas-dev/pandas
Description:
xref: dask/dask#9486 (comment) TLDR: When calling df.groupby(key=categorical<ordered=False>, sort=False, observed=False) the resulting CategoricalIndex will have its values and categories un...
Website Age:
17 years and 8 months (reg. 2007-10-09).
Matching Content Categories
- Technology & Computing
- Education
- Social Networks
Content Management System
What CMS is github.com built with?
Github.com employs WORDPRESS.
Traffic Estimate
What is the average monthly size of github.com audience?
Tremendous Traffic: 10M - 20M visitors per month
Based on our best estimate, this website will receive around 10,000,019 visitors in the current month.
How Does Github.com Make Money?
Subscription Packages
We've located a dedicated page on github.com that might include details about subscription plans or recurring payments. We identified it based on the word pricing in one of its internal links. Below, you'll find additional estimates for its monthly recurring revenues.
How Much Does Github.com Make?
Subscription Packages
Prices on github.com are in US Dollars ($).
They range from $4.00/month to $21.00/month.
We estimate that the site has approximately 4,989,889 paying customers.
The estimated monthly recurring revenue (MRR) is $20,957,532.
The estimated annual recurring revenues (ARR) are $251,490,385.
WordPress Themes and Plugins
What WordPress theme does this site use?
It is strange but we were not able to detect any theme on the page.
What WordPress plugins does this website use?
It is strange but we were not able to detect any plugins on the page.
Keywords
categories, categorical, groupby, sortfalse, order, mroeschke, rhshadrach, performance, result, grouping, issue, values, commented, member, codes, sign, code, reorder, type, mentioned, projects, unordered, daskdask, comment, categoricalindex, orderedfalse, groups, cat, data, reordering, libs, labels, returned, navigation, open, pandas, pull, requests, actions, security, apiperf, dont, categoricals, sorttrue, resulting, dfgroupbycol, observedfalsefirstindex, dtypecategory, namerange, sorted,
Topics
pandas/pandas/core/groupby/categorical personal information api/perf rhshadrach edits member respect sort=true/false nice performance benefit recode categorical codes type projects assess issue discussion requires discussion extra time cost _libs groupby code groupby result disagrees comment metadata assignees rhshadrach mentioned categorical codes [2 unordered categorical called labels projects milestone perf difference groupby code data core team grouping labels sort=true sort=false pandas ] = categorical categorical observed=false ordered=false extra work info clarification behavior needed sorting imposes ~2x slower ~10x slower perf slap grouper incorrect results desired output milestone relationships categories unordered integer categories string categories codes[cat encountered order reorder categoricals dtype='category' take_codes = np github
Payment Methods
- Braintree
Questions
- Do we have an idea what kind of extra time cost the sorting imposes for something sensible, like 10 000 rows and a 100 categories?
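One way to get a rough answer to the timing question above is to measure it directly. The following is only a sketch: the column names `key` and `value`, the synthetic labels, and the helper `time_groupby` are made up for illustration, and the numbers it prints will depend on the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data matching the sizes in the question: 10 000 rows, 100 categories.
rng = np.random.default_rng(0)
n_rows, n_categories = 10_000, 100
labels = [f"cat_{i:03d}" for i in range(n_categories)]  # hypothetical labels

df = pd.DataFrame(
    {
        "key": pd.Categorical(
            rng.choice(labels, size=n_rows), categories=labels, ordered=False
        ),
        "value": rng.random(n_rows),
    }
)


def time_groupby(sort: bool, repeat: int = 5) -> float:
    """Return the best single-call time (seconds) for a categorical groupby."""
    timer = timeit.Timer(
        lambda: df.groupby("key", sort=sort, observed=False)["value"].sum()
    )
    return min(timer.repeat(repeat=repeat, number=1))


sorted_seconds = time_groupby(sort=True)
unsorted_seconds = time_groupby(sort=False)
print(f"sort=True : {sorted_seconds * 1e3:.3f} ms")
print(f"sort=False: {unsorted_seconds * 1e3:.3f} ms")
```

Taking the minimum over several repeats reduces noise from other processes; it does not isolate the cost of reordering categories from the rest of the groupby.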
Schema
DiscussionForumPosting:
context:https://schema.org
headline:API/PERF: Don't reorder categoricals when grouping by an unordered categorical and `sort=False`
articleBody:xref: https://github.com/dask/dask/pull/9486#issue-1372066649
TLDR: When calling `df.groupby(key=categorical<ordered=False>, sort=False, observed=False)` the resulting `CategoricalIndex` will have its values _and categories_ unordered.
```
In [1]: df = DataFrame(
...: [
...: ["(7.5, 10]", 10, 10],
...: ["(7.5, 10]", 8, 20],
...: ["(2.5, 5]", 5, 30],
...: ["(5, 7.5]", 6, 40],
...: ["(2.5, 5]", 4, 50],
...: ["(0, 2.5]", 1, 60],
...: ["(5, 7.5]", 7, 70],
...: ],
...: columns=["range", "foo", "bar"],
...: )
In [2]: col = "range"
In [3]: df["range"] = Categorical(df["range"], ordered=False)
In [4]: df.groupby(col, sort=True, observed=False).first().index
Out[4]: CategoricalIndex(['(0, 2.5]', '(2.5, 5]', '(5, 7.5]', '(7.5, 10]'], categories=['(0, 2.5]', '(2.5, 5]', '(5, 7.5]', '(7.5, 10]'], ordered=False, dtype='category', name='range')
In [5]: df.groupby(col, sort=False, observed=False).first().index
Out[5]: CategoricalIndex(['(7.5, 10]', '(2.5, 5]', '(5, 7.5]', '(0, 2.5]'], categories=['(7.5, 10]', '(2.5, 5]', '(5, 7.5]', '(0, 2.5]'], ordered=False, dtype='category', name='range')
```
It's reasonable that the values are not sorted, but a lot of extra work can be spent un-ordering the _categories_ in:
https://github.com/pandas-dev/pandas/blob/44a4f1619ff5031e59a970a61fac94c3745e4433/pandas/core/groupby/categorical.py#L77-L92
This may have been an outcome of fixing https://github.com/pandas-dev/pandas/issues/8868, but if the grouped values for `sort=False` can be produced without reordering the categories, there would probably be a nice performance benefit.
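To make the distinction between unsorted values and reordered categories concrete, here is a minimal sketch that works on a `Categorical` directly rather than through `groupby`. It is not the pandas implementation; the variable names are illustrative, and it assumes every category is observed, as in the example above.

```python
import pandas as pd

values = [
    "(7.5, 10]", "(7.5, 10]", "(2.5, 5]", "(5, 7.5]",
    "(2.5, 5]", "(0, 2.5]", "(5, 7.5]",
]
cat = pd.Categorical(values, ordered=False)

# Integer codes of the categories in order of first appearance.
appearance_codes = pd.unique(cat.codes)

# Group labels in encounter order.
encounter_labels = cat.categories.take(appearance_codes)

# Option A: values in encounter order, categories left exactly as they were.
kept = pd.CategoricalIndex(
    encounter_labels, categories=cat.categories, ordered=False
)

# Option B: additionally reorder the categories to match encounter order,
# which is the extra step the issue suggests may be avoidable.
reordered = cat.reorder_categories(list(encounter_labels))

print(list(kept))                  # values follow encounter order
print(list(kept.categories))       # categories unchanged
print(list(reordered.categories))  # categories follow encounter order
```

Both options yield the same sequence of group labels; only the `categories` attribute differs, and that is the part the issue proposes to leave alone when `sort=False`.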
author:
url:https://github.com/mroeschke
type:Person
name:mroeschke
datePublished:2022-09-23T20:45:55.000Z
interactionStatistic:
type:InteractionCounter
interactionType:https://schema.org/CommentAction
userInteractionCount:9
url:https://github.com/pandas-dev/pandas/issues/48749
Person:
url:https://github.com/mroeschke
name:mroeschke
InteractionCounter:
interactionType:https://schema.org/CommentAction
userInteractionCount:9
External Links (2)
Analytics and Tracking
- Site Verification - Google
Libraries
- Clipboard.js
- D3.js
- Lodash
Emails and Hosting
Mail Servers:
- aspmx.l.google.com
- alt1.aspmx.l.google.com
- alt2.aspmx.l.google.com
- alt3.aspmx.l.google.com
- alt4.aspmx.l.google.com
Name Servers:
- dns1.p08.nsone.net
- dns2.p08.nsone.net
- dns3.p08.nsone.net
- dns4.p08.nsone.net
- ns-1283.awsdns-32.org
- ns-1707.awsdns-21.co.uk
- ns-421.awsdns-52.com
- ns-520.awsdns-01.net