
Title:
BUG: read_excel skiprows callable can result in infinite loop · Issue #45585 · pandas-dev/pandas
Description:
Pandas version checks I have checked that this issue has not already been reported. I have confirmed this bug exists on the latest version of pandas. I have confirmed this bug exists on the main branch of pandas. Reproducible Example imp...
DiscussionForumPosting:
context:https://schema.org
headline:BUG: read_excel skiprows callable can result in infinite loop
articleBody:
### Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pandas.pydata.org/docs/whatsnew/index.html) of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
### Reproducible Example
```python
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal

source_df = pd.DataFrame([{"row": 0}, {"row": 1}, {"row": 2}, {"row": 3}])
with tempfile.NamedTemporaryFile("w", suffix=".xlsx") as xlsx_fd:
    source_df.to_excel(xlsx_fd.name, index=False)
    df = pd.read_excel(
        xlsx_fd.name, skiprows=lambda x: x not in [0, 2]
    )  # infinite loop here
    expected_df = pd.DataFrame([{"row": 1}, {"row": 3}])
    assert_frame_equal(df, expected_df)
```
### Issue Description
In certain cases, a `skiprows` callable passed to `read_excel` loops infinitely. The trigger is the sequence of `True` and `False` values the callable returns. I've already written a unit test that triggers the issue and a fix that addresses it; I'm creating this issue to track the fix.
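The failure mode can be illustrated without pandas. The following is a minimal sketch, assuming a simplified reader that advances past skipped rows with a `while` loop that never checks the document length. It is not pandas' actual parser code: `naive_next_kept_row` and `max_steps` are hypothetical names, and the step cap exists only so the example terminates.

```python
def naive_next_kept_row(pos, skipfunc, max_steps=1000):
    """Return the first row index >= pos that skipfunc does not skip.

    Hypothetical model: nothing bounds the search by the document length,
    so a callable that returns True for every remaining index would keep
    this loop running forever. max_steps is an artificial safety cap.
    """
    steps = 0
    while skipfunc(pos):
        pos += 1
        steps += 1
        if steps >= max_steps:
            raise RuntimeError(f"still skipping at row index {pos}")
    return pos


# Same callable as in the reproducible example.
skip = lambda x: x not in [0, 2]

print(naive_next_kept_row(0, skip))  # index 0 is kept
print(naive_next_kept_row(1, skip))  # index 1 is skipped, index 2 is kept

try:
    # Past index 2 the callable returns True for every index,
    # so only the safety cap stops the search.
    naive_next_kept_row(3, skip)
except RuntimeError as exc:
    print("runaway skip loop:", exc)
```

With this callable, every index after 2 is skipped, so a search that starts at index 3 has no natural stopping point unless the reader also checks where the data ends.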
### Expected Behavior
The test provided above should pass, where currently it will loop forever.
To put it another way: the `skiprows` callable should never be called with a row index that is greater than the length of the document.
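The expected behavior can be sketched in plain Python. This is a simplified model under stated assumptions, not the fix submitted to pandas: `read_kept_rows` is a hypothetical helper that treats the document as a list of rows, bounds the scan by the document length, and records every index passed to the callable so the bound can be checked.

```python
def read_kept_rows(rows, skipfunc):
    """Return (kept_rows, called_indices) for a list-backed document.

    Hypothetical model of the expected behavior: the scan stops at
    len(rows), so skipfunc is only ever called with valid row indices.
    """
    kept = []
    called = []
    for pos in range(len(rows)):  # bounded by the document length
        called.append(pos)
        if not skipfunc(pos):
            kept.append(rows[pos])
    return kept, called


rows = ["a", "b", "c", "d", "e"]
kept, called = read_kept_rows(rows, lambda x: x not in [0, 2])

print(kept)    # ['a', 'c']
print(called)  # [0, 1, 2, 3, 4]
assert max(called) < len(rows)
```

Because the loop is driven by the document rather than by the callable's return values, a callable that skips every trailing row simply runs out of rows to inspect.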
### Installed Versions
<details>
❯ ls bug.py | entr -c python bug.py
/Users/bram/Code/nca/pandas/pandas/compat/_optional.py:149: UserWarning: Pandas requires version '1.3.1' or newer of 'bottleneck' (version '1.2.1' currently installed).
warnings.warn(msg, UserWarning)
INSTALLED VERSIONS
------------------
commit : 04f5721617a622912f493f42f160139e073d0c2f
python : 3.9.9.final.0
python-bits : 64
OS : Darwin
OS-release : 21.2.0
Version : Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.5.0.dev0+130.g04f5721617
numpy : 1.22.1
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.1.0
Cython : 0.29.26
pytest : 6.2.5
hypothesis : 6.36.0
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.2.1
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
</details>
author:
url:https://github.com/bram2000
type:Person
name:bram2000
datePublished:2022-01-24T10:34:58.000Z
interactionStatistic:
type:InteractionCounter
interactionType:https://schema.org/CommentAction
userInteractionCount:0
url:https://github.com/pandas-dev/pandas/issues/45585