
Title:
BUG: Large number of PerformanceWarnings when attempting to read .xpt file · Issue #48595 · pandas-dev/pandas
Schema
DiscussionForumPosting:
context:https://schema.org
headline:BUG: Large number of PerformanceWarnings when attempting to read .xpt file
articleBody:### Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pandas.pydata.org/docs/whatsnew/index.html) of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
### Reproducible Example
```python
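import pandas as pd

# DR1TOT_J.XPT must already be downloaded to the working directory (URL in the description below).
diet = pd.read_sas('DR1TOT_J.XPT', index='SEQN')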
```
### Issue Description
Hi there,
When attempting to read in a `.XPT` file downloaded from the CDC website using `pandas.read_sas`, a large number of PerformanceWarnings are printed. The file that triggers the warnings is: https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DR1TOT_J.XPT
```
<stdin>:1: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
```
The above warning is repeated 68 times. A different file that doesn't produce any warnings is: https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DEMO_J.XPT
I'm not familiar with the inner workings of `read_sas` but I can only assume that it's creating the DataFrame in a way that is causing another part of pandas to complain.
Thanks a lot!
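For anyone unfamiliar with the warning, here is a minimal sketch of the pattern its text describes. It does not touch `read_sas` or the CDC file, and it is not a claim about what `read_sas` does internally. The frame sizes and the names `fragmented`, `joined`, `n_rows` and `n_cols` are made up for illustration. It adds columns one at a time while recording warnings, then builds the same frame with a single `pd.concat(axis=1)` as the warning message suggests.

```python
import warnings

import numpy as np
import pandas as pd

n_rows, n_cols = 10, 200  # hypothetical sizes, unrelated to DR1TOT_J.XPT

# Pattern the warning complains about: inserting columns one at a time.
with warnings.catch_warnings(record=True) as caught_insert:
    warnings.simplefilter("always")
    fragmented = pd.DataFrame(index=range(n_rows))
    for i in range(n_cols):
        fragmented[f"col{i}"] = np.zeros(n_rows)

n_perf_insert = sum(
    issubclass(w.category, pd.errors.PerformanceWarning) for w in caught_insert
)
print(f"PerformanceWarnings from column-by-column inserts: {n_perf_insert}")

# Alternative suggested by the warning text: collect the columns, join once.
with warnings.catch_warnings(record=True) as caught_concat:
    warnings.simplefilter("always")
    pieces = [pd.Series(np.zeros(n_rows), name=f"col{i}") for i in range(n_cols)]
    joined = pd.concat(pieces, axis=1)

n_perf_concat = sum(
    issubclass(w.category, pd.errors.PerformanceWarning) for w in caught_concat
)
print(f"PerformanceWarnings from a single concat: {n_perf_concat}")
```

If `read_sas` builds its result in a way that resembles the first loop, that would be consistent with the repeated warnings, but that is only my guess.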
### Expected Behavior
The expected behavior is for the file to be read in without a large number of warnings being printed. :)
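Until this is addressed in pandas, one possible stopgap on the user side is to silence only this warning category around the call. This is a sketch, not a fix for the underlying behavior. The helper name `quiet_performance_warnings` is hypothetical and not part of pandas. It relies only on the standard `warnings` module and `pd.errors.PerformanceWarning`.

```python
import warnings

import pandas as pd


def quiet_performance_warnings(func, *args, **kwargs):
    """Call func(*args, **kwargs) with pandas PerformanceWarning silenced.

    Hypothetical helper for illustration. Other warning categories
    still propagate as usual.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
        return func(*args, **kwargs)


# Intended use with the file from this report (needs the local download):
# diet = quiet_performance_warnings(pd.read_sas, "DR1TOT_J.XPT", index="SEQN")
# diet = diet.copy()  # the warning text suggests copy() for a de-fragmented frame
```

This only hides the messages. The frame is still built the same way, so the `copy()` suggested by the warning text may still be worth calling afterwards.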
### Installed Versions
<details>
INSTALLED VERSIONS
------------------
commit : ca60aab7340d9989d9428e11a51467658190bb6b
python : 3.9.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.4
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.2.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : 2022.8.2
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.3
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.41
tables : 3.6.1
tabulate : 0.8.10
xarray : 2022.6.0
xlrd : None
xlwt : None
zstandard : None
</details>
author:
url:https://github.com/leeping
type:Person
name:leeping
datePublished:2022-09-16T23:48:40.000Z
interactionStatistic:
type:InteractionCounter
interactionType:https://schema.org/CommentAction
userInteractionCount:1
url:https://github.com/pandas-dev/pandas/issues/48595