
GITHUB.COM
Detected CMS Systems:
- WordPress (2 occurrences)
Title:
the fileSizeInBytes of orc and parquet are inconsistent · Issue #1666 · apache/iceberg
Description:
I found that the value stored in the variable fileSizeInBytes of DataFile is inconsistent between the ORC and Parquet formats. The ORC format stores the deserialized data size, while Parquet stores the file size. This will cause a problem. In R...
Website Age:
17 years and 8 months (reg. 2007-10-09).
Matching Content Categories
- Video & Online Content
- Social Networks
- Technology & Computing
Content Management System
What CMS is github.com built with?
Github.com operates using WordPress.
Traffic Estimate
What is the average monthly size of github.com's audience?
Tremendous Traffic: 10M to 20M visitors per month
Based on our best estimate, this website will receive around 10,653,780 visitors this month.
How Does Github.com Make Money?
Subscription Packages
We've located a dedicated page on github.com that might include details about subscription plans or recurring payments. We identified it based on the word "pricing" in one of its internal links. Below are additional estimates of its monthly recurring revenue.
How Much Does Github.com Make?
Subscription Packages
Prices on github.com are in US Dollars ($).
They range from $4.00/month to $21.00/month.
We estimate that the site has approximately 5,316,107 paying customers.
The estimated monthly recurring revenue (MRR) is $22,327,651.
The estimated annual recurring revenue (ARR) is $267,931,808.
WordPress Themes and Plugins
What WordPress theme does this site use?
Strangely, we were not able to detect any theme on the page.
What WordPress plugins does this website use?
Strangely, we were not able to detect any plugins on the page.
Keywords
size, orc, data, parquet, file, format, deserialized, sign, filesizeinbytes, datafile, method, length, projects, inconsistent, issue, zhangjunx, found, read, rdblue, return, raw, footer, navigation, source, code, pull, requests, actions, security, closed, stores, rewritedatafilesaction, meets, expectations, commented, contributor, shardulm, getrawdatasize, writer, hdfs, github, labels, type, milestone, skip, content, menu, product, solutions, resources
Topics
source code raw data size deserialized data size affect scan planning comment metadata assignees return file size file footer orc format stores type projects projects milestone parquet format meets file size orc data shardulm94 mentioned orc format parquet stores parquet format rewrite action fsck command assigned labels labels type milestone relationships personal information hdfs file getrawdatasize github length obtained orc writer variable filesizeinbytes generated datafile size inconsistent meets orc parquet method orcfileappender writer hdfs length sign filesizeinbytes datafile skip jump found stored problem rewritedatafilesaction default
Payment Methods
- Braintree
Questions
- @shardulm94, can you take a look at the ORC file size metric?
Schema
DiscussionForumPosting:
context:https://schema.org
headline:the fileSizeInBytes of orc and parquet are inconsistent
articleBody:I found that the value stored in the variable fileSizeInBytes of DataFile is inconsistent between the ORC and Parquet formats. The ORC format stores the deserialized data size, while Parquet stores the file size.
This causes a problem. In RewriteDataFilesAction, the default value of targetSizeInBytes is 128M. If the format is ORC, the data file produced by the rewrite action is only 10M, because RewriteDataFilesAction reads the ORC data according to the deserialized data size rather than the file size, so the newly generated data file never reaches 128M.
The Parquet format behaves normally and meets my expectations.
author:
url:https://github.com/zhangjun0x01
type:Person
name:zhangjun0x01
datePublished:2020-10-27T05:51:31.000Z
interactionStatistic:
type:InteractionCounter
interactionType:https://schema.org/CommentAction
userInteractionCount:2
url:https://github.com/apache/iceberg/issues/1666
Person:
url:https://github.com/zhangjun0x01
name:zhangjun0x01
InteractionCounter:
interactionType:https://schema.org/CommentAction
userInteractionCount:2
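The issue quoted in the schema above argues that sizing rewrite tasks by ORC's deserialized (raw) data size, instead of the on-disk file size, leaves rewritten files far below the 128M target. The sketch below illustrates that arithmetic with a simplified greedy bin-packing planner. It is not Iceberg's actual RewriteDataFilesAction logic: the functions `plan_groups` and `output_sizes`, the 64 input files of 2 MB each, and the assumed 10:1 raw-to-on-disk ratio are all hypothetical, chosen only to show the effect.

```python
MB = 1024 * 1024
TARGET_SIZE_BYTES = 128 * MB  # default targetSizeInBytes cited in the issue


def plan_groups(sizes, target):
    """Greedily pack files (by index) into groups whose reported sizes sum to <= target.

    Hypothetical, simplified planner for illustration; not Iceberg's implementation.
    """
    groups, current, current_total = [], [], 0
    for index, size in enumerate(sizes):
        # Close the current group once adding this file would overshoot the target.
        if current and current_total + size > target:
            groups.append(current)
            current, current_total = [], 0
        current.append(index)
        current_total += size
    if current:
        groups.append(current)
    return groups


def output_sizes(groups, on_disk_sizes):
    """Approximate each rewritten file's on-disk size as the sum of its inputs' on-disk sizes.

    Assumption: rewriting does not change the compression ratio.
    """
    return [sum(on_disk_sizes[i] for i in group) for group in groups]


# Hypothetical input: 64 ORC files of 2 MB each on disk, assumed to be
# 10x larger once deserialized (the ratio is illustrative only).
on_disk = [2 * MB] * 64
raw = [size * 10 for size in on_disk]

# Planning by on-disk size (what the issue says Parquet reports).
by_file_size = output_sizes(plan_groups(on_disk, TARGET_SIZE_BYTES), on_disk)
# Planning by deserialized size (what the issue says ORC reports).
by_raw_size = output_sizes(plan_groups(raw, TARGET_SIZE_BYTES), on_disk)

print([s // MB for s in by_file_size])  # [128]
print([s // MB for s in by_raw_size])   # [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 8]
```

Under these assumptions, planning by raw size produces eleven files of 8 to 12 MB, the same order as the 10M files reported in the issue, while planning by on-disk size fills a single file to the 128 MB target. The topic keywords (getrawdatasize, hdfs, length) suggest the discussion compared ORC's raw data size with the file length on HDFS; reporting the on-disk length for both formats would put ORC in the first case.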
External Links (2)
Analytics and Tracking
- Site Verification - Google
Libraries
- Clipboard.js
- D3.js
- Lodash
Emails and Hosting
Mail Servers:
- aspmx.l.google.com
- alt1.aspmx.l.google.com
- alt2.aspmx.l.google.com
- alt3.aspmx.l.google.com
- alt4.aspmx.l.google.com
Name Servers:
- dns1.p08.nsone.net
- dns2.p08.nsone.net
- dns3.p08.nsone.net
- dns4.p08.nsone.net
- ns-1283.awsdns-32.org
- ns-1707.awsdns-21.co.uk
- ns-421.awsdns-52.com
- ns-520.awsdns-01.net