GITHUB . COM {}

Detected CMS Systems:

Wordpress (2 occurrences)


We are analyzing https://github.com/w3c/epub-specs/issues/1888.

Title:
What base URLs to use for URL parsing in EPUB? · Issue #1888 · w3c/epub-specs
Description:
(or: The Big Mystery of Spooky EPUB Relative URLs 👻🎃) TL;DR: EPUB 3.3 now normatively references the URL Standard. But URL parsing is ambiguous in some cases, because base URLs are not clearly defined. ⚠️ EPUB references in the problem s...
Website Age:
17 years and 9 months (reg. 2007-10-09).

Matching Content Categories {📚}

Technology & Computing
Telecommunications
Video & Online Content

Content Management System {📝}

What CMS is github.com built with?

Github.com employs WORDPRESS.

Traffic Estimate {📈}

What is the average monthly size of github.com audience?

🚀🌠 Tremendous Traffic: 10M - 20M visitors per month

Based on our best estimate, this website will receive around 10,000,019 visitors in the current month.

How Does Github.com Make Money? {💸}

Subscription Packages {💳}

We've located a dedicated page on github.com that might include details about subscription plans or recurring payments. We identified it based on the word pricing in one of its internal links. Below, you'll find additional estimates for its monthly recurring revenues.

How Much Does Github.com Make? {💰}

Subscription Packages {💳}

Prices on github.com are in US Dollars ($). They range from $4.00/month to $21.00/month.
We estimate that the site has approximately 5,013,426 paying customers.
The estimated monthly recurring revenue (MRR) is $21,056,387.
The estimated annual recurring revenue (ARR) is $252,676,649.

Wordpress Themes and Plugins {🎨}

What WordPress theme does this site use?

It is strange but we were not able to detect any theme on the page.

What WordPress plugins does this website use?

It is strange but we were not able to detect any plugins on the page.

Keywords {🔍}

url, epub, urls, directory, base, rdeltour, package, root, unique, relative, document, strings, container, string, parsing, defined, file, commented, scheme, resources, parsed, secret, solution, zip, member, author, app, resource, examples, docxhtml, standard, host, unambiguous, publication, issue, description, current, documents, instance, contained, originsafe, features, open, problem, fragment, resulting, system, path, define, possibly,

Topics {✒️}

title=thetitle&isbn=theisbn&epubcfi=/6/4[chapter01] navigation document identifies org/acme-publishing/mobydick path-absolute url strings path-relative url strings meta-inf directory epubcfi fragment means ocf abstract container html-encoded title org/secret features unambiguous security epub-specific url scheme org/acme/tomsawyer epub ocf container type projects projects milestone relative url strings fork org/acme/mobydick 3 revision spec-epub3 forge url strings fragment identifiers linking comment metadata assignees unique port description increasingly specific urls de-facto standard open container format fragment identifier part privacy vulnerability current situation description org/acme/nav considered good practice reserved special url chosen special url relative url string org/video/cat query form imho relative url references p5music edits base url description unique instance id space possibly owned legit relative url url string representing parsing relative urls current specification leaves issues rdeltour mentioned org/epub/package //1234/secret features unambiguous

Payment Methods {📊}

Braintree

Questions {❓}

Already have an account?
Does it make sense?
I understand this is similar to solution 5 define above?
"IRI of the Package Document": what is this exactly?
Is that approach (using a port number as the way to differentiate an EPUB instance) OK or is it not considered good practice?
Maybe examples would help?
What base URLs to use for URL parsing in EPUB?
What if the RS doesn't unzip the root but only a subdirectory?
What if we replace the fragment identifier part with the ZIP path (…) am I missing another problem?
What is the URL of the root directory?
Would that still require us to get into schemes?

Schema {🗺️}

DiscussionForumPosting:
      context:https://schema.org
      headline:What base URLs to use for URL parsing in EPUB?
      articleBody:(or: The Big Mystery of Spooky EPUB Relative URLs 👻🎃)

**TL;DR**: EPUB 3.3 now normatively references the [URL Standard](https://url.spec.whatwg.org). But URL parsing is ambiguous in some cases, because base URLs are not clearly defined.

⚠️ EPUB references in the problem statement below point to a dated version of the EPUB 3.3 working draft (Oct 29, 2021). Do not copy out of context! 😉

## Current situation

In an EPUB, files reference each other via relative URL strings (see [Relative URLs, in Open Container Format](https://www.w3.org/TR/2021/WD-epub-33-20211029/#sec-container-iri)).

In the URL standard, to parse a relative URL string into URL records, the [URL parser](https://url.spec.whatwg.org/#concept-url-parser) needs a base URL. The base URL used to parse a URL string is defined by host languages (like in [CSS](https://www.w3.org/TR/css-values-4/#relative-urls), or [HTML](https://html.spec.whatwg.org/multipage/urls-and-fetching.html#resolving-urls)). Typically, it is the URL of the document containing the URL string.

EPUB defines what base URL to use for URL parsing in two cases:

- relative URL strings found in documents located in the `META-INF` directory
- relative URL strings in the Package Documents

### Parsing a URL in documents located in the `META-INF` directory

For documents in the `META-INF` directory, URL strings must be parsed using the root directory as the base URL (see [Relative URLs, in Open Container Format](https://www.w3.org/TR/2021/WD-epub-33-20211029/#sec-container-iri)).

The problem is that [*Root Directory*](https://www.w3.org/TR/2021/WD-epub-33-20211029/#dfn-root-directory) is not defined as a URL, but quite abstractly as "the base of the OCF Abstract Container". The spec also says the root directory is "virtual in nature". In fact, RS may or may not generate a physical directory for the root directory (see [OCF ZIP Container RS processing](https://www.w3.org/TR/2021/WD-epub-rs-33-20211029/#confreq-zip-rootdir)).
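
To make the ambiguity concrete, here is a minimal sketch of how the same relative URL string resolves against different guesses for the root directory's URL. Everything in it is an assumption for illustration: the relative string `EPUB/package.opf`, the three candidate base URLs, and the `reader.example` host are hypothetical, and Python's `urllib.parse.urljoin` follows RFC 3986 rather than the URL Standard (the two should agree on inputs this simple).

```python
from urllib.parse import urljoin

# Hypothetical relative URL string, as might appear in a META-INF document.
RELATIVE = "EPUB/package.opf"

# Hypothetical candidates for "the URL of the root directory".
# None of these is mandated by the spec; that is the point.
CANDIDATE_BASES = {
    "zip file URL": "https://example.org/acme/mobydick.epub",
    "zip file URL + '/'": "https://example.org/acme/mobydick.epub/",
    "extracted directory": "https://reader.example/unzipped/mobydick/",
}


def resolve_all(relative, bases):
    """Resolve one relative URL string against every candidate base URL."""
    return {label: urljoin(base, relative) for label, base in bases.items()}


RESOLVED = resolve_all(RELATIVE, CANDIDATE_BASES)

for label, url in RESOLVED.items():
    print(f"{label:22} -> {url}")
```

Each candidate yields a different absolute URL, so two reading systems that pick different bases would disagree on what the same URL string identifies.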
### Parsing a URL in the Package Document

For Package Documents, URL strings must be parsed using the URL of the Package Document as the base URL (see [Parsing Relative URLs, in Package Documents RS processing](https://www.w3.org/TR/2021/WD-epub-rs-33-20211029/#sec-pkg-doc-relative-urls)).

Here again, the URL of the Package Document is not well-defined. But the spec says (in the same section) that for zipped EPUBs, the URL of the package document is obtained "from the URL of the EPUB Container together with a fragment identifier that specifies the path to Package Document (relative to the Root Directory)".

## Problems

### The URL of the container’s root directory is undefined

The current specification leaves many questions unanswered:

* What is the URL of the root directory? Is it the URL of the ZIP file? Of the extracted directory? Is it constructed from the URL of the ZIP file, and if so, how? Or is it up to the RS to define it?
* The RS may generate a physical directory for the container's Root Directory if it unzips the EPUB. What if the RS doesn't unzip the root but only a subdirectory? What if the EPUB is not unzipped as a whole (but streamed on demand)?

### The current way to obtain the URL of the Package Document is flawed

Parsing a relative URL in the Package Document always results in a URL of a resource outside the container.

#### Examples:

For instance, for an EPUB `mobydick.epub` located at `https://example.org/acme-publishing/mobydick.epub`, the URL of the Package Document would be something like `https://example.org/acme-publishing/mobydick.epub#path=/EPUB/package.opf`.
So this is how a few relative URL string examples are parsed:

| # | URL string | Base URL | Resulting URL |
| --- | --- | --- | --- |
| 1 | `nav.xhtml` | `https://example.org/acme/mobydick.epub#path=/EPUB/package.opf` | `https://example.org/acme/nav.xhtml` |
| 2 | `nav.xhtml` | `https://example.org/acme/tomsawyer.epub#package-doc=/EPUB/package.opf` | `https://example.org/acme/nav.xhtml` |
| 3 | `../video/cat.mp4` | `https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf` | `https://example.org/video/cat.mp4` |
| 4 | `/secret` | `https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf` | `https://example.org/secret` |
| 5 | `../../../secret` | `https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf` | `https://example.org/secret` |

* example 1 shows that the parsed URL of a navigation document identifies a (possibly existing) resource outside the EPUB.
* example 1 and 2 show that the URLs of two documents from two different EPUBs are parsed into the same URL.
* example 3 shows that a legit relative URL of an in-container video resource is parsed as a URL that:
  * may conflict with the URL of another legit remote resource (remote resources are allowed for video content).
  * leaks outside the container, and points to a space possibly owned by another publisher.
* example 4 and 5 show that it is very easy to forge URL strings that are parsed to arbitrary files on a server or file system. This is true not only for path-absolute URL strings like 4, but also for path-relative URL strings like 5.
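
The five examples can be reproduced with a short script. This is a sketch, not a reference implementation: it uses Python's `urllib.parse.urljoin`, which implements RFC 3986 resolution rather than the URL Standard's parser, on the assumption that both give the same result for these inputs. The `resolve` helper name is ours.

```python
from urllib.parse import urljoin

# (URL string, base URL) pairs taken from the examples in the issue.
EXAMPLES = [
    ("nav.xhtml", "https://example.org/acme/mobydick.epub#path=/EPUB/package.opf"),
    ("nav.xhtml", "https://example.org/acme/tomsawyer.epub#package-doc=/EPUB/package.opf"),
    ("../video/cat.mp4", "https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf"),
    ("/secret", "https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf"),
    ("../../../secret", "https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf"),
]


def resolve(url_string, base):
    """Resolve a relative URL string against a base URL.

    The base's fragment plays no part in resolution, so the path to the
    Package Document that it carries is ignored.
    """
    return urljoin(base, url_string)


RESULTS = [resolve(url_string, base) for url_string, base in EXAMPLES]

for number, result in enumerate(RESULTS, start=1):
    print(number, result)
```

Because the path to the Package Document lives only in the fragment, resolution is driven entirely by the path of the ZIP file's URL, which is why every result lands outside the container.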
#### To summarize:

* the current way Package Document URLs are defined is flawed (potential conflicts between 2 legit URL strings)
* the current way Package Document URLs are defined is possibly a security or privacy vulnerability

## Possible Solutions

The ideal solution would ensure that parsed URLs are:

* **unambiguous**: the results of parsing two URL strings should not be two identical URLs for one processor and two different URLs for another processor.
  * **Why?** Because otherwise it is impossible to tell if an EPUB is conforming (it may be for one processor and not for another).
* **contained**: the result of parsing a relative URL string should not be the URL of a resource outside of the container. At least, a URL string representing a legit in-container resource should not be parsed to a URL of a remote resource.
  * **Why?** To avoid conflicts between publication resources and remote resources. To avoid possible vulnerabilities.
* **unique**: the result of parsing two relative URL strings from two different EPUBs should not be two identical URLs.
  * **Why?** To avoid conflicts within a RS implementation (to be confirmed).
* **origin-safe**: the URLs parsed from two relative URL strings from two different EPUB instances should not be [same-origin](https://html.spec.whatwg.org/multipage/origin.html#same-origin). If possible, the URLs parsed from two relative URL strings in the same EPUB should be same-origin.
  * **Why?** Resources within the same publication share the same trusted authority; resources within different publications (or copies of the same publication) do not.

**Note:** the ideal solution might not exist, or might not be practical to use, to implement, or to specify. But the goals listed above may help us evaluate _a_ solution.

Possible solutions will be listed below as individual comments, for easier referencing in the discussion.

Comments and ideas welcome! 😊 I may have missed important things…
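
Three of these goals can be turned into rough, checkable predicates. The sketch below rests on several assumptions: the helper names (`origin`, `same_origin`, `is_contained`) are hypothetical, the origin is simplified to a (scheme, host, port) tuple rather than the full HTML Standard definition, the container root is a stand-in built from the ZIP URL plus `/`, and resolution uses Python's RFC 3986 `urljoin` instead of the URL Standard's parser.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Simplified origin: (scheme, host, port), with default ports filled in."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)


def is_contained(resolved_url, root_url):
    """Prefix check; root_url is expected to end with '/'."""
    return resolved_url.startswith(root_url)


# Package Document URLs of two different EPUBs, as in the issue's examples.
MOBY = "https://example.org/acme/mobydick.epub#package-doc=/EPUB/package.opf"
TOM = "https://example.org/acme/tomsawyer.epub#package-doc=/EPUB/package.opf"
# Assumption: use the ZIP URL plus "/" as a stand-in container root.
MOBY_ROOT = "https://example.org/acme/mobydick.epub/"

moby_nav = urljoin(MOBY, "nav.xhtml")
tom_nav = urljoin(TOM, "nav.xhtml")

CHECKS = {
    "contained": is_contained(moby_nav, MOBY_ROOT),
    "unique": moby_nav != tom_nav,
    "origin-safe": not same_origin(moby_nav, tom_nav),
}

for goal, passed in CHECKS.items():
    print(f"{goal:12} {'ok' if passed else 'FAILS'}")
```

Under these assumptions the fragment-based Package Document URL fails all three checks. The "unambiguous" goal is not covered here, since it compares the behaviour of different processors rather than properties of a single result.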
      author:
         url:https://github.com/rdeltour
         type:Person
         name:rdeltour
      datePublished:2021-10-30T01:07:31.000Z
      interactionStatistic:
         type:InteractionCounter
         interactionType:https://schema.org/CommentAction
         userInteractionCount:48
      url:https://github.com/w3c/epub-specs/issues/1888
Person:
      url:https://github.com/rdeltour
      name:rdeltour
InteractionCounter:
      interactionType:https://schema.org/CommentAction
      userInteractionCount:48

External Links {🔗}(14)

Analytics and Tracking {📊}

Site Verification - Google

Libraries {📚}

Clipboard.js
D3.js
GSAP
Lodash

Emails and Hosting {✉️}

Mail Servers:

aspmx.l.google.com
alt1.aspmx.l.google.com
alt2.aspmx.l.google.com
alt3.aspmx.l.google.com
alt4.aspmx.l.google.com

Name Servers:

dns1.p08.nsone.net
dns2.p08.nsone.net
dns3.p08.nsone.net
dns4.p08.nsone.net
ns-1283.awsdns-32.org
ns-1707.awsdns-21.co.uk
ns-421.awsdns-52.com
ns-520.awsdns-01.net
