For publishers: Bookworm and the ePub standard

About ePub

ePub is an open format of the International Digital Publishing Forum (IDPF). It is based on XML and XHTML, re-using existing standards for digital books.

The technical details and specification are available from http://openebook.org/.

There's a lot of good, up-to-date information about where to buy ePubs and which devices and software support the format at ePub Books.

Public domain ePub logos are available from threepress.org.

Table of contents

  1. XHTML support
  2. NCX Table of Contents
  3. Conformance
  4. Images and graphics
  5. DTBook
  6. Fonts
  7. Invalid ePub handling
  8. Limitations
  9. Other open-source ePub

Resources

This page is meant for developers, publishers and ebook enthusiasts who are interested in the ePub standard and how Bookworm uses it.

Tutorial

A complete tutorial on using the ePub format is available from IBM DeveloperWorks: Build a digital book with EPUB. Examples from Bookworm are included.

A comprehensive list of ePub resources was written by Keith Fahlgren at O'Reilly Media: ePub Resources and Guides.

ePub

XHTML support

Because the most common content format for ePub is XHTML, ePub and the web browser are a natural fit. This allows Bookworm to support advanced layout and markup that standalone reading software often does not, including:

  • Tables
  • Lists
  • Fixed-width fonts

These features are especially important in non-fiction works such as technical titles or scientific papers.

Cascading Style Sheets (CSS) and JavaScript

Bookworm will render any CSS stylesheets that are packaged with an ebook. In order to prevent interference with Bookworm's native CSS, an ePub's included stylesheets are slightly modified to restrict their effects to the book's content area.
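How Bookworm rewrites stylesheets is not described here. As a minimal sketch of the general idea, the function below prefixes each selector with a container selector so that rules only apply inside that container. The name scope_css and the "#book-content" id are hypothetical, and the regular expression only handles plain rule sets: comments and at-rules such as @import or @font-face are not treated specially.

```python
import re

# Hypothetical id of the element that wraps the book's content.
CONTAINER = "#book-content"


def scope_css(css: str, container: str = CONTAINER) -> str:
    """Prefix every selector of each simple rule set with `container`."""

    def scope_rule(match: re.Match) -> str:
        selectors, body = match.group(1), match.group(2)
        scoped = []
        for selector in selectors.split(","):
            selector = selector.strip()
            if not selector:
                continue
            if selector in ("html", "body"):
                # Page-level selectors are mapped onto the container itself.
                scoped.append(container)
            else:
                scoped.append(f"{container} {selector}")
        return ", ".join(scoped) + " {" + body + "}"

    # Matches "selectors { declarations }" with no nested braces.
    return re.sub(r"([^{}]+)\{([^{}]*)\}", scope_rule, css)


scoped = scope_css("p { margin: 0 }\nbody, h1.title { color: red }")
print(scoped)
```

With this approach a rule written for body in the book's stylesheet styles only the content area, not the surrounding page.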

Any external JavaScript files that might be bundled with the ePub are not loaded, and any <script> tags embedded in the content are removed. This is a security feature to prevent malicious scripts from being uploaded to the site.
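A minimal sketch of removing script elements from a content document follows. It assumes the content is well-formed XHTML that an XML parser can read (a document using named HTML entities without a DTD would fail to parse), and the function name strip_scripts is hypothetical rather than Bookworm's own. Text that follows a removed element is kept.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_scripts(xhtml: str) -> str:
    """Return the document with every <script> element removed."""
    # Serialize XHTML elements without an "ns0:" prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    script_tags = {"script", f"{{{XHTML_NS}}}script"}

    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag in script_tags:
                # Preserve the text that follows the removed element.
                if child.tail:
                    if previous is not None:
                        previous.tail = (previous.tail or "") + child.tail
                    else:
                        parent.text = (parent.text or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><script src="bundled.js"/></head>'
    "<body><p>Hello<script>alert(1)</script> world</p></body></html>"
)
print(strip_scripts(page))
```

This covers only the two cases named above (external files referenced by a script element, and embedded script blocks); inline event-handler attributes are outside the scope of the sketch.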

Bookworm includes its own stylesheets specially formatted for the printed page. Try using your browser's "Print Preview" feature while reading an ePub to see how it would appear when printed.

NCX Table of Contents

Expanded navigation

The table of contents metadata may contain entries at levels other than the chapter. It might also include entries at the "part" or sub-chapter level, organized into a nested hierarchy.

Bookworm will expand nested items in the Table of Contents when the user moves through them.

When multiple sub-sections exist, a small arrow will appear next to the sub-section that is being read.
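The nested hierarchy is expressed in the NCX file as navPoint elements inside other navPoint elements. As an illustration, the sketch below reads an NCX document into a nested list of dictionaries; the function names and the sample document are made up for the example and are not taken from Bookworm.

```python
import xml.etree.ElementTree as ET

NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Made-up sample: one "part" containing two chapters.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="part1" playOrder="1">
      <navLabel><text>Part I</text></navLabel>
      <content src="part1.html"/>
      <navPoint id="ch1" playOrder="2">
        <navLabel><text>Chapter 1</text></navLabel>
        <content src="ch1.html"/>
      </navPoint>
      <navPoint id="ch2" playOrder="3">
        <navLabel><text>Chapter 2</text></navLabel>
        <content src="ch2.html"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""


def read_nav_points(parent):
    """Recursively collect the navPoint children of `parent`."""
    items = []
    for point in parent.findall("ncx:navPoint", NCX):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX)
        content = point.find("ncx:content", NCX)
        items.append({
            "label": label.strip(),
            "src": content.get("src") if content is not None else None,
            "children": read_nav_points(point),
        })
    return items


def parse_ncx(ncx_xml):
    """Return the navMap of an NCX document as a nested list."""
    nav_map = ET.fromstring(ncx_xml).find("ncx:navMap", NCX)
    return read_nav_points(nav_map) if nav_map is not None else []


toc = parse_ncx(SAMPLE_NCX)
print(toc[0]["label"], [child["label"] for child in toc[0]["children"]])
```

A reader can then render only the top-level entries and reveal an entry's children when the user reaches it.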

Reading system conformance

In accordance with the ePub specification, Bookworm will ‘open’ the ePub to the first item in the OPF spine whose linear attribute is set to ‘yes’. Counterintuitively, this may mean that in some titles the default page for the ePub will not be the first item in the NCX file.
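In the OPF package file, the linear attribute appears on the itemref elements of the spine and defaults to "yes" when omitted. A minimal sketch of finding the opening document under that reading is shown below; the function name and the sample package are illustrative only, not Bookworm's code.

```python
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}

# Made-up sample: the cover is marked as non-linear.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="ch1" href="ch1.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="cover" linear="no"/>
    <itemref idref="ch1"/>
  </spine>
</package>"""


def first_linear_href(opf_xml):
    """Return the href of the first spine item that is linear, or None."""
    root = ET.fromstring(opf_xml)
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF)
    }
    for itemref in root.findall("opf:spine/opf:itemref", OPF):
        # A missing linear attribute is treated as "yes".
        if itemref.get("linear", "yes") == "yes":
            return hrefs.get(itemref.get("idref"))
    return None


print(first_linear_href(SAMPLE_OPF))
```

In the sample, the cover comes first in the spine but is skipped because it is declared non-linear.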

Bookworm is not yet a truly conforming reading system; see the issues page for the status on several bugs related to ePub conformance issues.

Images and graphics

As a browser-based application, Bookworm supports illustrations and photos in all the raster formats that ePub does: JPEG, GIF and PNG. Scalable Vector Graphics (SVG) images are handled by linking to an external SVG file, as many browsers do not have inline-SVG support via the <img> tag.

DTBook

The OPS specification indicates that valid ePubs may contain content in either XHTML or DTBook.

Bookworm supports DTBook-formatted ePubs by automatically converting their content to XHTML. This is done using XSLT derived from the DAISY pipeline.
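Before any conversion can run, the DTBook documents have to be identified. The sketch below lists them from the OPF manifest, assuming they are declared with the media type application/x-dtbook+xml; the function name and sample package are hypothetical, and the XSLT step itself is not shown.

```python
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}
DTBOOK_TYPE = "application/x-dtbook+xml"  # assumed media type for DTBook items

# Made-up sample manifest with one DTBook and one XHTML document.
SAMPLE_MANIFEST = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="book" href="book.xml" media-type="application/x-dtbook+xml"/>
    <item id="notes" href="notes.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="book"/>
    <itemref idref="notes"/>
  </spine>
</package>"""


def dtbook_hrefs(opf_xml):
    """Return the hrefs of manifest items declared as DTBook."""
    root = ET.fromstring(opf_xml)
    return [
        item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF)
        if item.get("media-type") == DTBOOK_TYPE
    ]


print(dtbook_hrefs(SAMPLE_MANIFEST))
```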

You can always get back to the original DTBook content by downloading the ePub from any book page.

Fonts

Bookworm will support any font declarations made using CSS 1 or CSS 2. In theory it could support embedded fonts as defined by CSS 3 if the browser does, but this has not been tested (samples welcome!).

How Bookworm treats invalid ePub

One of the design goals of Bookworm is that it should render ePub documents which do not completely follow the specification. However, there are many cases in which it will not be able to correctly parse all the parts of the file and will reject it.

Any ePub book which is found to be seriously malformed will be automatically handed off to the threepress.org epubcheck service. The results from epubcheck are reported to the user who is uploading the book. It is hoped that the messages from Bookworm and epubcheck will clearly identify the problem with the ePub file.
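The checks that Bookworm and epubcheck perform are far more thorough than anything shown here. As a rough illustration of the kind of container-level problem that makes a file unreadable, the sketch below tests three basics of the ePub container: the file is a ZIP archive, its first entry is named mimetype and contains application/epub+zip, and META-INF/container.xml is present. The function name is hypothetical.

```python
import io
import zipfile


def basic_epub_problems(data: bytes) -> list:
    """Return a list of container-level problems found in `data`."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a ZIP archive"]

    problems = []
    with archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            problems.append("mimetype is not the first entry")
        # Lenient on purpose: surrounding whitespace is tolerated here.
        elif archive.read("mimetype").strip() != b"application/epub+zip":
            problems.append("unexpected mimetype contents")
        if "META-INF/container.xml" not in names:
            problems.append("missing META-INF/container.xml")
    return problems


# Build a tiny in-memory archive to exercise the checks.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as sample:
    sample.writestr("mimetype", "application/epub+zip")
    sample.writestr("META-INF/container.xml", "<container/>")
print(basic_epub_problems(buffer.getvalue()))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem to the uploader at once.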

Limitations

Bookworm is limited by the features available in the user's browser. Recent versions of Firefox, Safari or Chrome are recommended, but Internet Explorer 6 and 7 are also supported.

Some extremely long-form ebooks, such as single 'page' comics or unbroken novel-length texts, are not appropriate for web-based reading. Document authors are advised to break up long texts into individual XHTML files where appropriate. This is especially important when creating ePub documents for mobile devices, which often have problems with large file sizes. (The Sony Reader cannot read XHTML files larger than 300 KB in size, for example.)
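Authors who want to check a book against a size limit like this can list the uncompressed size of each content document in the archive. The sketch below is one way to do that; the function name is hypothetical, the limit is interpreted as 300 × 1024 bytes, and content documents are recognized only by file extension, all of which are assumptions made for the example.

```python
import io
import zipfile

SIZE_LIMIT = 300 * 1024  # assumed reading of the 300 KB figure, in bytes
CONTENT_SUFFIXES = (".xhtml", ".html", ".htm")


def oversized_documents(data: bytes, limit: int = SIZE_LIMIT) -> list:
    """Return (name, uncompressed size) for content files above `limit`."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith(CONTENT_SUFFIXES)
            and info.file_size > limit
        ]


# Build a small in-memory archive with one oversized document.
book = io.BytesIO()
with zipfile.ZipFile(book, "w", zipfile.ZIP_DEFLATED) as sample:
    sample.writestr("small.xhtml", "<html/>")
    sample.writestr("big.xhtml", "x" * (400 * 1024))
print(oversized_documents(book.getvalue()))
```

The check uses the uncompressed size because that is what a reading device has to load, regardless of how well the file compresses inside the archive.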

Other open-source ePub

There are other open-source readers available that support the ePub format. The most actively developed is FBReader, which is compatible with Windows desktop, Linux (and devices which run it, such as Zaurus and Nokia tablets) and, unofficially, iLiad. A complete list of open-source ePub tools can be found in the O'Reilly Labs ePub Resources and Guides article.