Using Hexadecimal Entities to Display Special Characters, or How A&W broke your ebook

Any type designer worth her salt knows the difference between "this" and “this,” and why an en-dash is fine for 9–5, but when you want a little drama— there’s nothing like an em-dash. But does she know the raison d’être of HTML entities, or why, when Blimpy orders a Papa Burger in the fifth chapter, her file won’t validate?

When to use entities

Using entities in your code gives you access to a world of Unicode characters outside the seven-bit ASCII available when coding plain-text XHTML, including international characters and design characters like smart quotes and em- and en-dashes. Entities also help you make clear what the code might confuse. If you have an ampersand in your text, a browser or ereader may interpret it as code (ironically, as the opening of an entity), and in XHTML a bare ampersand like the one in A&W makes the file invalid. To make it clear when an & is only an &, use the entity &#x26; in your HTML.
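To see why a bare ampersand trips up a strict reader, here is a minimal Python sketch that parses a paragraph the way an XML parser would. It assumes Python's standard `xml.etree.ElementTree` is a fair stand-in for an ereader's XHTML parser. `is_well_formed` is a hypothetical helper name and is not part of any ebook toolchain.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Hypothetical helper: True if the fragment parses as XML, as XHTML must."""
    try:
        ET.fromstring(fragment)  # ElementTree stands in for an ereader's parser here
        return True
    except ET.ParseError:
        return False

bare = "<p>He sauntered up to the cashier at the A&W.</p>"
escaped = "<p>He sauntered up to the cashier at the A&#x26;W.</p>"

print(is_well_formed(bare))     # False: "&W." looks like the start of an unfinished entity
print(is_well_formed(escaped))  # True
print(ET.fromstring(escaped).text)  # the reader still sees a plain "A&W"
```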

If you want unusual white space in your text, sprinkle some &#xa0; in your code (that’s the entity for a non-breaking space). Ereaders will know exactly          what          to          do.
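Ordinary spaces won't produce this effect, because HTML collapses a run of ordinary white space into a single space. Here is a quick Python sketch that generates the markup for the example sentence. `spread` is a hypothetical helper.

```python
NBSP = "&#xa0;"  # hexadecimal entity for U+00A0 NO-BREAK SPACE

def spread(words, gap=10):
    """Hypothetical helper: join words with `gap` non-breaking-space entities."""
    return (NBSP * gap).join(words)

print("<p>Ereaders will know " + spread(["exactly", "what", "to", "do."]) + "</p>")
```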

(We dropped ten non-breaking spaces between each word.)

How to use entities

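As the computer scientist Andy Tanenbaum once said, “The nice thing about standards is that you have so many to choose from.” There are a number of ways to encode characters, including named entities (&amp;), decimal references (&#38;), and hexadecimal references (&#x26;). We’re focusing here on hexadecimal codes as a stable method for ereading devices. Numeric references are built into XML itself, while most named entities (such as &nbsp;) depend on a DTD that an ereader may not load.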
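As a quick check that these are just different spellings of the same character, Python's standard `html.unescape` decodes all three. It follows HTML5 rules, which recognize far more named entities than an XML-based reading system necessarily will. Treat this as an illustration, not a model of any particular ereader.

```python
import html

# Three spellings of the ampersand: named, decimal, and hexadecimal
named = "&amp;"
decimal = "&#38;"
hexadecimal = "&#x26;"

for ref in (named, decimal, hexadecimal):
    print(ref, "->", html.unescape(ref))  # every line ends with "-> &"
```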

Hexadecimal entities take the form &#x__;, replacing the underscores with the hexadecimal number (the code point) that Unicode assigns to the character you’re after. Using them is a piece of cake: just drop the correct entity into your HTML code where you want the character to appear in the text. The code for an ampersand is &#x26;, so the sentence “He sauntered up to the cashier at the A&W.” looks like this in your HTML file:

He sauntered up to the cashier at the A&#x26;W.
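If you have a lot of text to convert, a short script can do the substitution for you. This is a minimal sketch, and `to_hex_entities` is a hypothetical name. It assumes the input is plain text headed for an element's content, not markup, because it would also escape the angle brackets of any tags. It swaps `&`, `<`, `>` and every non-ASCII character for its `&#x__;` form. Python's `ord` supplies the code point, and the `x` format spec writes it in hexadecimal.

```python
def to_hex_entities(text: str) -> str:
    """Hypothetical helper: replace &, <, > and every non-ASCII character
    with a hexadecimal entity of the form &#x__;."""
    out = []
    for ch in text:
        if ch in "&<>" or ord(ch) > 127:
            out.append(f"&#x{ord(ch):x};")  # e.g. "&" (U+0026) becomes "&#x26;"
        else:
            out.append(ch)  # plain ASCII passes through unchanged
    return "".join(out)

print(to_hex_entities("He sauntered up to the cashier at the A&W."))
# He sauntered up to the cashier at the A&#x26;W.
```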

Rather than defaulting to hyphens and dumb quotes, we’ve used entities throughout this article. Feel free to have a look at the source code to see entities in action.

A handy guide to entities for Canadian books

For reference, we’ve pulled together a chart of the Unicode entities we think you’ll use most often in your books.

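‘ &#x2018;     ’ &#x2019;
“ &#x201c;     ” &#x201d;
– &#x2013;     — &#x2014;
© &#xa9;       & &#x26;
« &#xab;       » &#xbb;
À &#xc0;       Œ &#x152;      ê &#xea;
Â &#xc2;       Ù &#xd9;       ë &#xeb;
Ä &#xc4;       Û &#xdb;       î &#xee;
Ç &#xc7;       Ü &#xdc;       ï &#xef;
È &#xc8;       Ÿ &#x178;      ô &#xf4;
É &#xc9;       à &#xe0;       œ &#x153;
Ê &#xca;       â &#xe2;       ù &#xf9;
Ë &#xcb;       ä &#xe4;       û &#xfb;
Î &#xce;       ç &#xe7;       ü &#xfc;
Ï &#xcf;       è &#xe8;       ÿ &#xff;
Ô &#xd4;       é &#xe9;
Non-breaking space &#xa0;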
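If you'd like to double-check the chart or extend it with other characters, Python's standard `unicodedata` module can regenerate it along with each character's official name. Here is a sketch, with `chart_rows` as a hypothetical helper.

```python
import unicodedata

# The characters from the chart above (the last one is U+00A0, the non-breaking space)
CHARS = "‘’“”–—©&«»ÀÂÄÇÈÉÊËÎÏÔŒÙÛÜŸàâäçèéêëîïôœùûüÿ\u00a0"

def chart_rows(chars):
    """Hypothetical helper: (character, hex entity, Unicode name) for each character."""
    return [(ch, f"&#x{ord(ch):x};", unicodedata.name(ch)) for ch in chars]

for ch, entity, name in chart_rows(CHARS):
    print(f"{ch}\t{entity}\t{name}")
```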

Find a complete listing of Unicode characters, and their hexadecimal numbers, at http://unicode.org/charts.
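When you know a character's official Unicode name (as listed in those charts) but not its number, Python's `unicodedata.lookup` will find it. Here is a minimal sketch, with `entity_for` as a hypothetical helper.

```python
import unicodedata

def entity_for(name: str) -> str:
    """Hypothetical helper: look up a character by its Unicode name
    and return its hexadecimal entity."""
    ch = unicodedata.lookup(name)  # raises KeyError if the name is unknown
    return f"&#x{ord(ch):x};"

print(entity_for("LATIN CAPITAL LETTER C WITH CEDILLA"))  # &#xc7;
print(entity_for("EM DASH"))                              # &#x2014;
```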

Don’t forget to test everywhere

The entities above are probably pretty safe bets, but remember: just because a character has a Unicode number doesn’t mean every ereader will display it. Which characters to support is left to the discretion of device manufacturers, so if you’re using a character you’re uncertain of, test your ebook on every device you can get your hands on.
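Before you start testing, it helps to know exactly which unusual characters your book uses. This sketch decodes entities with Python's `html.unescape` and lists every non-ASCII character that isn't in a set you've already checked. The `TESTED` set and the `untested_characters` helper are hypothetical. Fill the set with whatever your own devices have actually passed.

```python
import html

# Hypothetical: characters you have already confirmed on your own devices
TESTED = set("‘’“”–—©«»àâçéèêëîïôùûü\u00a0")

def untested_characters(source: str, tested: set) -> list:
    """Decode entities in XHTML source, then return the hex entities of any
    non-ASCII characters not in `tested`, ordered by code point."""
    text = html.unescape(source)
    found = {ch for ch in text if ord(ch) > 127 and ch not in tested}
    return [f"&#x{ord(ch):x};" for ch in sorted(found)]

sample = "<p>Ceci n&#x2019;est pas une pipe &#x2766; &#x25a0;</p>"
print(untested_characters(sample, TESTED))  # ['&#x25a0;', '&#x2766;']
```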

How well does your browser display these characters?

Geometric Shapes
■▰◀◐◠◰ □▱◁◑◡◱ ▢▲◂◒◢◲ ▣△◃◓◣◳ ▤▴◄◔◤◴ ▥▵◅◕◥◵ ▦▶◆◖◦◶ ▧▷◇◗◧◷ ▨▸◈◘◨◸ ▩▹◉◙◩◹ ▪►◊◚◪◺ ▫▻○◛◫◻ ▬▼◌◜◬◼ ▭▽◍◝◭◽ ▮▾◎◞◮◾ ▯▿●◟◯◿

Thai Script
ฐภะเ๐ กฑมัแ๑ ขฒยาโ๒ ฃณรำใ๓ คดฤิไ๔ ฅตลีๅ๕ ฆถฦึๆ๖ งทวื็๗ จธศุ่๘ ฉนษู้๙ ชบสฺ๊๚ ซปห๋๛ ฌผฬ์ ญฝอํ ฎพฮ๎ ฏฟฯ฿๏

Dingbats
✐✠✰❀❐❠❰➀➐➠➰ ✁✑✡✱❁❑❡❱➁➑➡➱ ✂✒✢✲❂❒❢❲➂➒➢➲ ✃✓✣✳❃❓❣❳➃➓➣➳ ✄✔✤✴❄❔❤❴➄➔➤➴ ✅✕✥✵❅❕❥❵➅➕➥➵ ✆✖✦✶❆❖❦❶➆➖➦➶ ✇✗✧✷❇❗❧❷➇➗➧➷ ✈✘✨✸❈❘❨❸➈➘➨➸ ✉✙✩✹❉❙❩❹➉➙➩➹ ✊✚✪✺❊❚❪❺➊➚➪➺ ✋✛✫✻❋❛❫❻➋➛➫➻ ✌✜✬✼❌❜❬❼➌➜➬➼ ✍✝✭✽❍❝❭❽➍➝➭➽ ✎✞✮✾❎❞❮❾➎➞➮➾ ✏✟✯✿❏❟❯❿➏➟➯➿

Want more?

  • For an overview of the basics of HTML and EPUB, check out An Introduction to HTML and CSS for EPUB.
  • It won’t help you much with Unicode entities, but if you’re interested in geeking out on the hexadecimal number system, we recommend How to Count, the first book in the Programming for Mere Mortals series by Steven Frank.