Horns, breves and carons: using non-standard HTML entity codes

I’ve been working on a web site for a Vietnamese restaurant, and the owners want the full Vietnamese names on the menu. The Word document they sent doesn’t translate to the web; I need HTML entity codes for these characters.

The good news is that Vietnamese is basically a Latin alphabet, but with non-standard Latin diacritical marks. For example, typical characters might be:

ữ

ả

The first item is a “small letter u with horn and tilde”. The second is a small letter a with a hook above.

The characters are a part of the Unicode specification, but finding the characters can be difficult. However, there is an excellent web site (not so easy to use, but information-rich): http://www.fileformat.info.
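If you already know a character's official Unicode name, a script can do the lookup for you. Here is a minimal sketch in Python using the standard `unicodedata` module; the helper name `describe` is my own invention, and the names passed in must match the Unicode names exactly.

```python
import unicodedata


def describe(name):
    """Look up a character by its official Unicode name.

    Returns a tuple of (character, code point as U+XXXX,
    decimal HTML reference). `describe` is a hypothetical helper
    for illustration, not part of any library.
    """
    ch = unicodedata.lookup(name)  # raises KeyError if the name is unknown
    cp = ord(ch)                   # code point as an integer
    return ch, f"U+{cp:04X}", f"&#{cp};"


# The two example characters from this post.
u_horn_tilde = describe("LATIN SMALL LETTER U WITH HORN AND TILDE")
a_hook_above = describe("LATIN SMALL LETTER A WITH HOOK ABOVE")
```

The second and third items of each tuple are plain ASCII, so they are safe to paste straight into an HTML file.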

First, you need to declare your character encoding: the character set that the web browser will use to interpret the page. This is not mandatory, but it is good practice, even if your web site is in English.

For Unicode (a good fit for this application, since it covers a very wide set of characters):

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

For a basic English web site, you can use:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
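To see why the choice of charset matters for Vietnamese, here is a rough sketch in Python that checks whether a piece of text can be stored directly in a given encoding. The function name `fits_charset` is hypothetical; it only wraps the built-in `str.encode`.

```python
def fits_charset(text, charset):
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


u_horn_tilde = "\u1eef"  # small u with horn and tilde

in_utf8 = fits_charset(u_horn_tilde, "utf-8")         # True
in_latin1 = fits_charset(u_horn_tilde, "iso-8859-1")  # False
```

The character fits in utf-8 but not in iso-8859-1, which is why a page declared as iso-8859-1 can only show it through an entity code.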

Next, you need to use the HTML entity code for each special character. It is an ampersand and a hash sign, followed by a decimal number and a semicolon, and looks like this:

&#123;

The characters I used for the Vietnamese menu are in the Latin Extended Additional set of the Unicode specification. Any Unicode character can be written as a numeric HTML entity: the number is simply the decimal value of the hexadecimal Unicode code. For example, the Unicode code for the small u with horn and tilde is U+1EEF; hexadecimal 1EEF is 7919 in decimal, so the equivalent HTML entity code is &#7919;.
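Converting a whole menu by hand gets tedious, so the hex-to-decimal step can be scripted. The following is a small sketch in Python, not the method I used for the site; `to_entities` is a name I made up, and the word "Phở" is just a sample input.

```python
import html


def to_entities(text):
    """Replace every non-ASCII character with a decimal HTML entity code.

    ASCII characters pass through unchanged.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


# U+1EEF is hexadecimal; int() with base 16 gives the decimal value.
decimal_value = int("1EEF", 16)  # 7919

encoded = to_entities("Ph\u1edf")  # "Phở" becomes "Ph&#7903;"

# html.unescape from the standard library reverses the conversion.
decoded = html.unescape(encoded)
```

Python's built-in `text.encode("ascii", "xmlcharrefreplace")` produces the same codes as bytes, if you would rather not write the loop yourself.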

The site is coming together nicely, and having the correct language, including the diacriticals, is the icing on the cake.
