Why are HTML character entities necessary?

By | December 11, 2017
Questions:

Why are HTML character entities necessary? What good are they? I don’t see the point.

Answers:

Two main things.

  1. They let you use characters that are not available in the document’s charset. E.g., you can legally use ASCII as the charset and still include arbitrary Unicode characters through entities.
  2. They let you quote characters that HTML gives special meaning to, as Simon noted.
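
As a rough illustration of the first point (a sketch in Python, which the original answer does not use), the standard library can write non-ASCII characters as numeric character references so that the markup itself stays pure ASCII, and `html.unescape` decodes them much as a browser would. The sample string is made up for the example.

```python
import html

text = "café ☃"  # sample text containing two non-ASCII characters

# Encode to ASCII, replacing anything ASCII cannot express with &#NNN; references.
ascii_markup = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_markup)

# Decoding the references recovers the original characters.
decoded = html.unescape(ascii_markup)
print(decoded == text)
```

The markup that goes over the wire contains only ASCII bytes, yet the reader still sees the accented letter and the snowman.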
Answers:

“1 &lt; 2” lets you put “1 < 2” in your page.

Long answer:

Since HTML uses ‘<’ to open tags, you can’t just type ‘<’ if you want that as text. Therefore, you have to have a way to say “I want the text < in my page”. Whoever designed HTML (or, actually, SGML, HTML’s predecessor) decided to use ‘&something;’, so you can also put in things like a non-breaking space, ‘&nbsp;’ (a space that is not collapsed and does not allow a line break). Of course, now you need a way to say ‘&’, so you get ‘&amp;’…
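
The escaping chain described above can be sketched with Python’s standard `html` module (the choice of Python and the sample strings are assumptions for illustration, not part of the original answer). It shows that ‘<’ becomes ‘&lt;’, and that ‘&’ itself has to become ‘&amp;’ so that text which merely looks like an entity survives intact.

```python
import html

raw = '1 < 2 & "quotes"'
escaped = html.escape(raw)  # escapes &, <, > and, by default, quote characters
print(escaped)

# The ampersand must be escaped too; otherwise a literal "&lt;" typed by the
# author would be decoded back into "<" by the browser.
literal = html.escape("&lt;")
print(literal)

# Unescaping is the inverse operation, so the round trip is lossless.
print(html.unescape(escaped) == raw)
```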

Answers:

They aren’t, apart from &amp;, &lt;, &gt;, &quot; and probably &nbsp;. For all other characters, just use UTF-8.

Answers:

In SGML and XML they aren’t just for characters. They are a generic inclusion mechanism, and their use for special characters is just one of many cases.

<!ENTITY signature "<hr/><p>Regards, <i>&myname;</i></p>">
<!ENTITY myname "John Doe">

This kind of entity is not useful for websites, because it works only in XML mode, and you can’t use an external DTD file without enabling “validating” parsing mode in the browser configuration.
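
To see the inclusion mechanism at work outside a browser, here is a minimal sketch using Python’s `xml.etree.ElementTree`, assuming the two declarations above are placed in an internal DTD subset. The `letter` and `body` element names are hypothetical, invented only to give the entities somewhere to live.

```python
import xml.etree.ElementTree as ET

# The entity declarations from the answer, wrapped in a made-up document.
doc = """<!DOCTYPE letter [
<!ENTITY signature "<hr/><p>Regards, <i>&myname;</i></p>">
<!ENTITY myname "John Doe">
]>
<letter><body>Hello.</body>&signature;</letter>"""

root = ET.fromstring(doc)

# &signature; is replaced by real markup, not by text.
child_tags = [child.tag for child in root]
print(child_tags)

# &myname; inside the signature is expanded as well.
signed_by = root.find("p/i").text
print(signed_by)
```

The parser inserts the `hr` and `p` elements as if they had been typed in place, which is what makes entities an inclusion mechanism rather than a character-escaping trick.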


Entities can be expanded recursively. This allows XML to be used for a denial-of-service attack called the “Billion Laughs Attack”.
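
A deliberately tiny, harmless version of the idea can be sketched in Python (the helper name `nested_entity_doc` and the entity names are hypothetical). Each entity refers to the previous one several times, so the expanded text grows exponentially with the number of levels while the document itself stays small.

```python
import xml.etree.ElementTree as ET

def nested_entity_doc(levels, fanout):
    """Build a document whose top entity expands to fanout**levels copies of 'lol'."""
    decls = ['<!ENTITY lol0 "lol">']
    for i in range(1, levels + 1):
        refs = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{refs}">')
    subset = "".join(decls)
    return f"<!DOCTYPE lolz [{subset}]><lolz>&lol{levels};</lolz>"

doc = nested_entity_doc(levels=3, fanout=3)
expanded = ET.fromstring(doc).text

copies = expanded.count("lol")
print(len(doc), "characters of XML expand to", copies, "copies of 'lol'")
```

With 3 levels and a fan-out of 3 the result is only 27 copies. The attack works the same way, but with parameters large enough to produce around a billion copies, and current parser builds may refuse documents with that much amplification.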


Firefox uses entities internally (in XUL and such) for internationalization and brand-independent messages (to make life easier for Flock and IceWeasel):

<!ENTITY hidemac.label "Hide &brandShortName;">
<!ENTITY hidewin.label "Hide - &brandShortName;">

In HTML you just need &lt;, &amp; and &quot; to avoid ambiguities between text and markup.

All other entities are basically obsoleted by Unicode encodings and remain only as a convenience (but a good text editor should have macros/snippets that can replace them).


In XHTML all entities except the basic few are problematic, because they won’t work with stand-alone XML parsers (e.g. &nbsp; won’t work).

To parse all XHTML entities you need a validating XML parser (the option is usually called “resolve externals”), which is slower and needs a DTD catalog set up. If you ignore or screw up your DTD catalog, you’ll be participating in a DDoS of the W3C servers.
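
The &nbsp; problem is easy to reproduce with a stand-alone XML parser. The sketch below uses Python’s `xml.etree.ElementTree` as one example of such a parser (an assumption, since the answer names no specific one) and shows that the predefined XML entities and numeric references work, while the HTML-only name does not.

```python
import xml.etree.ElementTree as ET

# The predefined XML entities work without any DTD.
basic = ET.fromstring("<p>1 &lt; 2 &amp; 3</p>").text
print(basic)

# A numeric character reference for the non-breaking space also works.
numeric = ET.fromstring("<p>a&#160;b</p>").text
print(repr(numeric))

# The named HTML entity is unknown to a plain XML parser.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    nbsp_parsed = True
except ET.ParseError as err:
    nbsp_parsed = False
    print("parse failed:", err)
```

Writing `&#160;` (or the character itself in a Unicode encoding) sidesteps the need for a DTD altogether.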

Answers:

Character entities are used to represent characters that are reserved for writing HTML, e.g. <, >, /, & etc. If you want to show these characters in your content, you should use character entities; this helps the parser distinguish between content and markup.

Answers:

You use entities to help the parser distinguish characters that should be interpreted as HTML markup from characters you really want to show the user, since HTML reserves a special set of characters for itself.

Typing this literally in HTML

I don’t mean it like that </sarcasm>

will cause the “</sarcasm>” tag to disappear,

e.g.

I don’t mean it like that

as HTML does not have a tag defined as such. In this case, using entities will allow the text to display properly.

e.g.

No, really! &lt;/sarcasm&gt;

gives

No, really! </sarcasm>

as desired.
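
The same behavior can be observed programmatically. The following sketch uses Python’s `html.parser.HTMLParser` as a stand-in for a browser’s parser (an approximation; the `Collector` class and `parse` helper are hypothetical names) to show that the literal version is read as an end tag, while the escaped version comes through as visible text.

```python
from html.parser import HTMLParser

class Collector(HTMLParser):
    """Records visible text and any end tags the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.end_tags = []

    def handle_data(self, data):
        self.text_parts.append(data)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)

def parse(markup):
    """Return (visible text, list of end tags) for a piece of markup."""
    collector = Collector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.text_parts), collector.end_tags

literal_text, literal_tags = parse("I don't mean it like that </sarcasm>")
escaped_text, escaped_tags = parse("No, really! &lt;/sarcasm&gt;")

print(repr(literal_text), literal_tags)
print(repr(escaped_text), escaped_tags)
```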
