
Tutorial 17: Shady Characters

Character sets & character encodings

One of the most confusing things about HTML internationalization is the difference between character sets and character encodings. Once you grasp that, everything else will be pretty simple.

A character set is just that: a bunch of characters, as a human would understand them. For instance, "A, B, C, D" and so on is the character set of the letters in the English alphabet. The character set of HTML is fixed and cannot be changed. It is called UCS, the Universal Character Set, which is the character set of the Unicode standard.

UCS is touted by many as the be-all and end-all of character sets, as it is supposed to contain every character ever used by the human race. I am not qualified to make a judgement as to whether this is truly the case, and I've heard occasional grumblings from various people that UCS cuts corners in various alphabets. Nonetheless, if you want to conform to the HTML 4.0 specification, you're stuck with UCS. But more on that later.

In UCS, as in any formally defined character set, characters are in a certain order, so each one has a number. This is very convenient for computers, which tend to deal with numbers instead of characters. It would be nice, some may argue, if all computers stored text in UCS (or some other universally accepted character set) with each character being stored as its UCS number. For various reasons that are outside the scope of this tutorial (and would probably put you to sleep faster than a crateful of Valiums anyway) this is not the case. The actual bytes stored in a computer might follow a different convention. The system of taking these bytes and translating them into characters in a document's character set is called a character encoding.
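To make the distinction concrete, here is a minimal Python sketch (Python is not part of the original tutorial; it is used here only because its strings are sequences of Unicode characters). It shows one character keeping a single UCS number while the bytes used to store it change with the encoding. The three encodings are just illustrative picks.

```python
# One character, one fixed number in the character set (UCS/Unicode).
ch = "é"
code_point = ord(ch)  # the character's UCS number, U+00E9

# The stored bytes depend on which character encoding is used.
latin1_bytes = ch.encode("iso-8859-1")  # one byte
utf8_bytes = ch.encode("utf-8")         # two bytes
utf16_bytes = ch.encode("utf-16-be")    # two bytes, different from UTF-8

print(code_point)     # 233
print(latin1_bytes)   # b'\xe9'
print(utf8_bytes)     # b'\xc3\xa9'
print(utf16_bytes)    # b'\x00\xe9'

# Decoding each byte sequence with its own encoding gives back the same character.
assert latin1_bytes.decode("iso-8859-1") == ch
assert utf8_bytes.decode("utf-8") == ch
assert utf16_bytes.decode("utf-16-be") == ch
```

The character and its number never change; only the byte-level convention does.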

The character encoding of an HTML document is not fixed, so you have to supply it in order for a user agent to understand your document. Different user agents understand different character encodings, but several commonly used character encodings are understood by practically any user agent.
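As a rough illustration of why the encoding has to be supplied, the Python sketch below decodes the same bytes twice, each time using a declared encoding. It assumes the encoding arrives as the charset parameter of an HTTP Content-Type header, which is one common way to declare it. The helper names and the ISO-8859-1 fallback are hypothetical choices made for this example, not something this page prescribes.

```python
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper; the fallback default is an assumption for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_document(raw, content_type):
    """Turn raw bytes into characters using the declared encoding."""
    return raw.decode(charset_from_content_type(content_type))


raw = "café".encode("utf-8")  # the bytes as stored or sent over the wire

right = decode_document(raw, "text/html; charset=utf-8")
wrong = decode_document(raw, "text/html; charset=iso-8859-1")

print(right)  # café
print(wrong)  # cafÃ©
```

The bytes are identical in both calls. Only the declared encoding differs, and a wrong declaration yields the wrong characters.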

This is a pretty simple concept: A character set is a list of characters that may appear in a document, and a character encoding is a way of storing characters on a computer as bits. HTML's character set is fixed, and it's called UCS, which is supposed to cover every character you'll ever use, only it doesn't, but for most people that's close enough. An HTML document's character encoding, however, has to be supplied, so that a user agent can take a stream of bits and understand it as a bunch of characters.

This issue is further clouded by the fact that most people tend to say "character set" when they mean "character encoding" in the HTML sense. Which term is more accurate is another discussion that I steadfastly refuse to go into. As long as you understand that HTML documents have a fixed character set and a configurable character encoding, you should be able to make sense of the rest of the character issues.

URL: http://www.webreference.com/html/tutorial17/1.html

Produced by Stephanos Piperoglou
Created: December 02, 1999
Revised: December 15, 1999