Unicode
In isolation, a computer is free to choose whatever character set mapping it likes. If the computer wants the letter a to equal the number 10, then so be it. But when computers start talking to each other, they need to use a common character set. If two computers used different character sets, a string sent from one to the other would arrive as the same numbers, but the receiver would read those numbers as different characters.
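To make the mismatch concrete, here is a minimal Python sketch. Both character sets are hypothetical toy mappings invented for illustration (not real standards), and `encode`/`decode` are hypothetical helper names. Computer A sends a word using its own table, and computer B reads the same numbers back using a different table.

```python
# Two hypothetical, made-up character sets (not real standards).
# Computer A maps the letter "a" to 10, as in the text; computer B does not.
CHARSET_A = {"a": 10, "b": 11, "c": 12}
CHARSET_B = {"a": 12, "b": 11, "c": 10}

def encode(text, charset):
    """Turn each character into the number its character set assigns it."""
    return [charset[ch] for ch in text]

def decode(numbers, charset):
    """Turn numbers back into characters using the reverse of the mapping."""
    reverse = {num: ch for ch, num in charset.items()}
    return "".join(reverse[n] for n in numbers)

sent = encode("cab", CHARSET_A)      # computer A sends [12, 10, 11]
received = decode(sent, CHARSET_B)   # computer B reads them with its own table
print(sent)      # [12, 10, 11]
print(received)  # acb

# With a shared character set, the round trip is lossless.
print(decode(encode("cab", CHARSET_A), CHARSET_A))  # cab
```

The numbers travel intact; only the interpretation differs, which is why both sides must agree on one mapping.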
There have been several standards over the years, but the most modern standard is Unicode. It defines the character set mapping that almost all computers use today.
Note: You can read more about Unicode at its official website, http://unicode.org/.
As an example, consider the word cafe. The Unicode standard tells us that the letters of this word should be mapped to numbers like so:
c: 99, a: 97, f: 102, e: 101
The number associated with each character is called a code point. So in the example above, c uses code point 99, a uses code point 97, and so on.
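You can check these code points yourself. This is a short sketch in Python, whose built-in `ord()` returns the code point of a single character and `chr()` converts a code point back into a character.

```python
word = "cafe"

# ord() returns the Unicode code point of a single character.
code_points = [ord(ch) for ch in word]
print(code_points)  # [99, 97, 102, 101]

# chr() goes the other way, from a code point back to its character.
print("".join(chr(cp) for cp in code_points))  # cafe
```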
Of course, Unicode is not just for the simple Latin characters used in English, such as c, a, f and e. It also lets you map characters from languages around the world. The word cafe, as you’re probably aware, is derived from French, in which it’s written as café. Unicode maps these characters like so:
c: 99, a: 97, f: 102, é: 233
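The same Python sketch works for the French spelling. Here é is written as the escape `\u00e9` (hexadecimal E9, which is 233) so the snippet stays plain ASCII; this assumes the single precomposed character é rather than any other way of writing it.

```python
# "\u00e9" is the escape for the single character é (code point 233).
word = "caf\u00e9"

print(len(word))                 # 4
print([ord(ch) for ch in word])  # [99, 97, 102, 233]

# The first three letters keep the same code points they had in "cafe".
print([ord(ch) for ch in word][:3] == [ord(ch) for ch in "cafe"][:3])  # True
```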
And here’s an example using Chinese characters (this, according to Google Translate, means “Computer Programming”):
You’ve probably heard of emojis, which are small pictures you can use in your text. These pictures are, in fact, just normal characters and are also mapped by Unicode. For example:
This is only two characters. The code points for these are very large numbers, but each is still only a single code point. The computer considers these as no different from any other two characters.
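A quick Python check illustrates the point. The two emoji below are our own picks for illustration (GRINNING FACE, U+1F600, and THUMBS UP SIGN, U+1F44D), written as `\U` escapes; each escape denotes exactly one code point. Because a Python string is a sequence of code points, `len()` counts them directly.

```python
# Two emoji chosen for illustration, written as escapes:
# \U0001F600 is GRINNING FACE, \U0001F44D is THUMBS UP SIGN.
emoji = "\U0001F600\U0001F44D"

# Two characters, one code point each.
print(len(emoji))                 # 2
print([ord(ch) for ch in emoji])  # [128512, 128077]

# The numbers are large, but each is still a single code point, just like "a".
print(all(ord(ch) > ord("a") for ch in emoji))  # True
```

Not every emoji is a single code point (some are built by combining several), but the two used here are.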
Note: The word “emoji” comes from Japanese, where “e” means picture and “moji” means character.