UTF-8 unicode

Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.

Unicode

There is a lot of information about Unicode on the Web for anyone who wants to learn more about it.

Strictly speaking, Unicode isn't an encoding itself, although it's commonly called one. But don't panic! It's a standard that defines the rules for representing nearly every known character in the world. What Unicode doesn't define is how each character is stored by electronic devices. That is the responsibility of a character encoding.

So, under Unicode, we can find different character encodings that follow the rules of the Standard but store information in different formats, each one performing the mapping between characters and their internal (electronic) representation in its own way.
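As a small illustration of that split between a character and its stored form, the Python sketch below takes one character, shows the number Unicode assigns to it (its code point), and then shows the bytes that three different encodings produce for it. The helper name `describe` and the sample character are chosen just for this example.

```python
def describe(char):
    """Return the code point label of a character and its bytes in three encodings."""
    # The number Unicode assigns to the character, written in the usual U+XXXX form.
    code_point = f"U+{ord(char):04X}"
    # How that same character is actually stored, depending on the encoding chosen.
    encoded = {
        name: char.encode(name)
        for name in ("utf-8", "utf-16-be", "utf-32-be")
    }
    return code_point, encoded


# "\u00e9" is the letter e with an acute accent.
code_point, encoded = describe("\u00e9")
print(code_point)
for name, raw in encoded.items():
    print(name, raw.hex(" "))
```

The code point is the same in every case; only the bytes differ from one encoding to another.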

The two main mapping systems able to work with the Unicode Standard are UTF (Unicode Transformation Format) and UCS (Universal Character Set). What's more, each of these mapping systems can use different internal representations (7-bit or 8-bit encoding, fixed or variable length...). In short, there are different representations (encodings) of the Unicode Standard, each one with its own pros and cons.
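The difference between fixed and variable length can be seen by counting the bytes each encoding needs per character. This is only a sketch: the sample characters and the names `SAMPLES`, `ENCODINGS` and `byte_lengths` are assumptions made for the example.

```python
# Sample characters: Latin capital A, e with acute accent, euro sign, an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]
ENCODINGS = ["utf-8", "utf-16-be", "utf-32-be"]


def byte_lengths(chars, encodings):
    """Map each encoding name to the number of bytes it uses for each character."""
    return {enc: [len(c.encode(enc)) for c in chars] for enc in encodings}


lengths = byte_lengths(SAMPLES, ENCODINGS)
for enc, sizes in lengths.items():
    print(f"{enc:10} {sizes}")
```

For these four characters, UTF-8 uses 1, 2, 3 and 4 bytes, UTF-16 uses 2 bytes for the first three and 4 for the last, and UTF-32 always uses 4. That is the kind of trade-off between compactness and uniformity behind the pros and cons mentioned above.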

So, we have to choose one of them!

Luckily, one of them stands out from the crowd: the UTF-8 character encoding. Its rapid adoption by practically all new technologies (web browsers, operating systems, databases, XML...) and the support that modern programming languages (PHP, Java...) offer for handling it make it the perfect choice!

Now, let's take a look at the current situation.
