What is Unicode?
Unicode is a universal character encoding standard that assigns a unique number, called a code point, to each character it encodes. Its goal is to cover the characters of all the world's writing systems, modern and historic, including letters, symbols, and emoji.
Quick Facts
| Full Name | Unicode Standard |
|---|---|
| Created | 1991 (Unicode 1.0) |
| Specification | The Unicode Standard, published by the Unicode Consortium |
How Unicode Works
Unicode development began in 1987 to solve the problem of incompatible character encodings. Before Unicode, different systems used different encodings (ASCII, ISO-8859, GB2312, etc.), so text created on one platform often displayed incorrectly on another. Unicode assigns each character a unique code point, written as U+ followed by four to six hexadecimal digits (e.g., U+0041 for 'A'). As of version 15.1, the standard defines nearly 150,000 characters covering 161 scripts. A code point is an abstract number; how it is stored as bytes depends on the encoding form: UTF-8 (variable-width, 1 to 4 bytes per character, the dominant encoding on the web), UTF-16 (variable-width, 2 or 4 bytes, used internally by Windows, Java, and JavaScript), or UTF-32 (fixed-width, 4 bytes).
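As a rough illustration of the notation and the three encoding forms, the following JavaScript sketch formats a code point in U+ notation and counts the bytes a single character needs in each form. The helper names `formatCodePoint` and `encodedSizes` are made up for this example, the input is assumed to be exactly one code point, and `TextEncoder` is assumed to be available (modern browsers and Node.js).

```javascript
// Format a code point in U+ notation: upper-case hex, at least four digits.
function formatCodePoint(codePoint) {
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

// Report the size in bytes of one character in each encoding form.
// Assumption: `char` holds exactly one code point.
function encodedSizes(char) {
  const codePoint = char.codePointAt(0);
  return {
    codePoint: formatCodePoint(codePoint),
    utf8: new TextEncoder().encode(char).length, // TextEncoder always emits UTF-8
    utf16: char.length * 2,                      // JS strings are 16-bit code units
    utf32: 4,                                    // one 32-bit unit per code point
  };
}

for (const char of ['A', '€', '中', '😀']) {
  console.log(char, encodedSizes(char));
}
```

Characters outside the Basic Multilingual Plane, such as 😀, take four bytes in all three forms, while ASCII characters take one byte in UTF-8 but still four in UTF-32.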
Key Characteristics
- Universal standard intended to cover all writing systems
- Nearly 150,000 characters from 161 scripts (as of Unicode 15.1)
- Code points written in U+XXXX notation (four to six hexadecimal digits)
- Multiple encoding forms: UTF-8, UTF-16, UTF-32
- Backward compatible with ASCII: the first 128 code points match ASCII, and UTF-8 encodes them as the same single bytes
- Includes emojis, symbols, and historic scripts
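The ASCII compatibility listed above can be checked directly. The sketch below is only an illustration: the function name `isAsciiCompatible` is hypothetical, and `TextEncoder` is assumed to be available. It compares each UTF-8 byte of a string with the corresponding code point.

```javascript
// Returns true when every character in `text` is in the ASCII range and
// its UTF-8 encoding is a single byte equal to its code point.
function isAsciiCompatible(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  const codePoints = Array.from(text, (ch) => ch.codePointAt(0));
  return (
    bytes.length === codePoints.length &&
    codePoints.every((cp, i) => cp < 128 && bytes[i] === cp)
  );
}

console.log(isAsciiCompatible('Hello, world!')); // true
console.log(isAsciiCompatible('h\u00E9llo'));    // false (é needs two bytes in UTF-8)
```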
Common Use Cases
- Multilingual text processing
- Web content internationalization
- Database character storage
- Cross-platform text compatibility
- Emoji support in applications
Example
Unicode Code Points:
| Char | Code Point | Name |
|---|---|---|
| A | U+0041 | LATIN CAPITAL LETTER A |
| a | U+0061 | LATIN SMALL LETTER A |
| 中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D |
| 😀 | U+1F600 | GRINNING FACE |
| € | U+20AC | EURO SIGN |
UTF-8 Encoding:
A (U+0041) → 41 (1 byte)
€ (U+20AC) → E2 82 AC (3 bytes)
中 (U+4E2D) → E4 B8 AD (3 bytes)
😀 (U+1F600) → F0 9F 98 80 (4 bytes)
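The byte sequences above follow from UTF-8's bit layout, which can be sketched by hand. This is a minimal illustration, not a replacement for `TextEncoder`; the names `utf8Encode` and `toHex` are hypothetical helpers introduced for this example.

```javascript
// Encode a single Unicode code point into UTF-8 bytes by hand.
function utf8Encode(codePoint) {
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || isSurrogate) {
    throw new RangeError('Not a Unicode scalar value: ' + codePoint);
  }
  if (codePoint < 0x80) {
    return [codePoint];                       // 0xxxxxxx
  }
  if (codePoint < 0x800) {
    return [
      0xC0 | (codePoint >> 6),                // 110xxxxx
      0x80 | (codePoint & 0x3F),              // 10xxxxxx
    ];
  }
  if (codePoint < 0x10000) {
    return [
      0xE0 | (codePoint >> 12),               // 1110xxxx
      0x80 | ((codePoint >> 6) & 0x3F),       // 10xxxxxx
      0x80 | (codePoint & 0x3F),              // 10xxxxxx
    ];
  }
  return [
    0xF0 | (codePoint >> 18),                 // 11110xxx
    0x80 | ((codePoint >> 12) & 0x3F),        // 10xxxxxx
    0x80 | ((codePoint >> 6) & 0x3F),         // 10xxxxxx
    0x80 | (codePoint & 0x3F),                // 10xxxxxx
  ];
}

// Render bytes as space-separated upper-case hex.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

console.log(toHex(utf8Encode(0x0041)));  // 41
console.log(toHex(utf8Encode(0x20AC)));  // E2 82 AC
console.log(toHex(utf8Encode(0x4E2D)));  // E4 B8 AD
console.log(toHex(utf8Encode(0x1F600))); // F0 9F 98 80
```

The leading byte's high bits signal the sequence length, and every continuation byte starts with the bits 10, which is what lets a decoder resynchronize in the middle of a stream.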
JavaScript:
'A'.codePointAt(0).toString(16) // '41'
String.fromCodePoint(0x4E2D) // '中'
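One practical consequence of JavaScript's UTF-16 strings is that `length` counts code units rather than code points. The following sketch, using only standard string methods, shows the difference for a string that contains an emoji; the variable names are illustrative.

```javascript
const text = 'A中😀';

// .length counts UTF-16 code units, so the emoji counts twice.
console.log(text.length); // 4

// Iterating a string walks code points instead of code units.
const codePoints = Array.from(text, (ch) => ch.codePointAt(0).toString(16));
console.log(codePoints); // ['41', '4e2d', '1f600']

// charCodeAt exposes the surrogate pair that encodes U+1F600 in UTF-16.
const surrogates = [text.charCodeAt(2), text.charCodeAt(3)].map((u) => u.toString(16));
console.log(surrogates); // ['d83d', 'de00']
```

When counting or slicing user-visible characters, iterating by code point (for example with `Array.from` or `for...of`) avoids splitting a surrogate pair in half.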