What is Unicode?

Unicode is a universal character encoding standard that assigns a unique number, called a code point, to each character it covers, with the goal of representing every writing system in use. Its repertoire includes letters, symbols, and emoji.

Quick Facts

Full Name: Unicode Standard
Created: 1991 (Unicode 1.0)
Specification: Official Specification

How Unicode Works

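Development of Unicode began in 1987 to solve the problem of incompatible character encodings. Before Unicode, different systems used different encodings (ASCII, ISO-8859, GB2312, etc.), so text written on one platform often displayed incorrectly on another. Unicode assigns each character a unique code point, written as U+ followed by four to six hexadecimal digits (e.g., U+0041 for 'A'), in the range U+0000 to U+10FFFF. As of Unicode 15.1, the standard defines 149,813 characters covering 161 scripts.

A code point is an abstract number; an encoding form determines how it is stored as bytes. Unicode defines three: UTF-8 (variable-width, 1 to 4 bytes per code point, the dominant encoding on the web), UTF-16 (variable-width, 2 or 4 bytes per code point, used internally by Windows and Java), and UTF-32 (fixed-width, 4 bytes per code point).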
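To make the difference between the encoding forms concrete, the following sketch counts how many bytes one string would need in each. The function name encodedSizes is made up for this illustration, and the code assumes a runtime where TextEncoder is available as a global (modern browsers and recent Node.js versions).

```javascript
// Illustrative helper (not part of any standard API): report the size of a
// string in each Unicode encoding form.
function encodedSizes(text) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  const codePoints = Array.from(text).length;
  // TextEncoder always produces UTF-8.
  const utf8Bytes = new TextEncoder().encode(text).length;
  // JavaScript strings are sequences of 16-bit UTF-16 code units.
  const utf16Bytes = text.length * 2;
  // UTF-32 spends exactly 4 bytes on every code point.
  const utf32Bytes = codePoints * 4;
  return { codePoints, utf8Bytes, utf16Bytes, utf32Bytes };
}

const sizes = encodedSizes('A€中😀');
console.log(sizes);
// { codePoints: 4, utf8Bytes: 11, utf16Bytes: 10, utf32Bytes: 16 }
```

For this mixed string UTF-16 is slightly smaller than UTF-8, while for mostly ASCII text UTF-8 would be the most compact of the three.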

Key Characteristics

  • Universal standard covering all writing systems
  • Nearly 150,000 characters from 161 scripts (as of Unicode 15.1)
  • Code points written in U+XXXX notation (four to six hexadecimal digits)
  • Multiple encoding forms: UTF-8, UTF-16, UTF-32
  • Backward compatible with ASCII (first 128 code points)
  • Includes emojis, symbols, and historic scripts
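Two of these characteristics, the U+XXXX notation and ASCII compatibility, can be checked with a few lines of code. This is a minimal sketch: formatCodePoint and isAsciiCompatible are hypothetical helper names, and TextEncoder is assumed to be available as a global.

```javascript
// Hypothetical helper: format the first code point of a string in U+XXXX
// notation, padded to at least four hexadecimal digits.
function formatCodePoint(char) {
  const hex = char.codePointAt(0).toString(16).toUpperCase();
  return 'U+' + hex.padStart(4, '0');
}

// Hypothetical helper: true when every UTF-8 byte of the text equals the
// corresponding ASCII value, i.e. the text uses only the first 128 code points.
function isAsciiCompatible(text) {
  const bytes = new TextEncoder().encode(text);
  return bytes.length === text.length &&
    bytes.every((byte, i) => byte < 0x80 && byte === text.charCodeAt(i));
}

console.log(formatCodePoint('A'));        // 'U+0041'
console.log(formatCodePoint('😀'));       // 'U+1F600'
console.log(isAsciiCompatible('Hello'));  // true
console.log(isAsciiCompatible('café'));   // false ('é' takes two bytes in UTF-8)
```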

Common Use Cases

  1. Multilingual text processing
  2. Web content internationalization
  3. Database character storage
  4. Cross-platform text compatibility
  5. Emoji support in applications

Example

Unicode Code Points:

Char  Code Point  Name
A     U+0041      LATIN CAPITAL LETTER A
a     U+0061      LATIN SMALL LETTER A
中    U+4E2D      CJK UNIFIED IDEOGRAPH-4E2D
😀    U+1F600     GRINNING FACE
€     U+20AC      EURO SIGN

UTF-8 Encoding:
A (U+0041)    → 41           (1 byte)
€ (U+20AC)    → E2 82 AC     (3 bytes)
中 (U+4E2D)   → E4 B8 AD     (3 bytes)
😀 (U+1F600)  → F0 9F 98 80  (4 bytes)
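
The byte sequences above follow from the UTF-8 bit layout: the lead byte signals the sequence length, and each continuation byte carries 6 bits of the code point behind a 10xxxxxx prefix. The sketch below encodes a single code point by hand to show this; utf8Bytes and toHex are illustrative names, and in practice the built-in TextEncoder would normally be used instead.

```javascript
// Illustrative manual UTF-8 encoder for one code point.
function utf8Bytes(codePoint) {
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
  if (codePoint < 0 || codePoint > 0x10FFFF || isSurrogate) {
    throw new RangeError('Not an encodable code point: ' + codePoint);
  }
  if (codePoint < 0x80) {
    return [codePoint];                              // 0xxxxxxx
  }
  if (codePoint < 0x800) {
    return [0xC0 | (codePoint >> 6),                 // 110xxxxx
            0x80 | (codePoint & 0x3F)];              // 10xxxxxx
  }
  if (codePoint < 0x10000) {
    return [0xE0 | (codePoint >> 12),                // 1110xxxx
            0x80 | ((codePoint >> 6) & 0x3F),
            0x80 | (codePoint & 0x3F)];
  }
  return [0xF0 | (codePoint >> 18),                  // 11110xxx
          0x80 | ((codePoint >> 12) & 0x3F),
          0x80 | ((codePoint >> 6) & 0x3F),
          0x80 | (codePoint & 0x3F)];
}

// Render bytes as space-separated uppercase hex, matching the listing above.
function toHex(bytes) {
  return bytes.map(b => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

console.log(toHex(utf8Bytes(0x20AC)));   // 'E2 82 AC'
console.log(toHex(utf8Bytes(0x1F600)));  // 'F0 9F 98 80'
```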

JavaScript:
'A'.codePointAt(0).toString(16)  // '41'
String.fromCodePoint(0x4E2D)     // '中'
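
Because JavaScript strings are stored as UTF-16 code units, a code point above U+FFFF such as 😀 occupies two units (a surrogate pair). The following sketch shows the arithmetic behind that pair; toSurrogatePair is a hypothetical helper written for this example, not a built-in.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Code point does not need a surrogate pair');
  }
  const offset = codePoint - 0x10000;        // 20-bit value
  const high = 0xD800 + (offset >> 10);      // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);     // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16));       // 'd83d' 'de00'
console.log('😀'.length);                                // 2 (UTF-16 code units)
console.log(Array.from('😀').length);                    // 1 (code point)
console.log(String.fromCharCode(high, low) === '😀');    // true
```

This is why codePointAt and String.fromCodePoint are generally safer than charCodeAt and String.fromCharCode when text may contain emoji or other supplementary characters.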
