Punycode Encoder/Decoder

Convert internationalized domain names (IDNs) between Unicode and ASCII

How the Punycode Encoder/Decoder Works

The Domain Name System (DNS) was originally designed to handle only a subset of ASCII: the letters A-Z (case-insensitive), the digits 0-9, and the hyphen. As the internet became global, it needed to support Internationalized Domain Names (IDNs) containing characters from languages such as Chinese, Arabic, or German. Punycode, defined in RFC 3492, is the encoding that transforms each Unicode label into an ASCII-only string. The IDNA framework then adds the prefix xn-- to form the ASCII-compatible "A-label" that is stored in DNS.
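
As a quick illustration, Python's standard library exposes both layers: a punycode codec for the raw RFC 3492 transformation and an idna codec that works label by label and adds the xn-- prefix. This is only a minimal sketch. The built-in idna codec follows the older IDNA 2003 rules, so it is a demonstration rather than a full IDNA 2008 implementation.

```python
label = "münchen"

# Raw Punycode (RFC 3492): a single label, no prefix.
raw = label.encode("punycode").decode("ascii")

# IDNA codec: splits on dots, encodes non-ASCII labels, adds "xn--".
a_label = "münchen.de".encode("idna").decode("ascii")

# Decoding reverses the process.
u_label = a_label.encode("ascii").decode("idna")

print(raw)      # mnchen-3ya
print(a_label)  # xn--mnchen-3ya.de
print(u_label)  # münchen.de
```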

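The Punycode algorithm is an instance of a more general scheme called "Bootstring". It processes one label at a time (the parts of a domain between the dots) in these steps:

  1. Basic Character Extraction: The encoder first identifies all ASCII ("basic") characters in the label (e.g., in the label münchen from münchen.de, these are m, n, c, h, e, n). The dot and the all-ASCII label de are not touched.
  2. Basic String Assembly: These ASCII characters are copied, in their original order, to the front of the encoded string, followed by a delimiter hyphen (-) if at least one basic character is present.
  3. Extended Character Analysis: The encoder then handles the non-ASCII "extended" characters (like ü) in order of increasing code point.
  4. Delta Encoding: Instead of representing each character directly, Punycode encodes a number (delta) that captures both how far the code point is from the previous one and the position where the character must be inserted. Because a typical label draws its characters from a narrow range of code points, the deltas stay small and the output stays short.
  5. Mixed-Radix Representation: Each delta is written with the digits a-z and 0-9 using a base-36 mixed-radix scheme that RFC 3492 calls generalized variable-length integers. The digit thresholds adapt after every delta.
  6. Prefix Application: The final label is prefixed with xn--, the ACE prefix defined by IDNA, to signal to browsers and other software that it encodes an internationalized label.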
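
The steps above can be sketched in Python as follows. This follows the encoding pseudocode and parameter values given in RFC 3492, but the function names (digit, adapt, punycode_encode) are illustrative choices, and the sketch omits the overflow checks and case handling a production encoder would need. It encodes a single label and does not add the xn-- prefix.

```python
# Bootstring parameters for Punycode, as listed in RFC 3492.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def digit(d):
    """Map 0..25 to a..z and 26..35 to 0..9."""
    return chr(d + 97) if d < 26 else chr(d - 26 + 48)

def adapt(delta, num_points, first_time):
    """Bias adaptation: tune thresholds to the size of recent deltas."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)

def punycode_encode(label):
    # Steps 1-2: copy the basic (ASCII) characters, then the delimiter.
    basic = [c for c in label if ord(c) < 128]
    out = basic[:]
    if basic:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    h = b = len(basic)  # h = number of code points handled so far
    # Steps 3-5: handle extended characters in increasing code point order.
    while h < len(label):
        m = min(ord(c) for c in label if ord(c) >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in label:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)

print(punycode_encode("münchen"))  # mnchen-3ya
```

For münchen, the single extended character ü produces one delta of 869, which the variable-length integer step renders as the three digits 3ya.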

The History of Punycode and RFC 3492

The Punycode specification was written by Adam M. Costello and published as RFC 3492 in 2003 as a core component of the IDNA (Internationalizing Domain Names in Applications) framework. Before Punycode, various incompatible methods were proposed, but Costello's system won out largely because it is compact, which matters given the 63-character limit of a DNS label. Today, Punycode is a largely invisible but essential part of the global internet, supported by all major web browsers and by registrars that offer IDNs.

Technical Comparison: Punycode vs. URL Encoding vs. UTF-8

It is important to understand that Punycode is specifically for domain names, not for the entire URL.

| Feature | Punycode (RFC 3492) | URL Encoding (Percent) | UTF-8 (Raw) |
|---|---|---|---|
| Primary Target | DNS labels (domains) | URL paths and queries | Data storage / APIs |
| Example (münchen) | xn--mnchen-3ya | m%C3%BCnchen | ü stored as bytes 0xC3 0xBC |
| Prefix | xn-- | None | None |
| Output Set | a-z, 0-9, - | A-Z, a-z, 0-9, -, ., _, ~, % | Byte sequences covering the full Unicode range |
| Length Limit | 63 characters per label | Browser dependent | Virtually unlimited |
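
To make the comparison concrete, the short Python sketch below runs the same label through all three encodings using only the standard library. The variable names are illustrative, and the idna codec used here follows IDNA 2003.

```python
from urllib.parse import quote

label = "münchen"

as_punycode = label.encode("idna").decode("ascii")  # for DNS labels
as_percent = quote(label)                           # for URL paths and queries
as_utf8 = label.encode("utf-8")                     # raw bytes for storage

print(as_punycode)  # xn--mnchen-3ya
print(as_percent)   # m%C3%BCnchen
print(as_utf8)      # b'm\xc3\xbcnchen'
```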

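A dedicated Punycode encoder helps you produce the ASCII form that DNS and registrars expect for an international domain. Full IDNA compliance also depends on steps outside Punycode itself, such as Unicode normalization and checks on which characters are permitted.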

Security Considerations: Homograph and Phishing Attacks

Punycode's power to represent different characters comes with significant security risks:

  • Internationalized Domain Name (IDN) Phishing: Attackers can register a domain that looks identical to a trusted site but is built from visually similar characters in another script (e.g., аррӏе.com using Cyrillic characters instead of apple.com). The two names consist of entirely different code points and therefore have different Punycode forms. This is known as a Homograph Attack.
  • Browser Mitigation: To combat this visual deception, modern browsers (like Chrome and Firefox) will often display the raw Punycode form xn--... if they suspect a domain is trying to trick a user.
  • Client-Side Privacy: All encoding and decoding happens locally in your browser, so the domains you research and any private URLs are never transmitted to our servers.
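
A rough idea of how a mixed-script check might work is sketched below. It is a heuristic for illustration only: it guesses each letter's script from the first word of its Unicode character name, which is an assumption of this sketch and not how browsers actually decide. The helper names scripts_in and looks_mixed are hypothetical.

```python
import unicodedata

def scripts_in(label):
    """Guess the scripts in a label from the first word of each letter's name."""
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def looks_mixed(label):
    """True when letters from more than one (guessed) script appear."""
    return len(scripts_in(label)) > 1

latin = "apple"
mixed = "p\u0430ypal"                        # Cyrillic "а" among Latin letters
all_cyrillic = "\u0430\u0440\u0440\u04cf\u0435"  # every letter is Cyrillic

print(scripts_in(latin))          # {'LATIN'}
print(looks_mixed(mixed))         # True
print(looks_mixed(all_cyrillic))  # False
```

The last result shows the limit of this heuristic: a spoof written entirely in one script is not flagged by a mixed-script test, so real-world defenses need additional rules.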

Frequently Asked Questions

Why does my browser show the xn--... form instead of the international characters?

This is a security feature. If a domain mixes characters from different scripts (e.g., Latin mixed with Cyrillic), the browser may show the Punycode form to prevent you from being fooled by a Homograph Attack.
