Why Is It Called “Punycode”?

18 Jul 2020 tech

A few weeks ago, I set out on a small internet journey to answer a question for my own curiosity: How did Punycode get its name? Just what is so “puny” about it?

I got in touch with Adam M. Costello, author of the Punycode RFC (RFC 3492), and his answer is below.

But first, if you haven’t heard of Punycode, here is a short Wikipedia excerpt that describes what it is:

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the Letter-Digit-Hyphen (LDH) subset. For example, München (German name for Munich) is encoded as Mnchen-3ya.

Punycode - Wikipedia, the free encyclopedia
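To make the excerpt's example concrete, here is a minimal sketch that reproduces it with Python's standard-library `punycode` and `idna` codecs. Using Python is my own choice for illustration, and the helper names `to_punycode` and `to_idna` are made up for this post. The raw Punycode output matches the excerpt; the `xn--` prefix only shows up when the label is encoded as part of an internationalized domain name.

```python
# Sketch using Python's built-in codecs; helper names are hypothetical.

def to_punycode(label: str) -> str:
    """Raw Punycode for a single label, without the IDNA "xn--" prefix."""
    return label.encode("punycode").decode("ascii")


def to_idna(hostname: str) -> str:
    """Whole hostname in its ASCII form, as it would be looked up in DNS."""
    return hostname.encode("idna").decode("ascii")


print(to_punycode("München"))            # Mnchen-3ya
print(to_idna("münchen.example"))        # xn--mnchen-3ya.example
print(b"Mnchen-3ya".decode("punycode"))  # München
```

Note that the `idna` codec lowercases its input and adds the prefix per label, which is why the second line differs from the bare Punycode string.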

Punycode was created to encode internationalized domain names in plain ASCII so that existing DNS software, which expects ASCII hostnames, could handle them. This was necessary because back when the Domain Name System (DNS) was conceptualized in 1983, there was no widely accepted standard for representing non-ASCII characters in computer systems. That standard wouldn’t materialize until Unicode, which began development four years later in 1987.

So, why is it called Punycode? The answer, according to my correspondence with Adam, is as follows:

Why “Punycode”? It rhymes with Unicode and is intended to encode Unicode strings. It is “puny” in three senses: The repertoire of characters used in the encoded strings is small, the encoded strings are short, and the implementation is small.

Correspondence with Adam Costello

So let’s break down those three reasons:

The character set used for the encoded strings is small. It is limited to the LDH, or “Letter-Digit-Hyphen”, ASCII subset that DNS understands natively.
The encoded strings are short. As per Wikipedia’s example, the Punycode encoding of ドメイン名例.example is xn--eckwd4c7cu47r2wf.example, which is not much longer than its native representation.
The implementation is small. In other words, the RFC contains reference code in C that is short and straightforward as far as encoding algorithms go.
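The first two senses are easy to check by hand. Below is a rough sketch, again using Python's built-in `idna` codec as my own choice of tool; the names `LDH`, `encode_label`, and `is_ldh` are hypothetical helpers, not anything from the RFC. It encodes the label from the example above, confirms that every output character is a letter, digit, or hyphen, and compares the encoded length to the UTF-8 byte length.

```python
import string

# The Letter-Digit-Hyphen repertoire (helper names here are hypothetical).
LDH = set(string.ascii_letters + string.digits + "-")


def encode_label(label: str) -> str:
    """ASCII form of one domain label, including the "xn--" prefix."""
    return label.encode("idna").decode("ascii")


def is_ldh(text: str) -> bool:
    """True if every character is a letter, digit, or hyphen."""
    return all(ch in LDH for ch in text)


label = "ドメイン名例"
ace = encode_label(label)

print(ace)                                   # xn--eckwd4c7cu47r2wf
print(is_ldh(ace))                           # True
print(len(label.encode("utf-8")), len(ace))  # 18 20
```

So the six-character label takes 18 bytes as UTF-8 and 20 characters in its encoded form, prefix included. For the third sense, the C reference code in the RFC is the thing to look at.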

In short, it’s called Punycode because it’s “puny”, or small, in those three senses.

The rest of this article covers my short trek around the internet to uncover that info.

The Journey

In trying to solve this mystery, my first stop was (of course) Google Search. But my searches for “origin of the term punycode” and “why is it called punycode” came up surprisingly blank.

My next stop was Webmasters Stack Exchange, where I posed the question:

I don’t see any explanation of the name in the RFC, in the Wikipedia article on punycode, or elsewhere in my google searches, but maybe I didn’t look hard enough.

Why is it called “punycode”? What’s “puny” about it?

What is the origin of the term “punycode”? - Webmasters SE

A few hours later, my question got a comment that pointed me to the Wikipedia talk page for Punycode. There, an anonymous poster had asked the same question, and another user had answered:

Because it is a PUN on “UNIcode” (Unicode) Uni rhymes with puny • Firejuggler86 (talk) 21:31, 4 April 2020 (UTC)

Still no source, but then another user suggested that I might email Adam M. Costello, UC Berkeley graduate and author of the Punycode RFC. So I did.

Correspondence with Adam M. Costello

The following is Adam’s email to me, in which he quotes a 2002 email that explains why he named it Punycode:

Date: Fri, 17 Jul 2020 21:31:18 +0000
From: “Adam M. Costello” <redacted@nicemice.net>
To: Maximillian Laumeister <max@maxlaumeister.com>
Subject: Re: Origin of the term “Punycode”?
Message-Id: XXXXXXXXXXXXXX.XXX@nicemice.net

Maximillian Laumeister <max@maxlaumeister.com> wrote:

I’m discussing the origin of the term “Punycode” in an online group, and we’re trying to figure out how the name was chosen.

The only ideas we have are that it could be a portmanteau of “puny” and “unicode”

Below is an excerpt from a message I sent to the IETF mailing list where IDN was developed.

AMC

From: “Adam M. Costello” <redacted@nicemice.net>
To: IETF idn working group <idn@ops.ietf.org>
Date: Sun, 6 Jan 2002 23:30:28 +0000
Subject: AMC-ACE-Z has a name: Punycode
Message-ID: <XXXXXXXXXXXXXX.XXXXXXX@nicemice.net>

AMC-ACE-Z, which has always been a working name, now has a real name: Punycode. I just submitted draft-ietf-idn-punycode-00.txt. It can be obtained now from:

http://www.cs.berkeley.edu/~amc/idn/

Why “Punycode”? It rhymes with Unicode and is intended to encode Unicode strings. It is “puny” in three senses: The repertoire of characters used in the encoded strings is small, the encoded strings are short, and the implementation is small.

In that message to me, Adam quoted a message that he wrote in 2002. Below is a full copy of the email that he pulled that quote from, which I was able to retrieve from the public psg.com mailing list archive:

To: IETF idn working group <idn@ops.ietf.org>
Subject: [idn] AMC-ACE-Z has a name: Punycode
From: “Adam M. Costello” <redacted@nicemice.net>
Date: Sun, 6 Jan 2002 23:30:28 +0000
Reply-to: IETF idn working group <idn@ops.ietf.org>
User-agent: Mutt/1.3.24i

AMC-ACE-Z, which has always been a working name, now has a real name: Punycode. I just submitted draft-ietf-idn-punycode-00.txt. It can be obtained now from:

http://www.cs.berkeley.edu/~amc/idn/

Why “Punycode”? It rhymes with Unicode and is intended to encode Unicode strings. It is “puny” in three senses: The repertoire of characters used in the encoded strings is small, the encoded strings are short, and the implementation is small.

No changes have been made to the algorithm. Here is a summary of changes since the previous draft:

AMC-ACE-Z has been renamed to Punycode.

Usage of the term “ACE” has been made consistent with the latest IDNA draft.

An incorrect claim regarding the conditions under which the encoded string can begin with a hyphen has been corrected.

A paragraph has been added to section 6.4 (Alternative methods for handling overflow) pointing out that if the Punycode decoder is used only inside the ToUnicode operation (see the IDNA draft), then it doesn’t need to check for overflow. (It also explains why.)

A Disclaimer and License appendix has been added:

Regarding this entire document or any portion of it (including the pseudocode and C code), the author makes no guarantees and is not responsible for any damage resulting from its use. The author grants irrevocable permission to anyone to use, modify, and distribute it in any way that does not diminish the rights of anyone else to use, modify, and distribute it, provided that redistributed derivative works do not contain misleading author or version information. Derivative works need not be licensed under similar terms.

In the sample implementation the mixed-case annotation handling in the encoder has changed slightly. The previous version, when faced with ASCII letters inconsistent with their case flags, would ignore the case flags and copy the letters verbatim. The present version honors the case flags and forces the letters to the case indicated by the flag. The new behavior is more useful and more intuitive. The change has no effect when the input is nameprepped (already lowercase) and the case flags are absent (all unset == lowercase).

AMC

So, there you have it. Punycode is actually “puny”. Mystery solved ;)
