Getting a Better Handle on International Domain Names and Punycode (Tue, Aug 26th)
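International domain names (IDN) are displayed in ASCII form using Punycode encoding (starting with "xn--"). Common problems include invalid encoding and mixed scripts (characters from different languages used together), and Python modules can be used to detect these anomalies. 2025-8-26 16:34:11 Author: isc.sans.edu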

International domain names (IDN) continue to be an interesting topic. For the most part, they are probably less of an issue than some people make them out to be, given that popular browsers like Google Chrome are pretty selective in displaying them. On the other hand, they are still used, legitimately or not, and keeping a handle on them is worthwhile.

When analyzing DNS traffic, you should see the Punycode encoding for these domain names. Punycode is defined in RFC 3492 [1]. Punycode-encoded domain name labels start with "xn--", which makes them easy to identify.
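
As a minimal sketch of that check, the helper below splits a name into labels and returns the ones carrying the "xn--" prefix, ignoring case. The function name `punycode_labels` is hypothetical and is not taken from the diary or its GitHub script.

```python
def punycode_labels(domain):
    """Return the labels of a domain name that start with the "xn--" prefix."""
    # Drop a trailing root dot, then split the name into its labels.
    labels = domain.rstrip(".").split(".")
    # DNS names are case-insensitive, so compare in lower case.
    return [label for label in labels if label.lower().startswith("xn--")]


if __name__ == "__main__":
    print(punycode_labels("www.xn--mnchen-3ya.de"))
    print(punycode_labels("www.example.com"))
```

The prefix only says that a label claims to be Punycode; it does not say the encoding is valid.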

Several anomalies may happen with Punycode; luckily, some Python modules can help us identify them.

1 - Invalid Punycode

The Punycode standard is complex, and you may end up with invalid Punycode domains.
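
One way to spot invalid Punycode, sketched here under the assumption that Python's built-in "punycode" codec is strict enough for your purposes, is to try decoding each label and treat a decoding error as invalid. The helper name `decode_label` is hypothetical, and the GitHub script may take a different approach.

```python
def decode_label(label):
    """Decode one DNS label; return None if its Punycode cannot be decoded."""
    if not label.lower().startswith("xn--"):
        return label  # plain ASCII label, nothing to decode
    try:
        # Strip the "xn--" prefix and decode the rest with the stdlib codec.
        return label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        # Raised for truncated or otherwise malformed Punycode.
        return None


if __name__ == "__main__":
    print(decode_label("xn--mnchen-3ya"))  # valid label
    print(decode_label("xn--mnchen-3y"))   # truncated, so decoding fails
```

This only tests whether the Punycode itself decodes. It does not check the additional rules IDNA places on which characters a label may contain.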

2 - Mixed Script

This is the most interesting issue: detecting whether a domain name mixes different languages. There is no easy way to identify the "language" of a character, so we use the "script" instead. A script identifies a group of languages that use the same characters; the Latin script, for example, is used for most European languages.

In Python, the "unicodedata2" module can be used to look up the Unicode name of a character, and the first word of that name identifies the script the character is part of. Mixing different scripts in a domain name is suspect, as legitimate international domain names should use only one language.
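
The name-based lookup described above could be sketched roughly as follows. The helper names `scripts_in` and `is_mixed_script` are hypothetical, and the fallback to the standard-library "unicodedata" module (which offers the same `name()` call) is an assumption added so the example runs without extra packages. It expects an already decoded (Unicode) domain name.

```python
try:
    import unicodedata2 as unicodedata  # newer Unicode tables, same API
except ImportError:
    import unicodedata  # standard-library fallback (assumption)


def scripts_in(text):
    """Return the set of first words of the Unicode names of the characters."""
    scripts = set()
    for ch in text:
        # ASCII digits, hyphens, and dots appear in any domain; skip them.
        if ch.isascii() and (ch.isdigit() or ch in "-."):
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts


def is_mixed_script(text):
    """True if the text contains characters from more than one script."""
    return len(scripts_in(text)) > 1


if __name__ == "__main__":
    print(scripts_in("paypal"))
    print(is_mixed_script("p\u0430ypal"))  # second letter is Cyrillic "a"
```

The first word of a character name is only an approximation of its script. Japanese names, for instance, routinely combine kanji (named "CJK ...") with hiragana or katakana, so a heuristic like this one would flag them even though they are legitimate.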

You can find a quick Python implementation on GitHub: https://github.com/jullrich/idntest

[1] https://datatracker.ietf.org/doc/html/rfc3492


Johannes B. Ullrich, Ph.D., Dean of Research, SANS.edu

Source: https://isc.sans.edu/diary/rss/32234