Linux / Unix: Unicode and HTML Characters Lookup By Name or Number

I need to replace special characters with their hexadecimal Unicode equivalents on a Linux or Unix-like operating system. How do I list or look up the Unicode code points for given characters?

Tutorial details
Difficulty: Easy
Root privileges: No
Requirements: Perl v5.8+
Time: N/A

You need to use the unum program, which is written in Perl. From the man page:

It is a command line utility which allows you to convert decimal, octal, hexadecimal, and binary numbers; Unicode character and block names; and HTML/XHTML character entity names into one another. It can be used as an on-line special character reference for Web authors. The program is written in portable Perl and lets you look up Unicode and HTML characters by name or number, and interconvert numbers in decimal, hexadecimal, and octal bases.

Use the unum program to insert special characters into a document or a text field. This is useful for characters that are not available on your keyboard.
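
As a rough illustration of the task in the question (replacing special characters with their hexadecimal Unicode values), here is a minimal Python sketch. It is not part of unum. The function name to_hex_entities is made up for this example, and it assumes you want XHTML-style &#x...; numeric entities for every non-ASCII character:

```python
def to_hex_entities(text):
    """Replace every non-ASCII character with an XHTML hex entity (&#x...;)."""
    out = []
    for ch in text:
        cp = ord(ch)  # Unicode code point of the character
        if cp < 128:
            out.append(ch)             # plain ASCII is left alone
        else:
            out.append("&#x%x;" % cp)  # e.g. U+03C0 becomes &#x3c0;
    return "".join(out)


if __name__ == "__main__":
    print(to_hex_entities("2\u03c0r"))
```

You can then paste any of the resulting hexadecimal values into unum (with a 0x prefix) to confirm which character it stands for.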

Download and Install unum program

Type the following commands to create a directory and download the archive with wget:
$ mkdir -p ~/bin/perl
$ cd ~/bin/perl
$ wget http://www.fourmilab.ch/webtools/unum/download/unum.tar.gz

Untar unum.tar.gz using the tar command:
$ tar xvf unum.tar.gz

Use the ln command to create a soft link:
$ ln -s unum.pl unum

Add the directories to your PATH (this affects the current shell session only):
$ export PATH="$PATH:$HOME/bin:$HOME/bin/perl"

How do I use the unum program?

The syntax is:

unum arg
unum query 
unum character 
unum a 
unum 9

Please note that all name queries are case-insensitive and accept regular expressions. Be sure to quote regular expressions if they contain characters with meaning to the shell.
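
To make the idea of a case-insensitive regular expression search over character names concrete, here is a small Python sketch. It does not use unum. It relies on the unicodedata module from Python's standard library, so the results depend on the Unicode version bundled with your Python and may differ from what unum reports. The function name find_by_name is an assumption made for this example:

```python
import re
import sys
import unicodedata


def find_by_name(pattern):
    """Yield (code point, name) for characters whose name matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive matching
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name and regex.search(name):
            yield cp, name


if __name__ == "__main__":
    # Names that begin with GREEK and contain RHO.
    for cp, name in find_by_name(r"^greek.*rho"):
        print("U+%04X  %s" % (cp, name))
```

The pattern is passed as a raw string here. On the command line the same pattern would need shell quoting, as noted above.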

Examples

To look up the Unicode details for the character ‘d’, run:
$ unum d
Sample outputs:

   Octal  Decimal      Hex        HTML    Character   Unicode
    0144      100     0x64      &#100;    "d"         LATIN SMALL LETTER D

To look up each character in the non-digit string ‘abc’, enter:
$ unum abc
Sample outputs:

   Octal  Decimal      Hex        HTML    Character   Unicode
    0141       97     0x61       &#97;    "a"         LATIN SMALL LETTER A
    0142       98     0x62       &#98;    "b"         LATIN SMALL LETTER B
    0143       99     0x63       &#99;    "c"         LATIN SMALL LETTER C
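
The columns in the output above are different notations for the same code point. The following Python sketch shows how such a row can be derived using only the standard library. It is an approximation for illustration, not unum's actual implementation: the function name describe and the exact column formatting are assumptions.

```python
import unicodedata
from html.entities import codepoint2name


def describe(ch):
    """Return (octal, decimal, hex, HTML entity, character, Unicode name)."""
    cp = ord(ch)
    # Prefer a named entity such as &pi; and fall back to a numeric one.
    if cp in codepoint2name:
        entity = "&%s;" % codepoint2name[cp]
    else:
        entity = "&#%d;" % cp
    name = unicodedata.name(ch, "<unnamed>")
    return ("0%o" % cp, cp, "0x%X" % cp, entity, ch, name)


if __name__ == "__main__":
    for ch in "abc":
        print("%8s %8d %8s %11s    \"%s\"         %s" % describe(ch))
```

For ‘d’ this yields octal 0144, decimal 100, and hex 0x64, which matches the sample row shown earlier.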

Other examples:

        ## arg ##            ## Description ##
        147                  Decimal number
        0371                 Octal number
        0xfa75               Hexadecimal number (letters may be A-F or a-f)
        0b11010011           Binary number
        '&#8747;&#x3c0;'     One or more XHTML numeric entities (hex or decimal)
        xyz                  The characters xyz (non-digit)
        c=7Y                 The characters 7Y (any Unicode characters)
        b=cherokee           List Unicode blocks containing "CHEROKEE"
        h=alpha              List XHTML entities containing "alpha"
        n=aggravation        Unicode characters with "AGGRAVATION" in the name
        n=^greek.*rho        Unicode characters beginning with "GREEK" and containing "RHO"
        l=gothic             List all characters in matching Unicode blocks
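
The numeric forms in the table differ only in their prefix. Here is a hedged Python sketch of how such arguments could be told apart and converted to a code point. The function name parse_number is hypothetical, and the prefix rules are inferred from the table above rather than taken from unum's source:

```python
import html


def parse_number(arg):
    """Turn a numeric argument into a code point, or return None."""
    text = arg.lower()
    try:
        if text.startswith("0x"):
            return int(text[2:], 16)   # hexadecimal, e.g. 0xfa75
        if text.startswith("0b"):
            return int(text[2:], 2)    # binary, e.g. 0b11010011
        if text.startswith("0") and len(text) > 1:
            return int(text[1:], 8)    # octal, e.g. 0371
        return int(text, 10)           # plain decimal, e.g. 147
    except ValueError:
        return None                    # not a number, e.g. "xyz"


if __name__ == "__main__":
    for arg in ["147", "0371", "0xfa75", "0b11010011", "xyz"]:
        print(arg, "->", parse_number(arg))
    # Numeric entities can be decoded with the standard library.
    print(html.unescape("&#8747;&#x3c0;"))
```

With these rules, 0371 is read as octal (decimal 249), 0xfa75 as hexadecimal (decimal 64117), and 0b11010011 as binary (decimal 211).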

A note about GUI programs

You can use the gucharmap GUI tool to browse all the available Unicode characters and categories for the installed fonts, and to examine their detailed properties. You can start this app from the Applications menu:
Applications ▸ Accessories ▸ Character Map
Or, execute the following command:
$ gucharmap
OR
$ gnome-character-map
Sample outputs:

Fig.01: The Character Map application

To display detailed information about a character, perform the following steps:
  1. Select a character set from the Script or Unicode Block list box. Example: Basic Latin
  2. Select a character from the Character Table tabbed section. Example: @
  3. Click on the Character Details tabbed section.

Sample outputs:

Fig.02: Get detailed information about a character

A note about KDE users

Use the KCharSelect utility on the KDE desktop:

KCharSelect is a tool to select special characters from all installed fonts and copy them into the clipboard.

Sample outputs:

Fig.03: KDE – unicode character look up utility

A note about Mac OS X Unix users

On Mac OS X, use the Character Viewer application.

Check out related media

This tutorial is also available in a quick video format:

(Video 01: Linux / Unix: Unicode Character Map And Lookup Tools)
