How to get domain name from URL in bash shell script

Last updated May 16, 2017

How can I extract a domain name from a URL string (e.g. https://www.cyberciti.biz/index.php) using bash shell scripting under a Linux or Unix-like operating system?

You can use standard Unix tools such as sed, awk, grep, Perl, or Python to get the domain name from a URL. There is no need to write a complicated regex; a field separator or a short substitution is enough.
Let us see various commands and options to grab the domain part from a given variable under a Linux or Unix-like system.

Get domain name from full URL

Say your URL is stored in a bash shell variable such as $x:
x='https://www.cyberciti.biz/faq/copy-command/'
You can use awk as follows:
echo "$x" | awk -F/ '{print $3}'
### OR ###
awk -F/ '{print $3}' <<<"$x"

Sample outputs:

www.cyberciti.biz
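
The awk one-liner works because splitting on `/` puts the host in field 3, which only holds when the URL starts with a scheme followed by `//`. As a minimal sketch (the function name `host_from_url` is hypothetical and not part of the original commands), the following falls back to field 1 when no scheme is present:

```bash
# host_from_url is a hypothetical helper name used for illustration.
# With -F/ the string "https://host/path" splits into:
#   $1="https:"  $2=""  $3="host"  $4="path"
# A URL without a scheme ("host/path") has the host in $1 instead.
host_from_url() {
    printf '%s\n' "$1" | awk -F/ '{ if ($1 ~ /:$/ && $2 == "") print $3; else print $1 }'
}

host_from_url 'https://www.cyberciti.biz/faq/copy-command/'
host_from_url 'www.cyberciti.biz/faq/copy-command/'
```

Both calls should print www.cyberciti.biz.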

Extract domain name from URL using sed

Here is a sample sed command:
url="https://www.cyberciti.biz/faq/copy-command"
echo "$url" | sed -e 's|^[^/]*//||' -e 's|/.*$||'
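
To make the two sed expressions easier to follow, here is a commented sketch of the same command. The third expression, which strips a trailing port number, is an addition for illustration and not part of the original command; the port 8443 in the sample URL is likewise made up:

```bash
url="https://www.cyberciti.biz:8443/faq/copy-command"
# 1st -e: delete the scheme, i.e. everything up to and including "//"
# 2nd -e: delete the first remaining "/" and everything after it
# 3rd -e: (added for illustration) delete a trailing ":port" if one is present
host="$(echo "$url" | sed -e 's|^[^/]*//||' -e 's|/.*$||' -e 's|:[0-9]*$||')"
echo "$host"
```

This should print www.cyberciti.biz. Without the third expression the result would keep the port, i.e. www.cyberciti.biz:8443.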

Extract domain name from URL using bash shell parameter substitution

Another option is to use bash shell parameter substitution:

# My shell variable 
f="https://www.cyberciti.biz/faq/copy-command/"
 
## Remove protocol part of url  ##
f="${f#http://}"
f="${f#https://}"
f="${f#ftp://}"
f="${f#scp://}"
f="${f#sftp://}"
 
## Remove username and/or username:password part of URL  ##
f="${f#*:*@}"
f="${f#*@}"
 
## Remove rest of urls ##
f=${f%%/*}
 
## Show domain name only ##
echo "$f"
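
If you need this in more than one place, the same idea can be wrapped in a function. The following is only a sketch: the name `get_domain` and the ftp.example.com URL are made up for illustration. It strips the path before the user name part, so that an `@` appearing later in the path is not mistaken for credentials:

```bash
# get_domain is a hypothetical function name used for illustration.
get_domain() {
    local f="$1"
    f="${f#*://}"   # drop any scheme such as http://, https://, ftp://
    f="${f%%/*}"    # drop the path first, so an "@" in the path is ignored
    f="${f#*@}"     # drop user@ or user:password@ if present
    printf '%s\n' "$f"
}

get_domain 'https://www.cyberciti.biz/faq/copy-command/'
get_domain 'ftp://user:secret@ftp.example.com/pub/file.txt'
```

The two calls should print www.cyberciti.biz and ftp.example.com respectively.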

Shell script example

A shell script to purge urls from Cloudflare by matching domain name part:

#!/bin/bash
zone_id=""
api_key=""
 
urls=("$@")
bon=$(tput bold)
boff=$(tput sgr0)
c=1
[ "${#urls[@]}" -eq 0 ] && { echo "Usage: $0 url [url ...]"; exit 1; }
 
clear
echo "Purging..."
echo
for u in "${urls[@]}"
do
     echo -n "${bon}${c}${boff}.${u}: "
     ## Get domain name ##
     d="$(echo "$u" | awk -F/ '{ print $3}')"
     ## Set API_KEY, Email_ID, and ZONE_ID as per domain ##
     case $d in
	     www.cyberciti.biz) zone_id="ID_1"; api_key="MY_KEY_1"; email_id="[email protected]";;
	     theos.in) zone_id="ID_2"; api_key="MY_KEY_2"; email_id="[email protected]";;
	     *) echo "Domain not configured."; continue;;
     esac
     ## Do it ##
     curl -X DELETE "https://api.cloudflare.com/client/v4/zones/${zone_id}/purge_cache" \
     -H "X-Auth-Email: ${email_id}" \
     -H "X-Auth-Key: ${api_key}" \
     -H "Content-Type: application/json" \
     --data "{\"files\":[\"${u}\"]}"
     echo
     (( c++ ))
done
echo
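
Before pointing the script at the real API, it can help to check the domain matching on its own. The sketch below is an assumption about how you might do that, not part of the original script: `zone_for_domain` is a hypothetical helper, ID_1 and ID_2 are the placeholder zone IDs from the script above, and example.org stands in for any domain that is not configured. No request is sent to Cloudflare.

```bash
# zone_for_domain is a hypothetical helper; ID_1 and ID_2 are the placeholder
# zone IDs used in the script above.
zone_for_domain() {
    case "$1" in
        www.cyberciti.biz) echo "ID_1" ;;
        theos.in)          echo "ID_2" ;;
        *)                 return 1 ;;
    esac
}

for u in 'https://www.cyberciti.biz/faq/copy-command/' 'https://example.org/page'
do
    ## Get domain name, same awk command as in the script ##
    d="$(echo "$u" | awk -F/ '{ print $3 }')"
    if zone_id="$(zone_for_domain "$d")"; then
        echo "$d -> zone $zone_id"
    else
        echo "$d -> Domain not configured."
    fi
done
```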


4 comments

  1. The pure bash version can be shortened to

    url="https://www.cyberciti.biz/faq/copy-command/"
    # Remove protocol
    url="${url#*://}"
    # Remove username and/or username:password
    url="${url#*@}"
    # Remove rest of url
    url=${url%%/*}
    # Show domain name only
    echo "$url"

  2. The awk version can be improved to remove any userid/password with

    awk -F/ '{sub("^[^@]+@","",$3); print $3}' <<<"$x"

  3. The sed version can be improved to remove any userid/password with
    sed -e 's|^[^/]*//||' -e '/^[^@]*@/s///' -e 's|/.*$||' <<<"$x"
