How do I use the Grep command with regular expressions under Linux operating systems?
Linux comes with GNU grep, which supports extended regular expressions. GNU grep is the default on most Linux distributions. The grep command searches files (or standard input) for lines that match a pattern, so you can use it to locate information stored anywhere on your server or workstation.
Regular Expressions
A regular expression is nothing but a pattern to match against each input line. A pattern is a sequence of characters. The following are all examples of patterns:
^w1
w1|w2
[^ ]
grep Regular Expressions Examples
Search for 'vivek' in /etc/passwd:
grep vivek /etc/passwd
Sample outputs:
vivek:x:1000:1000:Vivek Gite,,,:/home/vivek:/bin/bash
vivekgite:x:1001:1001::/home/vivekgite:/bin/sh
gitevivek:x:1002:1002::/home/gitevivek:/bin/sh
Search for the word vivek in any case (i.e. a case-insensitive search):
grep -i -w vivek /etc/passwd
Search for the word vivek or raj in any case:
grep -E -i -w 'vivek|raj' /etc/passwd
The PATTERN in the last example is used as an extended regular expression.
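To experiment with these options without touching the real /etc/passwd, here is a minimal sketch that runs the same searches against a temporary file. The account lines (including raj and root) are made-up sample data, and GNU grep is assumed.

```shell
# Sample data standing in for /etc/passwd (hypothetical accounts).
sample=$(mktemp)
printf '%s\n' \
  'vivek:x:1000:1000:Vivek Gite,,,:/home/vivek:/bin/bash' \
  'vivekgite:x:1001:1001::/home/vivekgite:/bin/sh' \
  'raj:x:1003:1003:Raj:/home/raj:/bin/bash' \
  'root:x:0:0:root:/root:/bin/bash' > "$sample"

# -c prints the number of matching lines instead of the lines themselves.
substring_hits=$(grep -c vivek "$sample")              # vivek anywhere in the line
word_hits=$(grep -c -i -w vivek "$sample")             # vivek as a whole word, any case
either_hits=$(grep -c -E -i -w 'vivek|raj' "$sample")  # whole word vivek or raj

echo "substring: $substring_hits, word: $word_hits, either: $either_hits"
rm -f "$sample"
```

The substring search also counts the vivekgite line, while -w only counts lines where vivek stands alone as a word.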
Anchors
You can use ^ and $ to force a regex to match only at the start or end of a line, respectively. The following example displays only lines starting with vivek:
grep ^vivek /etc/passwd
Sample outputs:
vivek:x:1000:1000:Vivek Gite,,,:/home/vivek:/bin/bash
vivekgite:x:1001:1001::/home/vivekgite:/bin/sh
You can display only lines starting with the whole word vivek, i.e. do not display vivekgite, vivekg, etc.:
grep -w ^vivek /etc/passwd
Find lines ending with foo:
grep 'foo$' filename
Match lines containing only foo:
grep '^foo$' filename
You can search for blank lines with the following example:
grep '^$' filename
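The anchor examples above can be tried on a small throwaway file. This is only a sketch: the five sample lines are invented for illustration.

```shell
# Hypothetical sample file: five lines, one of them empty.
f=$(mktemp)
printf '%s\n' 'foo' 'foo bar' 'bar foo' '' 'food' > "$f"

starts=$(grep -c '^foo' "$f")   # lines beginning with foo
ends=$(grep -c 'foo$' "$f")     # lines ending with foo
exact=$(grep -c '^foo$' "$f")   # lines that are exactly foo
blank=$(grep -c '^$' "$f")      # empty lines

echo "starts=$starts ends=$ends exact=$exact blank=$blank"
rm -f "$f"
```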
Character Class
Match Vivek or vivek:
grep '[vV]ivek' filename
OR
grep '[vV][iI][Vv][Ee][kK]' filename
You can also match digits (i.e. match vivek1 or Vivek2 etc):
grep -w '[vV]ivek[0-9]' filename
You can match two numeric digits (i.e. match foo11, foo12 etc):
grep 'foo[0-9][0-9]' filename
You are not limited to digits; you can also match lines containing at least one letter:
grep '[A-Za-z]' filename
Display all the lines containing either a "w" or "n" character (quote the pattern so the shell does not expand it as a filename wildcard):
grep '[wn]' filename
Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging to that class. Standard character class names are:
- [:alnum:] - Alphanumeric characters.
- [:alpha:] - Alphabetic characters
- [:blank:] - Blank characters: space and tab.
- [:digit:] - Digits: '0 1 2 3 4 5 6 7 8 9'.
- [:lower:] - Lower-case letters: 'a b c d e f g h i j k l m n o p q r s t u v w x y z'.
- [:space:] - Space characters: tab, newline, vertical tab, form feed, carriage return, and space.
- [:upper:] - Upper-case letters: 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z'.
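In this example, match all lines containing at least one upper-case letter. Note the double brackets: the class name [:upper:] must itself appear inside a bracket expression:
grep '[[:upper:]]' filename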
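A short sketch of bracket expressions and named classes, run against invented sample lines. With a single pair of brackets, '[:upper:]' would be read as a plain list of the characters ':', 'u', 'p', 'e' and 'r' (GNU grep is understood to reject that form with an error), which is why the class is written as '[[:upper:]]' here.

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' 'vivek' 'Vivek1' 'VIVEK' 'foo12' 'foo1' '12345' > "$f"

either_case=$(grep -c '[vV]ivek' "$f")        # vivek or Vivek
has_upper=$(grep -c '[[:upper:]]' "$f")       # at least one upper-case letter
two_digits=$(grep -c 'foo[0-9][0-9]' "$f")    # foo followed by two digits
no_letters=$(grep -c -v '[[:alpha:]]' "$f")   # -v inverts: lines with no letters

echo "$either_case $has_upper $two_digits $no_letters"
rm -f "$f"
```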
Wildcards
You can use the "." for a single character match. In this example, match all three-character words starting with "b" and ending in "t":
grep '\<b.t\>' filename
Where,
- \< Match the empty string at the beginning of word
- \> Match the empty string at the end of word.
Print all lines with exactly two characters:
grep '^..$' filename
Display any lines starting with a dot followed by a digit:
grep '^\.[0-9]' filename
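The wildcard examples can be checked with a few made-up lines. This sketch assumes GNU grep, since \< and \> are GNU extensions.

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' 'bat' 'a bit late' 'boot' 'ab' '.5 volts' 'x.5' > "$f"

three_letter=$(grep -c '\<b.t\>' "$f")   # bat and bit match, boot does not
two_chars=$(grep -c '^..$' "$f")         # only the line ab
dot_digit=$(grep -c '^\.[0-9]' "$f")     # only the line starting with .5

echo "$three_letter $two_chars $dot_digit"
rm -f "$f"
```

The line x.5 is not counted by the last pattern because the escaped dot must be the first character of the line.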
Escaping the dot
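The following regex to find an IP address 192.168.1.254 will not work as intended, because each unescaped dot matches any single character, so the pattern also matches strings such as 192a168b1c254: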
grep '192.168.1.254' /etc/hosts
All three dots need to be escaped:
grep '192\.168\.1\.254' /etc/hosts
The following example matches anything shaped like a dotted-quad IP address (it does not check that each number is in the range 0 to 255):
egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' filename
The following will match lines that start with linux or unix, in any case:
egrep -i '^(linux|unix)' filename
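To see the difference escaping makes, this sketch compares the unescaped pattern, the escaped pattern, and a fixed-string search (grep -F) on invented host lines. The shortened dotted-quad pattern is an equivalent rewrite of the one above, not a stricter one.

```shell
# Hypothetical hosts-style data; the second line has no dots at all.
f=$(mktemp)
printf '%s\n' \
  '192.168.1.254 gateway' \
  '192a168b1c254 bogus' \
  '10.0.0.1 host' > "$f"

loose=$(grep -c '192.168.1.254' "$f")        # unescaped dots match any character
strict=$(grep -c '192\.168\.1\.254' "$f")    # escaped dots match only a literal dot
fixed=$(grep -c -F '192.168.1.254' "$f")     # -F treats the pattern as a plain string
shaped=$(grep -c -E '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' "$f")  # dotted-quad shape

echo "loose=$loose strict=$strict fixed=$fixed shaped=$shaped"
rm -f "$f"
```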
How Do I Search a Pattern Which Has a Leading - Symbol?
Search for all lines matching '--test--' using the -e option. Without -e, grep would attempt to parse '--test--' as a list of options:
grep -e '--test--' filename
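A small sketch of two ways to pass a pattern that begins with a dash. The second form relies on the common convention that -- ends option parsing; the sample lines are invented.

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' '--test--' 'test' > "$f"

with_e=$(grep -c -e '--test--' "$f")     # -e marks the next argument as the pattern
with_ddash=$(grep -c -- '--test--' "$f") # -- ends option parsing

echo "$with_e $with_ddash"
rm -f "$f"
```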
How Do I do OR with grep?
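Use the following syntax. The -E option is needed because | is an alternation operator only in extended regular expressions; with plain GNU grep, write 'word1\|word2' instead:
grep -E 'word1|word2' filename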
How Do I do AND with grep?
Use the following syntax to display all lines that contain both 'word1' and 'word2':
grep 'word1' filename | grep 'word2'
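The OR and AND forms can be compared side by side. This is a sketch on invented lines, assuming GNU grep; the single-command AND pattern is one possible alternative, not the only one.

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' 'word1 only' 'word2 only' 'word1 and word2' 'neither' > "$f"

or_ere=$(grep -c -E 'word1|word2' "$f")          # OR with an extended regex
or_multi=$(grep -c -e word1 -e word2 "$f")       # OR with several -e patterns
and_pipe=$(grep 'word1' "$f" | grep -c 'word2')  # AND by chaining two greps
and_single=$(grep -c -E 'word1.*word2|word2.*word1' "$f")  # AND in one command

# Without -E the | is an ordinary character, so nothing matches here.
# grep exits non-zero when it finds no match, hence the "|| true".
literal_bar=$(grep -c 'word1|word2' "$f" || true)

echo "$or_ere $or_multi $and_pipe $and_single $literal_bar"
rm -f "$f"
```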
How Do I Test Sequence?
You can test how often a character must be repeated in sequence using the following syntax:
{N}
{N,}
{min,max}
Match the character "v" two times:
egrep "v{2}" filename
The following will match both "col" and "cool":
egrep 'co{1,2}l' filename
The following will match any line containing a run of at least three 'c' letters:
egrep 'c{3,}' filename
The following example will match a mobile number in the format 91-1234567890 (i.e. two digits, an optional space or hyphen, then ten digits):
grep "[[:digit:]]\{2\}[ -]\?[[:digit:]]\{10\}" filename
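The interval examples can be verified with a few invented lines. This sketch assumes GNU grep; the \? used in the basic-regex mobile-number pattern is a GNU extension.

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' 'col' 'cool' 'coool' 'v' 'vv' 'ccc' \
  '91-1234567890' '911234567890' '9-123' > "$f"

double_v=$(grep -c -E 'v{2}' "$f")       # vv only
col_cool=$(grep -c -E 'co{1,2}l' "$f")   # col and cool, but not coool
triple_c=$(grep -c -E 'c{3,}' "$f")      # ccc only
mobile=$(grep -c '[[:digit:]]\{2\}[ -]\?[[:digit:]]\{10\}' "$f")  # with or without separator

echo "$double_v $col_cool $triple_c $mobile"
rm -f "$f"
```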
How Do I Highlight with grep?
Use the following syntax:
grep --color regex filename
How Do I Show Only The Matches, Not The Lines?
Use the following syntax:
grep -o regex filename
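As a rough illustration of -o, the sketch below extracts every number from two invented log lines and joins the matches with commas. Each match is printed on its own line, so one input line can yield several output lines.

```shell
# Hypothetical log lines.
f=$(mktemp)
printf '%s\n' 'error 404 at 10:15' 'ok 200' > "$f"

matching_lines=$(grep -c -E '[0-9]+' "$f")             # lines that contain a number
matches=$(grep -o -E '[0-9]+' "$f" | paste -sd, -)     # every number, comma-joined

echo "lines: $matching_lines, matches: $matches"
rm -f "$f"
```

For highlighting, GNU grep also accepts --color=auto, which colors matches only when the output goes to a terminal.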
Regular Expression Operator
| Regex operator | Meaning |
|---|---|
| . | Matches any single character. |
| ? | The preceding item is optional and will be matched, at most, once. |
| * | The preceding item will be matched zero or more times. |
| + | The preceding item will be matched one or more times. |
| {N} | The preceding item is matched exactly N times. |
| {N,} | The preceding item is matched N or more times. |
| {N,M} | The preceding item is matched at least N times, but not more than M times. |
| - | Represents the range if it's not first or last in a list or the ending point of a range in a list. |
| ^ | Matches the empty string at the beginning of a line; also represents the characters not in the range of a list. |
| $ | Matches the empty string at the end of a line. |
| \b | Matches the empty string at the edge of a word. |
| \B | Matches the empty string provided it's not at the edge of a word. |
| \< | Match the empty string at the beginning of word. |
| \> | Match the empty string at the end of word. |
grep vs egrep
egrep is the same as grep -E. It interprets PATTERN as an extended regular expression. From the grep man page:
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.
GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
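The difference between basic and extended syntax is easiest to see by running one pattern both ways. This sketch uses invented lines and assumes GNU grep (the backslashed \+ in a basic regex is a GNU extension).

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' 'ab' 'aab' 'a+b' '{1}' > "$f"

bre_plus=$(grep -c 'a+b' "$f")        # basic: + is literal, matches only a+b
ere_plus=$(grep -c -E 'a+b' "$f")     # extended: one or more a, then b (ab, aab)
bre_escaped=$(grep -c 'a\+b' "$f")    # basic with \+ behaves like the extended form
brace=$(grep -c -E '[{]1' "$f")       # portable way to match a literal {

echo "$bre_plus $ere_plus $bre_escaped $brace"
rm -f "$f"
```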
References:
- man page grep and regex(7)
- info page grep
This FAQ entry is 2 of 7 in the "Linux / UNIX grep Command Tutorial" series.
42 comments
If you want to know the line number of each match, you can use the -n option.
cmd: grep -n printf *.c
This will show you every printf in the C files, with line numbers.
Sometimes we need the result the other way round, e.g. to search for all lines that don't have 'printf'.
cmd: grep -v printf *.c
This will show all lines that don't have printf.
HAPPY PROGRAMMING !!!!!!!!!
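A quick sketch of -n and -v on a throwaway C file. The file contents are invented for the example.

```shell
# Hypothetical five-line C source file.
src=$(mktemp)
printf '%s\n' \
  '#include <stdio.h>' \
  'int main(void) {' \
  '    printf("hi\n");' \
  '    return 0;' \
  '}' > "$src"

numbered=$(grep -n printf "$src")       # the matching line, prefixed with "3:"
without=$(grep -c -v printf "$src")     # number of lines that do not mention printf

echo "$numbered"
echo "lines without printf: $without"
rm -f "$src"
```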
Thx for the regex examples. Very useful.
There is also grep -F, formerly known as fgrep, which provides fixed-string matching and is faster.
Per the man page, use of fgrep and egrep is deprecated, and grep -F and grep -E should be used instead.
great summary!
Ok If i am tailing a firewall log with
tail -F /log/myfirewall.log |grep -i 135
I get results for port 135 but also 1352 for example, how do i use grep to only display port 135 and not 1352.
tail -f /log/myfirewall.log |grep -w '135'
Many thanks, Vivek, for your great post, but let me correct one command using wildcards. You typed:
grep '^\.[0-9]' filename
Display any lines starting with a dot and digit, but this is wrong, and the right one is the following:
grep -E '^\.|[0-9]' wildcards.txt
Thanks,
The above example "grep -E '^\.|[0-9]' wildcards.txt" is also not correct. This will match "a9b", which should not be matched.
The correct expression is: grep -E "^\.|^[0-9]" wildcards.txt
Note: the caret '^' at the beginning of a pattern is a line-start anchor. However, that is not all. Because of the OR symbol '|', each alternative is matched independently, so the second alternative "[0-9]" needs its own '^' prefix to ensure that a line that does not start with a dot must start with a digit.
Only thing I miss from other Unices is grepping for a metacharacter. For instance, in dg/ux to count the number of tabs in a document I could do a
grep -c \t
or a \n for newlines, a \f for page feeds, etc.
Apparently, this doesn’t work in Linux – I’ve changed those scripts to perl scripts.
To use Tabs, use \t as expected followed by a qualifier (ex. *, +, ?)
For example:
\t? will find 1 or no tabs. \t* will find 0 or more tabs.
Although I must say, this comment thread got me thinking to add the qualifier. Thanks to all who post ideas, questions, etc. so the rest of us can learn!!
David
Unfortunately, that seems not to work – at least in RHEL5
[tim@kyushu ~]$ cat testgrep
Test
T est
Test 1
T e s t
notatest
test 1
(All of those whitespaces are tabs)
[tim@kyushu ~]$ grep -e '\t?' testgrep
[tim@kyushu ~]$
Maybe it is upper-case 'E'? Just a shot in the dark.
David
-E returns… everything. Including the lines that absolutely have no tab in them.
[tim@kyushu ~]$ cat testgrep
Test
T est
Test 1
T e s t
notatest
test 1
thereisnotabhere
[tim@kyushu ~]$ grep -E '\t?' testgrep
Test
T est
Test 1
T e s t
notatest
test 1
thereisnotabhere
[tim@kyushu ~]$
Nice article.
One comment. You say:
> The following regex to find an IP address 192.168.1.254 will not work:
> grep '192.168.1.254' /etc/hosts
Actually, it *will* work; it will find the line you are looking for.
There’s just a small chance of matching other things, too.
Tim:
You can do this with GNU grep also. For newlines, just use quotes before and after, e.g.
grep -c '' filename
(of course you can accomplish the same thing with
wc -l filename :)
Tabs (and I assume formfeeds as well, though I haven’t tested it) can also be entered at the command line. Type Ctrl-V before hitting tab and you’ll get a literal tab instead of triggering filename autocompletion.
Vance -
The nl really isn’t a problem, because, as you pointed out, there are other ways around it. Tabs are what I was shooting for, and your solution works perfectly! Thanks very much…
– tim –
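For scripts, where typing Ctrl-V is not an option, one possible approach is to let printf produce the tab and pass it to grep in a variable. This is a sketch on invented data, not the method used by anyone in this thread. It also suggests why -E '\t?' returned every line: a pattern made entirely of an optional item can match the empty string, which is present on every line.

```shell
# Hypothetical sample: the second line has one tab, the third has three.
f=$(mktemp)
printf 'no tab here\nT\test\nT\te\ts\tt\n' > "$f"

tab=$(printf '\t')                                  # a literal tab character

lines_with_tab=$(grep -c "$tab" "$f")               # lines containing a tab
total_tabs=$(grep -o "$tab" "$f" | grep -c '')      # one output line per tab, then count

echo "lines: $lines_with_tab, tabs: $total_tabs"
rm -f "$f"
```

In bash the same tab can be written as $'\t', and GNU grep builds with Perl-compatible regex support are understood to accept grep -P '\t'; both are assumptions about your environment.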
Instead of:
egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' filename
I suggest:
egrep '([0-9]{1,3}\.){3}[0-9]{1,3}' filename
or better:
egrep '([1-9][0-9]{0,2}\.){3}[1-9][0-9]{0,2}' filename
Valid IP address range is 0.0.0.0 to 255.255.255.255. So, I suggest the following:-
egrep '[0-255]{1,3}\.[0-255]{1,3}\.[0-255]{1,3}' my_file.txt
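A caution on the last suggestion: inside brackets, [0-255] is a list of single characters (the range 0-2 plus 5), not the numeric range 0 to 255. One way to check each octet numerically is to spell out the alternatives, as in this sketch. The variable names octet and ip_re are made up for the example, and -x requires the whole line to match.

```shell
# Hypothetical candidate addresses, one per line.
f=$(mktemp)
printf '%s\n' '0.0.0.0' '255.255.255.255' '192.168.1.254' \
  '256.1.1.1' '999.999.999.999' '1.2.3' > "$f"

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
ip_re="${octet}(\.${octet}){3}"

valid=$(grep -c -E -x "$ip_re" "$f")    # only the first three lines qualify

# [0-255] matches exactly one character out of 0, 1, 2 and 5.
bracket_hits=$(printf '%s\n' 0 1 2 3 5 9 | grep -c '[0-255]')

echo "valid=$valid bracket_hits=$bracket_hits"
rm -f "$f"
```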
grep is very useful for analysing system resources, e.g.
ps auxw | grep mysql
The tail -f command can be piped to grep like this…
tail -f /var/log/mysql-slow.log | grep 'someTable'
Show the 10 lines after and before the selected word using -A 10 (after), -B 10 (before), or -C 10 (both after and before).
Other useful switches are:
-r, --recursive
-l, --files-with-matches
Shantanu, how can I get the line above my match?
Thanks
-B2 before context
Other useful options are:
-A2 after context
-C2 it will return 2 lines before and after context
for more:
man grep
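The context options are easy to try on a five-line file. A minimal sketch with invented data:

```shell
# Hypothetical five-line file.
f=$(mktemp)
printf '%s\n' one two three four five > "$f"

before=$(grep -B1 three "$f" | paste -sd, -)   # the match plus 1 line before
after=$(grep -A1 three "$f" | paste -sd, -)    # the match plus 1 line after
around=$(grep -C1 three "$f" | paste -sd, -)   # 1 line of context on both sides

echo "$before | $after | $around"
rm -f "$f"
```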
> How Do I do AND with grep?
>
> Use the following syntax to display all lines that contain both 'word1' and 'word2'
> $ grep 'word1' _filename_ | grep 'word2'
Thanks for the information.
However – why does the message at the top of the page have to keep changing? It means the text I am reading keeps bouncing up and down every few seconds, which is really annoying when you’re trying to read it!
What are you talking about?
How can I find all the rows that contain a certain string a given number of times?
How do i find a string using grep.
Say input file has
Vi_beaconen_h i_beaconen_h 0 PWL(
I want to print only " i_beaconen_h"
If I use
perl -lne '/ i/ and print' try.txt
It returns the whole line
If I use
grep -o ' i' try.txt
It returns only " i"
I want it to return " i_beaconen_h" [Or anything with i*]
I guess i m pretty new to perl and unix.
Mani !
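One possible answer, offered as a sketch: -o prints exactly what the pattern matched, so the pattern has to be extended to cover the rest of the token. Here [^ ]* takes every following character up to the next space.

```shell
# The input line from the question above.
line='Vi_beaconen_h i_beaconen_h 0 PWL('

# A space, then i, then everything up to the next space.
token=$(printf '%s\n' "$line" | grep -o ' i[^ ]*')

echo "[$token]"
```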
How do I display all the lines that have fewer than 9 characters?
> How do I display all the lines that have fewer than 9 characters?
Use the regexp feature below, with a preceding character expression
{n,m}
The preceding item is matched at least n times, but not more than m times.
eg.
^[\w\s]{0,8}$ will match rows of 0 to 8 word or space characters.
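As a concrete grep command for this, here is a sketch that uses a plain dot instead of [\w\s], since \w and \s inside a bracket expression are not portable, and the dot also counts punctuation. The sample lines are invented.

```shell
# Hypothetical sample lines: 5 characters, 9 characters, a long line, an empty line.
f=$(mktemp)
printf '%s\n' 'short' 'exactly9!' 'a much longer line' '' > "$f"

shorter_than_9=$(grep -c -E '^.{0,8}$' "$f")   # 0 to 8 characters of any kind
exactly_9=$(grep -c '^.\{9\}$' "$f")           # same idea in basic-regex syntax

echo "shorter: $shorter_than_9, exactly nine: $exactly_9"
rm -f "$f"
```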
Dear all,
I want to know how to grep an apache log file and save some details into a database,
say like, somebody access a url like http://site.com/test. so in that i wanted to save the access url time and from which ip, only this three details i wanted to save in mysql database.
Thanks In advance.
How would I search a file and print 4-letter words that start and end with the letter a?
Count all words that contain the four letter sequence A, then two more letters, and then another A?
Count all words that contain a letter, two letters, and then a repeat of the first letter?
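A hedged sketch for these three questions, assuming the words are already one per line (a file with several words per line would first need splitting, for example with tr). The last pattern uses a back-reference: \1 repeats whatever the first \( \) group matched. The word list is invented.

```shell
# Hypothetical word list, one word per line.
f=$(mktemp)
printf '%s\n' area aqua noon test abcd banana alpha bazaar > "$f"

# 4-letter words that start and end with a (-x: the whole line must match).
four_letter_a=$(grep -c -x 'a[[:alpha:]]\{2\}a' "$f")

# Words containing a, two more letters, then another a.
contains_a_xx_a=$(grep -c 'a[[:alpha:]]\{2\}a' "$f")

# Words containing a letter, two letters, then the first letter again.
repeats_first=$(grep -c '\([[:alpha:]]\)[[:alpha:]]\{2\}\1' "$f")

echo "$four_letter_a $contains_a_xx_a $repeats_first"
rm -f "$f"
```

Replacing -c with plain output prints the matching words instead of counting them.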
Thanks. Very useful information
pls just hlp me out with this question..
how will i Find all lines in a file with exactly 9 characters in them using grep command.
grep '^.\{9\}$' filename
How do I find the occurrence of the following pattern
[x,y] (in the square brackets), where x and y are one or more digits.
Meaning if there is a pattern [,8], it should not be displayed in the output
a='[12,111]'
echo "$a" | grep "\[[0-9][0-9]*,[0-9][0-9]*\]"
Had to do it this way in RHEL5 because of issues with some of the regular expressions, i.e. echo "$a" | grep "\[[0-9]+,[0-9]+\]" should work but doesn't, and echo "$a" | grep -e "\[[0-9]{1,}\,[0-9]{1,}\]" should work but doesn't…
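A likely explanation, offered as a sketch rather than a diagnosis of that RHEL5 system: + and {1,} are extended-regex operators, so they need grep -E (upper-case). Lower-case -e only introduces a pattern and does not change the syntax. In basic syntax the interval must be written \{1,\}.

```shell
# Extended syntax: + means "one or more". The closing ] needs no backslash,
# because ] is an ordinary character outside a bracket expression.
ere_ok=$(printf '%s\n' '[12,111]' | grep -c -E '\[[0-9]+,[0-9]+]')

# A missing first number must not match; "|| true" absorbs grep's
# non-zero exit status when nothing matches.
ere_bad=$(printf '%s\n' '[,8]' | grep -c -E '\[[0-9]+,[0-9]+]' || true)

# The same test in basic syntax, with backslashed interval braces.
bre_ok=$(printf '%s\n' '[12,111]' | grep -c '\[[0-9]\{1,\},[0-9]\{1,\}]')

echo "$ere_ok $ere_bad $bre_ok"
```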
Hi Guys,
I'm just a newbie with unix and am wondering if there's a way to grep a word in a vertical manner.
Example from a datafile:
a b c d e f g h
a b c g e f g h
a b c r e f g h
a b c e e f g h
a b c p e f g h
a b c d e f g h
On the third column from rows 2 to 5, the word ‘grep’ is formed vertically. Is there a way I can grep this or are there any other commands I could leverage?
Waiting for your expert advice : )
Hi,
It can be done in the following ways:
cat word.txt | cut -d' ' -f4 | grep '[g,r,e,p]'
g
r
e
p
awk '{print $4;}' word.txt | grep '[^d]'
g
r
e
p
cat word.txt | tr d ' ' | cut -f4 -d' '
g
r
e
p
Hi,
I have to validate a string against a regular expression for a date format 'YYYYMMddhhmmss'. I have tested the below code,
temp=`echo $file_timestamp | egrep '^(20)[0-9][0-9](0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])(0[0-9]|1[0-9]|2[0123])([0-5])[0-9]([0-5])[0-9]$'`;
The above returns the content of file_timestamp if it satisfies the pattern; otherwise the variable temp is left empty. Can anyone validate my understanding of the above snippet?
Thanks,
Sumit
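That reading can be checked with a small test harness. The sketch below wraps an equivalent pattern in a hypothetical helper, is_valid_timestamp, and uses grep's exit status (-q prints nothing, -x requires the whole line to match) instead of capturing output. Like the original, it checks only the shape of each field, so an impossible date such as 20240231 would still pass.

```shell
# Hypothetical helper: succeeds if $1 looks like YYYYMMDDhhmmss for a year 20xx.
is_valid_timestamp() {
  printf '%s\n' "$1" | grep -q -x -E \
    '20[0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])([01][0-9]|2[0-3])[0-5][0-9][0-5][0-9]'
}

# A valid stamp, a month of 13, and a stamp that is one digit short.
for ts in 20240229235959 20241301000000 2024010100000; do
  if is_valid_timestamp "$ts"; then
    echo "$ts: ok"
  else
    echo "$ts: rejected"
  fi
done
```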
Hi guys
I have to export data from hundreds of output files, and all the output files contain this information based on some rules.
It starts with ASM2_, then sometimes comes BSSE_ and sometimes not, then every time comes one of these: H3CO, BF3CO, BH3NH3, BF3NH3, BH3PH3, BH3, BF3, CO, NH3, PH3, then _, then one of these: HF, B3LYP, PW91, then /, and then one of these: 6-31G(d), 6-311G(d), 6-311++G(2d,p), and this is the end of the line.
One example would be
ASM2_BH3CO_HF/6-311++G(2d,p)
The problem is that these things will appear many times alone in the text, but just once in this order and as one line from start to end.
Can I do something about it with grep, or I would have to use something else?
Thx
Hi, does anyone know how I can use grep to only show word matches that start with c for example? Cuz I was thinking of using the wildcard “c*” but that wouldn’t work in grep since it uses regex which has a different meaning for *. So what I want to ask is: What is the regex equivalent of “c*”? Thanks in advance.
Jason, you can use the “word boundary” expression, which depending on what tool you’re using can be either \b or \<
* is a quantifier, so "c*" would match "zero, one or more 'c' characters". You need exactly one c followed by anything, that would be:
\bc.*
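Note that .* runs to the end of the line, so \bc.* matches from the first such word onward. To print only the words themselves, one option is to combine -o with a pattern that stops at the end of the word. A sketch assuming GNU grep (\b is a GNU extension) and an invented sentence:

```shell
# Hypothetical input line.
line='the cat ate a cucumber across town'

# \b anchors at a word edge, then c, then any run of word characters.
c_words=$(printf '%s\n' "$line" | grep -o '\bc[[:alnum:]_]*' | paste -sd, -)

echo "$c_words"
```

The word across is skipped because its c is not at the start of the word.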