Grep

The grep command is a handy little search utility that is simple to use yet very powerful. The classic use case is to search the files in the current directory for a piece of text, for example grep drupal *, meaning search all the files in the current directory for the word "drupal". You can change * to *.txt or *.php to narrow the search, and there are some other options which I find useful:

  • -r meaning search all sub-directories, a recursive search
  • -i which means ignore the case, do a case-insensitive search
  • -c just count the number of matching lines in each file (not individual occurrences); this can return a lot of files with a count of zero (see below)
  • -H show filenames; this is the default when there is more than one file, but it is handy when there is only one, which is sometimes the case when grep is used in the -exec of the find command
  • --color means highlight in colour the matches on the line
  • -v invert results, returns lines that do not match
  • -n shows line numbers
  • -A x shows x lines after the matching line
  • -B x shows x lines before the matching line
  • -C x is a shortcut for -A x -B x, for when you want the same number of lines before and after
  • -E allows extended regular expressions
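
As a rough illustration of how some of these options combine, the sketch below builds a throwaway file and searches it. The file name notes.txt and its contents are made up for the example; only the grep options come from the list above.

```shell
# Build a small throwaway file so the example is self-contained.
# (notes.txt and its contents are hypothetical sample data.)
demo_dir=$(mktemp -d)
printf 'alpha\nbeta\nDrupal core and drupal modules\ngamma\ndelta\n' > "$demo_dir/notes.txt"

# -i ignores case, -n prefixes line numbers, -C 1 adds one line of
# context either side. Matching lines use ':' after the line number,
# context lines use '-'.
context=$(grep -in -C 1 'drupal' "$demo_dir/notes.txt")
printf '%s\n' "$context"

# -c counts matching lines, so a line containing the word twice
# still counts once.
count=$(grep -ic 'drupal' "$demo_dir/notes.txt")
printf 'matching lines: %s\n' "$count"

rm -r "$demo_dir"
```

The count is 1 even though "drupal" appears twice in the file, because both occurrences are on the same line.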

Clearly you can put these options together to do some handy things, and you can pipe the output into another grep, using the first to get what you need and the second to filter out what you don't. One thing to watch for: a glob such as *.txt is expanded by the shell before grep runs, so when combined with -r it only works if there are .txt files in the current directory, and files in sub-directories are never matched by it. To work around this, use something like grep -ri --include '*.java' 'hello world' . (watch out for the vital . at the end), which does a case-insensitive search for 'hello world' in all Java source files, recursively from the current directory.
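
The difference between a shell glob and --include can be sketched as follows. The directory layout and file contents are hypothetical, and the behaviour of the unmatched glob assumes a POSIX-style shell such as bash or dash with default settings (zsh, for instance, reports an error instead).

```shell
# Hypothetical layout: the only .java file lives two levels down.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/app"
printf 'System.out.println("Hello World");\n' > "$demo_dir/src/app/Main.java"

# Unquoted glob: the shell finds no *.java in the current directory,
# so grep is handed the literal name '*.java' and finds nothing.
unquoted=$(cd "$demo_dir" && grep -ri 'hello world' *.java 2>/dev/null) || true

# --include: grep applies the pattern to file names itself while it
# walks the tree from '.', so the nested file is found.
matches=$(cd "$demo_dir" && grep -ri --include '*.java' 'hello world' .)
printf '%s\n' "$matches"

rm -r "$demo_dir"
```

Quoting the pattern passed to --include matters for the same reason: it stops the shell expanding it before grep sees it.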

With extended regular expressions enabled via -E you can search for 'NAME|DESCRIPTION', meaning match NAME or DESCRIPTION. Without -E you would need to escape the vertical bar and use 'NAME\|DESCRIPTION', which is a GNU extension rather than standard behaviour. You can use egrep instead of grep -E, as they do the same thing, but note that egrep is deprecated and recent versions of GNU grep print a warning when it is used, so grep -E is the safer choice. See Regular Expressions for more detail on regular expressions.
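
A small sketch of the two spellings side by side, using made-up input lines rather than real files. It assumes GNU grep, since \| in a basic regular expression is an extension.

```shell
# Hypothetical sample input: four section headings, one per line.
sample=$(printf 'NAME\nSYNOPSIS\nDESCRIPTION\nOPTIONS\n')

# Extended syntax: a bare | means "or".
extended=$(printf '%s\n' "$sample" | grep -E 'NAME|DESCRIPTION')

# Basic syntax: the bar must be escaped (GNU extension).
basic=$(printf '%s\n' "$sample" | grep 'NAME\|DESCRIPTION')

printf '%s\n' "$extended"
```

Both commands select the same two lines, NAME and DESCRIPTION.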

Examples

  • grep -rP --include '*.java' '\t' . - search all Java files below the current directory for tab characters; -P (Perl-compatible regular expressions, GNU grep only) is needed because plain grep does not treat '\t' as a tab
  • grep -ric --include '*.txt' 'hello world' . - search for 'hello world', not case sensitively, in all .txt files in the current directory or any sub-directory and show a count of matching lines per file
  • grep -iv 'warn' error_log.txt - display an error log file but exclude all lines containing "warn"; this is not perfect, as it could also exclude real errors that happen to contain that text
  • grep -ric "some text" . | grep -v ":0$" - search the current directory and all sub-directories for the text and count the matching lines in each file, then exclude files with no matches, which is any output line ending with ":0"
  • ls -l | grep '^d' - list files in long format and pipe the output into grep, where we search for lines beginning with 'd', which are the directories
  • ls -l | grep -v '^d' - list files in long format and pipe the output into grep, where we search for lines that do not begin with 'd', so everything except directories
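
To make the count-then-filter idea concrete, here is a minimal sketch against two throwaway files. The names a.txt and b.txt and their contents are invented for the example.

```shell
# Hypothetical sample data: one file with two matching lines,
# one file with none.
demo_dir=$(mktemp -d)
printf 'some text here\nmore Some Text\n' > "$demo_dir/a.txt"
printf 'nothing relevant\n' > "$demo_dir/b.txt"

# The first grep prints "file:count" for every file it visits;
# the second drops the lines whose count is zero.
nonzero=$(cd "$demo_dir" && grep -ric 'some text' . | grep -v ':0$')
printf '%s\n' "$nonzero"

rm -r "$demo_dir"
```

Only ./a.txt:2 survives the second grep; the line for b.txt ends in ":0" and is filtered out.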

A nice use of the "extended regular expression" option (or the egrep command) is with the lscpu command. First consider the regular output of the command, which is typically something like the following, but with more information.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       36 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Stepping:            3
CPU MHz:             4008.000
CPU max MHz:         4008.0000
BogoMIPS:            8016.00

Let's suppose we just want a quick summary, so all we want to see is the following:

CPU(s):              8
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1

This can be done with either of the following commands:
lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
lscpu | egrep '^Thread|^Core|^Socket|^CPU\('
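
The repeated ^ can also be factored out with a group, which some may find easier to read. The sketch below assumes the output format shown above and feeds a few of those lines in as sample text instead of calling lscpu, so that it behaves the same on any machine.

```shell
# A few lines copied from the sample lscpu output above.
lscpu_sample='CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
CPU family:          6
CPU MHz:             4008.000'

# One anchor, then a group of alternatives. The escaped \( keeps
# "CPU(s):" while excluding "CPU family" and "CPU MHz".
summary=$(printf '%s\n' "$lscpu_sample" | grep -E '^(Thread|Core|Socket|CPU\()')
printf '%s\n' "$summary"
```

On a real system the same pattern would be used as lscpu | grep -E '^(Thread|Core|Socket|CPU\()'.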