awk

AWK is a programming language designed for text processing. It was created at Bell Labs in the 1970s. The name AWK comes from the surnames of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan.

SYNTAX: awk 'pattern { action }' [file]
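
As a minimal sketch of the syntax above: when the action is omitted, awk prints each matching line; when the pattern is omitted, the action runs on every line; an END block runs once after the last line. The sample input and the variable names are made up for illustration.

```shell
# Hypothetical sample input, made up for illustration
input='alice 30
bob 25
carol 41'

# Pattern only: the default action prints the matching line
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 28')

# Action only: runs for every line
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Pattern with an action, plus an END block that runs after the last line
both=$(printf '%s\n' "$input" | awk '/^b/ { n++ } END { print n " match" }')

echo "$pattern_only"
echo "$action_only"
echo "$both"
```

The first command prints the alice and carol lines, the second prints the three names, and the third prints "1 match".
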
Examples
Print specific columns from a file
awk '{print $1, $3}' filename
Extract and print specific fields from csv
awk -F',' '{print "Name: " $1, "Salary: " $3}' data.csv
Print lines between two patterns
awk '/start_pattern/,/end_pattern/' filename
Reverse the order of columns, using a comma (,) as the input field separator
awk -F',' '{for(i=NF;i>=1;i--) printf "%s ", $i; print ""}' filename
Convert each run of four spaces to a tab in a file
awk '{gsub(/    /,"\t"); print}' input.txt > output.txt
Display lines with more than 3 fields
awk 'NF > 3' filename
Calculate and print total size of files in a directory
ls -l /path/to/directory | awk '{total += $5} END {print "Total Size: ", total/1024, "KB"}'
Print lines where the 3rd column is greater than 50
awk '$3 > 50 {print $0}' filename
Print lines where the 4th column is blank
awk -F',' '$4 == ""' filename
Count the number of lines where the 4th column is blank
awk -F',' '$4 == ""' filename | wc -l
Print one line for each 4th-column value that occurs more than once in a CSV (the second occurrence is printed)
awk -F',' '{if (++seen[$4] == 2) print}' filename
Count the 4th-column values that occur more than once in a CSV
awk -F',' '{if (++seen[$4] == 2) print}' filename | wc -l
Count the number of rows in a file
awk 'END {print NR}' filename
Print lines matching a pattern
awk '/pattern/ {print $0}' filename
Print unique values in a column
awk '{print $1}' filename | sort | uniq
Show unique lines (without duplicates)
awk '!seen[$0]++' filename.csv
Identify duplicate lines
awk 'seen[$0]++' filename.csv
Replace text in a file
awk '{gsub(/old_text/, "new_text"); print}' filename
Format output
ps aux | awk '{printf "%-10s %-10s %-20s\n", $1, $2, $11}'
Extract information based on a delimiter
awk -F: '{print "Username: " $1, "UID: " $3, "Shell: " $NF}' /etc/passwd
Extract and sum numeric values in a column
awk '{if ($2 ~ /^[0-9]+$/) sum += $2} END {print "Sum: ", sum}' filename
Process files in some_directory, get first column, remove double quotes, sort, get unique, save to clean.csv
awk -F',' '{gsub(/"/, "", $1); print $1}' some_directory/* | sort | uniq > clean.csv
Identify unique lines in one file not present in another
awk 'FNR==NR {seen[$0]=1; next} !seen[$0]' product_test.csv ledger_test.csv > not_found_test.txt
Identify common lines in two files
awk 'FNR==NR {seen[$0]=1; next} seen[$0]' product_test.csv ledger_test.csv > found_test.txt
Count lines in a CSV whose 4th column is empty or contains only spaces
awk -F',' '$4 ~ /^ *$/ {count++} END {print count + 0}' "test.csv"
Extract and sort fourth column, save to new file
awk -F',' '{print $4}' filename.csv | sort > new_file.txt
Extract, sort, and get unique values from fourth column
awk -F',' '{print $4}' filename.csv | sort | uniq > uniques.txt
Extract, sort, and get duplicate values from fourth column
awk -F',' '{print $4}' filename.csv | sort | uniq -d > duplicates.txt
Count blank values in the fourth column (prints 0 when there are none)
awk -F',' '$4 ~ /^ *$/ {count++} END {print count + 0}' filename.csv
Sum of the nth column (set n with -v, here n=2; an unset n would make $n mean $0, the whole line)
awk -v n=2 '{ sum += $n } END { print sum }' data.txt
Maximum value of the nth column
awk -v n=2 'NR == 1 { max = $n } { if ($n > max) max = $n } END { print max }' data.txt
Minimum value of the nth column
awk -v n=2 'NR == 1 { min = $n } { if ($n < min) min = $n } END { print min }' data.txt
Average of the nth column
awk -v n=2 '{ sum += $n } END { if (NR > 0) print sum / NR }' data.txt
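
The four column statistics above can also be gathered in a single pass. This is a minimal sketch, assuming the column number is passed in as the variable n with -v (if n is left unset, $n refers to $0, the whole line). The sample data and the variable names are made up for illustration.

```shell
# Hypothetical sample data: a label column and a numeric column
data='a 10
b 4
c 25
d 1'

stats=$(printf '%s\n' "$data" | awk -v n=2 '
    { v = $n + 0 }                      # force a numeric value for the nth field
    NR == 1 { min = max = v }           # seed min and max from the first record
    {
        sum += v
        if (v < min) min = v
        if (v > max) max = v
    }
    END { if (NR > 0) printf "sum=%d min=%d max=%d avg=%.2f\n", sum, min, max, sum / NR }
')
echo "$stats"
```

With this input the command prints "sum=40 min=1 max=25 avg=10.00". The NR > 0 guard avoids a division by zero on empty input.
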

Expression Operators

Operation              Operators            Example    Meaning
assignment             = += -= *= /= %= ^=  x *= 2     x = x * 2
conditional            ?:                   x ? y : z  If x is true, then y; else z
logical OR             ||                   x || y     1 if x or y is true; 0 otherwise
logical AND            &&                   x && y     1 if x and y are true; 0 otherwise
array membership       in                   i in a     1 if a[i] exists; 0 otherwise
matching               ~ !~                 $1 ~ /x/   1 if the first field contains an x; 0 otherwise
relational             < <= > >= == !=      x == y     1 if x equals y; 0 otherwise
concatenation          (none)               "a" "bc"   "abc"; there is no explicit concatenation operator
add, subtract          + -                  x + y      Sum of x and y
multiply, divide, mod  * / %                x % y      Remainder of x divided by y
unary plus and minus   + -                  -x         Negative x
logical NOT            !                    !$1        1 if $1 is zero or null; 0 otherwise
exponentiation         ^                    x ^ y      x raised to the power y
increment, decrement   ++ --                ++x, x++   Add 1 to x
field                  $                    $i + 1     Value of the ith field, plus 1
grouping               ( )                  ($i)++     Add 1 to the value of the ith field
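
A few of the operators above, exercised in a BEGIN block so that no input file is needed. This is only an illustrative sketch; the variable names are made up.

```shell
result=$(awk 'BEGIN {
    x = 7
    parity = (x % 2 == 0) ? "even" : "odd"   # conditional operator
    label = "x" "=" x                        # concatenation by juxtaposition
    seen["a"] = 1
    has_a = ("a" in seen)                    # array membership: 1 or 0
    has_b = ("b" in seen)
    match_x = ("box" ~ /x/)                  # regex match: 1 or 0
    print label, parity, has_a, has_b, match_x, 2 ^ 3
}')
echo "$result"
```

This prints "x=7 odd 1 0 1 8". Note that testing "b" in seen does not create seen["b"], whereas referring to seen["b"] directly would.
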

Built-In Variables

Variable  Meaning                                     Default
ARGC      Number of command line arguments
ARGV      Array of command line arguments
FILENAME  Name of current input file
FNR       Record number in current file
FS        Controls the input field separator          one space
NF        Number of fields in current record
NR        Number of records read so far
OFMT      Output format for numbers                   %.6g
OFS       Output field separator                      one space
ORS       Output record separator                     \n
RLENGTH   Length of string matched by match function
RS        Controls the input record separator         \n
RSTART    Start of string matched by match function
SUBSEP    Subscript separator                         \034
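
To see how NR, FNR, FILENAME, NF, and OFS behave across more than one input file, here is a small sketch. The two throwaway files and their contents are made up for illustration.

```shell
# Two small throwaway files (names and contents are made up)
tmp=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmp/one.txt"
printf 'f\n' > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file
vars=$(cd "$tmp" && awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF }' one.txt two.txt)
echo "$vars"
rm -r "$tmp"
```

The output is three lines: "one.txt|1|1|2", "one.txt|2|2|3", and "two.txt|3|1|1". The condition FNR==NR used in the two-file examples earlier is therefore true only while the first file is being read.
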

Expression Metacharacters

Character       Description
\               Used in an escape sequence to match a special symbol (e.g., \t matches a tab and \* matches * literally)
^               Matches the beginning of a string
$               Matches the end of a string
.               Matches any single character
[ABDU]          Matches either character A, B, D, or U; may include ranges like [a-eB-R]
A|B             Matches A or B
DF              Matches D immediately followed by an F
R*              Matches zero or more Rs
R+              Matches one or more Rs
R?              Matches a null string or R
NR==10, NR==25  Range pattern (not a regular expression): matches all records from the 10th through the 25th
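
A short sketch of several of the entries above applied to a made-up word list. Each pattern is anchored with ^ and $ so that it must match the whole line.

```shell
# Hypothetical word list, one word per line
words='cat
caat
ct
dog
c.t'

star=$(printf '%s\n' "$words" | awk '/^ca*t$/')        # zero or more a
plus=$(printf '%s\n' "$words" | awk '/^ca+t$/')        # one or more a
dot=$(printf '%s\n' "$words" | awk '/^c\.t$/')         # escaped dot matches a literal dot
alt=$(printf '%s\n' "$words" | awk '/^dog$|^ct$/')     # alternation
range=$(printf '%s\n' "$words" | awk 'NR==2, NR==3')   # range pattern: records 2 through 3

printf '%s\n' "$star" "$plus" "$dot" "$alt" "$range"
```

Here star selects cat, caat, and ct; plus selects cat and caat; dot selects only c.t; alt selects ct and dog; range selects caat and ct.
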

Escape Sequence

\b    Backspace
\f    Form feed
\n    Newline (line feed)
\r    Carriage return
\t    Tab
\ddd  Octal value ddd, where ddd is 1 to 3 digits between 0 and 7
\c    Any other character literally (e.g., \\ for backslash, \" for ", \* for *, and so on)

Comparison Operators

Operator  Description
<         Less than
<=        Less than or equal to
==        Equal to
!=        Not equal to
>=        Greater than or equal to
>         Greater than
~         Matched by (used when comparing strings)
!~        Not matched by (used when comparing strings)

Built-In String Functions

Variable  Meaning
r         Represents a regular expression
s and t   Represent string expressions
n and p   Integers

Function                Description
gsub(r,s)               Substitute s for r globally in $0; return number of substitutions made
gsub(r,s,t)             Substitute s for r globally in string t; return number of substitutions made
index(s,t)              Return the first position of string t in s, or 0 if t is not present
length(s)               Return the number of characters in s
match(s,r)              Test whether s contains a substring matched by r; return index or 0; sets RSTART and RLENGTH
split(s,a)              Split s into array 'a' on FS; return the number of fields
split(s,a,fs)           Split s into array 'a' on the field separator fs; return the number of fields
sprintf(fmt,expr-list)  Return expr-list formatted according to the format string fmt
sub(r,s)                Substitute s for the leftmost longest substring of $0 matched by r; return the number of substitutions made
sub(r,s,t)              Substitute s for the leftmost longest substring of t matched by r; return the number of substitutions made
substr(s,p)             Return the suffix of s starting at position p
substr(s,p,n)           Return the substring of s of length n starting at position p

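
Several of the functions above applied to one made-up string, as an illustrative sketch. Positions are 1-based, and gsub modifies its third argument in place, so the string is copied to t first.

```shell
out=$(awk 'BEGIN {
    s = "1999-12-31"
    n = split(s, parts, "-")           # parts[1]="1999", parts[2]="12", parts[3]="31"
    pos = index(s, "-")                # position of the first "-"
    year = substr(s, 1, 4)             # 4 characters starting at position 1
    found = match(s, /[0-9][0-9]$/)    # also sets RSTART and RLENGTH
    t = s
    count = gsub(/-/, "/", t)          # replaces every "-" in t, returns how many
    print n, pos, year, found, RSTART, RLENGTH, count, t, length(s)
}')
echo "$out"
```

This prints "3 5 1999 9 9 2 2 1999/12/31 10".
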
References:
The GNU Awk User's Guide. (n.d.). Retrieved from https://www.gnu.org/software/gawk/manual/gawk.html
Hayes, M. (n.d.). Quick Tip: Use our AWK cheat sheets to quickly and easily manipulate UNIX data. Retrieved from https://www.techrepublic.com/article/quick-tip-use-our-awk-cheat-sheets-to-quickly-and-easily-manipulate-unix-data/

Last Updated: July 11, 2022