awk

AWK is a pattern-scanning and text-processing language. It was created at Bell Labs in the 1970s, and its name comes from the surnames of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan.

SYNTAX: awk 'pattern { action }' [file]
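As a minimal sketch of how the pattern and action parts fit together, the following runs awk on three inline records. The sample data and the shell variable `out` are illustrative assumptions, not taken from any real file. A rule whose pattern is true for a record runs its action on that record; BEGIN and END are special patterns that run once before and after the input.

```shell
# Hypothetical input: a name and a score on each line.
out=$(printf 'alice 90\nbob 45\ncarol 72\n' | awk '
  BEGIN    { print "passing:" }     # runs once, before any input is read
  $2 >= 50 { print $1; n++ }        # runs only for records where the pattern is true
  END      { print n " passed" }    # runs once, after all input is read
')
echo "$out"
```

If a rule has a pattern but no action, awk prints the matching record; if it has an action but no pattern, the action runs for every record.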

Examples

Print specific columns from a file

awk '{print $1, $3}' filename

Extract and print specific fields from a CSV file

awk -F',' '{print "Name: " $1, "Salary: " $3}' data.csv

Print lines between two patterns

awk '/start_pattern/,/end_pattern/' filename

Reverse the order of columns, using a comma (,) as the input field separator

awk -F',' '{for(i=NF;i>=1;i--) printf "%s ", $i; print ""}' filename

Convert each run of four spaces to a tab in a file

awk '{gsub(/    /,"\t"); print}' input.txt > output.txt

Display lines with more than 3 fields

awk 'NF > 3' filename

Calculate and print total size of files in a directory

ls -l /path/to/directory | awk '{total += $5} END {print "Total Size:", total/1024, "KB"}'

Print lines where the 3rd column is greater than 50

awk '$3 > 50 {print $0}' filename

Print lines where the 4th column is blank

awk -F',' '$4 == ""' filename

Count the number of lines where the 4th column is blank

awk -F',' '$4 == ""' filename | wc -l

Print the first repeated line for each duplicated 4th-column value in a CSV

awk -F ',' '{if (++seen[$4] == 2) print}' filename

Count the 4th-column values that occur more than once in a CSV

awk -F ',' '{if (++seen[$4] == 2) print}' filename | wc -l

Count the number of rows in a file

awk 'END {print NR}' filename

Print lines matching a pattern

awk '/pattern/ {print $0}' filename

Print unique values in a column

awk '{print $1}' filename | sort | uniq

Show unique lines (without duplicates)

awk '!seen[$0]++' filename.csv

Identify duplicate lines

awk 'seen[$0]++' filename.csv
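The two `seen` one-liners above rely on the post-increment returning the old count: it is 0 (false) the first time a line appears and non-zero afterwards. A small sketch on hypothetical inline data (the variable names `lines`, `first_seen`, and `repeats` are illustrative):

```shell
# Hypothetical input with repeated lines.
lines='a
b
a
c
b
a'

# !seen[$0]++ is true only on the first occurrence of each line.
first_seen=$(printf '%s\n' "$lines" | awk '!seen[$0]++')

# seen[$0]++ is true on every occurrence after the first.
repeats=$(printf '%s\n' "$lines" | awk 'seen[$0]++')

echo "first occurrences: $first_seen"
echo "repeats: $repeats"
```

Both forms keep the input order, unlike `sort | uniq`, at the cost of holding every distinct line in memory.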

Replace text in a file

awk '{gsub(/old_text/, "new_text"); print}' filename

Format output

ps aux | awk '{printf "%-10s %-10s %-20s\n", $1, $2, $11}'

Extract information based on delimiter

awk -F: '{print "Username: " $1, "UID: " $3, "Shell: " $NF}' /etc/passwd

Extract and sum numeric values in a column

awk '{if ($2 ~ /^[0-9]+$/) sum += $2} END {print "Sum:", sum}' filename

Process files in some_directory, get first column, remove double quotes, sort, get unique, save to clean.csv

awk -F',' '{gsub(/"/, "", $1); print $1}' some_directory/* | sort | uniq > clean.csv

Identify unique lines in one file not present in another

awk 'FNR==NR {seen[$0]=1; next} !seen[$0]' product_test.csv ledger_test.csv > not_found_test.txt

Identify common lines in two files

awk 'FNR==NR {seen[$0]=1; next} seen[$0]' product_test.csv ledger_test.csv > found_test.txt
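The two-file commands above depend on `FNR==NR` being true only while the first file is read, because FNR restarts at 1 for each file while NR keeps counting. A minimal sketch, assuming two small hypothetical files created in a temporary directory (the file names and contents are made up for illustration):

```shell
# Hypothetical sample files in a temporary directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/first.txt"
printf 'banana\ncherry\ndate\n'  > "$tmp/second.txt"

# First file: FNR equals NR, so each line is remembered and "next" skips the second rule.
# Second file: FNR is smaller than NR, so only the second rule runs.
only_in_second=$(awk 'FNR==NR {seen[$0]=1; next} !seen[$0]' "$tmp/first.txt" "$tmp/second.txt")
in_both=$(awk 'FNR==NR {seen[$0]=1; next} seen[$0]' "$tmp/first.txt" "$tmp/second.txt")

echo "only in second file: $only_in_second"
echo "in both files: $in_both"
rm -r "$tmp"
```

Note that the output always comes from the second file named on the command line, so swapping the arguments reverses the direction of the comparison.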

Count rows whose 4th column is empty or contains only spaces

awk -F',' '$4 ~ /^ *$/ {count++} END {print count}' "test.csv"

Extract and sort fourth column, save to new file

awk -F',' '{print $4}' filename.csv | sort > new_file.txt

Extract, sort, and get unique values from fourth column

awk -F',' '{print $4}' filename.csv | sort | uniq > uniques.txt

Extract, sort, and get duplicate values from fourth column

awk -F',' '{print $4}' filename.csv | sort | uniq -d > duplicates.txt

Count blank values in the fourth column

awk -F',' '$4 ~ /^ *$/ {count++} END {print count}' filename.csv

Sum of the nth column (n is passed in with -v; here n=2)

awk -v n=2 '{ sum += $n } END { print sum }' data.txt

Maximum value of the nth column

awk -v n=2 'NR == 1 { max = $n } { if ($n > max) max = $n } END { print max }' data.txt

Minimum value of the nth column

awk -v n=2 'NR == 1 { min = $n } { if ($n < min) min = $n } END { print min }' data.txt

Average of the nth column

awk -v n=2 '{ sum += $n } END { if (NR > 0) print sum / NR }' data.txt
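The four column statistics above can also be computed in a single pass. This is a minimal sketch on hypothetical inline data; awk has no predefined variable `n`, so the column number is supplied with `-v n=2`, and the helper variable `v` is an illustrative name.

```shell
# Hypothetical data: a label in column 1 and a numeric value in column 2.
stats=$(printf 'a 4\nb 10\nc 1\n' | awk -v n=2 '
  { v = $n + 0 }                      # adding 0 forces a numeric value
  NR == 1 { min = max = v }           # seed min and max from the first record
  { sum += v
    if (v > max) max = v
    if (v < min) min = v }
  END { if (NR > 0) print sum, min, max, sum / NR }
')
echo "$stats"   # sum, min, max, average
```

The `NR > 0` guard avoids a division by zero when the input is empty.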

Expression Operators

Operation	Operators	Example	Meaning
assignment	= += -= *= /= %= ^=	x *= 2	x = x * 2
conditional	?:	x ? y : z	If x is true, then y; else z
logical OR	||	x || y	1 if x or y is true; 0 otherwise
logical AND	&&	x && y	1 if x and y are true; 0 otherwise
array membership	in	i in a	1 if a[i] exists; 0 otherwise
matching	~ !~	$1 ~ /x/	1 if the first field contains an x; 0 otherwise
relational	< <= > >= == !=	x == y	1 if x equals y; 0 otherwise
concatenation	–	"a" "bc"	"abc"; there is no explicit concatenation operator
add, subtract	+ -	x + y	Sum of x and y
multiply, divide, mod	* / %	x % y	Remainder of x divided by y
unary plus and minus	+ -	-x	Negative x
logical NOT	!	!$1	1 if $1 is zero or null; 0 otherwise
exponentiation	^	x ^ y	x^y
increment, decrement	++ --	++x, x++	Add 1 to x
field	$	$i + 1	Value of the ith field, plus 1
grouping	( )	($i)++	Add 1 to the value of the ith field
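A short sketch exercising a few of the operators from the table in a BEGIN block, so no input file is needed. All variable names (`x`, `parity`, `a`, `s`, and the shell variable `ops`) are illustrative.

```shell
ops=$(awk 'BEGIN {
  x = 7
  x *= 2                                    # assignment operator: x is now 14
  parity = (x % 2 == 0) ? "even" : "odd"    # conditional operator
  a["k"] = 1
  has_k = ("k" in a)                        # array membership: 1
  has_z = ("z" in a)                        # 0, and the test does not create a["z"]
  s = "a" "bc"                              # concatenation is written by juxtaposition
  print x, parity, has_k, has_z, s, 2 ^ 10
}')
echo "$ops"
```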

Built-In Variables

Variable	Meaning	Default
ARGC	Number of command line arguments	–
ARGV	Array of command line arguments	–
FILENAME	Name of current input file	–
FNR	Record number in current file	–
FS	Controls the input field separator	one space
NF	Number of fields in current record	–
NR	Number of records read so far	–
OFMT	Output format for numbers	%.6g
OFS	Output field separator	one space
ORS	Output record separator	\n
RLENGTH	Length of string matched by match function	–
RS	Controls the input record separator	\n
RSTART	Start of string matched by match function	–
SUBSEP	Subscript separator	\034
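As a small illustration of the variables above, this sketch sets FS and OFS in a BEGIN block and prints NR, NF, the first field, and the last field of each record. The two colon-separated input lines are hypothetical, loosely modeled on /etc/passwd.

```shell
# Hypothetical colon-separated records.
vars=$(printf 'root:x:0\ndaemon:x:1\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
echo "$vars"
```

The commas in `print` are what insert OFS between the values; concatenating with spaces instead would ignore OFS.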

Expression Metacharacters

Character	Description
\	Used in an escape sequence to match a special symbol (e.g., \t matches a tab and \* matches * literally)
^	Matches the beginning of a string
$	Matches the end of a string
.	Matches any single character
[ABDU]	Matches either character A, B, D, or U; may include ranges like [a-eB-R]
A|B	Matches A or B
DF	Matches D immediately followed by an F
R*	Matches zero or more Rs
R+	Matches one or more Rs
R?	Matches a null string or R
NR==10, NR==25	Range pattern (not a regular expression): matches every record from the 10th through the 25th
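The following sketch tries a few of these metacharacters, plus a range pattern, against a hypothetical word list. The words and the shell variable names are made up for illustration.

```shell
# Hypothetical word list, one word per line.
words='color
colour
colouur
cat
dog'

# ? makes the preceding u optional; ^ and $ anchor the match to the whole record.
optional=$(printf '%s\n' "$words" | awk '/^colou?r$/')

# + requires one or more of the preceding u.
plus=$(printf '%s\n' "$words" | awk '/^colou+r$/')

# | separates alternatives; parentheses group them.
alt=$(printf '%s\n' "$words" | awk '/^(cat|dog)$/')

# A range pattern selects records by position instead of by content.
range=$(printf '%s\n' "$words" | awk 'NR==2, NR==3')

printf '%s\n' "$optional" "$plus" "$alt" "$range"
```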

Escape Sequences

Sequence	Description
\b	Backspace
\f	Form feed
\n	Newline (line feed)
\r	Carriage return
\t	Tab
\ddd	Octal value ddd, where ddd is 1 to 3 digits between 0 and 7
\c	Any other character literally (e.g., \\ for backslash, \" for ", \* for *, and so on)
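A one-line sketch showing several of these escapes inside an awk string constant. The string itself is arbitrary; `\101` is the octal code for the letter A.

```shell
# Tab, literal backslash, literal double quote, and an octal escape in one string.
esc=$(awk 'BEGIN { print "a\tb\\c\"d\101" }')
echo "$esc"
```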

Comparison Operators

Operator	Description
<	Less than
<=	Less than or equal to
==	Equal to
!=	Not equal to
>=	Greater than or equal to
>	Greater than
~	Matched by (used when comparing strings)
!~	Not matched by (used when comparing strings)
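One detail worth illustrating is that awk picks numeric or string comparison based on its operands. In this sketch, on a hypothetical input line `10 9`, the two fields look numeric and are compared as numbers, while the two string constants are compared character by character.

```shell
cmp=$(echo '10 9' | awk '{
  print ($1 > $2)            # fields that look numeric compare as numbers: 1
  print ("10" > "9")         # string constants compare as strings: 0
  print ($1 ~ /^[0-9]+$/)    # ~ tests for a regular expression match: 1
}')
echo "$cmp"
```

The parentheses matter here: without them, `>` after `print` would be read as output redirection to a file.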

Built-In String Functions

Variable	Meaning
r	Represents a regular expression
s and t	Represent string expressions
n and p	Integers

Function	Description
gsub(r,s)	Substitute s for r globally in $0; return number of substitutions made
gsub(r,s,t)	Substitute s for r globally in string t; return number of substitutions made
index(s,t)	Return the first position of string t in s, or 0 if t is not present
length(s)	Return the number of characters in s
match(s,r)	Test whether s contains a substring matched by r; return index or 0; sets RSTART and RLENGTH
split(s,a)	Split s into array 'a' on FS; return the number of fields
split(s,a,fs)	Split s into array 'a' on the field separator fs; return the number of fields
sprintf(fmt,expr-list)	Return expr-list formatted according to the format string fmt
sub(r,s)	Substitute s for the leftmost longest substring of $0 matched by r; return the number of substitutions made
sub(r,s,t)	Substitute s for the leftmost longest substring of t matched by r; return the number of substitutions made
substr(s,p)	Return the suffix of s starting at position p
substr(s,p,n)	Return the substring of s of length n starting at position p
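A brief sketch calling most of the functions in the table on a hypothetical string, with the expected value noted beside each call. The variable names are illustrative.

```shell
str=$(awk 'BEGIN {
  s = "hello world"
  print length(s)                               # 11
  print index(s, "wor")                         # 7
  print substr(s, 7)                            # world
  print substr(s, 1, 5)                         # hello
  if (match(s, /o+/)) print RSTART, RLENGTH     # 5 1
  n = split("a,b,c", parts, ",")
  print n, parts[3]                             # 3 c
  t = s
  count = gsub(/o/, "0", t)                     # modifies t in place, returns the count
  print count, t                                # 2 hell0 w0rld
  print sprintf("%03d", 7)                      # 007
}')
echo "$str"
```

Positions are 1-based throughout, and `sub` and `gsub` change their target in place and return the number of substitutions, not the new string.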

References:
The GNU Awk User's Guide. (n.d.). Retrieved from https://www.gnu.org/software/gawk/manual/gawk.html
Hayes, M. (n.d.). Quick Tip: Use our AWK cheat sheets to quickly and easily manipulate UNIX data. Retrieved from https://www.techrepublic.com/article/quick-tip-use-our-awk-cheat-sheets-to-quickly-and-easily-manipulate-unix-data/

Last Updated: July 11, 2022