My input file has three columns, as shown below.
Input file:
water 123 wa
water 123 at
water 123 te
water 123 er
rater 347 ra
rater 347 at
rater 347 te
rater 347 er
Now I want my output file to look like the one below, in which the frequency of each bigram is listed after it in a new column.
Output file:
water 123 wa 1
water 123 at 2
water 123 te 2
water 123 er 2
rater 347 ra 1
rater 347 at 2
rater 347 te 2
rater 347 er 2
I tried the command below, but unfortunately I did not get the desired result:
$ awk 'BEGIN {FS="\t"} {for (i=1; i<=NF; i++) count[$3]++}
END {for (word in count) printf "%s\t%s\t%s\t%d\n", $1, $2, word, count[word]}' \
INPUT_FILE
You seem to expect for (i=1; i<=NF; i++) to look at the file one line at a time. That’s wrong; awk automatically looks at the file one line at a time, and executes (or at least considers) every statement other than the BEGIN and END statements. for (i=1; i<=NF; i++) looks at the current line one field at a time. If you just did {count[$3]++} you’d have a good start, but, when you got to the END, you would no longer have access to the $1 and $2 values. – Scott - Слава Україні Sep 06 '14 at 19:35
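Following the comment above, one possible approach is to read the file twice: the first pass only counts the bigrams in column 3, and the second pass prints each line with its count appended, so $1 and $2 are still available when printing. This is a minimal sketch, not necessarily the only or best way. It assumes the file is tab-separated (as the FS="\t" in the question suggests), and the file names input.txt and output.txt are placeholders chosen for illustration.

```shell
# Recreate the sample input from the question (tab-separated).
# input.txt is a placeholder name, not from the original post.
printf '%s\t%s\t%s\n' \
  water 123 wa  water 123 at  water 123 te  water 123 er \
  rater 347 ra  rater 347 at  rater 347 te  rater 347 er > input.txt

# The same file is given twice on the command line.
# Pass 1 (NR == FNR is true only while reading the first copy):
#   count each bigram in column 3, then skip to the next line.
# Pass 2: print the whole line followed by the count of its bigram.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { count[$3]++; next }
     { print $0, count[$3] }' input.txt input.txt > output.txt

cat output.txt
```

If reading the file twice is not an option (for example, when the input comes from a pipe), an alternative would be to store every line in an array during a single pass and print them all from the END block.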