
My input file has three columns, like the example below.

Input file:

water   123   wa
water   123   at
water   123   te
water   123   er
rater   347   ra
rater   347   at
rater   347   te
rater   347   er

Now I want my output file to look like the one below, where each bigram is followed by a new column giving its frequency (the number of times it appears in the third column across the whole file).

Output file:

water   123   wa   1
water   123   at   2
water   123   te   2
water   123   er   2
rater   347   ra   1
rater   347   at   2
rater   347   te   2
rater   347   er   2

I tried the command below, but unfortunately it did not give the desired result:

$ awk 'BEGIN {FS="\t"} {for (i=1; i<=NF; i++) count[$3]++}
       END {for (word in count) printf "%s\t%s\t%s\t%d\n", $1, $2, word, count[word]}' \
            INPUT_FILE
Mani
  • In fact, I need an awk program that generates the output file: first calculate the frequency of each bigram, then write that frequency next to it in a new column. – Mani Sep 06 '14 at 09:29
  • It looks like you are trying to use for (i=1; i<=NF; i++) to look at the file one line at a time. That’s wrong; awk automatically looks at the file one line at a time, and executes (or at least considers) every statement other than the BEGIN and END statements. for (i=1; i<=NF; i++) looks at the current line one field at a time. If you just did {count[$3]++} you’d have a good start, but, when you got to the END, you would no longer have access to the $1 and $2 values. – Scott - Слава Україні Sep 06 '14 at 19:35
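Following that comment, one statement with no loop is enough to count, because awk already visits every line. The sketch below assumes a tab-separated file named `input.file`, which matches the asker's `FS="\t"`. It prints only each distinct bigram and its total. Inside `END`, the array is all that remains, so the per-line `$1` and `$2` values are gone. That is why the answer below either reads the file twice or stores the lines.

```shell
# Hypothetical sample file: the question's data, tab-separated.
printf 'water\t123\twa\nwater\t123\tat\nwater\t123\tte\nwater\t123\ter\nrater\t347\tra\nrater\t347\tat\nrater\t347\tte\nrater\t347\ter\n' > input.file

# awk loops over lines implicitly, so a single statement counts each bigram.
# END can only see the count array, not the original $1/$2 of each line.
# `for (bg in count)` has no defined order, so sort the result for stable output.
awk 'BEGIN {FS="\t"} {count[$3]++} END {for (bg in count) print bg, count[bg]}' input.file | sort
```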

1 Answer


One way is to read the file twice: the first pass counts, and the second pass prints:

awk 'NR==FNR {count[$3]++; next} {print $0, count[$3]}' input.file input.file
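`NR==FNR` is true only while awk reads the first file argument. `NR` counts records across all input files, while `FNR` restarts at 1 for each file. One catch is that `print $0, count[$3]` joins its arguments with `OFS`, which defaults to a single space. If the input is tab-separated, as the asker's `FS="\t"` suggests, the new column would be separated by a space. The sketch below sets both separators to a tab and assumes the same hypothetical `input.file`.

```shell
# Hypothetical sample file: the question's data, tab-separated.
printf 'water\t123\twa\nwater\t123\tat\nwater\t123\tte\nwater\t123\ter\nrater\t347\tra\nrater\t347\tat\nrater\t347\tte\nrater\t347\ter\n' > input.file

# Pass 1 (NR == FNR): only count; `next` skips the print rule.
# Pass 2: print each unchanged line, a tab (OFS), then its bigram's total.
awk 'BEGIN {FS = OFS = "\t"}
     NR == FNR {count[$3]++; next}
     {print $0, count[$3]}' input.file input.file
```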

Alternatively, store each line, then print them all at the end:

awk '
    {count[$3]++; line[NR]=$0} 
    END {
        for (nr=1; nr<=NR; nr++) {
            $0 = line[nr]
            print $0, count[$3]
        }
    }
' input.file
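The two approaches involve a trade-off. The two-pass version needs a real file it can name twice, so it cannot read from a pipe. The stored-lines version reads its input only once, but it keeps the whole input in memory. Below is a sketch of the single-pass idea fed from a pipe, with tab separators as assumed above. It also stores each line's bigram in a hypothetical `key` array, so the `END` block does not have to re-split `$0`. The script relies on `NR` still holding the last record number inside `END`.

```shell
# Data arrives on a pipe, so reading it twice is not an option here.
printf 'water\t123\twa\nwater\t123\tat\nwater\t123\tte\nwater\t123\ter\nrater\t347\tra\nrater\t347\tat\nrater\t347\tte\nrater\t347\ter\n' |
awk 'BEGIN {FS = OFS = "\t"}
     {count[$3]++; line[NR] = $0; key[NR] = $3}   # count and remember each line
     END {
         # NR still holds the last record number here, so replay lines in order
         for (i = 1; i <= NR; i++) print line[i], count[key[i]]
     }'
```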
glenn jackman