I have recently come across an awk seen option. I can see it is removing duplicates in files. I could use some clarification on how it works.
cat tes
1
2
3
1
1
1
3
4
Output with awk seen:
cat tes | awk '!seen[$0]++'
1
2
3
4
seen is the arbitrary name of an associative array. It is not an option of any kind. You could use a or b or most other names in its place.
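As a quick check that the name carries no special meaning, here is a small sketch that runs the same program with the array called seen and then a. The sample function is just a helper I made up to recreate the input from the question; it is not part of awk.

```shell
# Hypothetical helper: recreates the sample input from the question.
sample() { printf '%s\n' 1 2 3 1 1 1 3 4; }

# The same program with two different array names.
sample | awk '!seen[$0]++'
sample | awk '!a[$0]++'
```

Both commands should print the same deduplicated lines, since awk only cares that the same array is used for the test and the increment.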
The code !seen[$0]++ consists of a test and an increment.
If seen[$0], i.e. the value of the array element associated with the key $0, the current line of input, is zero (or empty), then the boolean value of !seen[$0] is true.
The value in the array corresponding to the key $0 is then incremented, which means that the test will be false all other times that the same value of $0 is found.
The effect is that the test is true the first time a particular line is seen in the input, and false all other times.
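To watch this happen line by line, the following sketch prints the count stored for each line before the increment, together with the value of the negated test. The helper names sample and trace and the variable names count and first are my own, chosen only for illustration.

```shell
# Hypothetical helper: recreates the sample input from the question.
sample() { printf '%s\n' 1 2 3 1 1 1 3 4; }

# For every input line, show the stored count and the result of the test.
trace() {
    awk '{
        count = seen[$0]     # unset (treated as 0) the first time a line appears
        first = !count       # 1 (true) only while the count is still 0
        printf "line=%s count_before=%d first=%d\n", $0, count, first
        seen[$0]++           # the same increment the one-liner performs
    }'
}

sample | trace
```

Only the lines reported with first=1 are the ones that the original one-liner would print.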
Whenever a test with no associated action is true, the default action is triggered. The default action is the equivalent of { print } or { print $0 }, which prints the current record, which for all intents and purposes in this example is the current unmodified line of input.
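For comparison, here is a sketch of two longer spellings that should behave the same as the one-liner: one with the default action written out, and one with the test and increment as separate statements. The sample helper is again my own invention to recreate the question's input.

```shell
# Hypothetical helper: recreates the sample input from the question.
sample() { printf '%s\n' 1 2 3 1 1 1 3 4; }

# The pattern with the default action spelled out.
sample | awk '!seen[$0]++ { print $0 }'

# The same logic with an explicit if statement and a separate increment.
sample | awk '{ if (seen[$0] == 0) print $0; seen[$0]++ }'
```

The short form is usually preferred for interactive use; the long form is easier to extend, for example to print only the duplicates instead.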