$ echo ABC | awk '$0 ~ /^[a-b]/'
ABC
$ echo ABC | awk '$0 ~ /^[a-a]/'
$ echo ABC | awk '$0 ~ /^a/'
$
As you can see, /[a-b]/ matches A, but /[a-a]/ and /a/ do not. Why?
I think this is a locale problem, specifically one of collating order.
In my locale, it_IT, the following snippet
if [[ a < A ]]; then
echo "a < A"
elif [[ a > A ]]; then
echo "a > A"
else
echo "a = A"
fi
if [[ b < A ]]; then
echo "b < A"
elif [[ b > A ]]; then
echo "b > A"
else
echo "b = A"
fi
shows
a < A
b > A
so A (surprisingly) sorts between a and b, and therefore falls inside the range [a-b].
Try executing
echo ABC | LC_COLLATE=C awk '$0 ~ /^[a-b]/'
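A minimal sketch of that check, assuming a POSIX shell and any common awk. It uses LC_ALL=C rather than LC_COLLATE=C only as a precaution, since LC_ALL overrides LC_COLLATE when both are set; the variable name c_result is just for illustration.

```shell
# Run awk in the C locale only; the rest of the shell keeps its own locale.
c_result=$(echo ABC | LC_ALL=C awk '$0 ~ /^[a-b]/')

# In the C locale, [a-b] covers only the bytes 'a' and 'b',
# so the line "ABC" should not match and c_result stays empty.
if [ -z "$c_result" ]; then
    echo "C locale: no match"
else
    echo "C locale: matched $c_result"
fi
```

If this prints "C locale: no match" while the plain awk command prints ABC, the difference comes from the locale's collating order.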
Edit:
The following command shows the collating order in your locale:
echo $(LC_COLLATE=C printf '%s\n' {A..z} | sort)
The output on my machine is
` ^ _ [ ] a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z
(I cannot tell from bash's manual page whether sequence expressions are expanded in locale collating order or not; it seems they are not.)
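For comparison with the listing above, here is a small sketch that sorts three of the same characters in the C locale, where the order follows byte values. It assumes only POSIX printf, sort and tr; the variable name c_order is made up for the example.

```shell
# Sort a, A and b in the C locale and join the result on one line.
c_order=$(printf '%s\n' a A b | LC_ALL=C sort | tr '\n' ' ')

# Uppercase letters have lower byte values than lowercase ones,
# so A comes first and no longer sits between a and b.
echo "$c_order"
```

In the C locale this prints "A a b", so A is outside the range from a to b.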
sort, join or the like, I start my scripts with export LC_COLLATE=C. Now I have to start scripts that use awk this way too :)
– enzotib
Aug 24 '11 at 18:44
LC_COLLATE=C with your printf command in the edit?
– rozcietrzewiacz
Oct 24 '11 at 06:23
printf receives the sequence {A..z} already expanded by bash, in a way that is independent of the particular locale (as the sentence that follows tries to explain: "cannot understand from bash's manual page if sequence expressions are expanded in locale collating order or not; it seems not").
– enzotib
Oct 24 '11 at 07:46
If LC_ALL was set in the environment, then changing LC_COLLATE alone would have no effect.
– rozcietrzewiacz
Oct 24 '11 at 09:03
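The precedence mentioned in the last comment can be sketched like this, assuming a POSIX env and sort. The locale name en_US.UTF-8 is only an example value for LC_COLLATE and does not need to be installed, because LC_ALL takes priority; forced_order is a made-up variable name.

```shell
# LC_ALL overrides LC_COLLATE, so sort uses the C order here
# even though LC_COLLATE names a different (example) locale.
forced_order=$(printf '%s\n' b A a | env LC_ALL=C LC_COLLATE=en_US.UTF-8 sort | tr '\n' ' ')

echo "$forced_order"
```

This prints "A a b", the C ordering. By the same rule, setting LC_COLLATE=C has no effect while LC_ALL names some other locale, so setting LC_ALL=C is the safer choice in a script.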