My terminal is set to UTF-8. If I type this:
echo -n 'ýá' | xxd
I can see this output:
00000000: c3bd c3a1
Which is fine. Now I would like to remove the 'ý' character from the string, so I use:
echo -n 'ýá' | tr -d 'ý' | xxd
But the result will be:
00000000: a1
tr also removes the following c3 byte, but that byte is part of the 'á' character. Why does it work this way? Is this a bug, or do I need to set something?
tr works at the byte level. Each of the UTF-8 characters you show is represented by 2 bytes, and one of those bytes (c3) is common to both characters. I think it is not a bug, but the way this primitive tool works. (You can test with characters that are represented by single bytes and see what happens when some of them appear twice in the pattern.) – sudodus Mar 23 '20 at 14:16
So tr is not prepared for UTF-8? – user3719454 Mar 23 '20 at 14:17
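The byte-level behaviour described above can be sidestepped by deleting the character as a whole string instead of as a set of bytes. The following is only a minimal sketch, assuming a UTF-8 encoded script and a sed on the PATH; delete_string is a hypothetical helper name, not a standard utility, and od is used in place of xxd only because it is specified by POSIX.

```shell
#!/bin/sh
# delete_string (hypothetical helper): remove every occurrence of the
# string given as $1 from stdin.
# Assumption: $1 contains no characters special to sed ('/', '\', '.', '*', ...).
delete_string() {
    # sed matches the complete byte sequence of 'ý' (c3 bd) as one unit,
    # so a lone c3 byte belonging to another character is left alone.
    sed "s/$1//g"
}

# Hex dump of the result; the bytes c3 a1 ('á') are expected to survive.
printf 'ýá' | delete_string 'ý' | od -An -tx1
```

Unlike a byte-wise tr -d, which treats 'ý' as the set {c3, bd} and deletes each of those bytes wherever it occurs, the substitution only fires when both bytes appear together and in order. Some sed implementations append a trailing newline to input that lacks one, so the dump may show an extra 0a byte at the end.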