I have a large input file containing 30M lines with CRLF (\r\n) line endings. I decided to do something silly and compare the speed of counting all its lines via read -r versus xargs (stripping the \r first, because xargs does not seem to be able to split on a multi-character delimiter). Here are my two commands:
time tr -d '\r' < input.txt | xargs -P 1 -d '\n' -I {} echo "{}" | wc -l
time while read -r p || [ -n "$p" ]; do echo "$p"; done < input.txt | wc -l
Here, the second solution is much faster. Why is that?
Please note that I know this is not a proper way to count the lines of a file. This question is purely out of curiosity about the observation.
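For reference, a smaller self-contained reproduction of the comparison (this is a sketch: the file name input.txt and the 2,000-line size are stand-ins for the real 30M-line file, and it assumes GNU xargs for -d and bash so that time covers the whole pipeline):

```shell
#!/usr/bin/env bash
# Build a small CRLF-terminated test file (2000 lines instead of 30M).
seq 2000 | awk '{printf "%s\r\n", $0}' > input.txt

# Version 1: xargs forks one /bin/echo process per input line.
time tr -d '\r' < input.txt | xargs -P 1 -d '\n' -I {} echo "{}" | wc -l

# Version 2: only shell builtins run per line; nothing is forked.
time while read -r p || [ -n "$p" ]; do echo "$p"; done < input.txt | wc -l
```

Note that time here is the bash reserved word, which times the entire pipeline; an external /usr/bin/time placed there would only measure the first command of the pipe.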
Comments:

- xargs is spawning a /usr/bin/echo process for each line, while the second command is probably using the bash echo builtin instead of a separate process. I'll check that. – Frazier Thien Jul 21 '21 at 13:52
- […] tr. And how are you measuring the time here? Timing a pipe is complicated. – terdon Jul 21 '21 at 14:03
- The overhead of tr should be minimal. Regarding timing, I just write time in front of both commands, so including the overhead of tr indeed. – Frazier Thien Jul 21 '21 at 14:06
- […] tr, xargs and /bin/echo on each invocation, while the shell one doesn't call any external tools at all. That will likely explain it, but I'm not sure. Can you also show the actual results you get? It's hard to understand what "much faster" means without them. – terdon Jul 21 '21 at 14:15
- […] /usr/bin/echo for the read and it does also seem to take ages now. I will add these measurements later on. – Frazier Thien Jul 21 '21 at 14:17
- The tr in the first version is not doing anything -- it is waiting for input from the terminal. The xargs is reading from the redirection <, not the pipe |. – Paul_Pedant Jul 21 '21 at 17:27
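The per-line fork suggested in the comments can be made visible directly. With -I, xargs runs one child command per input line, so printing the child's PID yields a different value each time; echo in the while loop, by contrast, is a shell builtin. A quick check (a sketch; it substitutes sh -c for /bin/echo only because echo cannot print its own PID):

```shell
#!/usr/bin/env bash
# Each input line triggers a separate child process, so $$ differs every time.
seq 3 | xargs -I {} sh -c 'echo "child pid: $$"'

# In bash, echo is a builtin, so the while-read loop forks nothing per line.
type echo    # prints: echo is a shell builtin
```

With 30M lines, version 1 therefore pays for roughly 30M fork/exec cycles of /bin/echo, which dwarfs the cost of the loop itself.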