This should do the trick:
awk -F '[<>]' '
NR!=1 && FNR==1{printf "\n"}
FNR==1{sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %s", $3}
END{printf "\n"}
' "$path_to_xml"/*.xml > final_table.csv
Explanation:
awk: use the program awk; I tested it with GNU awk 4.0.1
-F '[<>]': use < and > as field separators
NR!=1 && FNR==1{printf "\n"}: if it is not the first line overall (NR!=1) but the first line of a file (FNR==1) print a newline
FNR==1{sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}: if it is the first line of a file, strip away anything up to the last / (sub(".*/", "", FILENAME)) in the name of the file (FILENAME), strip a trailing .xml (sub(".xml$", "", FILENAME)) and print the result (printf FILENAME)
/double/{printf " %s", $3}: if a line contains "double" (/double/), print a space followed by the third field (printf " %s", $3). With < and > as separators, the third field is the number (the first field is anything before the first <, and the second field is double). If you want, you can format the numbers here: for example, using %8.3f instead of %s prints every number with 3 decimal places and an overall width (including the dot and decimal places) of at least 8.
END{printf "\n"}: after the last line, print an additional newline (this may be optional)
$path_to_xml/*.xml: the list of files
> final_table.csv: put the result into final_table.csv by redirecting the output
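As a quick sanity check, here is a minimal sketch of the script run on a single hypothetical file (the path /tmp/file1.xml and its contents are made up for illustration; the second invocation shows the %8.3f formatting mentioned above):

```shell
# Create a hypothetical sample file with one <double> tag per line.
printf '<double>1.5</double>\n<double>2.25</double>\n' > /tmp/file1.xml

awk -F '[<>]' '
NR!=1 && FNR==1{printf "\n"}
FNR==1{sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %s", $3}
END{printf "\n"}
' /tmp/file1.xml
# → file1 1.5 2.25

# Same script, but numbers formatted with %8.3f instead of %s.
awk -F '[<>]' '
NR!=1 && FNR==1{printf "\n"}
FNR==1{sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %8.3f", $3}
END{printf "\n"}
' /tmp/file1.xml
# → file1    1.500    2.250
```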
In the case of "argument list too long" errors, you can use find with the -exec parameter to generate the file list instead of passing it directly:
find "$path_to_xml" -maxdepth 1 -type f -name '*.xml' -exec awk -F '[<>]' '
NR!=1 && FNR==1{printf "\n"}
FNR==1{sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %s", $3}
END{printf "\n"}
' {} + > final_table.csv
Explanation:
find $path_to_xml: tell find to list files in $path_to_xml
-maxdepth 1: do not descend into subfolders of $path_to_xml
-type f: only list regular files (this also excludes $path_to_xml itself)
-name '*.xml': only list files that match the pattern *.xml; this needs to be quoted, or else the shell will try to expand the pattern
-exec COMMAND {} +: run the command COMMAND with the matching files as parameters in place of {}. The + indicates that multiple files may be passed at once, which reduces forking. If you use \; instead of + (the ; needs to be quoted, else it is interpreted by the shell), the command is run separately for each file.
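The difference between + and \; is easy to see with echo in place of awk (the directory /tmp/xdemo and its files are hypothetical):

```shell
mkdir -p /tmp/xdemo
: > /tmp/xdemo/a.xml
: > /tmp/xdemo/b.xml

# With +: one echo invocation receives all files, so one output line.
find /tmp/xdemo -maxdepth 1 -type f -name '*.xml' -exec echo {} +

# With \;: one echo invocation per file, so one output line each.
find /tmp/xdemo -maxdepth 1 -type f -name '*.xml' -exec echo {} \;
```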
You can also use xargs in conjunction with find:
find "$path_to_xml" -maxdepth 1 -type f -name '*.xml' -print0 |
xargs -0 awk -F '[<>]' '
NR!=1 && FNR==1{printf "\n"}
FNR==1{sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %s", $3}
END{printf "\n"}
' > final_table.csv
Explanation:
-print0: output list of files separated by null characters
| (pipe): redirects standard output of find to the standard input of xargs
xargs: builds and runs command lines from standard input, i.e. runs the given command with the arguments (here: file names) read from standard input appended
-0: direct xargs to assume arguments are separated by null characters
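The point of -print0/-0 is that file names may contain whitespace or newlines; null separation keeps each name intact as a single argument. A small sketch (the directory /tmp/x0demo and the file name are made up):

```shell
mkdir -p /tmp/x0demo
: > '/tmp/x0demo/a b.xml'   # file name contains a space

# Null-separated names survive the pipe as single arguments;
# -n1 makes xargs call printf once per file name.
find /tmp/x0demo -maxdepth 1 -type f -name '*.xml' -print0 |
xargs -0 -n1 printf '[%s]\n'
# → [/tmp/x0demo/a b.xml]
```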
If your awk supports the special patterns BEGINFILE and ENDFILE (GNU awk 4 does), which are matched when a new input file starts or ends, the script can be simplified:
awk -F '[<>]' '
BEGINFILE {sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %s", $3}
ENDFILE {printf "\n"}
' "$path_to_xml"/*.xml > final_table.csv
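A quick two-file sketch of this variant, invoking gawk explicitly since BEGINFILE/ENDFILE are GNU extensions (the files under /tmp/bdemo are made up):

```shell
mkdir -p /tmp/bdemo
printf '<double>1</double>\n' > /tmp/bdemo/a.xml
printf '<double>2</double>\n' > /tmp/bdemo/b.xml

# BEGINFILE prints the stripped file name, ENDFILE the trailing newline,
# so each input file yields exactly one output line.
gawk -F '[<>]' '
BEGINFILE {sub(".*/", "", FILENAME); sub(".xml$", "", FILENAME); printf FILENAME}
/double/{printf " %s", $3}
ENDFILE {printf "\n"}
' /tmp/bdemo/*.xml
# → a 1
# → b 2
```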