If I want to efetch two records, I can do the following:
efetch -db nuccore -id AB610939,AB610940 -format fasta
I have hundreds of records to fetch. Can I put all the IDs in a file and pass that file to efetch? If so, how?
Yes, efetch can be given a file with IDs. See efetch -h:
$ efetch -h | grep -m1 file
-input Read identifier(s) from file instead of stdin
So you can pass your file of IDs (e.g. ids) directly:
$ cat ids
AB610939
AB610940
$ efetch -db nuccore -input ids -format fasta
>AB610939.1 Synthetic construct RNA, spike-in microarray control RNA CRM 6204-a, clone: 6204-a-500-1
CTCGACTAGTTAATACGGTACAGGATAACCGATCGGCTTGCAACATAACGGCGTTAAGAATGCGGGAGTG
CAGTTTCCGATTCTCACATCAATCGCCAATAAGGCCTTGTCGCAATATAGACTCAACGGTTCTAGTAGCT
GATCGGTATTACGTGACGCAACCGATTAGACATGCACAATTCCTTGGTCGCTATACTACGGAAATCGTCA
GGTACTATAACCCGTCGCAGGCCTAATACGTGTCGTCACATCGCCAACCTATCGTCAGTCGGAAAGACGT
TGCTGTCTACCATCGAAACTATTTACCGCTCCGAGATTCACGAGTACGAACTCACGAGGAAGTTGCCCTA
TGTAAGGTATCACTCCAGGTACTGCGCCGATAGTACCAGGTGATCAAACGGTTGCAAGAAGGCCACGACG
TATCGGGCTCTTTAGACGTACGCTCGAGATTAAACGCGCACTGATTCACTTTAGCCCGGAATGTCTCGGT
GCGATGTAGA
>AB610940.1 Synthetic construct RNA, spike-in microarray control RNA CRM 6204-a, clone: CRM6204-a-500-2
AGACTAAATCTCGGCGTCGGTTCATACGCGCGATCGTTTGCTGTCAGGGCATACTCGAATCCGGACTCCG
ACAATTATAGGCCATCCTGAATAGCCGATCATGCGAGTCACGATAAGGCAGGCTCTGCGATATCCCGATA
TACTGGAGAAGCTGAATCCCACCTAGAGCGAACTGTCAGAGGATCGACCTCAGGCTCGCTATCATCATAA
CGGCGGACGACCTGTGTCACATTCCGAACGCTACGTGACGATATTATCTGTCGAAAGGCATAGAACGCCG
GTCAATATCCTGCGGCATTCTCTTTATCACCGGCTATAACTACTAGGTTCCGCAGATATAGACTGCGCAC
GGAACATGTAGATAGATCGAGTAGGGTAGCGATTTAACGACTCGACTTACAGACAGAGACGTAGAACGTC
AGACGAGTGGTATGCCCACCAGAGGCGATACAGGCTGTACCTGCGTAGCACTAGAGTCGTGCGTCATGCG
GACCCTATCT
If that were not possible, another option would be to pass all your IDs as a single comma-separated string. If your IDs are in a file named ids, one per line, you can convert that to a comma-separated string like this:
$ tr '\n' , < ids | sed 's/,$/\n/'
AB610939,AB610940
That assumes GNU sed. If your sed doesn't handle \n in the replacement, you can strip the trailing comma and let printf add the newline instead:
printf '%s\n' "$(tr '\n' , < ids | sed 's/,$//')"
Armed with this, you can use command substitution to pass the comma-separated list to efetch:
efetch -db nuccore -id "$(tr '\n' , < ids | sed 's/,$//')" -format fasta
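As an alternative to the tr and sed pipeline, the standard paste utility can do the join in one step. This is only a sketch: join_ids is a name made up for illustration, and it assumes the file has one ID per line with no blank lines.

```shell
# Join a one-ID-per-line file into a single comma-separated line.
# paste -s merges all input lines into one; -d, sets the delimiter to a comma.
join_ids() {
    paste -s -d, "$1"
}

# Intended use (requires EDirect):
#   efetch -db nuccore -id "$(join_ids ids)" -format fasta
```

Because paste only puts the delimiter between lines, there is no trailing comma to strip, so it does not depend on how your sed treats \n.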
If there are limits to the number of IDs you can fetch at once or, as may well be the case, if this results in a command that is too long, then you can use a loop as suggested in another answer.
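A middle ground between one huge command and one request per ID is to fetch in batches. The following is a rough sketch, not a tested recipe: batch_ids and fetch_batches are hypothetical helper names, and the default batch size of 100 is an arbitrary placeholder, not a documented efetch limit.

```shell
# Print the IDs in file "$1" as comma-separated batches of at most "$2" IDs,
# one batch per output line. xargs -n groups the IDs (its default command
# is echo); tr turns the separating spaces into commas.
batch_ids() {
    xargs -n "$2" < "$1" | tr ' ' ','
}

# Fetch each batch in turn (requires EDirect).
# The batch size defaults to 100, which is an assumption, not a known limit.
fetch_batches() {
    batch_ids "$1" "${2:-100}" |
    while IFS= read -r batch; do
        efetch -db nuccore -id "$batch" -format fasta
    done
}

# Intended use:
#   fetch_batches ids 100 > fetchout.fa
```

This relies on the IDs being plain accession strings without spaces, quotes, or backslashes, since xargs gives those characters special treatment.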
However, be aware that while looping seems to work for efetch, for reasons that are not clear to me, the esearch developers have chosen to make it consume stdin. This means that the obvious, straightforward approach fails with esearch. See https://unix.stackexchange.com/q/682748/22222 for details.
If you're happy to loop through the IDs one at a time, and the IDs in your file (idfile.txt) are separated by newlines:
while read -r x; do
    efetch -db nuccore -id "$x" -format fasta
done < idfile.txt > fetchout.fa
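If the command inside such a loop reads standard input itself, it can swallow the remaining IDs, and the loop stops after the first one. A common shell workaround is to give the command /dev/null as its input. The sketch below wraps that idea in a helper; for_each_id is a made-up name, and I am assuming, without having checked every EDirect version, that the tools behave normally when their stdin is /dev/null and the ID is given on the command line.

```shell
# Run a command once per ID read from file "$1".
# The remaining arguments are the command; each ID is appended as the
# last argument.
for_each_id() {
    idfile=$1
    shift
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        "$@" "$id" < /dev/null      # the command cannot eat the loop's input
    done < "$idfile"
}

# Intended use (requires EDirect):
#   for_each_id idfile.txt efetch -db nuccore -format fasta -id > fetchout.fa
```

Since the ID is appended last, the -id flag goes at the end of the command so that the ID lands directly after it.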
Speculation: if the IDs are on a single line separated by commas rather than newlines, that might give you the simultaneous efetch you want. I strongly suspect that each ID would need to be surrounded by quotation marks in such a file. Again, this is just speculation, and the loop above is preferred IMO.