efetch multiple records at once

Question

If I want to efetch two records, I can do the following:

efetch -db nuccore -id AB610939,AB610940 -format fasta

I have hundreds of records to fetch. Can I put all the IDs in a file and pass that file to efetch? If so, how?

score 3 · Answer 1 · answered Sep 23 '23 at 15:01

Yes, efetch can be given a file with IDs. See efetch -h:

$ efetch -h | grep -m1 file
  -input         Read identifier(s) from file instead of stdin

So you can pass your file of IDs (e.g. ids) directly:

$ cat ids
AB610939
AB610940
$ efetch -db nuccore -input ids -format fasta
>AB610939.1 Synthetic construct RNA, spike-in microarray control RNA CRM 6204-a, clone: 6204-a-500-1
CTCGACTAGTTAATACGGTACAGGATAACCGATCGGCTTGCAACATAACGGCGTTAAGAATGCGGGAGTG
CAGTTTCCGATTCTCACATCAATCGCCAATAAGGCCTTGTCGCAATATAGACTCAACGGTTCTAGTAGCT
GATCGGTATTACGTGACGCAACCGATTAGACATGCACAATTCCTTGGTCGCTATACTACGGAAATCGTCA
GGTACTATAACCCGTCGCAGGCCTAATACGTGTCGTCACATCGCCAACCTATCGTCAGTCGGAAAGACGT
TGCTGTCTACCATCGAAACTATTTACCGCTCCGAGATTCACGAGTACGAACTCACGAGGAAGTTGCCCTA
TGTAAGGTATCACTCCAGGTACTGCGCCGATAGTACCAGGTGATCAAACGGTTGCAAGAAGGCCACGACG
TATCGGGCTCTTTAGACGTACGCTCGAGATTAAACGCGCACTGATTCACTTTAGCCCGGAATGTCTCGGT
GCGATGTAGA
>AB610940.1 Synthetic construct RNA, spike-in microarray control RNA CRM 6204-a, clone: CRM6204-a-500-2
AGACTAAATCTCGGCGTCGGTTCATACGCGCGATCGTTTGCTGTCAGGGCATACTCGAATCCGGACTCCG
ACAATTATAGGCCATCCTGAATAGCCGATCATGCGAGTCACGATAAGGCAGGCTCTGCGATATCCCGATA
TACTGGAGAAGCTGAATCCCACCTAGAGCGAACTGTCAGAGGATCGACCTCAGGCTCGCTATCATCATAA
CGGCGGACGACCTGTGTCACATTCCGAACGCTACGTGACGATATTATCTGTCGAAAGGCATAGAACGCCG
GTCAATATCCTGCGGCATTCTCTTTATCACCGGCTATAACTACTAGGTTCCGCAGATATAGACTGCGCAC
GGAACATGTAGATAGATCGAGTAGGGTAGCGATTTAACGACTCGACTTACAGACAGAGACGTAGAACGTC
AGACGAGTGGTATGCCCACCAGAGGCGATACAGGCTGTACCTGCGTAGCACTAGAGTCGTGCGTCATGCG
GACCCTATCT
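With hundreds of IDs it is worth confirming that every one came back. The following is a minimal sketch that assumes one ID per line in the input file and exactly one FASTA header line per record. The function name fetch_and_check is hypothetical, and it uses only the -db, -input and -format options shown above:

```shell
# Sketch: fetch all IDs listed in a file, then compare the number of
# FASTA headers received with the number of IDs requested.
# fetch_and_check is a hypothetical helper name.
fetch_and_check() {
  local idfile=$1 outfile=$2 want got
  efetch -db nuccore -input "$idfile" -format fasta > "$outfile"
  # grep -c prints 0 but exits non-zero when nothing matches; "|| true" keeps
  # that from aborting scripts run with set -e.
  want=$(grep -c . "$idfile" || true)     # non-empty lines = IDs requested
  got=$(grep -c '^>' "$outfile" || true)  # header lines = records received
  echo "requested $want, received $got"
  [ "$want" -eq "$got" ]                  # exit status 0 only if they match
}
```

Usage: fetch_and_check ids records.fa. A non-zero exit status means some IDs produced no record, so you can compare the headers in records.fa against the ID list.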

If that were not possible, another option would be to pass all your IDs as a single comma-separated string. If your IDs are in a file named ids, one per line, you can convert them to a comma-separated string like this:

$ tr '\n' , < ids | sed 's/,$/\n/'
AB610939,AB610940

That assumes GNU sed. If your sed doesn't handle \n, you can use this instead:

printf '%s\n' "$(tr '\n' , < ids | sed 's/,$//')"
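POSIX paste can do the same join without tr or sed: -s pastes all lines of the file serially into one line, and -d, makes the delimiter a comma. Here is a minimal sketch using the example ID file from above. Note that blank lines in ids would produce empty fields (,,):

```shell
# Recreate the example ID file (one accession per line).
printf '%s\n' AB610939 AB610940 > ids

# -s joins every line of the file into one line; -d, separates them with commas.
paste -s -d, ids
# prints: AB610939,AB610940
```

The result can be substituted into efetch in the same way, e.g. efetch -db nuccore -id "$(paste -s -d, ids)" -format fasta.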

Armed with this, you can use command substitution to pass the comma-separated list to efetch:

efetch -db nuccore -id "$(tr '\n' , < ids | sed 's/,$//')" -format fasta

If there are limits to the number of IDs you can fetch at once or, as may well be the case, the resulting command line is too long, you can use a loop as suggested in another answer.
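One way to combine the two ideas is to send the IDs in fixed-size batches, each passed as a comma-separated -id list. This is a minimal bash sketch. The function name fetch_in_batches and the default batch size of 200 are assumptions for illustration, not documented NCBI or EDirect limits:

```shell
#!/usr/bin/env bash
# Sketch (bash 3.1+): fetch IDs in batches, one efetch call per batch.
# fetch_in_batches and the default batch size of 200 are hypothetical choices.
fetch_in_batches() {
  local idfile=$1 batch_size=${2:-200}
  local -a ids=()
  local id i joined
  # Collect the non-empty IDs first; "|| [ -n ... ]" keeps a final line
  # that lacks a trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] && ids+=("$id")
  done < "$idfile"
  for (( i = 0; i < ${#ids[@]}; i += batch_size )); do
    # "${ids[*]:i:n}" joins n elements starting at i with the first char of IFS.
    joined=$(IFS=,; printf '%s' "${ids[*]:i:batch_size}")
    efetch -db nuccore -id "$joined" -format fasta
  done
}
```

Usage: fetch_in_batches ids 200 > all.fa. Reading the whole list into an array before the fetch loop also means nothing inside that loop competes with read for stdin.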

However, be aware that while looping seems to work for efetch, the esearch developers have, for reasons that are not clear to me, chosen to make esearch read from stdin. This means that the obvious, straightforward approach (a while read loop over the ID file) fails with esearch, because esearch consumes the rest of the ID list. See https://unix.stackexchange.com/q/682748/22222 for details.
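A common shell workaround for commands that read stdin inside a while read loop is to feed the ID list on a different file descriptor, leaving stdin alone. The following is a minimal sketch that assumes one accession per line. The function name fetch_each is hypothetical, and the esearch -query | efetch pipeline is just one illustrative way to use esearch per ID:

```shell
# Sketch: loop over IDs read from file descriptor 3 instead of stdin, so a
# command in the loop that reads stdin (as esearch does) cannot eat the list.
# fetch_each is a hypothetical helper name.
fetch_each() {
  local idfile=$1 x
  while IFS= read -r x <&3 || [ -n "$x" ]; do
    [ -n "$x" ] || continue                       # skip blank lines
    esearch -db nuccore -query "$x" | efetch -format fasta
  done 3< "$idfile"                               # the ID list lives on fd 3
}
```

Usage: fetch_each ids > out.fa. Redirecting esearch's stdin from /dev/null is another approach you may see, but the descriptor trick does not depend on how esearch treats an empty stdin.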

score 2 · Answer 2 · edited Sep 23 '23 at 19:10


If you're happy looping through each ID in turn, and each ID in the single file (idfile.txt) is separated by a newline:

while read -r x; do
  efetch -db nuccore -id "$x" -format fasta
done < idfile.txt > fetchout.fa
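For hundreds of IDs it can help to know which ones returned nothing. The following is a minimal sketch extending the loop above. The function name fetch_each_logged and the failed-ID file are hypothetical. Treating empty output as a failure is an assumption about how efetch reports a missing record. Like the loop above, it relies on efetch not reading stdin, as reported in the comments on this answer:

```shell
# Sketch: per-ID loop that writes records to stdout and appends IDs that
# produced no output to a separate file. fetch_each_logged is hypothetical.
fetch_each_logged() {
  local idfile=$1 failed=$2 x rec
  : > "$failed"                          # start with an empty failure list
  while IFS= read -r x; do
    [ -n "$x" ] || continue              # skip blank lines
    rec=$(efetch -db nuccore -id "$x" -format fasta)
    if [ -n "$rec" ]; then
      printf '%s\n' "$rec"               # $(...) strips the final newline; restore it
    else
      printf '%s\n' "$x" >> "$failed"    # assumed: empty output means no record
    fi
  done < "$idfile"
}
```

Usage: fetch_each_logged idfile.txt failed.txt > fetchout.fa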

Speculation: if the IDs are separated not by newlines but by commas on a single line, that might give you the simultaneous efetch you want. I strongly suspect that each ID would need to be surrounded by quotation marks in such a file. Again, this is just speculation, and the above solution is preferred IMO.

edited Sep 23 '23 at 19:10 by terdon

answered Sep 23 '23 at 04:06 by M__


Sorry, my bad. That issue seems to be specific to esearch; I cannot reproduce it with efetch. I deleted my comment. – terdon Sep 23 '23 at 14:54
@terdon world-first from the BioInfo 'nix tools police ;-) – M__ Sep 23 '23 at 16:46
