If I want to efetch two records, I can do the following:

efetch -db nuccore -id AB610939,AB610940 -format fasta

I have hundreds of records to fetch. Can I put all the IDs in a file and pass that file to efetch? If so, how?

Supertech

2 Answers

Yes, efetch can be given a file with IDs. See efetch -h:

$ efetch -h | grep -m1 file
  -input         Read identifier(s) from file instead of stdin

So you can pass your file of IDs (e.g. ids) directly:

$ cat ids
AB610939
AB610940

$ efetch -db nuccore -input ids -format fasta
>AB610939.1 Synthetic construct RNA, spike-in microarray control RNA CRM 6204-a, clone: 6204-a-500-1
CTCGACTAGTTAATACGGTACAGGATAACCGATCGGCTTGCAACATAACGGCGTTAAGAATGCGGGAGTG
CAGTTTCCGATTCTCACATCAATCGCCAATAAGGCCTTGTCGCAATATAGACTCAACGGTTCTAGTAGCT
GATCGGTATTACGTGACGCAACCGATTAGACATGCACAATTCCTTGGTCGCTATACTACGGAAATCGTCA
GGTACTATAACCCGTCGCAGGCCTAATACGTGTCGTCACATCGCCAACCTATCGTCAGTCGGAAAGACGT
TGCTGTCTACCATCGAAACTATTTACCGCTCCGAGATTCACGAGTACGAACTCACGAGGAAGTTGCCCTA
TGTAAGGTATCACTCCAGGTACTGCGCCGATAGTACCAGGTGATCAAACGGTTGCAAGAAGGCCACGACG
TATCGGGCTCTTTAGACGTACGCTCGAGATTAAACGCGCACTGATTCACTTTAGCCCGGAATGTCTCGGT
GCGATGTAGA
>AB610940.1 Synthetic construct RNA, spike-in microarray control RNA CRM 6204-a, clone: CRM6204-a-500-2
AGACTAAATCTCGGCGTCGGTTCATACGCGCGATCGTTTGCTGTCAGGGCATACTCGAATCCGGACTCCG
ACAATTATAGGCCATCCTGAATAGCCGATCATGCGAGTCACGATAAGGCAGGCTCTGCGATATCCCGATA
TACTGGAGAAGCTGAATCCCACCTAGAGCGAACTGTCAGAGGATCGACCTCAGGCTCGCTATCATCATAA
CGGCGGACGACCTGTGTCACATTCCGAACGCTACGTGACGATATTATCTGTCGAAAGGCATAGAACGCCG
GTCAATATCCTGCGGCATTCTCTTTATCACCGGCTATAACTACTAGGTTCCGCAGATATAGACTGCGCAC
GGAACATGTAGATAGATCGAGTAGGGTAGCGATTTAACGACTCGACTTACAGACAGAGACGTAGAACGTC
AGACGAGTGGTATGCCCACCAGAGGCGATACAGGCTGTACCTGCGTAGCACTAGAGTCGTGCGTCATGCG
GACCCTATCT

If that were not possible, another option would be to join all your IDs into a comma-separated string. If your IDs are in a file named ids, one per line, you can convert that to a comma-separated string like this:

$ tr '\n' , < ids | sed 's/,$/\n/'
AB610939,AB610940

That assumes GNU sed. If your sed doesn't handle \n, you can use this instead:

printf '%s\n' "$(tr '\n' , < ids | sed 's/,$//')"
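
Another way to build the same string that avoids sed altogether is paste, which is specified by POSIX. This is only a minimal sketch: the temporary file stands in for the ids file used above, and the variable names ids_file and id_list are mine, not anything efetch requires.

```shell
# Demo ID file, one accession per line (the same two IDs as in the question).
ids_file=$(mktemp)
printf 'AB610939\nAB610940\n' > "$ids_file"

# -s joins all lines of the file into a single line;
# -d, uses a comma as the separator between them.
id_list=$(paste -sd, "$ids_file")
echo "$id_list"   # prints AB610939,AB610940

rm -f "$ids_file"
```

With a real file this reduces to paste -sd, ids, and no trailing comma needs to be stripped.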

Armed with this, you can use command substitution to pass the comma-separated list to efetch:

efetch -db nuccore -id "$(tr '\n' , < ids | sed 's/,$//')" -format fasta

If there are limits to the number of IDs you can fetch at once or, as may well be the case, if this results in a command that is too long, then you can use a loop as suggested in another answer.
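
A middle ground between one giant command and one request per ID is to fetch in batches. The sketch below only builds the batches; batch_ids is a hypothetical helper name, the batch size of 200 in the commented usage is an arbitrary placeholder rather than a documented efetch limit, and it assumes the IDs contain no spaces or quotes.

```shell
# batch_ids FILE SIZE
# Print the IDs in FILE (one per line) as comma-joined batches of at most
# SIZE IDs, one batch per output line.
batch_ids() {
  # xargs with no command runs echo, printing SIZE space-separated IDs per
  # line; tr then turns those spaces into commas.
  xargs -n "$2" < "$1" | tr ' ' ','
}

# Hypothetical usage (needs EDirect and network access, so it is left
# commented out here):
# batch_ids ids 200 | while read -r batch; do
#   efetch -db nuccore -id "$batch" -format fasta
# done > fetchout.fa
```

Each line produced by batch_ids can be passed to -id as is, so the loop makes one request per batch instead of one per record.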


However, be aware that while looping seems to work for efetch, for reasons that are not clear to me, the esearch developers have chosen to make it consume stdin. This means that the obvious, straightforward approach fails with esearch. See https://unix.stackexchange.com/q/682748/22222 for details.
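
The underlying shell behaviour can be reproduced without EDirect. In this sketch, greedy is a hypothetical stand-in for any command that reads its standard input (it is not esearch itself), and redirecting stdin from /dev/null is a common general workaround for such commands; I am assuming, not asserting, that it applies to esearch in the same way.

```shell
# Stand-in for a command that drains stdin before doing its work.
greedy() {
  cat > /dev/null
  echo "got $1"
}

# Naive loop: greedy inherits the loop's stdin and swallows the remaining
# lines, so the next read hits end-of-file after the first iteration.
count_naive() {
  printf 'a\nb\nc\n' | while read -r id; do
    greedy "$id"
  done | wc -l | tr -d ' '
}

# Same loop, but greedy gets /dev/null as stdin and cannot touch the list.
count_fixed() {
  printf 'a\nb\nc\n' | while read -r id; do
    greedy "$id" < /dev/null
  done | wc -l | tr -d ' '
}

echo "naive loop handled $(count_naive) of 3 IDs"
echo "fixed loop handled $(count_fixed) of 3 IDs"
```

The naive loop handles only the first ID while the redirected one handles all three, which matches the symptom described for esearch.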

terdon

If you're happy looping through each ID in turn, and the IDs in the file (idfile.txt) are separated by newlines:

while read -r x; do
  efetch -db nuccore -id  "$x" -format fasta
done < idfile.txt > fetchout.fa 

Speculation: if the IDs are not separated by newlines but by commas on a single line, that might give the simultaneous efetch you're wanting. I strongly suspect that each ID would need to be surrounded by quotation marks in such a file. Again, this is just speculation, and the above solution is preferred IMO.

M__