I have a file called Dataframe_with_accession.txt that lists multiple SRA accessions. For this example it contains a single accession: SRR8933535
The idea is to create a Nextflow pipeline that downloads the SRA runs listed in Dataframe_with_accession.txt, compresses the resulting FASTQ files with pigz, and removes the uncompressed FASTQ files.
To do that I used this code:
params.inputFile = "/crex/proj/Dataframe_with_accession.txt"
params.outputDir = "/crex/proj/Output"
process DOWNLOAD_FASTQ {
publishDir params.outputDir, mode: 'symlink'
input:
each (accession)
output:
tuple path("params.outputDir/${accession}_1.fastq.gz")
path("params.outputDir/${accession}_2.fastq.gz")
script:
"""
fasterq-dump --threads 12 --outdir ${params.outputDir} ${accession}
pigz -p12 ${params.outputDir}/${accession}_1.fastq
pigz -p12 ${params.outputDir}/${accession}_2.fastq
"""
}
workflow {
// Read accessions from the input file
accList = file(params.inputFile).readLines()
DOWNLOAD_FASTQ(accList)
}
Then I ran this Nextflow file, but I got the following error message:
[01/298f4b] process > DOWNLOAD_FASTQ (1) [100%] 1 of 1, failed: 1 ✘
Error executing process > 'DOWNLOAD_FASTQ (1)'
Caused by:
Missing output file(s) SRR8933535_1.fastq expected by process DOWNLOAD_FASTQ (1)
Command executed:
fasterq-dump --threads 12 --outdir /crex/proj/Output SRR8933535
Command exit status:
0
Command output:
(empty)
Command error:
spots read : 8,389,148
reads read : 16,778,296
reads written : 16,778,296
Work dir:
/crex/proj/Output/work/01/298f4be6dde2ee1856df31db169bec
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
And I do not understand what is going wrong here. It says:
Command output:
(empty)
But the file is actually created.