I have a file called Dataframe_with_accession.txt that lists multiple SRA accessions. For this example, I put just one accession in it: SRR8933535

The idea is to create a Nextflow pipeline that downloads the SRA runs listed in Dataframe_with_accession.txt, compresses the resulting FASTQ files with pigz, and removes the uncompressed FASTQ files.

To do that, I used this code:

params.inputFile = "/crex/proj/Dataframe_with_accession.txt"
params.outputDir = "/crex/proj/Output"

process DOWNLOAD_FASTQ {

publishDir params.outputDir, mode: 'symlink'

input:
each (accession)

output:
tuple path("params.outputDir/${accession}_1.fastq.gz")
path("params.outputDir/${accession}_2.fastq.gz")

script:
"""
fasterq-dump --threads 12 --outdir ${params.outputDir}  ${accession}
pigz -p12 ${params.outputDir}/${accession}_1.fastq
pigz -p12 ${params.outputDir}/${accession}_2.fastq
"""

}

workflow {
// Read accessions from the input file
accList = file(params.inputFile).readLines()

DOWNLOAD_FASTQ(accList)

}

Then I ran this Nextflow file, but I got the following error message:

[01/298f4b] process > DOWNLOAD_FASTQ (1) [100%] 1 of 1, failed: 1 ✘
Error executing process > 'DOWNLOAD_FASTQ (1)'

Caused by: Missing output file(s) SRR8933535_1.fastq expected by process DOWNLOAD_FASTQ (1)

Command executed:

fasterq-dump --threads 12 --outdir /crex/proj/Output SRR8933535

Command exit status: 0

Command output:
  (empty)

Command error:
  spots read      : 8,389,148
  reads read      : 16,778,296
  reads written   : 16,778,296

Work dir: /crex/proj/Output/work/01/298f4be6dde2ee1856df31db169bec

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

I do not understand what is going wrong here. It says

Command output:
  (empty)

But the file is created

Grendel

1 Answer

Each Nextflow process looks in its own working directory for the files declared in its output: block. Your script: block already writes the files to their final destination with --outdir, so once the script finishes, Nextflow cannot find the declared outputs in the work directory and stops with "Missing output file(s)". Solution: let the script write its output files into the work directory, which is the current directory while the task runs. Nextflow will then detect them, and the publishDir directive will publish them to params.outputDir (as symlinks with mode: 'symlink', or as real copies with mode: 'copy').
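As an illustration, the process could be rearranged roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes the run is paired-end (so fasterq-dump produces `_1.fastq` and `_2.fastq`), it reuses the paths from the question, and the `cpus 12` directive, the `val(accession)` element in the output tuple, and the switch to `mode: 'copy'` are my own choices rather than anything required by Nextflow.

```groovy
params.inputFile = "/crex/proj/Dataframe_with_accession.txt"
params.outputDir = "/crex/proj/Output"

process DOWNLOAD_FASTQ {

    // Publish the declared outputs from the work directory to the final
    // destination. 'copy' is an assumption here; 'symlink' also works as
    // long as the work directory is kept.
    publishDir params.outputDir, mode: 'copy'

    // Assumed value, mirroring the --threads 12 used in the question.
    cpus 12

    input:
    val accession

    output:
    // Paths are relative to the task work directory, not params.outputDir.
    tuple val(accession), path("${accession}_1.fastq.gz"), path("${accession}_2.fastq.gz")

    script:
    """
    # No --outdir: fasterq-dump writes into the current (work) directory.
    fasterq-dump --threads ${task.cpus} ${accession}
    # pigz replaces each .fastq with a .fastq.gz, so no separate cleanup is needed.
    pigz -p ${task.cpus} ${accession}_1.fastq ${accession}_2.fastq
    """
}

workflow {
    // One channel item per non-empty line of the accession file.
    accessions = Channel
        .fromPath(params.inputFile)
        .splitText()
        .map { it.trim() }
        .filter { it }

    DOWNLOAD_FASTQ(accessions)
}
```

With this layout, the compressed files first appear under the task's work directory (the `Work dir:` path that Nextflow prints), and only then are they published to `params.outputDir`.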

Pallie