My RNAseq analysis pipeline is as follows: fastqc (read quality is good, some overrepresentation of adaptor sequence) → trimmomatic (trimmed adaptor sequence, qc report after trimming suggests the overrepresented adaptors are gone) → HISAT2 (used a forward stranded alignment protocol) → featurecounts
My HISAT 2 command line:
hisat2 --rna-strandness F -x HISAT2/grch38/genome -U "$fastq_file" | samtools sort -o "HISAT2/${sample_name}_aligned.bam"
My featurecounts command line:
featureCounts -s 2 -a hg38/gencode.v44.chr_patch_hapl_scaff.annotation.gtf -o "quant/${sample_name}_featureCounts.txt" "$bam_file"
However, the strange thing is that if I use a forward stranded protocol in feature counts (with the forward stranded alignment I generated with HISAT2), the assignment rate is 5%, with the majority of unassigned reads being having no features. But if I use a reverse stranded protocol in feature counts, the assignment rate is 70%. Does this suggest my data is reverse stranded?
My library was prepared by the NEB directional RNA library prep kit that claims to produce forward stranded libraries, so I’m really confused by this result. How should I proceed with this data? Should I run an alignment with a reverse strand protocol and just proceed with a reverse strand assignment? Or should I proceed with the forward alignment and forward strand assignment data because that’s what the library is supposed to be?
Thank you so much for your help!
nfcore/rnaseqnextflowpipeline infers strandedness automatically and feeds in the "right" parameters to the tools used in the pipeline so you do not have to worry about this. You can even find the inferred strand info int he output directory of thenextflowrun. And if you want to check your reads, you can give a try withSalmonorRSeqQC. – haci Oct 20 '23 at 07:14