What RNA-Seq expression value would be closest to Microarray equivalent?

Question

I know this question may seem strange.

I'm using Spearman correlation between gene expression profiles for various reasons (I won't go into details here). As a result, I often compare RNA-Seq and microarray samples. For preliminary analysis, I usually grab whatever version of the data is easily accessible (RPKM, FPKM, ...), but I'd like to dig a bit deeper.

Intuitively, I'd think a length-normalized value such as RPKM would make more sense than raw counts, which is why I usually convert raw counts to RPKM. (I know they're considered obsolete for statistical analysis of the data, but what I'm doing relies only on the correlation.)

So I ask: what RNA-Seq gene expression value would be closest to a microarray equivalent?

That is, what would (theoretically) maximize the correlation between the gene expression of the same sample profiled with RNA-Seq and with microarray: RPKM? FPKM? TPM? Something else?

“I know they're considered obsolete for statistical analysis of the data” — No, they’re considered obsolete for every purpose. Just don’t use them. Read https://bioinformatics.stackexchange.com/a/69/29. — Konrad Rudolph, Apr 08 '20 at 17:54

score 2 · Answer 1 · answered Apr 09 '20 at 14:32

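I think it is very hard to say which is closest, because the two technologies are not really comparable. But since you are using Spearman correlation, the choice among RPKM, FPKM, and TPM should not matter much: when computed from the same counts, they differ within a sample only by a constant scaling factor, so they do not change the order of gene expression levels. You might also want to normalize the RNA-seq and microarray data so that they are more comparable.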
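To make the rank argument concrete, here is a minimal sketch with made-up counts and gene lengths (the numbers and the helper names `rpkm`, `tpm`, and `ranks` are illustrative assumptions, not taken from any real dataset or library). It suggests that RPKM and TPM order the genes of one sample identically, while raw counts can order them differently because they are not length-normalized.

```python
import numpy as np

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    per_million = counts.sum() / 1e6
    return counts / per_million / lengths_kb

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / (np.asarray(lengths_bp, dtype=float) / 1e3)
    return rate / rate.sum() * 1e6

def ranks(values):
    """Ordinal ranks (0 = lowest); assumes there are no ties."""
    return np.argsort(np.argsort(values))

# Hypothetical toy sample: five genes with invented counts and lengths.
counts = np.array([500, 1200, 30, 800, 5400])
lengths_bp = np.array([1000, 6000, 300, 2000, 9000])

# RPKM and TPM differ by a constant factor within a sample, so ranks match.
same_order = np.array_equal(ranks(rpkm(counts, lengths_bp)),
                            ranks(tpm(counts, lengths_bp)))

# Raw counts ignore gene length, so their ranking can differ.
counts_order_differs = not np.array_equal(ranks(counts),
                                          ranks(tpm(counts, lengths_bp)))

print(same_order, counts_order_differs)
```

Since Spearman correlation depends only on ranks, any of the length-normalized units would give the same coefficient against a given microarray profile under these assumptions; the meaningful choice is between length-normalized values and raw counts.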

score 2 · Answer 2 · answered Apr 12 '20 at 08:17

I did a comparison of cDNA count data against microarray data that was published a few years ago:

For comparisons to published data (Fig. S2; Miller et al., 2012), a generalized linear model was fitted to the relationship between log-transformed microarray and VSTPk expression levels obtained from the ImmGen Project database, and was used to transform the microarray data into values comparable to our VSTPks.

I found that the Variance-Stabilizing Transformation that was carried out by DESeq2 was close to what I wanted, but there still seemed to be a length-based bias to the reads. I corrected this by dividing by the length of the longest gene isoform in kilobases (creating something that I called VSTPk).
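As a rough sketch of that length correction: the snippet below assumes the variance-stabilized values have already been computed in DESeq2 and exported, and that isoform lengths are available per gene. The gene names, values, and the helpers `longest_isoform_kb` and `vst_per_kb` are hypothetical and only illustrate the division step.

```python
def longest_isoform_kb(isoform_lengths_bp):
    """Map each gene to the length of its longest isoform, in kilobases."""
    return {gene: max(lengths) / 1000.0
            for gene, lengths in isoform_lengths_bp.items()}

def vst_per_kb(vst_values, isoform_lengths_bp):
    """Divide each gene's variance-stabilized value by its longest isoform length (kb)."""
    kb = longest_isoform_kb(isoform_lengths_bp)
    return {gene: value / kb[gene] for gene, value in vst_values.items()}

# Hypothetical inputs: VST values exported from DESeq2, isoform lengths in bp.
vst_values = {"geneA": 8.0, "geneB": 12.0}
isoform_lengths_bp = {"geneA": [1500, 2000], "geneB": [4000, 3000, 2500]}

vstpk = vst_per_kb(vst_values, isoform_lengths_bp)
print(vstpk)
```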

After doing this, there were range differences between the microarray and cDNA data, so I did an additional linear transformation to get the data fitting as close as possible.
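The range adjustment could look something like the following sketch. It uses an ordinary least-squares line as a stand-in for the fit; the quoted methods mention a generalized linear model whose exact form is not given here, so treat this as an assumption. The arrays are invented and `fit_linear_map` is a hypothetical helper.

```python
import numpy as np

def fit_linear_map(microarray_log, rnaseq_values):
    """Least-squares slope and intercept mapping microarray values onto the RNA-seq scale."""
    slope, intercept = np.polyfit(microarray_log, rnaseq_values, deg=1)
    return slope, intercept

# Hypothetical matched genes: log-scale microarray intensities and VSTPk values.
microarray_log = np.array([6.0, 8.0, 10.0, 12.0])
vstpk_values = np.array([1.0, 2.0, 3.0, 4.0])

slope, intercept = fit_linear_map(microarray_log, vstpk_values)

# Microarray data re-expressed on the VSTPk scale.
rescaled = slope * microarray_log + intercept
print(slope, intercept, rescaled)
```

A linear rescaling with a positive slope preserves the ordering of genes, so it brings the value ranges into line without changing a Spearman correlation.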
