Analyzing RNA-seq data

Introduction

Here we show two options for using limorhyde2 to analyze RNA-seq data: limma-voom and DESeq2. The two approaches give very similar results.

This vignette assumes you are starting with the output of tximport.

Load the data

You will need two objects:

txi, a list from tximport
metadata, a data.frame having one row per sample

The rows in metadata must correspond to the columns of the elements of txi.

library('limorhyde2')
# txi = ?
# metadata = ?

Filter out lowly expressed genes

There are many reasonable strategies to do this, here is one.

keep = rowSums(edgeR::cpm(txi$counts) >= 0.5) / ncol(txi$counts) >= 0.75

txiKeep = txi
for (name in c('counts', 'length')) {
  txiKeep[[name]] = txi[[name]][keep, ]}

Fill in counts for sample-gene pairs having zero counts

This avoids unrealistically low log2 CPM values and thus artificially inflated effect size estimates.

for (i in seq_len(nrow(txiKeep$counts))) {
  idx = txiKeep$counts[i, ] > 0
  txiKeep$counts[i, !idx] = min(txiKeep$counts[i, idx])}

Option 1: limma-voom

y = edgeR::DGEList(txiKeep$counts)
y = edgeR::calcNormFactors(y)

fit = getModelFit(y, metadata, ..., method = 'voom') # replace '...' as appropriate for your data

Option 2: DESeq2

The second and third arguments to DESeqDataSetFromTxImport() are required, but will not be used by limorhyde2.

y = DESeq2::DESeqDataSetFromTximport(txiKeep, metadata, ~1)

fit = getModelFit(y, metadata, ..., method = 'deseq2') # replace '...' as appropriate for your data

Continue using `limorhyde2`

Regardless of which option you choose, the next steps are the same: getPosteriorFit(), getRhythmStats(), etc.