--- title: "Analyzing RNA-seq data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Analyzing RNA-seq data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = '#>', message = FALSE, fig.align = 'center', fig.retina = 2, eval = FALSE) foreach::registerDoSEQ() ``` ## Introduction Here we show two options for using `limorhyde2` to analyze RNA-seq data: [limma-voom](https://bioconductor.org/packages/release/bioc/html/limma.html) and [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html). The two approaches give very similar results. This vignette assumes you are starting with the output of [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html). ## Load the data You will need two objects: 1. `txi`, a list from `tximport` 2. `metadata`, a `data.frame` having one row per sample The rows in `metadata` must correspond to the columns of the elements of `txi`. ```{r} library('limorhyde2') # txi = ? # metadata = ? ``` ## Filter out lowly expressed genes There are many reasonable strategies to do this, here is one. ```{r} keep = rowSums(edgeR::cpm(txi$counts) >= 0.5) / ncol(txi$counts) >= 0.75 txiKeep = txi for (name in c('counts', 'length')) { txiKeep[[name]] = txi[[name]][keep, ]} ``` ## Fill in counts for sample-gene pairs having zero counts This avoids unrealistically low log2 CPM values and thus artificially inflated effect size estimates. ```{r} for (i in seq_len(nrow(txiKeep$counts))) { idx = txiKeep$counts[i, ] > 0 txiKeep$counts[i, !idx] = min(txiKeep$counts[i, idx])} ``` ## Option 1: limma-voom ```{r} y = edgeR::DGEList(txiKeep$counts) y = edgeR::calcNormFactors(y) fit = getModelFit(y, metadata, ..., method = 'voom') # replace '...' as appropriate for your data ``` ## Option 2: DESeq2 The second and third arguments to `DESeqDataSetFromTxImport()` are required, but will not be used by `limorhyde2`. ```{r} y = DESeq2::DESeqDataSetFromTximport(txiKeep, metadata, ~1) fit = getModelFit(y, metadata, ..., method = 'deseq2') # replace '...' as appropriate for your data ``` ## Continue using `limorhyde2` Regardless of which option you choose, the next steps are the same: `getPosteriorFit()`, `getRhythmStats()`, etc.