--- title: "Quantifying rhythmicity in one condition" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quantifying rhythmicity in one condition} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = '#>', message = FALSE, fig.align = 'center', fig.retina = 2) ``` ## Introduction Here we show how to use `limorhyde2` to quantify rhythmicity in data from one condition. The data are based on mouse liver samples from the [circadian gene expression atlas in mammals](https://doi.org/10.1073/pnas.1408886111) ([GSE54650](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54650)). ## Load packages ```{r load_packages} library('data.table') library('ggplot2') library('limorhyde2') library('qs') # doParallel::registerDoParallel() # register a parallel backend to minimize runtime theme_set(theme_bw()) ``` ## Load the data The expression data are in a matrix with one row per gene and one column per sample. The metadata are in a table with one row per sample. To save time and space, the expression data include only a subset of genes. ```{r load_data} y = GSE54650$y y[1:5, 1:5] metadata = GSE54650$metadata metadata ``` ## Fit linear models The first step is to fit a series of linear models based on periodic splines for each genomic feature, in this case each gene, using [limma](https://doi.org/doi:10.18129/B9.bioc.limma). `getModelFit()` takes several arguments besides the expression data and metadata, but here we just use the defaults. For example, the data are from one condition, so we leave `condColname` as `NULL`. Also, the expression data are from microarrays and already log-transformed, so we leave `method` as `'trend'`. ```{r model_fit} fit = getModelFit(y, metadata) ``` ## Get posterior fits The next step is obtain posterior estimates of the model coefficients using [multivariate adaptive shrinkage](https://doi.org/10.1038/s41588-018-0268-8) ([mashr](https://stephenslab.github.io/mashr/index.html)), which learns patterns in the data and accounts for noise in the original fits. ```{r posterior_fit} fit = getPosteriorFit(fit) ``` ## Get rhythm statistics We can now use the posterior fits to compute rhythm statistics, i.e. properties of the curve, for each gene. ```{r rhythm_stats} rhyStats = getRhythmStats(fit) ``` We can examine the distributions of the statistics in various ways, such as ranking genes by peak-to-trough amplitude (no p-values necessary) or plotting peak-to-trough amplitude vs. peak phase. ```{r plot_stats, fig.width = 5, fig.height = 4} print(rhyStats[order(-peak_trough_amp)], nrows = 10L) ggplot(rhyStats) + geom_point(aes(x = peak_phase, y = peak_trough_amp), alpha = 0.2) + xlab('Peak phase (h)') + ylab('Peak-to-trough amplitude (norm.)') + scale_x_continuous(breaks = seq(0, 24, 4)) ``` ## Get observed and fitted time-courses We can also compute the expected measurements for one or more genes at one or more time-points, which correspond to the fitted curves. Here we plot the posterior fits and observed expression for three of the top rhythmic genes (converting from gene id to gene symbol). ```{r expected_meas} genes = data.table( id = c('13088', '13170', '13869'), symbol = c('Cyp2b10', 'Dbp', 'Erbb4')) measFit = getExpectedMeas(fit, times = seq(0, 24, 0.5), features = genes$id) measFit[genes, symbol := i.symbol, on = .(feature = id)] print(measFit, nrows = 10L) ``` Next we combine the observed expression data and metadata. The curves show how `limorhyde2` is able to fit non-sinusoidal rhythms. ```{r plot_timecourse, fig.width = 7, fig.height = 2.25} measObs = mergeMeasMeta(y, metadata, features = genes$id) measObs[genes, symbol := i.symbol, on = .(feature = id)] print(measObs, nrows = 10L) ggplot() + facet_wrap(vars(symbol), scales = 'free_y', nrow = 1) + geom_line(aes(x = time, y = value), data = measFit) + geom_point(aes(x = time %% 24, y = meas), shape = 21, size = 1.5, data = measObs) + labs(x = 'Circadian time (h)', y = 'Expression (norm.)') + scale_x_continuous(breaks = seq(0, 24, 4)) ```