Title: | Regularized Supervised Learning for Data from Rhythmic Systems |
---|---|
Description: | Method for predicting the value of a periodic variable from a high-dimensional observation. See Hughey et al. (2016) <doi:10.1093/nar/gkw030> and Hughey (2017) <doi:10.1186/s13073-017-0406-4>. |
Authors: | Jake Hughey [aut, cre] |
Maintainer: | Jake Hughey <[email protected]> |
License: | GPL-2 |
Version: | 2.1.3 |
Built: | 2025-01-19 03:46:06 UTC |
Source: | https://github.com/hugheylab/zeitzeiger |
Calculate circular difference.
getCircDiff(x, y, period = 1, towardZero = TRUE)
getCircDiff(x, y, period = 1, towardZero = TRUE)
x |
Numeric vector or matrix. |
y |
Numeric vector or matrix. |
period |
Period of the periodic variable. |
towardZero |
If |
Vector or matrix corresponding to x - y
.
Calculate the expected value of each feature.
predictIntensity(fitCoef, time, period = 1, knots = NULL)
predictIntensity(fitCoef, time, period = 1, knots = NULL)
fitCoef |
Matrix of coefficients from the spline fits, where rows correspond to features and columns correspond to variables in the model. |
time |
Vector of values of the periodic variable for the observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
period |
Period for the periodic variable. |
knots |
Optional vector of knots. This argument is designed for internal use. |
Matrix of predicted measurements, where rows correspond to time-points and columns correspond to features.
Train and test a ZeitZeiger predictor, calling the necessary functions.
zeitzeiger( xTrain, timeTrain, xTest, nKnots = 3, nTime = 10, useSpc = TRUE, sumabsv = 2, orth = TRUE, nSpc = 2, timeRange = seq(0, 1 - 0.01, 0.01) )
zeitzeiger( xTrain, timeTrain, xTest, nKnots = 3, nTime = 10, useSpc = TRUE, sumabsv = 2, orth = TRUE, nSpc = 2, timeRange = seq(0, 1 - 0.01, 0.01) )
xTrain |
Matrix of measurements for training data, observations in rows and features in columns. |
timeTrain |
Vector of values of the periodic variable for training observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
xTest |
Matrix of measurements for test data, observations in rows and features in columns. |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
nTime |
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matri for which the SPCs will be calculated. |
useSpc |
Logical indicating whether to use |
sumabsv |
L1-constraint on the SPCs, passed to |
orth |
Logical indicating whether to require left singular vectors
be orthogonal to each other, passed to |
nSpc |
Vector of the number of SPCs to use for prediction. If |
timeRange |
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer. |
fitResult |
Output of |
spcResult |
Output of |
predResult |
Output of |
zeitzeigerFit()
, zeitzeigerSpc()
, zeitzeigerPredict()
Train and test a predictor on multiple datasets independently, using
sva::ComBat()
to correct for batch effects prior to running zeitzeiger()
.
zeitzeigerBatch( ematList, trainStudyNames, sampleMetadata, studyColname, batchColname, timeColname, nKnots = 3, nTime = 10, useSpc = TRUE, sumabsv = 2, orth = TRUE, nSpc = 2, timeRange = seq(0, 1 - 0.01, 0.01), covariateName = NA, featuresExclude = NULL, dopar = TRUE )
zeitzeigerBatch( ematList, trainStudyNames, sampleMetadata, studyColname, batchColname, timeColname, nKnots = 3, nTime = 10, useSpc = TRUE, sumabsv = 2, orth = TRUE, nSpc = 2, timeRange = seq(0, 1 - 0.01, 0.01), covariateName = NA, featuresExclude = NULL, dopar = TRUE )
ematList |
Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names. |
trainStudyNames |
Character vector of names in |
sampleMetadata |
data.frame containing relevant information for each
sample across all datasets. Must have a column named |
studyColname |
Name of column in |
batchColname |
Name of column in |
timeColname |
Name of column in |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
nTime |
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated. |
useSpc |
Logical indicating whether to use |
sumabsv |
L1-constraint on the SPCs, passed to |
orth |
Logical indicating whether to require left singular vectors
be orthogonal to each other, passed to |
nSpc |
Vector of the number of SPCs to use for prediction. If |
timeRange |
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer. |
covariateName |
Name of column(s) in |
featuresExclude |
Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets. |
dopar |
Logical indicating whether to process the folds in parallel. Use
|
spcResultList |
List of output from |
timeDepLike |
3-D array of likelihood, with dimensions for each test
observation (across all datasets), each element of |
mleFit |
List (for each element in |
timePred |
Matrix of predicted times for test observations by values of
|
Make predictions by finding the maximum of the sum of the log-likelihoods.
zeitzeigerEnsembleLikelihood(timeDepLike, timeRange)
zeitzeigerEnsembleLikelihood(timeDepLike, timeRange)
timeDepLike |
List or 3-D array of time-dependent likelihood from
|
timeRange |
Vector of time-points at which the likelihood was calculated. |
timeDepLike |
Matrix of likelihood for observations by time-points. |
timePred |
Vector of predicted times. Each predicted time will be an element of timeRange. |
zeitzeigerPredict()
, zeitzeigerEnsembleMean()
Make predictions by calculating the circular mean of the predictions across members of the ensemble.
zeitzeigerEnsembleMean(timePredInput, timeMax = 1, naRm = TRUE)
zeitzeigerEnsembleMean(timePredInput, timeMax = 1, naRm = TRUE)
timePredInput |
Matrix of predicted times in which rows correspond to observations and columns correspond to members of the ensemble. |
timeMax |
Maximum value of the periodic variable, i.e., the value that is equivalent to zero. |
naRm |
Logical indicating whether |
Matrix with a row for each observation and columns for the predicted time and the normalized magnitude of the circular mean. The latter can range from 0 to 1, with 1 indicating perfect agreement among members of the ensemble.
zeitzeigerPredict()
, zeitzeigerEnsembleLikelihood()
Fit a periodic smoothing spline to the measurements for each feature as a function of the periodic variable.
zeitzeigerFit(x, time, nKnots = 3)
zeitzeigerFit(x, time, nKnots = 3)
x |
Matrix of measurements, with observations in rows and features in columns. Missing values are allowed. |
time |
Vector of values of the periodic variable for the observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
xFitMean |
Matrix of coefficients, where rows correspond to features and columns correspond to variables in the fit. |
xFitResid |
Vector of root mean square of residuals, same length as
|
zeitzeigerSpc()
, zeitzeigerPredict()
Fit a periodic spline for each feature for each fold of cross-validation.
zeitzeigerFitCv(x, time, foldid, nKnots = 3)
zeitzeigerFitCv(x, time, foldid, nKnots = 3)
x |
Matrix of measurements, with observations in rows and features in columns. |
time |
Vector of values of the periodic variable for the observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
foldid |
Vector of values indicating the fold to which each observation belongs. |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
A list consisting of the result from zeitzeigerFit()
for each fold.
zeitzeigerFit()
, zeitzeigerSpcCv()
, zeitzeigerPredictCv()
Predict the value of the periodic variable for test observations given training data and SPCs.
zeitzeigerPredict( xTrain, timeTrain, xTest, spcResult, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01) )
zeitzeigerPredict( xTrain, timeTrain, xTest, spcResult, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01) )
xTrain |
Matrix of measurements for training data, observations in rows and features in columns. |
timeTrain |
Vector of values of the periodic variable for training observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
xTest |
Matrix of measurements for test data, observations in rows and features in columns. |
spcResult |
Output of |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
nSpc |
Vector of the number of SPCs to use for prediction. If |
timeRange |
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer. |
timeDepLike |
3-D array of likelihood, with dimensions for each test
observation, each element of |
mleFit |
List (for each element in |
timePred |
Matrix of predicted times for test observations by values of
|
zeitzeigerFit()
, zeitzeigerSpc()
Make predictions for each observation for each fold of cross-validation.
zeitzeigerPredictCv( x, time, foldid, spcResultList, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01), dopar = TRUE )
zeitzeigerPredictCv( x, time, foldid, spcResultList, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01), dopar = TRUE )
x |
Matrix of measurements, observations in rows and features in columns. |
time |
Vector of values of the periodic variable for observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
foldid |
Vector of values indicating the fold to which each observation belongs. |
spcResultList |
Output of |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
nSpc |
Vector of the number of SPCs to use for prediction. If |
timeRange |
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer. |
dopar |
Logical indicating whether to process the folds in parallel.
Use |
A list of the same structure as zeitzeigerPredict()
, combining the
results from each fold of cross-validation.
timeDepLike |
3-D array of likelihood, with dimensions for each
observation, each element of |
mleFit |
List (for each element in |
timePred |
Matrix of predicted times for observations by values of
|
zeitzeigerPredict()
, zeitzeigerFitCv()
, zeitzeigerSpcCv()
Predict the value of the periodic variable for each group of test observations, where the amount of time between each observation in a group is known.
zeitzeigerPredictGroup( xTrain, timeTrain, xTest, groupTest, spcResult, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01) )
zeitzeigerPredictGroup( xTrain, timeTrain, xTest, groupTest, spcResult, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01) )
xTrain |
Matrix of measurements for training data, observations in rows and features in columns. |
timeTrain |
Vector of values of the periodic variable for training observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
xTest |
Matrix of measurements for test data, observations in rows and features in columns. |
groupTest |
data.frame with one row per observation in |
spcResult |
Output of |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
nSpc |
Vector of the number of SPCs to use for prediction. If |
timeRange |
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer. |
A list with the following elements, where the groups will be sorted by their names.
timeDepLike |
3-D array of likelihood, with dimensions for each group of
test observations, each element of |
mleFit |
List (for each element in |
timePred |
Matrix of predicted times for each group of test observations
by values of |
Predict corresponding time for each group of observations in cross-validation. Thus, each fold is equivalent to a group.
zeitzeigerPredictGroupCv( x, time, foldid, spcResultList, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01), dopar = TRUE )
zeitzeigerPredictGroupCv( x, time, foldid, spcResultList, nKnots = 3, nSpc = NA, timeRange = seq(0, 1 - 0.01, 0.01), dopar = TRUE )
x |
Matrix of measurements, observations in rows and features in columns. |
time |
Vector of values of the periodic variable for observations, where 0 corresponds to the lowest possible value and 1 corresponds to the highest possible value. |
foldid |
Vector of values indicating the fold to which each observation belongs. |
spcResultList |
Result from |
nKnots |
Number of internal knots to use for the periodic smoothing spline. |
nSpc |
Vector of the number of SPCs to use for prediction. If |
timeRange |
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer. |
dopar |
Logical indicating whether to process the folds in parallel. Use
|
A list of the same structure as zeitzeigerPredictGroup
, combining
the results from each fold of cross-validation. Folds (i.e, groups) will be
sorted by foldid.
timeDepLike |
3-D array of likelihood, with dimensions for each fold,
each element of |
mleFit |
List (for each element in |
timePred |
Matrix of predicted times for folds by values of |
zeitzeigerFitCv()
, zeitzeigerSpcCv()
, zeitzeigerPredictGroup()
Calculate the SPCs given the time-dependent means and the residuals from
zeitzeigerFit()
.
zeitzeigerSpc( xFitMean, xFitResid, nTime = 10, useSpc = TRUE, sumabsv = 1, orth = TRUE, ... )
zeitzeigerSpc( xFitMean, xFitResid, nTime = 10, useSpc = TRUE, sumabsv = 1, orth = TRUE, ... )
xFitMean |
List of bigsplines, length is number of features. |
xFitResid |
Matrix of residuals, dimensions are observations by features. |
nTime |
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated. |
useSpc |
Logical indicating whether to use |
sumabsv |
L1-constraint on the SPCs, passed to |
orth |
Logical indicating whether to require left singular vectors
be orthogonal to each other, passed to |
... |
Other arguments passed to |
Output of PMA::SPC()
, unless useSpc
is FALSE
, then output of
base::svd()
.
zeitzeigerFit()
, zeitzeigerPredict()
Calculate SPCs for each fold of cross-validation.
zeitzeigerSpcCv( fitResultList, nTime = 10, useSpc = TRUE, sumabsv = 1, orth = TRUE, dopar = TRUE )
zeitzeigerSpcCv( fitResultList, nTime = 10, useSpc = TRUE, sumabsv = 1, orth = TRUE, dopar = TRUE )
fitResultList |
Output of |
nTime |
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated. |
useSpc |
Logical indicating whether to use |
sumabsv |
L1-constraint on the SPCs, passed to |
orth |
Logical indicating whether to require left singular vectors
be orthogonal to each other, passed to |
dopar |
Logical indicating whether to process the folds in parallel. Use
|
A list consisting of the result from zeitzeigerSpc()
for each fold.
zeitzeigerSpc()
, zeitzeigerFitCv()
, zeitzeigerPredictCv()