Fit tensor product splines to longitudinal data
TPSfit.Rd
TPSfit() fits multidimensional tensor product splines to longitudinal data
with three or more variables of interest before a clustering algorithm is
applied. Data with one or two variables are instead fit with cubic regression
splines, one variable at a time.
Arguments
- data
A longitudinal dataset in long form with multiple variables measured over time.
- time
Name of the time variable (e.g. "Time").
- vars
A character vector containing the variables of interest.
- ID
Name of the subject ID variable.
- knots_time
A numeric vector of knots for spline-fitting the time variable. Must supply
knots_time or kt.
- kt
Number of evenly spaced knots for spline-fitting the time variable if
knots_time is not given.
- fit_times
Optional vector of times at which fitted values will be calculated. If
fit_times and n_fit_times are not given, fitted values are calculated at the knots.
- n_fit_times
Number of evenly spaced times at which fitted values will be calculated if
fit_times is not given.
- st
Logical expression indicating whether each variable should be standardized.
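The kt and n_fit_times arguments both refer to "evenly spaced" values. As a rough illustration only, the sketch below assumes this means equally spaced points across the observed range of the time variable; the source does not say how TPSfit() computes them, and even_knots() is a hypothetical helper, not part of the documented interface.

```r
# Hypothetical helper (not part of the documented API): place `n` points
# evenly across the observed range of a numeric time vector.
even_knots <- function(time, n) {
  stopifnot(is.numeric(time), n >= 2)
  seq(min(time, na.rm = TRUE), max(time, na.rm = TRUE), length.out = n)
}

# Made-up observation times spanning one year
obs_times <- c(0, 30, 61, 91, 122, 152, 182, 213, 243, 273, 304, 334, 365)

knots_time <- even_knots(obs_times, n = 5)   # analogous to kt = 5
fit_times  <- even_knots(obs_times, n = 10)  # analogous to n_fit_times = 10

print(knots_time)
```

Supplying knots_time or fit_times explicitly, as in the Examples, avoids relying on any assumption about the spacing rule.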
Value
An object of class 'TPSfit' containing the following components:
- GAMsfitted
A data frame containing the fitted spline values.
- GAMscoef
A data frame containing the tensor product spline coefficients.
- data_long
A data frame containing data in long format for both time and variable.
- knots
A list of two vectors containing the variable and time knots.
- indiv_means
A list containing a data frame of individual means for each of the variables of interest.
- GAMs
A list containing the generalized additive models for fitting splines on each individual.
- nsubject
The number of subjects in the dataset.
- IDmatch
A data frame matching the original subject ID and new consecutive ID numbers.
- error_subjects
A vector of individuals that encountered errors in the spline-fitting process.
Details
TPSfit() employs package mgcv to fit tensor product splines to each
individual using a generalized additive model. The fitted splines are
two-dimensional, with one dimension being the variable identifier and the
other being time. Each individual needs an adequate number of observed time
points (greater than the number of knots), so the number of knots should be
less than the smallest number of time points observed for any individual. If
splines cannot be fit for an individual, an error message is shown, but
splines are still fit for the remaining individuals. A vector of identifiers
for individuals with errors is included in the output as error_subjects, and
these subjects are not included in the output GAMsfitted or GAMscoef.
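The per-individual fitting described above can be sketched directly with mgcv. This is a minimal illustration, not the TPSfit() implementation: the model formula, the "cr" marginal basis, the numeric coding of the variable identifier (varnum), the simulated data, and the helper names make_subject() and fit_one() are all assumptions made for the example. It fits a two-dimensional tensor product smooth over variable identifier and time for each subject, and collects subjects whose fit fails instead of stopping.

```r
library(mgcv)

set.seed(1)

# Simulated long-format data (assumption): 3 variables per subject,
# with the variable identifier coded numerically as 1, 2, 3.
make_subject <- function(id, t) {
  d <- expand.grid(time = t, varnum = 1:3)
  d$id <- id
  d$value <- sin(d$time / 60) * d$varnum + rnorm(nrow(d), sd = 0.2)
  d
}
times <- seq(0, 365, length.out = 12)
# Subject 4 has only 2 time points, fewer than the number of knots,
# so its fit is expected to fail.
dat <- rbind(make_subject(1, times), make_subject(2, times),
             make_subject(3, times), make_subject(4, times[1:2]))

knots_time <- c(0, 91, 182, 273, 365)

# Tensor product smooth: one margin for the variable identifier, one for time.
fit_one <- function(d) {
  gam(value ~ te(varnum, time, bs = "cr", k = c(3, length(knots_time))),
      knots = list(time = knots_time), data = d)
}

fits <- list()
error_subjects <- c()
for (s in unique(dat$id)) {
  fit <- tryCatch(fit_one(dat[dat$id == s, ]), error = function(e) {
    message("Subject ", s, " could not be fit: ", conditionMessage(e))
    NULL
  })
  if (is.null(fit)) {
    error_subjects <- c(error_subjects, s)   # record and move on
  } else {
    fits[[as.character(s)]] <- fit
  }
}

# Fitted values for subject 1 at chosen times, for every variable
newd <- expand.grid(time = c(46, 137, 228, 319), varnum = 1:3)
newd$fitted <- as.numeric(predict(fits[["1"]], newdata = newd))
```

Wrapping each fit in tryCatch() mirrors the documented behavior in which a failure for one individual is reported but does not prevent the remaining individuals from being fit.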
See also
The mgcv R package: https://cran.r-project.org/web/packages/mgcv/index.html
Examples
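# Load the simulated longitudinal dataset
data(TS.sim)
# Supply time knots directly; fitted values at 10 evenly spaced times
fitsplines <- TPSfit(TS.sim, vars=c("Var1", "Var2", "Var3"), time="Time",
    ID="SubjectID", knots_time=c(0, 91, 182, 273, 365), n_fit_times=10)
# Same knots, with fitted values calculated at explicitly chosen times
fitsplines2 <- TPSfit(TS.sim, vars=c("Var1", "Var2", "Var3"),
    time="Time", ID="SubjectID", knots_time=c(0, 91, 182, 273, 365),
    fit_times=c(46, 91, 137, 182, 228, 273, 319))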