Fit tensor product splines to longitudinal data

TPSfit() fits multidimensional tensor product splines to longitudinal data with three or more variables of interest, as a step before applying a clustering algorithm. For data with one or two variables, a cubic regression spline is fit to each variable individually.

Usage

TPSfit(data, time, vars, ID, knots_time, kt, fit_times, n_fit_times, st = TRUE)

Arguments

data: A longitudinal dataset in long form with multiple variables measured over time.
time: Name of the time variable (e.g. "Time").
vars: A character vector containing the variables of interest.
ID: Name of the subject ID variable.
knots_time: A numeric vector of knots for spline-fitting the time variable. Either knots_time or kt must be supplied.
kt: Number of evenly spaced knots for spline-fitting the time variable if knots_time is not given.
fit_times: Optional vector of times at which fitted values will be calculated. If neither fit_times nor n_fit_times is given, fitted values are calculated at the knots.
n_fit_times: Number of evenly spaced times at which fitted values will be calculated if fit_times is not given.
st: Logical expression indicating whether each variable should be standardized.
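
The following base R sketch illustrates what "evenly spaced" knots and fit times, and standardization, could look like. It is an illustration only: it assumes the spacing runs over the observed range of the time variable and that standardization means centering and scaling by the standard deviation, neither of which is confirmed above. The example values are made up.

```r
# Hypothetical observed times for illustration.
time <- c(0, 1, 2, 4, 6, 8, 12)

# kt evenly spaced knots (assumed here to span the observed time range).
kt <- 4
knots_time <- seq(min(time), max(time), length.out = kt)

# n_fit_times evenly spaced times for fitted values (same assumption).
n_fit_times <- 7
fit_times <- seq(min(time), max(time), length.out = n_fit_times)

# Standardizing a variable (assumed: subtract the mean, divide by the SD).
x <- c(2, 4, 6)
x_std <- (x - mean(x)) / sd(x)
```

Under these assumptions, knots_time is 0, 4, 8, 12 and fit_times runs from 0 to 12 in steps of 2.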

Value

An object of class 'TPSfit' containing the following components:

GAMsfitted: A data frame containing the fitted spline values.
GAMscoef: A data frame containing the tensor product spline coefficients.
data_long: A data frame containing data in long format for both time and variable.
knots: A list of two vectors containing the variable and time knots.
indiv_means: A list containing a data frame of individual means for each of the variables of interest.
GAMs: A list containing the generalized additive models for fitting splines on each individual.
nsubject: The number of subjects in the dataset.
IDmatch: A data frame matching the original subject ID and new consecutive ID numbers.
error_subjects: A vector of individuals that encountered errors in the spline-fitting process.

Details

TPSfit() uses the mgcv package to fit a tensor product spline to each individual using a generalized additive model. The fitted splines are two-dimensional, with one dimension being the variable identifier and the other being time. Each individual must have an adequate number of observed time points (more than the number of knots), so the number of knots should be less than the smallest number of time points observed for any individual. If a spline cannot be fit for an individual, an error message is shown, but splines are still fit for the remaining individuals. A vector of identifiers for individuals with errors is included in the output as error_subjects, and these subjects are excluded from the output components GAMsfitted and GAMscoef.
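
The per-individual fitting described above can be sketched directly with mgcv. This is not the internal code of TPSfit(), only a minimal illustration of the idea: a tensor product smooth over a numeric variable index and time, fit separately per subject, with failures caught and recorded. The simulated data, the helper make_subject(), the column names (Var, Value) and the choice of basis dimensions are all assumptions made for the example.

```r
library(mgcv)

set.seed(1)

# Hypothetical helper: simulate one subject with 3 variables stacked in long
# format over both time and variable (Var is a numeric variable index).
make_subject <- function(id, times) {
  d <- expand.grid(Time = times, Var = 1:3)
  d$ID <- id
  d$Value <- sin(d$Time / 3) * d$Var + rnorm(nrow(d), sd = 0.1)
  d
}

# Subjects 1 and 2 have 10 time points; subject 3 has only 2, which is
# fewer than the number of time knots and should fail.
dat <- rbind(make_subject(1, 0:9),
             make_subject(2, 0:9),
             make_subject(3, c(0, 9)))

knots_time <- seq(0, 9, length.out = 5)

fits <- list()
error_subjects <- c()
for (id in unique(dat$ID)) {
  d <- dat[dat$ID == id, ]
  fit <- tryCatch(
    # Tensor product smooth: one margin for the variable index, one for time.
    gam(Value ~ te(Var, Time, k = c(3, length(knots_time))),
        data = d, knots = list(Time = knots_time)),
    error = function(e) {
      message("Subject ", id, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(fit)) {
    error_subjects <- c(error_subjects, id)  # record and move on
  } else {
    fits[[as.character(id)]] <- fit
  }
}

# Fitted values for subject 1 at the time knots, for each variable.
newd <- expand.grid(Time = knots_time, Var = 1:3)
fitted_1 <- predict(fits[["1"]], newdata = newd)
```

In this sketch the subject with too few time points ends up in error_subjects while the other two are fit normally, which mirrors the behavior described for TPSfit().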

Examples
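
A minimal usage sketch on simulated data follows. The dataset, its column names (ID, Time, X1, X2, X3) and the argument values are hypothetical, and the call is guarded so that it only runs when TPSfit() is available in the session; the package that provides it must be loaded first.

```r
set.seed(42)

# Hypothetical long-form data: 5 subjects, 8 time points, 3 variables.
dat <- expand.grid(Time = 0:7, ID = 1:5)
dat$X1 <- 0.5 * dat$Time + rnorm(nrow(dat))
dat$X2 <- sin(dat$Time) + rnorm(nrow(dat), sd = 0.5)
dat$X3 <- -0.2 * dat$Time + rnorm(nrow(dat))

# Run only if TPSfit() has been made available by loading its package.
if (exists("TPSfit")) {
  fit <- TPSfit(dat, time = "Time", vars = c("X1", "X2", "X3"), ID = "ID",
                kt = 4,            # fewer knots than time points per subject
                n_fit_times = 10,  # evenly spaced times for fitted values
                st = TRUE)
  head(fit$GAMsfitted)   # fitted spline values
  fit$error_subjects     # subjects for which fitting failed, if any
}
```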