Title: | Longitudinal Gaussian Process Regression |
---|---|
Description: | Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan'. The modeling approach and methods are described in detail in Timonen et al. (2021) <doi:10.1093/bioinformatics/btab021>. |
Authors: | Juho Timonen [aut, cre] |
Maintainer: | Juho Timonen <[email protected]> |
License: | GPL (>=3) |
Version: | 1.2.4 |
Built: | 2025-02-09 06:04:51 UTC |
Source: | https://github.com/jtimonen/lgpr |
Interpretable nonparametric modeling of longitudinal data
using additive Gaussian process regression. Contains functionality
for inferring covariate effects and assessing covariate relevances.
Models are specified using a convenient formula syntax, and can include
shared, group-specific, non-stationary, heterogeneous and temporally
uncertain effects. Bayesian inference for model parameters is performed
using 'Stan' (rstan
). The modeling approach and methods
are described in detail in
Timonen et al. (2021).
Main functionality of the package consists of creating and fitting an additive GP model:
lgp
: Specify and fit an additive GP model with one
command.
create_model
: Define an lgpmodel object.
sample_model
: Fit a model by sampling the posterior
distribution of its parameters and create an lgpfit object.
pred
: Computing model predictions and inferred
covariate effects after fitting a model.
relevances
: Assessing covariate relevances after
fitting a model.
prior_pred
: Prior predictive sampling to check
if your prior makes sense.
plot_pred
: Plot model predictions.
plot_components
: Visualize inferred model components.
plot_draws
: Visualize parameter draws.
plot_data
: Visualize longitudinal data.
The data that you wish to analyze with 'lgpr' should be in an R
data.frame
where columns correspond to measured variables and rows
correspond to observations. Some functions that can help working with such
data frames are:
new_x
: Creating new test points where the posterior
distribution of any function component or sum of all components, or the
posterior predictive distribution can be computed after model fitting.
Other functions: add_factor
,
add_factor_crossing
, add_dis_age
,
adjusted_c_hat
.
See https://jtimonen.github.io/lgpr-usage/index.html. The tutorials focus on code and use cases, whereas the Mathematical description of lgpr models vignette describes the statistical models and how they can be customized in 'lgpr'.
Run citation("lgpr")
to get citation information.
Bug reports, PRs, enhancement ideas or user experiences in general are welcome and appreciated. Create an issue in Github or email the author.
Juho Timonen (first.last at iki.fi)
Timonen, J. et al. (2021). lgpr: an interpretable non-parametric method for inferring covariate effects from longitudinal data. Bioinformatics, url.
Carpenter, B. et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76(1).
Useful links:
Creates the disease-related age covariate vector based on the disease initiation times and adds it to the data frame
add_dis_age(data, t_init, id_var = "id", time_var = "age")
add_dis_age(data, t_init, id_var = "id", time_var = "age")
data |
the original data frame |
t_init |
A named vector containing the observed initiation or onset
time for each individual. The names, i.e. |
id_var |
name of the id variable in |
time_var |
name of the time variable in |
A data frame with one column added. The new column will
be called dis_age
. For controls, its value will be NaN
.
Other data frame handling functions:
add_factor()
,
add_factor_crossing()
,
adjusted_c_hat()
,
new_x()
,
split()
Easily add a categorical covariate to a data frame
add_factor(data, x, id_var = "id")
add_factor(data, x, id_var = "id")
data |
the original data frame |
x |
A named vector containing the category for each individual. The names should specify the individual id. |
id_var |
name of the id variable in |
A data frame with one column added. The new column will
have same name as the variable passed as input x
.
Other data frame handling functions:
add_dis_age()
,
add_factor_crossing()
,
adjusted_c_hat()
,
new_x()
,
split()
Add a crossing of two factors to a data frame
add_factor_crossing(data, fac1, fac2, new_name)
add_factor_crossing(data, fac1, fac2, new_name)
data |
a data frame |
fac1 |
name of first factor, must be found in |
fac2 |
name of second factor, must be found in |
new_name |
name of the new factor |
a data frame
Other data frame handling functions:
add_dis_age()
,
add_factor()
,
adjusted_c_hat()
,
new_x()
,
split()
Creates the c_hat
input for lgp
,
so that it accounts for normalization between data points in the
"poisson"
or "nb"
observation model
adjusted_c_hat(y, norm_factors)
adjusted_c_hat(y, norm_factors)
y |
response variable, vector of length |
norm_factors |
normalization factors, vector of length |
a vector of length n
, which can be used as
the c_hat
input to the lgp
function
Other data frame handling functions:
add_dis_age()
,
add_factor()
,
add_factor_crossing()
,
new_x()
,
split()
Apply variable scaling
apply_scaling(scaling, x, inverse = FALSE)
apply_scaling(scaling, x, inverse = FALSE)
scaling |
an object of class lgpscaling |
x |
object to which apply the scaling (numeric) |
inverse |
whether scaling should be done in inverse direction |
a similar object as x
Other variable scaling functions:
create_scaling()
Character representations of different formula objects
## S4 method for signature 'lgpexpr' as.character(x) ## S4 method for signature 'lgpterm' as.character(x) ## S4 method for signature 'lgpformula' as.character(x)
## S4 method for signature 'lgpexpr' as.character(x) ## S4 method for signature 'lgpterm' as.character(x) ## S4 method for signature 'lgpformula' as.character(x)
x |
an object of some S4 class |
a character representation of the object
See the Mathematical description of lgpr models vignette for more information about the connection between different options and the created statistical model.
create_model( formula, data, likelihood = "gaussian", prior = NULL, c_hat = NULL, num_trials = NULL, options = NULL, prior_only = FALSE, verbose = FALSE, sample_f = !(likelihood == "gaussian") )
create_model( formula, data, likelihood = "gaussian", prior = NULL, c_hat = NULL, num_trials = NULL, options = NULL, prior_only = FALSE, verbose = FALSE, sample_f = !(likelihood == "gaussian") )
formula |
The model formula, where
See the "Model formula syntax" section below ( |
data |
A |
likelihood |
Determines the observation model. Must be either
|
prior |
A named list, defining the prior distribution of model
(hyper)parameters. See the "Defining priors" section below
( |
c_hat |
The GP mean. This should only be given if
where |
num_trials |
This argument (number of trials) is only needed when
likelihood is |
options |
A named list with the following possible fields:
If |
prior_only |
Should likelihood be ignored? See also
|
verbose |
Should some informative messages be printed? |
sample_f |
Determines if the latent function values are sampled
(must be |
An object of class lgpmodel, containing the
Stan input created based on parsing the specified formula
,
prior
, and other options.
Other main functions:
draw_pred()
,
get_draws()
,
lgp()
,
pred()
,
prior_pred()
,
sample_model()
Parse the covariates and model components from given data and formula
create_model.covs_and_comps(data, model_formula, x_cont_scl, verbose)
create_model.covs_and_comps(data, model_formula, x_cont_scl, verbose)
data |
A |
model_formula |
an object of class lgpformula |
x_cont_scl |
Information on how to scale the continuous covariates. This can either be
|
verbose |
Should some informative messages be printed? |
parsed input to Stan and covariate scaling, and other info
Other internal model creation functions:
create_model.formula()
,
create_model.likelihood()
,
create_model.prior()
Checks if formula is in advanced format and translates if not.
create_model.formula(formula, data, verbose = FALSE)
create_model.formula(formula, data, verbose = FALSE)
formula |
The model formula, where
See the "Model formula syntax" section below ( |
data |
A |
verbose |
Should some informative messages be printed? |
an object of class lgpformula
Other internal model creation functions:
create_model.covs_and_comps()
,
create_model.likelihood()
,
create_model.prior()
Parse the response variable and its likelihood model
create_model.likelihood( data, likelihood, c_hat, num_trials, y_name, sample_f, verbose )
create_model.likelihood( data, likelihood, c_hat, num_trials, y_name, sample_f, verbose )
data |
A |
likelihood |
Determines the observation model. Must be either
|
c_hat |
The GP mean. This should only be given if
where |
num_trials |
This argument (number of trials) is only needed when
likelihood is |
y_name |
Name of response variable |
sample_f |
Determines if the latent function values are sampled
(must be |
verbose |
Should some informative messages be printed? |
a list of parsed options
Other internal model creation functions:
create_model.covs_and_comps()
,
create_model.formula()
,
create_model.prior()
Parse the given modeling options
create_model.options(options, verbose)
create_model.options(options, verbose)
options |
A named list with the following possible fields:
If |
verbose |
Should some informative messages be printed? |
a named list of parsed options
Parse given prior
create_model.prior(prior, stan_input, verbose)
create_model.prior(prior, stan_input, verbose)
prior |
A named list, defining the prior distribution of model
(hyper)parameters. See the "Defining priors" section below
( |
stan_input |
a list of stan input fields |
verbose |
Should some informative messages be printed? |
a named list of parsed options
Other internal model creation functions:
create_model.covs_and_comps()
,
create_model.formula()
,
create_model.likelihood()
Helper function for plots
create_plot_df(object, x = "age", group_by = "id")
create_plot_df(object, x = "age", group_by = "id")
object |
model or fit |
x |
x-axis variable name |
group_by |
grouping variable name (use |
a data frame
Create a standardizing transform
create_scaling(x, name)
create_scaling(x, name)
x |
variable measurements (might contain |
name |
variable name |
an object of class lgpscaling
Other variable scaling functions:
apply_scaling()
Using the same parametrization as Stan. More info here.
dinvgamma_stanlike(x, alpha, beta, log = FALSE) qinvgamma_stanlike(p, alpha, beta)
dinvgamma_stanlike(x, alpha, beta, log = FALSE) qinvgamma_stanlike(p, alpha, beta)
x |
point where to compute the density |
alpha |
positive real number |
beta |
positive real number |
log |
is log-scale used? |
p |
quantile (must be between 0 and 1) |
density/quantile value
Other functions related to the inverse-gamma distribution:
plot_invgamma()
,
priors
Draw pseudo-observations from predictive distribution.
If pred
contains draws from the component posterior (prior)
distributions, then the output is draws from the posterior (prior)
predictive distribution. If pred
is not specified, then
whether output draws are from prior or posterior predictive distribution
depends on whether fit
is created using the lgp
option prior_only=TRUE
or not.
draw_pred(fit, pred = NULL)
draw_pred(fit, pred = NULL)
fit |
An object of class lgpfit that has been created
using the |
pred |
An object of class Prediction, containing
draws of each model component. If |
An array with shape , where
is the number of
draws that
pred
contains and is the length of each
function draw.
Each row
of the output is one vector drawn from the
predictive distribution, given parameter draw
.
Other main functions:
create_model()
,
get_draws()
,
lgp()
,
pred()
,
prior_pred()
,
sample_model()
Quick way to create an example lgpfit, useful for debugging
example_fit( formula = y ~ id + age + age | SEX + age | LOC, likelihood = "gaussian", chains = 1, iter = 30, num_indiv = 6, num_timepoints = 5, ... )
example_fit( formula = y ~ id + age + age | SEX + age | LOC, likelihood = "gaussian", chains = 1, iter = 30, num_indiv = 6, num_timepoints = 5, ... )
formula |
model formula |
likelihood |
observation model |
chains |
number of chains to run |
iter |
number of iterations to run |
num_indiv |
number of individuals (data simulation) |
num_timepoints |
number of time points (data simulation) |
... |
additional arguments to |
An lgpfit object created by fitting the example model.
Print a fit summary.
fit_summary(fit, ignore_pars = c("f_latent", "eta", "teff_raw", "lp__"))
fit_summary(fit, ignore_pars = c("f_latent", "eta", "teff_raw", "lp__"))
fit |
an object of class lgpfit |
ignore_pars |
parameters and generated quantities to ignore from output |
object
invisibly.
An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model
## S4 method for signature 'GaussianPrediction' show(object) ## S4 method for signature 'GaussianPrediction' component_names(object) ## S4 method for signature 'GaussianPrediction' num_components(object) ## S4 method for signature 'GaussianPrediction' num_paramsets(object) ## S4 method for signature 'GaussianPrediction' num_evalpoints(object)
## S4 method for signature 'GaussianPrediction' show(object) ## S4 method for signature 'GaussianPrediction' component_names(object) ## S4 method for signature 'GaussianPrediction' num_components(object) ## S4 method for signature 'GaussianPrediction' num_paramsets(object) ## S4 method for signature 'GaussianPrediction' num_evalpoints(object)
object |
GaussianPrediction object for which to apply a class method. |
show(GaussianPrediction)
: Print a summary about the object.
component_names(GaussianPrediction)
: Get names of components.
num_components(GaussianPrediction)
: Get number of components.
num_paramsets(GaussianPrediction)
: Get number of parameter combinations
(different parameter vectors) using which predictions were computed.
num_evalpoints(GaussianPrediction)
: Get number of points where
predictions were computed.
f_comp_mean
component means
f_comp_std
component standard deviations
f_mean
signal mean (on normalized scale)
f_std
signal standard deviation (on normalized scale)
y_mean
predictive mean (on original data scale)
y_std
predictive standard deviation (on original data scale)
x
a data frame of points (covariate values) where the function posteriors or predictive distributions have been evaluated
Uses extract
with permuted = FALSE
and inc_warmup = FALSE
.
get_draws(object, draws = NULL, reduce = NULL, ...)
get_draws(object, draws = NULL, reduce = NULL, ...)
object |
An object of class lgpfit or |
draws |
Indices of the parameter draws. |
reduce |
Function used to reduce all parameter draws into
one set of parameters. Ignored if |
... |
Additional arguments to |
The return value is always a 2-dimensional array of shape
num_param_sets
x num_params
.
Other main functions:
create_model()
,
draw_pred()
,
lgp()
,
pred()
,
prior_pred()
,
sample_model()
NOTE: It is not recommended for users to call this. Use
pred
instead.
get_pred(fit, draws = NULL, reduce = NULL, verbose = TRUE)
get_pred(fit, draws = NULL, reduce = NULL, verbose = TRUE)
fit |
An object of class lgpfit. |
draws |
Indices of parameter draws to use, or |
reduce |
Reduction for parameters draws. Can be a function that
is applied to reduce all parameter draws into one parameter set, or
|
verbose |
Should more information and a possible progress bar be printed? |
an object of class GaussianPrediction or Prediction
These have STAN_kernel_*
counterparts. These R versions
are provided for reference and are not optimized for speed. These are
used when generating simulated data, and not during model inference.
kernel_eq(x1, x2, alpha = 1, ell) kernel_ns(x1, x2, alpha = 1, ell, a) kernel_zerosum(x1, x2, M) kernel_bin(x1, x2, pos_class = 0) kernel_cat(x1, x2) kernel_varmask(x1, x2, a, vm_params) kernel_beta(beta, idx1_expand, idx2_expand)
kernel_eq(x1, x2, alpha = 1, ell) kernel_ns(x1, x2, alpha = 1, ell, a) kernel_zerosum(x1, x2, M) kernel_bin(x1, x2, pos_class = 0) kernel_cat(x1, x2) kernel_varmask(x1, x2, a, vm_params) kernel_beta(beta, idx1_expand, idx2_expand)
x1 |
vector of length |
x2 |
vector of length |
alpha |
marginal std (default = 1) |
ell |
lengthscale |
a |
steepness of the warping function rise |
M |
number of categories |
pos_class |
binary (mask) kernel function has value one if both inputs have this value, other wise it is zero |
vm_params |
vector of two mask function parameters. |
beta |
a parameter vector (row vector) of length |
idx1_expand |
integer vector of length |
idx2_expand |
integer vector of length |
A matrix of size x
.
kernel_eq()
: Uses the exponentiated quadratic kernel.
kernel_ns()
: Uses the non-stationary kernel (input warping + squared
exponential).
kernel_zerosum()
: Uses the zero-sum kernel. Here, x1
and
x2
must be integer vectors (integers denoting different categories).
Returns a binary matrix.
kernel_bin()
: Uses the binary (mask) kernel. Here, x1
and
x2
must be integer vectors (integers denoting different categories).
Returns a binary matrix.
kernel_cat()
: Uses the categorical kernel. Here, x1
and
x2
must be integer vectors (integers denoting different categories).
Returns a binary matrix.
kernel_varmask()
: Computes variance mask multiplier matrix. NaN
's
in x1
and x2
will be replaced by 0.
kernel_beta()
: Computes the heterogeneity multiplier matrix.
NOTE: idx_expand
needs to be given so that
idx_expand[j]-1
tells the index of the beta parameter that should be
used for the th observation. If observation
doesn't
correspond to any beta parameter, then
idx_expand[j]
should be 1.
An S4 class to represent input for kernel matrix computations
## S4 method for signature 'KernelComputer' show(object) ## S4 method for signature 'KernelComputer' num_components(object) ## S4 method for signature 'KernelComputer' num_evalpoints(object) ## S4 method for signature 'KernelComputer' num_paramsets(object) ## S4 method for signature 'KernelComputer' component_names(object)
## S4 method for signature 'KernelComputer' show(object) ## S4 method for signature 'KernelComputer' num_components(object) ## S4 method for signature 'KernelComputer' num_evalpoints(object) ## S4 method for signature 'KernelComputer' num_paramsets(object) ## S4 method for signature 'KernelComputer' component_names(object)
object |
The object for which to call a class method. |
show(KernelComputer)
: Print a summary about the object.
num_components(KernelComputer)
: Get number of components.
num_evalpoints(KernelComputer)
: Get number of evaluation points.
num_paramsets(KernelComputer)
: Get number of parameter sets.
component_names(KernelComputer)
: Get component names.
input
Common input (for example parameter values).
K_input
Input for computing kernel matrices between data points
(N
x N
). A list.
Ks_input
Input for computing kernel matrices between data and output
points (P
x N
). A list.
Kss_input
Input for computing kernel matrices between output
points (P
x P
). A list, empty if full_covariance=FALSE
.
comp_names
Component names (character vector).
full_covariance
Boolean value determining if this can compute full predictive covariance matrices (or just marginal variance at each point).
no_separate_output_points
Boolean value determining if
Ks_input
and Kss_input
are the same thing. Using this
knowledge can reduce unnecessary computations of kernel matrices.
STREAM
external pointer (for calling 'Stan' functions)
Creates an additive Gaussian process model using
create_model
and fits it using sample_model
.
See the
Mathematical description of lgpr models
vignette for more information about the connection between different options
and the created statistical model.
lgp( formula, data, likelihood = "gaussian", prior = NULL, c_hat = NULL, num_trials = NULL, options = NULL, prior_only = FALSE, verbose = FALSE, sample_f = !(likelihood == "gaussian"), quiet = FALSE, skip_postproc = sample_f, ... )
lgp( formula, data, likelihood = "gaussian", prior = NULL, c_hat = NULL, num_trials = NULL, options = NULL, prior_only = FALSE, verbose = FALSE, sample_f = !(likelihood == "gaussian"), quiet = FALSE, skip_postproc = sample_f, ... )
formula |
The model formula, where
See the "Model formula syntax" section below ( |
data |
A |
likelihood |
Determines the observation model. Must be either
|
prior |
A named list, defining the prior distribution of model
(hyper)parameters. See the "Defining priors" section below
( |
c_hat |
The GP mean. This should only be given if
where |
num_trials |
This argument (number of trials) is only needed when
likelihood is |
options |
A named list with the following possible fields:
If |
prior_only |
Should likelihood be ignored? See also
|
verbose |
Can messages be printed during model creation? Has no
effect if |
sample_f |
Determines if the latent function values are sampled
(must be |
quiet |
Should all output messages be suppressed? You need to set
also |
skip_postproc |
Should all postprocessing be skipped? If this is
|
... |
Optional arguments passed to
|
Returns an object of the S4 class lgpfit.
There are two ways to define the model formula:
Using a common formula
-like syntax, like in
y ~ age +
age|id
+ sex
. Terms can consist of a
single variable, such as age
, or an interaction of two variables,
such as age|id
. In single-variable terms, the variable can be either
continuous (numeric) or categorical (factor), whereas in interaction terms
the variable on the left-hand side of the vertical bar (|
) has to
be continuous and the one on the right-hand side has to be categorical.
Formulae specified using this syntax are translated to the advanced format
so that
single-variable terms become gp(x)
if
variable x
is numeric and zs(x)
if x
is a factor
interaction terms x|z
become gp(x)*zs(z)
Using the advanced syntax, like in y ~ gp(age) +
gp(age)*zs(id) +
het(id)*gp_vm(disAge)
.
This creates lgprhs objects, which consist of
lgpterms, which consist of lgpexprs.
This approach must be used if creating nonstationary, heterogeneous or
temporally uncertain components.
Either one of the approaches should be used and they should not be mixed.
The prior
argument must be a named list, like
list(alpha=student_t(4), wrp=igam(30,10))
. See examples in tutorials.
Possible allowed names are
"alpha"
= component magnitude parameters
"ell"
= component lengthscale parameters
"wrp"
= input warping steepness parameters
"sigma"
= noise magnitude (Gaussian obs. model)
"phi"
= inv. overdispersion (negative binomial obs. model)
"gamma"
= overdispersion (beta-binomial obs. model)
"beta"
= heterogeneity parameters
"effect_time"
= uncertain effect time parameters
"effect_time_info"
= additional options for the above
See priors
for functions that can be
used to define the list elements. If a parameter of a model is not given
in this list, a default prior will be used for it.
It is not recommended to use default priors blindly. Rather, priors should
be specified according to the knowledge about the problem at hand, as in any
Bayesian analysis. In lgpr
this is especially important when
Using a non-Gaussian likelihood or otherwise setting
sample_f = TRUE
. In this case the response variable is not
normalized, so the scale on which the data varies must be taken into
account when defining priors of the signal magnitude parameters
alpha
and possible noise parameters (sigma
, phi
,
gamma
). Also it should be checked if c_hat
is set in a
sensible way.
Using a model that contains a gp_ns(x)
or gp_vm(x)
expression in its formula. In this case the corresponding covariate
x
is not normalized, and the prior for the input warping steepness
parameter wrp
must be set according to the expected width of the
window in which the nonstationary effect of x
occurs. By default,
the width of this window is about 36, which has been set assuming that
the unit of x
is months.
Other main functions:
create_model()
,
draw_pred()
,
get_draws()
,
pred()
,
prior_pred()
,
sample_model()
An S4 class to represent an lgp expression
covariate
name of a covariate
fun
function name
See operations
for performing arithmetics
on lgprhs, lgpterm and lgpexpr
objects.
lgp
functionAn S4 class to represent the output of the lgp
function
## S4 method for signature 'lgpfit' show(object) ## S4 method for signature 'lgpfit' component_names(object) ## S4 method for signature 'lgpfit' num_components(object) ## S4 method for signature 'lgpfit' postproc(object, verbose = TRUE) ## S4 method for signature 'lgpfit' contains_postproc(object) ## S4 method for signature 'lgpfit' clear_postproc(object) ## S4 method for signature 'lgpfit' get_model(object) ## S4 method for signature 'lgpfit' get_stanfit(object) ## S4 method for signature 'lgpfit' is_f_sampled(object) ## S4 method for signature 'lgpfit,missing' plot(x, y)
## S4 method for signature 'lgpfit' show(object) ## S4 method for signature 'lgpfit' component_names(object) ## S4 method for signature 'lgpfit' num_components(object) ## S4 method for signature 'lgpfit' postproc(object, verbose = TRUE) ## S4 method for signature 'lgpfit' contains_postproc(object) ## S4 method for signature 'lgpfit' clear_postproc(object) ## S4 method for signature 'lgpfit' get_model(object) ## S4 method for signature 'lgpfit' get_stanfit(object) ## S4 method for signature 'lgpfit' is_f_sampled(object) ## S4 method for signature 'lgpfit,missing' plot(x, y)
object |
The object for which to apply a class method. |
verbose |
Can the method print any messages? |
x |
an lgpfit object to visualize |
y |
unused argument |
show(lgpfit)
: Print information and summary about the fit object.
component_names(lgpfit)
: Get names of model components.
num_components(lgpfit)
: Get number of model components. Returns a
positive integer.
postproc(lgpfit)
: Apply postprocessing. Returns an updated
lgpfit object (copies data).
contains_postproc(lgpfit)
: Check if object contains postprocessing information.
clear_postproc(lgpfit)
: Returns an updated (copies data)
lgpfit object without any postprocessing information.
get_model(lgpfit)
: Get the stored lgpmodel object.
Various properties of the returned object can be accessed as explained
in the documentation of lgpmodel.
get_stanfit(lgpfit)
: Get the stored stanfit
object.
Various properties of the returned object can be accessed or plotted
as explained
here
or in the documentation of stanfit
.
is_f_sampled(lgpfit)
: Determine if inference was done by sampling
the latent signal f
(and its components).
plot(x = lgpfit, y = missing)
: Visualize parameter draws using plot_draws
.
stan_fit
An object of class stanfit
.
model
An object of class lgpmodel
.
num_draws
Total number of parameter draws.
postproc_results
A named list containing possible postprocessing results.
For extracting parameter draws, see get_draws
,
or the rstan
methods for stanfit
objects.
For more detailed plotting functions, see plot_draws
,
plot_beta
, plot_warp
,
plot_effect_times
An S4 class to represent an lgp formula
terms
an object of class lgprhs
y_name
name of the response variable
call
original formula call
See operations
for performing arithmetics
on lgprhs, lgpterm and lgpexpr
objects.
An S4 class to represent an additive GP model
## S4 method for signature 'lgpmodel' show(object) ## S4 method for signature 'lgpmodel' parameter_info(object, digits = 3) ## S4 method for signature 'lgpmodel' component_info(object) ## S4 method for signature 'lgpmodel' num_components(object) ## S4 method for signature 'lgpmodel' covariate_info(object) ## S4 method for signature 'lgpmodel' component_names(object) ## S4 method for signature 'lgpmodel' is_f_sampled(object)
## S4 method for signature 'lgpmodel' show(object) ## S4 method for signature 'lgpmodel' parameter_info(object, digits = 3) ## S4 method for signature 'lgpmodel' component_info(object) ## S4 method for signature 'lgpmodel' num_components(object) ## S4 method for signature 'lgpmodel' covariate_info(object) ## S4 method for signature 'lgpmodel' component_names(object) ## S4 method for signature 'lgpmodel' is_f_sampled(object)
object |
The object for which to apply a class method. |
digits |
number of digits to show for floating point numbers |
show(lgpmodel)
: Print information and summary about the object.
Returns object
invisibly.
parameter_info(lgpmodel)
: Get a parameter summary (bounds and
priors). Returns a data.frame
.
component_info(lgpmodel)
: Get a data frame with information about each model
component.
num_components(lgpmodel)
: Get number of model components. Returns a
positive integer.
covariate_info(lgpmodel)
: Get covariate information.
component_names(lgpmodel)
: Get names of model components.
is_f_sampled(lgpmodel)
: Determine if inference of the model requires sampling
the latent signal f
(and its components).
formula
An object of class lgpformula
data
The original unmodified data.
stan_input
The data to be given as input to rstan::sampling
var_names
List of variable names grouped by type.
var_scalings
A named list with fields
y
- Response variable normalization function and its
inverse operation. Must be an lgpscaling object.
x_cont
- Continuous covariate normalization functions and
their inverse operations. Must be a named list with each element is an
lgpscaling object.
var_info
A named list with fields
x_cat_levels
- Names of the levels of categorical covariates
before converting from factor to numeric.
info
Other info in text format.
sample_f
Whether the signal f
is sampled or marginalized.
full_prior
Complete prior information.
An S4 class to represent the right-hand side of an lgp formula
summands
a list of one or more lgpterms
See operations
for performing arithmetics
on lgprhs, lgpterm and lgpexpr
objects.
An S4 class to represent variable scaling
loc
original location (mean)
scale
original scale (standard deviation)
var_name
variable name
An S4 class to represent a data set simulated using the additive GP formalism
## S4 method for signature 'lgpsim' show(object) ## S4 method for signature 'lgpsim,missing' plot(x, y, ...)
## S4 method for signature 'lgpsim' show(object) ## S4 method for signature 'lgpsim,missing' plot(x, y, ...)
object |
an lgpsim object |
x |
an lgpsim object to plot |
y |
not used |
... |
optional arguments passed to |
show(lgpsim)
: Show summary of object.
plot(x = lgpsim, y = missing)
: Plot the data and generating process. For more
information see plot_sim
.
data
the actual data
response
name of the response variable in the data
components
the drawn function components
kernel_matrices
the covariance matrices for each gp
info
A list with fields
par_ell
the used lengthscale parameters
par_cont
the parameters used to generate the continuous
covariates
p_signal
signal proportion
effect_times
A list with fields
true
possible true effect times that generate the disease
effect
observed
possible observed effect times
An S4 class to represent one formula term
factors
a list of at most two lgpexprs
See operations
for performing arithmetics
on lgprhs, lgpterm and lgpexpr
objects.
Print a model summary.
model_summary(object, digits = 3) param_summary(object, digits = 3)
model_summary(object, digits = 3) param_summary(object, digits = 3)
object |
a model or fit |
digits |
number of digits to round floats to |
object
invisibly.
Replaces a continuous variable x
in the data frame, and
possibly another continuous variable x_ns
derived from it, with new
values, for each level of a grouping factor (usually id)
new_x(data, x_values, group_by = "id", x = "age", x_ns = NULL)
new_x(data, x_values, group_by = "id", x = "age", x_ns = NULL)
data |
A data frame. Can also be an lgpfit or lgpmodel object, in which case data is extracted from it. |
x_values |
the values of |
group_by |
name of the grouping variable, must be a factor
in |
x |
of the variable along which to extend,
must be a numeric in |
x_ns |
of a nonstationary variable derived from |
a data frame containing the following columns
all factors in the original data
x
x_ns
(unless it is NULL)
Other data frame handling functions:
add_dis_age()
,
add_factor()
,
add_factor_crossing()
,
adjusted_c_hat()
,
split()
Operations on formula terms and expressions
## S4 method for signature 'lgprhs,lgprhs' e1 + e2 ## S4 method for signature 'lgpterm,lgpterm' e1 + e2 ## S4 method for signature 'lgprhs,lgpterm' e1 + e2 ## S4 method for signature 'lgpterm,lgpterm' e1 * e2
## S4 method for signature 'lgprhs,lgprhs' e1 + e2 ## S4 method for signature 'lgpterm,lgpterm' e1 + e2 ## S4 method for signature 'lgprhs,lgpterm' e1 + e2 ## S4 method for signature 'lgpterm,lgpterm' e1 * e2
e1 |
The first sum, term or expression |
e2 |
The second sum, term or expression |
The behaviour and return type depend on the types of e1
and e2
.
You can
Data frames specified in arguments df
,
and df_err
must have a format where
The first column is the grouping factor (usually id).
The second column is the x-axis variable (usually age).
The third column is the coloring factor. If name of the third
column is NA
, coloring is not done.
A column named y
must contain the y-axis variable
(not for df_err
).
A column named lower
(upper
) must contain the lower
(upper) bound of error bar (only for df_err
).
The posterior draw using which the fit has been computed can be
specified with a factor named _draw_
(only for df
).
plot_api_c( df, df_err = NULL, alpha = 1, alpha_err = 0.2, no_err = FALSE, no_line = FALSE )
plot_api_c( df, df_err = NULL, alpha = 1, alpha_err = 0.2, no_err = FALSE, no_line = FALSE )
df |
a data frame |
df_err |
a data frame |
alpha |
line opacity |
alpha_err |
ribbon opacity |
no_err |
hide error bar even when it would normally be plotted? |
no_line |
hide line even when it would normally be plotted? |
A ggplot
object.
Other internal plot API functions:
plot_api_g()
Data frames specified in arguments df_data
,
df_signal
, df_fit
, and df_fit_err
must have a format
where
the first column is the grouping factor (usually id)
the second column is the x-axis variable (usually age)
a column named y
must contain the y-axis variable
(not for df_fit_err
)
a column named lower
(upper
) must contain the lower
(upper) bound of error bar (only for df_fit_err
)
a column named draw
must be a factor that
specifies the posterior draw using which the fit has been computed
(only for df_fit
)
plot_api_g( df_data, df_signal = NULL, df = NULL, df_err = NULL, teff_signal = NULL, teff_obs = NULL, i_test = NULL, color_signal = color_palette(2)[1], color = color_palette(2)[2], color_err = colorset("red", "light_highlight"), color_vlines = colorset("gray", "mid_highlight"), alpha = 1, alpha_err = 0.5, nrow = NULL, ncol = NULL, y_transform = function(x) x )
plot_api_g( df_data, df_signal = NULL, df = NULL, df_err = NULL, teff_signal = NULL, teff_obs = NULL, i_test = NULL, color_signal = color_palette(2)[1], color = color_palette(2)[2], color_err = colorset("red", "light_highlight"), color_vlines = colorset("gray", "mid_highlight"), alpha = 1, alpha_err = 0.5, nrow = NULL, ncol = NULL, y_transform = function(x) x )
df_data |
A data frame containing the observations. |
df_signal |
A data frame containing the true signal. Omitted if
|
df |
A data frame containing the model fit, or a list of data
frames. The list version can be used for example so that each list element
corresponds to the fit computed using one parameter draw. Omitted if
|
df_err |
A data frame containing error bars. Omitted if |
teff_signal |
A named vector containing true effect times used to
generate the signal. Omitted if |
teff_obs |
A named vector containing observed effect times. Omitted if
|
i_test |
Indices of test points. |
color_signal |
Line color for true signal. |
color |
Line color for model fit. |
color_err |
Color of the error ribbon. |
color_vlines |
Two line colors for vertical lines (true and obs. effect time). |
alpha |
Line opacity for model fit. |
alpha_err |
Opacity of the error ribbon. |
nrow |
number of rows, an argument for
|
ncol |
number of columns, an argument for
|
y_transform |
A function to be applied to the third column of
|
A ggplot
object.
Other internal plot API functions:
plot_api_c()
This calls plot_f
for all model components.
plot_components( fit, pred = NULL, group_by = "id", t_name = "age", MULT_STD = 2, verbose = TRUE, draws = NULL, reduce = function(x) base::mean(x), color_by = NA, no_err = FALSE, ylim = NULL, draw = TRUE, nrow = NULL, ncol = NULL, gg_add = NULL, x = NULL, ... )
plot_components( fit, pred = NULL, group_by = "id", t_name = "age", MULT_STD = 2, verbose = TRUE, draws = NULL, reduce = function(x) base::mean(x), color_by = NA, no_err = FALSE, ylim = NULL, draw = TRUE, nrow = NULL, ncol = NULL, gg_add = NULL, x = NULL, ... )
fit |
An object of class lgpfit. |
pred |
An object of class GaussianPrediction or
Prediction. If |
group_by |
name of the grouping variable (use |
t_name |
name of the x-axis variable |
MULT_STD |
a multiplier for standard deviation |
verbose |
Can this print any messages? |
draws |
Only has effect if |
reduce |
Only has effect if |
color_by |
Names of coloring factors. Can have length 1 or equal to
the number of components. See the |
no_err |
Should the error ribbons be skipped even though they
otherwise would be shown? Can have length 1 or equal to number of
components + 1. See the |
ylim |
a vector of length 2 (upper and lower y-axis limits), or NULL |
draw |
if this is TRUE, the plot grid is drawn using
|
nrow |
number of grid rows |
ncol |
number of grid columns |
gg_add |
additional ggplot obejct to add to each plot |
x |
Deprecated argument. This is now taken from the |
... |
additional arguments to |
a list of ggplot objects invisibly
Other main plot functions:
plot_draws()
,
plot_pred()
Vizualizing longitudinal data
plot_data( data, x_name = "age", y_name = "y", group_by = "id", facet_by = NULL, color_by = NULL, highlight = NULL, main = NULL, sub = NULL )
plot_data( data, x_name = "age", y_name = "y", group_by = "id", facet_by = NULL, color_by = NULL, highlight = NULL, main = NULL, sub = NULL )
data |
A data frame. |
x_name |
Name of x-axis variable. |
y_name |
Name of the y-axis variable. |
group_by |
Name of grouping variable (must be a factor). |
facet_by |
Name of the faceting variable (must be a factor). |
color_by |
Name of coloring variable (must be a factor). |
highlight |
Value of category of the |
main |
main plot title |
sub |
plot subtitle |
a ggplot
object
Visualize the distribution of parameter draws
plot_draws( fit, type = "intervals", regex_pars = c("alpha", "ell", "wrp", "sigma", "phi", "gamma"), ... ) plot_beta(fit, type = "dens", verbose = TRUE, ...) plot_warp( fit, num_points = 300, window_size = 48, color = colorset("red", "dark"), alpha = 0.5 ) plot_effect_times(fit, type = "areas", verbose = TRUE, ...)
plot_draws( fit, type = "intervals", regex_pars = c("alpha", "ell", "wrp", "sigma", "phi", "gamma"), ... ) plot_beta(fit, type = "dens", verbose = TRUE, ...) plot_warp( fit, num_points = 300, window_size = 48, color = colorset("red", "dark"), alpha = 0.5 ) plot_effect_times(fit, type = "areas", verbose = TRUE, ...)
fit |
an object of class lgpfit |
type |
plot type, allowed options are "intervals", "dens", "areas", and "trace" |
regex_pars |
regex for parameter names to plot |
... |
additional arguments for the |
verbose |
Can any output be printed? |
num_points |
number of plot points |
window_size |
width of time window |
color |
line color |
alpha |
line alpha |
a ggplot
object or list of them
plot_draws()
: visualizes the distribution of any set of
model parameters (defaults to kernel hyperparameters and possible
observation model parameters)
plot_beta()
: visualizes the distribution of the
individual-specific disease effect magnitude parameter draws
plot_warp()
: visualizes the input warping function for
different draws of the warping steepness parameter
plot_effect_times()
: visualizes the input warping function for
different parameter draws
Other main plot functions:
plot_components()
,
plot_pred()
Visualize input warping function with several steepness parameter values
plot_inputwarp(wrp, x, color = colorset("red", "dark"), alpha = 0.5)
plot_inputwarp(wrp, x, color = colorset("red", "dark"), alpha = 0.5)
wrp |
a vector of values of the warping steepness parameter |
x |
a vector of input values |
color |
line color |
alpha |
line alpha |
a ggplot
object
Plot the inverse gamma-distribution pdf
plot_invgamma( alpha, beta, by = 0.01, log = FALSE, IQR = 0.95, return_quantiles = FALSE, linecolor = colorset("red", "dark"), fillcolor = colorset("red", "mid") )
plot_invgamma( alpha, beta, by = 0.01, log = FALSE, IQR = 0.95, return_quantiles = FALSE, linecolor = colorset("red", "dark"), fillcolor = colorset("red", "mid") )
alpha |
positive real number |
beta |
positive real number |
by |
grid size |
log |
is log-scale used? |
IQR |
inter-quantile range width |
return_quantiles |
should this return a list |
linecolor |
line color |
fillcolor |
fill color |
a ggplot
object
Other functions related to the inverse-gamma distribution:
dinvgamma_stanlike()
,
priors
Function draws at data points can be visualized using
plot_pred
. If the pred
argument is NULL
, it
is computed using the pred
function with x=NULL
.
The total signal f
or any of its
additive components can be plotted using plot_f
.
plot_pred( fit, pred = NULL, group_by = "id", t_name = "age", MULT_STD = 2, verbose = TRUE, draws = NULL, reduce = function(x) base::mean(x), x = NULL, ... ) plot_f( fit, pred = NULL, group_by = "id", t_name = "age", MULT_STD = 2, verbose = TRUE, draws = NULL, reduce = function(x) base::mean(x), comp_idx = NULL, color_by = NA, x = NULL, ... )
plot_pred( fit, pred = NULL, group_by = "id", t_name = "age", MULT_STD = 2, verbose = TRUE, draws = NULL, reduce = function(x) base::mean(x), x = NULL, ... ) plot_f( fit, pred = NULL, group_by = "id", t_name = "age", MULT_STD = 2, verbose = TRUE, draws = NULL, reduce = function(x) base::mean(x), comp_idx = NULL, color_by = NA, x = NULL, ... )
fit |
An object of class lgpfit. |
pred |
An object of class GaussianPrediction or
Prediction. If |
group_by |
name of the grouping variable (use |
t_name |
name of the x-axis variable |
MULT_STD |
a multiplier for standard deviation |
verbose |
Can this print any messages? |
draws |
Only has effect if |
reduce |
Only has effect if |
x |
Deprecated argument. This is now taken from the |
... |
additional arguments to |
comp_idx |
Index of component to plot. The total sum is plotted
if this is |
color_by |
name of coloring factor |
a ggplot
object
Other main plot functions:
plot_components()
,
plot_draws()
Visualize an lgpsim object (simulated data)
plot_sim( simdata, group_by = "id", x_name = "age", h_name = "h", y_name = "y", comp_idx = NULL, color_by = NA, verbose = TRUE, ... )
plot_sim( simdata, group_by = "id", x_name = "age", h_name = "h", y_name = "y", comp_idx = NULL, color_by = NA, verbose = TRUE, ... )
simdata |
an object of class lgpsim |
group_by |
grouping factor |
x_name |
name of x-axis variable |
h_name |
name of the signal in |
y_name |
name of response variable |
comp_idx |
Possible index of a component to be shown. If this is NULL, the data and total signal are shown. |
color_by |
coloring factor |
verbose |
should some information be printed? |
... |
additional arguments to |
a ggplot
object
Graphical posterior predictive checks
ppc(fit, data = NULL, fun = default_ppc_fun(fit), verbose = TRUE, ...)
ppc(fit, data = NULL, fun = default_ppc_fun(fit), verbose = TRUE, ...)
fit |
An object of class lgpfit that can been created
with |
data |
the original data frame (deprecated argument with no effect, now obtained from fit object) |
fun |
|
verbose |
Can this print any messages? |
... |
additional arguments passed to the default
|
a ggplot
object
Introduction to graphical posterior predictive checks:
here.
Prior predictive check can be done by calling
prior_pred
and then bayesplot::pp_check()
.
If fit
is for a model that marginalizes the latent
signal f
(i.e. is_f_sampled(fit)
is FALSE
), this
computes the analytic conditional posterior
distributions of each model component, their sum, and the conditional
predictive distribution. All these are computed for
each (hyper)parameter draw (defined by draws
), or other parameter
set (obtained by a reduction defined by reduce
). Results are stored
in a GaussianPrediction object which is then returned.
If fit
is for a model that samples the latent
signal f
(i.e. is_f_sampled(fit)
is TRUE
), this will
extract these function samples, compute their sum, and a version of the
sum f
that is transformed through the inverse link function.
If x
is not NULL
, the function draws are extrapolated
to the points specified by x
using kernel regression.
Results are stored in a Prediction
object which is then returned.
pred( fit, x = NULL, reduce = function(x) base::mean(x), draws = NULL, verbose = TRUE, STREAM = get_stream(), c_hat_pred = NULL, force = FALSE, debug_kc = FALSE )
pred( fit, x = NULL, reduce = function(x) base::mean(x), draws = NULL, verbose = TRUE, STREAM = get_stream(), c_hat_pred = NULL, force = FALSE, debug_kc = FALSE )
fit |
An object of class lgpfit. |
x |
A data frame of points where function posterior distributions
and predictions should be computed or sampled.
The function |
reduce |
Reduction for parameters draws. Can be a function that
is applied to reduce all parameter draws into one parameter set, or
|
draws |
Indices of parameter draws to use, or |
verbose |
Should more information and a possible progress bar be printed? |
STREAM |
External pointer. By default obtained with
|
c_hat_pred |
This is only used if the latent signal |
force |
This is by default |
debug_kc |
If this is |
An object of class GaussianPrediction or Prediction.
Other main functions:
create_model()
,
draw_pred()
,
get_draws()
,
lgp()
,
prior_pred()
,
sample_model()
An S4 class to represent prior or posterior draws from an additive function distribution.
## S4 method for signature 'Prediction' show(object) ## S4 method for signature 'Prediction' component_names(object) ## S4 method for signature 'Prediction' num_components(object) ## S4 method for signature 'Prediction' num_paramsets(object) ## S4 method for signature 'Prediction' num_evalpoints(object)
## S4 method for signature 'Prediction' show(object) ## S4 method for signature 'Prediction' component_names(object) ## S4 method for signature 'Prediction' num_components(object) ## S4 method for signature 'Prediction' num_paramsets(object) ## S4 method for signature 'Prediction' num_evalpoints(object)
object |
Prediction object for which to apply a class method. |
show(Prediction)
: Print a summary about the object.
component_names(Prediction)
: Get names of components.
num_components(Prediction)
: Get number of components.
num_paramsets(Prediction)
: Get number of parameter combinations
(different parameter vectors) using which predictions were computed.
num_evalpoints(Prediction)
: Get number of points where
predictions were computed.
f_comp
component draws
f
signal draws
h
predictions (signal draws + scaling factor c_hat
,
transformed through inverse link function)
x
a data frame of points (covariate values) where the functions/predictions have been evaluated/sampled
extrapolated
Boolean value telling if the function draws are original MCMC draws or if they have been created by extrapolating such draws.
These functions take an lgpmodel object, and
prior_pred
samples from the prior predictive distribution of
the model
sample_param_prior
samples only its parameter prior using
sampling
prior_pred( model, verbose = TRUE, quiet = FALSE, refresh = 0, STREAM = get_stream(), ... ) sample_param_prior(model, verbose = TRUE, quiet = FALSE, ...)
prior_pred( model, verbose = TRUE, quiet = FALSE, refresh = 0, STREAM = get_stream(), ... ) sample_param_prior(model, verbose = TRUE, quiet = FALSE, ...)
model |
An object of class lgpmodel. |
verbose |
Should more information and a possible progress bar be printed? |
quiet |
This forces |
refresh |
Argument for |
STREAM |
External pointer. By default obtained with
|
... |
Additional arguments for |
prior_pred
returns a list with components
y_draws
: A matrix containing the prior predictive draws
as rows. Can be passed to bayesplot::pp_check()
for
graphical prior predictive checking.
pred_draws
: an object of class Prediction,
containing prior draws of each model component and their sum
param_draws
: a stanfit
object of prior parameter
draws (obtained by calling sample_param_prior
internally)
sample_param_prior
returns
an object of class stanfit
Other main functions:
create_model()
,
draw_pred()
,
get_draws()
,
lgp()
,
pred()
,
sample_model()
Convert given prior to numeric format
prior_to_num(desc)
prior_to_num(desc)
desc |
Prior description as a named list, containing fields
Other list fields are interpreted as hyperparameters. |
a named list of parsed options
These use the same parametrizations as defined in the 'Stan' documentation. See the docs for gamma and inverse gamma distributions.
uniform(square = FALSE) normal(mu, sigma, square = FALSE) student_t(nu, square = FALSE) gam(shape, inv_scale, square = FALSE) igam(shape, scale, square = FALSE) log_normal(mu, sigma, square = FALSE) bet(a, b)
uniform(square = FALSE) normal(mu, sigma, square = FALSE) student_t(nu, square = FALSE) gam(shape, inv_scale, square = FALSE) igam(shape, scale, square = FALSE) log_normal(mu, sigma, square = FALSE) bet(a, b)
square |
is prior for a square-transformed parameter? |
mu |
mean |
sigma |
standard deviation |
nu |
degrees of freedom |
shape |
shape parameter (alpha) |
inv_scale |
inverse scale parameter (beta) |
scale |
scale parameter (beta) |
a |
shape parameter |
b |
shape parameter |
a named list
Other functions related to the inverse-gamma distribution:
dinvgamma_stanlike()
,
plot_invgamma()
# Log-normal prior log_normal(mu = 1, sigma = 1) # Cauchy prior student_t(nu = 1) # Exponential prior with rate = 0.1 gam(shape = 1, inv_scale = 0.1) # Create a similar priors as in LonGP (Cheng et al., 2019) # Not recommended, because a lengthscale close to 0 is possible. a <- log(1) - log(0.1) log_normal(mu = 0, sigma = a / 2) # for continuous lengthscale student_t(nu = 4) # for interaction lengthscale igam(shape = 0.5, scale = 0.005, square = TRUE) # for sigma
# Log-normal prior log_normal(mu = 1, sigma = 1) # Cauchy prior student_t(nu = 1) # Exponential prior with rate = 0.1 gam(shape = 1, inv_scale = 0.1) # Create a similar priors as in LonGP (Cheng et al., 2019) # Not recommended, because a lengthscale close to 0 is possible. a <- log(1) - log(0.1) log_normal(mu = 0, sigma = a / 2) # for continuous lengthscale student_t(nu = 4) # for interaction lengthscale igam(shape = 0.5, scale = 0.005, square = TRUE) # for sigma
Function for reading the built-in proteomics data
read_proteomics_data(parentDir = NULL, protein = NULL, verbose = TRUE)
read_proteomics_data(parentDir = NULL, protein = NULL, verbose = TRUE)
parentDir |
Path to local parent directory for the data.
If this is |
protein |
Index or name of protein. |
verbose |
Can this print some output? |
a data.frame
Assess component relevances
relevances(fit, reduce = function(x) base::mean(x), verbose = TRUE, ...)
relevances(fit, reduce = function(x) base::mean(x), verbose = TRUE, ...)
fit |
an object of class |
reduce |
a function to apply to reduce the relevances given each parameter draw into one value |
verbose |
Can this print any messages? |
... |
currently has no effect |
a named vector with length equal to num_comps + 1
S4 generics for lgpfit, lgpmodel, and other objects
parameter_info(object, digits) component_info(object) covariate_info(object) component_names(object) get_model(object) is_f_sampled(object) get_stanfit(object) postproc(object, ...) contains_postproc(object) clear_postproc(object) num_paramsets(object) num_evalpoints(object) num_components(object)
parameter_info(object, digits) component_info(object) covariate_info(object) component_names(object) get_model(object) is_f_sampled(object) get_stanfit(object) postproc(object, ...) contains_postproc(object) clear_postproc(object) num_paramsets(object) num_evalpoints(object) num_components(object)
object |
object for which to apply the generic |
digits |
number of digits to show |
... |
additional optional arguments to pass |
parameter_info
returns a data frame with
one row for each parameter and columns
for parameter name, parameter bounds, and the assigned prior
component_info
returns a data frame with one row for
each model component, and columns encoding information about
model components
covariate_info
returns a list with names
continuous
and categorical
, with information about
both continuous and categorical covariates
component_names
returns a character vector with
component names
is_f_sampled
returns a logical value
get_stanfit
returns a stanfit
(rstan)
postproc
applies postprocessing and returns an
updated lgpfit
clear_postproc
removes postprocessing information and
returns an updated lgpfit
num_paramsets
, num_evalpoints
and
num_components
return an integer
parameter_info()
: Get parameter information (priors etc.).
component_info()
: Get component information.
covariate_info()
: Get covariate information.
component_names()
: Get component names.
get_model()
: Get lgpmodel object.
is_f_sampled()
: Determine if signal f is sampled or marginalized.
get_stanfit()
: Extract stanfit object.
postproc()
: Perform postprocessing.
contains_postproc()
: Determine if object contains postprocessing
information.
clear_postproc()
: Clear postprocessing information (to reduce
size of object).
num_paramsets()
: Get number of parameter sets.
num_evalpoints()
: Get number of points where posterior is evaluated.
num_components()
: Get number of model components.
To find out which methods have been implemented for which classes, see lgpfit, lgpmodel, Prediction and GaussianPrediction.
sample_model
takes an lgpmodel
object and fits it using sampling
.
optimize_model
takes an lgpmodel
object and fits it using optimizing
.
sample_model( model, verbose = TRUE, quiet = FALSE, skip_postproc = is_f_sampled(model), ... ) optimize_model(model, ...)
sample_model( model, verbose = TRUE, quiet = FALSE, skip_postproc = is_f_sampled(model), ... ) optimize_model(model, ...)
model |
An object of class lgpmodel. |
verbose |
Can messages be printed? |
quiet |
Should all output messages be suppressed? You need to set
also |
skip_postproc |
Should all postprocessing be skipped? If this is
|
... |
Optional arguments passed to
|
sample_model
returns an object of class lgpfit
containing the parameter draws, the original model
object,
and possible postprocessing results. See documentation of
lgpfit for more information.
optimize_model
directly returns the list returned by
optimizing
. See its documentation for more information.
Other main functions:
create_model()
,
draw_pred()
,
get_draws()
,
lgp()
,
pred()
,
prior_pred()
select
performs strict selection, returning either TRUE
or FALSE
for each component.
select.integrate
is like select
, but instead of
a fixed threshold, computes probabilistic selection by integrating over
a threshold density.
select_freq
performs the selection separately using
each parameter draw and returns the frequency at which each
component was selected.
select_freq.integrate
is like select_freq
, but
instead of a fixed threshold, computes probabilistic selection
frequencies by integrating over a threshold density.
select(fit, reduce = function(x) base::mean(x), threshold = 0.95, ...) select_freq(fit, threshold = 0.95, ...) select.integrate( fit, reduce = function(x) base::mean(x), p = function(x) stats::dbeta(x, 100, 5), h = 0.01, verbose = TRUE, ... ) select_freq.integrate( fit, p = function(x) stats::dbeta(x, 100, 5), h = 0.01, verbose = TRUE, ... )
select(fit, reduce = function(x) base::mean(x), threshold = 0.95, ...) select_freq(fit, threshold = 0.95, ...) select.integrate( fit, reduce = function(x) base::mean(x), p = function(x) stats::dbeta(x, 100, 5), h = 0.01, verbose = TRUE, ... ) select_freq.integrate( fit, p = function(x) stats::dbeta(x, 100, 5), h = 0.01, verbose = TRUE, ... )
fit |
An object of class |
reduce |
The |
threshold |
Threshold for relevance sum. Must be a value between 0 and 1. |
... |
Additional arguments to |
p |
A threshold density over interval [0,1]. |
h |
A discretization parameter for computing a quadrature. |
verbose |
Should this show a progress bar? |
See description.
Printing formula object info using the show generic
## S4 method for signature 'lgpformula' show(object) ## S4 method for signature 'lgprhs' show(object) ## S4 method for signature 'lgpterm' show(object)
## S4 method for signature 'lgpformula' show(object) ## S4 method for signature 'lgprhs' show(object) ## S4 method for signature 'lgpterm' show(object)
object |
an object of some S4 class |
the object invisibly
Simulate latent function components for longitudinal data analysis
sim.create_f( X, covariates, relevances, lengthscales, X_affected, dis_fun, bin_kernel, steepness, vm_params, force_zeromean )
sim.create_f( X, covariates, relevances, lengthscales, X_affected, dis_fun, bin_kernel, steepness, vm_params, force_zeromean )
X |
input data matrix (generated by |
covariates |
Integer vector that defines the types of covariates (other than id and age). Different integers correspond to the following covariate types:
|
relevances |
Relative relevance of each component. Must have be a vector
so that |
lengthscales |
A vector so that |
X_affected |
which individuals are affected by the disease |
dis_fun |
A function or a string that defines the disease effect. If
this is a function, that function is used to generate the effect.
If |
bin_kernel |
Should the binary kernel be used for categorical
covariates? If this is |
steepness |
Steepness of the input warping function. This is only used if the disease component is in the model. |
vm_params |
Parameters of the variance mask function. This is only
needed if |
force_zeromean |
Should each component (excluding the disease age component) be forced to have a zero mean? |
a data frame FFF where one column corresponds to one additive component
Create an input data frame X for simulated data
sim.create_x( N, covariates, names, n_categs, t_data, t_jitter, t_effect_range, continuous_info )
sim.create_x( N, covariates, names, n_categs, t_data, t_jitter, t_effect_range, continuous_info )
N |
Number of individuals. |
covariates |
Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:
|
names |
Covariate names. |
n_categs |
An integer vector defining the number of categories
for each categorical covariate, so that |
t_data |
Measurement times (same for each individual, unless
|
t_jitter |
Standard deviation of the jitter added to the given measurement times. |
t_effect_range |
Time interval from which the disease effect times are sampled uniformly. Alternatively, This can any function that returns the (possibly randomly generated) real disease effect time for one individual. |
continuous_info |
Info for generating continuous covariates. Must be a
list containing fields
|
a list
Simulate noisy observations
sim.create_y(noise_type, f, snr, phi, gamma, N_trials)
sim.create_y(noise_type, f, snr, phi, gamma, N_trials)
noise_type |
Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial). |
f |
The underlying signal. |
snr |
The desired signal-to-noise ratio. This argument is valid
only when |
phi |
The inverse overdispersion parameter for negative binomial data.
The variance is |
gamma |
The dispersion parameter for beta-binomial data. |
N_trials |
The number of trials parameter for binomial data. |
A list out
, where
out$h
is f
mapped through an inverse link function
(times N_trials
if noise_type
is binomial or beta-binomial)
out$y
is the noisy response variable.
Compute all kernel matrices when simulating data
sim.kernels( X, types, lengthscales, X_affected, bin_kernel, useMaskedVarianceKernel, steepness, vm_params )
sim.kernels( X, types, lengthscales, X_affected, bin_kernel, useMaskedVarianceKernel, steepness, vm_params )
X |
covariates |
types |
vector of covariate types, so that
|
lengthscales |
vector of lengthscales |
X_affected |
which individuals are affected by the disease |
bin_kernel |
whether or not binary (mask) kernel should be used for categorical covariates (if not, the zerosum kernel is used) |
useMaskedVarianceKernel |
should the masked variance kernel be used for drawing the disease component |
steepness |
steepness of the input warping function |
vm_params |
parameters of the variance mask function |
a 3D array
Generate an artificial longitudinal data set.
simulate_data( N, t_data, covariates = c(), names = NULL, relevances = c(1, 1, rep(1, length(covariates))), n_categs = rep(2, sum(covariates %in% c(2, 3))), t_jitter = 0, lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))), f_var = 1, noise_type = "gaussian", snr = 3, phi = 1, gamma = 0.2, N_affected = round(N/2), t_effect_range = "auto", t_observed = "after_0", c_hat = 0, dis_fun = "gp_warp_vm", bin_kernel = FALSE, steepness = 0.5, vm_params = c(0.025, 1), continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)), N_trials = 1, force_zeromean = TRUE )
simulate_data( N, t_data, covariates = c(), names = NULL, relevances = c(1, 1, rep(1, length(covariates))), n_categs = rep(2, sum(covariates %in% c(2, 3))), t_jitter = 0, lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))), f_var = 1, noise_type = "gaussian", snr = 3, phi = 1, gamma = 0.2, N_affected = round(N/2), t_effect_range = "auto", t_observed = "after_0", c_hat = 0, dis_fun = "gp_warp_vm", bin_kernel = FALSE, steepness = 0.5, vm_params = c(0.025, 1), continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)), N_trials = 1, force_zeromean = TRUE )
N |
Number of individuals. |
t_data |
Measurement times (same for each individual, unless
|
covariates |
Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:
|
names |
Covariate names. |
relevances |
Relative relevance of each component. Must have be a vector
so that |
n_categs |
An integer vector defining the number of categories
for each categorical covariate, so that |
t_jitter |
Standard deviation of the jitter added to the given measurement times. |
lengthscales |
A vector so that |
f_var |
variance of f |
noise_type |
Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial). |
snr |
The desired signal-to-noise ratio. This argument is valid
only when |
phi |
The inverse overdispersion parameter for negative binomial data.
The variance is |
gamma |
The dispersion parameter for beta-binomial data. |
N_affected |
Number of diseased individuals that are affected by the
disease. This defaults to the number of diseased individuals. This argument
can only be given if |
t_effect_range |
Time interval from which the disease effect times are sampled uniformly. Alternatively, This can any function that returns the (possibly randomly generated) real disease effect time for one individual. |
t_observed |
Determines how the disease effect time is observed. This
can be any function that takes the real disease effect time as an argument
and returns the (possibly randomly generated) observed onset/initiation time.
Alternatively, this can be a string of the form |
c_hat |
a constant added to f |
dis_fun |
A function or a string that defines the disease effect. If
this is a function, that function is used to generate the effect.
If |
bin_kernel |
Should the binary kernel be used for categorical
covariates? If this is |
steepness |
Steepness of the input warping function. This is only used if the disease component is in the model. |
vm_params |
Parameters of the variance mask function. This is only
needed if |
continuous_info |
Info for generating continuous covariates. Must be a
list containing fields
|
N_trials |
The number of trials parameter for binomial data. |
force_zeromean |
Should each component (excluding the disease age component) be forced to have a zero mean? |
An object of class lgpsim.
# Generate Gaussian data dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3) # Generate negative binomially (NB) distributed count data dat <- simulate_data( N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb", phi = 2 )
# Generate Gaussian data dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3) # Generate negative binomially (NB) distributed count data dat <- simulate_data( N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb", phi = 2 )
split_by_factor
splits according to given factor
split_within_factor
splits according to given
data point indices within the same level of a factor
split_within_factor_random
selects k points
from each level of a factor uniformly at random as test data
split_random
splits uniformly at random
split_data
splits according to given data rows
split_by_factor(data, test, var_name = "id") split_within_factor(data, idx_test, var_name = "id") split_within_factor_random(data, k_test = 1, var_name = "id") split_random(data, p_test = 0.2, n_test = NULL) split_data(data, i_test, sort_ids = TRUE)
split_by_factor(data, test, var_name = "id") split_within_factor(data, idx_test, var_name = "id") split_within_factor_random(data, k_test = 1, var_name = "id") split_random(data, p_test = 0.2, n_test = NULL) split_data(data, i_test, sort_ids = TRUE)
data |
a data frame |
test |
the levels of the factor that will be used as test data |
var_name |
name of a factor in the data |
idx_test |
indices point indices with the factor |
k_test |
desired number of test data points per each level of the factor |
p_test |
desired proportion of test data |
n_test |
desired number of test data points (if NULL, |
i_test |
test data row indices |
sort_ids |
should the test indices be sorted into increasing order |
a named list with names train
, test
, i_train
and i_test
Other data frame handling functions:
add_dis_age()
,
add_factor()
,
add_factor_crossing()
,
adjusted_c_hat()
,
new_x()
A very small artificial test data, used mostly for unit tests
testdata_001
testdata_001
A data frame with 24 rows and 6 variables:
individual id, a factor with levels: 1, 2, 3, 4
age
disease-related age
a continuous variable
a factor with 2 levels: Male, Female
a continuous variable
Other built-in datasets:
testdata_002
Medium-size artificial test data, used mostly for tutorials
testdata_002
testdata_002
A data frame with 96 rows and 6 variables:
individual id, a factor with levels: 01-12
age
disease-related age
a factor with 2 levels: Male, Female
a factor with 2 levels: Case, Control
a continuous variable
Other built-in datasets:
testdata_001
Validate S4 class objects
validate_lgpexpr(object) validate_lgpformula(object) validate_lgpscaling(object) validate_lgpfit(object) validate_GaussianPrediction(object) validate_Prediction(object)
validate_lgpexpr(object) validate_lgpformula(object) validate_lgpscaling(object) validate_lgpfit(object) validate_GaussianPrediction(object) validate_Prediction(object)
object |
an object to validate |
TRUE
if valid, otherwise reasons for invalidity
Variance masking function
var_mask(x, stp)
var_mask(x, stp)
x |
a vector of length |
stp |
a positive real number (steepness of mask function) |
a vector of length
Other kernel utility functions:
warp_input()
Input warping function
warp_input(x, a)
warp_input(x, a)
x |
a vector of length |
a |
steepness of the warping function rise |
a vector of warped inputs , length
Other kernel utility functions:
var_mask()