\section{predict.coxph}
This function produces various types of predicted values from a Cox model.
The arguments are
\begin{description}
  \item[object] The result of a call to [[coxph]].
  \item[newdata] Optionally, a new data set for which prediction is
    desired.  If this is absent, predictions are for the observations
    used to fit the model.
  \item[type] The type of prediction:
    \begin{itemize}
      \item lp = the linear predictor for each observation
      \item risk = the risk score $\exp(lp)$ for each observation
      \item expected = the expected number of events
      \item terms = a matrix with one row per subject and one column
        for each term in the model.
    \end{itemize}
  \item[se.fit] Whether or not to return standard errors of the
    predictions.
  \item[na.action] What to do with missing values, \emph{if} there is
    new data.
  \item[terms] The terms that are desired.  This option is almost never
    used, so rarely in fact that it is hard to justify keeping it.
  \item[collapse] An optional vector of subject identifiers, over which
    to sum or `collapse' the results.
  \item[\ldots] All predict methods need to have a \ldots argument;
    we make no use of it, however.
\end{description}

<<predict.coxph>>=
predict.coxph <- function(object, newdata,
                          type=c("lp", "risk", "expected", "terms"),
                          se.fit=FALSE, na.action=na.pass,
                          terms=names(object$assign), collapse, ...) {
    type <- match.arg(type)
    n <- object$n
    <>
    <>
    <>
    <>
}
@

For all of the options, what is returned is a \emph{relative} prediction
which compares each observation to the average for the data set.
Partly this is practical.  Say for instance that a treatment covariate
was coded as 0=control and 1=treatment.  If the model were refit using a
new coding of 3=control and 4=treatment, the results of the Cox model
would be exactly the same with respect to coefficients, variance, tests,
etc.  The raw linear predictor $X\beta$, however, would change,
increasing by a value of $3\beta$.  The relative predictor
$$ \eta_i = X_i\beta - \frac{1}{n}\sum_j X_j\beta $$
will stay the same.
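The invariance under recoding can be checked directly: shifting every
covariate value by a constant $c$ (in the example above, $c=3$) leaves
$\eta_i$ unchanged, since
$$ \eta_i = (X_i + c)\beta - \frac{1}{n}\sum_j (X_j + c)\beta
          = X_i\beta - \frac{1}{n}\sum_j X_j\beta; $$
the shift $c\beta$ appears in both terms and cancels.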
The second reason for doing this is that the Cox model is a relative
risk model rather than an absolute risk model, and thus relative
predictions are almost certainly what the user was thinking of.

When the fit was for a stratified Cox model, results need to be
normalized \emph{by stratum}.  The reasons are again both practical and
statistical.  If the normalization were global, then the results of
predict would change under variable recoding for any model with a
strata by covariate interaction.  Statistically, the relative risks are
only defined within strata, so a ``global'' relative risk is
undefined. %''``
Unfortunately, this important issue was not realized until Oct 2009,
when a puzzling query was sent to the author.  (What does SAS do?)
If there is a new data set, the normalization constants are still those
derived from the original data.

The first task of the routine is to reconstruct necessary data elements
that were not saved as a part of the [[coxph]] fit.  For reasons of
numerical stability the mean of each covariate was subtracted from the
data during the fit, and those means are retained as a part of the
[[coxph]] object; however, this is not sufficient if the model was
stratified.  We will need the following components:
\begin{itemize}
  \item the new data matrix, if any
  \item for type=`expected' predictions, the baseline hazard for the
    original model (one per stratum).  If there is no new data this can
    be obtained indirectly from the residuals and the original y.  If
    there is new data a call to survfit is required.
  \item for any stratified model, the X matrix and strata from the
    original model
  \item for any call asking for standard errors, the X matrix (either
    new or old)
  \item if the model has an offset, we will need it whenever we need X.
\end{itemize}

First, pull in the original model via its [[terms]] decomposition and
find out whether there was a strata term in the original model.  Drop
any cluster terms from the model, as we do not want or need those.
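The per-stratum normalization can be sketched in a few lines of R.
This is an illustration with made-up numbers, not the package code;
[[ave]] simply supplies the stratum-specific mean for each subject.

```r
## Hypothetical raw linear predictors and stratum labels (illustration only)
xb  <- c(1.2, 1.5, 0.3, 0.9)   # X %*% beta for four subjects
grp <- c(1, 1, 2, 2)           # stratum of each subject

## Center within each stratum: subtract the stratum-specific mean,
## so that eta has mean zero inside every stratum
eta <- xb - ave(xb, grp)
```

Recoding a covariate within one stratum now leaves the predictions in
every other stratum unchanged, which is exactly the practical concern
raised above.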