The Methodology of Human Diseases Risk Prediction Tools
Pages 239-248
H. Mannan, R. Ahmed, M. Sanagou, S. Ivory and R. Wolfe
DOI: http://dx.doi.org/10.6000/1929-6029.2013.02.03.9
Published: 31 July 2013


Abstract: Disease risk prediction tools are used for population screening and to guide clinical care. They identify individuals at particularly elevated risk of disease. The development of a new risk prediction tool involves several methodological components: selecting a general modelling framework and a specific functional form for the new tool, deciding which risk factors to include, dealing with missing data in those risk factors, and performing validation checks of the new tool's performance. There have been many methodological developments relevant to these issues in recent years. Developments of importance for disease detection in humans are reviewed and their uptake in risk prediction tool development is illustrated. This review leads to guidance on appropriate methodology for future risk prediction tool development.
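
As a toy illustration of the development-and-validation cycle the review addresses, the sketch below fits a candidate risk model and runs one common validation check (discrimination via the C-statistic). The simulated predictors, effect sizes, and the choice of logistic regression are hypothetical, not taken from the paper.

```python
# Minimal sketch of risk tool development and validation; all data simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                    # hypothetical standardized risk factors
logit = -2.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1]   # true risk model for the simulation
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # disease indicator

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
tool = LogisticRegression().fit(X_dev, y_dev)  # candidate risk prediction tool

# Validation check: discrimination via the C-statistic (area under the ROC curve)
auc = roc_auc_score(y_val, tool.predict_proba(X_val)[:, 1])
print(f"C-statistic on validation data: {auc:.3f}")
```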

Keywords: Disease risk prediction, missing data, model validation, model updating, model utility.

Enriched-Data Problems and Essential Non-Identifiability
Pages 16-44
Geert Molenberghs, Edmund Njeru Njagi, Michael G. Kenward and Geert Verbeke
DOI: http://dx.doi.org/10.6000/1929-6029.2012.01.01.02
Published: 24 September 2012


Abstract: There are two principal ways in which statistical models extend beyond the data available. First, the data may be coarsened: what is actually observed is less detailed than what was planned, owing to, for example, attrition, censoring, grouping, or a combination of these. Second, the data may be augmented: the observed data are hypothetically but conveniently supplemented with structures such as random effects, latent variables, latent classes, or component membership in mixture distributions. These two settings together will be referred to as enriched data. Reasons for modelling enriched data include the incorporation of substantive information, the need for predictions, advantages in interpretation, and mathematical and computational convenience. Fitting models to enriched data combines evidence arising from the empirical data with non-verifiable model components, i.e., components that are purely assumption driven. This has important implications for the interpretation of statistical analyses in such settings. While these issues are widely known, their exploration and discussion are somewhat scattered, and the user should be fully aware of the potential dangers and pitfalls that follow from them. We therefore provide a unified framework for enriched data and show, in general, that to any given model an entire class of models can be assigned, with all of its members producing the same fit to the observed data but differing arbitrarily in the unobservable parts of the enriched data. The implications of this are explored for several specific settings: latent classes, finite mixtures, factor analysis, random-effects models, and incomplete data. The results are applied to a range of relevant examples.
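
For the incomplete-data setting, the non-identifiability described here can be made concrete with a standard factorization (a sketch in assumed notation: Y = (Y_obs, Y_mis), with missingness indicator R):

```latex
% Any full-data model factorizes into an identified and a non-identified part:
f\bigl(y^{\mathrm{obs}}, y^{\mathrm{mis}}, r\bigr)
  = \underbrace{f\bigl(y^{\mathrm{obs}}, r\bigr)}_{\text{fits the observed data}}
    \times
    \underbrace{f\bigl(y^{\mathrm{mis}} \mid y^{\mathrm{obs}}, r\bigr)}_{\text{purely assumption driven}}
```

Because the observed-data likelihood involves only the first factor, the second factor can be replaced by any valid conditional density without changing the fit, giving one concrete instance of the class of models with identical observed-data fit.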

Keywords: Compound-symmetry, empirical Bayes, enriched data, exponential random effects, gamma random effects, linear mixed model, missing at random, missing completely at random, non-future dependence, pattern-mixture model, selection model, shared-parameter model.

Bayesian Analysis of Transition Model for Longitudinal Ordinal Response Data: Application to Insomnia Data
Pages 148-161
S. Noorian and M. Ganjali
DOI: http://dx.doi.org/10.6000/1929-6029.2012.01.02.08
Published: 21 December 2012


Abstract: In this paper, we present a Bayesian framework for analyzing longitudinal ordinal response data. In analyzing longitudinal data, the possibility of correlations between responses given by the same individual needs to be taken into account. Various models can be used to handle such correlations, including marginal models, random-effects models, and transition (Markov) models. Here a transition model is used and a Bayesian approach is presented for analyzing the longitudinal data. A cumulative logistic regression model is adopted, and parameter estimates are obtained by a Bayesian method using MCMC. Our approach is applied to two-period longitudinal insomnia data, where the Bayesian estimate of the measure of association between the initial and follow-up ordinal responses is obtained at each level of a treatment variable. The sensitivity of posterior summaries to changes in the prior hyperparameters is then investigated. We also use the Bayes factor criterion for testing some important hypotheses.
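
One standard way to write such a first-order transition model for an ordinal response is a cumulative logit conditioned on the previous state; the notation below is assumed, and the paper's exact parameterization may differ:

```latex
% Cumulative logit transition model, conditioning on the previous response:
\operatorname{logit} \, P\bigl(Y_{2} \le k \mid Y_{1} = j,\, x\bigr)
  = \alpha_{k} + \beta x + \gamma_{j},
  \qquad k = 1, \dots, K - 1
```

Here Y_1 and Y_2 are the initial and follow-up responses with K ordered levels, x is the treatment indicator, and gamma_j carries the Markov dependence on the previous state; priors on (alpha, beta, gamma) together with MCMC yield the posterior summaries.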

Keywords: Bayesian Analysis, Bayes Factor, Conditional Predictive Ordinate, Logistic Regression, Markov Model.

Power Calculations for Two-Wave, Change from Baseline to Follow-Up Study Designs
Pages 45-50
M. Colin Ard and Steven D. Edland
DOI: http://dx.doi.org/10.6000/1929-6029.2012.01.01.03
Published: 24 September 2012


Abstract: Change in a quantitative trait is commonly employed as an endpoint in two-wave longitudinal studies. For example, early-phase clinical trials often use two-wave designs with biomarker endpoints to confirm that a treatment affects the putative target treatment pathway before proceeding to larger-scale clinical efficacy trials. Power calculations for such designs are straightforward if pilot data from longitudinal investigations of similar duration to the proposed study are available. Often such longitudinal pilot data are not available, and simplifying assumptions are used to calculate sample size from cross-sectional data; one standard approach is to use a formula based on a variance estimated from cross-sectional data and correlation estimates abstracted from the literature or inferred from experience with similar endpoints. An implicit assumption of this standard approach is that the variance of the quantitative trait is the same at baseline and follow-up. In practice, this assumption rarely holds, and sample size estimates from this standard formula can be dramatically anti-conservative. Even when longitudinal pilot data for estimating the parameters required in sample size calculations are available, sample size calculations will be biased if the interval from baseline to follow-up is not of similar duration to that proposed for the study being designed. In this paper we characterize the magnitude of bias in sample size estimates when the formula assumptions do not hold and derive alternative, conservative formulas for the sample size required to achieve nominal power.
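
The anti-conservatism warned about here can be seen numerically. The sketch below contrasts a sample-size calculation under the equal-variance assumption with one using the general change-score variance s1^2 + s2^2 - 2*rho*s1*s2. The numeric inputs are hypothetical, and the two-sample z-approximation is a standard textbook formula, not necessarily the paper's derivation.

```python
# Sample size per group for detecting a difference in mean change, comparing
# the equal-variance assumption against unequal baseline/follow-up variances.
from scipy.stats import norm

def n_per_group(sd_change, delta, alpha=0.05, power=0.80):
    """Two-sample z-approximation: n per group to detect mean difference delta."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sd_change / delta) ** 2

s1, s2, rho, delta = 10.0, 14.0, 0.6, 4.0  # hypothetical SDs, correlation, effect

sd_naive = (2 * s1**2 * (1 - rho)) ** 0.5                 # assumes sigma equal at both waves
sd_true = (s1**2 + s2**2 - 2 * rho * s1 * s2) ** 0.5      # general change-score SD

print(f"n per group, equal-variance formula: {n_per_group(sd_naive, delta):.0f}")
print(f"n per group, unequal variances:      {n_per_group(sd_true, delta):.0f}")
```

With these inputs the equal-variance formula gives roughly 79 per group versus roughly 126 under the true change-score variance, illustrating how badly the standard formula can understate the required sample size.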

Keywords: Sample Size, Phase 2, Phase II, Clinical Trial, Rate of Change, Compound Symmetry.

Validation of Gene Expression Profiles in Genomic Data through Complementary Use of Cluster Analysis and PCA-Related Biplots
Pages 162-173
Niccolò Bassani, Federico Ambrogi, Danila Coradini, Patrizia Boracchi and Elia Biganzoli
DOI: http://dx.doi.org/10.6000/1929-6029.2012.01.02.09
Published: 21 December 2012


Abstract: High-throughput genomic assays are used in molecular biology to explore patterns of joint expression of thousands of genes.

These methodologies have developed considerably over the last decade, creating a concurrent need for appropriate methods to analyze the massive data they generate.

Identifying sets of genes and samples characterized by similar expression values, and validating these results, are two critical issues in such investigations because of their clinical implications. From a statistical perspective, unsupervised class discovery methods such as Cluster Analysis are generally adopted.

However, the use of Cluster Analysis relies mainly on hierarchical techniques, without considering the possible use of other methods. This is partially due to software availability and to the ease of representing results through a heatmap, which allows clustering of genes and samples to be visualized simultaneously on the same graphical device. One drawback of this strategy is that cluster stability is often neglected, leading to over-interpretation of results.

Moreover, validation of results using external datasets is still a subject of discussion, since it is well known that batch effects may affect gene expression results even after normalization.

In this paper we compare several clustering algorithms (hierarchical, k-means, model-based, Affinity Propagation) and stability indices to discover common patterns of expression and to assess clustering reliability, and we propose a rank-based passive projection of Principal Components for validation purposes.

Results are presented from a study involving 23 tumor cell lines and 76 genes related to a specific biological pathway, derived from a publicly available dataset.
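
A minimal sketch of the kind of comparison described above, assuming scikit-learn as the toolkit: three of the clustering algorithms mentioned, with a simple perturbation-based stability check using the adjusted Rand index (one of many possible stability indices; the paper's own indices and the rank-based PCA projection are not reproduced here). The data dimensions mirror the study (23 samples by 76 genes), but the values are random.

```python
# Compare clustering algorithms and assess partition stability under
# perturbation of the data matrix; all data here are simulated.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, AffinityPropagation
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
X = rng.normal(size=(23, 76))  # samples x genes, matching the study's dimensions

algorithms = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=1),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    "affinity propagation": AffinityPropagation(random_state=1),  # picks k itself
}

for name, algo in algorithms.items():
    ref = algo.fit_predict(X)  # reference partition on the original data
    # Stability: recluster noise-perturbed copies and compare partitions via ARI
    scores = []
    for _ in range(20):
        Xb = X + rng.normal(scale=0.1, size=X.shape)
        scores.append(adjusted_rand_score(ref, algo.fit_predict(Xb)))
    print(f"{name}: mean ARI over perturbations = {np.mean(scores):.2f}")
```

Model-based clustering (e.g., Gaussian mixtures via sklearn.mixture.GaussianMixture) would slot into the same loop.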

Keywords: Microarrays, cluster stability, multivariate visualization, Principal Components Analysis, cell polarity.