Course Catalog

Applied Computational Modeling

As data has become more readily available, companies and institutions want to harness this information for predictive purposes. Chemical structure-activity modeling, for example, seeks to uncover the relationship between structural predictors and compound activity.

While many computational modeling methods are readily available through a variety of commercial and freely available software, practitioners often have limited knowledge and experience with a majority of the models. This lack of knowledge often leads to practitioners’ reliance on one, or just a few, computational models, which may or may not be the best for the data at hand.

The purpose of this course is to provide a foundational explanation of general model building strategies and an explanation of traditional and modern regression and classification models. After completing this course, the participant will be able to appropriately pre-process data, and build, tune, and validate more than a dozen different kinds of classical and modern regression and classification techniques. The participant will also be able to implement these techniques in R, a freely available analysis software. In addition, the attendee will understand how to avoid common model building pitfalls.

Course Details

Key Topics

  • General strategies for building computational models, including data pre-processing and model tuning.
  • Example-based explanations of modern regression models
  • Example-based explanations of modern classification models
  • Hands-on computational exercises in R
  • Other computational modeling considerations such as variable selection, effect of noise on detecting predictive signal, and categorization of continuous variables.
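As a rough illustration of the data pre-processing topic listed above, the following minimal sketch centers and scales a few predictors and then applies principal component analysis for dimension reduction. It uses only base R and the built-in mtcars data set; the choice of data set and of the four predictor columns is an assumption made for illustration and is not taken from the course materials.

```r
# Illustrative only: built-in mtcars data, predictor columns chosen arbitrarily.
predictors <- mtcars[, c("disp", "hp", "wt", "qsec")]

# Center each column to mean 0 and scale it to standard deviation 1.
scaled <- scale(predictors, center = TRUE, scale = TRUE)

# Principal component analysis on the standardized predictors.
pca <- prcomp(scaled)

# Proportion of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(round(colMeans(scaled), 10))   # column means after centering
print(apply(scaled, 2, sd))          # column standard deviations after scaling
print(round(var_explained, 3))
```

Standardizing before the decomposition keeps predictors measured on large scales from dominating the components.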

Information

The course is based on the soon-to-be-released book by the instructors, Applied Predictive Modeling, Springer; 2013 edition (May 31, 2013).

Who Should Attend

This course is intended for a broad audience, both as an introduction to computational predictive models and as a guide to applying them. Anyone who builds computational or predictive models (such as computational chemists) will benefit from this course. Fundamental model building principles and the modeling techniques themselves are presented in ways that appeal to participants’ intuition, with mathematical equations presented only when necessary. While the course includes hands-on exercises in R (a freely available statistical software package), participants are not required to have any prior experience with this software.

Benefits

Upon completion of this course, the participant should be able to

  • appropriately pre-process data,
  • build, tune, and validate more than a dozen different kinds of classical and modern regression and classification techniques, and
  • avoid several common model building pitfalls.

Applying the concepts presented in this training will yield a better understanding of the predictive ability of the underlying data. Ultimately, the principles taught in this course will help to reduce false-positive and false-negative findings, producing a higher degree of confidence in model predictions and enabling better decision making.

Agenda

  • Introduction, terminology, and overview of course examples
  • General model building strategies
    • Methods for spending data and model validation
  • Data pre-processing
    • Why pre-process?
    • The need for centering, scaling, and methods for dimension reduction
    • Hands-on in R
  • Regression methods
    • Measures of regression model performance
      • Coefficient of determination (R²)
      • Root mean squared error (RMSE)
    • Models
      • Multiple linear regression
      • Partial least squares
      • Neural networks
      • Multivariate adaptive regression splines
      • Support vector machines
      • Regression trees
      • Ensembles of trees (bagging, boosting, and random forests)
    • Hands-on in R
    • Empirical comparison of regression models’ performance
  • Advice on selecting the best model
  • Classification methods
    • Measures of classification model performance
      • Accuracy
      • Kappa
      • ROC curves; sensitivity and specificity
      • Softmax
    • Models
      • Linear, quadratic, regularized, flexible, and partial least squares discriminant analysis
      • Ensembles of trees (bagging, boosting, and random forests)
      • Neural networks
      • Support vector machines
      • K-nearest neighbors
      • Naïve Bayes
    • Hands-on in R
    • Empirical comparison of classification models’ performance
  • Other considerations
    • Variable/feature selection
    • Effects of predictor noise on model performance
    • Categorization of a continuous response
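To make the performance measures named in the agenda concrete, here is a minimal sketch in base R. It holds out part of a data set, fits a multiple linear regression, and computes RMSE and R² on the held-out rows, then computes accuracy and Kappa from a small table of observed and predicted classes. The mtcars data, the model formula, the 25% hold-out fraction, the random seed, and the toy class labels are all assumptions made for illustration; they are not course examples.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Spend the data: hold out roughly 25% of the rows as a test set.
n <- nrow(mtcars)
test_idx <- sample(n, size = round(0.25 * n))
train <- mtcars[-test_idx, ]
test  <- mtcars[test_idx, ]

# Multiple linear regression fitted on the training rows only.
fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)

# Regression performance on the held-out rows.
rmse <- sqrt(mean((test$mpg - pred)^2))
r2   <- cor(test$mpg, pred)^2  # squared correlation, one common definition of R^2

# Classification performance from toy observed and predicted labels.
obs_class  <- factor(c("a", "a", "b", "b", "a", "b", "a", "b"))
pred_class <- factor(c("a", "b", "b", "b", "a", "a", "a", "b"))
tab <- table(obs_class, pred_class)

# Accuracy is the observed agreement; Kappa adjusts it for chance agreement.
accuracy   <- sum(diag(tab)) / sum(tab)
expected   <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
kappa_stat <- (accuracy - expected) / (1 - expected)

print(c(RMSE = rmse, Rsquared = r2))
print(c(Accuracy = accuracy, Kappa = kappa_stat))
```

For the toy labels, six of eight predictions agree, so accuracy is 0.75. With balanced classes the chance agreement is 0.5, which gives a Kappa of 0.5.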

Course Locations

Date

TBA

Check-in opens at 7:30 a.m. on the first day of the course.

Course runs from 8:30 a.m. to 5:00 p.m. each day.

Venue

The course fee includes a course binder and a continental breakfast each day.

Five for Four! Register five people for one course, one person for five courses, or any combination in between and your fifth registration is free. Note: This discount is only available if you register by fax, mail or phone and mention this discount. This discount may not be combined with any other offer.


Pricing
            Member    Non-Member
Advanced    $1,695    $1,895
Standard    $1,895    $2,095

About the Instructors

  • Max Kuhn is a Director at Pfizer Global R&D, providing support for early drug discovery, target discovery, computational biology and chemistry.

  • Kjell Johnson has worked as a statistician in the pharmaceutical industry for the past 12 years, supporting a variety of drug discovery areas with a focus on computer-aided drug discovery and pharmacokinetics, dynamics and metabolism.