Posts

Showing posts from 2013

Introduction to Predictive Analytics (video blog)

I recently presented at an event in Nashville introducing Predictive Analytics to the audience and demonstrating some live applications.  Here is the video: http://vimeo.com/80917063

Customer Profiling (guest author: Paul Cook)

Profiling is a data mining technique used to find patterns and trends in customer data. In today's post we will explore the application of this technique.
As we will demonstrate in this article, profiling is both simple and powerful. It conveys large amounts of information in a user-friendly way: profiles are clear, comprehensive, and easy to read, which makes them well suited to a business or non-technical audience.
Profiling describes a group of people by summarizing information about them. Profiles are typically used to answer questions like: What do my customers look like? Which prospects are most likely to buy? What drives customer churn?
Often, a profile is all you need to answer these questions. Other times, you may choose to use a profile for exploratory data analysis before multivariate modelling.
Profiling is often used to find hot prospects for marketing campaigns. By comparing past purchasers and non-purch…
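As a rough illustration of the purchaser versus non-purchaser comparison mentioned above, here is a minimal Python sketch. The customer records, the age bands, and the `profile` function are all hypothetical and invented for this example; it is not the method used in the full post. It only shows the basic idea of comparing how each group is distributed across a single attribute.

```python
from collections import Counter

# Hypothetical customer records: (age_band, purchased). Invented for illustration.
customers = [
    ("18-34", True), ("18-34", False), ("18-34", False), ("18-34", False),
    ("35-54", True), ("35-54", True), ("35-54", False), ("35-54", False),
    ("55+", True), ("55+", True), ("55+", True), ("55+", False),
]

def profile(records):
    """Return {category: (share_among_purchasers, share_among_non_purchasers)}."""
    buyers = Counter(cat for cat, bought in records if bought)
    non_buyers = Counter(cat for cat, bought in records if not bought)
    n_buyers = sum(buyers.values())
    n_non_buyers = sum(non_buyers.values())
    if n_buyers == 0 or n_non_buyers == 0:
        raise ValueError("need at least one purchaser and one non-purchaser")
    categories = sorted(set(buyers) | set(non_buyers))
    return {
        cat: (buyers[cat] / n_buyers, non_buyers[cat] / n_non_buyers)
        for cat in categories
    }

age_profile = profile(customers)
for cat, (share_buyers, share_non_buyers) in age_profile.items():
    print(f"{cat:>6}  purchasers {share_buyers:.0%}  non-purchasers {share_non_buyers:.0%}")
```

In this made-up data the "55+" band accounts for a larger share of purchasers than of non-purchasers, which is the kind of contrast a profile is meant to surface.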

Introduction to Classification & Regression Trees (CART)

Decision trees are commonly used in data mining to create a model that predicts the value of a target (or dependent) variable based on the values of several input (or independent) variables.  In today's post, we discuss the CART decision tree methodology.  The CART, or Classification & Regression Trees, methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term for the following types of decision trees:

Classification Trees: where the target variable is categorical and the tree is used to identify the "class" into which the target variable would most likely fall.
Regression Trees: where the target variable is continuous and the tree is used to predict its value.
The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be.  The result of these questions is a tree-like structure where the ends are terminal node…
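To make the "sequence of questions" idea concrete, here is a minimal Python sketch of how a single question might be chosen for a classification tree. It scores every candidate threshold on one numeric input by weighted Gini impurity, the splitting criterion commonly associated with CART classification trees, and keeps the best one. The data, the variable names, and the `best_split` helper are hypothetical; a full tree would repeat this search on each resulting subset and would typically add stopping and pruning rules, which this sketch leaves out.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the question 'value <= t?' with the lowest weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit node.
    """
    best_t, best_score = None, gini(labels)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical data: customer age and whether the customer bought.
ages = [22, 25, 31, 38, 45, 52, 60, 67]
bought = ["no", "no", "no", "yes", "yes", "no", "yes", "yes"]

threshold, impurity = best_split(ages, bought)
print(f"First question: age <= {threshold}?  (weighted Gini {impurity:.2f})")
```

For a regression tree the same search applies, with the impurity measure swapped for something like the variance of the target within each side of the split.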

Data Mining and Airline Safety

In today's post, we examine the use of data mining to improve airline safety.  Over the past several decades, air travel has become, statistically, one of the safest modes of transportation.  In the following chart, you will observe that there has been a substantial decline in the fatal accident rate from 1950 through about 1980, even though the actual number of departures has increased significantly:

[Source: Handbook of Statistical Analysis and Data Mining; Nisbet, Elder, Miner, p. 378]

Since 1980, however, the decline in the fatal accident rate has leveled off, which suggests that new thinking and new safety approaches are needed to push the rate down further.  One such approach could be the use of data mining to determine the causes of fatal accidents so that preventative action may be taken.  In this post, we will use publicly available data on airline safety to identify the main causes of accidents and then determine which factors are the main predictors of accidents. …