Tech Tip: Analysing Binary Data

The analysis of binary data, in which each response is either yes or no, is becoming increasingly important in many areas. Examples include whether or not an insurance claim is made in a given period, whether a sales lead becomes a customer, and whether a machine fails under different operating conditions. Each response can be coded as 0 (no) or 1 (yes).

The most widely used model for this type of data is logistic regression. Logistic regression is a special case of a generalised linear model (GLM). GLMs extend the ordinary regression model to other types of response data, such as binary data, and to different link functions. The link connects the explanatory part of the model (the linear predictor, η) to the expected response. The logistic (logit) link, log(p/(1 − p)) = η, gives fitted probabilities p = 1/(1 + exp(−η)), which always lie between 0 and 1 whatever the value of η.

To fit a logistic regression, use NAG function G02GBF (Fortran Library) / g02gbc (C Library), which fits generalised linear models to binomial data. Set the link parameter to select the logistic link, and set every element of the binomial denominator array to 1, since each binary observation is a single trial.

Often the explanatory variables are categorical variables that define groups: for example, occupation, or age in ten-year bands. To include these in the model, each one must be converted to a set of 0/1 variables that indicate group membership (usually known as dummy variables). These dummy variables can be calculated using the NAG function G04EAF/g04eac.

As well as being a library routine, G02GBF is available in the NAG Statistical Add-Ins for Excel as BINOMIAL_GLM.

For specific technical advice in using NAG's products, please contact our technical experts.
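The logistic regression fit described above can be illustrated without the NAG Library. The following is a minimal plain-Python sketch, not G02GBF/g02gbc or its internal algorithm: it fits the model by Newton-Raphson, which for the logit link is the same as iteratively reweighted least squares (IRLS), the usual way of fitting GLMs. The function names fit_logistic and solve are hypothetical. The sketch assumes each row of X starts with a 1 for the intercept and treats each 0/1 response as one trial, matching a binomial denominator of 1. It has no safeguards for completely separated data, where the estimates diverge.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting (a is a list of rows, b a list)."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for collinear columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        total = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = total / m[i][i]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson (equivalently IRLS).

    X: list of rows, each starting with 1 for the intercept.
    y: list of 0/1 responses, each treated as a single trial.
    Returns the list of estimated coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        # Linear predictor eta = X beta, mapped through the inverse logit.
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        # Binomial variance with denominator 1 gives the IRLS weights.
        w = [mu_i * (1.0 - mu_i) for mu_i in mu]
        # Newton step solves (X^T W X) delta = X^T (y - mu).
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        score = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n))
                 for j in range(p)]
        delta = solve(xtwx, score)
        beta = [b + d for b, d in zip(beta, delta)]
        # Stop once the update is negligible.
        if max(abs(d) for d in delta) < tol:
            break
    return beta
```

For example, with X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]] and y = [0, 0, 1, 1, 1, 0], fit_logistic returns coefficients close to log(1/2) ≈ −0.693 and 2 log 2 ≈ 1.386: the logit of the observed proportion 1/3 in the first group, and the difference between that and the logit of 2/3 in the second group.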
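The dummy-variable coding of categorical variables described above is simple to illustrate. The sketch below is plain Python, not G04EAF/g04eac, and the function name dummy_variables is hypothetical. It creates one 0/1 column per category level. If a reference level is named, that level gets no column: a full set of dummy columns always sums to 1 in every row, so alongside an intercept (a constant column) it would make the model over-parameterised.

```python
def dummy_variables(levels, reference=None):
    """Convert a list of category labels into 0/1 dummy variables.

    Returns (names, rows): names lists the level for each column, and
    rows holds one list of 0/1 values per observation. If reference is
    given, that level is omitted and acts as the baseline group.
    """
    # Fix a reproducible column order by sorting the distinct levels.
    categories = sorted(set(levels))
    if reference is not None:
        categories = [c for c in categories if c != reference]
    # Each row has a 1 in the column of its own group, 0 elsewhere.
    rows = [[1 if value == c else 0 for c in categories] for value in levels]
    return categories, rows
```

For example, dummy_variables(["20-29", "30-39", "20-29", "40-49"], reference="20-29") returns the column names ["30-39", "40-49"] and the rows [[0, 0], [1, 0], [0, 0], [0, 1]]. With a leading 1 added for the intercept, these rows can serve as explanatory variables in a logistic regression.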
© Numerical Algorithms Group
Visit NAG on the web at:
www.nag.co.uk (Europe and ROW)
www.nag.com (North America)
www.nag-j.co.jp (Japan)
http://nag.com/techtips/techtip002.asp