The program tests the distribution of numbers in a table of qualitative data. The ACTUAL table is compared with a THEORETICAL table of 'expected' numbers. The table type most often tested in this manner is the 2 x 2 table, with two groups and one YES/NO variable.
Such a table may logically be extended into an n x m table, showing the distribution of a qualitative variable with m categories in n groups. The expected numbers are calculated from the marginal sums, under the assumption that the probability of falling into a given category is the same in all groups: the expected number in a cell is its row sum times its column sum, divided by the grand total. This relationship is often described as 'independence' between group and variable. Chi squared is calculated so that it measures the difference between the actual and the expected numbers, i.e. the lack of independence (= association).
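The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code: the table is assumed to be a list of rows (one row per group, one column per category), and the names `expected_counts` and `chi_squared` are hypothetical. Each expected number is row sum x column sum / grand total, chi squared sums (actual - expected)^2 / expected over all cells, and the DF for an n x m table is (n - 1)(m - 1).

```python
def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """Expected numbers under independence: row sum * column sum / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_squared(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi squared statistic and degrees of freedom for an n x m table.

    Assumes every row and column sum is positive (otherwise an expected number is 0).
    """
    expected = expected_counts(table)
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


table = [[10, 20],   # group 1: YES, NO
         [30, 40]]   # group 2: YES, NO
print(expected_counts(table))  # [[12.0, 18.0], [28.0, 42.0]]
stat, df = chi_squared(table)
print(round(stat, 4), df)      # 0.7937 1
```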
One may also compare the counts of just ONE variable in n groups with expected numbers supplied by the user, i.e. an n x 1 table. Where two YES/NO variables have been recorded in a group of n individuals (or one YES/NO variable in n matched pairs), the situation is 'paired', and McNemar's test is used. Yates's correction is used to keep the actual significance level from exceeding the intended, theoretical alpha. The correction reduces 'small number' problems and can in fact always be used.
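Both corrected tests can be sketched as follows. This is an illustrative sketch, not TABCHI's implementation: a 2 x 2 table is assumed to be laid out as [[a, b], [c, d]], and for the paired case b and c are assumed to be the counts of discordant pairs (YES/NO and NO/YES). The function names are hypothetical, and the P values use the fact that for 1 DF, P(chi squared > x) = erfc(sqrt(x / 2)).

```python
import math


def yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Yates-corrected chi squared for the 2 x 2 table [[a, b], [c, d]] (1 DF).

    Returns (statistic, two-sided P). Assumes all marginal sums are positive.
    """
    n = a + b + c + d
    # Shrink |ad - bc| by n/2 (the continuity correction), never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 DF, P(chi squared > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar's test with continuity correction (1 DF).

    b and c are the discordant pairs; concordant pairs carry no information
    about a difference and are ignored. Assumes b + c > 0.
    """
    stat = max(0.0, abs(b - c) - 1) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


stat, p = yates_2x2(10, 20, 30, 40)
print(round(stat, 3), round(p, 3))  # 0.446 and a P of about 0.5
stat, p = mcnemar(15, 5)
print(stat, round(p, 3))            # 4.05 and a P just under 0.05
```

Both functions clamp the corrected difference at zero, so a perfectly balanced table gives a statistic of 0 and P = 1 rather than a spurious positive value.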
The tests performed here are of a general nature, able to detect any kind of departure from the hypothesis of independence between group allocation and variable value. However, if one is looking for a particular form of departure, a test may be designed that is more sensitive to it, although perhaps less sensitive to other forms of departure. For example, when testing a 2 x 2 table, the test can be made one-sided if the hypothesis postulates a difference between the groups in one direction only. Moreover, the tests performed by the present program are designed exclusively for significance testing. In many studies of qualitative data, the estimation of parameters such as proportions, risks and odds ratios is important. Hence, before using the present program, one should consider one's hypotheses, and whether they are specific enough to warrant more advanced tests and computations.
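As an illustration of a one-sided test on a 2 x 2 table, the sketch below compares two proportions with the normal approximation. It is an assumption-laden example, not a feature of the program: the name `one_sided_2x2` is hypothetical, and the alternative hypothesis is fixed as 'the YES proportion is larger in group 1'. Since z squared equals the uncorrected chi squared of the table, this is the same test as the 2 x 2 chi squared test, except that it keeps the sign of the difference.

```python
import math


def one_sided_2x2(yes1: int, n1: int, yes2: int, n2: int) -> tuple[float, float]:
    """One-sided test of H1: proportion of YES in group 1 > proportion in group 2.

    Returns (z, one-sided P). z**2 equals the uncorrected chi squared of the
    2 x 2 table, so only the sign of the difference is added.
    """
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return z, p_one_sided


z, p = one_sided_2x2(30, 70, 10, 30)  # group 1: 30 YES of 70; group 2: 10 YES of 30
print(round(z, 3), p)
```

With the normal approximation, the one-sided P is exactly half the two-sided P when the difference lies in the predicted direction; this symmetry does not hold for the exact Fisher-Irwin test.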
The DEGREES OF FREEDOM (DF) is a parameter of the chi squared distribution that determines its shape. There are simple rules for setting the DF; for an n x m table, for example, DF = (n - 1)(m - 1). In general the program sets the DF itself. The exception is the option of testing numbers in a one-way table against expected numbers provided by the user, where the DF depends on how the expected numbers were calculated. Their sum must equal the sum of the observed numbers, which entails a loss of 'freedom' and hence of one DF. If in addition the user wants the expected numbers to conform to a model, e.g. the normal distribution, they must be calculated with this in mind. The normal distribution is defined by TWO parameters, the mean and the standard deviation. Hence, when these are estimated from the observed data and used to calculate the expected numbers, TWO more DF are lost. The example illustrates the general rule: subtract one DF for each restriction laid on the calculation of the expected numbers. It also shows an important use of the present test: to see whether the observed data could stem from a population distributed in one particular way.
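A minimal sketch of the one-way test with user-supplied expected numbers, applying the DF rule above: one DF is lost for the fixed total and one more for each model parameter estimated from the data (2 for a fitted normal distribution). The function names and the `n_estimated_params` argument are hypothetical. To stay dependency-free, the P value uses the standard closed-form series for the chi squared tail with integer DF instead of a statistics library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi squared with integer df >= 1 (closed-form series)."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Start from Q(x; 0) = 0 (even df) or Q(x; 1) = erfc(sqrt(x/2)) (odd df).
    total, a = (0.0, 0.0) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 0.5)
    while a < df / 2:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1), with a = k/2
        total += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1
    return total


def goodness_of_fit(observed, expected, n_estimated_params=0):
    """Chi squared test of an n x 1 table against user-supplied expected numbers.

    n_estimated_params is the number of model parameters estimated from the
    data (e.g. 2 for a normal distribution with estimated mean and SD).
    Returns (statistic, DF, P).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("expected numbers must sum to the observed total")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One DF lost for the fixed total, one more per estimated parameter.
    df = len(observed) - 1 - n_estimated_params
    if df < 1:
        raise ValueError("too few categories for the number of restrictions")
    return stat, df, chi2_sf(stat, df)


print(goodness_of_fit([10, 20, 30, 40], [25, 25, 25, 25]))  # statistic 20.0, DF 3, P about 0.00017
```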
The chi squared tests are large-sample tests. A well-known rule of thumb states that they may be unconservative (giving too small P values) if one or more cells have an expected number smaller than 5; some authorities set the limit at 10 or even 20. To avoid the problem, one may merge two or more cells into a single new cell. A second solution is to use Yates's correction. A third is to use 'exact' tests, which add up the probabilities of the observed outcome and of all outcomes less probable under the independence hypothesis, among all possible outcomes having the same marginal totals. This is the essence of the Fisher-Irwin (FI) test, which has been included here as a check when testing 2 x 2 tables with small numbers. The FI test is 'asymmetric', so the P from a two-sided test is not necessarily twice the P of a one-sided test, although this approximation is often used. The author holds the view that the purpose of an 'exact' test is to be just that. Hence, when the FI test is used here, both P values are presented. In all other cases, only the two-sided P is presented.
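The Fisher-Irwin test (also known as Fisher's exact test) can be sketched with the hypergeometric distribution, which gives the probability of each possible 2 x 2 table with the observed margins. In this illustration the name `fisher_irwin_2x2` is hypothetical; the one-sided P is taken in the direction of the observed departure, and the two-sided P adds up all tables no more probable than the observed one, following the description above. Other two-sided conventions exist.

```python
from math import comb


def fisher_irwin_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Exact test for the 2 x 2 table [[a, b], [c, d]] with fixed margins.

    Returns (one-sided P, two-sided P). The one-sided P is taken in the
    direction of the observed departure; the two-sided P sums the
    probabilities of all tables no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, col1)
    # Hypergeometric probability of x in the top-left cell, given the margins.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    lower = sum(p for x, p in probs.items() if x <= a)
    upper = sum(p for x, p in probs.items() if x >= a)
    # Small relative tolerance so tables exactly as probable as the observed one count.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(lower, upper), min(1.0, two_sided)


print(fisher_irwin_2x2(2, 0, 1, 2))  # one-sided 0.3, two-sided 0.4
```

For the table 2, 0 / 1, 2 the sketch gives a one-sided P of 0.3 and a two-sided P of 0.4, so doubling the one-sided P (0.6) would overstate the two-sided value; this is the asymmetry described above.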
References: For elementary tests, see 'Statistics at Square One' by T.D.V. Swinscow, published by the British Medical Association, London. For advanced tests, see a recent edition of the classic 'Statistical Methods in Medical Research' by P. Armitage, Blackwell Scientific Publications, Oxford and Edinburgh.
Download Program (TABCHI.EXE)
Department of Public Health and Primary Health Care, last updated 14.12.00
Hogne.Sandvik@isf.uib.no