SolitaryRoad.com

Website owner:  James Miller



Chi-square test. Contingency tables. Yates’ correction. Coefficient of Contingency.


Table 1

Event                 E1    E2    E3    ...   Ek
Observed frequency    o1    o2    o3    ...   ok
Expected frequency    e1    e2    e3    ...   ek

Observed and theoretical frequencies. Suppose that in an experiment a set of possible events E1, E2, .... , Ek are observed to occur with frequencies o1, o2, o3, .... , ok, called observed frequencies, and that according to probability rules they are expected to occur with frequencies e1, e2, e3, .... , ek, called expected or theoretical frequencies. See Table 1. We often wish to know whether observed frequencies differ significantly from expected frequencies. We now treat that problem.


A measure of the discrepancy existing between observed and expected frequencies is supplied by the statistic χ2 (read chi-square) given by

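1)        $\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} + \cdots + \frac{(o_k - e_k)^2}{e_k} = \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}$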

where if the total frequency is n, then

2)        ∑oj = ∑ej = n .


It can be shown that 1) is equivalent to

3)        $\chi^2 = \sum_{j=1}^{k} \frac{o_j^2}{e_j} - n$

Proof
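A short sketch of the argument, using only 1) and the constraint 2); the intermediate steps are a reconstruction, not a copy of the original proof:

```latex
\begin{aligned}
\chi^2 &= \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}
        = \sum_{j=1}^{k} \frac{o_j^2 - 2 o_j e_j + e_j^2}{e_j} \\
       &= \sum_{j=1}^{k} \frac{o_j^2}{e_j} - 2 \sum_{j=1}^{k} o_j + \sum_{j=1}^{k} e_j \\
       &= \sum_{j=1}^{k} \frac{o_j^2}{e_j} - 2n + n
        = \sum_{j=1}^{k} \frac{o_j^2}{e_j} - n .
\end{aligned}
```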


If χ2 = 0, observed and theoretical frequencies agree exactly. If χ2 > 0, they do not agree exactly. The larger the value of χ2, the greater is the discrepancy between observed and expected frequencies.
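As a minimal sketch, the statistic can be computed directly from two lists of frequencies. The function names and the coin-toss numbers below are made up for illustration; the first function sums (oj - ej)^2 / ej as in 1), the second uses the equivalent form 3), the sum of oj^2 / ej minus n.

```python
def chi_square(observed, expected):
    """Formula 1): sum of (o_j - e_j)^2 / e_j over all k events."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_shortcut(observed, expected):
    """Formula 3): sum of o_j^2 / e_j, minus the total frequency n."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n


# Made-up illustration: 200 tosses of a coin assumed to be fair.
observed = [115, 85]    # heads, tails
expected = [100, 100]   # 200 * 0.5 for each face

stat = chi_square(observed, expected)
stat_shortcut = chi_square_shortcut(observed, expected)
print(stat, stat_shortcut)
```

Both forms give the same value here, as 3) says they should whenever the observed and expected totals agree.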


The sampling distribution of χ2 is approximated very closely by the chi-square distribution


[Fig. 1]


4)        $Y = Y_0 (\chi^2)^{(\nu - 2)/2} e^{-\chi^2/2} = Y_0 \chi^{\nu - 2} e^{-\chi^2/2}$

if expected frequencies are at least equal to 5, the approximation improving for larger values. See Fig. 1.


The number of degrees of freedom ν is given by


(a) ν = k - 1 if expected frequencies can be computed without having to estimate population parameters from sample statistics. We subtract 1 from k because of the constraint 2), which implies that once k - 1 of the expected frequencies are known the remaining one is determined.


(b) ν = k - 1 - m if the expected frequencies can be computed only by estimating m population parameters from sample statistics.


Significance tests. In practice, expected frequencies are computed on the basis of a hypothesis H0. Using this null hypothesis we compute the expected frequencies and then the value of χ2. If the value of χ2 is greater than some critical value (such as $\chi^2_{.95}$ or $\chi^2_{.99}$, which are the critical values at the .05 and .01 significance levels respectively), we conclude that the observed frequencies differ significantly from the expected frequencies and reject H0 at the corresponding level of significance. Otherwise, we accept H0 (or at least do not reject it).


This procedure is called the chi-square test of hypothesis or significance.
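The decision rule can be sketched in a few lines. The critical values below are the usual rounded entries from chi-square tables for 1 to 5 degrees of freedom; the function names and the example value 4.5 (from a hypothetical 1 × 2 table) are assumptions made for illustration.

```python
# Critical values from standard chi-square tables (three decimals),
# keyed by degrees of freedom. Only 1 to 5 are listed in this sketch.
CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}   # .05 level
CRITICAL_99 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086}  # .01 level


def degrees_of_freedom(k, m=0):
    """nu = k - 1 - m, where m is the number of estimated parameters."""
    nu = k - 1 - m
    if nu < 1:
        raise ValueError("need at least one degree of freedom")
    return nu


def reject_h0(chi2, nu, level=0.05):
    """True if chi2 exceeds the critical value at the given level."""
    table = {0.05: CRITICAL_95, 0.01: CRITICAL_99}[level]
    return chi2 > table[nu]


chi2 = 4.5                      # hypothetical computed statistic
nu = degrees_of_freedom(k=2)    # two events, no parameters estimated
reject_05 = reject_h0(chi2, nu, 0.05)
reject_01 = reject_h0(chi2, nu, 0.01)
print(nu, reject_05, reject_01)
```

With these numbers H0 would be rejected at the .05 level but not at the .01 level, which shows why the chosen level should be stated along with the conclusion.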


The chi-square test for goodness of fit. The chi-square test can be used to determine how well theoretical distributions, such as the normal, binomial, etc., fit empirical distributions (i.e. those obtained from sample data).
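As one possible illustration, the sketch below fits a binomial distribution to made-up data: 100 samples of 2 coin tosses each, classified by the number of heads. The helper name `binomial_expected` and all counts are hypothetical. Here p = 0.5 is fixed by the hypothesis, so m = 0; if p were instead estimated from the sample, one degree of freedom would be lost (m = 1).

```python
from math import comb


def binomial_expected(total, trials, p):
    """Expected counts of 0, 1, ..., trials successes among `total` samples."""
    return [
        total * comb(trials, x) * p ** x * (1 - p) ** (trials - x)
        for x in range(trials + 1)
    ]


observed = [30, 45, 25]                        # samples with 0, 1, 2 heads (made up)
expected = binomial_expected(100, 2, 0.5)      # theoretical frequencies under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
nu = len(observed) - 1                         # k - 1 - m with m = 0
print(expected, chi2, nu)
```

The resulting statistic would then be compared with the critical value for ν = 2 (5.991 at the .05 level in standard tables).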


Contingency tables. Table 1 above, in which observed frequencies occupy a single row, is called a one-way classification table. Since the number of columns is k, this is also called a 1 × k (read “1 by k”) table. By extending these ideas we arrive at two-way classification tables or h × k tables in which the observed frequencies occupy h rows and k columns. Such tables are often called contingency tables.


Corresponding to each observed frequency in an h × k contingency table, there is an expected or theoretical frequency which is computed subject to some hypothesis according to rules of probability. These frequencies, which occupy the cells of a contingency table, are called cell frequencies. The total frequency in each row or each column is called the marginal frequency.


To investigate agreement between observed and expected frequencies, we compute the statistic

5)        $\chi^2 = \sum_{j} \frac{(o_j - e_j)^2}{e_j}$

where the sum is taken over all cells in the contingency table, the symbols oj and ej representing respectively the observed and expected frequencies in the jth cell. This sum, which is analogous to 1), contains hk terms. The sum of all observed frequencies is denoted by n and is equal to the sum of all expected frequencies.


As before, the statistic 5) has a sampling distribution given very closely by 4), provided expected frequencies are not too small. The number of degrees of freedom ν of this chi-square distribution is given for h > 1, k > 1 by


(a) ν = (h - 1)(k - 1) if the expected frequencies can be computed without having to estimate population parameters from sample statistics.


(b) ν = (h - 1)(k - 1) - m if the expected frequencies can be computed only by estimating m population parameters from sample statistics.


Significance tests for h×k tables are similar to those for 1×k tables. Expected frequencies are found subject to a particular hypothesis H0. A hypothesis commonly assumed is that the two classifications are independent of each other.
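Under the independence hypothesis, the expected frequency of a cell is commonly taken as (row total × column total) / n. The sketch below applies that rule to a made-up 2 × 2 table; the function names and the data are illustrative assumptions, not part of the original text.

```python
def expected_under_independence(table):
    """Expected cell frequencies: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Sum of (o - e)^2 / e over all h*k cells of the table."""
    expected = expected_under_independence(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def table_degrees_of_freedom(table):
    """nu = (h - 1)(k - 1) for an h x k table."""
    return (len(table) - 1) * (len(table[0]) - 1)


table = [[10, 20],
         [30, 40]]          # made-up observed frequencies
expected = expected_under_independence(table)
chi2 = chi_square_table(table)
nu = table_degrees_of_freedom(table)
print(expected, chi2, nu)
```

Note that the expected frequencies reproduce the marginal totals of the observed table, so their overall sum is again n.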


Contingency tables can be extended to higher dimensions. Thus, for example, we can have h×k×l tables where 3 classifications are present.



Yates’ correction for continuity. When results for continuous distributions are applied to discrete data, certain corrections for continuity can be made. The correction consists in rewriting 1) as


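$\chi^2 \text{ (corrected)} = \frac{(|o_1 - e_1| - 0.5)^2}{e_1} + \frac{(|o_2 - e_2| - 0.5)^2}{e_2} + \cdots + \frac{(|o_k - e_k| - 0.5)^2}{e_k}$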

and is usually referred to as Yates’ correction. An analogous modification of 5) also exists.


In general, the correction is made only when the number of degrees of freedom is ν = 1. For large samples this yields practically the same results as the uncorrected χ2, but difficulties can arise near critical values. For small samples where each expected frequency is between 5 and 10, it is perhaps best to compare both the corrected and uncorrected values of χ2.
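A minimal sketch of comparing the corrected and uncorrected statistics, assuming the correction subtracts 0.5 from each absolute difference |oj - ej| before squaring. The function names and the coin-toss data are made up for illustration.

```python
def chi_square(observed, expected):
    """Uncorrected statistic: sum of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_yates(observed, expected):
    """Yates-corrected statistic: sum of (|o - e| - 0.5)^2 / e."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Made-up illustration with nu = 1: 200 tosses of a coin assumed fair.
observed = [115, 85]
expected = [100, 100]

uncorrected = chi_square(observed, expected)
corrected = chi_square_yates(observed, expected)
print(uncorrected, corrected)
```

The corrected value is always the smaller of the two when every |oj - ej| exceeds 0.5, so a result that is significant only without the correction deserves a cautious reading.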


Simple formulas for computing χ2. Simple formulas for computing χ2 which involve only the observed frequencies can be derived. See Fig. 2 and Fig. 3 for the cases of 2×2 and 2×3 tables.
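For the 2 × 2 case, a widely used form is χ2 = n(a1·b2 - a2·b1)^2 / (nA·nB·n1·n2), where a1, a2 and b1, b2 are the two rows of observed frequencies, nA and nB the row totals, and n1 and n2 the column totals. The sketch below uses that form; the symbol names are assumptions chosen for illustration and may differ from those in the figures, and the 2 × 3 case is not covered.

```python
def chi_square_2x2(a1, a2, b1, b2, yates=False):
    """Chi-square for the 2 x 2 table [[a1, a2], [b1, b2]] from observed counts only."""
    n_a, n_b = a1 + a2, b1 + b2        # row totals
    n_1, n_2 = a1 + b1, a2 + b2        # column totals
    n = n_a + n_b
    diff = abs(a1 * b2 - a2 * b1)
    if yates:
        diff -= n / 2                  # continuity correction for the 2 x 2 case
    return n * diff ** 2 / (n_a * n_b * n_1 * n_2)


plain = chi_square_2x2(10, 20, 30, 40)                 # made-up counts
corrected = chi_square_2x2(10, 20, 30, 40, yates=True)
print(plain, corrected)
```

No expected frequencies have to be computed first, which is the practical appeal of formulas of this kind.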


Coefficient of Contingency. A measure of the degree of relationship, association or dependence of the classifications in a contingency table is given by


$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$

which is called the coefficient of contingency. The larger the value of C, the greater is the degree of association. The number of rows and columns in the contingency table determines the maximum value of C, which is never greater than one. If the number of rows and the number of columns of a contingency table are both equal to k, the maximum value of C is given by $\sqrt{(k - 1)/k}$.
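A small sketch, assuming C is the square root of χ2 / (χ2 + n) and that its maximum for a k × k table is the square root of (k - 1)/k. The function names are illustrative; the example value χ2 = 100 with n = 100 is what a perfectly associated 2 × 2 table such as [[50, 0], [0, 50]] would give.

```python
from math import sqrt


def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + n)); zero when chi2 is zero."""
    return sqrt(chi2 / (chi2 + n))


def max_contingency_coefficient(k):
    """Largest possible C for a k x k table: sqrt((k - 1) / k)."""
    return sqrt((k - 1) / k)


c_perfect = contingency_coefficient(100.0, 100)   # perfectly associated 2 x 2 table
c_max = max_contingency_coefficient(2)
print(c_perfect, c_max)
```

Because the ceiling depends on k, values of C from tables of different sizes are not directly comparable.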


Correlation of attributes. Because classifications in a contingency table often describe characteristics of individuals or objects, they are often referred to as attributes and the degree of dependence, association or relationship is called correlation of attributes. For k × k tables we define

$r = \sqrt{\frac{\chi^2}{n(k - 1)}}$

as the correlation coefficient between attributes or classifications. This coefficient lies between 0 and 1. For 2 × 2 tables in which k = 2, the correlation is often called tetrachoric correlation.
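As a sketch, assuming r is the square root of χ2 / (n(k - 1)) for a k × k table, the coefficient can be computed as below. The function name and the two example values of χ2 (taken from hypothetical 2 × 2 tables with n = 100) are illustrative.

```python
from math import sqrt


def attribute_correlation(chi2, n, k):
    """r = sqrt(chi2 / (n * (k - 1))) for a k x k contingency table."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return sqrt(chi2 / (n * (k - 1)))


r_perfect = attribute_correlation(100.0, 100, 2)   # perfectly associated 2 x 2 table
r_weak = attribute_correlation(50 / 63, 100, 2)    # weakly associated 2 x 2 table
print(r_perfect, r_weak)
```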


Additive property of χ2. Suppose the results of repeated experiments yield sample values of χ2 given by $\chi_1^2, \chi_2^2, \chi_3^2, \ldots$ with ν1, ν2, ν3, ... degrees of freedom respectively. Then the result of all these experiments can be considered equivalent to a χ2 value given by $\chi_1^2 + \chi_2^2 + \chi_3^2 + \cdots$ with ν1 + ν2 + ν3 + ... degrees of freedom.
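A minimal sketch of pooling results this way. The three (χ2, ν) pairs are made up, and the critical values 3.841 (ν = 1) and 7.815 (ν = 3) are the usual .05-level table entries; `combine` is a hypothetical helper name.

```python
def combine(results):
    """Add chi-square values and degrees of freedom from separate experiments."""
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_nu = sum(nu for _, nu in results)
    return total_chi2, total_nu


# Three made-up experiments, each with one degree of freedom.
results = [(2.5, 1), (3.0, 1), (2.9, 1)]
total_chi2, total_nu = combine(results)

none_significant_alone = all(chi2 < 3.841 for chi2, _ in results)
pooled_significant = total_chi2 > 7.815      # critical value for nu = 3
print(total_chi2, total_nu, none_significant_alone, pooled_significant)
```

In this example no single experiment is significant at the .05 level, yet the pooled value is, which is the kind of situation where the additive property is useful.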



Portions excerpted from Murray R. Spiegel. Statistics. Schaum.


References

 Murray R Spiegel. Statistics (Schaum Publishing Co.)



 

