Classification of Data
In most research studies, the voluminous raw data collected through a survey need to be reduced into homogeneous groups before any meaningful analysis is possible. This necessitates classification of data, which in simple terms is the process of arranging data in groups or classes on the basis of some characteristics. Classification is thus the grouping of data into classes or sub-classes according to some characteristics, whereas tabulation is concerned with the systematic arrangement and presentation of classified data.
Thus classification is the first step in tabulation.
For example, letters in a post office are classified according to their destinations, viz., Delhi, Chennai, Bangalore, Mumbai, etc.
Objectives of Classification:
The following are the main objectives of classifying data:
1. It condenses the mass of data.
2. It eliminates unnecessary details.
3. It facilitates comparison and highlights the significant aspects of the data.
4. It enables one to get a mental picture of the information and helps in drawing inferences.
5. It helps in the statistical treatment of the information collected.
Types of Classification:
Statistical data are classified with respect to their characteristics. Broadly, there are four basic types of classification, namely:
a) Chronological classification
b) Geographical classification
c) Qualitative classification
d) Quantitative classification
a) Chronological classification:
In chronological classification, the collected data are arranged according to the order of time, expressed in years, months, weeks, etc. The data are generally classified in ascending order of time. For example, data relating to population, the sales of a firm, or the imports and exports of a country are typically subjected to chronological classification.
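As a minimal sketch of chronological classification, the Python snippet below arranges a small set of yearly sales records in ascending order of time. The figures and the name `sales` are hypothetical and serve only as an illustration.

```python
# Hypothetical yearly sales figures (illustrative values only), in arbitrary order.
sales = [
    (2021, 340),
    (2019, 275),
    (2022, 410),
    (2020, 298),
]

# Chronological classification: arrange the records in ascending order of time.
chronological = sorted(sales, key=lambda record: record[0])

for year, amount in chronological:
    print(f"{year}\t{amount}")
```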
b) Geographical classification:
In this type of classification, the data are classified according to geographical region or place. Examples include the production of paddy in different states of India and the production of wheat in different countries.
c) Qualitative classification:
In this type of classification, data are classified on the basis of some attribute or quality such as gender, literacy, religion or employment. Such attributes cannot be measured on a scale.
For example, if a population is to be classified with respect to one attribute, say gender, it can be divided into two classes, namely males and females. Similarly, it can be classified into ‘employed’ and ‘unemployed’ on the basis of another attribute, ‘employment’.
Thus, when classification is done with respect to one attribute that is dichotomous in nature, two classes are formed, one possessing the attribute and the other not possessing it. This type of classification is called simple classification.
A simple classification may be shown as under
A classification in which two or more attributes are considered, so that several classes are formed, is called a manifold classification.
For example, if we classify a population simultaneously with respect to two attributes, e.g., gender and employment, the population is first classified with respect to ‘gender’ into ‘males’ and ‘females’. Each of these classes may then be further classified into ‘employed’ and ‘unemployed’ on the basis of the attribute ‘employment’, so that the population is classified into four classes, namely:
(i) Male employed
(ii) Male unemployed
(iii) Female employed
(iv) Female unemployed
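The difference between simple and manifold classification can be sketched in Python by counting records on one attribute and then on two attributes at once. The records and variable names below are hypothetical; this is an illustration under those assumptions, not a prescribed procedure.

```python
from collections import Counter

# Hypothetical individual records: (gender, employment status).
population = [
    ("male", "employed"),
    ("female", "unemployed"),
    ("female", "employed"),
    ("male", "employed"),
    ("male", "unemployed"),
    ("female", "employed"),
]

# Simple classification: one attribute (gender) gives two classes.
by_gender = Counter(gender for gender, _ in population)

# Manifold classification: two attributes together give 2 x 2 = 4 classes.
by_gender_and_employment = Counter(population)

for (gender, status), count in sorted(by_gender_and_employment.items()):
    print(f"{gender.capitalize()} {status}: {count}")
```

Adding a third attribute, such as marital status, would only require a third field in each record; the number of classes would then double again.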
The classification may be extended further by considering other attributes such as marital status. This can be explained by the following chart.
d) Quantitative classification:
Quantitative classification refers to the classification of data according to some characteristic that can be measured, such as height or weight. For example, the students of a college may be classified according to weight as given below.
In this type of classification there are two elements, namely:
(i) the variable, i.e., the weight in the above example;
(ii) the frequency, i.e., the number of students in each class.
There are 50 students with weights ranging from 90 to 100 kg, 200 students with weights ranging from 100 to 110 kg, and so on.
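A quantitative classification of this kind can be sketched in Python as a frequency distribution over class intervals. The weights below are hypothetical values, not the data referred to in the text, and the convention that the upper class limit is exclusive is an assumption, since the text does not state how boundary values such as 100 kg are assigned.

```python
from collections import Counter

# Hypothetical weights in kg (illustrative values, not the data from the text).
weights = [92.5, 104.0, 99.9, 101.3, 118.0, 107.6, 95.2, 110.0, 103.4]

LOWER_LIMIT = 90   # lower limit of the first class
CLASS_WIDTH = 10   # width of every class interval

def class_interval(weight):
    """Return the (lower, upper) class that contains the weight.

    The upper limit is treated as exclusive, so 110.0 falls in 110-120.
    This convention is an assumption; the text does not state one.
    """
    index = int((weight - LOWER_LIMIT) // CLASS_WIDTH)
    lower = LOWER_LIMIT + index * CLASS_WIDTH
    return (lower, lower + CLASS_WIDTH)

# The variable is the weight; the frequency is the count of values per class.
frequency = Counter(class_interval(w) for w in weights)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper} kg: {count}")
```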
Tabulation of Data
Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information.
A table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise not obvious.
Classification and tabulation are, as a matter of fact, not two distinct processes; they go together. Data are first classified and then displayed under the different columns and rows of a table.
Advantages of Tabulation:
Statistical data arranged in tabular form serve the following objectives:
1. It simplifies complex data and the data presented are easily understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages, dispersion, correlation etc.
4. It presents facts in the minimum possible space, avoiding unnecessary repetitions and explanations. Moreover, the needed information can be easily located.
5. Tabulated data are good for references and they make it easier to present the information in the form of graphs and diagrams.
Types of Tables:
Tables can be classified according to their purpose, stage of enquiry, nature of data or number of characteristics used. On the basis of the number of characteristics, tables may be classified as follows:
1. Simple or one-way table
2. Two-way table
3. Manifold table
Simple or one-way Table:
A simple or one-way table is the simplest kind of table, containing data on one characteristic only. It is easy to construct and simple to follow. For example, the blank table given below may be used to show the number of adults in different occupations in a locality.
Two-way Table:
A table that contains data on two characteristics is called a two-way table. In such a case, either the stub (the row headings) or the caption (the column headings) is divided into two co-ordinate parts. In the table described above, for example, the caption may be further divided with respect to ‘gender’. The resulting two-way table contains two characteristics, namely occupation and gender.
The number of adults in a locality in respect of occupation and gender.
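As an illustration of how such a two-way table might be assembled, the Python sketch below cross-tabulates occupation (the stub) against gender (the caption) and adds a row total. The records, the occupation names, and the helper `build_two_way_table` are hypothetical and used only for this example.

```python
from collections import Counter

# Hypothetical records for adults in a locality: (occupation, gender).
adults = [
    ("Farmer", "Male"), ("Farmer", "Female"), ("Teacher", "Female"),
    ("Trader", "Male"), ("Farmer", "Male"), ("Teacher", "Male"),
    ("Trader", "Female"), ("Teacher", "Female"),
]

counts = Counter(adults)
occupations = sorted({occupation for occupation, _ in adults})  # stub (row headings)
genders = ["Male", "Female"]                                    # caption (column headings)

def build_two_way_table(counts, rows, columns):
    """Return one list per row: heading, a count per column, and the row total."""
    table = []
    for row in rows:
        cells = [counts[(row, column)] for column in columns]
        table.append([row, *cells, sum(cells)])
    return table

table = build_two_way_table(counts, occupations, genders)

print(f"{'Occupation':<12}{'Male':>8}{'Female':>8}{'Total':>8}")
for occupation, male, female, total in table:
    print(f"{occupation:<12}{male:>8}{female:>8}{total:>8}")
```

Dropping the gender column and keeping only the totals would give the corresponding one-way table.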
Manifold Table:
Thus, more and more complex tables can be formed by including other characteristics. For example, we may further classify the caption sub-headings in the above table with respect to ‘marital status’, ‘religion’, ‘socio-economic status’, etc. A table that has more than two characteristics of data is called a manifold table. For instance, the table shown below shows three characteristics, namely occupation, gender and marital status.
Manifold tables, though complex, are useful in practice, as they enable full information to be incorporated and facilitate analysis of all related facts. Still, as a normal practice, no more than four characteristics should be represented in one table, to avoid confusion. Other related tables may be formed to show the remaining characteristics.