Type: Note
Institute: Jomo Kenyatta University of Agriculture and Technology

SMA 2272/STA 2270 STATISTICS
PURPOSE
By the end of the course the student should be proficient in representing data graphically and handling
summary statistics, simple correlation and best fitting line, and handling probability and probability
distributions including expectation and variance of a discrete random variable.
DESCRIPTION
Classical and axiomatic approaches to probability. Compound and conditional probability, including
Bayes' theorem. Concept of discrete random variable: expectation and variance. Data: sources,
collection, classification and processing. Frequency distributions and graphical representation of data,
including bar diagrams, histograms and stem-and-leaf diagrams. Measures of central tendency and
dispersion. Skewness and kurtosis. Correlation. Fitting data to a best straight line.
Pre-Requisites: STA 2104 Calculus for statistics I, SMA 2104 Mathematics for Science.
COURSE TEXT BOOKS
1. Uppal, S. M., Odhiambo, R. O. & Humphreys, H. M. Introduction to Probability and Statistics. JKUAT
Press, 2005. ISBN 9966-923-95-0
2. J Crawshaw & J Chambers A concise course in A-Level statistics, with worked examples, 3rd ed. Stanley
Thornes, 1994 ISBN 0-534-42362-0.
COURSE JOURNALS
1. Journal of Applied Statistics (J. Appl. Stat.) [0266-4763; 1360-0532]
2. Statistics (Statistics) [0233-1888]
FURTHER REFERENCE TEXT BOOKS AND JOURNALS
1. GM Clarke & D Cooke A Basic Course in Statistics. 5th ed. Arnold, 2004 ISBN13: 978-0-340-81406-2
ISBN10: 0-340-81406-3.
2. S Ross A first course in Probability 4th ed. Prentice Hall, 1994 ISBN-10: 0131856626 ISBN-13:
9780131856622.
3. P.S. Mann. Introductory Statistics. John Wiley & Sons Ltd, 2001 ISBN 13: 9780471395119.
4. Statistical Science (Stat. Sci.) [0883-4237]
5. Journal of Mathematical Sciences
6. Journal of Teaching Statistics

Introduction
What is statistics?
The word statistics is derived from the Latin word “Status” or the Italian word “Statista”, both of
which mean “Political State” or a government. Early applications of statistical thinking
revolved around the needs of states to base policy on demographic and economic data.
Definition
Statistics: a branch of science that deals with the collection, presentation, analysis, and interpretation of
data. The definition points out 4 key aspects of statistics, namely
(i) Data collection,
(ii) Data presentation,
(iii) Data analysis, and
(iv) Data interpretation
Statistics is divided into 2 broad categories namely descriptive and inferential statistics.
Descriptive Statistics: summary values and presentations which give some information about the
data, e.g. the mean height of a 1st year student in JKUAT is 170cm. 170cm is a statistic which describes
the central point of the heights data.
Inferential Statistics: summary values calculated from the sample in order to make conclusions about
the target population.
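The difference between the two categories can be sketched in Python. The heights below are invented for illustration (they are not JKUAT data), and the 1.96 multiplier assumes a normal approximation, which is only rough for a sample this small.

```python
import math
import statistics

# Hypothetical heights (cm) of a sample of 10 first-year students.
heights = [165, 172, 168, 175, 170, 169, 174, 166, 171, 170]

# Descriptive statistic: summarises the data in hand.
sample_mean = statistics.mean(heights)

# Inferential step: use the sample to say something about the population.
# Approximate 95% confidence interval for the population mean height,
# assuming a normal approximation (an assumption of this sketch).
std_error = statistics.stdev(heights) / math.sqrt(len(heights))
ci_low = sample_mean - 1.96 * std_error
ci_high = sample_mean + 1.96 * std_error

print(f"Sample mean: {sample_mean:.1f} cm")
print(f"Approximate 95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f}) cm")
```

The sample mean describes only these 10 students; the interval is a statement about all first-year students, which is what makes it inferential.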
Types of Variables
Qualitative Variables: variables whose values fall into groups or categories. They are also called
categorical variables and are further divided into 2 classes, namely nominal and ordinal variables.
a) Nominal variables: variables whose categories are just names with no natural ordering, e.g. gender,
marital status, skin colour, district of birth etc.
b) Ordinal variables: variables whose categories have a natural ordering, e.g. education level,
performance category, degree classifications etc.
Quantitative Variables: these are numeric variables and are further divided into 2 classes, namely
discrete and continuous variables.
a) Discrete variables: can only assume certain values and there are gaps between them, e.g. the
number of calls one makes in a day, the number of vehicles passing through a certain point etc.
b) Continuous variables: can assume any value in a specified range, e.g. length of a telephone call,
height of a 1st year student in JKUAT etc.
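The classification above can be written down as a small Python sketch. The `Scale` enum, the `education_order` levels and the `responses` list are hypothetical names and values chosen for illustration; only the variable examples come from the notes.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "qualitative, nominal"
    ORDINAL = "qualitative, ordinal"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


# Variables named in the notes, tagged with their type.
variables = {
    "gender": Scale.NOMINAL,
    "district of birth": Scale.NOMINAL,
    "education level": Scale.ORDINAL,
    "number of calls made in a day": Scale.DISCRETE,
    "length of a telephone call": Scale.CONTINUOUS,
}

# Ordinal categories have a natural order, so they can be ranked.
# These particular levels are an assumption made for this sketch.
education_order = ["primary", "secondary", "diploma", "degree"]


def rank(level):
    """Position of an education level in the assumed ordering."""
    return education_order.index(level)


responses = ["degree", "primary", "diploma", "secondary", "primary"]
sorted_responses = sorted(responses, key=rank)

for name, scale in variables.items():
    print(f"{name}: {scale.value}")
print(sorted_responses)
```

Sorting is meaningful for the ordinal variable because its categories are ranked; no comparable ordering exists for a nominal variable such as district of birth.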
1. Data Collection:
1.1 Sources of Data
There are 2 sources of data, namely primary and secondary data.
Primary data: freshly collected, i.e. for the first time. They are original in character, i.e. they are
first-hand information collected, compiled and published for some purpose. They haven’t undergone
any statistical treatment.
Secondary data: second-hand information mainly obtained from published sources such as statistical
abstracts, books, encyclopaedias, periodicals, media reports, e.g. census reports, CD-ROMs and other
electronic devices, and the internet. They are not original in character and have undergone some statistical
treatment at least once.
1.2 Data Collection Methods
The 1st step in any investigation (inquiry) is data collection. Information can be collected
directly or indirectly, from the entire population or from a sample.
There are many methods of collecting data, including the ones shown in the flow chart below.

Methods of data collection
    Experimental or laboratory methods
        Simulation
        Lab methods
        Field expt
    Non experimental methods
        Field methods
            Sample surveys
            Case study
            Field study
            Census
        Library methods
Experimental methods are so called because the investigator tests, in a laboratory, a
hypothesis about a cause-and-effect relationship by manipulating the independent variables under
controlled conditions.
Non-experimental methods are so called because the investigator does not control or change
any aspect of the situation under study but simply describes what naturally occurs at a certain point or
period of time.
Non-experimental methods are widely used in the social sciences. Some of the non-experimental
methods used for data collection are outlined below.
a) Field study: aims at testing hypotheses in natural life situations. Like a field experiment, it is
carried out in natural conditions, but it differs in that the researcher does not control or manipulate
the independent variables.
Merits:
(i) The method is realistic as it is carried out in natural conditions.
(ii) It’s easy to obtain data on a large number of variables.
Demerits:
(i) Independent variables are not manipulated.
(ii) Co-operation of the organization is often difficult to obtain.
(iii) Data is likely to contain unknown sampling bias.
(iv) The dross rate (proportion of irrelevant data) may be high in such studies.
(v) Measurement is not as precise as in the laboratory because of the influence of confounding
variables.
b) Census. A census is a study that obtains data from every member of a population (the totality of
individuals/items pertaining to certain characteristics). In most studies, a census is not practical
because of the cost and/or time required.
c) Sample survey. A sample survey is a study that obtains data from a subset of a population, in order to
estimate population attributes/characteristics. Surveys of human populations and institutions are
common in government, health, social science and marketing research.
d) Case study. A method of intensively exploring and analyzing the life of a single social unit, be it a
family, person, institution, cultural group or even an entire community. In this method no attempt
is made to exercise experimental or statistical control, and phenomena related to the unit are studied in
their natural setting. The researcher has considerable discretion in gathering information from a variety of
sources such as diaries, letters, autobiographies, office records, files or personal interviews.
Merits:
(i) The method is less expensive than other methods.
(ii) It is very intensive in nature: it aims at studying a few units rather than several.
(iii) Data collection is flexible since the researcher is free to approach the problem from any angle.
(iv) Data is collected from natural settings.
Demerits:
(i) It lacks internal validity, which is basic to scientific evidence.
(ii) Only one unit of the defined population is studied. Hence the findings of a case study cannot be
used as a base for generalization about a large population. They lack external validity.
(iii) Case studies are more time consuming than other methods.
e) Experiment. An experiment is a controlled study in which the researcher attempts to understand
cause-and-effect relationships. The experiment is actually carried out on the individuals/units
about which information is to be drawn. The study is "controlled" in the sense that the researcher
controls how subjects are assigned to groups and which treatments each group receives.
f) Observational study. Like experiments, observational studies attempt to understand cause-and-effect
relationships. However, unlike experiments, the researcher is not able to control how subjects are
assigned to groups and/or which treatments each group receives. Under this method, information is
sought by direct observation by the investigator.
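What separates an experiment from an observational study is that the researcher decides who receives which treatment. A minimal Python sketch of random assignment follows; the function name `assign_to_groups`, the subject labels and the seed are all hypothetical, and this is only one simple way of assigning subjects.

```python
import random


def assign_to_groups(subjects, treatments, seed=None):
    """Randomly split subjects across treatments in near-equal groups."""
    rng = random.Random(seed)   # seeded generator so the split is reproducible
    shuffled = list(subjects)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    # Deal the shuffled subjects out to the treatments in turn.
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


subjects = [f"S{i:02d}" for i in range(1, 13)]   # 12 hypothetical subjects
groups = assign_to_groups(subjects, ["treatment", "control"], seed=1)

for name, members in groups.items():
    print(name, members)
```

In an observational study no such step exists: group membership is whatever it happens to be, and the investigator only records it.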
1.3 Population and Sample
Population: The entire set of individuals to which the findings of a survey refer.
Sample: A subset of the population selected for a study.
Sample Design: The scheme by which items are chosen for the sample.
Sample unit: An element of the population selected for the sample.
Unit of analysis: The unit at which analysis will be done for inferring about the population. Suppose
you want to examine the effect of health care facilities in a community on prenatal care. What is the
unit of analysis: the health facility or the individual woman?
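The choice of unit of analysis changes both the number of observations and the summary values. The following Python sketch uses invented records (woman IDs, facility labels and visit counts are all hypothetical) to compare the two choices in the prenatal care example.

```python
from collections import defaultdict

# Hypothetical records: one row per woman, with the facility she attends.
records = [
    {"woman": "W1", "facility": "A", "prenatal_visits": 4},
    {"woman": "W2", "facility": "A", "prenatal_visits": 2},
    {"woman": "W3", "facility": "B", "prenatal_visits": 5},
    {"woman": "W4", "facility": "B", "prenatal_visits": 3},
    {"woman": "W5", "facility": "B", "prenatal_visits": 4},
]

# Unit of analysis = individual woman: 5 observations.
individual_mean = sum(r["prenatal_visits"] for r in records) / len(records)

# Unit of analysis = health facility: aggregate first, leaving 2 observations.
visits_by_facility = defaultdict(list)
for r in records:
    visits_by_facility[r["facility"]].append(r["prenatal_visits"])
facility_means = {f: sum(v) / len(v) for f, v in visits_by_facility.items()}
mean_of_facility_means = sum(facility_means.values()) / len(facility_means)

print("Mean visits per woman:", individual_mean)
print("Mean visits by facility:", facility_means)
print("Mean of facility means:", mean_of_facility_means)
```

The two means differ because facility B contributes three women but only one facility-level observation.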
Sampling Frames
For probability sampling, we must have a list of all the individuals (units) in the population. This list,
or sampling frame, is the basis for the selection process of the sample. “A [sampling] frame is a clear
and concise description of the population under study, by virtue of which the population units can be
identified unambiguously and contacted, if desired, for the purpose of the survey” - Hedayat and
Sinha, 1991
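Once a frame exists, drawing a probability sample reduces to selecting from the list. The Python sketch below draws a simple random sample without replacement; the frame of 500 student IDs, the sample size of 20 and the function name `simple_random_sample` are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: one unique identifier per population unit.
sampling_frame = [f"STU{i:04d}" for i in range(1, 501)]


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each equally likely to be chosen."""
    rng = random.Random(seed)    # seeded so the draw can be reproduced
    return rng.sample(frame, n)  # sampling without replacement


sample = simple_random_sample(sampling_frame, n=20, seed=42)
print(sample[:5])
```

Every unit in the frame has the same chance of selection, which is only possible because each unit is listed and can be identified unambiguously.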
Based on the sampling frame, the sampling design could also be classified as:
Individual surveys: used if a list of individuals is available, when the size of the population is small, or
for a special population.
Household surveys: based on a census of households, and used if individual-level
information is unlikely to be available. In practice, the frame is limited to small geographical areas and
is known as an “area sampling frame”. Example: Demographic and Health Surveys (DHS).
Institutional surveys: based on a census of institutions, say hospital/clinic lists, e.g.
i) 1990 National Hospital Discharge Survey
ii) National Ambulatory Medical Care Survey
Problems of Sampling Frame
(i) Missing elements
(ii) Noncoverage
(iii) Incomplete frame
(iv) Old list
(v) Undercoverage
(vi) May not be readily available
(vii) Expensive to gather
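Some of these problems can be checked when an independent listing of the population is available, which in practice is often not the case. The Python sketch below compares a frame against such a listing using set operations; the household IDs and both lists are hypothetical.

```python
# Hypothetical up-to-date listing of the target population (household IDs).
population = {"H01", "H02", "H03", "H04", "H05", "H06"}

# Hypothetical frame compiled from an older list.
frame = {"H01", "H02", "H04", "H09"}

# Units in the population but absent from the frame:
# missing elements / undercoverage.
missing = population - frame

# Units on the frame that are no longer in the population:
# one symptom of an old list.
outdated = frame - population

# Share of the population that the frame actually covers.
coverage_rate = len(frame & population) / len(population)

print("Missing from frame:", sorted(missing))
print("Outdated entries:", sorted(outdated))
print(f"Coverage rate: {coverage_rate:.0%}")
```

Units that are missing from the frame have no chance of being selected, so a sample drawn from this frame could not represent them.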