Probability and Statistics

by Steward Oigo
Type: Note
Institute: JKUAT

SMA 2272/STA 2270 STATISTICS

PURPOSE
By the end of the course the student should be proficient in representing data graphically and handling summary statistics, simple correlation and the best-fitting line, and in handling probability and probability distributions, including the expectation and variance of a discrete random variable.

DESCRIPTION
Classical and axiomatic approaches to probability. Compound and conditional probability, including Bayes' theorem. Concept of discrete random variable: expectation and variance. Data: sources, collection, classification and processing. Frequency distributions and graphical representation of data, including bar diagrams, histograms and stem-and-leaf diagrams. Measures of central tendency and dispersion. Skewness and kurtosis. Correlation. Fitting data to a best straight line.

Pre-Requisites: STA 2104 Calculus for Statistics I, SMA 2104 Mathematics for Science.

COURSE TEXT BOOKS
1. Uppal, S. M., Odhiambo, R. O. & Humphreys, H. M. Introduction to Probability and Statistics. JKUAT Press, 2005. ISBN 9966-923-95-0.
2. J Crawshaw & J Chambers. A Concise Course in A-Level Statistics, with worked examples. 3rd ed. Stanley Thornes, 1994. ISBN 0-534-42362-0.

COURSE JOURNALS
1. Journal of Applied Statistics (J. Appl. Stat.) [0266-4763; 1360-0532]
2. Statistics (Statistics) [0233-1888]

FURTHER REFERENCE TEXT BOOKS AND JOURNALS
1. GM Clarke & D Cooke. A Basic Course in Statistics. 5th ed. Arnold, 2004. ISBN13: 978-0-340-81406-2, ISBN10: 0-340-81406-3.
2. S Ross. A First Course in Probability. 4th ed. Prentice Hall, 1994. ISBN-10: 0131856626, ISBN-13: 9780131856622.
3. P.S. Mann. Introductory Statistics. John Wiley & Sons Ltd, 2001. ISBN 13: 9780471395119.
4. Statistical Science (Stat. Sci.) [0883-4237]
5. Journal of Mathematical Sciences
6. Journal of Teaching Statistics
Introduction

What is statistics?
The word statistics is derived from the Latin word “status” or the Italian word “statista”, both of which mean “political state” or government. Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data.

Definition
Statistics: a branch of science that deals with the collection, presentation, analysis and interpretation of data. The definition points out 4 key aspects of statistics, namely
(i) Data collection
(ii) Data presentation
(iii) Data analysis
(iv) Data interpretation

Statistics is divided into 2 broad categories, namely descriptive and inferential statistics.
Descriptive Statistics: summary values and presentations which give some information about the data. E.g. the mean height of 1st year students in JKUAT is 170cm. Here 170cm is a statistic which describes the central point of the heights data.
Inferential Statistics: summary values calculated from the sample in order to draw conclusions about the target population.

Types of Variables
Qualitative Variables: variables whose values fall into groups or categories. They are also called categorical variables and are further divided into 2 classes, namely nominal and ordinal variables.
a) Nominal variables: variables whose categories are just names with no natural ordering. E.g. gender, marital status, skin colour, district of birth etc.
b) Ordinal variables: variables whose categories have a natural ordering. E.g. education level, performance category, degree classification etc.
Quantitative Variables: these are numeric variables and are further divided into 2 classes, namely discrete and continuous variables.
a) Discrete variables: can only assume certain values, with gaps between them. E.g. the number of calls one makes in a day, the number of vehicles passing through a certain point etc.
b) Continuous variables: can assume any value in a specified range.
E.g. the length of a telephone call, the height of a 1st year student in JKUAT etc.

1. Data Collection

1.1 Sources of Data
There are 2 sources of data, namely primary and secondary data.
Primary data: freshly collected, i.e. collected for the first time. They are original in character, i.e. they are first-hand information collected, compiled and published for some purpose. They have not undergone any statistical treatment.
Secondary data: second-hand information mainly obtained from published sources such as statistical abstracts, books, encyclopaedias, periodicals and media reports (e.g. census reports), as well as CD-ROMs and other electronic devices and the internet. They are not original in character and have undergone some statistical treatment at least once.

1.2 Data Collection Methods
The 1st step in any investigation (inquiry) is data collection. Information can be collected either directly or indirectly, from the entire population or from a sample. There are many methods of collecting data, including the ones illustrated in the flow chart below.
[Flow chart: Methods of data collection, branching into experimental or laboratory methods and non-experimental methods. Remaining node labels: simulation, lab methods, field experiment, field methods, library methods, sample surveys, case study, field study, census.]

Experimental methods are so called because in them the investigator, working in a laboratory, tests a hypothesis about a cause-and-effect relationship by manipulating the independent variables under controlled conditions.
Non-experimental methods are so called because in them the investigator does not control or change any aspect of the situation under study but simply describes what naturally occurs at a certain point or period of time. Non-experimental methods are widely used in the social sciences. Some of the non-experimental methods used for data collection are outlined below.

a) Field study: aims at testing hypotheses in natural life situations. It differs from a field experiment in that the researcher does not control or manipulate the independent variables, although both are carried out in natural conditions.
Merits:
(i) The method is realistic as it is carried out in natural conditions.
(ii) It is easy to obtain data on a large number of variables.
Demerits:
(i) Independent variables are not manipulated.
(ii) Co-operation of the organization is often difficult to obtain.
(iii) Data is likely to contain unknown sampling bias.
(iv) The dross rate (proportion of irrelevant data) may be high in such studies.
(v) Measurement is not as precise as in the laboratory because of the influence of confounding variables.

b) Census. A census is a study that obtains data from every member of a population (the totality of individuals/items sharing certain characteristics). In most studies a census is not practical, because of the cost and/or time required.

c) Sample survey. A sample survey is a study that obtains data from a subset of a population in order to estimate population attributes/characteristics.
Surveys of human populations and institutions are common in government, health, social science and marketing research.

d) Case study: a method of intensively exploring and analyzing the life of a single social unit, be it a person, a family, an institution, a cultural group or even an entire community. In this method no attempt is made to exercise experimental or statistical control, and phenomena related to the unit are studied in their natural setting. The researcher has considerable discretion in gathering information from a variety of sources such as diaries, letters, autobiographies, office records, files or personal interviews.
Merits:
(i) The method is less expensive than other methods.
(ii) It is very intensive in nature: it aims at studying a few units rather than several.
(iii) Data collection is flexible since the researcher is free to approach the problem from any angle.
(iv) Data is collected from natural settings.
Demerits:
(i) It lacks internal validity, which is basic to scientific evidence.
(ii) Only one unit of the defined population is studied. Hence the findings of a case study cannot be used as a basis for generalization about a large population, i.e. they lack external validity.
(iii) Case studies are more time consuming than other methods.

e) Experiment. An experiment is a controlled study in which the researcher attempts to understand cause-and-effect relationships. The experiment is carried out on the individuals/units about whom information is to be obtained. The study is "controlled" in the sense that the researcher controls how subjects are assigned to groups and which treatments each group receives.

f) Observational study. Like experiments, observational studies attempt to understand cause-and-effect relationships. However, unlike in experiments, the researcher is not able to control how subjects are assigned to groups and/or which treatments each group receives. Under this method, information is sought by direct observation by the investigator.

1.3 Population and Sample
Population: the entire set of individuals to which the findings of a survey refer.
Sample: a subset of the population selected for a study.
Sample design: the scheme by which items are chosen for the sample.
Sample unit: the element of the sample selected from the population.
Unit of analysis: the unit at which analysis will be done for inferring about the population. Suppose you want to examine the effect of health care facilities in a community on prenatal care. What is the unit of analysis: the health facility or the individual woman?
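The choice of unit of analysis in the prenatal care question can change the numbers that come out of the same data. The sketch below illustrates this with a handful of hypothetical records; the facility names, the visit counts and both helper functions are made up for illustration and do not come from any real survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, one per woman: (woman_id, facility, prenatal_visits).
records = [
    ("w1", "Clinic A", 4),
    ("w2", "Clinic A", 2),
    ("w3", "Clinic A", 3),
    ("w4", "Clinic B", 6),
]

def mean_visits_per_woman(rows):
    """Unit of analysis = the individual woman: every woman counts once."""
    return mean([visits for _, _, visits in rows])

def mean_visits_per_facility(rows):
    """Unit of analysis = the health facility: every facility counts once."""
    visits_by_facility = defaultdict(list)
    for _, facility, visits in rows:
        visits_by_facility[facility].append(visits)
    facility_means = [mean(v) for v in visits_by_facility.values()]
    return mean(facility_means)

woman_level = mean_visits_per_woman(records)        # (4 + 2 + 3 + 6) / 4
facility_level = mean_visits_per_facility(records)  # (3 + 6) / 2
print(woman_level, facility_level)
```

With these made-up records the woman-level mean is 3.75 visits while the facility-level mean is 4.5, because Clinic B's single client carries as much weight as Clinic A's three clients once the facility becomes the unit of analysis.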
Sampling Frames
For probability sampling, we must have a list of all the individuals (units) in the population. This list, or sampling frame, is the basis for the selection process of the sample.
“A [sampling] frame is a clear and concise description of the population under study, by virtue of which the population units can be identified unambiguously and contacted, if desired, for the purpose of the survey” - Hedayat and Sinha, 1991

Based on the sampling frame, the sampling design can also be classified as:
a) Individual surveys: used when a list of individuals is available, when the size of the population is small, or for a special population.
b) Household surveys: based on a census of households, and used when individual-level information is unlikely to be available. In practice this is limited to small geographical areas and is known as an “area sampling frame”. Example: Demographic and Health Surveys (DHS).
c) Institutional surveys: based on a census of institutions, say hospital/clinic lists, e.g.
(i) 1990 National Hospital Discharge Survey
(ii) National Ambulatory Medical Care Survey

Problems of Sampling Frame
(i) Missing elements
(ii) Noncoverage
(iii) Incomplete frame
(iv) Old list
(v) Undercoverage
(vi) May not be readily available
(vii) Expensive to gather
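As a minimal sketch of how a sampling frame supports probability sampling, the snippet below draws a simple random sample without replacement from a hypothetical frame of student identifiers. The frame contents, the sample size, the seed and the function name are all assumptions chosen for illustration; simple random sampling is only one possible sample design.

```python
import random

def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each unit equally likely."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: one identifier for every unit in the population.
frame = [f"STUDENT-{i:03d}" for i in range(1, 201)]

sample = draw_simple_random_sample(frame, n=20, seed=1)
print(len(sample), "units selected, for example", sample[:3])
```

Only units that appear in the frame can ever be selected, which is why missing elements, old lists and undercoverage translate directly into bias in the sample.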
