## DISTRIBUTIONS AND DESCRIPTIVE STATISTICS

Assignment Overview
This assignment requires that you provide a descriptive analysis of the population sampled within the Education Study database in SPSS. Click on the link for the file to open.
When exploring a population, you must always ask several questions that help you focus on the key issues relating the population from which the sample was taken to the study sample (in this case, the database). (Hint: The primary question is usually related to the ability of the researcher to make certain statistical inferences about the study population from the sample.)
Part I – Complete Table 1 + 2 for all 12 variables per my first examples using the Education Study database.
Part II – Identify at least five questions that you believe may be key to characterizing the subject population from which the Education Study database was sampled.
Part III – Develop a detailed text-based description of the database (per your five questions) to include only the pertinent information from SPSS:
Descriptive statistics as appropriate for the variables: N, Mean, SD, 95% Confidence Interval (CI), Skewness, Kurtosis, Etc.
Frequency tables: Percentages, etc.
Histograms of the variables: English scores. Explain how the histogram depicts the distribution (normal or non-normal) and how it relates to the skewness and kurtosis statistics you obtained.
Note: Part III above may be completed and submitted as late as Module 3, since SPSS is required and some students may encounter delays in receiving it for this course.
Assignment Expectations
Your assignment will be graded according to the Quantitative Grading Rubric. (To see the rubric, go to Assessments>Rubrics. Click the arrow next to the rubric name and choose Preview.)

All assignments and the reporting of statistical results MUST be in APA format.
Be assiduous in your work so that you do not miss any of the variables or statistics.

General

This course is based on the assumption that students accepted into it have already taken the appropriate statistics and research methods courses in their earlier studies. However, it may have been a few years since some students had an opportunity to review the material covered in this course. If you need to refresh your memory concerning statistics/biostatistics, please review some of the sources presented on the Background page for this module as well as the Module 1 Background. It is recommended that you bookmark the reference sources you find valuable. Also be sure to be fully conversant with all Key Terms presented in each module of the course.

Introduction

Research problems are questions that can be answered by collecting facts. The field of study concerned with obtaining, describing, and interpreting facts is called statistics. The raw materials of research are data, and a major portion of scientific research involves statistical thinking about data.

To respond to research questions, a scientist must first measure characteristics of people or objects. Measurement is the assignment of numerals to events according to a set of rules. We assign numbers to abstract concepts so that they become tangible. After a method of measurement for a concept is chosen, the concept is called a variable.

A Variable is a measured characteristic that can take different values.

What Researchers Seek (In general):

TO DESCRIBE
Example: The objective of this analysis is to describe the distribution of statistics examination scores for students in RMS608.

There is one variable (examination scores), hence this is a univariate analysis.
TO COMPARE
Example: The objective of this analysis is to determine if there is a significant difference between males and females with respect to statistics examination scores for students in RMS608.

There are two variables (gender and examination scores), and these are linked, therefore this is a bivariate analysis.
TO EXPLORE RELATIONSHIP
Example: The objective of this analysis is to determine if there is a significant relationship between mid-term and final statistics examination scores for students in RMS608. Usually, the follow-up objective is to determine the extent to which final exam scores can be predicted from mid-term exam scores.

There are two variables (mid-term exam scores and final exam scores) and these are linked, hence a bivariate analysis.

For this assignment, the objectives should conform to the first type above (TO DESCRIBE), with appropriate measures (see below).
WHICH MEASURE(S) SHOULD I USE FOR DESCRIBING THE DATA?

Descriptive statistics should be the first step of a statistical analysis, because they reduce the raw data to a small number of representative figures. Datasets are summarized to describe two important features of the data distribution: the spread of scores and where within that range most of the data fall (e.g., the average of the scores). Measures of central tendency assess the middle or average value in a distribution, and measures of dispersion estimate the amount of variability contained within the data and thus the degree to which the average is a typical value.

With nominal variables (unordered categories: e.g., ethnic groups), use the MODE (with frequencies and percentages for each alternative).
With ordinal variables (ordered categories: e.g., height reported as tall, medium, short), the median and mode are appropriate.
For true numerical variables, all the common measures of central tendency (mean, median, mode) and dispersion (standard deviation, variance, and range) are appropriate. If the distribution of the variables is skewed, emphasize the median in your report.
NOTE: It is very useful to include the confidence interval for the mean (usually at the 95% confidence level). This indicates the range of values within which you can be 95% confident that the true population mean falls. Because a sample statistic always carries standard error, a single value cannot describe the entire population exactly; hence a range of values (the confidence interval) is reported. This is the basic concept of INFERENTIAL STATISTICS, which will be further explored in subsequent modules.
Always check your data for the presence of outliers. These are extreme values (either valid but atypical observations or the result of coding or data entry errors) that could lead to unreliable results if not addressed.
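As a rough illustration of the measures listed above for a numerical variable, the following Python sketch computes them with the standard library only. The scores are invented for illustration, the function name `describe` is hypothetical, and the confidence interval uses a large-sample normal critical value as a simplifying assumption; statistical packages typically use the t distribution, which gives a somewhat wider interval for small samples.

```python
from statistics import NormalDist, mean, median, multimode, stdev

# Hypothetical exam scores, invented for illustration only.
scores = [62, 70, 71, 75, 75, 80, 84, 90]


def describe(values, confidence=0.95):
    """Summarize a numerical variable: central tendency, dispersion, CI."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)          # sample standard deviation (divides by n - 1)
    se = sd / n ** 0.5          # standard error of the mean
    # Assumption: normal critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "n": n,
        "mean": m,
        "median": median(values),
        "modes": multimode(values),
        "sd": sd,
        "variance": sd ** 2,
        "range": max(values) - min(values),
        "ci": (m - z * se, m + z * se),
    }


summary = describe(scores)
for name, value in summary.items():
    print(name, value)
```

The interval is centered on the sample mean and narrows as the sample size grows, because the standard error shrinks with the square root of n.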
NOTE

When describing the data, you must interpret the numerical outcomes (measures of central tendency, dispersion, and skewness) in conjunction with appropriate graphs. The numerical measures together with the graphs will allow you to confirm/discern the following features/characteristics:

Normal
Positively skewed
Negatively skewed
Bi-modal
Multi-modal
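To connect these shapes to the skewness and kurtosis statistics, here is a minimal Python sketch. It assumes the bias-adjusted sample formulas that many statistical packages report; check your software's documentation before comparing values. The function names and the data are hypothetical.

```python
from statistics import mean, stdev


def skewness(values):
    """Adjusted sample skewness: 0 for symmetric data, > 0 for a right tail."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def excess_kurtosis(values):
    """Adjusted sample excess kurtosis: about 0 for a normal distribution."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]          # long tail toward high values
left_tailed = [-x for x in right_tailed]    # mirror image

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), skewness(left_tailed))
```

A positive skewness value corresponds to a positively skewed histogram (tail to the right) and a negative value to a negatively skewed one. Neither statistic detects bi-modal or multi-modal shapes, which is why the histogram is still needed.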
Important Concepts/Key Terms

Range Effects: The sampling procedure or measuring instrument can result in a study population which is restricted in range when compared to the general population. For instance, the range in heights of basketball players is much more restricted than that of the general population. In other words, when you measure a restricted group (5th graders compared to all grades, IQ of graduate students compared to the general population) the range of values will be much more restricted than that for the general population. Now, imagine how this fact could affect the results of various research studies!
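One way to see the consequence of a restricted range is to compute a correlation on a full sample and again on a narrow slice of it. The sketch below uses invented data and a hand-written Pearson correlation; the variable names (`aptitude`, `performance`) are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores: performance follows aptitude plus a fixed wobble
# (+2 for odd aptitude values, -2 for even ones).
aptitude = list(range(1, 13))
performance = [a + (2 if a % 2 else -2) for a in aptitude]

r_full = pearson_r(aptitude, performance)

# Restrict the sample to the top of the aptitude range only.
pairs = [(a, p) for a, p in zip(aptitude, performance) if a >= 9]
r_restricted = pearson_r([a for a, _ in pairs], [p for _, p in pairs])

print(round(r_full, 3), round(r_restricted, 3))
```

The underlying relationship is identical in both calculations; only the range of sampled values differs, yet the restricted sample shows a much weaker correlation.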

Outliers: Many studies may be influenced by outliers. Depending on where the outliers are found in a data set they can influence the value of the sample mean and a correlation coefficient.
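The influence of a single outlier can be sketched in a few lines of Python. The exam scores below are invented, and the extra student (a high mid-term score paired with a very low final score) is a hypothetical outlier added for illustration.

```python
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical mid-term and final exam scores for five students.
midterm = [60, 65, 70, 75, 80]
final = [62, 66, 71, 74, 82]

# The same data plus one hypothetical outlying student.
midterm_out = midterm + [95]
final_out = final + [40]

mean_shift = abs(mean(final_out) - mean(final))
median_shift = abs(median(final_out) - median(final))
r_clean = pearson_r(midterm, final)
r_outlier = pearson_r(midterm_out, final_out)

print("shift in mean:", mean_shift, "shift in median:", median_shift)
print("r without outlier:", r_clean, "r with outlier:", r_outlier)
```

In this example the mean moves about twice as far as the median, and the correlation changes from strongly positive to negative, all because of one added case.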

Measurement

Measurement is in fact a translation of concepts, which are unobservable, into measurable terms, which are observable. For example, the concept of trust is a mental representation of the honesty and integrity of a referent person or organization. Measuring trust might consist of four questions in a survey that tap perceptions of the honesty and integrity of a referent person or organization. Alternatively, we might want to know something about the education of our respondent. We might ask the question stem “please indicate your level of education by checking one of the following boxes.” Below the stem would be a set of boxes indicating less than high school, high school graduate (or GED), associate degree, bachelor's degree, master's degree, doctoral or professional degree (MD, Ph.D., Pharm.D., JD, DDS, Ed.D., etc.).

How do we deal with these kinds of concepts? How do we measure them?

Levels of Measurement

There are four levels of measurement that we will discuss in this class. They are called nominal, ordinal, interval, and ratio measures.

Nominal Measures

These measures classify elements into categories of the variable that are exhaustive and mutually exclusive. In the ordering of the four types of measures, nominal measures represent the lowest level. A common example of a nominal measure is sex, which is typically recorded in two categories: male and female. There is no rank-order relationship among the categories. As we shall see with the three other types of measures, additional characteristics are added as we move up the four levels of measurement.

Ordinal Measures

Ordinal measures refer to those variables whose attributes or characteristics may be logically rank-ordered along some progression. For example, hospital size may be ranked along the continuum of small, medium, or large. This ordinal measure contains the logical rank ordering of small, medium, and large. Ordinal measurement represents the next advancement along the levels of measurement. In addition to the rank-order function, it contains all the characteristics of nominal measures including classification, exhaustiveness, and exclusiveness.

Interval Measures

Interval measures refer to those variables whose characteristics are not only rank-ordered but also separated by equal distances. The Fahrenheit temperature scale is a good example of an interval measure. You know the difference between 50 and 60 degrees is 10 degrees because of these equal distances. This property of equal distances is the main reason why interval measures are more advanced than the previous two types. However, interval measures do not have a true zero. The Fahrenheit scale can register a temperature of zero, but that zero is relative and does not reflect the absence of temperature in any true physical sense. A true zero is instead a property of the Kelvin temperature scale, where zero represents the total absence of heat; for that reason the Kelvin scale is a ratio scale rather than an interval scale. The Likert scale, which is commonly used in surveys, experiments, and evaluations, is often treated as an interval measure as well, although its individual response categories are strictly ordinal.
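The practical difference between an interval scale and a ratio scale can be checked with a short calculation. The Python sketch below converts Fahrenheit readings to Kelvin using the standard conversion formula; the function name is hypothetical.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit temperature to Kelvin."""
    return (deg_f - 32) * 5 / 9 + 273.15


# Differences are meaningful on an interval scale:
# 60 F is 10 degrees warmer than 50 F.
difference_f = 60 - 50

# Ratios are not: 100 F is numerically twice 50 F, but it is not
# "twice as hot". On the Kelvin (ratio) scale the ratio is close to 1.
ratio_f = 100 / 50
ratio_k = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

print(difference_f, ratio_f, round(ratio_k, 3))
```

Because the Fahrenheit zero is arbitrary, the ratio of two Fahrenheit readings has no physical meaning; only on a scale with a true zero does a statement such as "twice as much" make sense.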

Ratio Measures

As just discussed, ratio scales have all the characteristics of interval measures and also have a true zero point. Age and education (when measured in years of schooling) are good examples of ratio scales.

One reason why these levels of measurement are important is that different statistical methods can be used with each level. Few statistical methods beyond frequencies, percentages, and the mode are used with nominal measures, because their categories carry no numerical meaning. Ordinal measures have special statistics developed for them, based on their rank ordering, although these are not used as much as other statistics. The most powerful statistics are used with interval and ratio measures. Therefore, knowing the level of measurement of a particular question helps you to know what statistics can be applied. (We will deal with this issue in the following modules.)
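As a sketch of how the level of measurement guides the choice of summary statistics, the hypothetical helper below returns only the measures that are appropriate for each level. The function name `summarize`, its `order` parameter, and the data are all invented for illustration.

```python
from collections import Counter
from statistics import mean, median, median_low, mode, stdev


def summarize(values, level, order=None):
    """Return the summary measures appropriate for a level of measurement."""
    if level == "nominal":
        n = len(values)
        percents = {k: round(100 * c / n, 1) for k, c in Counter(values).items()}
        return {"mode": mode(values), "percent": percents}
    if level == "ordinal":
        # 'order' lists the categories from lowest to highest rank.
        ranks = sorted(order.index(v) for v in values)
        return {"mode": mode(values), "median": order[median_low(ranks)]}
    if level in ("interval", "ratio"):
        return {"mean": mean(values), "median": median(values),
                "sd": stdev(values)}
    raise ValueError(f"unknown level of measurement: {level}")


# Hypothetical data for each level.
sex = ["female", "male", "female", "female"]
hospital_size = ["small", "medium", "large", "medium", "small", "medium"]
age = [21, 25, 30, 34, 40]

nominal_summary = summarize(sex, "nominal")
ordinal_summary = summarize(hospital_size, "ordinal",
                            order=["small", "medium", "large"])
ratio_summary = summarize(age, "ratio")
print(nominal_summary, ordinal_summary, ratio_summary, sep="\n")
```

For the ordinal case the sketch reports the lower of the two middle categories when the number of cases is even, since categories cannot be averaged.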