EXST 700x
Lab #3: Frequencies and Chart Graphics
Tip: Read previous tips.
Objectives
1. Use a LABEL statement to clarify information about a
variable.
2. Use a LENGTH statement.
3. Use a FREQ procedure (frequency) to produce a frequency
table.
4. Use a CHART procedure to
a) produce a horizontal bar chart
with frequencies.
b) produce a vertical bar chart
(histogram).
c) produce other descriptive
charts (e.g. BLOCK and PIE charts).
More details on basic SAS
Statements and Procedures
The “results viewer” window is the default HTML window for
SAS output. However, output accumulates in this window with each successive run
of the program with the most recent output on the bottom. The following
statements will result in clearing the results window each time these statements
are run.
ods html close; ods html;
Obviously, this program should go near the
top; I suggest right after the
dm 'log;clear;output;clear';
code, assuming you include that. That code
will clear the log and the list each time the program is executed from the top.
For this week’s assignment I suggest you also include,
ods graphics on;
as this may produce some prettier pictures.
I would still include the statement to get output
ODS listing;
it would probably be wise to put all four of
these as the first four lines in your program.
Assignment 3 example
The example
program for today has is much more complicated (16 variables) than the assignment dataset (only 5 variables),
but I needed a large dataset to find a range of values that would produce
graphics similar to those in the assignment. The dataset is the “Healthy
Breakfast” dataset from DASL (The Data and Story Library; http://lib.stat.cmu.edu/DASL/).
Since the example dataset is more complex,
I had to use some statements that you will not have to use. For example, by
default the length of a character variable is equal to the length of the first
value that is read for that variable. If you have a variable GENDER and the
first value is “male” then when the value “female” occurs it will read “fema”. If the first occurrence is “female”, then there
would be no problem reading “male”. By default, the length of a variable should
not exceed 8 characters. However, longer variables can be read if the length is
specified in a “LENGTH” statement. In my example dataset some of the cereal
names (variable NAME) are very large. I specified the following statement after
the DATA statement to get 24 characters, and some names exceeded that length
(the maximum length is 32 characters).
LENGTH
name $ 24;
By the way, SAS variable names must begin
with a letter or an underscore. They can include numbers, but not blanks or
special characters.
Since I had so many variables, I also
included a large LABEL statement. Notice that everything, from the word LABEL
to the semicolon 14 lines later, is a single statement. The variable specifying
one of the 8 manufacturers was represented with a single character. I included
a definition of these codes as a COMMENT prior to the DATA step. You will need
a LABEL statement, but not as large as mine.
New procedures
PROC FREQ: The FREQ procedure is used to produce frequency
tables with percentages. Eventually we will also use this procedure to do Chi
Square Tests of Independence and Chi Square Tests of Goodness
of Fit.
Other
statements can be used together with the PROC FREQ statement such as:
1) TABLE statement: – controls the tables to be
printed. When a single variable is listed in the table statement the procedure
will output a table with the frequency, percent (relative frequency), cumulative
frequency and cumulative percent (relative cumulative frequency). The variable
in the TABLE statement can be either quantitative or qualitative. The following
statements will produce this table for the cereal calorie variable.
proc freq data=Cereal;
Title3 'Frequencies of calories';
table calories;
run;
When more than one variable is listed in a
PROC FREQ TABLE statement the procedure produces a two-way table, or series of
two-way tables for the variables. The following will produce a two-way table of
the shelf where the cereal is presented in the supermarket (shelf: 1=bottom,
2=middle and 3=top) and the list of the 8 cereal manufacturers (mfr). In addition to the frequency the table will have the
overall table percent, row percent and column percent for each cell of the
table and the marginal frequencies (row and column) with percentages.
proc freq data=Cereal;
Title3 'Frequencies of shelf &
manufacturer categories';
table shelf * mfr;
run;
2) BY statement: – can be used as with most other procedures.
3) And of course the ever popular TITLE statements can be
included.
PROC CHART:
PROC CHART is one of a
number of graphical procedures often used for data exploration and examination.
This procedure can be used to produce a number of different styles of graphic
depending on the statements that are included. The variable to be processed is
named in the statement. Some of these statements are
HBAR – a horizontal bar chart that will
also include information on frequency, percent (relative frequency), cumulative
frequency and cumulative percent (relative cumulative frequency)
proc chart data=Cereal;
hbar calories;
run;
VBAR – a vertical bar chart often called a
histogram
proc chart data=Cereal;
vbar sugars / midpoints=0 to 14 by 2;
run;
BLOCK – produces a 3D plot with two
variables (sugars and shelf) on a surface and blocks who’s height represent a
third “response” variable. The default for the response is frequency of
occurrence in each combination of the first two variables. The response
variable can also be percents, sums or means.
proc chart data=Cereal;
block sugars / discrete group=shelf midpoints=0 to 15 by 5;
run;
PIE, STAR, DONUT – yields pie chart and
similar charts
proc chart data=Cereal;
pie mfr;
run;
PROC CHART OPTIONS: A number of options are available to
modify the appearance of charts. We will not discuss size and resolution
options here, but some other important options are listed below. The options
below are placed on the chart type statement following a slash (i.e. /).
MIDPOINTS = midpoint_list – as discussed in
class, when a quantitative variable is turned into a frequency table it is
often necessary to divide the numbers into size categories. The same is true of
bar charts. By default SAS will determine groupings, or midpoints for
groupings. However, you can set your own midpoints with the MIDPOINT option;
see examples below.
vbar sugars / midpoints= 0 to 14 by 2; * by range and interval;
vbar sugars / midpoints= 3 6 9 12 15; * by specific values;
vbar sugars / midpoints= 2 4 8 16; * by unequal spacing;
DISCRETE – indicates that the quantitative
values are to be treated as discrete categories and not as a quantity so
midpoints will not be calculated.
GROUP = variable_name – This option
produces a groups of bars with the HBAR and VBAR statements. It specifies the
second axis in the BLOCK statement.
Assignment 3
The dataset (Table 1.1 from Freund, Wilson & Mohr):
The data is responses of 50 respondents on their level of happiness (Likert
scale: 1=Not too happy, 2=Pretty happy, 3=Very happy). Additional information
includes the age and sex of the respondent and the average number of hours of
TV watched daily. The complete dataset is available on the Lab web page.
respondent age sex happy tvhours
1 41 1 2 0
2 25 2 1 0
3 43 1 2 4
4 38 1 2 2
5 53 2 3 2
6 43 2 2 5
7 56
2 2 2
8 53 1 2 2
9 31 2 1 0
10 69
1 3 3
. . .
45 74
2 2 3
46 37
2 3 0
47 48
1 2 3
48
42 2 2 6
49 77
2 2 2
50 75
1 3 0
Suppose you are going to examine this data for
happiness. Answer any questions posed using SAS. Turn
in your program log and, for each question, please turn in the relevant SAS
output. Try to organize your responses for clarity. (1 point)
1) You will want to include in your program the “usual
statements” with option, comments and titles similar to those in ASSIGNMENT 01
and 02. (2 points)
Include appropriate title statements. (1 point)
Create a data step to enter the data set above. The data
step will include an input statement and a data statement. To these statements
add the following LABEL statement: (1 point)
label happy = 'Happyness
index: 3=happiest'
sex = '1 = Male';
2. Print data for all the respondents. (1
point)
3. Do a frequency procedure creating a
table for sex * tvhours. (1 point)
4. Prepare a horizontal bar chart for the
variable tvhours. (1 point)
5. Create a vertical bar chart of the
variable age. (1 point)
6. Redo the vertical bar chart of the
variable age grouped from 20 to 80 by 10. (1 point)
7. Create a BLOCK chart of the variable HAPPY
and the group SEX. Be sure to specify the option DISCRETE so the procedure does
not try to calculate midpoints for the variable HAPPY. (1 point)
8. Do the pie chart for the variable age,
also specifying discrete. (1 point)