Topic A1/A2 — Introducing Statistics + Tables & Graphs
Topic A1 — Introducing Statistics
Difference between descriptive and inferential statistics
Descriptive stats, well… describe a population or a sample. They tell you about the features of the data. Examples of descriptive statistics?
- Climate change
- Road deaths
- [Econ example] Tax data
Inferential stats start from a sample and try to generalize to the population. They also try to draw relationships between variables, test hypotheses and even make predictions.
Bottom line: we need representativeness in our samples
Data types
A good way to think about data types is to think in terms of two dimensions: time and individuals/subjects.
It’s a very simple model: time and subject can each take either value ‘1’ or ‘many’. So at the start, you have 1 time point and 1 individual. But you can’t say much with that, can you? That’s just anecdotal evidence. So let’s see what we get when we expand these dimensions.
Cross-sectional data
A bunch of individuals, at one period in time
- [Class ex] Mortality rate
- Guinness consumption in this class in September
Like a snapshot: you observe 100 people at time T.
The key is that it only looks at one specific time point (or period, summarized as a point).
Time series data
One individual over time. It’s like a tracker measuring a specific variable over time. Like the step counter on your phone or smartwatch, or your personal daily consumption of Guinness over the month.
Panel data
It’s a tracker on a bunch of people. So it would be getting all of you guys’ step counters and appending them together to form one dataset: a panel data set. In terms of Guinness consumption, it’s your and your classmates’ daily consumption over the month, for example.
[Show examples of how it appears in a table, stat software]
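One way to picture the three data types as tables, as a rough sketch: the names and pint counts below are made up for illustration, and the helper `shape` is hypothetical. It just counts how many individuals and how many time points a dataset covers.

```python
# Hypothetical data: pints of Guinness per person per day (made-up numbers).

# Cross-sectional: many individuals, one time point.
cross_section = [
    {"person": "Aoife", "day": 1, "pints": 2},
    {"person": "Brian", "day": 1, "pints": 0},
    {"person": "Ciara", "day": 1, "pints": 1},
]

# Time series: one individual, many time points.
time_series = [
    {"person": "Aoife", "day": 1, "pints": 2},
    {"person": "Aoife", "day": 2, "pints": 1},
    {"person": "Aoife", "day": 3, "pints": 3},
]

# Panel: many individuals, many time points (time series stacked together).
panel = [
    {"person": person, "day": day, "pints": pints}
    for person, series in {"Aoife": [2, 1, 3], "Brian": [0, 0, 1]}.items()
    for day, pints in enumerate(series, start=1)
]


def shape(rows):
    """Return (number of distinct individuals, number of distinct time points)."""
    return (len({r["person"] for r in rows}), len({r["day"] for r in rows}))


print("cross-section:", shape(cross_section))  # many individuals, 1 time point
print("time series:  ", shape(time_series))    # 1 individual, many time points
print("panel:        ", shape(panel))          # many individuals, many time points
```

In statistical software a panel usually looks like this "long" layout: one row per individual per time point, with an identifier column and a time column.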
Quantitative vs qualitative variables
Quantitative is basically stuff you can measure with numbers. Any examples?
- CO2 concentration in the atmosphere (continuous)
- Liters of Guinness (continuous)
- Pints of Guinness (0 to 2; more than 2 to 4; etc.)
- Number of brothers and sisters (discrete, can’t slice it in smaller pieces)
- Elevators in a building (discrete)
Qualitative is describing a certain state or feature. Any examples?
- Computer is working / not working (state, nominal)
- Member of political party (nominal)
- Able to speak Irish (nominal)
- “Agree with this statement… (1) mostly not (2) not really (3) indifferent (4) a bit (5) mostly yes” (ordinal)
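A two-state qualitative variable (working / not working, speaks Irish / doesn’t) is usually coded as what is called a dummy variable: 1 if the state holds, 0 otherwise.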
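A minimal sketch of dummy coding, using made-up answers to the "able to speak Irish" question. The variable names are illustrative only.

```python
# Hypothetical answers to "Are you able to speak Irish?"
speaks_irish = ["yes", "no", "no", "yes", "yes"]

# Dummy coding: 1 if the state holds, 0 otherwise.
dummy = [1 if answer == "yes" else 0 for answer in speaks_irish]
print(dummy)

# Handy side effect: the mean of a dummy variable is the share of
# observations in that state.
share = sum(dummy) / len(dummy)
print(share)
```

The 0/1 values are just labels for the two states; they don't measure a quantity the way liters or pints do.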
Discrete vs continuous variables
[Done above]
Nominal vs ordinal variables
[Done above]
Interval vs Ratio
Interval: the difference between two values is meaningful, but the zero point is arbitrary.
Ratio: there is a clear definition of zero. When the variable equals 0, there is none of “it” (no quantity), so ratios between values are meaningful too.
Temperature: °C and °F are interval, kelvins are ratio.
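To see the temperature example in numbers, here is a small sketch with two made-up temperatures. The only outside fact used is the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15


cold_c, warm_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on both scales (interval property):
# a 10-degree gap in Celsius is a 10-kelvin gap.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the Kelvin scale, where 0 is absolute zero.
ratio_c = warm_c / cold_c  # 2.0, but 20 degrees C is not "twice as hot" as 10
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print("difference:", diff_c, "vs", round(diff_k, 2))
print("ratio:", ratio_c, "vs", round(ratio_k, 3))
```

The difference survives the change of scale; the ratio does not, which is exactly why Celsius is interval and Kelvin is ratio.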
Topic A2 — Tables & Graphs
In this topic we divide quantitative data into classes. The class limits can be arbitrary, or based on some existing classification.
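A quick sketch of dividing data into classes. The pint counts and the class width of 2 are assumptions for illustration, and `class_of` is a hypothetical helper that follows the "0 to 2; more than 2 to 4" pattern (upper limit included).

```python
import math

# Hypothetical data: pints of Guinness drunk by 10 students over a week.
pints = [0, 1, 1, 2, 3, 3, 4, 5, 7, 8]

class_width = 2  # arbitrary choice


def class_of(x, width):
    """Return the (lower, upper) limits of the class containing x.

    The upper limit is included in the class ("more than 2 to 4"),
    and 0 falls in the first class.
    """
    index = max(math.ceil(x / width) - 1, 0)
    lower = index * width
    return (lower, lower + width)


classes = [class_of(x, class_width) for x in pints]
for x, (lower, upper) in zip(pints, classes):
    print(f"{x} pints -> class {lower} to {upper}")
```

Whatever rule you pick, each observation has to land in exactly one class.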
Absolute frequency
It’s the absolute number of times this category appears in your data.
Relative frequency
It’s the absolute frequency divided by the total number of observations.
Cumulative frequency
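When categories are ordered, you can “stack” the absolute frequencies: each category’s cumulative frequency is its own count plus the counts of all categories before it (the last one equals the total number of observations).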
Cumulative relative frequency
Same as above but with relative frequencies (total 100%).
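The four frequencies can be computed in a few lines. This is a sketch on made-up answers to a 1 to 5 agreement scale, not real survey data.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ordinal answers on the 1-5 agreement scale.
answers = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
n = len(answers)

counts = Counter(answers)
categories = sorted(counts)  # ordered categories are needed for cumulation

absolute = [counts[c] for c in categories]
relative = [a / n for a in absolute]
cumulative = list(accumulate(absolute))
cumulative_relative = [c / n for c in cumulative]

print("category  abs  rel   cum  cum_rel")
for row in zip(categories, absolute, relative, cumulative, cumulative_relative):
    print("{:>8}  {:>3}  {:.2f}  {:>3}  {:.2f}".format(*row))
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1 (100%).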
Histogram
Visual representation of frequency distribution. Two main characteristics:
- Heights: frequency
- Width: class width
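A rough text-only sketch of a histogram, with made-up class counts. The bars are drawn sideways here, so bar length plays the role of height, and every class has the same width of 2.

```python
# Hypothetical class counts: pints per week, grouped in classes of width 2.
frequencies = {(0, 2): 4, (2, 4): 3, (4, 6): 1, (6, 8): 2}


def text_histogram(freqs):
    """Return one text line per class: class limits, then a bar of '#'."""
    lines = []
    for (lower, upper), count in sorted(freqs.items()):
        lines.append(f"{lower:>2} to {upper:<2} | {'#' * count} {count}")
    return lines


for line in text_histogram(frequencies):
    print(line)
```

A plotting library would do the same thing with proper bars; the idea is unchanged: one bar per class, bar size equal to the frequency.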
Polygon
Same information as the histogram, but drawn as a line linking the midpoints of the tops of the bars.
Ogive
Plots the cumulative relative frequency. The x-axis represents the upper limit of each class.
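The points of an ogive can be built directly from a frequency table. A minimal sketch, assuming made-up classes and counts:

```python
from itertools import accumulate

# Hypothetical classes (lower, upper) with their absolute frequencies.
classes = [(0, 2), (2, 4), (4, 6), (6, 8)]
frequencies = [4, 3, 1, 2]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))

# Each ogive point: x = upper limit of the class, y = cumulative relative frequency.
ogive_points = [(upper, c / n) for (_, upper), c in zip(classes, cumulative)]

for x, y in ogive_points:
    print(f"up to {x}: {y:.0%} of observations")
```

Joining these points with a line gives the ogive; it never goes down and ends at 100%.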
Scatter plots
It’s a spatial representation of two variables and shows their relationship together.
When hours of the day vary, does Guinness consumption vary? If you plot time on the x-axis and pints on the y-axis, can we draw the scatter plot?