Topic A1/A2 — Introducing Statistics + Tables & Graphs

Table of contents
  1. Topic A1/A2 — Introducing Statistics + Tables & Graphs
    1. Topic A1 — Introducing Statistics
      1. Difference between descriptive and inferential statistics
      2. Data types
      3. Cross-sectional data
      4. Time series data
      5. Panel data
      6. Quantitative vs qualitative variables
      7. Discrete vs continuous variables
      8. Nominal vs ordinal variables
      9. Interval vs Ratio
    2. Topic A2 — Tables & Graphs
      1. Absolute frequency
      2. Relative frequency
      3. Cumulative frequency
      4. Cumulative relative frequency
      5. Histogram
      6. Polygon
      7. Ogive
      8. Scatter plots

Topic A1 — Introducing Statistics

Difference between descriptive and inferential statistics

Descriptive statistics, well… describe a population or a sample. They tell you about the features of the data. Examples of descriptive statistics?

  • Climate change
  • Road deaths
  • [Econ example] Tax data

Inferential stats start from a sample and try to generalize to the population. They also try to identify relationships between variables, test hypotheses and even make predictions.

Bottom line: to generalize from a sample to the population, the sample needs to be representative.

Data types

A good way to think about data types is to think in terms of two dimensions: time and individuals/subjects.

It’s a very simple model: time and subject can each take the value ‘1’ or ‘many’. So at the start, you have 1 time point and 1 individual. But you can’t say much with that, can you? That’s just anecdotal evidence. So let’s see what we get when we expand these dimensions.

Cross-sectional data

A bunch of individuals, at one period in time

  • [Class ex] Mortality rate
  • Guinness consumption in this class in September

Like a snapshot: you observe 100 people at time T.

The key is that it only looks at one specific time point (or one period, summarized as a point).

Time series data

One individual over time. It’s like a tracker measuring a specific variable over time. Like the step counter on your phone or smartwatch, or your personal daily consumption of Guinness over the month.

Panel data

It’s a tracker on a bunch of people. So it would be taking all of your step counters and appending them together to form one dataset: a panel data set. In terms of Guinness consumption, it’s the daily consumption of you and your classmates over the month, for example.

[Show examples of how it appears in a table, stat software]
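To make the three layouts concrete, here is a minimal sketch in plain Python. The names and pint counts are made up for illustration; the point is only that a cross-section and a time series are both slices of a panel.

```python
# Hypothetical panel: one row per (person, day) with pints drunk that day.
# All names and numbers are invented for illustration.
panel = [
    ("Aoife", 1, 2), ("Aoife", 2, 0), ("Aoife", 3, 1),
    ("Brian", 1, 1), ("Brian", 2, 3), ("Brian", 3, 0),
    ("Ciara", 1, 0), ("Ciara", 2, 1), ("Ciara", 3, 4),
]

# Cross-section: many individuals, one time point (here day 1).
cross_section = [(person, pints) for person, day, pints in panel if day == 1]

# Time series: one individual, many time points (here Aoife).
time_series = [(day, pints) for person, day, pints in panel if person == "Aoife"]

print("cross-section:", cross_section)
print("time series:  ", time_series)
```

In statistical software the panel would typically appear the same way: one row per individual per time point, with one column identifying the individual and one identifying the time.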

Quantitative vs qualitative variables

Quantitative is basically stuff you can measure with numbers. Any examples?

  • CO2 concentration in the atmosphere (continuous)
  • Liters of Guinness (continuous)
  • Pints of Guinness, grouped into classes (0 to 2; more than 2 to 4; etc)
  • Numbers of brothers and sisters (discrete, can’t slice it in smaller pieces)
  • Elevators in a building (discrete)

Qualitative is describing a certain state or feature. Any examples?

  • Computer is working / not working (state, nominal)
  • Member of political party (nominal)
  • Able to speak Irish (nominal)
  • “Agree with this statement… (1) mostly not (2) not really (3) indifferent (4) a bit (5) mostly yes” (ordinal)

A two-state qualitative variable is usually coded as what is called a dummy variable: 1 if the feature is present, 0 if not.
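A quick sketch of dummy coding, assuming some made-up answers to the "able to speak Irish" question. A handy side effect of 0/1 coding is that the mean of the dummy equals the share of "yes" answers.

```python
# Invented answers to "Are you able to speak Irish?"
speaks_irish = ["yes", "no", "no", "yes", "yes"]

# Dummy coding: 1 if the feature is present, 0 otherwise.
dummy = [1 if answer == "yes" else 0 for answer in speaks_irish]

# The mean of a 0/1 variable is the proportion of 1s.
share_yes = sum(dummy) / len(dummy)

print(dummy)
print(share_yes)
```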

Discrete vs continuous variables

[Done above]

Nominal vs ordinal variables

[Done above]

Interval vs Ratio

Interval: the difference between two values is meaningful, but the zero point is arbitrary.

Ratio: on top of that, there is a true zero. When the variable equals 0, there is none of “it” (no quantity), so ratios between values are meaningful too.

Temperature: °C and °F are interval, kelvins are ratio.
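One way to see the difference is to ask whether "twice as hot" survives a change of units. The sketch below uses the standard conversion formulas and two arbitrary example temperatures: the ratio of 20 to 10 looks like 2 in Celsius, but it changes in Fahrenheit and again in kelvins, and only the kelvin ratio refers to a true zero.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # arbitrary example temperatures

# The "ratio" depends on the unit for interval scales.
ratio_c = warm_c / cool_c                  # 2.0
ratio_f = c_to_f(warm_c) / c_to_f(cool_c)  # 68 / 50
ratio_k = c_to_k(warm_c) / c_to_k(cool_c)  # close to 1, not 2

# Differences do carry over: 10 degrees C apart is also 10 K apart.
diff_k = c_to_k(warm_c) - c_to_k(cool_c)

print(ratio_c, ratio_f, ratio_k, diff_k)
```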


Topic A2 — Tables & Graphs

In this topic we divide quantitative data into classes. The class boundaries can be arbitrary, or based on some existing classification.

Absolute frequency

It’s the absolute number of times this category appears in your data.

Relative frequency

It’s the absolute frequency divided by the total number of observations.

Cumulative frequency

When categories are ordered, you can “stack” the absolute frequencies: each class gets its own count plus the counts of all classes before it (the last class reaches the total number of observations).

Cumulative relative frequency

Same as above but with relative frequencies (total 100%).
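The four frequencies can be computed in a few lines. This is a sketch on made-up pint counts for ten people, using the classes "0 to 2", "more than 2 to 4" and "more than 4 to 6"; the data and class limits are assumptions for illustration.

```python
from itertools import accumulate

pints = [0, 1, 1, 2, 3, 3, 3, 4, 5, 6]  # invented data, one value per person
upper_limits = [2, 4, 6]                # upper limit of each class, in order

# Absolute frequency: count each observation in the first class it fits in.
absolute = [0] * len(upper_limits)
for x in pints:
    for i, upper in enumerate(upper_limits):
        if x <= upper:
            absolute[i] += 1
            break

n = len(pints)

# Relative frequency: absolute frequency divided by the number of observations.
relative = [count / n for count in absolute]

# Cumulative frequency: running total of the absolute frequencies.
cumulative = list(accumulate(absolute))

# Cumulative relative frequency: running total as a share of n (ends at 1.0).
cumulative_relative = [c / n for c in cumulative]

print(absolute, relative, cumulative, cumulative_relative)
```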

Histogram

Visual representation of a frequency distribution. Two main characteristics:

  • Heights: frequency
  • Width: class width
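As a rough illustration, a histogram can be sketched in text, turned on its side so that bar length plays the role of height. The class labels and frequencies below are assumed example values, and the classes all have the same width.

```python
# Assumed example frequency table (equal class widths).
class_labels = ["0 to 2", "more than 2 to 4", "more than 4 to 6"]
absolute = [4, 4, 2]

# One text bar per class: the number of '#' characters is the frequency.
bars = [f"{label:<17}{'#' * count}" for label, count in zip(class_labels, absolute)]

for bar in bars:
    print(bar)
```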

Polygon

Same information as the histogram, but drawn by linking the midpoints of the tops of the bars with straight lines.

Ogive

Plots the cumulative relative frequency. The x-axis represents the upper limit of each class.
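The points of an ogive are just pairs of (upper class limit, cumulative relative frequency). A small sketch, again on an assumed example frequency table:

```python
from itertools import accumulate

# Assumed example classes and absolute frequencies.
upper_limits = [2, 4, 6]
absolute = [4, 4, 2]

n = sum(absolute)
cumulative_relative = [c / n for c in accumulate(absolute)]

# Each ogive point: x = upper limit of the class, y = cumulative relative frequency.
ogive_points = list(zip(upper_limits, cumulative_relative))

print(ogive_points)
```

Joining these points with straight lines gives the ogive; the last point always sits at 1.0 (100%).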

Scatter plots

It’s a plot of two variables against each other, one point per observation, which shows how they relate to each other.

When hours of the day vary, does Guinness consumption vary? If you plot time as the x-axis and pints on the y-axis, can we draw the scatter plot?
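Here is a very crude text version of such a scatter plot, assuming one invented observation per hour between 17:00 and 22:00. Each column is an hour (x-axis), each row is a number of pints (y-axis), and a star marks an observation.

```python
# Invented observations: hour of the day and pints drunk during that hour.
hours = [17, 18, 19, 20, 21, 22]
pints = [0, 1, 1, 2, 3, 3]

# Each observation is one (x, y) point.
points = list(zip(hours, pints))

# Draw from the highest pint level down to 0; one column per hour.
rows = []
for level in range(max(pints), -1, -1):
    row = "".join("*" if p == level else "." for p in pints)
    rows.append(f"{level} {row}")

for line in rows:
    print(line)
```

With these made-up numbers the stars climb from bottom left to top right, which is what a positive relationship looks like on a scatter plot.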