
Lecture 4: Statistics: a review 

How to summarize a distribution using a single number? 

  1. Central tendency: median, mean, mode

  2. Measure of variability: measure of dispersion (e.g. range, variance, standard deviation)

  3. Measure of skewness

  4. Measure of kurtosis: measure of how peaked or heavy-tailed the distribution is
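
The four kinds of summary above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name summarize and the sample values are made up, and skewness and kurtosis are computed with the population (divide by n) moment formulas, which is only one of several conventions.

```python
import statistics

def summarize(values):
    """Return single-number summaries of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Standardized third and fourth moments (population formulas, an assumption).
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": sd,
        "skewness": skewness,
        "excess_kurtosis": kurtosis - 3,  # 0 for a normal distribution
    }

sample = [2, 3, 3, 4, 5, 6, 12]  # made-up values with a long right tail
summary = summarize(sample)
print(summary)
```

For this sample the mean (5) sits above the median (4) and the skewness is positive, which is what a long right tail looks like in the numbers.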

 

Why use median rather than the mean?

  • Mean: every value contributes to the mean, so if a data set contains many outliers, the mean is pulled toward them and fails to represent the data set as a whole.

  • Median: the median is the middle value, so it is barely affected by outliers or skewness. If there are many outliers or the data are skewed, the median should be used.
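
A small made-up example shows the difference. The values below are hypothetical (think of incomes in thousands); adding a single extreme case moves the mean a long way but barely moves the median.

```python
import statistics

incomes = [30, 32, 35, 38, 40]      # made-up values
with_outlier = incomes + [1000]     # add one extreme case

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(incomes)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

Here the mean jumps from 35 to roughly 196, while the median only moves from 35 to 36.5.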

 

Regression Analysis

Regression analysis relates a dependent variable (the variable being predicted) to one or more explanatory (independent) variables.

  • Relations among variables

    • Negative, positive relations, non-linear relations

  • How many cases?

    • What is the minimum number of cases? As a rule of thumb, at least 30 cases are needed before doing statistical analysis

  • Too many variables?

    • Too many variables make the statistics suspect: the model becomes difficult to interpret, and patterns among the variables are hard to find

  • Residuals should be ___?

    • Residuals are either heteroscedastic or homoscedastic; they should be homoscedastic

    • Homoscedasticity: evenly dispersed residuals (the spread of the residuals stays roughly constant)
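
One informal way to look at whether residuals are evenly dispersed is to compare their variance in the lower and upper halves of an explanatory variable. The sketch below is only an illustration with made-up residuals: the helper variance_ratio is hypothetical, and it is not a formal statistical test. A ratio near 1 suggests an even spread; a ratio far from 1 suggests the spread changes with x.

```python
import statistics

def variance_ratio(x, residuals):
    """Ratio of residual variance in the upper half of x to the lower half."""
    pairs = sorted(zip(x, residuals))   # order the residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)

x = [1, 2, 3, 4, 5, 6, 7, 8]
even_residuals = [1, -1, 1, -1, 1, -1, 1, -1]          # constant spread
fanning_residuals = [0.5, -0.5, 1, -1, 2, -2, 4, -4]   # spread grows with x

ratio_even = variance_ratio(x, even_residuals)
ratio_fanning = variance_ratio(x, fanning_residuals)
print(ratio_even, ratio_fanning)
```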

 

Regression Modeling:

Ordinary Least Squares (OLS): OLS is a global model of regression analysis. It provides a single regression equation to represent the process across the whole study area. The method finds the line that best fits the observations by choosing the line with the lowest sum of squared residuals.
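
For a single explanatory variable, the line with the lowest sum of squared residuals has a closed-form solution. The sketch below assumes that one-variable case; the function names fit_ols and sum_squared_residuals and the data are made up for illustration.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]                  # made-up observations
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_ols(x, y)

# Any other line, e.g. one with a slightly different slope, fits worse.
ssr_best = sum_squared_residuals(x, y, intercept, slope)
ssr_other = sum_squared_residuals(x, y, intercept, slope + 0.1)
print(intercept, slope, ssr_best, ssr_other)
```

The single pair (intercept, slope) is applied to every observation, which is what makes the model global.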

 

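Geographically weighted regression (GWR): GWR is a local model of regression analysis. It constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of the features that fall near that target feature. This lets the relationship between the variables vary across the study area, so GWR can be used both to predict values and to understand where a relationship is stronger or weaker.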
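
The idea of one equation per feature can be sketched as a weighted least-squares fit around each location. This is a simplified illustration under stated assumptions, not how any particular GIS package implements GWR: it uses one explanatory variable, a Gaussian distance kernel, a fixed bandwidth of 2.0, and made-up features stored as (east, north, x, y). The name local_fit is hypothetical.

```python
import math

def local_fit(target, features, bandwidth):
    """Weighted least-squares line for one target location.

    Each feature is (east, north, x, y). Nearby features get larger
    weights through a Gaussian kernel (an assumed kernel choice).
    """
    weights = []
    for east, north, _, _ in features:
        d = math.hypot(east - target[0], north - target[1])
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    total = sum(weights)
    mean_x = sum(w * f[2] for w, f in zip(weights, features)) / total
    mean_y = sum(w * f[3] for w, f in zip(weights, features)) / total
    sxy = sum(w * (f[2] - mean_x) * (f[3] - mean_y)
              for w, f in zip(weights, features))
    sxx = sum(w * (f[2] - mean_x) ** 2 for w, f in zip(weights, features))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up features: in the west y rises with x, in the east it falls.
features = [
    (0, 0, 1, 2), (0, 1, 2, 4), (1, 0, 3, 6), (1, 1, 4, 8),
    (10, 0, 1, 8), (10, 1, 2, 6), (11, 0, 3, 4), (11, 1, 4, 2),
]

# One (intercept, slope) pair per feature.
local_models = [local_fit((e, n), features, bandwidth=2.0)
                for e, n, _, _ in features]
west_slope = local_models[0][1]
east_slope = local_models[-1][1]
print(west_slope, east_slope)
```

With these made-up data a single global line has a slope of 0, because the western and eastern trends cancel out, while the local fits recover a positive slope in the west and a negative slope in the east.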
