Statistics

atecon edited this page Jan 20, 2024 · 10 revisions

OLS regression

Estimation

The following example shows how to run a simple OLS regression and how to store post-estimation information.

open abdata.csv --quiet

ols ys const n w   #OPTIONAL: --robust

matrix coeff = $coeff  # point estimates
matrix stderr = $stderr  # std. error
series uhat = $uhat  # residuals
series yhat = $yhat  # fitted values
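
The stored accessors can be combined for further calculations. The following is a minimal sketch, not part of the original example, that rebuilds the t-ratios and two-sided p-values from the stored estimates. The names tstat and pv are chosen here for illustration only.

```hansl
open abdata.csv --quiet
ols ys const n w --quiet

# t-ratios: element-wise division of estimates by standard errors
matrix tstat = $coeff ./ $stderr

# two-sided p-values from the t distribution with $df degrees of freedom
matrix pv = 2 * pvalue(t, $df, abs(tstat))

print tstat pv
```

These values should agree with the t-ratio and p-value columns of the regression output, as long as the model was estimated without the --robust option or with it in both places.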

Specification tests

The modtest command provides various specification tests that can be run after a model has been estimated. A further command, reset, runs Ramsey's RESET test. Here are some examples:

open abdata.csv --quiet
ols ys const n w

modtest --normality --quiet
modtest --white --quiet
reset --squares-only --quiet
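
Test results can also be used programmatically through the $test and $pvalue accessors, which refer to the most recent test command. The sketch below stores the result of White's test and re-estimates the model with robust standard errors if the null of homoskedasticity is rejected. The 5 percent threshold and the names white_stat and white_pv are assumptions made for this illustration.

```hansl
open abdata.csv --quiet
ols ys const n w --quiet

modtest --white --quiet
scalar white_stat = $test     # test statistic of the last test
scalar white_pv = $pvalue     # its p-value

printf "White test: stat = %g, p-value = %.4f\n", white_stat, white_pv

# assumed decision rule: switch to robust standard errors at the 5 pct. level
if white_pv < 0.05
    ols ys const n w --robust
endif
```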

Hypothesis testing

Gretl allows you to test hypotheses in a simple manner.

omit variables

First, you can call the omit command to test zero restrictions on coefficients. Here is a simple example that tests the removal of two variables by means of an F-test:

list X = n w
ols ys const X

# Test the restriction but do not re-estimate the model
omit X --test-only

# Test the restriction and re-estimate the model
omit X

The output is:

Test on Model 3:

  Null hypothesis: the regression parameters are zero for the variables
    n, w
  Test statistic: F(2, 1028) = 4.43312, p-value 0.0121052

Set of linear restrictions

The restrict block provides a powerful apparatus for testing a set of (non-)linear restrictions. Here is an example that uses the --quiet option to suppress detailed output. You may also try the --bootstrap option:

restrict --quiet # --bootstrap
    b[w] = 0.005
    b[n] - b[w] = 0
end restrict

This returns:

Restriction set
 1: b[w] = 0.005
 2: b[n] - b[w] = 0

Test statistic: F(2, 1028) = 0.224807, with p-value = 0.798709
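
To see what the restrict block computes in the linear case, the two restrictions can be written as R*b = q and tested with the usual Wald-type F statistic. The following is a sketch under the assumption that the model is ols ys const n w with the classical (non-robust) covariance matrix, so that the coefficient order is const, n, w. The names R, q, d, Fstat and pv are introduced here for illustration.

```hansl
open abdata.csv --quiet
ols ys const n w --quiet

matrix b = $coeff    # 3 x 1: const, n, w
matrix V = $vcv      # estimated covariance matrix of b

# Row 1: b[w] = 0.005;  Row 2: b[n] - b[w] = 0
matrix R = {0, 0, 1; 0, 1, -1}
matrix q = {0.005; 0}

matrix d = R*b - q
scalar J = rows(R)   # number of restrictions
scalar Fstat = (d' * inv(R*V*R') * d) / J
scalar pv = pvalue(F, J, $df, Fstat)

printf "F(%d, %d) = %g, p-value = %g\n", J, $df, Fstat, pv
```

Under these assumptions the result should be comparable to the F statistic reported by the restrict block above.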

Non-parametric test for differences between variables

This example illustrates how to run non-parametric tests for differences between variables. It uses simulated series.

##################
## Non-parametric difference tests
##################
set seed 1234 	# only to ensure replicability
nulldata 100 	# cross-sectional dataset

# Create some random variables
series y = normal(0, 2)  # expected value 0
series x = normal(10, 2)  # expected value 10
list L = y x			# define a list of series which can be handy

# Stats and plot
summary L --simple
boxplot L --output=display
freq y --normal --plot=display

# Non-parametric difference tests
help difftest			# see the help for information

difftest y x --sign   # Sign test -- less powerful
printf "\nP-value of the Sign-test = %.2f (test-stat = %g)\n", $pvalue, $test

difftest y x --rank-sum   # Wilcoxon rank-sum test (aka Mann-Whitney U test)
printf "\nP-value of the Wilcoxon rank-sum test = %.2f (test-stat = %g)\n", $pvalue, $test

difftest y x --signed-rank   # Wilcoxon signed-rank test

Parametric regression-based test for differences between categories

This example loads the well-known cross-sectional MROZ dataset. Using an OLS regression with a level dummy, we test whether men earn higher wages in large cities than in small cities.

open mroz87.gdt

boxplot HW CIT --factorized --output=display \
  { set title "Wages of men in small and large cities" font ',15'; }

# Regression for explaining Husband's wage by CIT (0: lives in small city, 1: lives in large city)
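ols HW const CIT --robust   # standard errors robust to possible heteroskedasticity
# $pvalue is only set by explicit test commands, so compute the p-value of CIT from its t-ratio
scalar pv = 2 * pvalue(t, $df, abs($coeff(CIT) / $stderr(CIT)))
printf "\nThe null hypothesis that wages in large cities are equal  \n\
  to wages in small cities can be rejected at the %.2f pct. \n\
  significance level\n", 100 * pv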

# Run a restriction by hand
help restrict

# Test the null that hourly wages are on average $3 higher in large cities
restrict --bootstrap
    b[CIT] = 3
end restrict