Statistics

Descriptive statistics

Basic example

Among gretl's many statistical commands and functions, the summary command is the simplest way to compute descriptive statistics for a list of series. The following example uses the --simple option to restrict the output to the most basic statistics:

open abdata.gdt --quiet
list Y = IND YEAR n
summary Y --simple

# Store the results of the summary call above as a matrix
matrix stats = $result
print stats

The output is:

                 Mean     Median       S.D.        Min        Max
IND             5.123      5.000      2.678      1.000      9.000
YEAR             1980       1980      2.583       1976       1984
n               1.056     0.8272      1.342     -2.263      4.687

stats (3 x 5)

             Mean       Median         S.D.          Min          Max 
 IND       5.1232       5.0000       2.6781       1.0000       9.0000 
YEAR       1980.0       1980.0       2.5830       1976.0       1984.0 
   n       1.0560      0.82724       1.3415      -2.2634       4.6873
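
Once the statistics are stored in a matrix, individual entries can be picked out by row and column index. The following is a minimal sketch, assuming the 3 x 5 layout printed above (rows follow the order of the list, columns are Mean, Median, S.D., Min, Max). The scalar names n_mean and n_sd are illustrative only:

```hansl
open abdata.gdt --quiet
list Y = IND YEAR n
summary Y --simple
matrix stats = $result

# Row 3 is series n; columns 1 and 3 hold the mean and the standard deviation
scalar n_mean = stats[3,1]
scalar n_sd = stats[3,3]
printf "n: mean = %g, S.D. = %g\n", n_mean, n_sd

# Cross-check against the built-in functions for a single series
printf "check: mean = %g, S.D. = %g\n", mean(n), sd(n)
```

Both lines should print the same values, since mean() and sd() compute the same sample statistics for a single series.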

Grouped statistics

With the --by=Series option you can compute the statistics separately for each value of a discrete series. The following example prints basic statistics for the series n and w for each value of the series IND (industry ID):

set verbose off
open abdata.gdt --quiet
list Y = n w
summary Y --by=IND --simple

The output for the first three industries is:

IND = 1 (n = 122):

                 Mean     Median       S.D.        Min        Max
n               1.234      1.095      1.172    -0.5942      4.099
w               3.186      3.183     0.1511      2.757      3.581

IND = 2 (n = 88):

                 Mean     Median       S.D.        Min        Max
n               1.039     0.9792      1.387     -2.104      3.223
w               3.410      3.409     0.1363      2.870      3.812

IND = 3 (n = 89):

                 Mean     Median       S.D.        Min        Max
n              0.7006     0.4324      1.199     -1.726      3.030
w               3.287      3.331     0.1640      2.910      3.614
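
If only one category is of interest, a similar table can be obtained by restricting the sample first. This is a sketch of that alternative, not a replacement for --by; it assumes that restricting the sample on IND is acceptable for your dataset:

```hansl
open abdata.gdt --quiet

# Keep only the observations of industry 1
smpl IND == 1 --restrict
summary n w --simple

# Restore the full sample before continuing
smpl full
```

The statistics printed here should correspond to the "IND = 1" block of the grouped output.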

Aggregation

The aggregate() function aggregates data by groups, much like a pivot table, using an aggregation function of your choice. The following example computes the mean values of the series n and w for each unique combination of the discrete series IND and YEAR (only the initial rows are shown):

open abdata.gdt --quiet
list Y = n w
list groupby = IND YEAR

matrix mean_values = aggregate(Y, groupby, "mean")
printf "\n%12.2f\n", mean_values

The output is:

         IND        YEAR       count           n           w
        1.00     1976.00        8.00        0.89        3.12
        1.00     1977.00       16.00        1.34        3.11
        1.00     1978.00       17.00        1.37        3.09
          .
          .
          .
        2.00     1976.00        8.00        1.51        3.58
        2.00     1977.00       12.00        1.14        3.50
        2.00     1978.00       12.00        1.13        3.44
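
The same function also works with a single series and a single grouping variable. A minimal sketch, assuming that "median" is accepted as a function name in the same way as "mean" above; the matrix name med_by_ind is illustrative:

```hansl
open abdata.gdt --quiet

# Median of n for each value of IND
matrix med_by_ind = aggregate(n, IND, "median")
print med_by_ind
```

The last column of the result can be compared with the medians reported by summary with the --by=IND option.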

OLS regression

Estimation

The following example shows how to run a simple OLS regression and how to store post-estimation information.

open abdata.gdt --quiet

ols ys const n w   #OPTIONAL: --robust

matrix coeff = $coeff  # point estimates
matrix stderr = $stderr  # std. error
series uhat = $uhat  # residuals
series yhat = $yhat  # fitted values
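
The stored accessors can be combined into further post-estimation quantities. The following sketch builds t-ratios and 95% confidence intervals from $coeff and $stderr; the matrix names are illustrative, and the intervals assume the usual t distribution with $df degrees of freedom:

```hansl
open abdata.gdt --quiet
ols ys const n w --quiet

matrix b = $coeff
matrix se = $stderr

# t-ratios: element-wise division of estimates by standard errors
matrix tratio = b ./ se

# 95% confidence bounds, using the right-tail 2.5% critical value
scalar tcrit = critical(t, $df, 0.025)
matrix ci = (b - tcrit * se) ~ (b + tcrit * se)

# Columns: estimate, std. error, t-ratio, lower bound, upper bound
matrix tab = b ~ se ~ tratio ~ ci
print tab
printf "R-squared = %g, observations = %d\n", $rsq, $T
```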

Specification tests

The modtest command provides various specification tests that can be run after a model has been estimated. A further command, reset, runs Ramsey's RESET test. Here are some examples:

open abdata.gdt --quiet
ols ys const n w

modtest --normality --quiet
modtest --white --quiet
reset --squares-only --quiet
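
After each test, the test statistic and its p-value are available through the $test and $pvalue accessors, so they can be stored or reported. A short sketch, with illustrative scalar names:

```hansl
open abdata.gdt --quiet
ols ys const n w --quiet

# White's test for heteroskedasticity
modtest --white --quiet
scalar white_stat = $test
scalar white_pv = $pvalue
printf "White test: statistic = %g, p-value = %g\n", white_stat, white_pv
```

Note that $test and $pvalue always refer to the most recent test, so copy them before running the next one.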

Hypothesis testing

Gretl allows you to test hypotheses in a simple manner.

`omit` variables

First, you can call the omit command to test zero restrictions on coefficients. Here is a simple example that tests the joint removal of two variables by means of an F-test:

list X = n w
ols ys const X

# Test the restriction but do not re-estimate the model
omit X --test-only

# Test the restriction and re-estimate the model
omit X

The output is:

Test on Model 3:

  Null hypothesis: the regression parameters are zero for the variables
    n, w
  Test statistic: F(2, 1028) = 4.43312, p-value 0.0121052
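
The reported p-value is the right-tail probability of the F(2, 1028) distribution at the test statistic. As a sketch, it can be recomputed by hand from the numbers printed above; the scalar names are illustrative:

```hansl
# Values taken from the omit output above
scalar F_obs = 4.43312

# Right-tail probability of F(2, 1028) at the observed statistic
scalar p = pvalue(F, 2, 1028, F_obs)
printf "p-value = %g\n", p

# Critical value for a test at the 5 percent level
scalar F_crit = critical(F, 2, 1028, 0.05)
printf "5%% critical value = %g\n", F_crit
```

The first line of output should reproduce the p-value shown in the omit output, up to rounding.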

Set of linear restrictions

The restrict ... end restrict block provides a powerful apparatus for testing sets of (non-)linear restrictions. Here is an example that uses the --quiet option to suppress detailed output. You may also try the --bootstrap option:

restrict --quiet # --bootstrap
    b[w] = 0.005
    b[n] - b[w] = 0
end restrict

This returns:

Restriction set
 1: b[w] = 0.005
 2: b[n] - b[w] = 0

Test statistic: F(2, 1028) = 0.224807, with p-value = 0.798709
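
As with other tests, the result of a restrict block can be retrieved afterwards. The following sketch tests a single, purely illustrative restriction (that the two slope coefficients sum to zero) and prints the stored statistic:

```hansl
open abdata.gdt --quiet
ols ys const n w --quiet

# Hypothetical restriction, chosen only to illustrate the syntax
restrict --quiet
    b[n] + b[w] = 0
end restrict

printf "test statistic = %g, p-value = %g\n", $test, $pvalue
```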

Non-parametric test for differences between variables

This example illustrates how to run non-parametric tests for differences between two variables. The example uses simulated series.

##################
## Non-parametric difference tests
##################
set seed 1234   # only to ensure replicability
nulldata 100    # cross-sectional dataset

# Create some random variables
series y = normal(0, 2)   # expected value 0
series x = normal(10, 2)  # expected value 10
list L = y x              # a list of series, handy for the commands below

# Stats and plot
summary L --simple
boxplot L --output=display
freq y --normal --plot=display

# Non-parametric difference tests
help difftest			# see the help for information

difftest y x --sign   # Sign test -- less powerful
printf "\nP-value of the Sign-test = %.2f (test-stat = %g)\n", $pvalue, $test

difftest y x --rank-sum   # Wilcoxon rank-sum test (aka Mann-Whitney U test)
printf "\nP-value of the Wilcoxon rank-sum test = %.2f (test-stat = %g)\n", $pvalue, $test

difftest y x --signed-rank   # Wilcoxon signed-rank test
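
To see what the sign test counts, the statistic can be sketched by hand: it is the number of observations for which one series exceeds the other, compared with a binomial distribution with success probability 0.5. The code below is an illustration under that assumption, not the internal implementation of difftest; it also assumes that cdf() takes the binomial parameters in the order (probability, trials):

```hansl
set seed 1234
nulldata 100
series y = normal(0, 2)
series x = normal(10, 2)

# Pairwise differences; ties are dropped from the effective sample
series d = y - x
scalar n_pos = sum(d > 0)
scalar n_eff = sum(d != 0)

# Probability of n_pos or fewer positive signs under the null
scalar p_low = cdf(B, 0.5, n_eff, n_pos)
printf "positive signs: %d of %d, P(W <= w) = %g\n", n_pos, n_eff, p_low
```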

Parametric regression-based test for differences between categories

This example loads the well-known cross-sectional Mroz dataset. Using an OLS regression with an intercept (level) dummy, we test whether men earn higher wages in large cities than in small cities.

open mroz87.gdt

boxplot HW CIT --factorized --output=display \
  { set title "Wages of men in small and large cities" font ',15'; }

# Regress the husband's wage on CIT (0: lives in a small city, 1: lives in a large city)
ols HW const CIT --robust   # standard errors robust to possible heteroskedasticity
# Two-sided p-value of the t-test on CIT ($pvalue is only set by explicit test commands)
scalar pv = 2 * pvalue(t, $df, abs($coeff(CIT) / $stderr(CIT)))
printf "\nThe null hypothesis that wages in large cities are equal  \n\
  to wages in small cities can be rejected at the %.2f pct. \n\
  significance level\n", 100 * pv

# Run a restriction by hand
help restrict

# Test the null that hourly wages are on average $3 higher in large cities
restrict --bootstrap
    b[CIT] = 3
end restrict
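
With a single dummy regressor, the OLS slope equals the difference between the two group means, which is why this regression serves as a test for a difference between categories. A minimal sketch that checks this on the same data; the scalar names are illustrative:

```hansl
open mroz87.gdt --quiet
ols HW const CIT --quiet
scalar b_cit = $coeff(CIT)   # store the slope before changing the sample

# Mean wage in large cities
smpl CIT == 1 --restrict
scalar m_large = mean(HW)
smpl full

# Mean wage in small cities
smpl CIT == 0 --restrict
scalar m_small = mean(HW)
smpl full

printf "difference in means = %g, OLS slope = %g\n", m_large - m_small, b_cit
```

The two printed numbers should coincide up to rounding.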
