Data handling

atecon edited this page Jan 19, 2024 · 5 revisions

This section shows some examples of how to handle datasets.

Create an artificial dataset

Cross-sectional dataset

set verbose off  # avoid detailed printouts
clear   # clear memory

nulldata 3  # three observations (rows)

# Create a normally-distributed random variable
scalar mean = 4
scalar std_dev = 0.5
series y = normal(mean, std_dev)

series y_sq = y^2
series log_y = log(y)
series exp_y = exp(y)
series x = {1, 2, 3}'
series z = y - x

print y y_sq log_y exp_y x z --byobs

This produces the following output:

             y         y_sq        log_y        exp_y            x

1     3.035885      9.21660     1.110503      20.8194            1
2     4.053212     16.42853     1.399510      57.5821            2
3     4.993476     24.93480     1.608132     147.4481            3

             z

1     2.035885
2     2.053212
3     1.993476
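
Because normal() draws pseudo-random numbers, the values of y change from run to run. A minimal sketch of how to make the draws reproducible, assuming you are free to fix the seed of the random number generator (the seed value 1234 is an arbitrary choice for illustration):

set verbose off
clear
set seed 1234   # fix the RNG seed; any integer would do
nulldata 3
series y = normal(4, 0.5)   # same mean and standard deviation as above
print y --byobs

Running this script twice should print the same three values of y both times.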

Create binary dummies

Let's open a sample dataset shipped with Gretl and create a binary dummy that takes the value 1 if the series YEAR equals either 1977 or 1980, and 0 otherwise:

open abdata.gdt --quiet
series DUM = (YEAR == 1977 || YEAR == 1980)
print YEAR DUM -o --range=1:10    # print the first ten entries 

The output is:

            YEAR          DUM

1:1         1976            0
1:2         1977            1
1:3         1978            0
1:4         1979            0
1:5         1980            1
1:6         1981            0
1:7         1982            0
1:8         1983            0
1:9         1984            0
2:1         1976            0
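
The same logical-expression approach extends to other conditions. As a sketch, the following flags every observation whose YEAR lies between 1977 and 1980 inclusive; the series name DUM_RANGE is made up for this illustration:

open abdata.gdt --quiet
# 1 if 1977 <= YEAR <= 1980, otherwise 0
series DUM_RANGE = (YEAR >= 1977 && YEAR <= 1980)
print YEAR DUM_RANGE -o --range=1:10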

Metadata

Add metadata to series

A series object in Gretl can carry metadata such as a descriptive label. You can also set the name that should appear when the series is plotted. Here is an example:

nulldata 3
series y = normal()

# Add a series description
setinfo y --description="Some random number"

# Instead of 'y' showing up in a graph, show another description
setinfo y --graph-name="Cool variable"

boxplot y --output=display   # See the output
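
To check that the description was stored, one option is the labels command, which prints the descriptive labels of the listed series. A short sketch, reusing the series from above:

nulldata 3
series y = normal()
setinfo y --description="Some random number"
labels y   # print the descriptive label attached to y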

Replace values

Series

Suppose you have a weirdly valued dataset such as:

set verbose off
nulldata 5
series weird_values = {5, 6, 10, 20, NA}'
print weird_values --byobs

Using the replace() function, we want to replace the value 5 by 0, 6 by 1, 10 by 2, 20 by 3 and missing values (NA) by -1:

# Show the built-in help for replace(), then define the mapping
help replace
matrix find = {5, 6, 10, 20, NA}
matrix replace_by = {0, 1, 2, 3, -1}

# Create new series y with replaced values 
series y = replace(weird_values, find, replace_by)

print weird_values y --byobs 

The result is:

  weird_values            y

1            5            0
2            6            1
3           10            2
4           20            3
5                        -1

More complicated example

Suppose you have a dataset with integer values ranging from 0 to 20, and you want to replace the numbers 0-5 by 1, 6-10 by 2 and 11-20 by 3. This can be done with repeated calls to replace():

nulldata 40    # some empty dataset

# Discrete random numbers between 0 and 20
series old = randgen(i, 0, 20)
# print old --byobs

series new = NA  # Initialize an empty series

# Replace 0-5 by 1
matrix find = seq(0, 5)
scalar subst = 1
series new = replace(old, find, subst)

# Replace 6-10 by 2
matrix find = seq(6, 10)
scalar subst = 2
series new = replace(new, find, subst)

# Replace 11-20 by 3
matrix find = seq(11, 20)
scalar subst = 3
series new = replace(new, find, subst)

print old new --byobs

This prints (first ten observations shown; the values are random and differ between runs):

            old          new

 1           15            3
 2           13            3
 3           10            2
 4           20            3
 5            6            2
 6            5            1
 7            5            1
 8            8            2
 9           10            2
10           14            3
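
An alternative sketch for the same recoding uses nested conditional (ternary) assignment instead of repeated replace() calls. It assumes that old contains no missing values and only integers between 0 and 20; the series name new_alt is made up for this illustration:

nulldata 40
series old = randgen(i, 0, 20)

# 0-5 -> 1, 6-10 -> 2, everything above 10 -> 3
series new_alt = (old <= 5) ? 1 : ((old <= 10) ? 2 : 3)

print old new_alt --byobs

This is more compact when the groups are contiguous ranges, while replace() is the more natural choice for arbitrary value-to-value mappings.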

String valued series

In Gretl you can also create a string-valued series.
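
A minimal sketch of one way to do this, using the stringify() function to attach an array of strings to an integer-coded series. The series name grade and the three labels are invented for this illustration; stringify() expects the series to hold integers from 1 up to the number of strings supplied:

nulldata 4
series grade = {1, 2, 3, 1}'   # integer codes 1..3

# Strings for code 1, 2 and 3, in that order
strings S = defarray("low", "mid", "high")

# stringify() returns 0 on success
scalar err = stringify(grade, S)

print grade --byobs

After a successful call, printing grade should show the attached strings rather than the numeric codes.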
