Below we have listed some frequently overlooked commands that we enjoy using. They will make your life a lot easier if you know about them. We provide a quick summary of what the commands do and where they can be used.
Where appropriate, there is a page dedicated to a specific command that provides more examples of our common uses, tips and tricks, and potential concerns to be aware of when using that command. For full documentation, simply type help [commandname] in Stata to read about the available options and usage examples.
fillin
This is a simple-to-use yet powerful command. It creates all pairwise combinations of the values of the variables you specify, adding an observation for each combination that is missing from the data. In a panel, for example, this gives you an observation for every person in every survey round.
When cleaning datasets, you will frequently have an unbalanced panel, i.e. some people are missing observations for some time periods. For example, imagine a dataset that is supposed to have one observation per person for each survey round, such as a baseline and two endlines. However, as is common, some people were not found in the endline surveys, and thus there is no observation for them at that endline. In the example data below, person 1 is missing an observation at endline 2.
ID | Survey_Round | Gender |
---|---|---|
1 | baseline | Female |
1 | endline1 | Female |
2 | baseline | Male |
2 | endline1 | Male |
2 | endline2 | Male |
We can use fillin ID Survey_Round to get the following:
ID | Survey_Round | Gender | _fillin |
---|---|---|---|
1 | baseline | Female | 0 |
1 | endline1 | Female | 0 |
1 | endline2 | . | 1 |
2 | baseline | Male | 0 |
2 | endline1 | Male | 0 |
2 | endline2 | Male | 0 |
A couple of things to note:
- This command automatically creates an indicator variable called _fillin that flags the observations created by the command.
- Variables not specified in the fillin command are set to missing in the new observations, so after you run fillin you will need to go back and replace values as you see fit. For example, for typically constant variables such as gender, you could replace the new missing value with the value from the observation above it:
sort ID Survey_Round
replace Gender = Gender[_n-1] if ID == ID[_n-1] & _fillin == 1
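The whole fillin workflow can be sketched end to end as follows. This is a minimal, self-contained illustration that rebuilds the example table with input; storing Gender as a string is an assumption made for simplicity. It also uses a slightly different fill rule: copying from the previous row does nothing when the created row is the first one for a person (for instance, a missing baseline), so this sketch sorts the original rows first within each person and copies from the first of them.

```stata
* Rebuild the example data (Gender stored as a string: an assumption)
clear
input ID str8 Survey_Round str6 Gender
1 "baseline" "Female"
1 "endline1" "Female"
2 "baseline" "Male"
2 "endline1" "Male"
2 "endline2" "Male"
end

* Add the missing ID x Survey_Round combinations; creates _fillin
fillin ID Survey_Round

* Within each person, put original rows (_fillin == 0) first, then copy
* the first original value of Gender into the rows fillin created
bysort ID (_fillin Survey_Round): replace Gender = Gender[1] if _fillin == 1

* Restore a readable order and inspect the result
sort ID Survey_Round
list, sepby(ID)
```

Only use a rule like this for variables that are truly constant within a person; time-varying variables should stay missing in the created rows.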
labeldup
labeldup is a user-written command that finds value labels with duplicate contents. For example, suppose the variable q1 has a value label named q1_label with 0 "No" 1 "Yes", and the variable q2 has a value label with the same contents (the SurveyCTO standard). After running labeldup, select, q1 and q2 share a single value label that describes both variables. This is an easy way to cut down on duplicate information in value labels.
labelrename
This user-written command allows you to rename value labels using syntax similar to that of the rename command. Stata's built-in label commands do not let you rename a value label directly; this command adds that functionality. One difference from rename is that wildcards are not allowed.
levelsof
This command will provide you with a list of all of the unique values of a variable. This comes in handy all of the time when cleaning. A very helpful option that comes with this command is local(), which stores the list of values in a local macro. This makes looping through all of the values of a variable possible and easy.
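As a short sketch of that looping pattern, the example below uses the auto dataset that ships with Stata; the local name levels is arbitrary and chosen only for illustration.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Store the distinct nonmissing values of rep78 in the local "levels"
levelsof rep78, local(levels)

* Loop over each value and report the mean price for that group
foreach l of local levels {
    quietly summarize price if rep78 == `l'
    display "rep78 = `l': mean price = " %9.2f r(mean)
}
```

Note that levelsof skips missing values unless you add the missing option, so the loop above does not visit cars with a missing rep78.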
lookfor
This command searches across all variable names and variable labels. It can search for more than one string at a time, or for a phrase if you enclose it in quotes, as in "[string]". Since lookfor searches across both variable names and variable labels at the same time, it can return a different set of results than ds.
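A few illustrative calls, again using the auto dataset that ships with Stata. The search strings are chosen only for demonstration; the first one shows a variable being found through its label rather than its name.

```stata
sysuse auto, clear

* "mileage" appears in no variable name, but the label of mpg is
* "Mileage (mpg)", so lookfor finds mpg through its label
lookfor mileage
display "`r(varlist)'"

* Several strings at once: a variable matching any of them is listed
lookfor price weight

* A quoted phrase is treated as a single search string
lookfor "gear ratio"
```

lookfor also leaves the matching variable names in r(varlist), which is convenient when you want to pass the result to another command.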
missings
This is not a built-in Stata command, so you will need to type ssc install missings before you can use it or read its help file.
This group of commands offers several ways to investigate the missing values in variables. Missing values are often forgotten or ignored, but that is a big mistake: they can greatly complicate cleaning and analysis. You should be aware of the missing values and variables in your dataset. The subcommands include ways to report, list, tag, and drop missing values and variables.
A very helpful command within this group is missings dropvars, which allows you to eliminate variables that are missing on all observations. This is much easier than looping through variables and observations to assert that they are missing before dropping them.
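A minimal sketch of the report-then-drop pattern is shown below. The toy dataset and its variable names (id, x, y, z) are made up for illustration, and the example assumes missings has already been installed from SSC.

```stata
* Requires the user-written package: ssc install missings

* Toy data: y is missing on every observation
clear
input id x y z
1 5 . .
2 . . 3
3 7 . .
end

* Tabulate how many values are missing in each variable
missings report

* Drop variables that are missing on all observations (here, y).
* force acknowledges that the dataset in memory is being changed.
missings dropvars, force

describe
```

Running missings report first is a useful habit: it lets you confirm which variables are entirely missing before anything is dropped.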
mmerge
This is not a built-in Stata command and can be installed by typing ssc install mmerge.
mmerge provides additional options for merge that make merging datasets more user-friendly by removing a significant amount of preprocessing work for complex datasets:
- It allows for missing values in the match variable and lets the user determine how they are treated with the missing() option.
- It allows for differing match variable names in the two files with the umatch() option.
- It allows all merged variables from the using file to be prefixed as part of the merge with the uname() option.
- It stores the names of shared variables that are not merged in the stored result r(common).
There are a number of other options and syntax changes. You can see these by typing help mmerge once mmerge is installed.
return list
This command allows you to see all of the stored results in working memory and their values. For example, if you ran summarize [variable], the return list command would show all of the scalars stored by summarize, such as r(N), r(mean), r(sd), r(min), r(max), and r(sum). The related commands ereturn list and creturn list show stored estimation results and system values, respectively. The help file (help return list) shows more options.
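The three listing commands can be tried out as follows. This is only a sketch using the auto dataset that ships with Stata; the regression is arbitrary and serves just to populate the estimation results.

```stata
sysuse auto, clear

* summarize leaves its results in r()
summarize price
return list

* Stored results can be used directly in later commands
display "Mean price: " r(mean)
local n = r(N)

* Estimation commands leave their results in e()
regress price mpg
ereturn list

* System values and settings live in c()
creturn list
```

Because r() results are overwritten by the next r-class command, copy anything you need later into a local or scalar right away, as done with n above.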
sencode
This is not a built-in Stata command and can be installed by typing ssc install sencode.
sencode makes a number of improvements on the Stata command encode by including options that allow you to replace the string variable with the numeric variable and set the order of encoded values in user-specified ways, among other user-friendly additions.
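The sketch below illustrates both features on a made-up dataset; the variable names size and order are hypothetical. It assumes sencode is installed from SSC and that its replace and gsort() options behave as described in its help file, so check help sencode before relying on it.

```stata
* Requires the user-written package: ssc install sencode

* Toy data: a string variable plus a numeric variable giving the
* order in which we would like the codes to be assigned
clear
input str6 size order
"medium" 2
"small"  1
"large"  3
"small"  1
end

* Replace the string variable with a labeled numeric variable,
* allocating codes in the order given by the variable "order"
sencode size, gsort(order) replace

* Inspect the result and the value label that was created
describe size
label list
```

With the built-in encode, you would instead have to generate a new variable, drop the old one, and accept alphabetical ordering of the codes unless you defined the value label yourself beforehand.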