Stata

What is Stata?

Stata is a statistical software package that is commonly used in the social sciences and economics, and it is widely used at IPA for data analysis and management. It offers a comprehensive library of methods for data cleaning, descriptive statistics, and econometric analysis. Stata is well suited to research data workflows and research design tasks, including power calculations, sample design adjustments, panel data analysis, and time series analysis. See Stata Features for a full list of Stata's capabilities.

How to install Stata?

IPA staff can download and install the relevant version (.exe for Windows, .dmg for macOS, or .tar.gz for Linux) from the IPA installation packages on Box.

Coding Conventions

See the following resources for coding conventions in Stata:

Using Stata from Python

Within a Python script or Jupyter Notebook, you can call Stata (version 17 or later) using pystata. First, configure the connection with stata_setup:

import stata_setup

# point stata_setup at the Stata installation directory and the Stata edition
# ("be", "se", or "mp"); in the case below, we're using Stata 18 SE
stata_setup.config("C:/Program Files/Stata18/", "se")

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      StataNow 18.5
___/   /   /___/   /   /___/       SE—Standard Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: Unlimited-user network, expiring 22 Jan 2025
Serial number: 401809300803
  Licensed to: Niall Keleher
               Innovations for Poverty Action

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.
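The installation path passed to stata_setup.config() differs between machines and operating systems. As a minimal sketch, the hypothetical helper below picks the first existing directory from a list of candidate locations. The candidate paths are assumptions about common default install locations, not paths confirmed by this guide; edit them to match your setup.

```python
from pathlib import Path

# Hypothetical candidate install locations (assumed defaults; edit for your machine).
CANDIDATE_DIRS = [
    "C:/Program Files/Stata18",  # Windows
    "/Applications/Stata",      # macOS
    "/usr/local/stata18",       # Linux
]


def find_stata_dir(candidates=CANDIDATE_DIRS):
    """Return the first candidate path that is an existing directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return str(path)
    # No candidate exists; the caller decides how to handle a missing install.
    return None
```

Pass the result to stata_setup.config(stata_dir, "se"), and raise an error when it returns None so that a missing installation fails loudly instead of producing a confusing import error later.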

Then call Stata through the pystata API functions, for example stata.run():

from pystata import stata
stata.run(
    """
    sysuse auto, clear
    reg mpg price i.foreign
    """
)

. 
.     sysuse auto, clear
(1978 automobile data)

.     reg mpg price i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     23.01
       Model |  960.866305         2  480.433152   Prob > F        =    0.0000
    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932
-------------+----------------------------------   Adj R-squared   =    0.3761
       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------

.     
. 
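Because stata.run() takes its Stata code as a string, you can also assemble commands in Python before running them. The helper below is a hypothetical sketch (it is not part of pystata) that builds a regress command and uses Stata's i. factor-variable prefix for categorical regressors.

```python
def build_regress_command(depvar, covariates, factor_vars=(), options=""):
    """Assemble a Stata `regress` command string (hypothetical helper)."""
    # Continuous regressors are used as-is; categorical ones get the "i." prefix.
    terms = list(covariates) + [f"i.{var}" for var in factor_vars]
    command = " ".join(["regress", depvar, *terms])
    # Stata options follow a comma after the variable list.
    if options:
        command += f", {options}"
    return command
```

For example, stata.run(build_regress_command("mpg", ["price"], ["foreign"])) would run the same model as the reg mpg price i.foreign example above, and passing options="vce(robust)" would request robust standard errors.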

Or use the IPython magic commands to run Stata code in a Jupyter Notebook: %%stata runs an entire cell as Stata code, and %stata runs a single line.

%%stata
sysuse auto, clear
describe

. sysuse auto, clear
(1978 automobile data)

. describe

Contains data from C:\Program Files\Stata18/ado\base/a/auto.dta
 Observations:            74                  1978 automobile data
    Variables:            12                  13 Apr 2022 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear ratio
foreign         byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------
Sorted by: foreign

. 
%stata scatter mpg price

Data Visualization

Consider installing the ipaplots package to use the IPA graph scheme in Stata.

Learning References

For more information on learning and using Stata, see the IPA-Stata-Trainings repository on GitHub.