Dataset Management

Datasets should keep only necessary variables. Those variables should be ordered in an understandable way, named and labeled, and stored in the correct format for analysis. The clearest way to do this may vary, especially for variable order. The order in which questions appear in the survey is a good candidate. Unique identifiers should always come first.
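As a minimal sketch of naming, labeling, and fixing storage formats, the block below builds a three-row toy dataset and cleans it. The data values, the raw variable names (`female`, `age_str`), the label text, and the `yesno` value label are all hypothetical and chosen only for illustration.

```stata
* Hypothetical toy data: three households (all names and values are assumptions)
clear
input int hhid byte female str2 age_str
1001 1 "34"
1002 0 "51"
1003 1 "29"
end

* Name: use descriptive, consistently prefixed names
rename female  bl_hhh_female
rename age_str bl_hhh_age

* Storage format: convert numbers stored as strings so they can be analyzed
destring bl_hhh_age, replace

* Label: describe each variable and its values
label variable hhid          "Household ID"
label variable bl_hhh_female "Baseline: household head is female"
label variable bl_hhh_age    "Baseline: age of household head (years)"
label define yesno 0 "No" 1 "Yes"
label values bl_hhh_female yesno

* Order: confirm the identifier is unique, then place it first
isid hhid
order hhid bl_hhh_age bl_hhh_female
```

Running `describe` afterwards shows the names, storage types, and labels a reader of the saved file would see.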

Any script that saves data should include code that identifies the variables saved, orders them, and describes them for readers. This ensures that a reader can look at the code and understand what it produces without running the do file. An example code block for the end of a do file follows. Note that the variables are described using comments and that the file ends with a commented end-of-file marker.

**B. Sort and clean vars
	isid hhid // confirm Household ID is unique
	sort hhid // sort in a unique order

	*Create a local of variables
	loc vars						///
	hhid		enum_id					/// ID Variables
	cluster		survey_date		form_id		/// File source variables
	treatment	scto_rand				/// Treatment assignment 
	bl_hhh_age	bl_hhh_female		bl_hhh_educ	/// Baseline demos
	bl_hh_size						/// 
	bl_cons_veg_*	bl_cons_meat_*		bl_cons_purch_*	/// Consumption
	bl_cons_alc						/// 
	bl_loan_size	bl_loan_exp_pay_m*	bl_loan_miss_m*	/// Loan information
	bl_msf		bl_otaf					// Lender Fees

	*Keep necessary variables
	qui ds `vars', not
	assert `: word count `r(varlist)'' == 0 // check that no unlisted variables would be dropped
	keep `vars'

	*Order ID first
	order `vars'


**C. Save and close
	*Save data to the data folder
	save "${data}01a_baseline.dta", replace

	*Close the log
	log close


**EOF**
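
The `ds ..., not` check in the example can be tried on a toy case. The sketch below uses Stata's shipped `auto` dataset in place of the survey data (an assumption made only so that it runs anywhere) and deliberately lists only some variables, so the check flags the rest. The local name `unlisted` is hypothetical.

```stata
* Illustrative only: auto.dta stands in for the survey data
sysuse auto, clear

* Deliberately list only some of the variables
loc vars make price mpg

* ds with the not option returns every variable that is NOT in the list
qui ds `vars', not
loc unlisted `r(varlist)'
di "Unlisted variables: `unlisted'"

* Because unlisted variables exist, the assert from the example fails here;
* capture records the failure in _rc instead of stopping the do file
capture assert `: word count `unlisted'' == 0
di "Return code: " _rc  // nonzero because the assertion is false
```

In a real do file the assert would be left uncaptured, so the script stops until every variable is either added to the list or dropped on purpose.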