I'm cleaning a variable, last_name. Some of the names have the middle name (or initial) included after a comma, while for the other names the middle name is stored in the variable middle_name.
Here are some examples:
last_name
smith, r
anderson, jay
epps,william
mckinsey,f
I'm using this code:
split last_name, p(,)
replace last_name = substr(last_name, 1, length(last_name)-3) if ///
    length(last_name2)==3
I put this through a forvalues loop to increase the length of the strings I'm dropping, but it feels like a crude method. Is there a cleaner way to drop all of the values after a comma (or other character)?
Find the position of the (first) comma. Subtract 1. That gives the length of the substring to be kept.
replace last_name = substr(last_name, 1, strpos(last_name, ",") - 1)
This generalizes to any other character.
But it should only be done if there is such a character:
replace last_name = substr(last_name, 1, strpos(last_name, ",") - 1) if strpos(last_name, ",")
However, don't lose information that you may want later. It's better to create a new variable:
gen surname = substr(last_name, 1, strpos(last_name, ",") - 1)
replace surname = last_name if missing(surname)
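To try this end to end, here is a minimal sketch that can be pasted into a do-file. It uses the example names from the question plus one hypothetical row, "jones", which has no comma. The local macro sep and the final trim() step are illustrative additions, not part of the commands above; changing sep is one way to apply the same idea to another character.

```stata
* Example data: four names from the question, plus "jones" (a hypothetical
* row added here to show what happens when there is no comma)
clear
input str20 last_name
"smith, r"
"anderson, jay"
"epps,william"
"mckinsey,f"
"jones"
end

* The delimiter is held in a local macro (an illustrative choice) so that
* another character can be swapped in
local sep ","

* Keep everything before the first delimiter; rows without one yield ""
gen surname = substr(last_name, 1, strpos(last_name, "`sep'") - 1)

* Fall back to the full name where no delimiter was found
replace surname = last_name if missing(surname)

* Remove any stray leading or trailing blanks (an extra precaution)
replace surname = trim(surname)

list last_name surname
```

With this data, surname should hold smith, anderson, epps, mckinsey and jones, while last_name is left unchanged, so whatever follows the comma is still available if it is needed later.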