On May 5, 1:22 pm, "Andrew Z." <ahz...@gmail.com> wrote:
> How do I reference the data set 'correct_domains' in the do loop, so I
> can look for close Levenshtein distances (to find misspelled domains)?
>
> data correct_domains;
> infile datalines truncover;
> input domain $200.;
> datalines;
> yahoo.com
> gmail.com
> hotmail.com
> aol.com
> comcast.net
> msn.com
> sbcglobal.net
> verizon.net
> bellsouth.net
> cox.net
> att.net
> ;;;;
> run;
>
> data check_these_domains;
> infile datalines truncover;
> input domain $200.;
> datalines;
> yahoo.cm
> gmail.co
> hotmial.com
> aol.com
> comcast.net
> ;;;;
> run;
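proc sql;
  /* Cartesian join: compare every correct domain with every domain to check, */
  /* keeping pairs that are 1 or 2 edits apart (0 would be an exact match).   */
  create table matches as
  select
    a.domain as correct,
    b.domain as found
  from
    correct_domains a,
    check_these_domains b
  where
    complev(a.domain, b.domain) in (1, 2)
  ;
quit;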
I found this approach in the following comp.soft-sys.sas thread:
http://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/280fc617d09e8715/4efa2b32e287eee6?lnk=gst&q=COMPLEV#4efa2b32e287eee6
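To answer the question as literally asked (referencing correct_domains inside a DO loop), a DATA step can read that data set by observation number with SET ... POINT=, rereading every correct domain for each domain being checked. This is a minimal sketch that assumes the two data sets above. The output name matches_loop and the renamed variables correct and found are hypothetical names of my own, not from the thread. It applies the same COMPLEV test as the SQL join, so it should keep the same pairs.

```sas
/* Sketch only: matches_loop, correct and found are hypothetical names. */
data matches_loop;
  /* Outer read: one domain to check per iteration. The step stops */
  /* when this sequential SET reaches the end of the data set.     */
  set check_these_domains(rename=(domain=found));

  /* Inner loop: read every observation of correct_domains by number. */
  /* NOBS= is filled in at compile time, so n_correct is known here.  */
  do i = 1 to n_correct;
    set correct_domains(rename=(domain=correct)) point=i nobs=n_correct;
    /* Keep near misses only; distance 0 (exact match) is excluded. */
    if complev(correct, found) in (1, 2) then output;
  end;
run;
```

The POINT= and NOBS= variables (i and n_correct) are not written to the output data set. Because the outer SET is an ordinary sequential read, the step ends normally without an explicit STOP statement.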
Andrew