unix - Need help in vlookup in linux using awk -


i have 2 data files. 1 having 1600 rows , other 1 having 2 million rows(tab delimited files). need vlookup between these 2 files. please see below example expected output , kindly let me know if it's possible. i've tried using awk, couldn't expected result.

file 1(small file)
bc1 10 100
bc2 20 200
bc3 30 300

file 2(large file)
bc1 xyz
bc2 abc
bc3 def

expected output:
bc1 10 100 xyz
bc2 20 200 abc
bc3 30 300 def

i tried join command. taking forever complete. please me find solution. thanks

this awk script scan line line each file, , try match number in bc column. once matched, print columns. if 1 of files not contain 1 of numbers, skipped in both files , search next one. loop until 1 of files ends. script accepts number of columns per file , number of files, long first column bc , number. this awk script assumes files ordered minor major number in bc column (like in example). otherwise not work.

to execute script, run command:

awk -f vlookup.awk smallfile bigfile 

the vlookup.awk file have content:

begin {files=1;lines=0;maxlines=0;filelines[1]=0;   #number of columns sod, prn, reference file  col_bc=1;  #initialize variables bc_now=0;  new_bc=0;  end_of_process=0;  aux=""; text_result=""; } { if(filename!=argv[1])exit;  no_bc=0; new_bc=0;  #save number of columns nfields[1]=nf;  #copy reference file data for(j=0;j<=nf;j++)  {     file[1,j]=$j; }  #read lines file for(i=2;i<argc;i++) {     ret=getline < argv[i];     if(ret==0) exit; #end of file reached     #copy columns file variable     for(j=0;j<=nf;j++)      {         file[i,j]=$j;     }     #save number of columns     nfields[i]=nf;  }  #check files in same number for(i=1;i<argc;i++)  {     bc[i]=file[i,col_bc];     bc[i]=sub("bc","",file[i,col_bc]);     if(bc[i]>bc_now) {bc_now=bc[i];new_bc=1;}        }  #one or more files have new number if (new_bc==1) {     for(i=1;i<argc;i++)     {         while(bc_now!=file[i,col_bc])         {             #read next line file             if(i==1) ret=getline; #file 1 reference file             else ret=getline < argv[i];             if(ret==0) exit; #end of file reached             #copy columns file variable             for(j=0;j<=nf;j++)              {                 file[i,j]=$j;             }             #save number of columns             nfields[i]=nf;             #check if in current file data has gone next number             if(file[i,col_bc]>bc_now)              {                 no_bc=1;                 break;               }             #no more data lines compare, end of comparison             if(filename!=argv[1])             {                 exit;             }         }         #if number not in file, process realign must restarted next number available (exit loop)         if (no_bc==1) {break;}     }     #if number not in file, process realign must restarted next number available (continue while loop)     if (no_bc==1) {next;}  }    #number aligned for(i=1;i<argc;i++) {     for(j=2;j<=nfields[i];j++) {          #join colums in text_result variable         aux=sprintf("%s %s",text_result,file[i,j]);         text_result=sprintf("%s",aux);      } } printf("bc%d%s\n",bc_now,text_result) #reset text variables aux=""; text_result="";     }  

Comments