Perl - Process two space-delimited text files into one by common column
I have two text files that look like:

    col1 primary col3 col4
    blah 1 blah 4
    1 2 5 6
    ...

and

    cola primary colc cold
    1 1 7 27
    foo 2 11 13

I want to merge them into a single wider table, such as:

    primary col1 col3 col4 cola colc cold
    1 blah blah 4 1 7 27
    2 1 5 6 foo 11 13

I'm pretty new to Perl, so I'm not sure of the best way to do this. Note that column order does not matter, and there are a couple of million rows. The files are unfortunately not sorted.
My current plan, unless there's a better alternative: given a line in one of the files, scan the other file for the matching row, and append them both as necessary to a new file. That sounds slow and cumbersome though.
Thanks!
Solution 1.
1. Read the smaller of the two files line by line, using a standard CPAN delimited-file parser such as Text::CSV_XS to parse out the columns.
2. Save each record (as an arrayref of columns) in a hash, with the merge column as the hash key.
3. When done, read the larger of the two files line by line, again using a standard CPAN delimited-file parser such as Text::CSV_XS to parse out the columns.
4. For each record, find the join key field, find the matching record in the hash storing the data from file #1, merge the two records as needed, and print.
Note: this is pretty memory intensive, since the entire smaller file will live in memory, but it won't require you to read one of the files a million times.
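
A minimal sketch of Solution 1, under a few assumptions: fields never contain embedded spaces, so a plain split on whitespace stands in for Text::CSV_XS to keep the example dependency-free; each file starts with a header line; and the join key is unique in the smaller file. The names hash_join and read_header are hypothetical helpers, not part of any module. Rows of the larger file with no match in the smaller one are skipped here (an inner join), which may or may not be what you want.

```perl
use strict;
use warnings;

# Read the header line; return (arrayref of column names, index of key column).
sub read_header {
    my ($fh, $key) = @_;
    my $line = <$fh>;
    die "missing header line\n" unless defined $line;
    my @cols = split ' ', $line;
    my ($idx) = grep { $cols[$_] eq $key } 0 .. $#cols;
    die "no '$key' column in header\n" unless defined $idx;
    return (\@cols, $idx);
}

# Hash join: hold $small_fh in memory, then stream $large_fh past it.
sub hash_join {
    my ($small_fh, $large_fh, $out_fh, $key) = @_;

    # Pass 1: index the smaller file by its key column.
    my ($small_cols, $small_idx) = read_header($small_fh, $key);
    my %rows;    # key value => arrayref of the remaining fields
    while (my $line = <$small_fh>) {
        my @fields = split ' ', $line;
        next unless @fields;                          # skip blank lines
        my ($k) = splice @fields, $small_idx, 1;      # pull the key out
        $rows{$k} = \@fields;
    }

    # Output header: key first, then the other columns of each file.
    my ($large_cols, $large_idx) = read_header($large_fh, $key);
    my @large_names = @$large_cols;
    splice @large_names, $large_idx, 1;
    my @small_names = @$small_cols;
    splice @small_names, $small_idx, 1;
    print {$out_fh} join(' ', $key, @large_names, @small_names), "\n";

    # Pass 2: stream the larger file and look each key up in the hash.
    while (my $line = <$large_fh>) {
        my @fields = split ' ', $line;
        next unless @fields;
        my ($k) = splice @fields, $large_idx, 1;
        my $match = $rows{$k} or next;                # no match: skip the row
        print {$out_fh} join(' ', $k, @fields, @$match), "\n";
    }
    return;
}
```

To use it, open the two input files for reading and pass the smaller one first, for example hash_join($small_fh, $large_fh, \*STDOUT, 'primary'). If your data can contain quoted or escaped fields, swapping the split calls for a Text::CSV_XS parser configured with a space separator would be the more robust choice.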
Solution 2.
1. Sort file1 (using Unix sort or some simple Perl code) into "file1.sorted".
2. Sort file2 (using Unix sort or some simple Perl code) into "file2.sorted".
3. Open both files for reading. Loop until both are fully read:
   - Read one line from each file into its buffer if the buffer for that file is empty (the buffer being simply a variable containing the next record).
   - Compare the indexes of the two lines.
   - If index1 < index2, write the record from file1 to the output (without merging) and empty buffer1. Repeat step 3.
   - If index1 > index2, write the record from file2 to the output (without merging) and empty buffer2. Repeat.
   - If index1 == index2, merge the two records, write the merged record to the output, and empty both buffers (assuming the join index column is unique. If it is not unique, this step is more complicated).
Note: this does not require you to keep an entire file in memory, aside from sorting the files (which can be done in a memory-constrained way if you need to).
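
The merge loop of Solution 2 could look roughly like the sketch below. It assumes both inputs are already sorted on the key column in the same order that Perl's cmp uses (byte order, which is what sort produces under LC_ALL=C), that each file still has its header as the first line, and that the key is unique within each file. The names merge_join, split_header and next_record are hypothetical, and padding unmatched records with '-' is an added assumption so that every output row has the same number of columns.

```perl
use strict;
use warnings;

# Read the header; return (arrayref of non-key column names, key index).
sub split_header {
    my ($fh, $key) = @_;
    my $line = <$fh>;
    die "missing header line\n" unless defined $line;
    my @cols = split ' ', $line;
    my ($idx) = grep { $cols[$_] eq $key } 0 .. $#cols;
    die "no '$key' column in header\n" unless defined $idx;
    splice @cols, $idx, 1;
    return (\@cols, $idx);
}

# Return [key, arrayref of other fields] for the next non-blank line,
# or undef once the file is exhausted.
sub next_record {
    my ($fh, $idx) = @_;
    while (defined(my $line = <$fh>)) {
        my @fields = split ' ', $line;
        next unless @fields;
        my ($k) = splice @fields, $idx, 1;
        return [$k, \@fields];
    }
    return;
}

# Sort-merge join of two key-sorted files; only one record per file
# is held in memory at a time.
sub merge_join {
    my ($fh1, $fh2, $out_fh, $key, $pad) = @_;
    $pad = '-' unless defined $pad;      # placeholder for missing columns

    my ($names1, $idx1) = split_header($fh1, $key);
    my ($names2, $idx2) = split_header($fh2, $key);
    print {$out_fh} join(' ', $key, @$names1, @$names2), "\n";

    my @pad1 = ($pad) x @$names1;
    my @pad2 = ($pad) x @$names2;

    my ($buf1, $buf2);
    while (1) {
        # Refill whichever buffer is empty.
        $buf1 = next_record($fh1, $idx1) unless $buf1;
        $buf2 = next_record($fh2, $idx2) unless $buf2;
        last unless $buf1 || $buf2;      # both files fully read

        # An exhausted file counts as "greater" so the other one drains.
        my $cmp = !$buf2 ? -1
                : !$buf1 ?  1
                :           $buf1->[0] cmp $buf2->[0];

        if ($cmp < 0) {                  # key only in file1
            print {$out_fh} join(' ', $buf1->[0], @{ $buf1->[1] }, @pad2), "\n";
            undef $buf1;
        }
        elsif ($cmp > 0) {               # key only in file2
            print {$out_fh} join(' ', $buf2->[0], @pad1, @{ $buf2->[1] }), "\n";
            undef $buf2;
        }
        else {                           # same key: merge the two records
            print {$out_fh} join(' ', $buf1->[0], @{ $buf1->[1] }, @{ $buf2->[1] }), "\n";
            undef $buf1;
            undef $buf2;
        }
    }
    return;
}
```

If the keys are numeric and the files were sorted numerically instead, the cmp in the comparison would need to become <=> so that the merge order agrees with the sort order.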