Perl script to check doc_b against doc_a for inconsistencies

Postby Perl script » 2004-11-09 15:24

Hi folks,

I'm going to write a script that checks two documents, say doc_a and doc_b, for inconsistencies, and I have no idea how to start.

doc_b is reproduced from doc_a (the original document), but not by copy and paste.

To keep it simple at first, take the following example, a one-line document:

1)
Original document "doc_a"
Code: Select all
Check this link to sea what scannars are supported by SANE

It already contains 2 typing mistakes:
sea
scannars

2)
The reproduced document "doc_b" must keep these 2 mistakes for consistency.
Code: Select all
check thes link to sea what scannars are suppurted by SeNE

Unfortunately, another 3 typing mistakes were made:
thes
suppurted
SeNE

What I expect to have in the printout is:
Code: Select all
Original   Mistake    Line No.  Word No.
this       thes       1         2
supported  suppurted  1         9
SANE       SeNE       1         11

and not just a printout of their contents with a message saying they differ.

Kindly advise how to start. TIA

B.R.
satimis

Postby lacek » 2004-11-10 15:07

Here is a quick Perl script for this. It simply reads the two files line by line, splits each line into words and compares the words position by position, ignoring case.
It is a mess, but you'll get the idea...

Code: Select all
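#!/usr/bin/perl
use strict;
use warnings;

die "Gimme 2 files\n" unless @ARGV == 2 && (-f $ARGV[0]) && (-f $ARGV[1]);

open(my $orig, '<', $ARGV[0]) or die "$ARGV[0]: $!";
open(my $copy, '<', $ARGV[1]) or die "$ARGV[1]: $!";

print "Original\tMistake\tline\tword\n";
my $line = 0;                        # line counter, incremented once per line
while (my $l1 = <$orig>) {
    my $l2 = <$copy>;
    $l2 = '' unless defined $l2;     # doc_b has fewer lines than doc_a
    $line++;

    # split ' ' skips leading whitespace, collapses runs of whitespace
    # and drops the trailing newline
    my @f1 = split ' ', $l1;
    my @f2 = split ' ', $l2;

    # <= $#f1 (via 0 .. $#f1) so the last word is checked too
    for my $i (0 .. $#f1) {
        my $w2 = defined $f2[$i] ? $f2[$i] : '';
        # case-insensitive, so "Check" vs "check" is not reported
        if (lc($f1[$i]) ne lc($w2)) {
            print "$f1[$i]\t$w2\t$line\t", $i + 1, "\n";   # 1-based word number
        }
    }
}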
lacek
Moderator Team Member
 
Posts: 769
Joined: 2004-03-11 18:49
Location: Budapest, Hungary

Postby Jeroen » 2004-11-10 16:26

Did you look at wdiff already?

It's like diff, but it works word by word within a line, not merely line by line:

Code: Select all
[jeroen@mordor]/tmp$ wdiff doc_a doc_b
[-Check this-]{+check thes+} link to sea what scannars are [-supported-] {+suppurted+} by [-SANE-] {+SeNE+}


See man wdiff for more information (after you've installed the wdiff package if you don't have it, of course)
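
If you want the table from the first post rather than wdiff's inline markers, the same word-by-word idea can be sketched in plain Perl. This is only a rough sketch, not how wdiff itself is implemented: it aligns the words of each line pair with a longest-common-subsequence table, so a dropped or extra word does not shift every following word out of step. The sub name word_diffs, the row layout, and the choice to report an extra word at the position where it was inserted are my own assumptions, made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: align two lines word by word (ignoring case) with a
# longest-common-subsequence table and return rows of
# [original, mistake, line_no, word_no]. word_no is 1-based and counts words
# in the original line; an extra word in the copy gets the position at which
# it was inserted.
sub word_diffs {
    my ($orig_line, $copy_line, $line_no) = @_;
    my @o = split ' ', $orig_line;
    my @c = split ' ', $copy_line;

    # $lcs[$i][$j] = length of the LCS of @o[$i..] and @c[$j..]
    my @lcs = map { [ (0) x (scalar(@c) + 1) ] } 0 .. scalar(@o);
    for my $i (reverse 0 .. $#o) {
        for my $j (reverse 0 .. $#c) {
            if (lc($o[$i]) eq lc($c[$j])) {
                $lcs[$i][$j] = $lcs[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
                $lcs[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    my (@rows, @del, @ins);
    my ($i, $j) = (0, 0);

    # Between two matching words, pair the k-th dropped original word with
    # the k-th inserted word; leftovers are reported with an empty partner.
    my $flush = sub {
        while (@del || @ins) {
            my $d = shift @del;    # [word, position] or undef
            my $n = shift @ins;    # inserted word or undef
            push @rows, [ $d ? $d->[0] : '',
                          defined($n) ? $n : '',
                          $line_no,
                          $d ? $d->[1] : $i + 1 ];
        }
    };

    while ($i < @o || $j < @c) {
        if ($i < @o && $j < @c && lc($o[$i]) eq lc($c[$j])) {
            $flush->();            # a match closes the pending run
            $i++;
            $j++;
        }
        elsif ($j < @c && ($i == @o || $lcs[$i][$j + 1] >= $lcs[$i + 1][$j])) {
            push @ins, $c[$j++];   # word only in the copy
        }
        else {
            push @del, [ $o[$i], $i + 1 ];   # word only in the original
            $i++;
        }
    }
    $flush->();
    return @rows;
}

# Demo: with two file arguments compare them line by line; with none, use
# the one-line example from the first post.
unless (caller) {
    my (@orig, @copy);
    if (@ARGV == 2) {
        open(my $fa, '<', $ARGV[0]) or die "$ARGV[0]: $!";
        open(my $fb, '<', $ARGV[1]) or die "$ARGV[1]: $!";
        @orig = <$fa>;
        @copy = <$fb>;
    }
    else {
        @orig = ('Check this link to sea what scannars are supported by SANE');
        @copy = ('check thes link to sea what scannars are suppurted by SeNE');
    }
    print "Original\tMistake\tLine\tWord\n";
    for my $n (0 .. $#orig) {
        my $other = defined $copy[$n] ? $copy[$n] : '';
        print join("\t", @$_), "\n" for word_diffs($orig[$n], $other, $n + 1);
    }
}
```

Run without arguments, it should report the three rows this/thes (word 2), supported/suppurted (word 9) and SANE/SeNE (word 11), all on line 1. It still pairs line N of doc_a with line N of doc_b, so a whole missing line will throw it off; for that case wdiff is the better tool.
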
Jeroen
Debian Developer, Site Admin
 
Posts: 571
Joined: 2004-04-06 18:19
Location: Utrecht, NL

Postby lacek » 2004-11-11 10:20

Wow, that's great...
I didn't know this program. Definitely much more usable than my crappish script... :-)

