Perl script to check doc_b against doc_a for inconsistencies

#1 Post by satimis »

Hi folks,

I want to write a script that checks two documents, say doc_a and doc_b, for inconsistencies, and I have no idea how to start.

doc_b is reproduced from doc_a (the original document), but not with a 'copy and paste' command.

To keep it simple at first, take a one-line document, as in the following example:

1)
Original document "doc_a"

Code: Select all

Check this link to sea what scannars are supported by SANE
This line already contains 2 typing mistakes:
sea
scannars

2)
The reproduced document "doc_b" must keep these 2 mistakes for consistency.

Code: Select all

check thes link to sea what scannars are suppurted by SeNE
Unfortunately, 3 more typing mistakes were introduced:
thes
suppurted
SeNE

What I expect to see in the printout is:

Code: Select all

Original   Mistake    Line No.  Word No.
this       thes       1         2
supported  suppurted  1         9
SANE       SeNE       1         11
That is, not just a printout of both contents with the word "differ".

Kindly advise how to start. TIA

B.R.
satimis

lacek
Posts: 764
Joined: 2004-03-11 18:49
Location: Budapest, Hungary

#2 Post by lacek »

Here is a quick Perl script for this. It simply reads the two files line by line, splits each line into words, and compares the words at the same position, ignoring case.
It is a mess, but you'll get the idea...

Code: Select all

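#!/usr/bin/perl
use strict;
use warnings;

die "Gimme 2 files\n" unless @ARGV == 2 && -f $ARGV[0] && -f $ARGV[1];

open(my $file1, '<', $ARGV[0]) or die "$ARGV[0]: $!";
open(my $file2, '<', $ARGV[1]) or die "$ARGV[1]: $!";

print "Original\tMistake\tline\tword\n";
my $line = 0;    # line counter for the whole file
while (my $line1 = <$file1>) {
 my $line2 = <$file2>;
 $line2 = '' unless defined $line2;    # second file ran out of lines
 chomp($line1, $line2);
 $line++;

 # split ' ' splits on runs of whitespace and ignores leading whitespace
 my @f1 = split ' ', $line1;
 my @f2 = split ' ', $line2;

 # visit every word of the original line, including the last one
 for my $i (0 .. $#f1) {
  my $word2 = defined $f2[$i] ? $f2[$i] : '';
  # case-insensitive comparison: "Check" vs. "check" is not reported
  if (lc($f1[$i]) ne lc($word2)) {
   # word numbers are printed starting from 1
   print "$f1[$i]\t$word2\t$line\t", $i + 1, "\n";
  }
 }
}

close($file1);
close($file2);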

Jeroen
Debian Developer, Site Admin
Posts: 483
Joined: 2004-04-06 18:19
Location: Utrecht, NL

#3 Post by Jeroen »

Did you look at wdiff already?

It's like diff, but it compares word by word within each line, not merely line by line:

Code: Select all

[jeroen@mordor]/tmp$ wdiff doc_a doc_b
[-Check this-]{+check thes+} link to sea what scannars are [-supported-] {+suppurted+} by [-SANE-] {+SeNE+}
See man wdiff for more information (after you've installed the wdiff package, if you don't have it, of course).
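
To make the word-by-word idea concrete, here is a rough pure-Perl sketch. It is not wdiff's actual implementation: the subroutine name word_diff is made up for this example, and it assumes a plain longest-common-subsequence alignment of the two word lists. Unlike a position-by-position comparison, this keeps the words lined up when one is dropped or added. It borrows wdiff's [-...-] and {+...+} markers, but wraps each word separately instead of grouping neighbours.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of wdiff): align two strings word by word
# using a longest-common-subsequence (LCS) table and mark the differences.
sub word_diff {
    my ($old_text, $new_text) = @_;
    my @old = split ' ', $old_text;
    my @new = split ' ', $new_text;
    my ($n, $m) = (scalar @old, scalar @new);

    # $lcs[$i][$j] = LCS length of @old[$i..] and @new[$j..]
    my @lcs;
    for my $i (0 .. $n) {
        for my $j (0 .. $m) {
            $lcs[$i][$j] = 0;
        }
    }
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($old[$i] eq $new[$j]) {
                $lcs[$i][$j] = $lcs[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
                $lcs[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    # Walk the table: common words pass through, the rest get markers.
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < $n || $j < $m) {
        if ($i < $n && $j < $m && $old[$i] eq $new[$j]) {
            push @out, $old[$i];
            $i++;
            $j++;
        }
        elsif ($i < $n && ($j == $m || $lcs[$i + 1][$j] >= $lcs[$i][$j + 1])) {
            push @out, '[-' . $old[$i] . '-]';    # only in the original
            $i++;
        }
        else {
            push @out, '{+' . $new[$j] . '+}';    # only in the copy
            $j++;
        }
    }
    return join ' ', @out;
}

# The one-line sample from this thread
print word_diff(
    'Check this link to sea what scannars are supported by SANE',
    'check thes link to sea what scannars are suppurted by SeNE',
), "\n";
```

For the sample line this prints:

[-Check-] [-this-] {+check+} {+thes+} link to sea what scannars are [-supported-] {+suppurted+} by [-SANE-] {+SeNE+}

The comparison here is case-sensitive, so Check/check is reported as well; wrapping both words in lc() before comparing would ignore case.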

#4 Post by lacek »

Wow, that's great...
I didn't know this program. Definitely much more usable than my crappish script... :-)
