It doesn't work perfectly, but it's 90% there. It's quite a difficult thing to do, and it has taken me days to get this working right.
To add to the above: I had to change the PHP in the post above, as it proved unusable for me because:
1. It didn't work with long filenames with spaces in them.
2. It didn't escape any text passed to pdflatex when I tried to add the filename of the document.
3. It didn't delete the temp directory afterwards, leaving an inconsistent state for the next file.
4. It didn't run pdflatex in batch mode, so you had to press return to get past its interactive prompt. It seemed like it was taking forever, when really it was waiting for user input that needed to be switched off.
5. It didn't meet all my requirements (below).
6. It used LaTeX and multido to create the multiple pages. I had all sorts of problems with this for some reason. After struggling with it, I simply used a PHP loop to create the multiple pages with a $newpage var, which means LaTeX's rather limited multido package is not needed.
I would not have had a clue where to start without the post above though!
My Requirements: I needed to add page numbers to LOADS of PDFs (60) so they could be supporting documents for a court case.
1. Each PDF HAD to have page numbering added.
2. I needed to add the FILENAME (not the directory etc.) at the top of each page so the documents could be cross-referenced easily.
(BTW: without the filename on each page, it's a real task to sort the printouts from my printer afterwards.)
3. It needed to deal with long filenames with spaces in them.
4. It needed to have a WHITEBOX background so image files or text would not obscure the reference and numbering.
5. FINALLY - after getting it working I wanted a 'blackhole' network shared directory, so I could just drop any PDF into the directory at any time and expect ALL of the above to be done to it, with an output directory containing the numbered PDFs (not destroying the originals). I can then drag and drop whole directories of files.
The last thing I wanted to do was to have to type filenames etc. and watch it process them from a shell.
So after days of getting this all working, here it is:
System Requirements: YOU NEED:
1. Linux (I have Debian).
2. pdfinfo
3. pdflatex (I think it was in the tetex package in aptitude... I forget.)
4. pdftk
5. php
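The PHP script does not check for the tools listed above, so a missing one only shows up as a confusing failure halfway through. A minimal sketch of a pre-flight check, assuming a Bourne-style shell; `missing_tools` is a hypothetical helper name, not part of the original scripts:

```shell
# Hypothetical helper: print the name of every listed command
# that cannot be found on the PATH (prints nothing if all exist).
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}
```

For example, `missing_tools php pdfinfo pdflatex pdftk` should print nothing on a correctly set-up machine, and otherwise one line per tool that still needs installing.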
My code doesn't test for these, so make sure they are installed. OK, to get this working:
1. Create your PDF numbering directory, e.g. 'BLACKHOLE_pdfAddPageNumbers'.
2. Add the following PHP as pdfAddPageNumbers.php inside this directory:
Code: Select all
#!/usr/bin/php
<?php
// Take $ref out of $newpage to stop the filename being written to the PDF
(count($argv) == 2) || die("Usage: $argv[0] input.pdf\n");
$filename = $argv[1];
$tmpDir = 'tmp'; /* in current directory */
$outDir = 'pdfs_numbered';
$tmpName = 'addPdfPages';
if (file_exists($outDir)) {
    echo "$outDir exists..\n";
} else {
    echo "$outDir directory does not exist - creating it now in current directory\n";
    mkdir($outDir);
}
if (file_exists($tmpDir)) {
    echo "$tmpDir exists so deleting this and recreating\n";
    exec("rm -rf $tmpDir");
} else {
    echo "$tmpDir does not exist - creating this now\n";
}
echo("Creating tmp dir in current dir\n");
mkdir($tmpDir);
// All changed files go into the output subdirectory under their original name
$target = $outDir . "/" . basename($filename);
/* work out the number of pages with pdfinfo */
exec("pdfinfo \"$filename\"", $outList);
/* find the line that contains the page count */
$numPages = 0;
foreach ($outList as $l) {
    if (preg_match('/^Pages:\s+(\d+)/', $l, $m)) {
        $numPages = (int) $m[1];
        break;
    }
}
($numPages > 0) || die("Could not read the page count of $filename\n");
echo "Input file $filename has $numPages pages.\n";
/* build the LaTeX source for the overlay: one page per input page */
/* \\usepackage[hmargin=.8cm,vmargin=1.5cm,nohead,nofoot]{geometry} */
$tex = "\\documentclass[12pt,a4paper]{article} \\usepackage[hmargin=1.5cm,vmargin=0.3cm,nohead,nofoot]{geometry}\\usepackage[usenames]{color}\\begin{document}\\pagestyle{empty}\n";
// escape the bare filename (no directory) once, outside the loop
$ref = latexSpecialChars(basename($filename));
$pages = "";
for ($p = 1; $p <= $numPages; $p++) {
    // white box so the reference stays readable over images or text
    $newpage = "\\colorbox{white}{ " . $ref . " [{$p} of {$numPages}]}\\newpage\n";
    $pages .= $newpage;
}
$tex = $tex . $pages . "\\end{document}";
if ($fp = fopen($tmpDir . '/' . $tmpName . '.tex', 'w')) {
    fwrite($fp, $tex);
    fclose($fp); // make sure the .tex file is complete before pdflatex reads it
    //*\title{$filename} */
    echo("Generating PDF with page numbers\n");
    // batchmode stops pdflatex from waiting for keyboard input
    echo("pdflatex -interaction=batchmode -output-directory=$tmpDir $tmpDir/$tmpName > /dev/null\n");
    exec("pdflatex -interaction=batchmode -output-directory=$tmpDir $tmpDir/$tmpName > /dev/null");
    echo("cp $filename $tmpDir/source.pdf\n");
    copy($filename, "$tmpDir/source.pdf");
    echo("Bursting input file\n");
    echo("pdftk $tmpDir/source.pdf burst output $tmpDir/file_%03d.pdf\n");
    exec("pdftk $tmpDir/source.pdf burst output $tmpDir/file_%03d.pdf");
    echo("Bursting page number file\n");
    echo("pdftk $tmpDir/$tmpName.pdf burst output $tmpDir/numb_%03d.pdf\n");
    exec("pdftk $tmpDir/$tmpName.pdf burst output $tmpDir/numb_%03d.pdf");
    echo("Generating stamped (merged) new pages with pdftk for each page\n");
    for ($ii = 1; $ii <= $numPages; $ii++) {
        $jj = sprintf("%03d", $ii);
        echo("pdftk $tmpDir/file_$jj.pdf stamp $tmpDir/numb_$jj.pdf output $tmpDir/new_$jj.pdf\n");
        exec("pdftk $tmpDir/file_$jj.pdf stamp $tmpDir/numb_$jj.pdf output $tmpDir/new_$jj.pdf");
    }
    echo("Merging into new document...\n");
    echo("pdftk $tmpDir/new_???.pdf output \"$target\"\n");
    exec("pdftk $tmpDir/new_???.pdf output \"$target\"");
    // Optional: compressing with ghostscript greatly reduces the file size, but it also
    // reduces image and print quality. To do this, simply uncomment the lines below.
    //--------------------------------------------------------------------------------------
    //echo("...Compressing target with ghostscript...\n");
    //exec("mv \"$target\" temppdf");
    //echo("gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=\"$target\" temppdf \n");
    //exec("gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=\"$target\" temppdf ");
    //-------------------------------------------------------------------------------------
    echo("...Done, removing temp files\n");
    exec("rm -rf $tmpDir");
}
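// Escape the characters that have a special meaning in LaTeX
function latexSpecialChars($string)
{
    $map = array(
        "#" => "\\#",
        "$" => "\\$",
        "%" => "\\%",
        "&" => "\\&",
        "~" => "\\~{}",
        "_" => "\\_",
        "^" => "\\^{}",
        "\\" => "\\textbackslash{}",
        "{" => "\\{",
        "}" => "\\}",
    );
    // strtr() replaces every key in a single pass, so the backslashes it
    // inserts are never escaped a second time. (The preg_replace "/e"
    // modifier used here previously no longer exists in PHP 7 and later.)
    return strtr($string, $map);
}
?>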
3. chmod +x pdfAddPageNumbers.php
to make it executable.
At this point it's worth a test. Copy a PDF into the directory and, from the shell, run:
Code: Select all
php pdfAddPageNumbers.php filename.pdf
You should see a 'pdfs_numbered' subdirectory and a 'tmp' subdirectory being created. Your numbered PDF will be in the pdfs_numbered subdirectory (tmp is deleted afterwards). No change will be made to your original PDF.
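A quick sanity check after the test run is to confirm that the numbered copy has the same number of pages as the original. This is only a sketch: `page_count` and `same_page_count` are hypothetical helper names, and it assumes pdfinfo prints a line starting with "Pages:", which is the same line the PHP script looks for.

```shell
# Hypothetical helper: read pdfinfo-style text on stdin and print
# the number found on the "Pages:" line.
page_count() {
    awk '/^Pages:/ { print $2; exit }'
}

# Hypothetical helper: succeed only if both PDFs report the same,
# non-empty page count. Needs pdfinfo to be installed.
same_page_count() {
    local before after
    before=$(pdfinfo "$1" | page_count)
    after=$(pdfinfo "$2" | page_count)
    [ -n "$before" ] && [ "$before" = "$after" ]
}
```

Usage would be along the lines of `same_page_count filename.pdf pdfs_numbered/filename.pdf && echo "page counts match"`.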
Okay, to make it a 'blackhole' directory:
Create the following file in the directory: pdfAddPageNumbers.sh
Code: Select all
#!/bin/bash
#echo "Executing dirdrop"
# cron starts jobs in the home directory, so switch to the script's own directory first
cd "$(dirname "$0")" || exit 1
shopt -s nullglob
for fullfile in *.pdf
do
    filename=$(basename "$fullfile")
    extension="${filename##*.}"
    filename="${filename%.*}"
    echo "fullfile - $fullfile"
    echo "extension - $extension"
    echo "filename - $filename"
    echo "$filename to be processed"
    /usr/bin/php pdfAddPageNumbers.php "$fullfile"
    mv "$fullfile" "$fullfile".done # rename the file so no other instance processes it
    #mv "$fullfile".done /media/hdd/data/gdocs/processed/
done
exit 0
Then edit your crontab (crontab -e) and add:
Code: Select all
# exec the pdfAddPageNumbers.sh blackhole script every min
* * * * * bash /media/hdd/data/BLACKHOLE_pdfAddPageNumbers/pdfAddPageNumbers.sh
Obviously, change the directory names to match what you have set up!
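One thing to watch with a one-minute cron job: if a batch takes longer than a minute, a second run can start while the first is still working, and both would share the same 'tmp' directory. A minimal sketch of a guard against that, using the fact that mkdir either creates a directory or fails; `run_exclusive` and the lock directory name are my own assumptions, not part of the scripts above:

```shell
# Hypothetical helper: run a command only if no other run holds the lock.
# Usage: run_exclusive LOCKDIR command [args...]
run_exclusive() {
    local lockdir="$1" status
    shift
    # mkdir fails if the directory already exists, so only one caller wins
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"    # release the lock
    return "$status"
}
```

Inside the blackhole script, the for loop could be moved into a function and started with something like `run_exclusive .numbering.lock process_all`. If a run is killed halfway, the lock directory is left behind and has to be removed by hand.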
You should now have a blackhole network share directory, which you can drop any PDF files into. They will be page numbered and can then be found in the pdfs_numbered subdirectory. The PDF you drop in will be renamed to xxx.pdf.done (to stop it being endlessly reprocessed by the blackhole script).
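If a file ever needs to go through again, removing the .done suffix is enough for the next run to pick it up. A small sketch of that, where `requeue` is a hypothetical helper and not part of the scripts above:

```shell
# Hypothetical helper: strip the ".done" suffix so the blackhole
# script processes the file again on its next run.
requeue() {
    local f
    for f in "$@"; do
        case "$f" in
            *.pdf.done) mv -- "$f" "${f%.done}" ;;
            *) echo "skipping $f (not a .pdf.done file)" >&2 ;;
        esac
    done
}
```

For example, `requeue "My Long File Name.pdf.done"`. Note that the earlier numbered copy in pdfs_numbered has the same name and will be overwritten when the file is processed again.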