Howto: Convert PDF to ODT/TXT

Post by bester69 » 2016-07-04 17:37

The following script converts any PDF file into readable ODT and TXT files by rendering each page to an image and running OCR (tesseract) on it.

Requirements:
convert (ImageMagick), soffice (LibreOffice), unoconv, tesseract (with the language data for spa, which the script uses)
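
A quick way to confirm that these tools are installed before running the script is to probe for each one with command -v. This is only a sketch; missing_tools is a hypothetical helper name, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the four commands the howto lists as requirements.
missing=$(missing_tools convert soffice unoconv tesseract)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all four commands were found.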

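#!/bin/bash
# Usage: pdf2odt FILE.pdf
[ -f "$1" ] || { echo "Usage: $0 FILE.pdf" >&2; exit 1; }

clear
workfol=$(pwd)                # directory we started in
ruta=$(readlink -f "$1")      # absolute path of the input PDF
ruta2=$(dirname "$ruta")      # folder holding the PDF; the output goes here
filename=$(basename "$1")     # file name without the path, e.g. fich.pdf

echo "Full path is: $ruta"
echo "Name is: $filename"
echo "Folder is: $ruta2"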

# Fixed scratch directory, wiped at the start of every run
tmp="/tmp/tmpxxzy"
rm -rf "$tmp"
mkdir "$tmp"
cd "$tmp" || exit 1

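# Render each page as a 300 dpi, A4-sized JPEG (sal-0.jpg, sal-1.jpg, ...)
convert -units pixelsperinch -density 300x300 -resize 2480x3508 -page a4 "$ruta" sal.jpg
# OCR every page image; "spa" is Spanish, change it to match the document language
find . -name "*.jpg" -exec tesseract {} {} -l spa \;
# Join the per-page text in page order (ls -v sorts sal-2 before sal-10)
cat $(ls -v sal*.txt) > "$filename".txt
soffice --headless --convert-to odt "$filename".txt
#pdfunite *.pdf $1
# Also render the ODT back to PDF; this copy is not moved out of $tmp
unoconv -f pdf "$filename".odt
#mv $1.* ../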
echo "Moving $filename.txt and $filename.odt to $ruta2"
mv "$filename".txt "$ruta2"/
mv "$filename".odt "$ruta2"/

cd "$workfol"/
#rm -rf $tmp
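
The script above works in a fixed directory, /tmp/tmpxxzy, and leaves it behind because the final rm is commented out. As a minimal sketch of an alternative, the helper below (with_scratch_dir, a hypothetical name that is not part of the original script) creates a private directory with mktemp -d, runs a command inside it, and removes it afterwards:

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a fresh scratch directory,
# then delete that directory and pass the command's exit status through.
with_scratch_dir() {
    local tmp status
    tmp=$(mktemp -d) || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$tmp" && "$@" )
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Example: show which directory the command ran in.
with_scratch_dir pwd
```

Anything that must survive, such as the final TXT and ODT files, has to be moved out of the scratch directory by the wrapped command itself, since the directory is gone once the helper returns.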



Usage: pdf2odt fich.pdf creates two files next to the PDF, fich.pdf.txt and fich.pdf.odt
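
The output names keep the .pdf part because the script appends .txt to the full file name. If you would rather get fich.txt and fich.odt, the extension can be stripped with shell parameter expansion. A small sketch, where output_base is a hypothetical helper that the original script does not contain:

```shell
#!/bin/bash
# Hypothetical helper: derive an output base name from the input path
# by dropping the directory and a trailing .pdf or .PDF extension.
output_base() {
    local name
    name=$(basename "$1")
    case "$name" in
        *.pdf|*.PDF) name=${name%.*} ;;
    esac
    printf '%s\n' "$name"
}

output_base "/home/user/docs/fich.pdf"   # prints: fich
```

In the script, using the result in place of "$filename" when naming the .txt file would be one way to apply it.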