pdf text only partially selectable - though it should be
Hi,
I have a pdf with roughly 190 lines of text per page, of which I can only select the first 153. But I need all lines to be selectable.
If I use Chromium's built-in pdf viewer, or Windows, I can select all lines.
Hence I thought this issue might be connected to the default paper size. I changed it between a4 and letter using 'dpkg-reconfigure libpaper1', which did not solve the problem.
Does anyone have a clue what I might do or where else I might find help?
I know that this may not be the ideal place to ask, but I could not come up with another idea.
Thanks in advance,
stillsen
- Emeritus
Re: pdf text only partially selectable - though it should be
stillsen wrote: If I use Chromium's built-in pdf viewer or Windows, I can select all lines.
Which application are you using that only shows 153 lines?
- Soul Singin'
Re: pdf text only partially selectable - though it should be
stillsen wrote: I have a pdf of roughly 190 lines of text per page, in which I can only select the first 153 lines. But I need all lines to be selectable.
Instead of selecting text, first install the poppler-utils package:
Code: Select all
# apt-get install poppler-utils
Then convert the PDF to text:
Code: Select all
$ pdftotext your-file.pdf
Or if you would like to direct the output to another file:
Code: Select all
$ pdftotext your-file.pdf some-other-file.txt
Re: pdf text only partially selectable - though it should be
The tools I have been using to display the pdfs and select text in them are Atril Document Viewer and Okular.
I've tried the poppler approach, which gives me a text file, but the lines from around 153 to the end of each page are missing there too.
This is the pdf in question:
https://static-content.springer.com/esm ... M2_ESM.pdf
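To pin down where the extraction stops, it can help to count how many lines pdftotext actually returns for each page. The sketch below assumes the text arrives on standard input and that pages are separated by form feed characters, which is how pdftotext marks page breaks by default. The function name lines_per_page is made up for this example.

```shell
# Count the non-blank lines on each page of pdftotext-style text read from stdin.
# Assumption: pages are separated by form feed characters (\f).
# lines_per_page is a hypothetical helper name, not part of poppler-utils.
lines_per_page() {
    awk 'BEGIN { RS = "\f" }
    {
        count = 0
        n = split($0, rows, "\n")
        for (i = 1; i <= n; i++)
            if (rows[i] ~ /[^ \t\r]/)
                count++
        # NR is the record number, which corresponds to the page number
        if (count > 0)
            printf "page %d: %d\n", NR, count
    }'
}
```

Something like `pdftotext your-file.pdf - | lines_per_page` should then print one count per page. A page reporting about 153 lines instead of about 190 would confirm that the text is already missing in the extracted output and not just in the viewer.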
- Soul Singin'
Re: pdf text only partially selectable - though it should be
Whoa! That's one huge file.
Because it's a data table of some kind, your task would be much easier if you could obtain the spreadsheet (or other file) that was used to generate it.
I would write to the authors and ask them to share it with you. Tell them what you're working on, why you think their work is important and how you would like to build on it. Who knows? They might say: "Yes."
Good luck!
- Soul
Re: pdf text only partially selectable - though it should be
stillsen wrote: Hi,
I have a pdf of roughly 190 lines of text per page, in which I can only select the first 153 lines. But I need all lines to be selectable.
If I use Chromium's built-in pdf viewer or Windows, I can select all lines.
Hence I thought this issue might be connected to the default paper size. I changed it between a4 and letter using 'dpkg-reconfigure libpaper1', which did not solve the problem.
Does anyone have a clue what I might do or where else I might find help?
I know that this is maybe not the most ideal place to ask, but I could not come up with another idea.
Thanks in advance,
stillsen
I think you could do it like this:
0- extract the range of pages you need:
Code: Select all
pdftk 41540_2018_69_MOESM2_ESM.pdf cat 1-2 output sal1-2.pdf
1- convert those pages to plain text with Ghostscript:
Code: Select all
gs -sDEVICE=txtwrite -dNOPAUSE -dBATCH -sOutputFile=out1-2.txt sal1-2.pdf
2- optionally, convert the text file to odt:
Code: Select all
unoconv -f odt out1-2.txt
or
libreoffice --headless --convert-to odt out1-2.txt
- bester69
Re: pdf text only partially selectable - though it should be
Thank you so much for your help! It's solved now.
Yes, it is a huge file, and I want to convert it into CSV.
I absolutely didn't think of Ghostscript, which did the trick.
On Debian I did not manage to get the correct character encoding, so I switched to Windows and converted the whole pdf using:
Code: Select all
gs -sDEVICE=txtwrite -dNOPAUSE -dBATCH -sOutputFile=out.txt 41540_2018_69_MOESM2_ESM.pdf
BIG THANKS again!
- Soul Singin'
Re: pdf text only partially selectable - though it should be
stillsen wrote: thank you so much for your help! - it's solved now
yes, it is a huge file! - and I want to convert it into csv
I'm glad you got the text. Below is a Perl script that will convert the file to CSV.
You have made me so curious that I even tested it for you. It should work fine. Now could you please tell us what this is?
Code: Select all
#!/usr/bin/env perl
use strict;
use warnings;

## input file -- the output of "gs" command
my $infile = "out.txt";
## output file -- formatted CSV
my $otfile = "formatted.csv";

## open the files for reading and writing
open( my $in,  '<', $infile ) or die "could not open $infile: $!";
open( my $out, '>', $otfile ) or die "could not overwrite $otfile: $!";

## read in the input file and convert it to CSV
while ( my $line = <$in> ) {
    ## remove the newline at the end of the line
    chomp $line;
    ## collapse runs of whitespace and trim both ends
    $line =~ s/\s+/ /g;
    $line =~ s/^ //;
    $line =~ s/ $//;
    ## replace the spaces with commas
    $line =~ s/ /,/g;
    ## if the line starts with text (a header row), add an initial newline
    ## otherwise (a row of floats), add an initial empty column
    $line = ( $line =~ /^[A-Z]/ ) ? "\n" . $line : ',' . $line;
    ## print to the CSV file (adding a newline)
    print {$out} $line . "\n";
}

## close the files
close $in;
close $out or die "could not close $otfile: $!";
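A quick sanity check on the result is to confirm that every data row has as many fields as the header row above it. What follows is only a sketch: check_columns is a made-up name, and it assumes the layout the Perl script produces, that is, header rows starting with a capital letter and data rows starting with a comma.

```shell
# Report data rows whose field count differs from the preceding header row.
# Assumptions: the CSV is read from stdin, header rows start with A-Z,
# and data rows start with a comma. check_columns is a hypothetical helper.
check_columns() {
    awk -F',' '
        BEGIN { bad = 0; expected = 0 }
        /^,*$/   { next }                  # skip blank or comma-only lines
        /^[A-Z]/ { expected = NF; next }   # a header row sets the expected width
        NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit bad }
    '
}
```

Running `check_columns < formatted.csv` should print nothing and exit with status 0 when every row lines up with its header. Any reported line is worth comparing against the PDF by hand.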
- stevepusser
Re: pdf text only partially selectable - though it should be
Just as a point of interest, the free-as-in-beer-but-not-as-in-speech Master PDF Editor seems to be able to copy those lines, if this is the last one:
Code: Select all
FOX3FUS3STR3DOX3TMP3 FOX FUS STR DOX TMP FOX+FUS FOX+STR FOX+DOX FOX+TMP FUS+STR FUS+DOX FUS+TMP STR+DOX STR+TMP DOX+TMP FOX+FUS+STR FOX+FUS+DOX FOX+FUS+TMP FOX+STR+DOX FOX+STR+TMP FOX+DOX+TMP FUS+STR+DOX FUS+STR+TMP FUS+DOX+TMP STR+DOX+TMP FOX+FUS+STR+DOX FOX+FUS+STR+TMP FOX+FUS+DOX+TMP FOX+STR+DOX+TMP FUS+STR+DOX+TMP FOX+FUS+STR+DOX+TMP
95.77191621 92.0480993 83.35919317 84.55559984 97.30968513 101.3576416 87.70364624 95.46159814 66.15593483 63.49883631 118.7354538 125.2521334 101.6679597 107.8743212 113.889996 71.87742436 77.15283165 96.08223429 68.7742436 47.98293251 101.9782777 58.08766486 63.05275407 115.0116369 81.18696664 54.4996121 65.22498061 116.5632273 85.84173778 54.67416602 44.74398759
101.0549983 92.75337254 78.91732964 99.70660147 95.53382233 89.7094431 77.2570045 75.04323763 57.33310273 58.99342788 109.356624 108.8031823 92.47665168 102.4386026 92.92583537 71.44586648 66.46489104 63.97440332 46.54098928 44.88066413 90.81632653 57.88654445 76.98028364 109.356624 89.98616396 46.26426842 56.77966102 101.60844 79.74749222 68.6786579 50.968523
94.69523977 87.10697348 87.10697348 85.48565121 111.9757174 99.75408396 75.58405059 100.8782716 51.13297031 60.1264711 102.0024592 110.99596 90.19848937 93.29000527 108.7380427 79.51870718 79.51870718 56.19181451 66.3095029 27.80607764 95.25733357 53.38134551 67.9957843 115.7737572 88.79325487 47.19831372 54.50553311 98.34884946 74.17881609 64.6232215 62.0937994