Searchable OCR (Very good Solutions!!)

Posted: 2017-09-07 23:27
by bester69
I'm very impressed with Google's OCR engine: it's incredibly fast and accurate :shock: , and best of all, we can use it on Linux to get great results from those
blurry, small pictures that some OCR programs don't always read accurately.

I found two ways:
1- Using Google Docs:
- Upload a picture or a PDF file to Google Drive
- Open it with Google Docs --> the file opens as a document with the recognized text below.

2- My favourite, using Google Keep:
- Drag a picture, or several pictures, into one note or into separate notes
- Select "Grab text from image"
---------------

https://opensource.com/life/15/9/open-s ... ext-images
Google's Optical Character Recognition (OCR) software now works for over 248 world languages (including all the major South Asian languages). It's quite simple and easy to use, and can detect most languages with over 90% accuracy.

The technology extracts text from images, scans of printed text, and even handwriting, which means text can be extracted from pretty much any old books, manuscripts, or images.

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 05:46
by debiman
OCR with 90% accuracy? meaning, every 10th character is wrong?
that's thick, even for the almighty google.
also, enjoy providing the beast with even more information.

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 08:31
by bester69
debiman wrote:OCR with 90% accuracy? meaning, every 10th character is wrong?
that's thick, even for the almighty google.
also, enjoy providing the beast with even more information.


We can't stop it, man!! Quantum D-Wave is already here, and between CERN and that D-Wave, who knows what future is coming. :? :? Google's obscure uses of our information should be the least of our worries.

We might even already be inside a Matrix, as I've already noticed some weird things going on around me. :?
https://www.youtube.com/watch?v=uwl3h8l4NPI

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 16:43
by bdtc1
aptitude search tesseract-ocr
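
For anyone following up on that package: below is a minimal sketch of an off-line run, assuming tesseract-ocr and its English language data are installed. The helper name ocr_to_pdf and the file name scan.png are made up for illustration. The trailing pdf argument is the tesseract config that writes a PDF with a text layer instead of a plain text file.

```shell
#!/bin/sh
# ocr_to_pdf is a hypothetical helper name, not part of tesseract.
# Usage: ocr_to_pdf IMAGE [LANG]
# Writes IMAGE-without-extension.pdf next to the image.
ocr_to_pdf() {
    image=$1
    lang=${2:-eng}        # assumption: English text unless told otherwise
    base=${image%.*}      # strip the extension; tesseract appends .pdf itself
    tesseract "$image" "$base" -l "$lang" pdf
}

# Example (commented out so the file can be sourced safely):
# ocr_to_pdf scan.png        # would write scan.pdf
```

Results will depend a lot on scan quality, so treat this as a starting point rather than a tuned setup.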

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 18:14
by Dai_trying
I generally use ocrfeeder; it works without submitting your work to Google, and in my (not so extensive) use it has been pretty accurate.

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 18:56
by bester69
Dai_trying wrote:I generally use ocrfeeder; it works without submitting your work to Google, and in my (not so extensive) use it has been pretty accurate.


I use ABBYY FineReader through PlayOnLinux (Wine), and it does an outstanding job. I also use tesseract with recollindex to index the text inside pictures and PDFs for searching. It also does a great job.

I'm now loving Google Keep with Android and Chrome: you take a picture of a document and turn it into a note by extracting the text and then removing the picture. It's really great! :D

-----------------------------
There are also some great Android apps we can use for OCR.
I found a good one:
- Adobe Scan ---> It works great.
https://video.tv.adobe.com/v/18742t1/?autoplay=true

It produces a real searchable (OCR'd) PDF: we can upload a scanned document to the cloud, use Adobe Scan to build the OCR'd PDF, and then download the result to our computer.

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 20:52
by bester69
Here is an OCR'd PDF I made with Adobe Scan in just two minutes.
You will see the text is in the background layer:
https://drive.google.com/open?id=0B-1Wr ... mY2N1g5azg


Steps to get the OCR done:
1- Create a folder and put all the pictures inside (limit of 25 files per document)
pdftoppm -png -aa yes -r 300 document_forOCRscan.pdf outfile

2- Upload the folder with all the pictures to Google Drive

3- Go to the Adobe Scan app,
- Select Google Drive as the source --> pick the uploaded folder --> select all the PNG files
- Save as a PDF file

4- Merge all the PDF files (Adobe Scan is limited to 25 pages per file in the free version)
pdftk *.pdf cat output finaldoc.pdf

Done. :D
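
The 25-file limit means a long document has to be split into batches before uploading. Here is a rough sketch of that step, assuming the PNG file names sort in page order. The function name batch_pages and the batch_NNN folder names are hypothetical and not part of pdftoppm or pdftk.

```shell
#!/bin/sh
# batch_pages is a hypothetical helper: it moves the PNG files found in a
# folder into numbered subfolders holding at most BATCH_SIZE files each.
# Usage: batch_pages DIR [BATCH_SIZE]   (prints the number of batches made)
batch_pages() {
    src_dir=$1
    batch_size=${2:-25}   # 25 = the free-version limit mentioned above
    count=0
    batch=0
    # The glob expands in sorted order, so this relies on file names that
    # sort in page order (assumption).
    for f in "$src_dir"/*.png; do
        [ -e "$f" ] || continue   # the folder holds no PNG files
        if [ $((count % batch_size)) -eq 0 ]; then
            batch=$((batch + 1))
            mkdir -p "$src_dir/batch_$(printf '%03d' "$batch")"
        fi
        mv "$f" "$src_dir/batch_$(printf '%03d' "$batch")/"
        count=$((count + 1))
    done
    echo "$batch"
}

# Example (commented out): split the pictures in ./pages into folders of 25
# batch_pages ./pages
```

Each batch folder can then be uploaded and turned into its own PDF, and the per-batch PDFs merged with pdftk as in step 4.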

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 21:09
by Dai_trying
It does sound very good, but I will stick to using my off-line version and retain some privacy on my data. :D

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-08 21:17
by 4D696B65
Dai_trying wrote:It does sound very good, but I will stick to using my off-line version and retain some privacy on my data. :D

+1

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-09 00:12
by bester69
4D696B65 wrote:
Dai_trying wrote:It does sound very good, but I will stick to using my off-line version and retain some privacy on my data. :D

+1


I downloaded a PDF ebook with no text layer from aMule, and I used Adobe Scan for OCR:
- It has a limit of 25 pages per document, so I created folders of 25 pages and then used pdftk to merge the resulting files.

Here, check the professional result I got with Adobe Scan; I uploaded a 25-page file (the limit of the free version):
https://drive.google.com/open?id=0B-1Wr ... UZTdjI2Zkk

8)

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-09 00:21
by bester69
Dai_trying wrote:It does sound very good, but I will stick to using my off-line version and retain some privacy on my data. :D


I'm afraid there is no Linux/off-line version that does text-mapping OCR, and the one app that I think tries to map the text does an awful job. In short, Linux still lacks an app with text-mapping OCR.

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-09 06:32
by debiman
bester69 wrote:
debiman wrote:OCR with 90% accuracy? meaning, every 10th character is wrong?
that's thick, even for the almighty google.
also, enjoy providing the beast with even more information.


We can't stop it, man!! Quantum D-Wave is already here, and between CERN and that D-Wave, who knows what future is coming. :? :? Google's obscure uses of our information should be the least of our worries.

We might even already be inside a Matrix, as I've already noticed some weird things going on around me. :?
https://www.youtube.com/watch?v=uwl3h8l4NPI

quoted for posterity, before OP changes his/her mind and edits it.
:lol:

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-09 12:53
by alan stone
bester69 wrote: who knows what future is coming. :? :?

Unless there are fundamental changes, Silly Valley and the rest of Big Tech will end up as repugnant and despised as Wall Street.

Re: Google OCR Solution (Very good!!)

Posted: 2017-09-09 15:45
by bester69
I finally found the best and most accessible solutions on Linux for producing searchable OCR'd PDFs:


- Master PDF Editor (native app, free version) -> It includes searchable OCR in the free version. It gets very good results, but the alignment of the text layer is still not perfect: when you copy a few paragraphs and paste them into a text editor, some lines still don't come out properly. But for use as a searchable document, it does a great job.

- Adobe Scan (mobile app, Android)
Being an Adobe app, it does a professional job; the problem is that you need a smartphone to use it. The free version is limited to 25 pages per document. We can scan our documents on Linux, upload them to the cloud and use Adobe Scan to get the searchable OCR'd PDFs, then merge the final document on Linux with pdftk. Perfect text-layer alignment, good enough for copy/paste.

- OCRmyPDF (command line)
https://ocrmypdf.readthedocs.io/en/late ... notes.html
It gets the job done, but my testing showed very bad alignment of the text layer, much worse than "Master PDF Editor", so it's not good enough for copy/pasting text.

- PDF-XChange Viewer
I definitely recommend this app: this small Windows app includes a free OCR that does a professional job. It runs "Gold" on any or most Wine versions, so you won't have any problem installing it. The alignment it gets is finally perfect (much better than "Master PDF Editor"); you will be able to copy/paste paragraphs with the words correctly aligned.

So, right now, for me the best accessible searchable-OCR solution on Linux is PDF-XChange Viewer (Wine) or, failing that, Master PDF Editor. And for a professional job I would use Adobe Scan.
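
For the command-line route, here is a minimal OCRmyPDF sketch, assuming ocrmypdf and the English tesseract data are installed. The helper name add_text_layer and the file names are made up for illustration. The -l and --deskew flags are regular ocrmypdf options (OCR language and straightening of crooked scans).

```shell
#!/bin/sh
# add_text_layer is a hypothetical helper name, not part of ocrmypdf.
# Usage: add_text_layer INPUT.pdf OUTPUT.pdf [LANG]
add_text_layer() {
    in_pdf=$1
    out_pdf=$2
    lang=${3:-eng}   # assumption: English text unless told otherwise
    ocrmypdf -l "$lang" --deskew "$in_pdf" "$out_pdf"
}

# Example (commented out so the file can be sourced safely):
# add_text_layer scanned.pdf searchable.pdf
# pdftotext searchable.pdf - | head   # quick look at the recognized text
```

The pdftotext check comes from the same poppler tools as pdftoppm and is a quick way to judge how usable the text layer is for copy/paste.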

Re: Searchable OCR (Very good Solutions!!)

Posted: 2017-09-09 16:18
by Dai_trying
Did you try ocrfeeder?