[SOLVED] Bash: How to process a text file to fixed qty words


Postby makh » 2019-03-25 22:39

Hi

I don't have much understanding of awk/sed!

I have a text file with lots of sentences / paragraphs. The words are separated by single spaces. ... such that: the new file should have at most 8 words (fewer if the sentence ends) on one line.

Please also tell me: would it be possible to somehow "cat" this formatted text into an .odt file, the way one can with plain text files? I actually want to use the data in Writer for the next task. I tried, but the .odt file got corrupted!

Thanks in advance!
Last edited by makh on 2019-03-28 18:11, edited 1 time in total.
HP Probook 440 G2: Arch, Debian Stable
Server: none
Past: Debian, Centos, Ubuntu, Opensuse
GUI: Openbox, Cinnamon
Chroot: Debian, Ubuntu, Fedora
VM: Devuan

Employing the best:
Arabic
Debian
Homeopathic

For new: Try Linux Mint
makh
 
Posts: 638
Joined: 2011-10-09 09:16

Re: Bash: How to process a text file to only have fixed word

Postby neuraleskimo » 2019-03-26 00:43

As I recall, an .odt file is a zip archive.

Do the following...
Code: Select all
mkdir contents
cp your_file.odt contents/
cd contents
unzip your_file.odt

Your text should be in content.xml.

Here are some possibilities:
1) Unzip the file, edit content.xml, and then zip the contents again.
2) Because you want to use Writer, just write your data to a text file and import it.
3) Find a text/markdown/xml/html to odt converter. Write the file in that format and then convert it to odt.

Hope this helps...
neuraleskimo
 
Posts: 102
Joined: 2019-03-12 23:26
Location: Bloomington, Indiana, USA

Re: Bash: How to process a text file to only have fixed word

Postby makh » 2019-03-26 13:29

Hi

Thanks a lot, half of the issue is resolved: I found unoconv to convert text to the odt format.

But I still need to process it to 8 words per line.

Thank you

Re: Bash: How to process a text file to only have fixed word

Postby ralph.ronnquist » 2019-03-26 21:12

You could use fold for breaking it up at a certain column
Code: Select all
man fold

but I don't know off-hand of a program that breaks lines by word count.
Though, I suppose it'd not be a bad exercise for a sed freak.
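
To illustrate the difference: fold wraps at a column width, so the number of words per line varies with word length. A small sketch, with made-up sample text and an arbitrary width of 20 columns:

```shell
#!/bin/bash
# Sample text (made up for illustration).
text='the quick brown fox jumps over the lazy dog and keeps running far away'

# fold wraps at a column width (-w), not at a word count;
# -s makes it break at spaces instead of in the middle of a word.
wrapped=$(printf '%s\n' "$text" | fold -s -w 20)
printf '%s\n' "$wrapped"
```

Each output line is at most 20 columns wide, but how many words fit on a line depends on how long they are, so this is not the same as a fixed word count.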
ralph.ronnquist
 
Posts: 320
Joined: 2015-12-19 01:07
Location: Melbourne, Australia

Re: Bash: How to process a text file to only have fixed word

Postby neuraleskimo » 2019-03-27 00:50

makh wrote:But I still need to process it to 8 words per line.


Sorry, I misread that as context and the second part as the question...

I have been super busy today and can't give you the whole solution, but here is a start:
Code: Select all
cat <filename> | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n'

This code can be shortened, but I wrote it this way so you can break-apart the pieces to see each step work.

In plain English, this code says:
Given a file,
1) replace each newline with a space,
2) replace each tab with a space,
3) replace the punctuation characters , . ; : with spaces,
4) squeeze each run of spaces into a single space,
5) convert all words to lower case, and
6) replace each space with a newline.
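
To see the six steps at work, here is the same pipeline run on a small made-up sample instead of a file (the sample text and the variable names are only for illustration):

```shell
#!/bin/bash
# Made-up sample input: two lines, a tab, mixed case and punctuation.
sample=$(printf 'Hello, World.\nFoo\tBar; baz')

words=$(printf '%s\n' "$sample" \
    | tr '\n' ' ' \
    | tr '\t' ' ' \
    | sed -e 's/[,.;:]/ /g' \
    | tr -s ' ' \
    | tr '[:upper:]' '[:lower:]' \
    | tr ' ' '\n')

# Prints hello, world, foo, bar and baz, one word per line.
printf '%s\n' "$words"
```

Removing stages from the end of the pipeline one at a time shows what each step contributes.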

After running this pipeline, you will have one word per line. The next task, which I don't have time to write until tomorrow evening, is to loop over the list of words and write them eight to a line. There are several ways to do that: 1) use awk, 2) write a shell script, 3) use Python, or 4) use another favorite language. Tomorrow night I will try to write a bash script.

If you want to take a shot at the script, it is fairly straightforward:
1) read stdin into a while loop,
2) skip any blank lines,
3) keep a counter:
3a) at 7, printf "%s\n" word and reset the counter to 0
3b) otherwise, increment the counter and printf "%s " word
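
For comparison, the awk route mentioned above could follow the same counter idea. This is only a sketch: group_words is a made-up helper name, and it assumes the input already has one word per line.

```shell
#!/bin/bash
# group_words N: read one word per line on stdin, print N words per line.
# (group_words is a hypothetical helper name, not part of any tool.)
group_words() {
    awk -v n="$1" '
        NF == 0 { next }    # skip blank lines
        {
            # separator before the word: nothing at the very start,
            # a newline after every n-th word, otherwise a space
            printf "%s%s", (count % n ? " " : (count ? "\n" : "")), $1
            count++
        }
        END { if (count) print "" }    # terminate the last line
    '
}

# Ten made-up words, eight to a line: prints "a b c d e f g h" and then "i j".
printf '%s\n' a b c d e f g h i j | group_words 8
```

Printing the separator before each word rather than after it avoids a trailing space on a short last line.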

Hope this helps...

Re: Bash: How to process a text file to only have fixed word

Postby neuraleskimo » 2019-03-28 00:45

neuraleskimo wrote:
makh wrote:But I still need to process it to 8 words per line.

I have been super busy today and can't give you the whole solution, but here is a start:
Code: Select all
cat <filename> | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n'

Tomorrow night I will try to write a bash script.


@makh Here is a bash script that will take the list generated above and print eight words per line.
Code: Select all
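#!/bin/bash

idx=0
while read -r WORD
do
    # skip blank lines
    if [ -n "$WORD" ]
    then
        if [ "$idx" -eq 7 ]
        then
            # eighth word: end the line and reset the counter
            printf "%s\n" "$WORD"
            idx=0
        else
            printf "%s " "$WORD"
            idx=$((idx + 1))
        fi
    fi
done </dev/stdin
# finish a partial last line
[ "$idx" -eq 0 ] || printf "\n"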

Assuming I understood the problem, that should give you everything you need. I hope it helps.
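
One detail from the original request is not covered by this approach: ending a line early when a sentence ends. Because the pipeline strips punctuation, that information is gone by the time the words are grouped. A rough sketch that works on the original text instead, assuming a sentence ends with a word whose last character is '.', '?' or '!' (wrap_sentences is a made-up name, and other scripts may use different sentence marks):

```shell
#!/bin/bash
# wrap_sentences N: print at most N words per line, and also end the line
# after any word ending in '.', '?' or '!' (assumed sentence enders).
# (wrap_sentences is a hypothetical helper name.)
wrap_sentences() {
    awk -v n="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # separator before the word: nothing at line start, else a space
                printf "%s%s", (count ? " " : ""), $i
                count++
                if (count == n || $i ~ /[.?!]$/) {
                    print ""    # end the output line
                    count = 0
                }
            }
        }
        END { if (count) print "" }    # terminate a partial last line
    '
}

# Made-up sample: breaks after "three." and again after the eighth word.
printf 'One two three. Four five six seven eight nine ten eleven twelve\n' | wrap_sentences 8
```

This version keeps case and punctuation as they are in the input, so it replaces the whole pipeline rather than following it.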

By the way, that was a fun puzzle. I am curious about the bigger picture. However, if it is top-secret, you don't need to share. Thanks for the challenge!

Re: Bash: How to process a text file to only have fixed word

Postby makh » 2019-03-28 17:37

neuraleskimo wrote:@neuraleskimo ...

Hi
I have understood your bash command; it works perfectly as you arranged it. It also seems to work correctly on my local language (Urdu)!

Let me see how to integrate it into the bash script.

As for your curiosity: well, I need to format my data into 8 columns, then move it to an odt file, then make tables of the same data, and later add an explanation of that data, word by word, in the rows below.

Thanks a lot!

Re: Bash: How to process a text file to only have fixed word

Postby makh » 2019-03-28 18:10

Hi

With some edits, as required:

Code: Select all
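#!/bin/bash

# Split test_2.txt into one lower-case word per line.
cat test_2.txt | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n' > list_3.txt

# Print the words eight to a line.
idx=0
while read -r WORD
do
    if [ -n "$WORD" ]
    then
        if [ "$idx" -eq 7 ]
        then
            printf "%s\n" "$WORD"
            idx=0
        else
            printf "%s " "$WORD"
            idx=$((idx + 1))
        fi
    fi
done <list_3.txt
# Finish a partial last line.
[ "$idx" -eq 0 ] || printf "\n"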



Thank you all for your kind help and support!
:)

Re: [SOLVED] Bash: How to process a text file to fixed qty w

Postby pylkko » 2019-03-28 19:28

Are you some kind of IT-manager at an office, as you have so many questions related to batch modifying text documents, if I may ask?
pylkko
 
Posts: 1567
Joined: 2014-11-06 19:02

Re: Bash: How to process a text file to only have fixed word

Postby neuraleskimo » 2019-03-28 21:11

makh wrote:With some edits, as required:
Code: Select all
...


Very good!
Thankyou all for your kind help and support!
:)

No problem at all. I am happy to help.

Re: [SOLVED] Bash: How to process a text file to fixed qty w

Postby makh » 2019-03-29 11:26

pylkko wrote:Are you some kind of IT-manager at an office, as you have so many questions related to batch modifying text documents, if I may ask?

Hi
No Sir! Actually I have started a welfare campaign to teach different courses to people, so I needed to prepare presentations etc. (you can understand the detailed inside kernel operations).

Right now I am Every-Task-Manager in my "Home-Office". :wink:

Thank you

