[SOLVED] Bash: How to process a text file to fixed qty words

makh
Posts: 651
Joined: 2011-10-09 09:16

[SOLVED] Bash: How to process a text file to fixed qty words

#1 Post by makh »

Hi

I don't have much understanding of awk/sed!

I have a text file with lots of sentences / paragraphs. The words are separated by single spaces. ... such that: the new file should have only 8 words (or fewer if the sentence ends) on each line.

Please also tell me: would it be possible to somehow "cat" this formatted text into an .odt file, as with text files? I actually want to use the data in Writer for the next task. I tried, but the .odt file got corrupted!

Thanks in advance!
Last edited by makh on 2019-03-28 18:11, edited 1 time in total.
ThinkPad E14: Arch, Debian Stable
GUI: Xfce

For new: Try MX Linux, Linux Mint; later join Debian Stable

neuraleskimo
Posts: 195
Joined: 2019-03-12 23:26

Re: Bash: How to process a text file to only have fixed word

#2 Post by neuraleskimo »

As I recall, an odt file is a zip archive.

Do the following...

Code: Select all

mkdir contents
cp your_file.odt contents/
cd contents
unzip your_file.odt
Your text should be in content.xml.

Here are some possibilities:
1) Unzip the file, edit content.xml, and then zip the file.
2) Because you want to use Writer, just write your data to a text file and import it.
3) Find a text/markdown/xml/html to odt converter. Write the file in that format and then convert to odt.
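
For option 1, a rough sketch of the round trip might look like the following. It assumes Info-ZIP's zip and unzip and GNU sed are installed; repack_odt and its three arguments are names made up for illustration. Because an .odt is a zip archive, appending plain text to it with cat corrupts it, so the text has to go into content.xml instead. As far as I know, the OpenDocument packaging rules also want the mimetype entry stored first and uncompressed, which is why it is added separately.

```shell
#!/bin/bash
# Sketch only: unpack an .odt, edit content.xml with sed, and repack it.
# repack_odt is a hypothetical helper, not part of any existing tool.
# Arguments: source .odt, destination .odt, sed expression for content.xml.
repack_odt() (
    src=$1
    dst=$2
    sed_expr=$3

    # Make the output path absolute, because zip runs inside the work directory.
    case $dst in
        /*) ;;
        *) dst=$PWD/$dst ;;
    esac

    work=$(mktemp -d) || exit 1
    unzip -q "$src" -d "$work" || exit 1

    # Edit the document body in place (GNU sed syntax).
    sed -i "$sed_expr" "$work/content.xml" || exit 1

    rm -f "$dst"
    cd "$work" || exit 1
    # Store mimetype first and uncompressed (-0), without extra fields (-X).
    zip -q -0 -X "$dst" mimetype || exit 1
    # Add everything else, skipping the mimetype entry already stored.
    zip -q -r -X "$dst" * -x mimetype || exit 1
    cd / && rm -rf "$work"
)

# Usage (file names are placeholders):
# repack_odt your_file.odt new_file.odt 's/old text/new text/'
```

Editing XML with sed is fragile, so this is only reasonable for simple text substitutions; options 2 and 3 are safer for anything larger.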

Hope this helps...

makh
Posts: 651
Joined: 2011-10-09 09:16

Re: Bash: How to process a text file to only have fixed word

#3 Post by makh »

Hi

Thanks a lot, half of the issue is resolved: I found unoconv, which converts text to odt format.

But I still need to process it down to 8 words per line.

Thank you

ralph.ronnquist
Posts: 342
Joined: 2015-12-19 01:07
Location: Melbourne, Australia
Been thanked: 6 times

Re: Bash: How to process a text file to only have fixed word

#4 Post by ralph.ronnquist »

You could use fold for breaking it up at a certain column

Code: Select all

man fold
but I don't know off-hand of a program that breaks lines by word count.
Though, I suppose it'd not be a bad exercise for a sed freak.
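
To illustrate the difference, here is a small sketch of what fold does. The sample sentence and the width of 30 are made up; the point is only that fold limits the number of columns per line, so the word count per line varies with word length.

```shell
#!/bin/bash
# Made-up sample sentence for illustration.
text='the quick brown fox jumps over the lazy dog and then keeps on running far away'

# -w sets the maximum line width in columns;
# -s breaks at spaces rather than in the middle of a word.
printf '%s\n' "$text" | fold -s -w 30
```

Each output line is at most 30 columns wide, however many words that happens to be.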

neuraleskimo
Posts: 195
Joined: 2019-03-12 23:26

Re: Bash: How to process a text file to only have fixed word

#5 Post by neuraleskimo »

makh wrote: But I still need to process it down to 8 words per line.
Sorry, I misread that as context and the second part as the question...

I have been super busy today and can't give you the whole solution, but here is a start:

Code: Select all

cat <filename> | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n'
This code can be shortened, but I wrote it this way so you can break apart the pieces and see each step work.

In plain English, this code says,
Given a file,
1) remove the new lines (convert each to a space),
2) remove the tabs (convert each to a space),
3) remove the punctuation characters , . ; and : (convert each to a space),
4) squeeze each run of multiple spaces into a single space,
5) convert all words to lower case, and
6) replace each space with a new line.
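
To try the steps on a small input, the same pipeline can be wrapped in a function that reads stdin instead of a file. This is only a sketch: one_word_per_line is a made-up name and the sample text is invented.

```shell
#!/bin/bash
# one_word_per_line is a hypothetical wrapper around the pipeline above;
# it reads text on stdin and writes one lower-case word per line.
one_word_per_line() {
    tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' |
        tr '[:upper:]' '[:lower:]' | tr ' ' '\n'
}

# Invented sample input: two lines, a tab, and some punctuation.
printf 'Hello, World.\nFoo\tbar\n' | one_word_per_line
```

With GNU tools this should print hello, world, foo and bar, each on its own line.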

After running this pipeline, you will have one word per line. The next task, which I don't have time to write until tomorrow evening, is to loop over the list of words and write them eight to a line. There are several ways to do that: 1) use awk, 2) write a shell script, 3) use Python, or 4) use another favorite language. Tomorrow night I will try to write a bash script.
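
For the awk route, a rough sketch might look like this. It assumes the words arrive whitespace-separated on stdin; eight_per_line is just a name made up here, and the count of 8 is passed in as the awk variable n.

```shell
#!/bin/bash
# eight_per_line is a hypothetical helper: it reads whitespace-separated
# words on stdin and prints them n to a line (n is set to 8 here).
eight_per_line() {
    awk -v n=8 '
        {
            for (i = 1; i <= NF; i++) {
                # Before every word except the first, print a separator:
                # a newline after each n-th word, otherwise a space.
                if (count > 0)
                    printf "%s", (count % n == 0 ? "\n" : " ")
                printf "%s", $i
                count++
            }
        }
        # Terminate the final line, which may hold fewer than n words.
        END { if (count > 0) printf "\n" }
    '
}

# Ten numbers stand in for ten words.
seq 1 10 | eight_per_line
```

The demo prints the numbers 1 to 8 on the first line and 9 10 on the second. Printing the separator before each word, rather than after it, avoids a trailing space at the end of a short last line.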

If you want to take a shot at the script, it is fairly straightforward:
1) read stdin into a while loop,
2) skip any blank lines,
3) keep a counter:
3a) at 7, printf "%s\n" word and reset the counter to 0
3b) otherwise, increment the counter and printf "%s " word

Hope this helps...

neuraleskimo
Posts: 195
Joined: 2019-03-12 23:26

Re: Bash: How to process a text file to only have fixed word

#6 Post by neuraleskimo »

neuraleskimo wrote:
makh wrote: But I still need to process it down to 8 words per line.
I have been super busy today and can't give you the whole solution, but here is a start:

Code: Select all

cat <filename> | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n'
Tomorrow night I will try to write a bash script.
@makh Here is a bash script that will take the list generated above and print eight words per line.

Code: Select all

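#!/bin/bash

# Read one word per line from stdin and print the words eight to a line.
idx=0
while read -r WORD
do
    # Skip blank lines.
    if [ -n "$WORD" ]
    then
        if [ "$idx" -eq 7 ]
        then
            # Eighth word: end the line and reset the counter.
            printf "%s\n" "$WORD"
            idx=0
        else
            printf "%s " "$WORD"
            idx=$((idx + 1))
        fi
    fi
done </dev/stdin

# End the last line if it holds fewer than eight words.
if [ "$idx" -ne 0 ]
then
    printf "\n"
fi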
Assuming I understood the problem, that should give you everything you need. I hope it helps.

By the way, that was a fun puzzle. I am curious about the bigger picture. However, if it is top-secret, you don't need to share. Thanks for the challenge!

makh
Posts: 651
Joined: 2011-10-09 09:16

Re: Bash: How to process a text file to only have fixed word

#7 Post by makh »

neuraleskimo wrote:@neuraleskimo ...
Hi
I have understood your bash command; it works perfectly as you arranged it. It also seems to work correctly on my local language (Urdu)!

Let me see how to integrate it into the bash script.

As for your curiosity: I need to format my data into 8 columns, then move it to an odt file, then make tables of the same data, and later add an explanation of that data, word by word, in rows below.

Thanks a lot!

makh
Posts: 651
Joined: 2011-10-09 09:16

Re: Bash: How to process a text file to only have fixed word

#8 Post by makh »

Hi

With some edits, as required:

Code: Select all

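#!/bin/bash

# Normalise the text to one lower-case word per line.
cat test_2.txt | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n' > list_3.txt

# Print the words eight to a line.
idx=0
while read -r WORD
do
    # Skip blank lines.
    if [ -n "$WORD" ]
    then
        if [ "$idx" -eq 7 ]
        then
            printf "%s\n" "$WORD"
            idx=0
        else
            printf "%s " "$WORD"
            idx=$((idx + 1))
        fi
    fi
done <list_3.txt

# End the last line if it holds fewer than eight words.
if [ "$idx" -ne 0 ]
then
    printf "\n"
fi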
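
The intermediate list_3.txt is optional. As a sketch, the same processing can be written as one pipeline that feeds the loop directly; format_words is a made-up name, and the function is assumed to read the text on stdin.

```shell
#!/bin/bash
# format_words is a hypothetical name; it reads text on stdin and
# writes lower-case words, eight per line, to stdout.
format_words() {
    tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;]/ /g' | tr -s ' ' |
        tr '[:upper:]' '[:lower:]' | tr ' ' '\n' |
        {
            idx=0
            while read -r word
            do
                # Skip blank lines.
                [ -n "$word" ] || continue
                if [ "$idx" -eq 7 ]
                then
                    printf '%s\n' "$word"
                    idx=0
                else
                    printf '%s ' "$word"
                    idx=$((idx + 1))
                fi
            done
            # End the last line if it holds fewer than eight words.
            if [ "$idx" -ne 0 ]
            then
                printf '\n'
            fi
        }
}

# Usage, with the file name from the script above:
# format_words < test_2.txt
```

The counter lives inside the braces at the end of the pipeline, so the final newline check has to sit there as well.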

Thank you all for your kind help and support!
:)

pylkko
Posts: 1802
Joined: 2014-11-06 19:02

Re: [SOLVED] Bash: How to process a text file to fixed qty w

#9 Post by pylkko »

Are you some kind of IT-manager at an office, as you have so many questions related to batch modifying text documents, if I may ask?

neuraleskimo
Posts: 195
Joined: 2019-03-12 23:26

Re: Bash: How to process a text file to only have fixed word

#10 Post by neuraleskimo »

makh wrote:With some edits, as required:

Code: Select all

...
Very good!
Thank you all for your kind help and support!
:)
No problem at all. I am happy to help.

makh
Posts: 651
Joined: 2011-10-09 09:16

Re: [SOLVED] Bash: How to process a text file to fixed qty w

#11 Post by makh »

pylkko wrote:Are you some kind of IT-manager at an office, as you have so many questions related to batch modifying text documents, if I may ask?
Hi
No Sir! Actually I have started a welfare campaign to teach different courses to people, so I needed to prepare presentations etc. (you can understand the detailed inside kernel operations).

Right now I am the Every-Task-Manager in my "Home-Office". :wink:

Thank you