[SOLVED] Bash: How to process a text file to fixed qty words
Hi
I don't have much understanding of awk/sed!
I have a text file with lots of sentences / paragraphs. The words are separated by single spaces. ... such that: the new file should have only 8 words (or fewer if the sentence ends) on one line.
Please also tell me: would it be possible to somehow "cat" this formatted text into the .odt file format, as with text files? I actually want to use the data in Writer for the next task. I tried, but the .odt file got corrupted!
Thanks in advance!
Last edited by makh on 2019-03-28 18:11, edited 1 time in total.
ThinkPad E14: Arch, Debian Stable
GUI: Xfce
For new: Try MX Linux, Linux Mint; later join Debian Stable
-
- Posts: 195
- Joined: 2019-03-12 23:26
Re: Bash: How to process a text file to only have fixed word
As I recall, an odt file is a zip archive.
Do the following...
Code: Select all
mkdir contents
cp your_file.odt contents/
cd contents
unzip your_file.odt
Your text should be in content.xml.
Here are some possibilities:
1) Unzip the file, edit content.xml, and then zip the file.
2) Because you want to use Writer, just write your data to a text file and import it.
3) Find a text/markdown/xml/html to odt converter. Write the file in that format and then convert to odt.
Hope this helps...
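For option 1, note that an .odt file is a zip package rather than plain text, which is likely why writing text straight into it with cat gives a corrupt file. Below is a minimal sketch of the "zip the file" step. It assumes the Info-ZIP zip tool is installed, and repack_odt is a hypothetical helper name, not part of any existing tool. The OpenDocument packaging rules expect the mimetype entry to come first and to be stored uncompressed, so it is added on its own before everything else:

```shell
#!/bin/bash
# repack_odt DIR OUT.odt  (hypothetical helper, not part of any tool)
# Zips the unpacked document in DIR back into OUT.odt.
repack_odt() {
    local src=$1 out=$2
    # zip runs inside DIR, so make the output path absolute first
    case $out in
        /*) ;;
        *) out=$PWD/$out ;;
    esac
    rm -f "$out"
    (
        cd "$src" || exit 1
        # mimetype first: -0 stores it uncompressed, -X drops extra attributes
        zip -q -0 -X "$out" mimetype &&
        # then everything else, compressed as usual
        zip -q -r -X "$out" * -x mimetype
    )
}
```

With the directory layout from the commands above, one would first delete the copied your_file.odt from contents (otherwise it ends up inside the new archive) and then run something like: repack_odt contents new_file.odt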
Re: Bash: How to process a text file to only have fixed word
Hi
Thanks a lot, half of the issue is resolved: I found unoconv to convert text to odt format.
But I still need to process the text into 8 words per line.
Thank you
- ralph.ronnquist
- Posts: 342
- Joined: 2015-12-19 01:07
- Location: Melbourne, Australia
- Been thanked: 6 times
Re: Bash: How to process a text file to only have fixed word
You could use fold for breaking it up at a certain column
but I don't know off-hand of a program that breaks lines by word count.
Though, I suppose it'd not be a bad exercise for a sed freak.
Code: Select all
man fold
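To illustrate the difference: fold wraps at a column width, not at a word count. A small sketch, where the sample sentence and the width of 20 are arbitrary choices made for the illustration:

```shell
# fold -s breaks each line at the last blank that fits within the
# width given by -w, so the number of words per line depends on
# how long the words are, not on a fixed count.
text='one two three four five six seven eight nine ten'
printf '%s\n' "$text" | fold -s -w 20
```

Every output line stays within 20 columns, but short words pack more to a line than long ones, which is why fold alone cannot give exactly 8 words per line.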
-
- Posts: 195
- Joined: 2019-03-12 23:26
Re: Bash: How to process a text file to only have fixed word
makh wrote: But still I need to process it to process to 8 words per line.
Sorry, I misread that as context and the second part as the question...
I have been super busy today and can't give you the whole solution, but here is a start:
Code: Select all
cat <filename> | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n'
In plain English, this code says,
Given a file,
1) convert each newline to a space,
2) convert each tab to a space,
3) convert the punctuation marks , . ; : to spaces,
4) squeeze each run of spaces into a single space,
5) convert all words to lower case, and
6) replace each space with a newline.
After running this pipeline, you will have one word per line. The next task, which I don't have time to write until tomorrow evening, is to loop over the list of words and write them eight to a line. There are several ways to do that: 1) use awk, 2) write a shell script, 3) use Python, or 4) use another favorite language. Tomorrow night I will try to write a bash script.
If you want to take a shot at the script, it is fairly straightforward:
1) read stdin into a while loop,
2) skip any blank lines,
3) keep a counter, starting at 0:
3a) at 7, printf "%s\n" word and reset the counter to 0,
3b) otherwise, increment the counter and printf "%s " word.
Hope this helps...
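Of the options listed above, the awk route could look like the sketch below. It reads the one-word-per-line output of the pipeline (or any whitespace-separated text) on stdin. wrap_words is a hypothetical function name chosen for this example, and 8 is only its default word count.

```shell
#!/bin/bash
# wrap_words [N]: print the words read from stdin, N per line (default 8).
# Hypothetical helper name; the awk program does the actual work.
wrap_words() {
    awk -v n="${1:-8}" '
        {
            for (i = 1; i <= NF; i++) {
                # separator before each word: nothing for the very first
                # word, a newline after every n-th word, a space otherwise
                printf "%s%s", (count % n ? " " : (count ? "\n" : "")), $i
                count++
            }
        }
        # terminate the last line, which may hold fewer than n words
        END { if (count) print "" }
    '
}

printf 'one two three four five six seven eight nine ten\n' | wrap_words 8
# one two three four five six seven eight
# nine ten
```

Appending "| wrap_words 8" to the pipeline above would give eight words per line. Because awk splits on any run of whitespace, blank lines in the input are skipped without a separate check.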
-
- Posts: 195
- Joined: 2019-03-12 23:26
Re: Bash: How to process a text file to only have fixed word
neuraleskimo wrote:
makh wrote: But still I need to process it to process to 8 words per line.
I have been super busy today can't give you all of the solution, but here is a start:
Code: Select all
cat <filename> | tr '\n' ' ' | tr '\t' ' ' | sed -e 's/[,.;:]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n'
Tomorrow night I will try to write a bash script.
@makh Here is a bash script that will take the list generated above and print eight words per line.
Code: Select all
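#!/bin/bash
# Read one word per line from stdin and print eight words per output line.
idx=0
while read -r WORD
do
    # skip blank lines
    if [ -n "$WORD" ]
    then
        if [ "$idx" -eq 7 ]
        then
            # eighth word: end the line and reset the counter
            printf "%s\n" "$WORD"
            idx=0
        else
            printf "%s " "$WORD"
            idx=$((idx + 1))
        fi
    fi
done
# finish a last line that has fewer than eight words
if [ "$idx" -ne 0 ]
then
    printf "\n"
fi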
By the way, that was a fun puzzle. I am curious about the bigger picture. However, if it is top-secret, you don't need to share. Thanks for the challenge!
Re: Bash: How to process a text file to only have fixed word
neuraleskimo wrote: @neuraleskimo ...
Hi
I have understood your bash command; it works perfectly as you arranged it. It also seems to work correctly on my local language (Urdu)!
Let me see how to integrate it into the bash script.
As for your curiosity: I need to format my data into 8 columns, then move it to an odt file, then make tables of the same data, and later add an explanation of that data, word by word, in the rows below.
Thanks a lot!
Re: Bash: How to process a text file to only have fixed word
Hi
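With some edits, as required:
Code: Select all
#!/bin/bash
# Step 1: split test_2.txt into one lower-case word per line (list_3.txt).
tr '\n' ' ' < test_2.txt | tr '\t' ' ' | sed -e 's/[,.;]/ /g' | tr -s ' ' | tr "[:upper:]" "[:lower:]" | tr ' ' '\n' > list_3.txt
# Step 2: print the words eight to a line.
idx=0
while read -r WORD
do
    # skip blank lines
    if [ -n "$WORD" ]
    then
        if [ "$idx" -eq 7 ]
        then
            # eighth word: end the line and reset the counter
            printf "%s\n" "$WORD"
            idx=0
        else
            printf "%s " "$WORD"
            idx=$((idx + 1))
        fi
    fi
done <list_3.txt
# finish a last line that has fewer than eight words
if [ "$idx" -ne 0 ]
then
    printf "\n"
fi
Thank you all for your kind help and support!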
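The original question also asked for a line to end early when a sentence ends, which the script above does not attempt because the punctuation is stripped first. A minimal sketch of that variant follows, assuming the punctuation is left in the text and that a sentence ends at a word whose last character is ".", "?" or "!" (other scripts use other marks, so the pattern would need adjusting). wrap_sentences is a hypothetical function name.

```shell
#!/bin/bash
# wrap_sentences [N]: print at most N words per line (default 8) and
# also end the line after a word that closes a sentence.
# Hypothetical helper; the sentence-end pattern is an assumption.
wrap_sentences() {
    awk -v n="${1:-8}" '
        {
            for (i = 1; i <= NF; i++) {
                printf "%s%s", (count ? " " : ""), $i
                count++
                # end the line at n words or at sentence-ending punctuation
                if (count == n || $i ~ /[.?!]$/) {
                    print ""
                    count = 0
                }
            }
        }
        # terminate a last line that was not closed above
        END { if (count) print "" }
    '
}

printf 'a b c. d e f g h i j k l\n' | wrap_sentences 8
# a b c.
# d e f g h i j k
# l
```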
Re: [SOLVED] Bash: How to process a text file to fixed qty w
Are you some kind of IT-manager at an office, as you have so many questions related to batch modifying text documents, if I may ask?
-
- Posts: 195
- Joined: 2019-03-12 23:26
Re: Bash: How to process a text file to only have fixed word
makh wrote: With some edits, as required:
Code: Select all
...
Very good!
makh wrote: Thankyou all for your kind help and support!
No problem at all. I am happy to help.
Re: [SOLVED] Bash: How to process a text file to fixed qty w
pylkko wrote: Are you some kind of IT-manager at an office, as you have so many questions related to batch modifying text documents, if I may ask?
Hi
No Sir! Actually, I have started a welfare campaign to teach different courses to people, so I needed to prepare presentations etc. (you can understand the detailed inside kernel operations).
Right now I am the Every-Task-Manager in my "Home-Office".
Thank you