unique sorted search from text file

Programming languages, Coding, Executables, Package Creation, and Scripting.
MakeTopSite
Posts: 127
Joined: 2021-01-20 08:44
Has thanked: 11 times
Been thanked: 4 times


#1 Post by MakeTopSite »

Code: Select all

$ cat file.txt
word4 word2
word4 word2
word3 word1
word1 word3
$
How can I get this output from file.txt, please?

Code: Select all

word1
word2
word3
word4

fabien
Forum Helper
Posts: 1158
Joined: 2019-12-03 12:51
Location: Anarres (Toulouse, France actually)
Has thanked: 101 times
Been thanked: 265 times

Re: unique sorted search from text file

#2 Post by fabien »

Code: Select all

$> mawk '{for (i=1;i<=NF;++i){if (!duplicate[$i]++){print $i}}}' /tmp/file.txt
word4
word2
word3
word1
It is possible to sort within mawk, but that requires complicated code that is probably slower than piping the output to sort. gawk, on the other hand, has built-in sort functions (asort and asorti).

Code: Select all

$> mawk '{for (i=1;i<=NF;++i){if (!duplicate[$i]++){print $i}}}' /tmp/file.txt | sort
word1
word2
word3
word4
Works for any number of words per line:

Code: Select all

$> cat /tmp/file.txt
word4 word2
word4 word2 word5
word3 word1
word1 word3 word6 word7
$> mawk '{for (i=1;i<=NF;++i){if (!duplicate[$i]++){print $i}}}' /tmp/file.txt | sort
word1
word2
word3
word4
word5
word6
word7
Using sort -u:

Code: Select all

$> mawk '{for (i=1;i<=NF;++i){print $i}}' /tmp/file.txt | sort -u
word1
word2
word3
word4
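For reference, the gawk built-in sort mentioned above could look roughly like this. It is only a sketch: it assumes GNU awk is installed (asorti is a gawk extension that mawk lacks), and the function name unique_sorted_gawk is made up for illustration.

```shell
# Assumption: gawk is available; asorti() does not exist in mawk.
# unique_sorted_gawk is a hypothetical helper name.
unique_sorted_gawk() {
    gawk '
        # Referencing seen[$i] creates the element, so each word becomes an index once.
        { for (i = 1; i <= NF; ++i) seen[$i] }
        END {
            # asorti() copies the indices of seen into sorted[1..n], in sorted order.
            n = asorti(seen, sorted)
            for (i = 1; i <= n; ++i) print sorted[i]
        }
    ' "$@"
}

# Usage: unique_sorted_gawk /tmp/file.txt
```

Deduplication and sorting both happen inside awk here, so no pipe to sort is needed. Whether that is faster than `| sort -u` has not been measured.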
Share your Debian SCRIPTS
There will be neither barrier nor walls, neither official nor guard, there will be no more desert and the entire world will become a garden. — Anacharsis Cloots

arzgi
Posts: 1585
Joined: 2008-02-21 17:03
Location: Finland
Been thanked: 81 times

Re: unique sorted search from text file

#3 Post by arzgi »

A shorter one-liner:

Code: Select all

arto@dell:~$ awk -v RS=" " '{print}' file.txt | sort -u

word1
word2
word3
word4
arto@dell:~$ 


fabien
Forum Helper
Posts: 1158
Joined: 2019-12-03 12:51
Location: Anarres (Toulouse, France actually)
Has thanked: 101 times
Been thanked: 265 times

Re: unique sorted search from text file

#4 Post by fabien »

arzgi wrote: 2024-09-22 10:23 A shorter one-liner:

Code: Select all

arto@dell:~$ awk -v RS=" " '{print}' file.txt | sort -u
Yes, this is another solution, but the result may be unexpected if there are tabs:

Code: Select all

$> cat --show-tabs /tmp/file.txt 
word4 word2
word4 word2
word3 word1
word1 word3
$> cat --show-tabs /tmp/file2.txt 
word4^Iword2
word4^Iword2
word3^Iword1
word1^Iword3
$> mawk 'BEGIN {RS=" "} {print}' /tmp/file.txt | sort -u

word1
word2
word3
word4
$> mawk 'BEGIN {RS=" "} {print}' /tmp/file2.txt | sort -u

word1	word3
word3	word1
word4	word2
$> mawk '{for (i=1;i<=NF;++i){if (!duplicate[$i]++){print $i}}}' /tmp/file2.txt | sort
word1
word2
word3
word4
This also adds an empty line to the result (the file's final newline is no longer a record separator, so it stays in the last record and print appends another one):

Code: Select all

$> mawk '{for (i=1;i<=NF;++i){if (!duplicate[$i]++){print $i}}}' /tmp/file.txt | wc -l
4
$> mawk 'BEGIN {RS=" "} {print}' /tmp/file.txt | sort -u | wc -l
5
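If the RS=" " approach is kept, the stray empty line can be dropped before sorting. A minimal sketch, assuming a POSIX awk and sed; the function name unique_words_rs is hypothetical, and tabs are still not split by this variant.

```shell
# unique_words_rs is a made-up name for illustration.
# RS=" " makes every space a record separator; the last record still ends
# with the original newline, so print emits an extra empty line.
# sed '/^$/d' deletes empty lines (also those caused by consecutive spaces).
unique_words_rs() {
    awk -v RS=" " '{print}' "$@" | sed '/^$/d' | sort -u
}

# Usage: unique_words_rs /tmp/file.txt | wc -l
```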

lindi
Debian Developer
Posts: 594
Joined: 2022-07-12 14:10
Has thanked: 2 times
Been thanked: 117 times

Re: unique sorted search from text file

#5 Post by lindi »

Code: Select all

tr " " "\n" < file.txt | sort -u

fabien
Forum Helper
Posts: 1158
Joined: 2019-12-03 12:51
Location: Anarres (Toulouse, France actually)
Has thanked: 101 times
Been thanked: 265 times

Re: unique sorted search from text file

#6 Post by fabien »

lindi wrote: 2024-09-22 12:21

Code: Select all

tr " " "\n" < file.txt | sort -u
Nice. Can even handle tabs (edit: not perfect when there are multiple spaces):

Code: Select all

$> tr " \t" "\n" < /tmp/file2.txt | sort -u
word1
word2
word3
word4
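One way to cope with runs of spaces or tabs is tr's -s option, which squeezes repeated characters from the last set after translating. A minimal sketch, assuming a tr that accepts the [:blank:] class (GNU and BSD tr both do); unique_words is a hypothetical name.

```shell
# unique_words is a made-up name for illustration.
# tr -s turns every blank into a newline, then squeezes runs of newlines into one.
# A single empty line can remain if the input starts with blanks; sed drops it.
unique_words() {
    tr -s '[:blank:]' '\n' | sed '/^$/d' | sort -u
}

# Usage: unique_words < /tmp/file2.txt
```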
I just thought of this one:

Code: Select all

$> grep -o "[^[:blank:]]\+" /tmp/file2.txt | sort -u
word1
word2
word3
word4
Also (edit: not perfect when there are spaces at the end):

Code: Select all

$> sed -E 's/([^[:blank:]]+)([[:blank:]]+)/\1\n/g' /tmp/file2.txt | sort -u
word1
word2
word3
word4
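The trailing-space case arises because a word followed by blanks at the end of a line is rewritten as the word plus a newline, which leaves an empty line behind. A possible variant trims each line first. This is a sketch that assumes GNU sed (for \n in the replacement, as in the command above); words_sed is a hypothetical name.

```shell
# words_sed is a made-up name for illustration. Assumes GNU sed.
# 1. strip leading blanks, 2. strip trailing blanks,
# 3. delete lines that are now empty, 4. turn each remaining run of blanks into a newline.
words_sed() {
    sed -E 's/^[[:blank:]]+//; s/[[:blank:]]+$//; /^$/d; s/[[:blank:]]+/\n/g' "$@" | sort -u
}

# Usage: words_sed /tmp/file2.txt
```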

m4c-attack
Posts: 68
Joined: 2023-10-09 05:06
Has thanked: 57 times
Been thanked: 9 times

Re: unique sorted search from text file

#7 Post by m4c-attack »

lindi wrote: 2024-09-22 12:21

Code: Select all

tr " " "\n" < file.txt | sort -u
Beat me to it 😭

(I didn't know sort had a unique flag, though, and unnecessarily piped into uniq in my solution.)
