bash variables, newlines and double-quotes...

bitrat
Posts: 99
Joined: 2023-07-20 09:41
Has thanked: 5 times

bash variables, newlines and double-quotes...

#1 Post by bitrat »

Hi,

could somebody point me to the formal explanation for this behaviour please?

Code: Select all

$ JUNK=$(seq 1 4)

$ echo "$JUNK"
1
2
3
4

$ echo $JUNK
1 2 3 4

Aki
Global Moderator
Posts: 3207
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 88 times
Been thanked: 427 times

Re: bash variables, newlines and double-quotes...

#2 Post by Aki »

Moved to “Programming” sub-forum.
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

gerard4143
Posts: 6
Joined: 2024-05-10 07:30
Has thanked: 1 time

Re: bash variables, newlines and double-quotes...

#3 Post by gerard4143 »

Try googling bash IFS.

The short answer is that bash has an IFS (Internal Field Separator) variable whose default value consists of a space, a tab and a newline character.

Code: Select all

IFS=$' \t\n'
When a variable expansion is double-quoted, no field splitting is performed on it, so the newlines are kept instead of being treated as field separators.
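A minimal sketch of how the value of IFS changes the result of an unquoted expansion. The helper count_args and the variable names are made up for illustration; the IFS changes are done inside command substitutions so they do not leak into the calling shell.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

text='a b
c d'
newline='
'

default_ifs=$(count_args $text)                 # split on space and newline
newline_ifs=$(IFS=$newline; count_args $text)   # split on newline only
null_ifs=$(IFS=; count_args $text)              # null IFS: no splitting at all

echo "$default_ifs $newline_ifs $null_ifs"      # prints: 4 2 1
```

Double-quoting the expansion ("$text") gives a single argument regardless of what IFS contains.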

Please note about the command seq:

Code: Select all

seq - print a sequence of numbers
...
-s, --separator=STRING
              use STRING to separate numbers (default: \n)
...
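As a small illustration of that option, assuming the GNU coreutils seq shown in the excerpt above (other seq implementations may treat -s differently), the separator can be chosen so that the variable never contains newlines in the first place:

```shell
#!/usr/bin/env bash
# Assumes GNU seq: -s sets the string printed between the numbers.
one_line=$(seq -s ' ' 1 4)

echo "$one_line"     # prints: 1 2 3 4
```

Here the quoted and unquoted forms of echo print the same thing, because the value holds spaces rather than newlines.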
Last edited by gerard4143 on 2024-05-22 07:21, edited 2 times in total.

lindi
Debian Developer
Posts: 482
Joined: 2022-07-12 14:10
Has thanked: 2 times
Been thanked: 91 times

Re: bash variables, newlines and double-quotes...

#4 Post by lindi »

Shell scripting is hard, don't do it.

Code: Select all

echo $JUNK
calls echo with four arguments. Echo prints each argument separated by space.

Code: Select all

echo "$JUNK"
calls echo with one argument. The argument includes newline characters.
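One way to make the argument boundaries visible is to use printf instead of echo, since printf reuses its format string once per argument. This is only a sketch; the variable names split and whole are made up for the example.

```shell
#!/usr/bin/env bash
JUNK=$(seq 1 4)

# Unquoted: four arguments, so the format is applied four times.
split=$(printf '<%s>' $JUNK)

# Quoted: one argument that still contains the newlines.
whole=$(printf '<%s>' "$JUNK")

echo "$split"        # prints: <1><2><3><4>
echo "$whole"        # prints "<1", "2", "3", "4>" on four lines
```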

gerard4143
Posts: 6
Joined: 2024-05-10 07:30
Has thanked: 1 time

Re: bash variables, newlines and double-quotes...

#5 Post by gerard4143 »

lindi wrote: 2024-05-22 07:58 Shell scripting is hard, don't do it.
Words to live by...

fabien
Forum Helper
Posts: 823
Joined: 2019-12-03 12:51
Location: Anarres (Toulouse, France actually)
Has thanked: 77 times
Been thanked: 194 times

Re: bash variables, newlines and double-quotes...

#6 Post by fabien »

The origin of the world
man 1 dash
HISTORY
dash is a POSIX-compliant implementation of /bin/sh that aims to be as small as possible.
man 1 bash
DESCRIPTION
[...]
Bash is intended to be a conformant implementation of the Shell and Utilities portion of the IEEE POSIX specification (IEEE Standard 1003.1).
[...]
SEE ALSO
[...]
Portable Operating System Interface (POSIX) Part 2: Shell and Utilities, IEEE --
http://pubs.opengroup.org/onlinepubs/9699919799/
IEEE Std 1003.1-2017 2. Shell Command Language

2.6 Word Expansions
The order of word expansion shall be as follows:

1. Tilde expansion (see Tilde Expansion), parameter expansion (see Parameter Expansion), command substitution (see Command Substitution), and arithmetic expansion (see Arithmetic Expansion) shall be performed, beginning to end. See item 5 in Token Recognition.

2. Field splitting (see Field Splitting) shall be performed on the portions of the fields generated by step 1, unless IFS is null.

3. Pathname expansion (see Pathname Expansion) shall be performed, unless set -f is in effect.

4. Quote removal (see Quote Removal) shall always be performed last.
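The order of those steps can be observed with a small experiment. This is a sketch under a few assumptions: mktemp -d is available, the temporary path contains no IFS or glob characters, and count_args is a hypothetical helper that prints its argument count.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that prints its argument count.
count_args() { echo "$#"; }

workdir=$(mktemp -d)          # scratch directory, removed at the end
touch "$workdir/a b"          # one file whose name contains a space

pattern="$workdir/*"
names='a b'

from_glob=$(count_args $pattern)    # the glob result 'a b' is not split again
from_var=$(count_args $names)       # the expanded value itself is split
no_glob=$(set -f; echo $pattern)    # set -f: the '*' stays literal

echo "$from_glob $from_var"         # prints: 1 2
rm -r "$workdir"
```

The file name with a space stays in one piece because pathname expansion (step 3) runs after field splitting (step 2), so its results are never split.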
2.2 Quoting
Quoting is used to remove the special meaning of certain characters or words to the shell. Quoting can be used to preserve the literal meaning of the special characters in the next paragraph, prevent reserved words from being recognized as such, and prevent parameter expansion and command substitution within here-document processing (see Here-Document).

The application shall quote the following characters if they are to represent themselves:

| & ; < > ( ) $ ` \ " ' <space> <tab> <newline>

and the following may need to be quoted under certain circumstances. That is, these characters may be special depending on conditions described elsewhere in this volume of POSIX.1-2017:

* ? [ # ~ = %

The various quoting mechanisms are the escape character, single-quotes, and double-quotes. The here-document represents another form of quoting; see Here-Document.

2.2.1 Escape Character (Backslash)
A <backslash> that is not quoted shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the shell shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into tokens. Since the escaped <newline> is removed entirely from the input and is not replaced by any white space, it cannot serve as a token separator.

2.2.2 Single-Quotes
Enclosing characters in single-quotes ( '' ) shall preserve the literal value of each character within the single-quotes. A single-quote cannot occur within single-quotes.

2.2.3 Double-Quotes
Enclosing characters in double-quotes ( "" ) shall preserve the literal value of all characters within the double-quotes, with the exception of the characters backquote, <dollar-sign>, and <backslash>
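A short sketch of the three quoting mechanisms side by side (the variable names are made up for the example):

```shell
#!/usr/bin/env bash
name=world

single='hello $name'      # single quotes: every character is literal
double="hello $name"      # double quotes: $ still triggers expansion
escaped="hello \$name"    # backslash inside double quotes keeps $ literal

echo "$single"            # prints: hello $name
echo "$double"            # prints: hello world
echo "$escaped"           # prints: hello $name
```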
The answer
2.6.5 Field Splitting
After parameter expansion (Parameter Expansion), command substitution (Command Substitution), and arithmetic expansion (Arithmetic Expansion), the shell shall scan the results of expansions and substitutions that did not occur in double-quotes for field splitting and multiple fields can result.

The shell shall treat each character of the IFS as a delimiter and use the delimiters as field terminators to split the results of parameter expansion, command substitution, and arithmetic expansion into fields.

1. If the value of IFS is a <space>, <tab>, and <newline>, or if it is unset, any sequence of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field. For example, the input:

<newline><space><tab>foo<tab><tab>bar<space>

yields two fields, foo and bar.

2. If the value of IFS is null, no field splitting shall be performed.

3. Otherwise, the following rules shall be applied in sequence. The term " IFS white space" is used to mean any sequence (zero or more instances) of white-space characters that are in the IFS value (for example, if IFS contains <space>/ <comma>/ <tab>, any sequence of <space> and <tab> characters is considered IFS white space).

a. IFS white space shall be ignored at the beginning and end of the input.

b. Each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field, as described previously.

c. Non-zero-length IFS white space shall delimit a field.
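The rules above can be tried out directly. This sketch reuses the example input from rule 1 and adds a made-up comma-separated string for rule 3; count_args is a hypothetical helper, and IFS is only changed inside a command substitution.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that prints its argument count.
count_args() { echo "$#"; }

# Rule 1: <newline><space><tab>foo<tab><tab>bar<space> with the default IFS.
posix_example=$(printf '\n \tfoo\t\tbar ')
default_count=$(count_args $posix_example)

# Rule 3: IFS holds a comma (not white space) and a space (IFS white space).
csv=' a , b,,c '
custom_fields=$(IFS=', '; set -f; printf '[%s]' $csv)

echo "$default_count"     # prints: 2
echo "$custom_fields"     # prints: [a][b][][c]
```

The empty field between the two adjacent commas comes from rule 3b: every comma delimits a field, whereas a run of spaces counts as a single delimiter.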
The Dash manual wording
Lexical Structure
The shell reads input in terms of lines from a file and breaks it up into words at whitespace (blanks and tabs), and at certain sequences of characters that are special to the shell called “operators”.
The Bash manual wording
Word Splitting
The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is
exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of
IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters space, tab, and newline are ignored
at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any
adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
Variable expansions must always be double-quoted, in an interactive shell as well as in a shell script.

Code: Select all

$> ls
plop
$> JUNK="1    2    *    3   "
$> declare -p JUNK
declare -- JUNK="1    2    *    3   "
$> echo "$JUNK"
1    2    *    3
$> echo $JUNK
1 2 plop 3
Problem:

Code: Select all

$> for word in "$JUNK"; do echo "$(( ++COUNT )) >>>$word"; done
1 >>>1    2    *    3
Lazy people do this:

Code: Select all

$> for word in $JUNK; do echo "$(( ++COUNT )) >>>$word"; done
2 >>>1 
3 >>>2 
4 >>>plop 
5 >>>3
Solution:

Code: Select all

$> declare -a JUNK=( "1  " "  2 " "   *  " "  3   " )
$> declare -p JUNK 
declare -a JUNK=([0]="1  " [1]="  2 " [2]="   *  " [3]="  3   ")
$> for word in "${JUNK[@]}"; do echo "$(( ++COUNT )) >>>$word"; done
6 >>>1   
7 >>>  2  
8 >>>   *   
9 >>>  3
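When the data arrives as one newline-separated string rather than as an array (as with the seq output from the first post), a read loop is another common way to handle each line without word splitting or globbing. This is only a sketch; the variable names count and seen are made up for the example.

```shell
#!/usr/bin/env bash
JUNK=$(printf '%s\n' 'one two' '*' 'three')

count=0
seen=''
# IFS= keeps leading and trailing blanks, -r keeps backslashes literal.
while IFS= read -r line; do
    count=$((count + 1))
    seen="${seen}[${line}]"
done <<EOF
$JUNK
EOF

echo "$count $seen"       # prints: 3 [one two][*][three]
```

The here-document expands $JUNK without splitting it or expanding the asterisk, and because the loop is fed by a redirection rather than a pipe, count and seen are still set after it ends.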
There will be neither barrier nor walls, neither official nor guard, there will be no more desert and the entire world will become a garden. — Anacharsis Cloots

bitrat
Posts: 99
Joined: 2023-07-20 09:41
Has thanked: 5 times

Re: bash variables, newlines and double-quotes...

#7 Post by bitrat »

lindi wrote: 2024-05-22 07:58 Shell scripting is hard, don't do it.
Like everybody, I do it a lot more than I'd admit.

lindi wrote: 2024-05-22 07:58

Code: Select all

echo $JUNK
calls echo with four arguments. Echo prints each argument separated by space.

Code: Select all

echo "$JUNK"
calls echo with one argument. The argument includes newline characters.
Ha, yes, this makes sense. Thanks! :)

For some reason I was thinking of echo not evaluating args...

(echo junk) vs (echo 'junk) vs (echo a b c d) vs (echo 'a 'b 'c 'd) vs (echo '(a b c d)) ...ad infinitum, lol.

bitrat
Posts: 99
Joined: 2023-07-20 09:41
Has thanked: 5 times

Re: bash variables, newlines and double-quotes...

#8 Post by bitrat »

fabien wrote: 2024-05-22 12:16 The origin of the world
Thanks! Yes, I have to get more familiar with shell variables like IFS. Very handy sometimes. I use printf more than echo, and I guess it's a bit clearer what will be evaluated.

Generally I use bash commands that fit on a command line. If I use the same command a lot I'll stick it in a script, maybe with an argument or two and maybe some working directory logic. Usually something breaks because of quote/evaluate errors, and I usually fix it the same way as you, with some explicit variables, although I rarely use arrays. I then try to elaborate the script until it becomes clear that it's impossible or would be much easier in Python or C... :D

As a result, I write many times more bash scripts than Python or C programs, because the time required to write a flexible tool is much more than the time to write a one-off command for a specific task.

I haven't learned to use the bash debug mode. Is there a way to trace intermediate forms?

I wasn't even aware of dash! It looks great. Apparently it's a drop-in replacement for /bin/sh. Is that recommended?

Lol, it already is!

$ ls -l /usr/bin/sh
lrwxrwxrwx 1 root root 4 Jul 16  2023 /usr/bin/sh -> dash

I spent a lot of time on bare-bones systems, so I'm in the habit of using tools installed by default (nano, vi, bash, gcc). Also mostly on Solaris and CentOS, not Debian. Maybe now I'm just being masochistic!

steve_v
df -h | grep > 20TiB
Posts: 1467
Joined: 2012-10-06 05:31
Location: /dev/chair
Has thanked: 94 times
Been thanked: 230 times

Re: bash variables, newlines and double-quotes...

#9 Post by steve_v »

lindi wrote: 2024-05-22 07:58 Shell scripting is hard, don't do it.
Dunno about "hard", but it can certainly be a bit perverse, and the whole quoting / escaping shenanigans and interaction between non-printing characters and parameter expansion is a perennial source of entertainment.
bitrat wrote: 2024-05-22 22:07 I haven't learned to use the bash debug mode. Is there a way to trace intermediate forms?
Is 'set -x' what you are looking for?
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.

bitrat
Posts: 99
Joined: 2023-07-20 09:41
Has thanked: 5 times

Re: bash variables, newlines and double-quotes...

#10 Post by bitrat »

steve_v wrote: 2024-05-23 05:14
bitrat wrote: 2024-05-22 22:07 I haven't learned to use the bash debug mode. Is there a way to trace intermediate forms?
Is 'set -x' what you are looking for?
Yes, that's it, thanks. I've seen it around but can never remember it when I need it. set looks pretty handy, tbh.

Code: Select all

$ JUNK=$(seq 1 4)
++ seq 1 4
+ JUNK='1
2
3
4'

$ echo $JUNK
+ echo 1 2 3 4
1 2 3 4

$ echo "$JUNK"
+ echo '1
2
3
4'
1
2
3
4
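For use inside a script, the trace can also be limited to a few commands and captured rather than left on the terminal. A minimal sketch, assuming the trace goes to standard error (the default); the variable name trace is made up for the example.

```shell
#!/usr/bin/env bash
JUNK=$(seq 1 4)

# Trace only the grouped command. stderr (the trace) is captured,
# while the command's normal output is discarded.
trace=$( { set -x; echo $JUNK; } 2>&1 >/dev/null )

# Shows the command as it looks after splitting: echo 1 2 3 4,
# preceded by one or more '+' signs depending on the nesting level.
printf '%s\n' "$trace"
```

Since set -x is issued inside the command substitution, tracing stays off in the rest of the script.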

Aki
Global Moderator
Posts: 3207
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 88 times
Been thanked: 427 times

Re: bash variables, newlines and double-quotes...

#11 Post by Aki »

Please, mark the discussion as "solved" by manually adding the text tag "[Solved]" to the beginning of the subject of the first post (after any other tags).
