[GUIDANCE] "For" in Bash

https://programming.dev/post/43536734

I’m playing around with Bash just to learn. LIST=$(ls); for i in $LIST; do echo "I found one!"; done The variable “i” could literally be anything, as long as it doesn’t have a special meaning for Bash, in which case I’d have to escape it, right? Anyway, my real question is: how does do (or rather the whole for-expression) know that “i” here means “for every line/item that ls outputs”? The above one liner works great and writes “I found one!” the number of times corresponding to the number of lines or items that ls outputs. But I would like to understand why it worked… I’m a complete beginner at both Bash and C, but I understand some basic concepts.

Bash Reference Manual

Reading this part of the Bash manual for the third time today, I think I finally understood it better, thanks to this part in particular:

[…]execute commands once for each word in the resultant list […]

In other words, whatever follows in is expected to result in a list of words, and the commands are then executed once for each word. Beyond that, I guess I’d have to simply look at the logic behind for-expressions.
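
To make that sentence from the manual concrete, here is a minimal sketch (the variable names and sample words are made up for illustration). The list after in can be typed out literally or produced by a command substitution; either way the body runs once per resulting word.

```shell
# A literal list: three words, so the body runs three times.
literal_runs=0
for word in alpha beta gamma; do
    echo "literal word: $word"
    literal_runs=$((literal_runs + 1))
done

# A command substitution: printf emits two lines, which become two words.
substituted_runs=0
for word in $(printf 'one\ntwo\n'); do
    echo "substituted word: $word"
    substituted_runs=$((substituted_runs + 1))
done
```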

Thanks!

Yeah it’s really not complicated, and it’s nearly plain English. For item in things, do action.
Whatever name you put immediately after for becomes the loop variable. You can call it whatever you want. The “i” is just used a lot in programming examples for “item” or “iterate”, but you can literally call it anything. Anything that refers to it inside the loop will see a single item from the list in $LIST, a new one on each run through the loop.

You’ve got a few things going on to be broken down here.

And forgive me if anything I say here seems condescending, it’s not meant to be, I just like to be very explanatory with things like this and to assume the reader may not know anything about anything. (Not as an insult, but simply as a gap in knowledge).

Also, while I’m proficient at Bash, I’m no expert.

LIST=$(ls): Here you’ve stored the output of the ls command to the variable LIST, which gives you a list of items in the given directory, in this case, whichever directory the command is run from. It’s also a good idea to quote the variable assignment like this: "$(ls)".

for i in $LIST;: This is the first part of the for loop statement, which is an iterator, meaning, it will loop or iterate over every item in the given variable/parameter/group of iterable items.

The i here, as you said could be anything. You could say for file in $LIST; or for item in $LIST;. It doesn’t matter, because it’s just a variable name that you are using in the first part of the for statement.

So what bash will do with this is loop over the list, take each item in the list, and assign it to the variable i, which will allow you to act upon that single item by calling the variable i in some other commands.

do echo "I found one!";: This is the next part of the for loop, which is the instruction set to be executed inside the for loop. Here is where you can act upon the items in your list that have been assigned to the variable i.

In your case, you’re just printing a statement to stdout (standard out), that says, “I found one!”

It’s like saying, for each item in this list, print “I found one!”

So if there are 20 items in the list, it will print that statement 20 times.

However, maybe you want to print the item itself as part of a statement. So instead of “I found one!”, you could do something like:

do echo "I found $i!"

Which then would print “I found some-filename-or-directory-here!” for each item in your list.

done: Finally, the done statement tells bash that this is the end of the for loop. So any commands after the done statement will only run once the for loop has iterated over all items in the list and executed the commands inside the for loop for each item on the list.
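
Putting the pieces above together, a small sketch might look like this. LIST is filled with three made-up words here instead of the output of ls, so the result does not depend on the current directory; the summary variable is only there for illustration.

```shell
LIST="alpha beta gamma"   # stand-in for the output of ls

runs=0
for i in $LIST; do
    # Runs once per word; $i holds the current word.
    echo "I found $i!"
    runs=$((runs + 1))
done

# This line runs only once, after the loop has finished.
summary="Loop finished after $runs items; last item was $i."
echo "$summary"
```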

A couple of notes:

The ; is used as a command separator or terminator. So bash knows to first run LIST=$(ls) before it attempts to run whatever the next command might be.

In bash, it’s good practice to always quote your variables like so: for i in "$LIST";. This is to avoid errors for characters that might need escaping like whitespace, backslashes, and other special characters.

With that in mind, if you’re running a command like echo "I found $i!", you don’t need to quote the variable again, because it’s already inside a quote set.

Further, it’s not absolutely necessary, but it can also be a good idea to also enclose all of your variables in {}, so whenever you use a variable, you’d do something like: "${LIST}"

This not only more clearly identifies variables in your bash scripts/commands, but is necessary when using bash’s parameter expansion, which is pretty great.
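
A few hedged examples of what the braces buy you. The variable names and values are invented for illustration; the expansions themselves are standard Bash parameter expansion.

```shell
name="report"
# Without braces, Bash would look for a variable called name_final.
with_braces="${name}_final"

file="notes.txt"
# ${file%.txt} removes the shortest trailing match of ".txt".
base="${file%.txt}"

# ${missing:-hello} substitutes a default when the variable is unset or empty.
unset missing
greeting="${missing:-hello}"

echo "$with_braces $base $greeting"
```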

I was a teacher for some years and I absolutely understand your style of explanation. I don’t find it condescending at all! Thank you so much for the in depth guidance! Some of it I already knew, some of it I didn’t. Anyhow, a new perspective is always appreciated! :) God, Bash (GNU/Linux in general) is so much fun!

I was also a teacher for a number of years! Hello fellow teacher. :)

I agree. Bash, and GNU/Linux in general is amazing. My recent foray has been into Python, and I’m having an utter blast writing code and learning.

Wouldn’t for i in "$LIST"; just result in a single loop iteration with $i being the entirety of $LIST?

It would not, as another commenter explained in their comment (which I neglected to include in my explanation). Bash uses a special variable called IFS when executing for loops like this. IFS stands for Internal Field Separator; by default it holds one of each type of whitespace (space, tab, and newline), and Bash uses these as separators automatically.

So instead of taking that whole ls output as one string of text, the for loop automatically separates it into an iterable list of strings using the newline separator.

Which makes it real fun when you have spaces in filenames!

Really you shouldn’t use ls as input to for. Use find -exec or something.
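
A rough sketch of what that could look like. The temporary directory and file names are invented so the example does not touch real files. The second half uses a glob (./* style) as another alternative; that is my addition, not something suggested above.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file a.txt" "$demo_dir/plainfile.txt"

# find runs the command once per file and passes each name as a single
# argument, so spaces in names are not a problem.
found=$(find "$demo_dir" -type f -exec printf 'I found %s!\n' {} \; | wc -l)

# A glob expands to one word per file name, spaces included.
globbed=0
for f in "$demo_dir"/*; do
    echo "I found $f!"
    globbed=$((globbed + 1))
done

rm -rf "$demo_dir"
```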

I’m pretty sure that IFS does not apply to quoted strings since word splitting happens before the quote removal (see Shell Expansion).

$ ( files=$(ls); IFS=$'\n' ; for x in $files; do echo $x; done )
file a.txt
file b.txt
plainfile.txt
$ ( files=$(ls); IFS=$'\n' ; for x in "$files"; do echo $x; done )
file a.txt file b.txt plainfile.txt

Shell Expansions (Bash Reference Manual)

I didn’t realize that. Thanks for pointing that out!
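
For anyone who wants to check the quoted versus unquoted behaviour without depending on the files in their directory, here is a small sketch that counts iterations. The string stands in for ls output and the counter names are made up; it assumes IFS still has its default value.

```shell
# Three made-up file names, one per line, as ls would print them.
list=$'file a.txt\nfile b.txt\nplainfile.txt'

# Unquoted: word splitting applies, so spaces and newlines both split.
unquoted=0
for x in $list; do
    unquoted=$((unquoted + 1))
done

# Quoted: no word splitting, so the loop sees one single item.
quoted=0
for x in "$list"; do
    quoted=$((quoted + 1))
done

echo "unquoted: $unquoted iterations, quoted: $quoted iteration"
```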

for loops

Your code executes ls and records the results in a variable. The result is some text, a string of characters. (We call them “strings”, and LIST is now a string variable.) Among the characters in a string variable might be spaces, tabs, or newline characters. I mention this because the special variable IFS is used when Bash splits the unquoted $LIST into words for the for loop, and it contains exactly one space, tab, and newline by default.

When you give for an unquoted string variable as the input, Bash splits the string into units at each character in IFS. That is, it splits the big string into individual parts at each space, tab and newline. This creates a list of words, which is what is looped over. Each word in turn is assigned to your looping variable and then the code after the do is executed once per word.

(“word” has a sort of a special meaning here. When I say word, I mostly just mean a string that has no spaces in it. When you read text in English, there are words. They’re strings of characters separated by spaces. But words can also be separated by tabs, new lines, commas, semicolons, or whatever, but not by default when using for! You have to modify IFS to add those characters if you want them to be considered word separators.)

So, if any of the file system entries returned by ls have spaces in them, your loop is going to create more outputs than there are file system entries in the current directory.

For example:

file one.txt
file two.txt
my photo.jpg
notes (final).md
a b c d.txt

That would cause like 12 loops and 12 outputs in your code despite there only being five files.

If you instead overwrite IFS before running your loop, and only assign a single new line to the variable, then your loop will only be over the actual lines of the input text. Like this:

IFS=$'\n'

and then use your exact code above. Using my example of five files, this code will now only produce 5 outputs, not 12.
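
Here is a sketch that checks those two numbers. It uses a string holding the five example names instead of a real ls call, and the counter names are made up. IFS is saved and restored so the change does not leak into the rest of the session.

```shell
# The five example file names, one per line.
names=$'file one.txt\nfile two.txt\nmy photo.jpg\nnotes (final).md\na b c d.txt'

# Default IFS (space, tab, newline): every space also splits.
count_default=0
for w in $names; do
    count_default=$((count_default + 1))
done

# IFS set to a newline only: one word per line.
old_ifs=$IFS
IFS=$'\n'
count_newline=0
for w in $names; do
    count_newline=$((count_newline + 1))
done
IFS=$old_ifs

echo "default IFS: $count_default, newline-only IFS: $count_newline"
```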

You can assign whatever characters you want to IFS.
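
For instance, a comma can serve as the separator. This is only a sketch with a made-up string and made-up variable names:

```shell
csv="red,green,blue"

old_ifs=$IFS
IFS=','            # split on commas only
colors=""
for c in $csv; do
    colors="$colors[$c]"
done
IFS=$old_ifs       # put the previous value back

echo "$colors"
```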

(I have not tested any of this code, or examples.)

Variable names

The loop variable name i is just an identifier. Any valid variable name would work except you can’t use the reserved names like $1, $2 or any keywords as names. Also, there’s no way to escape an identifier. They are just literal names.

You also don’t want to use any built-in variable names, or else you’ll overwrite their values for the duration of the current session. IFS is one example: it is a special variable that determines how unquoted text is split into words for a for loop.
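
A small sketch of why the choice of name matters: the loop variable is an ordinary shell variable, so the loop overwrites whatever was stored under that name, and the last value is still there after done. The names and values below are invented for illustration.

```shell
item="important value"

for item in one two three; do
    : # do nothing; we only care about what happens to $item
done

# The original value is gone; $item now holds the last word of the list.
echo "item is now: $item"
```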

See here:

Since IFS contains a space, your loop will output one line for each word in the input. So you might get more lines than there are file system entries returned by ls (if any have spaces in the names). Read the link to learn more.

The Meaning of IFS in Bash Scripting | Baeldung on Linux

This bit about the IFS variable was eye opening! Thank you SO much! This is exactly what I was trying to understand, namely, how on earth the for-loop is smart enough to understand how to count when I haven’t specified a numerical interval (as I do in, for instance, C when I practice that). This just solved it all. Thanks! Now I understand why my code gave me excessive outputs when I changed ls into ls -l. The IFS variable made the for-loop count every single blank space!!! :D
For maximum pedantry, it may be worth mentioning that filenames in typical Linux file systems can contain newline characters.
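
A sketch of that corner case, assuming a file system that allows newlines in names (typical on Linux). The temporary directory and file names are made up. Even with IFS set to a newline only, the ls-based loop should count one entry too many, while a glob (my addition here, as one way around the problem) sees each name intact.

```shell
demo_dir=$(mktemp -d)
# Two files; the second has a newline in the middle of its name.
touch "$demo_dir/plain.txt" "$demo_dir/two"$'\n'"lines.txt"

# Splitting ls output on newlines: the second name spans two lines.
old_ifs=$IFS
IFS=$'\n'
ls_count=0
for name in $(ls "$demo_dir"); do
    ls_count=$((ls_count + 1))
done
IFS=$old_ifs

# A glob hands each name to the loop as a single word.
glob_count=0
for path in "$demo_dir"/*; do
    glob_count=$((glob_count + 1))
done

rm -rf "$demo_dir"
echo "ls-based count: $ls_count, glob-based count: $glob_count"
```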