Bash notes 3
Another Exercism Bash track exercise, another opportunity to learn from the community solutions. This time I came across a few nice Bash language features that I'd probably known about but forgotten due to lack of practice.
The exercise in question is Atbash Cipher and the features that I wanted to share with you are from user Victor Guthrie's solution.
Array definition
The first line in Victor's solution is as follows:
alphabet=({a..z})
The concise nature of this is quite striking. There are two mechanisms at play here. The first is the outer brackets (...)
. Brackets are used in different contexts in Bash, but here, without any leading symbol before the opening bracket, and in the context of an assignment to a variable, they represent the definition of an array.
Here's a simple example, with a variable letters
declared thus:
letters=(a b c)
This results in letters
being an array of three elements, the values a
, b
and c
. We can check this as follows:
for letter in ${letters[@]}; do echo "-> $letter"; done
This produces:
-> a
-> b
-> c
Brace expansion
So what's the {a..z}
inside of the brackets in this particular case? Well, given the variable name and the a
and the z
we can probably reasonably guess that it's all the letters in the alphabet.
And we'd be right. But what is that construct and how does it work? I find that one of the key aspects of learning Bash and any language is knowing what things are called, so you can research them in the documentation.
And the {...}
construct is called brace expansion, which is described as:
A mechanism by which arbitrary strings may be generated.
There are plenty of other expansion mechanisms in Bash, which are documented in the Shell Expansions section of the manual.
Anyway, this example will expand the characters a
and z
lexicographically, using the default C locale, resulting in every letter of the alphabet. What brace expansion offers is more than this, and I'd recommend you take a quick look at the page in the manual. To give you a taste, you can do things like this:
echo {a,b}{1..3}
which results in:
a1 a2 a3 b1 b2 b3
You can even use numbers, with an optional increment value, like this:
; echo {1..10..2}
the ..2
is the optional increment, and this expands to:
1 3 5 7 9
Nice!
Character classes and bracket expressions
In the main
function of Victor's solution, the first line is this:
local trimmed="${2//[^[:alnum:]]/}"
We've come across some of the constructs here before, but let's break this down to get to something I'd vaguely known about but never could remember how to express it, until now.
The value being assigned to the locally declared variable trimmed
is the result of a shell parameter expansion. Specifically it's this construct, where pattern
is replaced with string
inside of the given parameter
:
${parameter/pattern/string}
In fact, the version we see in the solution is the "all matches" version, where pattern
itself begins with a /
; this is described in the manual thus:
If pattern begins with ‘/’, all matches of pattern are replaced with string.
In other words we should first see this in the expression:
local trimmed="${2//.../}"
By the way, the 2
here refers to the second positional parameter passed into the main
function.
Note that the string
is empty here; in other words, every occurrence of where pattern
is matched is replaced with nothing, i.e. effectively removed.
So what is the pattern? Let's now stare at it for a second:
[^[:alnum:]]
There are two things going on here. Well, three if you count the ^
separately. Working from the outside in, we start with a bracket expression, thus:
[...]
This is simply a list of characters, where any of them can be matched. It's also possible to use a "range expression" instead of a list of characters, so a-c
would match a
, b
or c
. It's even possible to combine range expressions with single characters. For example, this:
fruit=bananas
echo ${fruit//[sa-c]/_}
would result in:
__n_n__
A circumflex (^
) in the first character position of the bracket expression negates the characters listed, so this:
fruit=bananas
echo ${fruit//[^sa-c]/_}
would result in:
ba_a_as
So we can see that
[^[:alnum:]]
has a brace expression which is negating something ([^...]
) but it's neither one or more single characters nor a range expression. It's this:
[:alnum:]
This is a "character class", of which there are several, described in the Character Classes and Bracket Expressions part of the manual, and "alnum" is short for "alphanumeric", basically meaning letters and numbers. The equivalent brace expression for [:alnum:]
would be [0-9a-zA-Z]
.
With this in mind, we now know what's happening with this line:
local trimmed="${2//[^[:alnum:]]/}"
The variable trimmed
is being given the value of $2
(the second positional parameter passed to the function) but with anything that's not a letter or a number removed.
My problem in the past was that I hadn't taken enough time to stare at the different parts of expressions like this, and could therefore not quite remember whether the opening square brackets went together or not. But now I know that the outermost pair is the bracket expression, and the inner pair is the character class, it is now obvious that the ^
negation must go between the opening two square brackets, as they each belong to two completely separate parts.
Arithmetic evaluation
The final line in the solution that I want to stare at for a moment is this one:
((i < length - 1)) && ((i % 5 == 4)) && output+=' '
In my first solution to this exercise I had a similar approach to adding a space every few characters (this was part of the requirements for the encoding output in the task), but my equivalent line was a lot noisier. Victor's version is cleaner and very pleasant to read.
It uses the double parentheses construct for a couple of arithmetic expression evaluations. As well as regular numeric expressions, shell arithmetic allows for logical expressions too, which is what we see here in both examples, where the operator in the first example is <
("is i less than one less than the value of length?") and the operator in the second example is ==
("is i modulo 5 equal to 4?").
If both of these arithmetic expressions evaluate to true then the final expression
output+=' '
causes a space to be added to the end of the value in output
.
That's about it for this community solution. There's always lots to learn from reading code, and I'm getting a lot out of the community solutions on Exercism. Thanks folks!