Bash notes 2

I looked at a couple of more solutions to another Exercism exercise in the Bash track - Scrabble Score. I was reminded of one particular feature of the case statement, and another solution was rather splendid in its approach and it reminded me a little of some functional programming techniques, or perhaps MapReduce.

I find Exercism great for practice but get as much if not more pleasure and insight from reading the Community Solutions - solutions to exercises that others have completed.

My initial solution to the Scrabble Score exercise was a little pedestrian, which I find acceptable at least as the first iteration, as long as it works. That said, I had been trying to write my solution to reflect, almost visually, the instructions, the core of which was this table:

Letter                           Value
A, E, I, O, U, L, N, R, S, T       1
D, G                               2
B, C, M, P                         3
F, H, V, W, Y                      4
K                                  5
J, X                               8
Q, Z                               10

I'd ended up with this:

declare word="${1^^}"
declare score=0
for ((i = 0; i < ${#word}; i++)); do
  [[ "AEIOULNRST" =~ ${word:$i:1} ]] && ((score += 1))
  [[ "DG" =~ ${word:$i:1} ]]         && ((score += 2))
  [[ "BCMP" =~ ${word:$i:1} ]]       && ((score += 3))
  [[ "FHVWY" =~ ${word:$i:1} ]]      && ((score += 4))
  [[ "K" =~ ${word:$i:1} ]]          && ((score += 5))
  [[ "JX" =~ ${word:$i:1} ]]         && ((score += 8))
  [[ "QZ" =~ ${word:$i:1} ]]         && ((score += 10))
done
echo "$score"

It was ok, if not a little "bulky".

In looking at other solutions, I came across one from user Devin Miller which did what I'd been looking to achieve, but in a much neater way:

total=0
for x in $(echo ${1^^} | grep -o .); do
    case $x in
        [AEIOULNRST]) ((total++));;
        [DG])         ((total+=2));;
        [BCMP])       ((total+=3));;
        [FHVWY])      ((total+=4));;
        K)            ((total+=5));;
        [JX])         ((total+=8));;
        *)            ((total+=10));;
    esac
done

I'd forgotten that the case statement allows for pattern matching. The Simplified conditions section of the Bash Beginners Guide states: "Each case is an expression matching a pattern". What sort of pattern? Well, the Bash Manual explains, in section 3.5.8.1 on Pattern Matching. In Devlin's solution here, the [...] construct is used for each case expression, which "matches any of the enclosed characters". Of course! This makes for a much more concise way of expressing that scoring table. I think, for symmetry, I'd have used ((total+=1)) for the first case, just to match the rest, but there you go.

One note on the command substitution in the for line above. There's nothing in the rules that says that external commands, that would normally and perhaps naturally be part of any Bash script solution (after all, Bash scripts are great for encoding UNIX style constructs) so the use of the external grep command here is fine. And it's an interesting way to iterate through the letters of the word passed to the scoring script.

The secret is in the -o option, short for --only-matching, and the man page describes this option thus:

Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

Before we look at that, note that the ${1^^} parameter expansion results in an uppercased version of the value in $1.

So if $1 had the value hello, then the result of echo ${1^^} | grep -o . would be:

H
E
L
L
O

This feeds nicely into the for ... in style loop construct used. The effect, ultimately, is the same as the C-style for loop construct I used in my solution where I used a incrementing variable i to point to each letter of the word in turn, via the ${parameter:offset:length} style of parameter expansion.

I'd like to dwell briefly on another solution to this exercise, which looks like this:

set -eu
main() {
    local -l str="$1"
    str=${str//[^[:alpha:]]}
    str=${str//[aeioulnrst]/_}  # 1
    str=${str//[dg]/__}         # 2
    str=${str//[bcmp]/___}      # 3
    str=${str//[fhvwy]/____}    # 4
    str=${str//[k]/_____}       # 5
    str=${str//[jx]/________}   # 8
    str=${str//[qz]/__________} # 10
    echo ${#str}
}
main "$@"

This is a really interesting approach that appeals to my sense of beauty and intrigue - all the heavy lifting is done with the ${parameter/pattern/string} style of parameter expansion, specifically the one where all matches are replaced because the pattern actually begins with a / (i.e. it's ${str//[aeioulnrst]/_} rather than ${str/[aeioulnrst]/_}).

What is happening here is that after removing any characters that are not in the "alphabetic" POSIX class (see the POSIX Character Classes section of 18.1. A Brief Introduction to Regular Expressions), the letters are replaced by underscores, where the number of underscores in the replacement reflects the points for that letter. So for example an a is replaced with _ reflecting a single point for that letter, whereas an f is replaced with ____ reflecting four points for that letter. After all the replacements are done, the string is just a sequence of underscores, and how many underscores reflects the total number of points for that word (which is reflected in yet another style of parameter expansion, the length of a variable, via ${#parameter}). Lovely!

I don't know about you, but this sort of reminds me of the underlying philosophy of MapReduce, where the input is reduced to a sequence of simple, countable atoms - in this case, underscore characters. Given the "sequence" feeling that this solution also conveys, I think there's an element of FP philosophy too.

DJ Adams

Bash notes 2

Further reading