It's not often that I'm relaxed enough to be aware of how my mind is (or isn't) working, and what it's doing. So it was a surprise when I realised that what I've been doing for the past 15 minutes is descending through multiple levels into some classic yak shaving territory.
I thought I'd write about another Exercism community solution that caught my eye this morning. So I went to my blog repository locally, and thought:
Actually, what I need is an updated version of my old script that sets up a new blog post file, so I can streamline the authoring of a new post.
I've recently moved to 11ty and it's a decent static site generator; it has introduced a slightly new structure, and I'm happy with it so far, but it means I need a slightly different workflow to create a new blog post file.
Anyway, this thought should have been an early warning sign, but I sort of ignored it.
Then, in thinking about what I'd want this script to do, I started to think about what input I'd give it. Initially just the blog post title, perhaps, but then:
What about tags, and how would I specify them? Why don't I choose them from a list? But then how would I determine that list?
The tags in any given post are declared in the frontmatter; here's the frontmatter for the previous post Bash notes 2:
---
layout: post
title: Bash notes 2
tags:
  - shell
  - til
  - exercism
---
I had the idea of pulling out all the tags from all the Markdown files that represented posts. But how would I do that? I quickly descended to the next level down in my yak shaving journey.
I could simply look through each of the files for any line that started with a couple of spaces, had a dash, and then a word. But I couldn't be sure that this approach wouldn't be too eager, and match blog post body content that wasn't tag related. So I thought it best to match only those lines where tags: preceded them.
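To make the "too eager" worry concrete, here's a minimal sketch. The sample post is hypothetical (it's not one of the real blog files), and it assumes tag lines are indented by two spaces, with a nested list item in the body that happens to look the same:

```shell
# Hypothetical sample post: two real tags, plus a nested list item in the body.
sample_post='---
layout: post
title: Sample
tags:
  - shell
  - til
---
Some notes:

- outer item
  - inner item'

# Naive match: any line starting with two spaces, a dash, a space and then a word.
naive_matches=$(printf '%s\n' "$sample_post" | grep -E '^  - [[:alnum:]]')
printf '%s\n' "$naive_matches"
```

The naive pattern returns three lines, the two tags and the "inner item" line from the body, which is exactly the kind of false positive to avoid.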
I had an inkling that something like multiline matching with grep might help, or even sed. There was a related question on Stack Overflow, and this answer to it seemed as intriguing as it was concise:
sed -e '/abc/,/efg/!d' [file-with-content]
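A quick way to get a feel for that one-liner is to feed it a few lines on standard input instead of a file. This is just a sketch with made-up input lines:

```shell
# Keep only the lines from the first match of /abc/ through the next match of /efg/;
# everything outside that range is deleted.
range_demo=$(printf '%s\n' one abc two efg three | sed -e '/abc/,/efg/!d')
printf '%s\n' "$range_demo"
```

Only abc, two and efg survive; the lines before and after the range are dropped.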
The first iteration of translating this into my requirements, and trying it out on the blog post files for this year so far, looks like this:
sed -e '/^tags:/,/---/!d' 2022-*
This gave me the following output:
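As an illustration of the shape of that output, here's a minimal sketch run against a hypothetical sample post in a temporary directory rather than the real 2022 files. The file name and content are assumptions made up for the example:

```shell
# Hypothetical sample post standing in for one of the real 2022-* files.
post_dir=$(mktemp -d)
cat > "$post_dir/2022-01-01-sample.md" <<'EOF'
---
layout: post
title: Sample
tags:
  - shell
  - til
---
Body text, which the range does not cover.
EOF

# Delete every line that is not between a tags: line and the next --- line.
first_pass=$(cd "$post_dir" && sed -e '/^tags:/,/---/!d' 2022-*)
printf '%s\n' "$first_pass"
```

For this sample, what remains is the tags: line, the two tag lines, and the closing --- of the frontmatter.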
A second iteration, adding a second instruction /^  - /!d to search within the results for just the tag lines, looks like this:
sed -e '/^tags:/,/---/!d; /^  - /!d' 2022-*
And this gave me (output reduced for brevity):
So there are two more tasks here - to reduce each line to just the tag name (i.e. to remove the bullet point and spaces) and to deduplicate the list.
As we're already in sed mode, the first of these reductions might as well be a third instruction, specifically s/^  - //, like this:
sed -e '/^tags:/,/---/!d; /^  - /!d; s/^  - //' 2022-*
This results in:
And while we could turn to uniq to deduplicate the list, we'll have to sort it first anyway, so we might as well use the -u option to sort:
sed -e '/^tags:/,/---/!d; /^  - /!d; s/^  - //' 2022-* | sort -u
This gives us what we want, a nice clean, unique list of tags:
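Here's a sketch of the whole pipeline on two hypothetical posts that share a tag, also comparing sort -u with the sort | uniq alternative. The file names and tags are made up, and the patterns assume tag lines indented by two spaces:

```shell
# Two hypothetical posts with one tag (shell) in common.
post_dir=$(mktemp -d)
cat > "$post_dir/2022-01-01-one.md" <<'EOF'
---
title: One
tags:
  - shell
  - til
---
Body one.
EOF
cat > "$post_dir/2022-01-02-two.md" <<'EOF'
---
title: Two
tags:
  - exercism
  - shell
---
Body two.
EOF

# Extract the tag names, then sort and deduplicate in one step.
unique_tags=$(cd "$post_dir" &&
  sed -e '/^tags:/,/---/!d; /^  - /!d; s/^  - //' 2022-* | sort -u)

# The two-step alternative gives the same list.
via_uniq=$(cd "$post_dir" &&
  sed -e '/^tags:/,/---/!d; /^  - /!d; s/^  - //' 2022-* | sort | uniq)

printf '%s\n' "$unique_tags"
```

Both variants produce exercism, shell and til, each listed once.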
I can now use this with fzf and its multi select mode to give me the option of choosing one or more tags:
sed -e '/^tags:/,/---/!d; /^  - /!d; s/^  - //' 2022-* | sort -u | fzf -m
This gives me a nice interface like this:
(Here, I've selected three tags, one of which is bats.)
Great, I can now get on with putting the script together. I'll also need a way to specify a new tag if it's not in the list, but I'll deal with that when I get to it.
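For what it's worth, here's a rough sketch of how the chosen tags might feed into the frontmatter of a new post. The print_frontmatter function is hypothetical, not the actual script; it simply assumes the same frontmatter layout as the post shown earlier and reads one tag per line from standard input:

```shell
# Hypothetical helper: print frontmatter for the title given as $1,
# reading one tag per line from standard input.
print_frontmatter() {
  printf '%s\n' '---' 'layout: post' "title: $1" 'tags:'
  while IFS= read -r tag; do
    # Skip empty lines so they don't become empty tag entries.
    if [ -n "$tag" ]; then
      printf '  - %s\n' "$tag"
    fi
  done
  printf '%s\n' '---'
}

# Tags are supplied by hand here; in practice they could be piped in from fzf -m.
frontmatter=$(printf '%s\n' shell til | print_frontmatter 'Yak shaving')
printf '%s\n' "$frontmatter"
```

Since fzf -m writes each selected item on its own line, its output could be piped straight into a function like this.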
But I'm not done with my descent yet. I'm not really sure exactly what the !d part in the first sed instruction is, and how it works. So at this point I send the sed manual to my trusty Nexus 9 tablet, and head off to make a cup of coffee to enjoy while reading and learning more about this venerable stream editor that's been around for almost half a century.
I'm further away than ever from writing that post about the Exercism community solution I'd seen, but that's all fine. Yak shaving doesn't feel so bad when you're aware of when you're doing it.
I've had my coffee and read some of the manual. It's now clear to me how the initial sed invocation works. Here it is in isolation:
sed -e '/^tags:/,/---/!d' 2022-*
The first thing I needed to realise is that the ! doesn't belong to the d, it belongs to the part before it.
The sed script overview explains that sed commands have this structure:
[addr]X[options]
where "addr" is an address and "X" represents the actual command, or operation.
Looking at the Addresses section, we see that there are multiple ways of specifying lines that the given command is to operate upon. The specifications include direct line numbers ("numeric addresses"), and text matching ("regexp addresses"). Moreover, a range can be specified, with the start and end specifications joined with a comma.
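The three kinds of address can be tried out side by side with the d command. This is a small sketch on made-up input lines:

```shell
lines=$(printf '%s\n' alpha beta gamma delta)

# Numeric address: delete line 2.
numeric=$(printf '%s\n' "$lines" | sed -e '2d')

# Regexp address: delete every line starting with g.
regexp=$(printf '%s\n' "$lines" | sed -e '/^g/d')

# Range: delete from line 2 through the next line starting with g.
range=$(printf '%s\n' "$lines" | sed -e '2,/^g/d')

printf '%s\n' "$numeric" '--' "$regexp" '--' "$range"
```

Note that the two ends of a range don't have to be the same kind of address; here the start is numeric and the end is a regexp.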
This is all fine, and we grokked that in building our sed instructions earlier. But the thing I didn't realise is that the ! character is part of the "addr" specification (not part of the "X" command) and serves to negate whatever address was specified.
In other words, the "addr" part is actually:
/^tags:/,/---/!
which means "all the lines that are NOT in this range". And then the d command deletes what's specified, i.e. deletes everything apart from sequences like this:
tags:
  - shell
  - til
  - exercism
---
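The effect of the ! is easiest to see by running the same range with and without it. This sketch uses a hypothetical sample post, and also shows an alternative spelling with -n and the p command, which prints only the lines in the range:

```shell
# Hypothetical sample post.
post='---
layout: post
title: Sample
tags:
  - shell
  - til
---
Body text.'

# Without !: delete the lines inside the range, keep the rest.
in_range_deleted=$(printf '%s\n' "$post" | sed -e '/^tags:/,/---/d')

# With !: delete the lines outside the range, keep the tags block.
not_in_range_deleted=$(printf '%s\n' "$post" | sed -e '/^tags:/,/---/!d')

# Same result as the negated delete: suppress default output, print only the range.
in_range_printed=$(printf '%s\n' "$post" | sed -n -e '/^tags:/,/---/p')

printf '%s\n' "$not_in_range_deleted"
```

The negated delete and the -n plus p form keep the same lines, so which one to use is largely a matter of taste.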
So there you have it.