Reshaping data values using jq's with_entries
Receipt of a JSON file containing valid tags for tutorial metadata gave me the perfect opportunity to explore it and learn a bit more jq in the process.
For each of our tutorials in SAP's Tutorial Navigator, we have metadata in the frontmatter. Here's an example from the Learn About OData Fundamentals tutorial:
author_name: DJ Adams
author_profile: https://github.com/qmacro
title: Learn about OData Fundamentals
description: Discover OData's origins and learn about the fundamentals of OData by exploring a public OData service.
auto_validation: false
primary_tag: software-product>sap-business-technology-platform
tags: [ software-product>sap-business-technology-platform, topic>cloud, programming-tool>odata, tutorial>beginner ]
time: 15
I received a JSON file with updated valid tags, against which I could check the values for the primary_tag and tags properties. The tags were arranged like this (drastically reduced to save space here):
{
"level": [
{
"name": "Beginner",
"value": " tutorial>beginner"
},
{
"name": "Intermediate",
"value": " tutorial>intermediate"
},
{
"name": "Advanced",
"value": " tutorial>advanced"
}
],
"common": [
{
"name": "ABAP Connectivity",
"value": "topic>abap-connectivity"
},
{
"name": "ABAP Development",
"value": "programming-tool>abap-development"
},
{
"name": "ABAP Extensibility",
"value": "programming-tool>abap-extensibility"
},
{
"name": "Android",
"value": "operating-system>android"
},
{
"name": "Artificial Intelligence",
"value": "topic>artificial-intelligence"
},
{
"name": "Big Data",
"value": "topic>big-data"
}
]
}
I wanted to explore the tags by "category", the part before the > symbol in the value properties. In the above excerpt (in the common array, which is where the main list of tags is), there are the following categories: topic, programming-tool and operating-system.
Separating tags from categories with split
First, I used split to separate out the categories and tags by splitting on the > symbol in each of the values.
.common
| map(.value | split(">"))
This produces an array of arrays. The outer array is the result of running map (which takes an array and produces an array) and the inner arrays are the result of running split on each category>tag pattern in the value properties:
[
[
"topic",
"abap-connectivity"
],
[
"programming-tool",
"abap-development"
],
[
"programming-tool",
"abap-extensibility"
],
[
"operating-system",
"android"
],
[
"topic",
"artificial-intelligence"
],
[
"topic",
"big-data"
]
]
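To see what split does to a single value in isolation, here is a minimal sketch. It assumes jq is installed and on the PATH; the -n option (no input) and -c option (compact output) are only there to keep the example short.

```shell
# Split one category>tag string on the > symbol.
# -n: run the filter without reading input; -c: print compact JSON.
pair=$(jq -n -c '"topic>big-data" | split(">")')
echo "$pair"
# prints ["topic","big-data"]
```

Wrapping the same split call in map, as above, simply applies it to every value in the array.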
Grouping by categories with group_by
The categories are the first values in each of the inner arrays, so next is to group the inner arrays by those categories:
.common
| map(.value | split(">"))
| group_by(.[0])
The .[0] supplied to group_by specifies that it's the first element of each inner array that should be the basis of grouping (i.e. the categories topic, programming-tool, programming-tool, etc).
This produces a differently shaped nesting of arrays, one for each of the categories:
[
[
[
"operating-system",
"android"
]
],
[
[
"programming-tool",
"abap-development"
],
[
"programming-tool",
"abap-extensibility"
]
],
[
[
"topic",
"abap-connectivity"
],
[
"topic",
"artificial-intelligence"
],
[
"topic",
"big-data"
]
]
]
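Note that the groups come out sorted by the grouping value, which is why operating-system appears first even though topic came first in the input. A small sketch with a cut-down, reordered sample (assuming jq is on the PATH) shows this:

```shell
# group_by sorts the groups by the grouping key (here the category),
# while elements within a group keep their original relative order.
grouped=$(jq -n -c '
  [["topic","abap-connectivity"],["operating-system","android"],["topic","big-data"]]
  | group_by(.[0])
')
echo "$grouped"
# prints [[["operating-system","android"]],[["topic","abap-connectivity"],["topic","big-data"]]]
```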
Reforming the structure with the entries functions
Now comes the task of reshaping that structure into something a little less "noisy". Using the entries family of functions, this turned out to be quite straightforward. That said, I'll explain the intermediate steps I went through on the way.
Getting from an array-based to an object-based structure with to_entries
As I wanted an object, with the keys being categories, and the values being arrays of tag strings, it felt right to reach for the to_entries function:
.common
| map(.value | split(">"))
| group_by(.[0])
| to_entries
This produced the following:
[
{
"key": 0,
"value": [
[
"operating-system",
"android"
]
]
},
{
"key": 1,
"value": [
[
"programming-tool",
"abap-development"
],
[
"programming-tool",
"abap-extensibility"
]
]
},
{
"key": 2,
"value": [
[
"topic",
"abap-connectivity"
],
[
"topic",
"artificial-intelligence"
],
[
"topic",
"big-data"
]
]
}
]
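The numeric keys appear because to_entries was given an array rather than an object, so the array indices are used as keys. A minimal sketch contrasting the two cases, using made-up sample values and assuming jq is on the PATH:

```shell
# On an array, to_entries uses the indices as keys.
from_array=$(jq -n -c '["a","b"] | to_entries')
# On an object, it uses the property names as keys.
from_object=$(jq -n -c '{"x":"a","y":"b"} | to_entries')
echo "$from_array"
# prints [{"key":0,"value":"a"},{"key":1,"value":"b"}]
echo "$from_object"
# prints [{"key":"x","value":"a"},{"key":"y","value":"b"}]
```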
Tidying up with object construction in a map
That was sort of the direction I wanted to go in, but there was some tidying up to do, to get cleaner values for key and value. So I reached for map to do this:
.common
| map(.value | split(">"))
| group_by(.[0])
| to_entries
| map({key: .value[0][0], value: .value|map(.[1])})
The expression passed to map is the object construction ({...}), creating objects each with two properties, key and value. The reason for staying with these property names will become clear shortly.
The value for key is expressed as .value[0][0], i.e. the first (zeroth) element of the inner array that is the first (zeroth) element of the array that is the value of the value property.
In other words, given the last object in the most recent intermediate result above:
{
"key": 2,
"value": [
[
"topic",
"abap-connectivity"
],
[
"topic",
"artificial-intelligence"
],
[
"topic",
"big-data"
]
]
}
Then .value[0][0] will return "topic" (specifically, the first instance of that string in the above JSON).
Similarly, to build the value for the new value property in the object being constructed, I used this expression: .value|map(.[1]). The current value of the value property is an array, so using map on that will produce another array. Of what? Well, of these values: .[1].
In other words, the second (index 1) value in each of the sub arrays. Given this same last object example above, .value|map(.[1]) produces ["abap-connectivity", "artificial-intelligence", "big-data"].
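Both expressions can be tried on that single object in isolation. This is just a sketch, assuming jq is on the PATH; the shell variable names are mine, purely for illustration.

```shell
# The last object from the to_entries output, as a compact JSON string.
entry='{"key":2,"value":[["topic","abap-connectivity"],["topic","artificial-intelligence"],["topic","big-data"]]}'

# First element of the first inner array: the category.
new_key=$(printf '%s' "$entry" | jq -c '.value[0][0]')
# Second element of every inner array: the tags.
new_value=$(printf '%s' "$entry" | jq -c '.value | map(.[1])')

echo "$new_key"
# prints "topic"
echo "$new_value"
# prints ["abap-connectivity","artificial-intelligence","big-data"]
```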
Running this latest iteration with the map function produces this:
[
{
"key": "operating-system",
"value": [
"android"
]
},
{
"key": "programming-tool",
"value": [
"abap-development",
"abap-extensibility"
]
},
{
"key": "topic",
"value": [
"abap-connectivity",
"artificial-intelligence",
"big-data"
]
}
]
Almost there!
Creating a neat structure with from_entries
According to the manual, the to_entries and from_entries functions "convert between an object and array of key-value pairs". In each case, the names of the key and value properties are key and value respectively. I had an inkling I would probably want to use from_entries at some stage, and this is the reason why I kept the names of the properties earlier.
Let's have a look at what passing the above structure into from_entries produces:
.common
| map(.value | split(">"))
| group_by(.[0])
| to_entries
| map({key: .value[0][0], value: .value|map(.[1])})
| from_entries
It's this:
{
"operating-system": [
"android"
],
"programming-tool": [
"abap-development",
"abap-extensibility"
],
"topic": [
"abap-connectivity",
"artificial-intelligence",
"big-data"
]
}
That's very nice, and pretty much exactly what I want. A neat and low-noise representation of the category and tag structure.
Refactoring with with_entries
It turns out that the pattern:
to_entries -> map(...) -> from_entries
is common enough to have a function all of its own, and it's with_entries. As detailed in the entries section of the manual:
with_entries(foo) is shorthand for to_entries | map(foo) | from_entries
In fact, we can see how it's defined (which is exactly as described in the manual) in builtin.jq, along with to_entries and from_entries.
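Applied to the pipeline built up so far, the refactoring might look like the sketch below. It assumes jq is on the PATH and uses the common part of the excerpt from earlier as inline sample data. The parentheses around the value expression are optional and only there for readability.

```shell
# Sample data: the "common" array from the excerpt, in compact form.
tags='{"common":[
  {"name":"ABAP Connectivity","value":"topic>abap-connectivity"},
  {"name":"ABAP Development","value":"programming-tool>abap-development"},
  {"name":"ABAP Extensibility","value":"programming-tool>abap-extensibility"},
  {"name":"Android","value":"operating-system>android"},
  {"name":"Artificial Intelligence","value":"topic>artificial-intelligence"},
  {"name":"Big Data","value":"topic>big-data"}
]}'

# The long form: to_entries, then map, then from_entries.
long_form=$(printf '%s' "$tags" | jq -c '
  .common
  | map(.value | split(">"))
  | group_by(.[0])
  | to_entries
  | map({key: .value[0][0], value: (.value | map(.[1]))})
  | from_entries
')

# The same thing with with_entries replacing the three steps.
short_form=$(printf '%s' "$tags" | jq -c '
  .common
  | map(.value | split(">"))
  | group_by(.[0])
  | with_entries({key: .value[0][0], value: (.value | map(.[1]))})
')

echo "$short_form"
# prints {"operating-system":["android"],"programming-tool":["abap-development","abap-extensibility"],"topic":["abap-connectivity","artificial-intelligence","big-data"]}
```

Both variables end up holding the same JSON, which is what the definition of with_entries would lead us to expect.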
Wrapping up
While I've played around a little with the entries family, this is the first time I've used it for real. Going through the intermediate process of finding myself using map has actually helped me understand what with_entries does for me.