Migrating GitHub issue based URL bookmarks to wallabag
In this post I outline how I migrated my collection of reading list bookmarks, stored as issues in a GitHub repo, to wallabag which I'm now self-hosting.
Background
I use Miniflux for managing and reading posts from blogs to which I've subscribed, and it works very well. For the ad hoc articles and posts that I come across more generally and that I've wanted to bookmark for later reading, I've been using a GitHub repo url-notes, with a bookmarklet that I used to save an article as an issue:
javascript:(function () { window.open('https://github.com/qmacro-org/url-notes/issues/new?body=' + encodeURIComponent(location.href) + '&title=' + encodeURIComponent(document.title)); })()
This sets the issue title to the article's title, and sets the article's URL as the (only) content in the body of the issue.
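To make the shape of the resulting URL concrete, here is a minimal shell sketch that builds the same kind of new-issue link outside the browser. The function name new_issue_url is hypothetical, and jq's @uri filter stands in for encodeURIComponent, so the encoding of some punctuation may differ from what the bookmarklet produces.

```shell
# Hypothetical helper, not part of the original setup: print the new-issue URL
# that the bookmarklet would open for a given page URL and page title.
new_issue_url() {
  # jq's @uri percent-encodes each value (an approximation of encodeURIComponent)
  jq -n -r --arg body "$1" --arg title "$2" \
    '"https://github.com/qmacro-org/url-notes/issues/new?body=\($body|@uri)&title=\($title|@uri)"'
}

# Example values only
new_issue_url 'https://example.com/post?id=1' 'An example post'
```

Opening the printed URL in a browser should land on the new issue form with both fields pre-filled, just as the bookmarklet does.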
In this solution there's also an automated mechanism for tooting any notes I make on a post once I've read it and closed the issue. Here's an example toot for the URL bookmark and notes in issue 552. This was a bonus feature that I wasn't going to try to reproduce in the new wallabag based solution.
Self hosting
I am trying to reduce my reliance on third party services and also improve my actual consumption of bookmarked articles, and after a brief search for a self-hosted bookmarking service I came across the "read it later" app wallabag which seemed ideal, especially as I could host it as a Docker container.
I went for the SQLite option to keep things simple, using a Docker volume for persistency. Here's my compose file:
services:
  wallabag:
    container_name: wallabag
    labels:
      tsdproxy.enable: "true"
    image: wallabag/wallabag
    restart: unless-stopped
    environment:
      - SYMFONY__ENV__DOMAIN_NAME=https://wallabag.secret.ts.net
      - SYMFONY__ENV__FOSUSER_REGISTRATION=true
      - SYMFONY__ENV__FOSUSER_CONFIRMATION=false
      - SYMFONY__ENV__SERVER_NAME="wallabag"
    ports:
      - "8060:80"
    volumes:
      - data:/var/www/wallabag/data
      - images:/var/www/wallabag/web/assets/images
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost/api/info"]
      interval: 1m
      timeout: 3s
volumes:
  data:
  images:
I'm using Tailscale (it's one of two key services I install automatically on every device I set up, the other being Docker) and use the excellent TSDProxy to have my containers available on my tailnet and served via TLS certificates (i.e. via HTTPS).
Some notes on the compose file:
- To have this container appear via HTTPS on my tailnet, I am using the label tsdproxy.enable
- While I'm mapping wallabag's natural port 80 to 8060, TSDProxy will automatically make that 443 (i.e. the default for HTTPS) so that I don't need to specify a port in the URL when using it in the browser
- The DOMAIN_NAME env var value needs to be the "base" URL for what you're going to use, as it is used to construct URLs for CSS and other assets (see my comments on this issue for more info)
- There's a default user wallabag with a default password but I wanted to use my own username; the option in the UI to register a new user is not shown by default, hence the FOSUSER_REGISTRATION env var here set to true explicitly (I also changed the password of the default user for security reasons, via the UI, once I'd logged in)
- To avoid any messing about with email server configuration, I turned off the email based new user confirmation flow with FOSUSER_CONFIRMATION set to false
- The secret part of my tailnet domain name masks the real value for security reasons
After adding a few test articles, this is what my wallabag UI looked like:
Normal operation
Before I get to the migration notes, just a quick note on what I intend to do in normal circumstances as I come across articles and posts I want to bookmark.
I installed the wallabagger Chrome extension and the Android app wallabag, aka "In The Poche". Both seem to work very well and allow me to easily save URLs to wallabag.
For both the Chrome extension and the Android app I had to create OAuth credentials as they're effectively API clients. wallabag has a very well thought out API, which is protected via OAuth.
Here they are listed in the API clients management section of the configuration part of the UI (you can also see a third client "Command line" which I created for the migration work which is described next):
Migrating existing bookmarks
In fact, it was the API that made it easy for me to bring across all the URLs I'd bookmarked in my existing "url-notes" repo based solution, specifically using the POST method on the /api/entries API endpoint.
This endpoint offers a number of parameters, the main one being url to specify the article's URL, but I used another two to store some context (that the entry was originally in this "url-notes" system):
- origin_url - where I specified the "url-notes" repo issue URL (which contains the issue number which is what I wanted)
- tags - where I specified the tag url-notes to indicate where the bookmark came from
I didn't want hundreds of unique issue number tags, hence the split between these two parameters.
You can actually see the url-notes tag on the first three of the items in the wallabag UI screenshot earlier, from some initial manual API call testing.
As this was going to be a one-off exercise, I didn't bother to write a permanent script, opting instead to just perform the steps on the command line using the power of the shell, trying to channel the style of the great Brian Kernighan in the awesome AT&T Archives: The UNIX Operating System video, with my legs resting on the desk and the keyboard on my lap:
Retrieving the "url-notes" issue based bookmarks
I used the excellent GitHub CLI tool gh to retrieve the details of the open issues, saved them and peeked at the first few to check everything was in order:
gh issue list --json body,url --limit 1000 \
| tee issues.json \
| jq '.[:3]'
This created issues.json with all entries, and also emitted:
[
  {
    "body": "https://secondphase.com.au/seven-reasons-sap-tech-failing/",
    "url": "https://github.com/qmacro-org/url-notes/issues/564"
  },
  {
    "body": "https://github.blog/open-source/git/working-with-submodules/",
    "url": "https://github.com/qmacro-org/url-notes/issues/563"
  },
  {
    "body": "https://piannaf.com/blog/self-hosted-twitter-archive-with-github-pages-subdomain/",
    "url": "https://github.com/qmacro-org/url-notes/issues/562"
  }
]
At this point I had everything I needed from a source data perspective.
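Before loading anything, a quick sanity check of the saved file can be worthwhile. The following is a minimal sketch, not something from the original migration: the function name check_issues is hypothetical, and it assumes each issue body should consist of nothing but a single http(s) URL.

```shell
# Hypothetical helper: summarise a file produced by
# `gh issue list --json body,url`. Prints the total number of issues, then
# one "suspect" line for each issue whose body is not a single http(s) URL
# (those would need handling by hand).
check_issues() {
  jq -r '
    "total: \(length)",
    (.[] | select(.body | test("^https?://\\S+$") | not) | "suspect: \(.url)")
  ' "$1"
}

# Only runs when the file from the previous step is present
if [ -f issues.json ]; then
  check_issues issues.json
fi
```

If no "suspect" lines appear, every body can be passed straight through as the url parameter.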
Generating an access token for the API calls
As mentioned earlier, the API is protected via OAuth and so I created a new client definition using the API clients management section of the UI, and called it "Command line" (you can see the entry in the screenshot earlier).
The creation gave me a client ID and client secret with which I requested an access token:
curl \
  --url 'https://wallabag.secret.ts.net/oauth/v2/token' \
  --data grant_type=password \
  --data client_id=3_5r9nw8bucwcos... \
  --data client_secret=nrn7junh0pw4sgo... \
  --data username=dj \
  --data-urlencode password='supersekrit'
This returned something like this:
{
  "access_token": "N2RkM2FjZmRkM...",
  "expires_in": 3600,
  "token_type": "bearer",
  "scope": null,
  "refresh_token": "ODg4NjA2YTMyN2..."
}
The one hour validity (3600 seconds) was plenty of time but it was nice to get a refresh token too in case I needed it.
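Rather than copying the token by hand, it can be pulled out of the response with jq, and the refresh token can be exchanged for a new access token if the hour runs out. This is a minimal sketch under some assumptions: BASE, CLIENT_ID, CLIENT_SECRET and both function names are placeholders of my own, and it assumes the token endpoint accepts grant_type=refresh_token.

```shell
# Placeholder base URL; CLIENT_ID and CLIENT_SECRET are expected to be set
# in the environment before refresh_tokens is called.
BASE='https://wallabag.secret.ts.net'

# Print one field of a token response read from stdin
# (prints nothing if the field is absent or null).
token_field() {
  jq -r --arg name "$1" '.[$name] // empty'
}

# Exchange a refresh token (first argument) for a new token response,
# emitted as raw JSON. Assumption: grant_type=refresh_token is accepted.
refresh_tokens() {
  curl --silent --fail \
    --url "$BASE/oauth/v2/token" \
    --data grant_type=refresh_token \
    --data-urlencode "client_id=$CLIENT_ID" \
    --data-urlencode "client_secret=$CLIENT_SECRET" \
    --data-urlencode "refresh_token=$1"
}

# Demonstration with a canned response, so nothing here touches the network
response='{"access_token":"example-access","expires_in":3600,"refresh_token":"example-refresh"}'
ACCESS_TOKEN=$(printf '%s' "$response" | token_field access_token)
echo "$ACCESS_TOKEN"
```

In real use the canned response would be replaced by the output of the token request, and a later `refresh_tokens "$REFRESH_TOKEN" | token_field access_token` would yield a replacement token.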
Feeding in the bookmarks
At this point I had everything ready, and it was just a case of looping through the bookmarks in issues.json and making an API call for each one, to load into wallabag:
jq -r '.[]|[.body,.url]|@tsv' issues.json \
  | while read -r url origin_url
    do
      # URL-encode the values so that any & or = in a bookmarked URL survives
      curl \
        --url 'https://wallabag.secret.ts.net/api/entries.json' \
        --header 'Authorization: Bearer N2RkM2FjZmRk...' \
        --data-urlencode url="$url" \
        --data-urlencode origin_url="$origin_url" \
        --data tags='url-notes'
      sleep 0.5
    done
A short while later, they were all loaded, all 400+ of them:
Tidying up
I wanted to close all the issues I migrated, which I did like this:
jq -r '.[].url' issues.json \
  | while read -r issue
    do
      gh issue close "$issue"
      sleep 1
    done
The gh issue close command accepts an issue number or URL, which is great. The output is also nice and clean, which is pleasing to look at too:
✓ Closed issue #564 (The Seven Reasons Your SAP Tech Initiatives Are Failing — Second Phase Solutions)
✓ Closed issue #563 (Working with submodules - The GitHub Blog)
✓ Closed issue #562 (Self Hosting Twitter Archive on Github Pages | Justin Mancinelli)
...
I could have done this in the same loop as when feeding in the bookmarks, dependent on the outcome of the API calls, but in the end I did it separately.
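For the record, a combined loop might look something like the sketch below, which closes an issue only when its bookmark was accepted by the API. This is not what I ran: BASE, ACCESS_TOKEN and the three function names are placeholders, and "accepted" is taken to mean simply that curl's --fail option saw no HTTP error status.

```shell
# Placeholder base URL; ACCESS_TOKEN is expected to be set in the environment.
BASE='https://wallabag.secret.ts.net'

# POST one bookmark (article URL, originating issue URL).
# Returns non-zero if the server answers with an HTTP error status.
add_entry() {
  curl --silent --fail --output /dev/null \
    --url "$BASE/api/entries.json" \
    --header "Authorization: Bearer $ACCESS_TOKEN" \
    --data-urlencode "url=$1" \
    --data-urlencode "origin_url=$2" \
    --data tags='url-notes'
}

# Close the GitHub issue identified by its URL.
close_issue() {
  gh issue close "$1"
}

# Read tab-separated "article URL, issue URL" pairs from stdin and close each
# issue only after its bookmark has been stored; report failures on stderr.
migrate() {
  local url origin_url
  while IFS=$'\t' read -r url origin_url
  do
    # </dev/null stops the commands from consuming the loop's input
    if add_entry "$url" "$origin_url" </dev/null
    then
      close_issue "$origin_url" </dev/null
    else
      echo "failed, issue left open: $origin_url" >&2
    fi
  done
}

# Usage: jq -r '.[]|[.body,.url]|@tsv' issues.json | migrate
```

Any issue that stays open after a run is then exactly the set of bookmarks that still needs attention, which makes a re-run straightforward.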
Wrapping up
So far I've been very impressed with wallabag - its functionality, the support for running in a Docker container, the documentation and the API too. Now all I have to do is actually get round to reading those bookmarked articles!