It’s nice sometimes being able to listen instead of reading, isn’t it? If only there were a way to automagically do that for all blog posts 🤔️

Reading without saying a word

Inspired by a blog post about setting up Mimic, I wondered how feasible it would be to provide an audio version of every blog post. Obviously making the time to read it myself would be too easy, and I kind of love to automate things. Better yet, I can have it read by the familiar voice of none other than Alan Pope, who happened to provide speech samples for Mycroft.

Your wish is my command-line

To start with, I familiarized myself with Mimic’s command-line tool. Blog posts written in Markdown are basically text files that are later processed by Hugo. Maybe there’s a way to simply feed those files into Mimic? Installing the command via pip is straightforward:

pip install markdown mycroft-mimic3-tts[all]
mimic3 --voice en_UK/apope_low < content/post/2022-09-02-publishing-re-usable-actions.md

Note that you need to pipe the content into the command, otherwise all you hear is the spoken version of the filename 🙄️. That basically works, although the metadata in the front matter gets read out as well. So maybe it’s a good idea to only read the content.

Adding the audio to a blog post is actually easy if you don’t mind raw HTML:

<audio controls src="../2010-10-10-my-blog-post.wav"><a href="../2010-10-10-my-blog-post.wav">Download audio</a></audio>

If the web browser doesn’t know how to play audio, a download link is shown instead.

Time to write some Python code

Tempting as it is to use sed to remove unwanted markup, I also need a way to inject the filename of the audio into each post. Otherwise I would still have to edit all posts by hand. So maybe a few lines of code will solve this more cleanly:

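import re

controls = 'This is where the audio player goes'
content = re.sub(r'^---[\s\S]+?---', '', text)  # drop the front matter
# str.replace returns a new string, so the result has to be assigned
content = content.replace('<!--more-->', '<!--more-->' + controls)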

Getting rid of the front matter is easy. And the “more” marker is already at the right position, so why not add the audio controls underneath? If you’re curious about how I strip the Markdown from the source, have a look at markless.py.
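
As a rough sketch of how those two steps could look as functions: one returns the text without front matter (for the speech part), the other keeps the post intact and only adds the player. The function names and the `.wav` default are made up for illustration; markless.py may well do this differently.

```python
import re
from pathlib import Path

MORE = '<!--more-->'


def strip_front_matter(text: str) -> str:
    """Return the post without the leading block enclosed in --- lines."""
    return re.sub(r'^---[\s\S]+?---', '', text, count=1).lstrip()


def with_audio_player(text: str, post_path: str, ext: str = '.wav') -> str:
    """Return the post with an audio player added below the "more" marker."""
    # The audio file is named after the Markdown source of the post
    audio = '../' + Path(post_path).stem + ext
    controls = (
        f'<audio controls src="{audio}">'
        f'<a href="{audio}">Download audio</a></audio>'
    )
    # Only the first marker gets a player, the front matter stays untouched
    return text.replace(MORE, MORE + '\n\n' + controls, 1)
```

Keeping the two apart means the front matter never reaches the voice, while Hugo still gets the complete post.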

Reading the text with a human voice can also be done in just a handful of lines:

from mimic3_tts import Mimic3Settings, Mimic3TextToSpeechSystem

tts = Mimic3TextToSpeechSystem(Mimic3Settings())
tts.begin_utterance()
tts.speak_text("this is a test")
results = tts.end_utterance()  # the synthesized results
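
The audio still has to end up in a file. Here is a minimal sketch using Python’s built-in wave module, assuming you already have the speech as chunks of raw PCM bytes. The helper name `write_wav` and the defaults (16-bit mono at 22050 Hz) are assumptions for illustration, so check what your voice actually produces.

```python
import wave


def write_wav(path, chunks, sample_rate=22050, sample_width=2, channels=1):
    """Write chunks of raw PCM bytes to an uncompressed WAV file."""
    with wave.open(str(path), 'wb') as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample, 2 = 16 bit
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
```

The header is finalized when the file is closed, so the chunks can be written as they come in.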

Almost pitch perfect

Using the script in an existing GitLab pipeline is trivial now:

before_script:
  - pip install markdown mycroft-mimic3-tts[all]
  - cp -r content generated
  - ./markless.py

Note: [all] is optional. If you want to use languages other than English you can also specify the language you need in square brackets.

If you test this locally, note that the audio files and the blog posts with the HTML snippet will end up in the folder generated. This is so the script can avoid touching the original files. However, all content needs to be copied if you have other source files.
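
If you want to try the copy-then-modify idea outside of the pipeline, something like the following sketch would do. It roughly mirrors the cp -r step with shutil.copytree and then rewrites only the Markdown files in the copy. The function name `process_copy` and the `process` callback are placeholders, not part of the actual script.

```python
import shutil
from pathlib import Path


def process_copy(source, target, process):
    """Copy source to target, then rewrite every Markdown file in the copy."""
    # Copy everything so images and other source files are available too
    shutil.copytree(source, target, dirs_exist_ok=True)
    for post in sorted(Path(target).rglob('*.md')):
        text = post.read_text(encoding='utf-8')
        # Only the copy is written to, the originals stay as they are
        post.write_text(process(text, post), encoding='utf-8')
```

Calling it as `process_copy('content', 'generated', some_function)` would leave content alone and put the modified posts into generated.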

Update: The original script was using Python’s wave module, which is easy but produces large, uncompressed files. I made a small change to use pydub, which writes mp3 using ffmpeg behind the scenes!
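
To get a feel for the difference, here is a back-of-the-envelope calculation. The numbers are assumptions for illustration (five minutes of 16-bit mono audio at 22050 Hz, compared with an mp3 at a constant 64 kbit/s), not measurements from this blog.

```python
def wav_bytes(seconds, sample_rate=22050, sample_width=2, channels=1):
    """Size of the raw audio data in an uncompressed WAV file, header excluded."""
    return seconds * sample_rate * sample_width * channels


def mp3_bytes(seconds, bitrate_kbps=64):
    """Approximate size of a constant-bitrate mp3."""
    return seconds * bitrate_kbps * 1000 // 8


five_minutes = 5 * 60
print(wav_bytes(five_minutes))  # 13230000 bytes, roughly 13 MB
print(mp3_bytes(five_minutes))  # 2400000 bytes, roughly 2.4 MB
```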