feat(emacs): incorporate Vosk blog post

2025-12-10 19:23:03 +03:00 · 2022-09-18 18:33:36 +03:00 · 2022-09-18 18:33:36 +03:00 · 1d3e2940b4
commit 1d3e2940b4
parent ca3b1781eb
2 changed files with 99 additions and 15 deletions
--- a/.emacs.d/init.el
+++ b/.emacs.d/init.el
@@ -2900,7 +2900,7 @@ Returns (<buffer> . <workspace-index>) or nil."
            ,@project-files))
    (setq org-refile-targets
          `(,@(mapcar
-               (lambda (f) `(,f . (:level . 1)))
+               (lambda (f) `(,f . (:level . 2)))
               project-files)
            ,@(mapcar
               (lambda (f) `(,f . (:tag . "refile")))
@@ -2963,7 +2963,7 @@ Returns (<buffer> . <workspace-index>) or nil."
        (format-time-string "%Y-%m-%d" scheduled)
      "")))

-(setq org-agenda-hide-tags-regexp (rx (or "org" "log" "log_here")))
+(setq org-agenda-hide-tags-regexp (rx (or "org" "log" "log_here" "refile")))

 (setq org-agenda-custom-commands
      `(("p" "My outline"
@@ -4661,9 +4661,9 @@ by the `my/elfeed-youtube-subtitles' function."
  "Path to the `podcasts-vosk' script folder.")

 (defun my/invoke-vosk (input output)
-  "Extract subtitles from audio file.
+  "Extract subtitles from the audio file.

-INPUT is the audio file, OUTPUT is the part to the resulting SRT file."
+INPUT is the audio file, OUTPUT is the path to the resulting SRT file."
  (interactive
   (list
    (read-file-name "Input file: " nil nil t)
@@ -4714,6 +4714,8 @@ INPUT is the audio file, OUTPUT is the part to the resulting SRT file."
                      my/elfeed-vosk-podcast-files-directory
                      file-name))))
    (message "Download started")
+    (unless (file-exists-p my/elfeed-vosk-podcast-files-directory)
+      (mkdir my/elfeed-vosk-podcast-files-directory))
    (request url
      :type "GET"
      :encoding 'binary
@@ -4732,7 +4734,7 @@ INPUT is the audio file, OUTPUT is the part to the resulting SRT file."
         (message "Error!: %S" error-thrown))))))

 (defun my/elfeed-vosk-get-transcript (entry)
-  "Retrieve transcript from the current elfeed ENTRY."
+  "Retrieve transcript for the enclosure of the current elfeed ENTRY."
  (interactive (list elfeed-show-entry))
  (let ((enclosure (caar (elfeed-entry-enclosures entry))))
    (unless enclosure
@@ -4747,6 +4749,9 @@ INPUT is the audio file, OUTPUT is the part to the resulting SRT file."
        (my/elfeed-vosk-get-transcript-new enclosure srt-path)))))

 (defun my/elfeed-vosk-subed (entry)
+  "Run MPV for the current Vosk-generated subtitles file.
+
+ENTRY is an instance of `elfeed-entry'."
  (interactive (list elfeed-show-entry))
  (unless entry
    (user-error "No entry!"))
--- a/Emacs.org
+++ b/Emacs.org
@@ -4031,7 +4031,7 @@ Used files:
            ,@project-files))
    (setq org-refile-targets
          `(,@(mapcar
-               (lambda (f) `(,f . (:level . 1)))
+               (lambda (f) `(,f . (:level . 2)))
               project-files)
            ,@(mapcar
               (lambda (f) `(,f . (:tag . "refile")))
@@ -4165,7 +4165,7 @@ Some custom agendas to fit my workflow.
        (format-time-string "%Y-%m-%d" scheduled)
      "")))

-(setq org-agenda-hide-tags-regexp (rx (or "org" "log" "log_here")))
+(setq org-agenda-hide-tags-regexp (rx (or "org" "log" "log_here" "refile")))

 (setq org-agenda-custom-commands
      `(("p" "My outline"
@@ -6546,9 +6546,37 @@ by the `my/elfeed-youtube-subtitles' function."

 Keep in mind that this function has to be launched inside the buffer opened by the =my/elfeed-youtube-subtitles= function.
 *** Podcast transcripts
-Occasionally I want to have a text version of a podcast, for instance to take some notes.
+In my experience, finding something in a podcast is particularly troublesome. For example, occasionally I want to refer to a specific line of a podcast while making an [[https://github.com/org-roam/org-roam][org-roam]] node, e.g. to check that I got that part right.

-In order do do that, I've made a [[https://github.com/SqrtMinusOne/podcasts-vosk][small script]] that uses the [[https://alphacephei.com/vosk/][Vosk speech recognition toolkit]] to extract subtitles from an audio file. Here's a function to invoke that script.
+And I have no reasonable way to get there, because audio files by themselves don't allow for [[https://en.wikipedia.org/wiki/Random_access][random access]], i.e. there are no "landmarks" that point to this or that portion of the file, at least when nothing like a transcript is available.
+
+For obvious reasons, podcasts rarely ship with transcripts. So in this post, I'll be using a speech recognition engine to make up for that. A generated transcript is not quite as good as a manually written one, but for the purpose of finding a fragment of a known podcast, it works well enough.
+
+The general idea is to get the podcast info from [[https://github.com/skeeto/elfeed][elfeed]], process it with [[https://github.com/alphacep/vosk-api][vosk-api]] and feed it to [[https://github.com/sachac/subed][subed]] to control the playback in [[https://mpv.io/][MPV]].
+
+**** Vosk API
+After some searching, I found [[https://github.com/alphacep/vosk-api][Vosk API]], an offline speech recognition toolkit.
+
+I want to make a program that receives an audio file and outputs an [[https://en.wikipedia.org/wiki/SubRip][SRT]] file. Vosk provides bindings for several languages, of which I chose Python because... reasons.
+
+With the help of the kindly provided [[https://github.com/alphacep/vosk-api/tree/master/python/example][examples]] of how to use the Python API, I wrote a script that is [[https://github.com/SqrtMinusOne/podcasts-vosk][available here]]. Besides Vosk, the script uses [[https://click.palletsprojects.com/en/8.1.x/][click]] to make a simple CLI, a library aptly called [[https://github.com/cdown/srt][srt]] to make SRT files, and [[https://ffmpeg.org/][ffmpeg]].
+
+Another piece we need is a speech recognition model, some of which you can download [[https://alphacephei.com/vosk/models][on their website]]. I chose a small English model called =vosk-model-small-en-us-0.15= because all my podcasts are in English and also because larger models are much slower.
+
+Now that we have the script and the model, we need to create a virtual environment. Somehow I couldn't install the =vosk= package with [[https://docs.conda.io/en/latest/][conda]], but the Guix version of Python with =virtualenv= worked just fine:
+#+begin_src bash :eval no
+python3 -m virtualenv venv
+source venv/bin/activate
+pip install -r requirements.txt
+#+end_src
+
+After which the script can be used as follows:
+#+begin_src bash
+python main.py --file-path <path-to-file> --model-path ./model-small --save-path <path-to-subtitles-file>.srt
+#+end_src
+
+**** Running it from Emacs
+The next step is to run the script from Emacs. This is rather straightforward to do with [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Asynchronous-Processes.html][asynchronous processes]].

 #+begin_src emacs-lisp
 (defvar my/vosk-script-path
@@ -6556,9 +6584,9 @@ In order do do that, I've made a [[https://github.com/SqrtMinusOne/podcasts-vosk
  "Path to the `podcasts-vosk' script folder.")

 (defun my/invoke-vosk (input output)
-  "Extract subtitles from audio file.
+  "Extract subtitles from the audio file.

-INPUT is the audio file, OUTPUT is the part to the resulting SRT file."
+INPUT is the audio file, OUTPUT is the path to the resulting SRT file."
  (interactive
   (list
    (read-file-name "Input file: " nil nil t)
@@ -6586,7 +6614,14 @@ INPUT is the audio file, OUTPUT is the part to the resulting SRT file."
                  (user-error "Error in Vosk API: %s" err)))))))))
 #+end_src

-In order to use that, we need to download the file first. So here's a function that extracts the file name from the URL:
+If run interactively, the defined function prompts for paths to both files.
+
+The process sentinel sends a [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Desktop-Notifications.html][desktop notification]] because it's a bit more noticeable than =message=, and the process is expected to take some time.
+
+**** Integrating with elfeed
+To actually run the function from the section above, we need to download the file in question.
+
+So first, let's extract the file name from the URL:
 #+begin_src emacs-lisp
 (defun my/get-file-name-from-url (url)
  "Extract file name from the URL."
@@ -6602,7 +6637,10 @@ In order to use that, we need to download the file first. So here's a function t
    match))
 #+end_src

-Now can use that to save the file and invoke the =my/invoke-vosk= function.
+I use a library called [[https://github.com/tkf/emacs-request][request.el]] to download files elsewhere, so I'll re-use it here. You can just as well invoke =curl= or =wget= via an asynchronous process.
+
+This function downloads the file to a non-temporary folder, which is =~/.elfeed/podcast-files/= if you didn't move the elfeed database. The reason is that a permanently downloaded file works better for the next section.
+
 #+begin_src emacs-lisp
 (with-eval-after-load 'elfeed
  (defvar my/elfeed-vosk-podcast-files-directory
@@ -6615,6 +6653,8 @@ Now can use that to save the file and invoke the =my/invoke-vosk= function.
                      my/elfeed-vosk-podcast-files-directory
                      file-name))))
    (message "Download started")
+    (unless (file-exists-p my/elfeed-vosk-podcast-files-directory)
+      (mkdir my/elfeed-vosk-podcast-files-directory))
    (request url
      :type "GET"
      :encoding 'binary
@@ -6633,11 +6673,13 @@ Now can use that to save the file and invoke the =my/invoke-vosk= function.
         (message "Error!: %S" error-thrown))))))
 #+end_src

-And the final entrypoint, that opens up the SRT file is it's available, and queues the download if it's not.
+I also experimented with a bunch of options to write binary data in Emacs, of which the way with =write-region= (as implemented in [[https://github.com/rejeep/f.el][f.el]]) seems to be the fastest. [[https://emacs.stackexchange.com/questions/59449/how-do-i-save-raw-bytes-into-a-file][This thread on StackExchange]] suggests that it may corrupt some bytes towards the end, but whether or not this is the case, mp3 files survive the procedure. The solution with =seq-doseq= proposed there takes at least a few seconds.
+
+Finally, we need a function to show the transcript if it exists or invoke =my/elfeed-vosk-get-transcript-new= if it doesn't. And this is the function that we'll call from an =elfeed-entry= buffer.

 #+begin_src emacs-lisp
 (defun my/elfeed-vosk-get-transcript (entry)
-  "Retrieve transcript from the current elfeed ENTRY."
+  "Retrieve transcript for the enclosure of the current elfeed ENTRY."
  (interactive (list elfeed-show-entry))
  (let ((enclosure (caar (elfeed-entry-enclosures entry))))
    (unless enclosure
@@ -6652,8 +6694,17 @@ And the final entrypoint, that opens up the SRT file is it's available, and queu
        (my/elfeed-vosk-get-transcript-new enclosure srt-path)))))
 #+end_src

+**** Integrating with subed
+Now that we've produced a =.srt= file, we can use a package called [[https://github.com/sachac/subed][subed]] to control the playback, like I did in the previous post.
+
+By the way, this wasn't the most straightforward thing to figure out, because the MPV window doesn't show up for an audio file, and the player itself starts in the paused state. So I thought nothing was happening until I enabled the debug log.
+
+With that in mind, here's a function to launch MPV from the buffer generated by =my/elfeed-vosk-get-transcript=:
 #+begin_src emacs-lisp
 (defun my/elfeed-vosk-subed (entry)
+  "Run MPV for the current Vosk-generated subtitles file.
+
+ENTRY is an instance of `elfeed-entry'."
  (interactive (list elfeed-show-entry))
  (unless entry
    (user-error "No entry!"))
@@ -6667,6 +6718,20 @@ And the final entrypoint, that opens up the SRT file is it's available, and queu
  (subed-mpv--play subed-mpv-video-file))
 #+end_src

+After running =M-x my/elfeed-vosk-subed=, run =M-x subed-toggle-loop-over-current-subtitle= (=C-c C-l=) to turn off looping over the current subtitle, which is somehow on by default, and =M-x subed-toggle-pause-while-typing= (=C-c C-p=) to turn off pausing while typing, because it sometimes made my instance of MPV lag.
+
+After that, =M-x subed-mpv-toggle-pause= should start the playback, which you can control by moving the cursor in the buffer.
+
+You can also run =M-x subed-toggle-sync-point-to-player= (=C-c .=) to toggle syncing the point in the buffer to the currently played subtitle (this automatically gets disabled when you switch buffers).
+
+Running =M-x subed-toggle-sync-player-to-point= (=C-c ,=) does the opposite, i.e. sets the player position to the subtitle under point. These two functions are useful since the MPV window controls aren't available.
+
+**** Some observations
+So, the functions above work for my purposes.
+
+I think it should be possible to get transcripts of better quality by using a better speech recognition model, adding a speaker detection model and a model to restore case & punctuation. But that seems harder to implement and would take more time and resources. On my PC, the smallest Vosk model runs maybe 10 times faster than real-time playback, which still means a few minutes for an hour-long podcast. Waiting longer is probably not worth it.
+
+Also, technically MPV can stream files without downloading them, and it's even possible to feed stream data into Vosk. But MPV isn't particularly good at seeking in streamed files, at least not with my Internet connection.
 ** Internet & Multimedia
 *** Notmuch
 My notmuch config now resides in [[file:Mail.org][Mail.org]].
@@ -7488,6 +7553,20 @@ My package for doing Pomodoro timer.
  :straight t
  :after (hledger-mode))
 #+end_src
+
+Here are some usage notes.
+
+The fastest way to add new entries to the journal is by running =hledger add=.
+
+Then, run =hledger bs= to check whether the balance sheet matches the ground truth (e.g. the bank UI).
+
+If it doesn't, the simplest way to find the differences is by running =hledger register <item>=.
+
+Here are some interesting commands to run:
+- =hledger incomestatement <query>=, where =<query>= is the account prefix, e.g. =expenses= or =revenues=.
+  - add =--pivot=payee= to group by transaction description
+  - add =-B= to convert amounts to their cost, which is useful with multiple currencies
+
 *** Calendar
 Emacs' built-in calendar. Can even calculate sunrise and sunset times.