feat(gource): add

This commit is contained in:
Pavel Korytov 2023-01-02 20:37:06 +03:00
parent e1782cd204
commit 990514b50f
4 changed files with 580 additions and 0 deletions

View file

@ -0,0 +1,302 @@
+++
title = "Running Gource with Emacs"
author = ["Pavel Korytov"]
date = 2023-01-02
tags = ["emacs"]
draft = false
+++
{{< figure src="/images/gource/gource.png" >}}
[Gource](https://gource.io/) is a program that draws an animated graph of users changing the repository over time.
Although it can work without extra effort (just run `gource` in a [git](https://git-scm.com/) repo), there are some tweaks that can be done:
- Gource supports using custom pictures for users. [Gravatar](https://en.gravatar.com/) is an obvious place to get these.
- Occasionally, the same people have different names and/or emails in history.<br />
It may happen when people use forges like [GitLab](https://gitlab.com/) or just have different settings on different machines. It would be nice to merge these names.
- Visualizing the history of multiple repositories (e.g. frontend and backend) requires combining multiple gource logs.
So, why not try doing that with Emacs?
## Gravatars {#gravatars}
Much to my surprise, Emacs turned out to have a built-in package called [gravatar.el](https://github.com/emacs-mirror/emacs/blob/master/lisp/image/gravatar.el).
So, let's make a function to retrieve a gravatar and save it:
```emacs-lisp
(defun my/gravatar-retrieve-sync (email file-name)
"Get gravatar for EMAIL and save it to FILE-NAME."
(let ((gravatar-default-image "identicon")
(gravatar-size nil)
(coding-system-for-write 'binary)
(write-region-annotate-functions nil)
(write-region-post-annotation-function nil))
(write-region
(image-property (gravatar-retrieve-synchronously email) :data)
nil file-name nil :silent)))
```
To use these images, we need to save them to some folder and use usernames as file names. The folder:
```emacs-lisp
(setq my/gravatar-folder "/home/pavel/.cache/gravatars/")
```
And the function that downloads a gravatar if necessary:
```emacs-lisp
(defun my/gravatar-save (email author)
"Download gravatar for EMAIL.
AUTHOR is the username."
(let ((file-name (concat my/gravatar-folder author ".png")))
(mkdir my/gravatar-folder t)
(unless (file-exists-p file-name)
(message "Fetching gravatar for %s (%s)" author email)
(my/gravatar-retrieve-sync email file-name))))
```
## Merging authors {#merging-authors}
Now to merging authors.
Gource itself uses only usernames (without emails), but we can use `git log` to get both. The required information can be extracted like that:
```bash
git log --pretty=format:"%ae|%an" | sort | uniq -c | sed "s/^[ \t]*//;s/ /|/"
```
The output is a list of pipe-separated strings, where the values are:
- Number of occurrences for this combination of username and email
- Email
- Username
Of course, that part would have to be changed appropriately for other version control systems if you happen to use one.
So, below is one hell of a function that wraps this command and tries to merge emails and usernames belonging to one author:
```emacs-lisp
(defun my/git-get-authors (repo &optional authors-init)
"Extract and merge all combinations of authors & emails from REPO.
REPO is the path to a git repository.
AUTHORS-INIT is the previous output of `my/git-get-authors'. It can
be used to extract that information from multiple repositories.
The output is a list of alists with following keys:
- emails: list of (<email> . <count>)
- authors: list of (<username> . <count>)
- email: the most popular email
- author: the most popular username
I.e. one alist is all emails and usernames of one author."
(let* ((default-directory repo)
(data (shell-command-to-string
"git log --pretty=format:\"%ae|%an\" | sort | uniq -c | sed \"s/^[ \t]*//;s/ /|/\""))
(authors
(cl-loop for string in (split-string data "\n")
if (= (length (split-string string "|")) 3)
collect (let ((datum (split-string string "|")))
`((count . ,(string-to-number (nth 0 datum)))
(email . ,(downcase (nth 1 datum)))
(author . ,(nth 2 datum)))))))
(mapcar
(lambda (datum)
(setf (alist-get 'author datum)
(car (cl-reduce
(lambda (acc author)
(if (> (cdr author) (cdr acc))
author
acc))
(alist-get 'authors datum)
:initial-value '(nil . -1))))
(setf (alist-get 'email datum)
(car (cl-reduce
(lambda (acc email)
(if (> (cdr email) (cdr acc))
email
acc))
(alist-get 'emails datum)
:initial-value '(nil . -1))))
datum)
(cl-reduce
(lambda (acc val)
(let* ((author (alist-get 'author val))
(email (alist-get 'email val))
(count (alist-get 'count val))
(saved-value
(seq-find
(lambda (cand)
(or (alist-get email (alist-get 'emails cand)
nil nil #'string-equal)
(alist-get author (alist-get 'authors cand)
nil nil #'string-equal)
(alist-get email (alist-get 'authors cand)
nil nil #'string-equal)
(alist-get author (alist-get 'emails cand)
nil nil #'string-equal)))
acc)))
(if saved-value
(progn
(if (alist-get email (alist-get 'emails saved-value)
nil nil #'string-equal)
(cl-incf (alist-get email (alist-get 'emails saved-value)
nil nil #'string-equal)
count)
(push (cons email count) (alist-get 'emails saved-value)))
(if (alist-get author (alist-get 'authors saved-value)
nil nil #'string-equal)
(cl-incf (alist-get author (alist-get 'authors saved-value)
nil nil #'string-equal)
count)
(push (cons author count) (alist-get 'authors saved-value))))
(setq saved-value
(push `((emails . ((,email . ,count)))
(authors . ((,author . ,count))))
acc)))
acc))
authors
:initial-value authors-init))))
```
Despite the probable we-enjoy-typing-ness of the implementation, it's actually pretty simple:
- The output of `git log` is parsed into a list of alists with `count`, `email` and `author` as keys.
- This list is reduced by `cl-reduce` into a list of alists with `emails` and `authors` as keys and the respective counts as values, e.g. `((<email-1> . 1) (<email-2> . 3))`.<br />
I've seen a couple of cases where people would swap their username and email (lol), so `seq-find` also looks for an email in the list of authors and vice versa.
- The `mapcar` call determines the most popular email and username for each authors.
The output is another list of alists, now with the following keys:
- `emails` - list of elements like `(<email> . <count>)`
- `authors` - list of elements like `(<author-name> . <count>)`
- `email` - the most popular email
- `author` - the most popular username.
## Running for multiple repos {#running-for-multiple-repos}
This section was mostly informed by [this page](https://github.com/acaudwell/Gource/wiki/Visualizing-Multiple-Repositories) in the [gource wiki](https://github.com/acaudwell/Gource/wiki).
As I said above, by default `gource` just creates a visualization for the current repo. To change something in it, we need to invoke the program like that: `gource --output-custom-log PATH`, where `PATH` is either the path to the log file or `-` for stdout.
The log consists of lines of pipe-separated strings, e.g.:
```text
1600769568|dsofronov|A|/studentor/.dockerignore
1600769568|dsofronov|A|/studentor/.editorconfig
1600769568|dsofronov|A|/studentor/.flake8
1600769568|dsofronov|A|/studentor/.gitignore
```
where the values of one line are:
- UNIX timestamp
- Author name
- `A` for add, `M` for modify, and `D` for delete
- Path to file
The file has to be sorted by the timestamp in ascending order.
So, the function that prepares the log for one repository:
```emacs-lisp
(defun my/gource-prepare-log (repo authors)
"Create gource log string for REPO.
AUTHORS is the output of `my/git-get-authors'."
(let ((log (shell-command-to-string
(concat
"gource --output-custom-log - "
repo)))
(authors-mapping (make-hash-table :test #'equal))
(prefix (file-name-base repo)))
(cl-loop for author-datum in authors
for author = (alist-get 'author author-datum)
do (my/gravatar-save (alist-get 'email author-datum) author)
do (cl-loop for other-author in (alist-get 'authors author-datum)
unless (string-equal (car other-author) author)
do (puthash (car other-author) author
authors-mapping)))
(cl-loop for line in (split-string log "\n")
concat (let ((fragments (split-string line "|")))
(when (> (length fragments) 3)
(when-let (mapped-author (gethash (nth 1 fragments)
authors-mapping))
(setf (nth 1 fragments) mapped-author))
(setf (nth 3 fragments)
(concat "/" prefix (nth 3 fragments))))
(string-join fragments "|"))
concat "\n")))
```
This function:
- Downloads a gravatar for each author
- Replaces all usernames of one author with the most frequent one
- Prepends the file path with the repository name.
The output is a string in the gource log format as described above.
Finally, as we need to invoke all of this for multiple repositories, why not do that with [dired](https://www.gnu.org/software/emacs/manual/html_node/emacs/Dired.html):
```emacs-lisp
(defun my/gource-dired-create-logs (repos log-name)
"Create combined gource log for REPOS.
REPOS is a list of strings, where a string is a path to a git repo.
LOG-NAME is the path to the resulting log file.
This function is meant to be invoked from `dired', where the required
repositories are marked."
(interactive (list (or (dired-get-marked-files nil nil #'file-directory-p)
(user-error "Select at least one directory"))
(read-file-name "Log file name: " nil "combined.log")))
(let ((authors
(cl-reduce
(lambda (acc repo)
(my/git-get-authors repo acc))
repos
:initial-value nil)))
(with-temp-file log-name
(insert
(string-join
(seq-filter
(lambda (line)
(not (string-empty-p line)))
(seq-sort-by
(lambda (line)
(if-let (time (car (split-string line "|")))
(string-to-number time)
0))
#'<
(split-string
(mapconcat
(lambda (repo)
(my/gource-prepare-log repo authors))
repos "\n")
"\n")))
"\n")))))
```
This function extracts authors from each repository and merges the logs as required by gource, that is sorting the result by time in ascending order.
## Using the function {#using-the-function}
To use the function above, mark the required repos in a dired buffer and run `M-x my/gource-dired-create-logs`. This also works nicely with [dired-subtree](https://github.com/Fuco1/dired-hacks), in case your repos are located in different folders.
The function will create a combined log file (by default `combined.log`). To visualize the log, run:
```bash
gource <log-file> --user-image-dir <path-to-gravatars>
```
Check the [README](https://github.com/acaudwell/Gource) for possible parameters, such as the speed of visualization, different elements, etc. That's it!
I thought about making something like a [transient.el](https://github.com/magit/transient) wrapper around the `gource` command but figured it wasn't worth the effort for something that I run just a handful of times in a year.

278
org/2023-01-02-gource.org Normal file
View file

@ -0,0 +1,278 @@
#+HUGO_SECTION: posts
#+HUGO_BASE_DIR: ../
#+TITLE: Running Gource with Emacs
#+DATE: 2023-01-02
#+HUGO_TAGS: emacs
#+HUGO_DRAFT: false
[[./static/images/gource/gource.png]]
[[https://gource.io/][Gource]] is a program that draws an animated graph of users changing the repository over time.
Although it can work without extra effort (just run =gource= in a [[https://git-scm.com/][git]] repo), there are some tweaks that can be done:
- Gource supports using custom pictures for users. [[https://en.gravatar.com/][Gravatar]] is an obvious place to get these.
- Occasionally, the same people have different names and/or emails in history.\\
It may happen when people use forges like [[https://gitlab.com/][GitLab]] or just have different settings on different machines. It would be nice to merge these names.
- Visualizing the history of multiple repositories (e.g. frontend and backend) requires combining multiple gource logs.
So, why not try doing that with Emacs?
* Gravatars
Much to my surprise, Emacs turned out to have a built-in package called [[https://github.com/emacs-mirror/emacs/blob/master/lisp/image/gravatar.el][gravatar.el]].
So, let's make a function to retrieve a gravatar and save it:
#+begin_src emacs-lisp
(defun my/gravatar-retrieve-sync (email file-name)
"Get gravatar for EMAIL and save it to FILE-NAME."
(let ((gravatar-default-image "identicon")
(gravatar-size nil)
(coding-system-for-write 'binary)
(write-region-annotate-functions nil)
(write-region-post-annotation-function nil))
(write-region
(image-property (gravatar-retrieve-synchronously email) :data)
nil file-name nil :silent)))
#+end_src
To use these images, we need to save them to some folder and use usernames as file names. The folder:
#+begin_src emacs-lisp
(setq my/gravatar-folder "/home/pavel/.cache/gravatars/")
#+end_src
And the function that downloads a gravatar if necessary:
#+begin_src emacs-lisp
(defun my/gravatar-save (email author)
"Download gravatar for EMAIL.
AUTHOR is the username."
(let ((file-name (concat my/gravatar-folder author ".png")))
(mkdir my/gravatar-folder t)
(unless (file-exists-p file-name)
(message "Fetching gravatar for %s (%s)" author email)
(my/gravatar-retrieve-sync email file-name))))
#+end_src
* Merging authors
Now to merging authors.
Gource itself uses only usernames (without emails), but we can use =git log= to get both. The required information can be extracted like that:
#+begin_src bash
git log --pretty=format:"%ae|%an" | sort | uniq -c | sed "s/^[ \t]*//;s/ /|/"
#+end_src
The output is a list of pipe-separated strings, where the values are:
- Number of occurrences for this combination of username and email
- Email
- Username
Of course, that part would have to be changed appropriately for other version control systems if you happen to use one.
So, below is one hell of a function that wraps this command and tries to merge emails and usernames belonging to one author:
#+begin_src emacs-lisp
(defun my/git-get-authors (repo &optional authors-init)
"Extract and merge all combinations of authors & emails from REPO.
REPO is the path to a git repository.
AUTHORS-INIT is the previous output of `my/git-get-authors'. It can
be used to extract that information from multiple repositories.
The output is a list of alists with following keys:
- emails: list of (<email> . <count>)
- authors: list of (<username> . <count>)
- email: the most popular email
- author: the most popular username
I.e. one alist is all emails and usernames of one author."
(let* ((default-directory repo)
(data (shell-command-to-string
"git log --pretty=format:\"%ae|%an\" | sort | uniq -c | sed \"s/^[ \t]*//;s/ /|/\""))
(authors
(cl-loop for string in (split-string data "\n")
if (= (length (split-string string "|")) 3)
collect (let ((datum (split-string string "|")))
`((count . ,(string-to-number (nth 0 datum)))
(email . ,(downcase (nth 1 datum)))
(author . ,(nth 2 datum)))))))
(mapcar
(lambda (datum)
(setf (alist-get 'author datum)
(car (cl-reduce
(lambda (acc author)
(if (> (cdr author) (cdr acc))
author
acc))
(alist-get 'authors datum)
:initial-value '(nil . -1))))
(setf (alist-get 'email datum)
(car (cl-reduce
(lambda (acc email)
(if (> (cdr email) (cdr acc))
email
acc))
(alist-get 'emails datum)
:initial-value '(nil . -1))))
datum)
(cl-reduce
(lambda (acc val)
(let* ((author (alist-get 'author val))
(email (alist-get 'email val))
(count (alist-get 'count val))
(saved-value
(seq-find
(lambda (cand)
(or (alist-get email (alist-get 'emails cand)
nil nil #'string-equal)
(alist-get author (alist-get 'authors cand)
nil nil #'string-equal)
(alist-get email (alist-get 'authors cand)
nil nil #'string-equal)
(alist-get author (alist-get 'emails cand)
nil nil #'string-equal)))
acc)))
(if saved-value
(progn
(if (alist-get email (alist-get 'emails saved-value)
nil nil #'string-equal)
(cl-incf (alist-get email (alist-get 'emails saved-value)
nil nil #'string-equal)
count)
(push (cons email count) (alist-get 'emails saved-value)))
(if (alist-get author (alist-get 'authors saved-value)
nil nil #'string-equal)
(cl-incf (alist-get author (alist-get 'authors saved-value)
nil nil #'string-equal)
count)
(push (cons author count) (alist-get 'authors saved-value))))
(setq saved-value
(push `((emails . ((,email . ,count)))
(authors . ((,author . ,count))))
acc)))
acc))
authors
:initial-value authors-init))))
#+end_src
Despite the probable we-enjoy-typing-ness of the implementation, it's actually pretty simple:
- The output of =git log= is parsed into a list of alists with =count=, =email= and =author= as keys.
- This list is reduced by =cl-reduce= into a list of alists with =emails= and =authors= as keys and the respective counts as values, e.g. =((<email-1> . 1) (<email-2> . 3))=.\\
I've seen a couple of cases where people would swap their username and email (lol), so =seq-find= also looks for an email in the list of authors and vice versa.
- The =mapcar= call determines the most popular email and username for each authors.
The output is another list of alists, now with the following keys:
- =emails= - list of elements like =(<email> . <count>)=
- =authors= - list of elements like =(<author-name> . <count>)=
- =email= - the most popular email
- =author= - the most popular username.
* Running for multiple repos
This section was mostly informed by [[https://github.com/acaudwell/Gource/wiki/Visualizing-Multiple-Repositories][this page]] in the [[https://github.com/acaudwell/Gource/wiki][gource wiki]].
As I said above, by default =gource= just creates a visualization for the current repo. To change something in it, we need to invoke the program like that: =gource --output-custom-log PATH=, where =PATH= is either the path to the log file or =-= for stdout.
The log consists of lines of pipe-separated strings, e.g.:
#+begin_example
1600769568|dsofronov|A|/studentor/.dockerignore
1600769568|dsofronov|A|/studentor/.editorconfig
1600769568|dsofronov|A|/studentor/.flake8
1600769568|dsofronov|A|/studentor/.gitignore
#+end_example
where the values of one line are:
- UNIX timestamp
- Author name
- =A= for add, =M= for modify, and =D= for delete
- Path to file
The file has to be sorted by the timestamp in ascending order.
So, the function that prepares the log for one repository:
#+begin_src emacs-lisp
(defun my/gource-prepare-log (repo authors)
"Create gource log string for REPO.
AUTHORS is the output of `my/git-get-authors'."
(let ((log (shell-command-to-string
(concat
"gource --output-custom-log - "
repo)))
(authors-mapping (make-hash-table :test #'equal))
(prefix (file-name-base repo)))
(cl-loop for author-datum in authors
for author = (alist-get 'author author-datum)
do (my/gravatar-save (alist-get 'email author-datum) author)
do (cl-loop for other-author in (alist-get 'authors author-datum)
unless (string-equal (car other-author) author)
do (puthash (car other-author) author
authors-mapping)))
(cl-loop for line in (split-string log "\n")
concat (let ((fragments (split-string line "|")))
(when (> (length fragments) 3)
(when-let (mapped-author (gethash (nth 1 fragments)
authors-mapping))
(setf (nth 1 fragments) mapped-author))
(setf (nth 3 fragments)
(concat "/" prefix (nth 3 fragments))))
(string-join fragments "|"))
concat "\n")))
#+end_src
This function:
- Downloads a gravatar for each author
- Replaces all usernames of one author with the most frequent one
- Prepends the file path with the repository name.
The output is a string in the gource log format as described above.
Finally, as we need to invoke all of this for multiple repositories, why not do that with [[https://www.gnu.org/software/emacs/manual/html_node/emacs/Dired.html][dired]]:
#+begin_src emacs-lisp
(defun my/gource-dired-create-logs (repos log-name)
"Create combined gource log for REPOS.
REPOS is a list of strings, where a string is a path to a git repo.
LOG-NAME is the path to the resulting log file.
This function is meant to be invoked from `dired', where the required
repositories are marked."
(interactive (list (or (dired-get-marked-files nil nil #'file-directory-p)
(user-error "Select at least one directory"))
(read-file-name "Log file name: " nil "combined.log")))
(let ((authors
(cl-reduce
(lambda (acc repo)
(my/git-get-authors repo acc))
repos
:initial-value nil)))
(with-temp-file log-name
(insert
(string-join
(seq-filter
(lambda (line)
(not (string-empty-p line)))
(seq-sort-by
(lambda (line)
(if-let (time (car (split-string line "|")))
(string-to-number time)
0))
#'<
(split-string
(mapconcat
(lambda (repo)
(my/gource-prepare-log repo authors))
repos "\n")
"\n")))
"\n")))))
#+end_src
This function extracts authors from each repository and merges the logs as required by gource, that is sorting the result by time in ascending order.
* Using the function
To use the function above, mark the required repos in a dired buffer and run =M-x my/gource-dired-create-logs=. This also works nicely with [[https://github.com/Fuco1/dired-hacks][dired-subtree]], in case your repos are located in different folders.
The function will create a combined log file (by default =combined.log=). To visualize the log, run:
#+begin_src bash
gource <log-file> --user-image-dir <path-to-gravatars>
#+end_src
Check the [[https://github.com/acaudwell/Gource][README]] for possible parameters, such as the speed of visualization, different elements, etc. That's it!
I thought about making something like a [[https://github.com/magit/transient][transient.el]] wrapper around the =gource= command but figured it wasn't worth the effort for something that I run just a handful of times in a year.

Binary file not shown.

After

Width:  |  Height:  |  Size: 590 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 590 KiB