diff --git a/content/posts/2023-01-02-gource.md b/content/posts/2023-01-02-gource.md new file mode 100644 index 0000000..ddfefc2 --- /dev/null +++ b/content/posts/2023-01-02-gource.md @@ -0,0 +1,302 @@ ++++ +title = "Running Gource with Emacs" +author = ["Pavel Korytov"] +date = 2023-01-02 +tags = ["emacs"] +draft = false ++++ + +{{< figure src="/images/gource/gource.png" >}} + +[Gource](https://gource.io/) is a program that draws an animated graph of users changing the repository over time. + +Although it can work without extra effort (just run `gource` in a [git](https://git-scm.com/) repo), there are some tweaks that can be done: + +- Gource supports using custom pictures for users. [Gravatar](https://en.gravatar.com/) is an obvious place to get these. +- Occasionally, the same people have different names and/or emails in history.
+ It may happen when people use forges like [GitLab](https://gitlab.com/) or just have different settings on different machines. It would be nice to merge these names. +- Visualizing the history of multiple repositories (e.g. frontend and backend) requires combining multiple gource logs. + +So, why not try doing that with Emacs? + + +## Gravatars {#gravatars} + +Much to my surprise, Emacs turned out to have a built-in package called [gravatar.el](https://github.com/emacs-mirror/emacs/blob/master/lisp/image/gravatar.el). + +So, let's make a function to retrieve a gravatar and save it: + +```emacs-lisp +(defun my/gravatar-retrieve-sync (email file-name) + "Get gravatar for EMAIL and save it to FILE-NAME." + (let ((gravatar-default-image "identicon") + (gravatar-size nil) + (coding-system-for-write 'binary) + (write-region-annotate-functions nil) + (write-region-post-annotation-function nil)) + (write-region + (image-property (gravatar-retrieve-synchronously email) :data) + nil file-name nil :silent))) +``` + +To use these images, we need to save them to some folder and use usernames as file names. The folder: + +```emacs-lisp +(setq my/gravatar-folder "/home/pavel/.cache/gravatars/") +``` + +And the function that downloads a gravatar if necessary: + +```emacs-lisp +(defun my/gravatar-save (email author) + "Download gravatar for EMAIL. + +AUTHOR is the username." + (let ((file-name (concat my/gravatar-folder author ".png"))) + (mkdir my/gravatar-folder t) + (unless (file-exists-p file-name) + (message "Fetching gravatar for %s (%s)" author email) + (my/gravatar-retrieve-sync email file-name)))) +``` + + +## Merging authors {#merging-authors} + +Now to merging authors. + +Gource itself uses only usernames (without emails), but we can use `git log` to get both. The required information can be extracted like that: + +```bash +git log --pretty=format:"%ae|%an" | sort | uniq -c | sed "s/^[ \t]*//;s/ /|/" +``` + +The output is a list of pipe-separated strings, where the values are: + +- Number of occurrences for this combination of username and email +- Email +- Username + +Of course, that part would have to be changed appropriately for other version control systems if you happen to use one. + +So, below is one hell of a function that wraps this command and tries to merge emails and usernames belonging to one author: + +```emacs-lisp +(defun my/git-get-authors (repo &optional authors-init) + "Extract and merge all combinations of authors & emails from REPO. + +REPO is the path to a git repository. + +AUTHORS-INIT is the previous output of `my/git-get-authors'. It can +be used to extract that information from multiple repositories. + +The output is a list of alists with following keys: +- emails: list of ( . ) +- authors: list of ( . ) +- email: the most popular email +- author: the most popular username +I.e. one alist is all emails and usernames of one author." + (let* ((default-directory repo) + (data (shell-command-to-string + "git log --pretty=format:\"%ae|%an\" | sort | uniq -c | sed \"s/^[ \t]*//;s/ /|/\"")) + (authors + (cl-loop for string in (split-string data "\n") + if (= (length (split-string string "|")) 3) + collect (let ((datum (split-string string "|"))) + `((count . ,(string-to-number (nth 0 datum))) + (email . ,(downcase (nth 1 datum))) + (author . ,(nth 2 datum))))))) + (mapcar + (lambda (datum) + (setf (alist-get 'author datum) + (car (cl-reduce + (lambda (acc author) + (if (> (cdr author) (cdr acc)) + author + acc)) + (alist-get 'authors datum) + :initial-value '(nil . -1)))) + (setf (alist-get 'email datum) + (car (cl-reduce + (lambda (acc email) + (if (> (cdr email) (cdr acc)) + email + acc)) + (alist-get 'emails datum) + :initial-value '(nil . -1)))) + datum) + (cl-reduce + (lambda (acc val) + (let* ((author (alist-get 'author val)) + (email (alist-get 'email val)) + (count (alist-get 'count val)) + (saved-value + (seq-find + (lambda (cand) + (or (alist-get email (alist-get 'emails cand) + nil nil #'string-equal) + (alist-get author (alist-get 'authors cand) + nil nil #'string-equal) + (alist-get email (alist-get 'authors cand) + nil nil #'string-equal) + (alist-get author (alist-get 'emails cand) + nil nil #'string-equal))) + acc))) + (if saved-value + (progn + (if (alist-get email (alist-get 'emails saved-value) + nil nil #'string-equal) + (cl-incf (alist-get email (alist-get 'emails saved-value) + nil nil #'string-equal) + count) + (push (cons email count) (alist-get 'emails saved-value))) + (if (alist-get author (alist-get 'authors saved-value) + nil nil #'string-equal) + (cl-incf (alist-get author (alist-get 'authors saved-value) + nil nil #'string-equal) + count) + (push (cons author count) (alist-get 'authors saved-value)))) + (setq saved-value + (push `((emails . ((,email . ,count))) + (authors . ((,author . ,count)))) + acc))) + acc)) + authors + :initial-value authors-init)))) +``` + +Despite the probable we-enjoy-typing-ness of the implementation, it's actually pretty simple: + +- The output of `git log` is parsed into a list of alists with `count`, `email` and `author` as keys. +- This list is reduced by `cl-reduce` into a list of alists with `emails` and `authors` as keys and the respective counts as values, e.g. `(( . 1) ( . 3))`.
+ I've seen a couple of cases where people would swap their username and email (lol), so `seq-find` also looks for an email in the list of authors and vice versa. +- The `mapcar` call determines the most popular email and username for each authors. + +The output is another list of alists, now with the following keys: + +- `emails` - list of elements like `( . )` +- `authors` - list of elements like `( . )` +- `email` - the most popular email +- `author` - the most popular username. + + +## Running for multiple repos {#running-for-multiple-repos} + +This section was mostly informed by [this page](https://github.com/acaudwell/Gource/wiki/Visualizing-Multiple-Repositories) in the [gource wiki](https://github.com/acaudwell/Gource/wiki). + +As I said above, by default `gource` just creates a visualization for the current repo. To change something in it, we need to invoke the program like that: `gource --output-custom-log PATH`, where `PATH` is either the path to the log file or `-` for stdout. + +The log consists of lines of pipe-separated strings, e.g.: + +```text +1600769568|dsofronov|A|/studentor/.dockerignore +1600769568|dsofronov|A|/studentor/.editorconfig +1600769568|dsofronov|A|/studentor/.flake8 +1600769568|dsofronov|A|/studentor/.gitignore +``` + +where the values of one line are: + +- UNIX timestamp +- Author name +- `A` for add, `M` for modify, and `D` for delete +- Path to file + +The file has to be sorted by the timestamp in ascending order. + +So, the function that prepares the log for one repository: + +```emacs-lisp +(defun my/gource-prepare-log (repo authors) + "Create gource log string for REPO. + +AUTHORS is the output of `my/git-get-authors'." + (let ((log (shell-command-to-string + (concat + "gource --output-custom-log - " + repo))) + (authors-mapping (make-hash-table :test #'equal)) + (prefix (file-name-base repo))) + (cl-loop for author-datum in authors + for author = (alist-get 'author author-datum) + do (my/gravatar-save (alist-get 'email author-datum) author) + do (cl-loop for other-author in (alist-get 'authors author-datum) + unless (string-equal (car other-author) author) + do (puthash (car other-author) author + authors-mapping))) + (cl-loop for line in (split-string log "\n") + concat (let ((fragments (split-string line "|"))) + (when (> (length fragments) 3) + (when-let (mapped-author (gethash (nth 1 fragments) + authors-mapping)) + (setf (nth 1 fragments) mapped-author)) + (setf (nth 3 fragments) + (concat "/" prefix (nth 3 fragments)))) + (string-join fragments "|")) + concat "\n"))) +``` + +This function: + +- Downloads a gravatar for each author +- Replaces all usernames of one author with the most frequent one +- Prepends the file path with the repository name. + +The output is a string in the gource log format as described above. + +Finally, as we need to invoke all of this for multiple repositories, why not do that with [dired](https://www.gnu.org/software/emacs/manual/html_node/emacs/Dired.html): + +```emacs-lisp +(defun my/gource-dired-create-logs (repos log-name) + "Create combined gource log for REPOS. + +REPOS is a list of strings, where a string is a path to a git repo. +LOG-NAME is the path to the resulting log file. + +This function is meant to be invoked from `dired', where the required +repositories are marked." + (interactive (list (or (dired-get-marked-files nil nil #'file-directory-p) + (user-error "Select at least one directory")) + (read-file-name "Log file name: " nil "combined.log"))) + (let ((authors + (cl-reduce + (lambda (acc repo) + (my/git-get-authors repo acc)) + repos + :initial-value nil))) + (with-temp-file log-name + (insert + (string-join + (seq-filter + (lambda (line) + (not (string-empty-p line))) + (seq-sort-by + (lambda (line) + (if-let (time (car (split-string line "|"))) + (string-to-number time) + 0)) + #'< + (split-string + (mapconcat + (lambda (repo) + (my/gource-prepare-log repo authors)) + repos "\n") + "\n"))) + "\n"))))) +``` + +This function extracts authors from each repository and merges the logs as required by gource, that is sorting the result by time in ascending order. + + +## Using the function {#using-the-function} + +To use the function above, mark the required repos in a dired buffer and run `M-x my/gource-dired-create-logs`. This also works nicely with [dired-subtree](https://github.com/Fuco1/dired-hacks), in case your repos are located in different folders. + +The function will create a combined log file (by default `combined.log`). To visualize the log, run: + +```bash +gource --user-image-dir +``` + +Check the [README](https://github.com/acaudwell/Gource) for possible parameters, such as the speed of visualization, different elements, etc. That's it! + +I thought about making something like a [transient.el](https://github.com/magit/transient) wrapper around the `gource` command but figured it wasn't worth the effort for something that I run just a handful of times in a year. diff --git a/org/2023-01-02-gource.org b/org/2023-01-02-gource.org new file mode 100644 index 0000000..3af2b40 --- /dev/null +++ b/org/2023-01-02-gource.org @@ -0,0 +1,278 @@ +#+HUGO_SECTION: posts +#+HUGO_BASE_DIR: ../ +#+TITLE: Running Gource with Emacs +#+DATE: 2023-01-02 +#+HUGO_TAGS: emacs +#+HUGO_DRAFT: false + +[[./static/images/gource/gource.png]] + +[[https://gource.io/][Gource]] is a program that draws an animated graph of users changing the repository over time. + +Although it can work without extra effort (just run =gource= in a [[https://git-scm.com/][git]] repo), there are some tweaks that can be done: +- Gource supports using custom pictures for users. [[https://en.gravatar.com/][Gravatar]] is an obvious place to get these. +- Occasionally, the same people have different names and/or emails in history.\\ + It may happen when people use forges like [[https://gitlab.com/][GitLab]] or just have different settings on different machines. It would be nice to merge these names. +- Visualizing the history of multiple repositories (e.g. frontend and backend) requires combining multiple gource logs. + +So, why not try doing that with Emacs? + +* Gravatars +Much to my surprise, Emacs turned out to have a built-in package called [[https://github.com/emacs-mirror/emacs/blob/master/lisp/image/gravatar.el][gravatar.el]]. + +So, let's make a function to retrieve a gravatar and save it: +#+begin_src emacs-lisp +(defun my/gravatar-retrieve-sync (email file-name) + "Get gravatar for EMAIL and save it to FILE-NAME." + (let ((gravatar-default-image "identicon") + (gravatar-size nil) + (coding-system-for-write 'binary) + (write-region-annotate-functions nil) + (write-region-post-annotation-function nil)) + (write-region + (image-property (gravatar-retrieve-synchronously email) :data) + nil file-name nil :silent))) +#+end_src + +To use these images, we need to save them to some folder and use usernames as file names. The folder: +#+begin_src emacs-lisp +(setq my/gravatar-folder "/home/pavel/.cache/gravatars/") +#+end_src + +And the function that downloads a gravatar if necessary: +#+begin_src emacs-lisp +(defun my/gravatar-save (email author) + "Download gravatar for EMAIL. + +AUTHOR is the username." + (let ((file-name (concat my/gravatar-folder author ".png"))) + (mkdir my/gravatar-folder t) + (unless (file-exists-p file-name) + (message "Fetching gravatar for %s (%s)" author email) + (my/gravatar-retrieve-sync email file-name)))) +#+end_src + +* Merging authors +Now to merging authors. + +Gource itself uses only usernames (without emails), but we can use =git log= to get both. The required information can be extracted like that: +#+begin_src bash +git log --pretty=format:"%ae|%an" | sort | uniq -c | sed "s/^[ \t]*//;s/ /|/" +#+end_src + +The output is a list of pipe-separated strings, where the values are: +- Number of occurrences for this combination of username and email +- Email +- Username + +Of course, that part would have to be changed appropriately for other version control systems if you happen to use one. + +So, below is one hell of a function that wraps this command and tries to merge emails and usernames belonging to one author: +#+begin_src emacs-lisp +(defun my/git-get-authors (repo &optional authors-init) + "Extract and merge all combinations of authors & emails from REPO. + +REPO is the path to a git repository. + +AUTHORS-INIT is the previous output of `my/git-get-authors'. It can +be used to extract that information from multiple repositories. + +The output is a list of alists with following keys: +- emails: list of ( . ) +- authors: list of ( . ) +- email: the most popular email +- author: the most popular username +I.e. one alist is all emails and usernames of one author." + (let* ((default-directory repo) + (data (shell-command-to-string + "git log --pretty=format:\"%ae|%an\" | sort | uniq -c | sed \"s/^[ \t]*//;s/ /|/\"")) + (authors + (cl-loop for string in (split-string data "\n") + if (= (length (split-string string "|")) 3) + collect (let ((datum (split-string string "|"))) + `((count . ,(string-to-number (nth 0 datum))) + (email . ,(downcase (nth 1 datum))) + (author . ,(nth 2 datum))))))) + (mapcar + (lambda (datum) + (setf (alist-get 'author datum) + (car (cl-reduce + (lambda (acc author) + (if (> (cdr author) (cdr acc)) + author + acc)) + (alist-get 'authors datum) + :initial-value '(nil . -1)))) + (setf (alist-get 'email datum) + (car (cl-reduce + (lambda (acc email) + (if (> (cdr email) (cdr acc)) + email + acc)) + (alist-get 'emails datum) + :initial-value '(nil . -1)))) + datum) + (cl-reduce + (lambda (acc val) + (let* ((author (alist-get 'author val)) + (email (alist-get 'email val)) + (count (alist-get 'count val)) + (saved-value + (seq-find + (lambda (cand) + (or (alist-get email (alist-get 'emails cand) + nil nil #'string-equal) + (alist-get author (alist-get 'authors cand) + nil nil #'string-equal) + (alist-get email (alist-get 'authors cand) + nil nil #'string-equal) + (alist-get author (alist-get 'emails cand) + nil nil #'string-equal))) + acc))) + (if saved-value + (progn + (if (alist-get email (alist-get 'emails saved-value) + nil nil #'string-equal) + (cl-incf (alist-get email (alist-get 'emails saved-value) + nil nil #'string-equal) + count) + (push (cons email count) (alist-get 'emails saved-value))) + (if (alist-get author (alist-get 'authors saved-value) + nil nil #'string-equal) + (cl-incf (alist-get author (alist-get 'authors saved-value) + nil nil #'string-equal) + count) + (push (cons author count) (alist-get 'authors saved-value)))) + (setq saved-value + (push `((emails . ((,email . ,count))) + (authors . ((,author . ,count)))) + acc))) + acc)) + authors + :initial-value authors-init)))) +#+end_src + +Despite the probable we-enjoy-typing-ness of the implementation, it's actually pretty simple: +- The output of =git log= is parsed into a list of alists with =count=, =email= and =author= as keys. +- This list is reduced by =cl-reduce= into a list of alists with =emails= and =authors= as keys and the respective counts as values, e.g. =(( . 1) ( . 3))=.\\ + I've seen a couple of cases where people would swap their username and email (lol), so =seq-find= also looks for an email in the list of authors and vice versa. +- The =mapcar= call determines the most popular email and username for each authors. + +The output is another list of alists, now with the following keys: +- =emails= - list of elements like =( . )= +- =authors= - list of elements like =( . )= +- =email= - the most popular email +- =author= - the most popular username. + +* Running for multiple repos +This section was mostly informed by [[https://github.com/acaudwell/Gource/wiki/Visualizing-Multiple-Repositories][this page]] in the [[https://github.com/acaudwell/Gource/wiki][gource wiki]]. + +As I said above, by default =gource= just creates a visualization for the current repo. To change something in it, we need to invoke the program like that: =gource --output-custom-log PATH=, where =PATH= is either the path to the log file or =-= for stdout. + +The log consists of lines of pipe-separated strings, e.g.: +#+begin_example +1600769568|dsofronov|A|/studentor/.dockerignore +1600769568|dsofronov|A|/studentor/.editorconfig +1600769568|dsofronov|A|/studentor/.flake8 +1600769568|dsofronov|A|/studentor/.gitignore +#+end_example + +where the values of one line are: +- UNIX timestamp +- Author name +- =A= for add, =M= for modify, and =D= for delete +- Path to file + +The file has to be sorted by the timestamp in ascending order. + +So, the function that prepares the log for one repository: +#+begin_src emacs-lisp +(defun my/gource-prepare-log (repo authors) + "Create gource log string for REPO. + +AUTHORS is the output of `my/git-get-authors'." + (let ((log (shell-command-to-string + (concat + "gource --output-custom-log - " + repo))) + (authors-mapping (make-hash-table :test #'equal)) + (prefix (file-name-base repo))) + (cl-loop for author-datum in authors + for author = (alist-get 'author author-datum) + do (my/gravatar-save (alist-get 'email author-datum) author) + do (cl-loop for other-author in (alist-get 'authors author-datum) + unless (string-equal (car other-author) author) + do (puthash (car other-author) author + authors-mapping))) + (cl-loop for line in (split-string log "\n") + concat (let ((fragments (split-string line "|"))) + (when (> (length fragments) 3) + (when-let (mapped-author (gethash (nth 1 fragments) + authors-mapping)) + (setf (nth 1 fragments) mapped-author)) + (setf (nth 3 fragments) + (concat "/" prefix (nth 3 fragments)))) + (string-join fragments "|")) + concat "\n"))) +#+end_src + +This function: +- Downloads a gravatar for each author +- Replaces all usernames of one author with the most frequent one +- Prepends the file path with the repository name. + +The output is a string in the gource log format as described above. + +Finally, as we need to invoke all of this for multiple repositories, why not do that with [[https://www.gnu.org/software/emacs/manual/html_node/emacs/Dired.html][dired]]: +#+begin_src emacs-lisp +(defun my/gource-dired-create-logs (repos log-name) + "Create combined gource log for REPOS. + +REPOS is a list of strings, where a string is a path to a git repo. +LOG-NAME is the path to the resulting log file. + +This function is meant to be invoked from `dired', where the required +repositories are marked." + (interactive (list (or (dired-get-marked-files nil nil #'file-directory-p) + (user-error "Select at least one directory")) + (read-file-name "Log file name: " nil "combined.log"))) + (let ((authors + (cl-reduce + (lambda (acc repo) + (my/git-get-authors repo acc)) + repos + :initial-value nil))) + (with-temp-file log-name + (insert + (string-join + (seq-filter + (lambda (line) + (not (string-empty-p line))) + (seq-sort-by + (lambda (line) + (if-let (time (car (split-string line "|"))) + (string-to-number time) + 0)) + #'< + (split-string + (mapconcat + (lambda (repo) + (my/gource-prepare-log repo authors)) + repos "\n") + "\n"))) + "\n"))))) +#+end_src + +This function extracts authors from each repository and merges the logs as required by gource, that is sorting the result by time in ascending order. + +* Using the function +To use the function above, mark the required repos in a dired buffer and run =M-x my/gource-dired-create-logs=. This also works nicely with [[https://github.com/Fuco1/dired-hacks][dired-subtree]], in case your repos are located in different folders. + +The function will create a combined log file (by default =combined.log=). To visualize the log, run: +#+begin_src bash +gource --user-image-dir +#+end_src + +Check the [[https://github.com/acaudwell/Gource][README]] for possible parameters, such as the speed of visualization, different elements, etc. That's it! + +I thought about making something like a [[https://github.com/magit/transient][transient.el]] wrapper around the =gource= command but figured it wasn't worth the effort for something that I run just a handful of times in a year. diff --git a/org/static/images/gource/gource.png b/org/static/images/gource/gource.png new file mode 100644 index 0000000..04f559a Binary files /dev/null and b/org/static/images/gource/gource.png differ diff --git a/static/images/gource/gource.png b/static/images/gource/gource.png new file mode 100644 index 0000000..04f559a Binary files /dev/null and b/static/images/gource/gource.png differ