Building our own zsh_stats command line app

Photo by Teo Zac on Unsplash

Return the most frequently used terminal commands

In the previous post we saw how zsh has a nice inbuilt function zsh_stats to get a summarized list of the most commonly used terminal commands.

This got me wondering, can we replicate this result ourselves? 🤔

Let's find out.

Language of Choice

We will go with F# to do this.

I've been checking it out recently and found it to be a really nice, concise language (like most other functional languages), with a really helpful type system and an amazing development experience. I love the ability to interactively evaluate expressions right inside the editor. Reminds me a lot of Clojure.

This article: Why is F# code so robust and reliable? does a great job of explaining its amazing features better than I could, so please go and check it out.

I like it so far and want to use it more, so I'm taking every chance I get to build more projects with it 👷🏾

Understanding the history file

The first thing we need to do is find out where these historical commands are written and try to read the file ourselves.

In my case (using zsh and oh-my-zsh), I found that the path to the history file is stored in the $HISTFILE variable. This is nice since we can pass that same variable to our program as an argument.

Let's see what it looks like with cat $HISTFILE

Sample of commands from the shell history

We can see in the small sample here that there are some blank lines mixed in with lines that contain a command along with some other content. Let's figure it out.

It looks like the lines with the commands start with a colon : and a space, followed by a number that looks like a Unix timestamp, another colon, and then 0. Then we have a semicolon ; and finally the actual command.

According to ChatGPT, this is the meaning of each part of the line:

  1. : <epoch_timestamp>:

    • This is a Unix epoch timestamp (the number of seconds since January 1, 1970). It represents when the command was executed.
  2. : <elapsed_time_in_seconds>:

    • This is the amount of time (in seconds) that the command took to execute. It's the difference between the time the command started and when it finished.
  3. ;command:

    • After the semicolon (;), the actual command that was executed in the shell appears. This is the command as the user typed it.
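To make the layout concrete, here is a minimal sketch that splits one history line into those three parts. It assumes every entry follows the `: <timestamp>:<elapsed>;<command>` shape shown above; the HistoryEntry type and the tryParseEntry function are hypothetical names used only for illustration and are not part of the program we build later.

```fsharp
open System

// Hypothetical record type for illustration; not part of the final program.
type HistoryEntry =
    { Timestamp: DateTimeOffset
      ElapsedSeconds: int64
      Command: string }

// Parses a line such as ": 1726414019:0;brew info mongod".
// Returns None for anything that does not match the expected layout.
let tryParseEntry (line: string) : HistoryEntry option =
    let semiColonIndex = line.IndexOf(';')

    if line.StartsWith(": ") && semiColonIndex > 2 then
        // The metadata sits between the leading ": " and the first semicolon
        let metadata = line.Substring(2, semiColonIndex - 2)

        match metadata.Split(':') with
        | [| epoch; elapsed |] ->
            let epochOk, seconds = Int64.TryParse(epoch)
            let elapsedOk, duration = Int64.TryParse(elapsed)

            if epochOk && elapsedOk then
                Some
                    { Timestamp = DateTimeOffset.FromUnixTimeSeconds(seconds)
                      ElapsedSeconds = duration
                      Command = line.Substring(semiColonIndex + 1) }
            else
                None
        | _ -> None
    else
        None
```

For our purposes only the command part matters, so the rest of the post ignores the timestamp and the elapsed time.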

Now that we understand what we have to deal with, let's set up the project.

Project Setup

If you want to follow along, ensure you have .NET installed. As of the time of writing, the most current version is .NET 8.

Once everything is set up, you should have the dotnet command available:

❯ dotnet --version
8.0.104

Let's use the dotnet CLI to create a new console app and call it HistoryStats. We also need to specify that we want to use F# as the language.

dotnet new console -lang "F#" -o HistoryStats

It generates a console app with this directory structure

❯ tree
.
├── HistoryStats.fsproj
└── Program.fs

1 directory, 2 files

We will add all the subsequent code to the Program.fs file.

To make sure everything is working, run the app with dotnet run and you should see the following message printed from the default program

"Hello from F#"

Now that we have the environment ready, let's get started.

The eventual goal is to run dotnet run $HISTFILE and it should give us a ranked list of the most commonly used commands in our terminal.

Reading the History File

This is the first iteration of a program to read and print out each line from the history file, which is provided as a command line argument

open System
open System.IO

let commandsByFrequency historyFile =
    File.ReadLines(historyFile)
    |> Seq.iter (fun line -> printfn "%s" line)

[<EntryPoint>]
let main argv =
    let args = Environment.GetCommandLineArgs()

    // Optional: Print all the arguments
    printfn "Command-line arguments: %A" args

    match args with
    | [| _app; historyFile |] ->
        printfn "First argument: %s" historyFile
        // Call the function defined above to print each line of the file
        commandsByFrequency historyFile

    | _ ->
        printfn "Usage: dotnet run <historyFile>"

    0

The main part to focus on is the commandsByFrequency function, which reads the provided file line by line. We then use Seq.iter to iterate and print out each line in the file. We will add most of the functionality to this function.

In the main function, which is the entry point, we also have some pattern matching to ensure we only try to process the file if the program has been run with exactly one argument, which should be the history file.

If the file path is not provided, we print a message showing that we require an argument to be provided and how to provide it.
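The pattern above has two elements because Environment.GetCommandLineArgs() includes the program itself as the first entry. As a minimal sketch of an alternative, the argv parameter of main holds only the user-supplied arguments, so a single-element pattern would be enough. The helper name tryGetHistoryFile is hypothetical and not part of the original program.

```fsharp
// Hypothetical helper: picks the history file path out of the entry point's argv.
// Unlike Environment.GetCommandLineArgs(), argv does not include the program itself.
let tryGetHistoryFile (argv: string[]) : string option =
    match argv with
    | [| historyFile |] -> Some historyFile
    | _ -> None

// Inside main it could be used like this:
//     match tryGetHistoryFile argv with
//     | Some historyFile -> commandsByFrequency historyFile
//     | None -> printfn "Usage: dotnet run <historyFile>"
```

Either approach works; the rest of the post keeps using Environment.GetCommandLineArgs().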

Now we can run our application, providing $HISTFILE as the argument, and it should print out each line of the history file.

dotnet run $HISTFILE

Parsing the History File

On my system, printing out all the lines in the history file is quite noisy, so let's start processing those lines to get to the interesting part.

We will focus on the commandsByFrequency function to do all the processing we need.

First, we need to skip the blank lines before continuing with the processing

let commandsByFrequency filePath =
    // Returns an enumerable over the lines in the file
    File.ReadLines(filePath)
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    |> Seq.iter (fun line -> printfn "%s" line)

If you run the app now you should see that we no longer print out the blank lines in the file.

We will make heavy use of the Seq module, which is roughly similar to the Enumerable module in Ruby. It has a ton of useful functions for processing collections of data. Sequences are lazy by default, so elements are only processed as they are required, which makes Seq a nice tool for our needs.

Seq.choose deserves a special mention since it makes it so easy to separate the lines we want to keep from the ones we want to discard. It takes a function that should return None if we want to skip the item or Some x if we want to keep the item x.

I found it to be quite elegant and we will use it again in the next section.
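As a small standalone illustration of how Seq.choose filters and transforms in a single step, the sketch below keeps only the strings that parse as integers. The sample values are made up for this example.

```fsharp
open System

// Hypothetical sample data: numbers mixed with noise
let rawValues = [ "42"; "not a number"; ""; "7" ]

// Keep only the values that parse as integers, converting them as we go
let parsed =
    rawValues
    |> Seq.choose (fun text ->
        match Int32.TryParse(text) with
        | true, number -> Some number
        | false, _ -> None)
    |> Seq.toList
// parsed = [42; 7]
```

Notice that the result contains integers rather than strings: whatever we wrap in Some is what ends up in the output sequence.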

Extracting the Command from a History Line

Now that we have established a pattern for iterating over the lines in the file, let's start parsing them to extract the info we need.

Just a reminder that this is how a line in the history file looks:
: 1726414019:0;brew info mongod

To extract the command from a non-blank line, let's add a new function that should do 2 things:

  1. Get the command, which starts immediately after the first semicolon i.e. ;

  2. Extract the first part of the command, without the arguments

let parseHistoryLine (line: string) =
    if line.StartsWith(":") then
        let semiColonIndex = line.IndexOf(';')
        // 1. Get the command i.e. everything after the semicolon
        let fullCommand = line.Substring(semiColonIndex + 1)
        // 2. Split by space and get the first part of the command
        let command = fullCommand.Split([| ' ' |]) |> Array.head
        Some command
    else
        None

We will use the same pattern of using a function that returns an option, to determine which lines to keep or discard, then passing this function to Seq.choose

We return Some command to indicate that we want to keep the line and None to indicate that we want to discard it.
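One thing to be aware of: as written, a line that starts with a colon but contains no semicolon would yield ":" as the command, and an entry with nothing after the semicolon would yield an empty string. The following is a sketch of a slightly stricter variant, under the assumption that we would rather skip such lines. The name tryParseCommand is hypothetical, and the rest of the post keeps using parseHistoryLine.

```fsharp
open System

// Hypothetical stricter variant of parseHistoryLine, for illustration only
let tryParseCommand (line: string) : string option =
    let semiColonIndex = line.IndexOf(';')

    if line.StartsWith(":") && semiColonIndex >= 0 then
        let fullCommand = line.Substring(semiColonIndex + 1)
        // Split on whitespace and drop empty entries, so leading spaces are ignored
        let parts =
            fullCommand.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries)
        // None when nothing follows the semicolon
        Array.tryHead parts
    else
        None
```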

By now we already know how to use parseHistoryLine in our processing pipeline:

let commandsByFrequency historyFile =
    File.ReadLines(historyFile)
    // Take only the non-blank lines
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    // Extract the command from the line
    |> Seq.choose (fun line -> (parseHistoryLine line))
    // Optional: Print out the commands we will process
    |> Seq.iter (fun command -> printfn "%s" command)

Now that we have this pattern established, let's take advantage of it to finally get the most frequently used commands.

More Seq Magic

All that's left for us to do is to find out how many times each command occurs. Once we have that, we can sort the sequence of commands in descending order, then get the top commands based on the frequency of occurrence.

Let's write out the whole function, then go through the important parts.

let commandsByFrequency count historyFile =
    // Returns an enumerable over the lines in the file
    File.ReadLines(historyFile)
    // Take only the non-blank lines
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    // Extract the command from the line
    |> Seq.choose (fun line -> (parseHistoryLine line))
    // Group by command -> (command, seq of commands)
    |> Seq.groupBy id
    // Count occurrences
    |> Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))
    // Sort by the count in descending order
    |> Seq.sortByDescending snd
    // Take the top `count` elements
    |> Seq.take count

The first change is the addition of a new count parameter, which represents the number of commands we want to return.

The next addition is Seq.groupBy, which the documentation describes as follows:

Applies a key-generating function to each element of a sequence and yields a sequence of unique keys. Each unique key contains a sequence of all elements that match to this key.

The key-generating function we provide is the id function, which simply returns whatever is provided to it as an input.

 id 12345     //  Evaluates to 12345
 id "command"  //  Evaluates to "command"

In our case, it groups the commands by the command itself. It returns a sequence of tuples, each containing 2 items: the first item is the command, and the second is a sequence of every occurrence of that command in the input.

This is the sample output we would get if we printed out the result at this point in the program

("omz_history", seq ["omz_history"])
("gds", seq ["gds"])
("ggpush", seq ["ggpush"])
("zsh_stats", seq ["zsh_stats"; "zsh_stats"; "zsh_stats"])

Now that we have each command along with a sequence of all its occurrences, we can count the occurrences of each command. This is what happens in the next step:

// Count occurrences
Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))

This takes in the tuple returned from the groupBy and transforms it into a new tuple containing the command and the frequency i.e. number of times it occurs.

So it would transform the previous output into something like this:

("omz_history", 1)
("gds", 1)
("ggpush", 1)
("zsh_stats", 3)
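As an aside, the Seq module also has Seq.countBy, which combines the grouping and counting steps into one call. This is only a sketch of an alternative on made-up sample data; the program in this post sticks with Seq.groupBy followed by Seq.map.

```fsharp
// Hypothetical sample of already-extracted commands
let sampleCommands =
    [ "omz_history"; "zsh_stats"; "gds"; "zsh_stats"; "ggpush"; "zsh_stats" ]

// Each element of the result is a (command, count) pair
let sampleCounts =
    sampleCommands
    |> Seq.countBy id
    |> Seq.toList

// Look up one command to confirm its count
let zshStatsCount =
    sampleCounts
    |> List.find (fun (command, _) -> command = "zsh_stats")
    |> snd
// zshStatsCount = 3
```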

Now that we have the count for each command, we can sort the commands in descending order by that count, which is the second element of each tuple.

// Sort by the count in descending order
|> Seq.sortByDescending snd

snd is a function that returns the second element of a tuple, and we use it here to get the count and use it to sort the commands.

And now that we have the sorted commands, the final part is to return the top commands by the count.

// Take the top `count` elements
|> Seq.take count
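One caveat worth knowing: Seq.take raises an exception if the sequence has fewer elements than requested, which could happen with a very short history file. A minimal sketch of the difference, using made-up data, is shown below. Seq.truncate is a possible substitute that returns at most the requested number of elements; the program in this post keeps Seq.take.

```fsharp
// Hypothetical sample with fewer entries than we ask for
let shortCounts = [ ("zsh_stats", 3); ("gds", 1) ]

// Seq.truncate returns at most 10 elements, so a short input is fine
let topTruncated = shortCounts |> Seq.truncate 10 |> Seq.toList
// topTruncated = [("zsh_stats", 3); ("gds", 1)]

// Seq.take raises InvalidOperationException once the sequence is enumerated
let takeFailed =
    try
        shortCounts |> Seq.take 10 |> Seq.toList |> ignore
        false
    with :? System.InvalidOperationException ->
        true
// takeFailed = true
```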

And just like that, we have our very own custom zsh_stats clone

Wrapping up

Now we can take a look at the whole program

open System
open System.IO

let parseHistoryLine (line: string) =
    if line.StartsWith(":") then
        let semiColonIndex = line.IndexOf(';')
        // 1. Get the command i.e. everything after the semicolon
        let fullCommand = line.Substring(semiColonIndex + 1)
        // 2. Split by space to get the first part of the command
        let command = fullCommand.Split([| ' ' |]) |> Array.head
        Some command
    else
        None

let commandsByFrequency count historyFile =
    // Returns an enumerable over the lines in the file
    File.ReadLines(historyFile)
    // Take only the non-blank lines
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    // Extract the command from the line
    |> Seq.choose (fun line -> (parseHistoryLine line))
    // Group by command -> (command, seq of commands)
    |> Seq.groupBy id
    // Count occurrences
    |> Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))
    // Sort by the count in descending order
    |> Seq.sortByDescending snd
    // Take the top `count` elements
    |> Seq.take count


[<EntryPoint>]
let main argv =
    let args = Environment.GetCommandLineArgs()

    match args with
    | [| _app; historyFile |] ->
        printfn "History file: %s" historyFile
        let result = commandsByFrequency 10 historyFile

        for (cmd: string, count) in result do
            printfn "Command: %s \tCount: %d" cmd count

    | _ -> printfn "Usage: dotnet run <historyFile>"

    0

The only addition is that we now print out the results of the commandsByFrequency function. We will default to getting the top 10 results for now.

And now for the moment of truth, let's compare our program's output with the zsh_stats command output.

zsh_stats command output

There's an off-by-one difference in the ls result, which is 166 in our program vs 165 in zsh_stats, but otherwise the outputs are mostly similar and our program gets the job done.

Conclusion

In summary, we have gone through replicating the zsh function zsh_stats by writing a custom program in F#.

It has proven to be capable enough to replicate the original function, and more importantly, has helped us learn more about operating on sequences of data in F#.

PS:

If you are interested in how the original zsh_stats function works, here you go:

❯ which zsh_stats
zsh_stats () {
        fc -l 1 | awk '{ CMD[$2]++; count++; } END { for (a in CMD) print CMD[a] " " CMD[a]*100/count "% " a }' | grep -v "./" | sort -nr | head -n 20 | column -c3 -s " " -t | nl
}

You can access the full example in the GitHub repository