Exploring Z.sh


Learning shell scripting through a code analysis of a real-world example: breaking down the ‘z’ program line by line to understand what’s happening under the hood and, hopefully, becoming a better bash programmer.

This analysis is based on the 29/07/2024 master of the rupa/z GitHub repo. You can click here to open the commit referenced at the time of writing and follow the article examples against it.

This walkthrough is intended for any programmer who wants to explore how a real-world bash program works under the hood. It's designed for beginners who wish to deepen their understanding. With that said, there is one requirement you need to meet to benefit from it: you should be experienced with at least one programming language, to the degree that you feel comfortable with its basic concepts. Everything else can be picked up as you continue reading.

Motivation: As a beginner bash programmer, I constantly seek practical resources to learn and improve my skills. As you've probably heard, the best way to learn is to get your hands dirty and write actual programs. This is solid advice, but where does one begin? User input is the first candidate that comes to mind, but how do you accept arguments for a program? How do you divide your logic? What's considered a good vs bad practice? Once your motivation gears start shifting, those and many more questions arise.

I've decided that I won't start my bash quest by grinding ideas into bash programs in hopes of improving over time. Instead, I'll peek at what much more experienced programmers are creating. By exploring their practices first, I believe I can lift the burden of dealing with the many questions one has to answer when entering new territory and gain the confidence to write bash programs that at least adhere to some real-world practices. Hopefully, this shortens a learning curve that would otherwise take far more resources to navigate. I like to think of this process as a "code autopsy," inspired by Leonardo da Vinci: we pick a codebase apart and build a mental model of its anatomy and components in order to reproduce it from our imagination in another composition, our own "painting."

I picked 'z' for this analysis because it's a program I use daily. It saves me tons of time navigating through paths and practically replaces the 'cd' command for me. When I use my colleagues' terminals or SSH into a server, I try navigating with 'z' just to laugh at how dependent I am on it being around once the 'program not found' error appears.

When I needed to pick a program to explore, I immediately thought about 'z.' Then it struck me: I have no idea how it actually works under the hood, not even in theory. It's just pure magic as far as I know; you type a partial path and boom, you're placed at the location you intended. Some sort of path teleportation, Star Trek transporter sci-fi stuff; instead of "Beam me up, Scotty," I write "z up," and if I previously visited a path containing "up," I'll be teleported there.

Curious about its implementation and in need of a real-world program to perform my autopsy on, the decision was easy. I'd kill two birds with one stone: first, I learn by observing and analyzing code in the wild, and second, I unwrap the magic behind one of my most-used bash programs.

Into the code autopsy: Pumped with motivation, I opened GitHub and looked into the source code; I was happy to learn it was only 267 lines long, including comments. After reading a few lines, it immediately hit me: "I have no idea what's going on here." So many new commands and techniques I had never seen before, ones that bash tutorials out there barely scratch the surface of. I knew that reading it top to bottom wasn't going to be enough; I needed to understand it at the atomic level to grasp the program as a whole. I needed to grab a shovel and start digging into each command, consult the man pages, and write down notes as I went; anything less than that would miss my goals and be a waste of my time.

“A few moments later…”: After covering every command, flow, and idea, all the dots eventually connected, and the program unraveled. I completed my goals and gained the knowledge I was seeking; I learned far beyond what I could've hoped for, and this journey exceeded all my expectations. I gained the confidence I needed, and finally, I know how to begin writing my own programs.

After conquering this imaginary hill, I knew that I needed to share this experience. It was such a beautiful journey for me, and it would be a shame to keep its insights only for myself. This article is my best effort to take you with me on this journey of climbing up the bash skill tree by exploring the dark corners of a real-world program line-by-line. I hope you'll enjoy it as much as I did making it.

Please remember that I'm not an authority on the subject; I have no relation to the creator or maintainers of 'z.' I'm just a noob trying to get better, so if you find any mistakes on my part (which you probably will), don't hesitate to comment on this article so that we can both improve and ease the path for others who might follow.

This analysis is intended for complete beginners, though I hope intermediate bash developers can gain some benefit from it too. If you are familiar with basic programming concepts, it's possible to start without any prior bash scripting knowledge, but be warned: it's going to be unnecessarily brutal. To make the most of this walkthrough, I suggest you get familiar with at least a few basic concepts, which will make this journey much more accessible, fun, and less frustrating.

Luckily, those concepts are easy to pick up by completing a few short tutorials; feel free to skip them if you're already familiar with the basics of the Linux terminal, bash scripting, the awk command, and the sed command. Sed is a nice-to-have but not strictly necessary to complete this walkthrough.

If you’re not familiar with those topics, take your time and go over those tutorials first; I’ll be waiting right here for you after you’ve finished them.

  • Open up the linked source code side-by-side with this guide.
  • Try running sections of this code in Replit or locally on your machine, and play with the commands that confuse you to better understand them. Most of them will be obvious, but some, like the commands in the “completions” sections, can be somewhat mind-bending.
  • If using Replit, try asking the AI assistant for explanations; it can clarify confusing sections and provide extra context that is not covered in this article.
  • Don’t try to rush through it. Make sure you understand each command presented in each section. If you do plan to speed-run it, at least pour yourself a quadruple espresso. You’ll need it.

No AI agents were significantly harmed in preparing this article; I only used Grammarly for grammar checking since English is not my native language and Replit to test sections of the "z" source.

While writing this article, I discovered that Replit has an AI assistant. I tested it once or twice to figure out how parts of the completions code work, and it explained it well. This is why I added Replit as an early suggestion for following along.

Even though I'm not a big fan of AI tools and prefer to deep-dive through the official documentation, especially when learning new things, this one has the potential to provide insights I didn't include in this article or even point out mistakes I made, which I think can complement this article and provide extra value for you, the reader.

This is by no means intended to be a discussion about design choices made by the ‘z’ creator and maintainers; we are mainly going to focus on the “how” and not the “why.” If, during this read, you feel strongly about implementation details and think that some areas can and should be optimized, then I encourage you to open an issue and start a discussion over at the ‘z’ project GitHub repo.

This walkthrough is intended to be read top-to-bottom, as it follows the actual code in the same order you would read it. If you remove all the prose from this article and leave only the code snippets, you will be left with the original program.

I divided this walkthrough into chapters; each one represents a significant component of the program that I thought was worth encapsulating and discussing in isolation. Each chapter and its sub-sections repeat the same pattern:

  • A flow chart. An image that indicates the current code in focus relative to all program flows; clicking on it opens a high-resolution version.

     [Flow chart image example: z flows overview]

  • A snippet of the code in focus. Indicated by a code component; comments at the top indicate the line number in the source, followed by the optional nesting level of the current snippet. For example:

    # z.sh:63 # _z { ... } / if [ "$1" = "--add" ]; then { ... }

    The above comments indicate we are focusing on the code starting at line 63 of the source and are currently inside the ‘_z’ function body, specifically inside the ‘then’ block of the ‘if’ condition that tests whether the first argument equals the string “--add”.

  • Newly introduced commands. Will be indicated by a bullet list of:

    • command - <short summary>. <Reference link to read more>.
  • Explanation of goals. Will be indicated by a “Goal/s: ” prefix.

  • Walkthrough of the code. Will be indicated by a “Walkthrough: ” prefix.

In each chapter, the flow chart will help us understand which part of which flow we’re currently focusing on; since we explore the code top-to-bottom, this is not always obvious.

In each chapter's 'commands' section, I'll provide a short summary for each command and include a link referencing the official documentation for further reading. It's up to you how deep you want to explore each command, but I do encourage you to use those links if only for the sake of getting yourself familiar with the bash manual; it's by far the best resource for any questions you might have and getting familiar with it is essential to fully utilize bash.

Explaining the goals will help us understand what the code snippet in focus is trying to achieve.

After we understand what each command in the snippet does and what the goals are, we'll walk through the implementation and witness how it achieves the goal by utilizing the commands we referenced up until now.

Some sections will have "comments." Treat those as "tips" or "clarifications" that didn't fit the above pattern but still include information worth noting.


So, without further ado, put on your surgical cap, scrub in, take a deep breath, grab a scalpel, and let’s cut into this code.

[Flow chart: chapter 0, installation]

# z.sh:5
# INSTALL:
#     * put something like this in your .bashrc/.zshrc:
#         . /path/to/z.sh
#     * cd around for a while to build up the db
  • # - Comment code, a word beginning with ‘#’ causes that word and all remaining characters on that line to be ignored. Read more.
  • . filename [arguments] - Read and execute commands from the filename argument in the current shell context. Read more.

Goal: Make "z.sh" file content part of shell initialization:

  • Define datafile path.
  • Make the "z" command executable.
  • Install completions.
  • Create a shell hook.

Walkthrough: After downloading "z.sh" by directly copying the file, cloning the repo, or installing with a package manager (brew, etc.), reference the local path to "z.sh" in your .bashrc or .zshrc file. The ". /path/to/z.sh" line will evaluate "z.sh" each time we create a new shell instance.

As part of the evaluation, we will define a data file path, either a default value or a value derived from setting a global variable. We will alias "z" to an inner function. If completions are not already installed, we will install them for the specific shell we are on. Finally, we will create a pre-command hook to maintain our data file. There will be a dedicated chapter for each of the sub-goals mentioned.

For now, all you need to know is that upon executing "z.sh," we set the shell to work with "z" by making the mentioned changes, which will allow us to execute our main flows.
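
To make this concrete, here is a minimal sketch of such a shell-config entry; the path is purely illustrative and depends on where you placed the file:

# hypothetical .bashrc / .zshrc entry; adjust the path to wherever z.sh lives
. "$HOME/code/z/z.sh"
type z    # in a new shell, this should report that z is now an alias/function provided by z.sh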

Global variables

# z.sh:11
# set $_Z_CMD in .bashrc/.zshrc to change the command (default z).
# set $_Z_DATA in .bashrc/.zshrc to change the datafile (default ~/.z).
# set $_Z_MAX_SCORE lower to age entries out faster (default 9000).
# set $_Z_NO_RESOLVE_SYMLINKS to prevent symlink resolution.
# set $_Z_NO_PROMPT_COMMAND if you're handling PROMPT_COMMAND yourself.
# set $_Z_EXCLUDE_DIRS to an array of directories to exclude.
# set $_Z_OWNER to your username if you want use z while sudo with $HOME kept

Goal: Allow the user to change how “z.sh” works by setting global variables.

Walkthrough: When “z.sh” or its internal flows are executed, they will attempt to read those global variables. If they exist, the program will take their values instead of the default ones.
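
As an example, a .bashrc might override a couple of these before sourcing z.sh; the values and paths below are an illustration I made up, not a recommendation:

# hypothetical overrides; set these before sourcing z.sh
export _Z_DATA="$HOME/.local/share/z/data"   # custom datafile location (make sure its parent directory exists)
export _Z_MAX_SCORE=5000                     # age entries out faster than the default 9000
_Z_EXCLUDE_DIRS=("$HOME/tmp" "/mnt")         # directory trees z should ignore
. "$HOME/code/z/z.sh"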

Usage notes

# z.sh:19
# USE:
#     * z foo      # cd to most frecent dir matching foo
#     * z foo bar  # cd to most frecent dir matching foo and bar
#     * z -r foo   # cd to highest ranked dir matching foo
#     * z -t foo   # cd to most recently accessed dir matching foo
#     * z -l foo   # list matches instead of cd
#     * z -e foo   # echo the best match, don't cd
#     * z -c foo   # restrict matches to subdirs of $PWD
#     * z -x       # remove the current directory from the datafile
#     * z -h       # show a brief help message

Goal: Provide usage examples.

Walkthrough: Some examples of how a user can work with “z” and what options and positional parameters can be passed to change “z” behavior; we will have a dedicated chapter to explore each option separately.

[Flow chart: chapter 1, initialization]

# z.sh:30
[ -d "${_Z_DATA:-$HOME/.z}" ] && {
    echo "ERROR: z.sh's datafile (${_Z_DATA:-$HOME/.z}) is a directory."
}
  • [ expr ] - Single brackets are identical to the test command; they evaluate a conditional expression expr and return a 0 (true) or 1 (false) status. Read more.
  • -d file - A conditional expression that is True if the file exists and is a directory. Read more.
  • ${} - Shell parameter expansion: The ‘$’ character introduces parameter expansion, command substitution, or arithmetic expansion. The parameter name or symbol to be expanded may be enclosed in braces, which are optional but serve to protect the variable from characters immediately following it that could be interpreted as part of the name. Read more.
  • $HOME - Shell variable representing the current user home directory. Read more.
  • " " - Double quotes preserve the literal value of all characters within the quotes; it is a string representation. Read more.
  • ${parameter:-word} - If the parameter is unset or null, the expansion of the word is substituted. Otherwise, the value of the parameter is substituted. Read more.
  • expression1 && expression2 - True if both expression1 and expression2 are true. Read more.
  • {} - Block of code.
  • echo [arg …] - Output arguments list arg. Read more.

Goal: Confirm that the system invoking the program has a dedicated file the program can use to store its data, either provided by the user or a default value. Since it’s possible to override the default value, as the comment at z.sh:12 states, by reassigning the $_Z_DATA environment variable to any value, an edge case may occur if this value evaluates to a directory instead of a file. If we hit this edge case, we want to echo an error message containing the path we tried to use, to communicate back to the user what caused the error.

Walkthrough: Expand the variable _Z_DATA if it exists; if not, expand the $HOME shell variable followed by the /.z string. Check whether the result of the expansion is a directory on the file system; if it isn’t, skip the block following &&; if it is a directory, enter the block and use the echo command to output an error message. This message string uses the same expansion we performed in the test earlier.
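
If the ${parameter:-word} expansion is new to you, here is a tiny sketch you can paste into a shell to see it in action (the custom path is made up):

# ${parameter:-word} falls back to word when the parameter is unset or null
unset _Z_DATA
echo "${_Z_DATA:-$HOME/.z}"        # prints e.g. /home/me/.z
_Z_DATA="$HOME/.config/z/data"     # hypothetical custom location
echo "${_Z_DATA:-$HOME/.z}"        # prints /home/me/.config/z/data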


# z.sh:34
_z() { .. }
  • fname() compound-command [ redirections ] - Shell functions are a way to group commands for later execution using a single name for the group. They are executed just like a "regular" command. Read more.

Goal: Encapsulate the logic of our program in a reusable function. Its parameters will control the inner execution flow, either adding new entries, getting completions for a current path, or navigating to the best matching path. This reusability will help us when we deal with multiple completion systems. Don't worry too much about this now; we'll cover all those flows soon.

Walkthrough: Define a function using the “_” prefix convention; this convention indicates that this function is intended to be a private function invoked only from other functions inside the program and not exposed to the outside world.


# z.sh:36 # _z { ... }
local datafile="${_Z_DATA:-$HOME/.z}"
  • name=[value] - Assign variable statement. Read more.
  • local name[=value] … - Local variable in a function. Read more.

Goal: Define a local variable that will store the path on the file system for our data file so that we can access it anywhere inside the _z() function scope during its execution.

Walkthrough: Define a local variable to the _z() function named datafile and give it a string value of the expansion of the _Z_DATA variable. If it doesn’t exist, assign it to the expansion of the $HOME variable followed by a /.z string.


# z.sh:39 # _z { ... }
# if symlink, dereference.
[ -h "$datafile" ] && datafile=$(readlink "$datafile")
  • -h file - Conditional expression; true if file exists and is a symbolic link. Read more.
  • $(command) - Command substitution allows the output of a command to replace the command substitution itself. Read more.
  • readlink - Linux command that reads the value of a symbolic link to get its canonical path. Read more.

Goal: Check an edge case for a symbolic link passed as a data file path through the _Z_DATA environment variable. In such a case, we want to resolve it to its canonical path and reassign our local variable.

Walkthrough: Test whether $datafile is a symlink using the -h conditional expression; if it is, reassign $datafile to the resolved symbolic link path by running the readlink command on the current value of $datafile inside a command substitution.
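
As a quick, hypothetical illustration of what the dereference does (the link and target paths are only for the demo):

# make a throwaway symlink that points at the default datafile location
link=$(mktemp -u)          # a unique, unused path for the demo
ln -s "$HOME/.z" "$link"
datafile=$link
[ -h "$datafile" ] && datafile=$(readlink "$datafile")
echo "$datafile"           # prints /home/<user>/.z, the link's target, not the link itself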


# z.sh:17
# set $_Z_OWNER to your username if you want use z while sudo with $HOME kept

# z.sh:42 # _z { ... }
# Bail if we don't own ~/.z and $_Z_OWNER not set.
[ -z "$_Z_OWNER" -a -f "$datafile" -a ! -O "$datafile" ] && return

-z, -f, and -O are unary conditional expressions; -a is the odd one out here and acts as a binary operator (see the callout below). Read more to see the full list.

-a is an exception here and is treated by the test as a binary operator. Its behavior inside a test [ ... ] is determined by the number of arguments passed to the test: if 2 arguments are passed, it is treated as a unary conditional operator, meaning it checks whether a file exists; if 3 arguments are passed, it is treated as a binary conditional operator, meaning it evaluates to true if both the left side (expr1) AND the right side (expr2) are true, acting like the logical AND (&&) control operator.

In our case, there are 9 arguments passed; they will be evaluated according to precedence using the rules listed above. Read more to see the full list.

expr1 = -z "$_Z_OWNER"
expr2 = -f "$datafile"
expr3 = expr1 -a expr2
expr4 = ! -O "$datafile"
expr5 = expr3 -a expr4
  • -z - Check if the variable's length is zero.
  • -a - True if both expr1 and expr2 are true (binary operator, see callout above).
  • -f - Check if it's a regular file.
  • -O - Check if the invoking user owns the file.
  • ! expression - True if expression is false. Read more.
  • return [n] - Cause a shell function to stop executing and return the value n to its caller. If n is not supplied, the return value is the exit status of the last command executed in the function. Read more.

Goal: Check an edge case where the user doesn't have ownership of the data file. This may occur if a system has multiple users and one user tries to write to another user's data file. Since the data file is scoped by default to the $HOME variable, which maps to the current user's home directory, there normally isn't a problem: each user gets its own data file. But if we set the $_Z_DATA variable to a specific, non-user path, usually writable only with admin privileges, then any non-admin user who tries to use the program and isn't explicitly named in the $_Z_OWNER environment variable will hit this edge case.

One possible reason (I'm not sure; I need to confirm the intention) for this to be even possible is to allow for a global data file to be shared across some or all users. If we define $_Z_DATA to a shared path and set $_Z_OWNER to the current user then users can share a data file if they are permitted to write to it by the system admin or by using the sudo command.

Walkthrough: First, check if $_Z_OWNER is empty, then check if $datafile exists and is a regular file, and finally, check that the invoking user is NOT the file owner; if all conditions pass, return early. The return exits with the status of the previous command (which is "readlink"? Or nothing if there is no symlink? I need to confirm this).
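
If you want to feel how this guard behaves, here is a small, hypothetical experiment against a file you typically don't own (the path is chosen purely for illustration):

# simulate the guard: no owner configured, datafile pointing at a root-owned file
_Z_OWNER=
datafile=/etc/hosts
[ -z "$_Z_OWNER" -a -f "$datafile" -a ! -O "$datafile" ] && echo "would bail out here"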


[Flow chart: chapter 2, read data]

# z.sh:44 # _z { ... }
_z_dirs () {
    [ -f "$datafile" ] || return

    local line
    while read line; do
        # only count directories
        [ -d "${line%%\|*}" ] && echo "$line"
    done < "$datafile"
    return 0
}
  • expression1 || expression2 - True if either expression1 or expression2 is true. Read more.
  • while test-commands; do consequent-commands; done - Execute consequent-commands as long as test-commands has an exit status of zero. Read more.
  • read - One line is read from the standard input. Read more.
  • -d - True if file exists and is a directory. Read more.
  • ${parameter%%word} - The word is expanded to produce a pattern and matched against the parameter. If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted. Read more.
  • [n]<word - Redirection of input causes the file whose name results from the expansion of word to be opened for reading on file descriptor n, or the standard input (file descriptor 0) if n is not specified. Read more.

Goal: Retrieve all the directories listed in our $datafile. Since each line in our $datafile contains a path portion, we want to extract and echo it so that the invoker of this function gets a list of all the paths we have stored up to this point in time.

Walkthrough: Define a new function named _z_dirs(). In its body, test whether $datafile points to an actual file; if not, return early. If it does, declare a local line variable. Loop over the $datafile contents with a while…do loop by redirecting $datafile into the loop; as the loop’s test-command, use read on the line variable to iterate over each line of $datafile and expose it inside the loop body. Use ${parameter%%word} to expand the word portion into the pattern \|*; this pattern is then matched against the parameter and the match is deleted from the result, leaving us with the parameter minus the pattern. Test whether the resulting string is a directory; if it is, echo the original line. After the iteration is done, return a success exit status from the function.

Each line in the $datafile has 3 sections separated by a pipe character |; the first section is the path, followed by the rank and a timestamp. The entire line has the structure path|rank|timestamp\n (without spaces), for example:

/Users/me/.emacs.d|16|1733654434\n

In the current walkthrough, we stated that the word portion of ${parameter%%word} is expanded into the pattern \|*. This pattern escapes the pipe character using \ and then matches everything after and including the pipe using the glob wildcard *. Applying this pattern to the above example leaves us with the path portion of the line.

/Users/me/.emacs.d

Now, we can safely test against this path. Later, we’ll see how to construct those sections and populate the $datafile, but for now, just remember that each line will have this shape.
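
To see the extraction and the directory test working together, here is a self-contained sketch you can run; the fake datafile contents are invented for the demo:

# build a throwaway datafile with one real and one bogus directory
datafile=$(mktemp)
printf '%s\n' \
    "$HOME|16|1733654434" \
    '/does/not/exist|3|1733654000' > "$datafile"

while read line; do
    [ -d "${line%%\|*}" ] && echo "$line"    # keep lines whose path portion still exists
done < "$datafile"
# only the $HOME line is printed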


[Flow chart: chapter 3, add data]

Later in the completions chapter (chapter 8), we’ll see that we trigger this flow on every directory path change. We’ll do this by adding a pre-command execution function, which means that on every path change we fire the _z function with the “--add” parameter to land here.

Don’t worry too much about this at the moment. Currently, you only need to understand that this flow is responsible for adding and updating entries in the $datafile and that it’s triggered on every path change.

# z.sh:56 # _z { ... }
# add entries
if [ "$1" = "--add" ]; then { .. }
  • if []; then … elif … fi - If clause conditional construct. Read more.
  • $1 - Is the first positional parameter passed to the function. Read more.
  • string1 = string2 - True if the strings are equal. Read more.

Goal: Check if the first argument passed to the script equals the string "--add." If it does, enter the then block; if not, skip it.

Walkthrough: Evaluate positional parameter $1, which will be the first argument passed to the _z function by its invoker. In the case of “_z --add,” $1 will evaluate to “--add.” Then, we use the equal conditional expression (operator =) and compare it to the string “--add.” If we have a match, we enter the then block of the if clause.


# z.sh:57 # _z { ... } / if [ "$1" = "--add" ]; then { ... }
shift
  • shift [n] - shift and shift 1 are the same; shift n renames the positional parameters from n+1 … $# to $1 …, and the parameters represented by the numbers $# down to $#-n+1 are unset. Read more.
    • $# - Expands to the number of positional parameters in decimal. Read more.

Goal: We already used the first parameter to control the flow of execution in the _z function, and its evaluation pointed us to the "add" flow, so by convention we have no further use for it. Furthermore, we want to treat the "rest" of the passed parameters as a single word by utilizing special parameters such as $*, which operate on all the current parameters and combine them into a single word representing a single path; if we didn't use shift, our path would end up with an "--add" prefix.

Walkthrough: After we use the shift command, the $2 argument becomes the new $1, $3 becomes $2, and so on, while the previous $1 is discarded; this effectively removes “--add” from the arguments passed to the function.
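
A tiny, made-up function makes the effect of shift easy to see:

# hypothetical demo of shift dropping the "--add" flag
demo() {
    echo "before: $*"    # --add /some/path
    shift
    echo "after:  $*"    # /some/path
}
demo --add /some/path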


# z.sh:60 # _z { ... } / if [ "$1" = "--add" ]; then { ... }
# $HOME and / aren't worth matching
[ "$*" = "$HOME" -o "$*" = '/' ] && return
  • $* - Expands to the positional parameters. When the expansion occurs within double quotes, it expands to a single word, with the parameters separated by the first character of the IFS shell variable. Read more.
    • IFS - Shell variable holding the list of characters that separate fields; used when the shell splits words as part of expansion. Read more.
  • expr1 -o expr2 - True if either expr1 or expr2 is true. Read more.

Goal: Test for the case where the positional parameters passed as arguments are either the user's home directory or the root directory. Those two paths are too common, visited frequently, and easy to navigate to without special tooling; adding them to our $datafile adds no significant value.

Walkthrough: Check if the concatenation of all passed arguments is equal to the expansion of the Shell variable $HOME or to the root path /, if it is, early return.
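
Run in a scratch shell, the guard behaves like this (a hypothetical, simplified snippet that fakes the arguments after the shift):

# simulate "_z --add $HOME" after the shift
set -- "$HOME"
[ "$*" = "$HOME" -o "$*" = '/' ] && echo "skipped: not worth tracking"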


# z.sh:63 # _z { ... } / if [ "$1" = "--add" ]; then { ... }
# don't track excluded directory trees
if [ ${#_Z_EXCLUDE_DIRS[@]} -gt 0 ]; then
    local exclude
    for exclude in "${_Z_EXCLUDE_DIRS[@]}"; do
        case "$*" in "$exclude"*) return;; esac
    done
fi
  • ${name[@]} - Expands to all members of the array name, each element as a separate word; with a leading #, ${#name[@]} expands to the number of elements in the array. Read more.
  • arg1 -gt arg2 - Arithmetic binary operator; returns true if arg1 is greater than arg2. arg1 and arg2 may be positive or negative integers. Read more.
  • for name [ [in [words …] ] ; ] do commands; done - Expand words and execute commands once for each member in the resultant list, with name bound to the current member. Read more.
  • case word in …[ [(] pattern [| pattern]…) command-list ;;]… esac - case selectively executes the command-list corresponding to the first pattern that matches word. The | (pipe) is used to separate multiple patterns, and the ) (closing parenthesis) operator terminates a pattern list. A list of patterns and an associated command-list is known as a clause. Each clause must be terminated with ;;, ;&, or ;;&. Each word and pattern undergoes tilde expansion, parameter expansion, command substitution, arithmetic expansion, (process substitution - patterns only, not the word), and quote removal. There may be an arbitrary number of case clauses. The first pattern that matches determines the command-list that is executed. It’s a common idiom to use * as the final pattern to define the default case, since that pattern will always match. Read more.

Goal: Allow the user to define a global array variable of paths to exclude directory trees; each path and its sub-directories will return early upon encounter during script execution.

Walkthrough: Check whether the global variable _Z_EXCLUDE_DIRS is supplied with at least one array item representing excluded dirs. If it has more than zero items, execute the block. Then, define a local exclude variable inside the if so it isn't declared at all when no items are provided; note that the local variable is still scoped to the _z function's scope, not to the if block. Iterate over the list of excluded paths and check whether the arguments provided to the _z function, namely the provided path, match or start with one of the excluded paths; if they do, return early from the function to finish its execution.

The * (asterisk) in the "$exclude"* pattern is a glob wildcard; it matches any string following $exclude, so the clause matches any path that starts with $exclude. For example, if we excluded the /var/dir path by adding it to the _Z_EXCLUDE_DIRS array and the $* word evaluated to the path /var/dir/subdir, then the case matches, since the word matches the pattern thanks to the wildcard expansion. - Read more.
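
Here's a short, hypothetical snippet showing the exclusion check in isolation (the excluded directories are made up):

# pretend the user excluded these trees and we just cd'd into /var/dir/subdir
_Z_EXCLUDE_DIRS=(/var/dir /tmp/cache)
set -- /var/dir/subdir
for exclude in "${_Z_EXCLUDE_DIRS[@]}"; do
    case "$*" in "$exclude"*) echo "excluded: $*";; esac
done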


# z.sh:71 # _z { ... } / if [ "$1" = "--add" ]; then { ... }
# maintain the data file
local tempfile="$datafile.$RANDOM"
local score=${_Z_MAX_SCORE:-9000}
_z_dirs | \awk -v path="$*" -v now="$(\date +%s)" -v score=$score -F"|" '
    BEGIN {
        rank[path] = 1
        time[path] = now
    }
    $2 >= 1 {
        # drop ranks below 1
        if( $1 == path ) {
            rank[$1] = $2 + 1
            time[$1] = now
        } else {
            rank[$1] = $2
            time[$1] = $3
        }
        count += $2
    }
    END {
        if( count > score ) {
            # aging
            for( x in rank ) print x "|" 0.99*rank[x] "|" time[x]
        } else for( x in rank ) print x "|" rank[x] "|" time[x]
    }
' 2>/dev/null >| "$tempfile"
  • $RANDOM - Generates a random integer from 0 to 32767. Read more.
  • command1 | command2 - A pipeline, the output of each command in the pipeline is connected via a pipe to the input of the next command. That is, each command reads the previous command’s output. Read more.
  • date +%s - The date utility displays the date and time read from the kernel clock; the + sign introduces the output format string, and %s prints the number of seconds since the Epoch (1970-01-01 00:00 UTC). Read more.
  • _z_dirs - Invoke a function with the same name defined earlier.
  • \command - Putting a backslash before a command bypasses any alias with that name and invokes the command directly; this only affects interactive shells, not scripts.
  • [n]>[|]word - Redirection of output causes the file whose name results from the expansion of word to be opened for writing on file descriptor n, or the standard output (file descriptor 1) if n is not specified. If the file does not exist it is created; if it does exist it is truncated to zero size (only if ‘noclobber’ option is not enabled). If the redirection operator is >|, the redirection is attempted even if the file named by word exists (disregard ‘noclobber’ option). Read more.
    • > - redirects output to a file/device, if no number specified, standard output stream (stdout) is assumed.
    • 1> and > - are the same, redirects "stdout" to a file.
    • 2> - redirects "stderr" to a file.
    • &>file - redirect both "stderr" and "stdout" to a file.
    • > file 2>&1 - this is the older syntax, equivalent to the neater &>file; redirects "stdout" and "stderr" to file. Read more.
  • /dev/null - is the null device; it takes any input and throws it away, used to suppress output.

Although you can follow along with rudimentary or even no previous knowledge of awk, it is strongly recommended that you familiarize yourself with the basics. I’ll explain each command and idea as we encounter it, just as we did up until now, but we won’t cover awk itself since it’s outside our scope and deserves a tutorial of its own. The best way to move forward, whether you’ve never used awk or have a basic understanding of it, is to follow along with the awk man pages for each command you want to understand better.

AWK CONTEXT:

  • awk -v - Is used to introduce a variable to the awk program, in this case we define "path”, “now” and “score" variables for the awk program to use internally.
  • awk -F"|" - Is used to define the awk input field separator to the pipe character "|".
  • BEGIN - part of awk special patterns, it’s executed before the first record of input is read, used as a definition block to pass properties, override variables and assign new variables.
  • END - is used as a post-process block to handle awk processing results.
  • var[index]=value - var is an associative array which uses string index to assign a value at that index.
  • $1 - in awk is equal to the value of the first input field of the current record; if a record is one line made of '[path]|[rank]|[time]\n' and we set the field separator to the pipe (|) character, then $1 will reference [path] on each line.
  • $2 - same as ‘$1’ but $2 will reference [rank] on each line.
  • $3 - same as ‘$1’ but $3 will reference [time] on each line.
  • += - assign to left side its current value with the additional value of the right side.
  • >= - checks if left side is greater than or equal to right side.

Goal: Maintain the datafile; maintaining the datafile means a few things:

  1. Creating or updating an entry: if the provided path argument doesn't exist then create it. If it already exists, add 1 to its current rank and update the timestamp.
  2. Drop entries: If an entry has a low rank exclude it from the result.
  3. Age entries: If the accumulated ranks of all entries exceed the _Z_MAX_SCORE value, then lower all entries' ranks.
  4. Temporary copy of datafile: Create a temporary file with the new results, the name of the file will have a random suffix.

Walkthrough: Set a local variable tempfile to a string constructed from the expansion of $datafile to a path, a dot and a random int. Set local variable score to equal _Z_MAX_SCORE if not null or unset; otherwise, assign the int 9000 as the default value.

We invoke the _z_dirs function, which, I remind you, echoes lines from the $datafile file to be used as stdin for the awk program. Each line represents a row, which is further divided by a pipe character | to represent a column. The first column is the path, the second is the score, and the third is a timestamp. Each row has the following structure - 'path: string | score: float | timestamp:int' (with no spaces).

We invoke the 'awk' program over the piped _z_dirs stdout with a -v flag for each variable we want to pass to awk: the path variable, providing the input positional parameters as a single word; the now variable, an epoch timestamp; and the score variable we previously defined as a local variable. We then set the awk input field separator by passing the -F flag with the value | (pipe); this breaks each line into fields at the | (pipe) character. Given that each row has the previously mentioned structure, we can then refer to each field as $1, $2, $3, where $1 is the path field, $2 the score field, and $3 the timestamp field.
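
Before we walk through the BEGIN block, a one-liner (with a fabricated row) shows what -F"|" does to each record:

echo '/Users/me/.emacs.d|16|1733654434' |
awk -F"|" '{ print "path:", $1, " rank:", $2, " time:", $3 }'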

In the BEGIN block, we utilize the path variable, which evaluates to $* as an array index both for the rank and time newly defined associative arrays, rank[path] = 1, which assigns 1 to the rank array at index path. And time[path] = now does the same as the previous, but assigns the now variable as the value for that index on the time array.

The awk body iterates over each record, giving us access to its fields; we condition the body execution for a record by checking if $2 >= 1, which performs a check per record that conditions the execution for that record only if the second field($2), the ‘rank,’ is greater or equal to int 1. This effectively drops all records with a rank below one.

After we filter out low-ranking records, we can start operating on each of the remaining ones. We first check if the $1 field, which is the ‘path’ field for the current record, matches the path variable ($*) that we passed to awk upon its execution using the -v flag.

  • If we have a match, reassign the rank array on the path index key to the sum of the current record rank($2) field and 1. Also, reassign the time array on the same index key to the now variable.
  • In case we don’t have a match, create a new index key on both the rank and time associative arrays for the current record path ($1) and assign the current record's rank ($2) and time ($3) fields respectively.

Lastly, in the awk body, we accumulate ranks into a count variable that sums the rank of every record that passed the filter.

At the end of the awk body iteration, we will have rank and time arrays and a count integer variable stored in memory for later use.

In the END block, we deal with aging. To age entries, we need to test if the count accumulated variable is greater than the max allowed score stored in the _Z_MAX_SCORE global variable, which is 9000 by default.

Whether we match the condition or not, in both cases we loop over the rank array and construct a line matching the structure of rows from our $datafile; the only difference is that if the condition matched, then in the rank column we multiply the rank value by 0.99 to represent the 'aging' of all records.

At the end of the awk execution, we are left with a new string representing a $datafile as stdout of the awk program; it contains modified lines, our new or updated entry for passed path, and possibly aged rank for each line.

We then save the output of the awk program to a $tempfile by taking the stdout from the awk program and redirecting it to a $tempfile while ignoring all errors; we do this redirection even if $tempfile already exists.
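
If you want to watch this pipeline behave outside of z.sh, here is a stripped-down, hypothetical rehearsal of it with two fabricated datafile rows (the aging branch is left out for brevity):

printf '%s\n' \
    '/Users/me/.emacs.d|16|1733654434' \
    '/tmp/old|0.4|1733650000' |
awk -v path="/Users/me/.emacs.d" -v now="$(date +%s)" -F"|" '
    BEGIN { rank[path] = 1; time[path] = now }
    $2 >= 1 {
        if( $1 == path ) { rank[$1] = $2 + 1; time[$1] = now }
        else             { rank[$1] = $2;     time[$1] = $3  }
    }
    END { for( x in rank ) print x "|" rank[x] "|" time[x] }
'
# /tmp/old is dropped (rank < 1); the matching path comes out with rank 17 and a fresh timestamp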


# z.sh:96 # _z { ... } / if [ "$1" = "--add" ]; then { ... }
# do our best to avoid clobbering the datafile in a race condition.
if [ $? -ne 0 -a -f "$datafile" ]; then
    \env rm -f "$tempfile"
else
    [ "$_Z_OWNER" ] && chown $_Z_OWNER:"$(id -ng $_Z_OWNER)" "$tempfile"
    \env mv -f "$tempfile" "$datafile" || \env rm -f "$tempfile"
fi
  • $? - Expands to the exit status of the most recently executed foreground pipeline. Read more.
  • arg1 -ne arg2 - Arithmetic binary operators return true if arg1 is not equal to arg2, respectively, arg1 and arg2 may be positive or negative integers. Read more.
  • env - Is a utility to set environment variables before executing a command; it is also commonly used to locate an executable across different operating systems. Read more.
  • rm -f - Remove files or directories, -f, ‘--force’, ignore nonexistent files and arguments, never prompt . Read more.
  • mv -f - Move (rename) files, -f, ‘--force’, used to not prompt before overwriting. Read more.
  • chown - Change file owner and group. Read more.
  • id - Print real and effective user and group IDs. Read more.
    • -n, --name - Print a name instead of a number.
    • -g, --group - Print only the effective group ID.

Goal: We have three main goals in the section:

  1. Check for errors that might occur during the awk program execution; if we encounter an error, we remove the possibly created $tempfile to prevent it from accumulating as a junk file on the user's system.
  2. If no errors are found, check if the global _Z_OWNER variable is set. If this variable is defined, we add ownership permissions to the newly created $tempfile.
  3. If no errors are found, then rename the $tempfile to $datafile, which overwrites the previous $datafile. If we fail at this by returning a non-zero exit code, then we fall back to removing the $tempfile.

Walkthrough: Check whether the last command exited with a non-zero status and "$datafile" is an actual regular file. If both conditions pass, meaning the awk pipeline failed while a datafile already exists, enter the if block and remove $tempfile without a confirmation prompt. If not, go to the else block.

In the else block, first, check if the $_Z_OWNER variable is defined; if it is, then evaluate the right side of the AND operator, which makes the evaluated $_Z_OWNER the owner of the $tempfile using chown <name>:<group> <file> command.

Following in the else block, we use the mv command to switch the current $datafile with our newly created $tempfile. If the switching operation fails, evaluate the right side of the OR operator and delete the $tempfile using the rm command.
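
For a feel of what the chown target expands to, here is a hypothetical one-liner; the temp file and user are whatever your own shell reports:

tempfile=$(mktemp)
echo "$USER:$(id -ng "$USER")"                 # e.g. me:staff, i.e. <owner>:<primary group>
chown "$USER:$(id -ng "$USER")" "$tempfile"    # same shape as the chown line in z.sh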

Congratulations on making it to the end of this part. In this part, you learned about the installation steps, initialization of the data file, reading data from the data file, and adding new data. In the next part, we'll explore the completions generation flow, how to get user input, and how to deal with edge cases. Whenever you are ready, hit the following part link. See you there!
