Show HN: Htvend, a tool to capture internet dependencies
3 months ago
htvend is a tool to help you capture any internet dependencies needed in order to perform a task.
It builds a manifest of the internet assets needed, which you can check in with your project.
The idea is that this manifest serves as an upstream package lock file for any asset type, and that you can reuse it to rebuild your application if the upstream assets are removed or if you are without internet connectivity.
To build just htvend, you need Go installed. Then run:
make
# optional, copies target/htvend to /usr/local/bin
sudo make install
When the proxy server receives a request for a URL that is found in assets.json, it serves that content along with any relevant headers recorded in that file.
If the URL isn't found, the behaviour depends on the mode: when invoked as htvend build, the content is fetched from upstream; when invoked as htvend offline, a 404 not found response is served.
This is useful for a number of reasons, including:
Some environments (such as air-gapped networks) don't have internet access. Here you can supply a directory of blobs instead.
Assets on the internet often change, and not always on a schedule that supports your team.
Assets on the internet can become unavailable for commercial, geopolitical or other reasons (e.g. Docker Hub rate limits), or because a maintainer simply deletes their repository.
Perhaps most importantly, this lets you accept changes on your schedule. If you have to make a small change to a script that lives inside an image to address a production issue, you can make that change without inadvertently bringing in unrelated upstream changes that an otherwise uncontrolled image build process would pull in.
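The lookup rule described above (serve from assets.json if present, otherwise fetch in build mode or return 404 in offline mode) can be sketched roughly as follows. This is an illustration only, not htvend's actual code: the function name resolve, the in-memory manifest shape and the fetch_upstream callback are all assumptions made for the example.

```python
def resolve(url, manifest, mode, fetch_upstream=None):
    """Return (status, headers, body) for a proxied request.

    manifest is a hypothetical dict: url -> {"headers": {...}, "body": bytes}.
    mode is "build" or "offline", mirroring the two htvend commands.
    fetch_upstream is a hypothetical callable: url -> (headers, body).
    """
    entry = manifest.get(url)
    if entry is not None:
        # Known asset: serve the recorded content and headers.
        return 200, entry["headers"], entry["body"]
    if mode == "build":
        # Unknown asset while building: fetch it and record it.
        headers, body = fetch_upstream(url)
        manifest[url] = {"headers": headers, "body": body}
        return 200, headers, body
    # Unknown asset while offline: refuse.
    return 404, {}, b""
```

In this sketch the only difference between the two modes is what happens on a miss, which is why the same manifest can be used both to record and to replay.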
htvend is the main tool built by this repo.
Usage:
htvend [OPTIONS] <command>
Application Options:
-C, --chdir= Directory to change to before running. (default: .)
-v, --verbose Set for verbose output. Equivalent to setting LOG_LEVEL=debug
Help Options:
-h, --help Show this help message
Available commands:
build Run command to create/update the manifest file
clean Clean various files, see htvend clean --help for details
export Export referenced assets to directory
offline Serve assets to command, don't allow other outbound requests
verify Verify and fetch any missing assets in the manifest file
htvend build runs the passed sub-process to create or update assets.json in your current directory.
After setting up a proxy server with a self-signed certificate, it sets the relevant environment variables and executes the sub-command. If none is specified, an interactive shell is opened.
See make assets.json for an example that creates a manifest for the dependencies of this project.
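To make the environment setup concrete, here is a minimal sketch of how the sub-process environment might be assembled. The variable names are the defaults listed in the build options; the function child_env, its parameters and the example values are hypothetical, and the JKS keystore variable is left out for brevity.

```python
# Default variable names taken from the htvend build help text.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")
NO_PROXY_VARS = ("NO_PROXY", "no_proxy")
CERT_VARS = ("SSL_CERT_FILE",)


def child_env(base_env, proxy_addr, ca_file):
    """Return a copy of base_env that routes HTTP(S) traffic via the proxy.

    proxy_addr and ca_file are passed through unchanged; their exact
    format in htvend is not specified here.
    """
    env = dict(base_env)
    for name in PROXY_VARS:
        env[name] = proxy_addr
    for name in NO_PROXY_VARS:
        env[name] = ""  # blank, so no host bypasses the proxy
    for name in CERT_VARS:
        env[name] = ca_file  # clients trust the temporary CA
    return env
```

A caller could then pass the result to subprocess.run(cmd, env=child_env(os.environ, addr, ca_path)) so that only the child process sees the proxy settings.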
Usage:
htvend [OPTIONS] build [build-OPTIONS] [COMMAND] [ARG...]
Application Options:
-C, --chdir= Directory to change to before running. (default: .)
-v, --verbose Set for verbose output. Equivalent to setting LOG_LEVEL=debug
Help Options:
-h, --help Show this help message
[build command options]
-m, --manifest= File to put manifest data in (default: ./assets.json)
--blobs-dir= Common directory to store downloaded blobs in (default: ${XDG_DATA_HOME}/htvend/cache/blobs)
--cache-manifest= Cache of all downloaded assets (default: ${XDG_DATA_HOME}/htvend/cache/assets.json)
-l, --listen-addr= Listen address for proxy server (:0) will allocate a dynamic open port (default: 127.0.0.1:0)
-t, --with-temp-dir= List of temporary directories to be creating when running this command. Env vars will be be pointing to these for the
sub-process.
--set-env-var-ssl-cert-file= List of environment variables that will be set pointing to the temporary CA certificates file in PEM format. (default:
SSL_CERT_FILE)
--set-env-var-jks-keystore= List of environment variables that will be set pointing to the temporary CA certificates file in JKS format. (default:
JKS_KEYSTORE_FILE)
--set-env-var-http-proxy= List of environment variables that will be set pointing to the proxy host:port. (default: HTTP_PROXY, HTTPS_PROXY,
http_proxy, https_proxy)
--set-env-var-no-proxy= List of environment variables that will be set blank. (default: NO_PROXY, no_proxy)
--no-cache-response= Regex list of URLs to never store in cache. Useful for token endpoints. (default: ^http.*/v2/$, /token\?)
--cache-header= List of headers for which we will cache the first value. (default: Content-Type, Content-Encoding, X-Checksum-Sha1)
--force-refresh If set, always fetch from upstream (and save to both local and global cache).
--clean If set, reset local blob list to empty before running.
[build command arguments]
COMMAND: Sub-process to run. If not specified an interactive-shell is opened
ARG: Arguments to pass to the sub-process
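As an illustration of the --no-cache-response option, the sketch below checks URLs against the two default patterns from the help text. It assumes an unanchored regex search against the full URL, which may differ from how htvend matches internally; the function name should_cache and the example hosts are made up.

```python
import re

# Default patterns copied from the --no-cache-response help text.
NO_CACHE_PATTERNS = [re.compile(r"^http.*/v2/$"), re.compile(r"/token\?")]


def should_cache(url, patterns=NO_CACHE_PATTERNS):
    """Return False if any no-cache pattern matches the URL."""
    return not any(p.search(url) for p in patterns)


# A registry ping endpoint and a token endpoint are skipped,
# while an image manifest URL would be cached.
print(should_cache("https://registry.example/v2/"))
print(should_cache("https://auth.example/token?service=registry"))
print(should_cache("https://registry.example/v2/library/alpine/manifests/latest"))
```

Token responses are short-lived credentials, so replaying a stored one later would most likely fail; that is presumably why such endpoints are excluded by default.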
htvend offline runs the specified sub-process with a proxy which only serves the contents referenced in assets.json. Anything else will return a 404 not found error.
make offline does this for this repository.
If you have unshare installed, then a good way to really verify that you are offline can be as follows:
unshare -r -n -- \
bash -c "ip link set lo up && make offline"
The unshare -r -n runs the sub-command in a new namespace with no networks. The ip link set lo up brings up the loopback interface in that otherwise empty namespace so that htvend can create a server that its sub-command can then reach.
By default all blobs are saved to and retrieved from ${XDG_DATA_HOME}/htvend/cache/blobs (XDG_DATA_HOME defaults to ~/.local/share).
A cache assets.json is also saved at ${XDG_DATA_HOME}/htvend/cache/assets.json. This is useful during rebuilds of assets.json to avoid connecting to upstream servers more than necessary.
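The default cache locations above can be derived as in this small sketch. It follows the XDG convention that an unset or empty XDG_DATA_HOME falls back to ~/.local/share; the function name cache_paths is hypothetical and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def cache_paths(environ, home):
    """Return (blobs_dir, cache_manifest) for the given environment."""
    data_home = environ.get("XDG_DATA_HOME") or str(
        PurePosixPath(home) / ".local" / "share"
    )
    root = PurePosixPath(data_home) / "htvend" / "cache"
    return root / "blobs", root / "assets.json"


print(cache_paths({}, "/home/alice"))
print(cache_paths({"XDG_DATA_HOME": "/data"}, "/home/alice"))
```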
Usage:
htvend [OPTIONS] offline [offline-OPTIONS] [COMMAND] [ARG...]
Application Options:
-C, --chdir= Directory to change to before running. (default: .)
-v, --verbose Set for verbose output. Equivalent to setting LOG_LEVEL=debug
Help Options:
-h, --help Show this help message
[offline command options]
--blobs-dir= Common directory to store downloaded blobs in (default: ${XDG_DATA_HOME}/htvend/cache/blobs)
--cache-manifest= Cache of all downloaded assets (default: ${XDG_DATA_HOME}/htvend/cache/assets.json)
-m, --manifest= File to put manifest data in (default: ./assets.json)
-l, --listen-addr= Listen address for proxy server (:0) will allocate a dynamic open port (default: 127.0.0.1:0)
-t, --with-temp-dir= List of temporary directories to be creating when running this command. Env vars will be be pointing to these for the
sub-process.
--set-env-var-ssl-cert-file= List of environment variables that will be set pointing to the temporary CA certificates file in PEM format. (default:
SSL_CERT_FILE)
--set-env-var-jks-keystore= List of environment variables that will be set pointing to the temporary CA certificates file in JKS format. (default:
JKS_KEYSTORE_FILE)
--set-env-var-http-proxy= List of environment variables that will be set pointing to the proxy host:port. (default: HTTP_PROXY, HTTPS_PROXY,
http_proxy, https_proxy)
--set-env-var-no-proxy= List of environment variables that will be set blank. (default: NO_PROXY, no_proxy)
--dummy-ok-response= Regex list of URLs that we return a dummy 200 OK reply to. Useful for some Docker clients. (default: ^http.*/v2/$)
[offline command arguments]
COMMAND: Sub-process to run. If not specified an interactive-shell is opened
ARG: Arguments to pass to the sub-process
htvend export copies all cached blobs referred to by assets.json to a directory of your choosing. This is useful when packaging your assets to send to another environment (which may not have internet access).
make blobs runs this for the assets.json file in this repo and creates the blobs directory.
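Conceptually, an export like this is a filtered copy from the shared cache to an output directory. The sketch below assumes, purely for illustration, that the manifest maps each URL to a digest and that each blob is stored in a file named after that digest; the real assets.json layout and blob naming may differ, and export_blobs is a made-up name.

```python
import shutil
from pathlib import Path


def export_blobs(manifest, cache_dir, out_dir):
    """Copy every blob referenced by manifest from cache_dir to out_dir.

    manifest is a hypothetical dict: url -> digest (used as the file name).
    Returns the sorted list of digests that were copied.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    # Several URLs may share one blob, so copy each digest only once.
    for digest in sorted(set(manifest.values())):
        shutil.copy2(Path(cache_dir) / digest, out / digest)
        copied.append(digest)
    return copied
```

Blobs that sit in the cache but are not referenced by the manifest are left behind, so the exported directory contains only what this project needs.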
Usage:
htvend [OPTIONS] export [export-OPTIONS]
Application Options:
-C, --chdir= Directory to change to before running. (default: .)
-v, --verbose Set for verbose output. Equivalent to setting LOG_LEVEL=debug
Help Options:
-h, --help Show this help message
[export command options]
--blobs-dir= Common directory to store downloaded blobs in (default: ${XDG_DATA_HOME}/htvend/cache/blobs)
--cache-manifest= Cache of all downloaded assets (default: ${XDG_DATA_HOME}/htvend/cache/assets.json)
-m, --manifest= File to put manifest data in (default: ./assets.json)
-o, --output-directory= Directory to export blobs to. (default: ./blobs)
htvend verify iterates through all referenced assets and confirms that they exist locally with the correct SHA256.
If --fetch is set, it tries to fetch anything missing.
If --repair is set, then the local manifest is updated if the content has changed since.
make blobs runs this for the assets.json file in this repo; it also runs htvend export.
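The core of such a check is recomputing each blob's SHA256 and comparing it with the recorded value. The following is a minimal sketch under the same illustrative assumptions as before (a url -> hex digest mapping, blobs stored under their digest as file name); it is not htvend's implementation, and verify_manifest is a hypothetical name.

```python
import hashlib
from pathlib import Path


def verify_manifest(manifest, blobs_dir):
    """Return a dict of url -> "missing" or "mismatch" for bad entries.

    manifest is a hypothetical dict: url -> expected SHA256 hex digest.
    An empty result means every referenced blob is present and intact.
    """
    problems = {}
    for url, expected in manifest.items():
        blob = Path(blobs_dir) / expected
        if not blob.is_file():
            problems[url] = "missing"
        elif hashlib.sha256(blob.read_bytes()).hexdigest() != expected:
            problems[url] = "mismatch"
    return problems
```

In this picture, entries reported as missing are the ones a --fetch run would try to download again.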
Usage:
htvend [OPTIONS] verify [verify-OPTIONS]
Application Options:
-C, --chdir= Directory to change to before running. (default: .)
-v, --verbose Set for verbose output. Equivalent to setting LOG_LEVEL=debug
Help Options:
-h, --help Show this help message
[verify command options]
--blobs-dir= Common directory to store downloaded blobs in (default: ${XDG_DATA_HOME}/htvend/cache/blobs)
--cache-manifest= Cache of all downloaded assets (default: ${XDG_DATA_HOME}/htvend/cache/assets.json)
-m, --manifest= File to put manifest data in (default: ./assets.json)
--no-cache-response= Regex list of URLs to never store in cache. Useful for token endpoints. (default: ^http.*/v2/$, /token\?)
--cache-header= List of headers for which we will cache the first value. (default: Content-Type, Content-Encoding, X-Checksum-Sha1)
--fetch If set, fetch missing assets
--repair If set, replace any missing assets with new versions currently found (implies fetch). May still require a rebuild afterwards (e.g.
if they trigger other new calls).
htvend clean removes any dangling blobs (i.e. those not referred to by the global assets.json cache) from the global cache blobs directory.
Pass --all to remove the entire global cache.
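Finding dangling blobs amounts to a set difference between what is on disk and what the cache manifest references. A small sketch, again assuming blobs are identified by digest-like names (an assumption for illustration, as is the name dangling_blobs):

```python
def dangling_blobs(blobs_on_disk, cache_manifest):
    """Return the sorted blob names that no manifest entry refers to.

    blobs_on_disk is an iterable of blob file names.
    cache_manifest is a hypothetical dict: url -> blob name.
    """
    referenced = set(cache_manifest.values())
    return sorted(set(blobs_on_disk) - referenced)


# "cc" is not referenced by any URL, so it would be a removal candidate.
print(dangling_blobs(["aa", "bb", "cc"], {"https://example.com/x": "aa",
                                          "https://example.com/y": "bb"}))
```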
Usage:
htvend [OPTIONS] clean [clean-OPTIONS]
Application Options:
-C, --chdir= Directory to change to before running. (default: .)
-v, --verbose Set for verbose output. Equivalent to setting LOG_LEVEL=debug
Help Options:
-h, --help Show this help message
[clean command options]
--blobs-dir= Common directory to store downloaded blobs in (default: ${XDG_DATA_HOME}/htvend/cache/blobs)
--cache-manifest= Cache of all downloaded assets (default: ${XDG_DATA_HOME}/htvend/cache/assets.json)
--all If set, remove entire shared global cache.
Frequently asked questions
Can this work with building Docker / OCI images?
Yes. Packaging software into OCI Images is a very useful way to distribute software.
Furthermore, using a Dockerfile to populate assets.json is an excellent way to ensure that a build is done from scratch (that is, it pulls through all needed assets), and thus a great way of producing a canonical assets.json file.
Isn't go mod vendor a better solution for Go code?
Yes, it is. We use the assets.json in this repo as an example only; not all languages have vendoring support as good as Go's.
Why is this needed, can't we just ship built images around?
Shipping built images around might work well for your use-case.
This tool recognises that many projects end up being a combination of public upstream images / packages / assets and private application source code.
The intent is to help make it easier to make changes to the private application part without pulling in any other changes from the internet.
Can specialised pull through caches like Artifactory and Nexus serve the same purpose?
Yes, they likely can. However, they can be tricky to set up and may require specialist configuration for each package type (e.g. Maven vs Docker vs apt vs Python), as well as modification of each Dockerfile to use them.
This project tests the hypothesis that we can do this at a simple HTTP layer.
Is enterprise support available?
Yes. Please contact [email protected] for information and pricing for enterprise support by our Australia-based local team.