
Shell/Python vs Go - practical thoughts from a reimplementation project

This is perhaps not your typical "Golang vs (language)" post... I've seen it said in various places (citations lost, please roll with it) that Go is a good replacement for shell scripting. At first, I was highly skeptical - Go is a compiled, strictly typed language with more boilerplate to "Hello world" than any scripting language needs. But as a compiled language, it does make sense for portability, and as a properly, strictly typed language, it makes complex operations easier to implement... so it was worth trying out.

TL;DR:

  • Go is a credible alternative to Python and shell, for me
  • I did Alpacka/NG as an experiment to convert from shell to Go
  • Publishing Go modules for sharing code is super easy
  • Establishing supply chain trust is an issue
  • Shell and Python still definitely have their places

Alpacka, as a bash script

Several years ago, when I was still a bright-eyed and bushy-tailed distro-hopper, I kept running into the same problem: "which package manager command and keyword combination do I need on this distro again??" I wrote a simple script I called Alpacka, whose command I rendered as paf (the naming choice is lost to history at this point), to help me. It supported a variety of package managers, and is still available on GitLab as Alpacka.

It had, however, three notable drawbacks:

  • It depended on local commands (bash among them), which varied in flavour and version across distros and distro releases
  • Bash's syntax is not oriented to complex data structure manipulation
  • Predictable modularisation of source code is hard without workarounds

The version issue was multi-fold: not only was it a sequential version problem (1.0, 1.1, 2.0, etc.) but also an implementation problem (BSD variant, GNU variant, Busybox variant), for which outputs and flags would often vary in key places. Depending on specific behaviours and outputs could be tricky, and limited the available operations to the lowest common denominator. At the time, for example, I think Busybox did not fully implement Perl Compatible Regular Expressions (PCRE) in its grep implementation (or was it sed?), so implementing certain actions was difficult, if not downright impossible.

Shell scripting limitations

The lack of rich types like maps, and the inability to return arrays (you have to pass them around in global variables - yuck, pollutes the variable space), hampered things like function recursion (possible, with hoops, caveats, boilerplate and metaprogramming) and key-value lookups (perhaps possible, but too tedious to be worth implementing). For example:

greetings() {
    for name in "$@"; do
        echo "Hail $name!" # no return - just echo to stdout
    done
}

# use `()` notation to force an array, split on whitespace
phrases=($(greetings Alex Jay Sam))

# Access an array using `"${varname[@]}"`
# quotes and extra chars mandatory - else various alternative effects may happen
for phrase in "${phrases[@]}"; do
  echo "We say, $phrase"
done

# Problem - prints:
# We say, Hail
# We say, Alex!
# We say, Hail
# We say, Jay!
# We say, Hail
# We say, Sam!

To avoid the effect of the automatic splitting on whitespace, we must use a global array variable:

greetings() {
  # also, make `name` non-global (all vars are global by default)
  local name

  # Set a global variable for the caller to access afterwards
  PHRASES=(:) # placeholder member, for old bash compatibility
  # (expanding an empty array can trip up older bash, e.g. under `set -u`,
  # so this array is initialised with a single member, the string ":")

  for name in "$@"; do
    PHRASES=("${PHRASES[@]}" "Hail, $name!")
  done
}

greetings Alex Jay Sam

# Use array items from 1, upwards: `${varname[@]:1}`
# to blat the initial placeholder
for phrase in "${PHRASES[@]:1}"; do
    echo "We say, $phrase"
done

# prints
# We say, Hail, Alex!
# We say, Hail, Jay!
# We say, Hail, Sam!

Does this make shell scripting "inferior"? No! The whole point of POSIX and POSIX-like shells is to be command-driven, and scripting enables re-running commands with some extra flair. I've talked about this before. Look at this:

# a contrived example, point is about the processing of program outputs, command-by-command
docker images | grep '<none>' | sed -r 's/  +/ /g' | cut -d' ' -f3 | xargs docker rmi

which

  • runs docker images
  • looks for <none>-labelled images
  • squeezes repeated spaces and extracts the third field (the image ID), line by line
  • and converts the above into arguments for docker rmi

When you're familiar with shell syntax, the above is easy to write - I did it in under 20 seconds. Doing the same in a more generalist programming language would take at least several lines and, if you wanted to keep it maintainable, a couple of functions too.
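
For a sense of scale, here is a rough Go equivalent of that one-liner - just a sketch to illustrate the ceremony involved, not how Alpacka does anything:

package main

import (
    "os/exec"
    "strings"
)

func main() {
    // docker images | grep '<none>' | ... | xargs docker rmi, the long way round
    out, err := exec.Command("docker", "images").Output()
    if err != nil {
        panic(err)
    }

    var ids []string
    for _, line := range strings.Split(string(out), "\n") {
        if !strings.Contains(line, "<none>") {
            continue
        }
        fields := strings.Fields(line) // splits on runs of whitespace
        if len(fields) >= 3 {
            ids = append(ids, fields[2]) // third column is the image ID
        }
    }

    if len(ids) > 0 {
        if err := exec.Command("docker", append([]string{"rmi"}, ids...)...).Run(); err != nil {
            panic(err)
        }
    }
}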

But for rich data processing, shell scripting is not typically the best choice.

As for other gripes ...

For the modularisation issue, in the script-based version of Alpacka I leveraged another tool of mine, the "bash builder" suite, which transpiles a custom syntax, resolving files from a BBPATH library path into a single deployable bash script. Again, I had to code this solution as a workaround to the limitations of shell scripts; again, shell scripting's idiosyncrasies gear it towards interactive shell usage, not large-codebase usage. It was fun to implement, but limitations remain.

Shell scripting argument parsing is pretty convoluted - you can use the getopts builtin, which behaves in mind-contorting ways, or reference positionals directly. A function's arguments are accessed the same way the script's arguments are, via $1, $2, etc., which prevents a function from directly accessing the script's args. It's all possible - but there's a lot of faff to get it to work.

Thus, the Alpacka/NG Go rewrite was born.

Why not python?

I've written a fair bit of tooling in Python. It would normally have been my go-to for this kind of project (after my "bash as main language" phase...), but not this time.

One of the issues with shell scripting remains with Python: versioning, and command availability. At least it's not usually a question of variant, but between a newer script on an older Python (feature not available) and an older script on a newer Python (feature deprecated and removed), there's plenty of external dependency-ing that can go wrong.

At work we used pyenv, which went some way towards working around the issue, but in some instances pyenv would not even install (distro too old!).

Compiling to a binary solves this by simply not having runtime dependencies (or so I thought - more on that later...!). But it holds by and large.

This is more of an issue when trying to ship code to machines not under your control - in an enterprise environment, mandating specific distro versions, Python versions and so on is easier. Developing for the wilderness is another matter, and that is the use case I have in mind for Alpacka - so it tracks. And if I need to install things first to support Alpacka correctly, the purpose is defeated!

Enter Go

Over the last few months I have been toying around with Go, and only in the last month have I been actively trying to use it in earnest. Rewriting Alpacka was a good fit for me because

  • it compiles to a single deployable file - I want to be able to just download it and start using it, with no environment setup
  • I want a set of package lists (a package spec file) that apply under given conditions, and I want Alpacka to take care of the rest

Go's single binary answers the first; the second was previously out of reach because the package spec needs complex types that Bash lacks.

Package spec

The spec I wanted to implement looks something like this:

alpacka:
  variants:
  - release: VERSION_ID<=22, ID=~fedora
    groups: common, fedora
  - release: ID_LIKE=~debian
    groups: common, debian
  - release: ID_LIKE=~suse
    groups: common, debian

  package-groups:
    common:
    - mariadb
    debian:
    - apache2
    fedora:
    - httpd

Note the catering for different package names on different distros, and common names shared across all of them. Handy when hopping distros and wanting everything covered, right?

How it went

Sub processes

First off, running a sub-command in Go is pretty easy - a little more involved than in shell, no better or worse than in Python. Because I do it a lot in this tool (calls to package managers), I wrapped it in its own struct with some niceties, in runner.go:

// .OrFail() comes from the Result object, which allows a shorthand exit-now to cut down on verbosity
// -- because these calls would happen A LOT.
RunCmd("apt-get", "install", "-y", "htop").OrFail("Could not run installation!")
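
The real runner.go has more to it, but a minimal sketch of the idea - with RunCmd and a hypothetical Result type, not the actual Alpacka-ng code - could look like this:

package main

import (
    "fmt"
    "os"
    "os/exec"
)

// Result is a hypothetical wrapper around a finished command
type Result struct {
    Err error
}

// RunCmd runs a command, streaming its output to the terminal
func RunCmd(name string, args ...string) Result {
    cmd := exec.Command(name, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    return Result{Err: cmd.Run()}
}

// OrFail exits immediately with a message if the command failed
func (r Result) OrFail(message string) {
    if r.Err != nil {
        fmt.Fprintln(os.Stderr, message, "-", r.Err)
        os.Exit(1)
    }
}

func main() {
    RunCmd("apt-get", "install", "-y", "htop").OrFail("Could not run installation!")
}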

I didn't end up needing to pipe anything - rather than grep, cut and sed, we have actual programming functions that operate on data types. It is possible to invoke a subshell and write a command pipe, but I had no need for it.

Argument parsing

Next up was parsing arguments. I've written before about some of my basic frustrations with out-of-the-box Go, one of which is around argument parsing. I've since solved this issue for myself in the form of my GoArgs package which offers short-flags and long-flags, positional argument unpacking, and additional handy options such as mode-switched flags.
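
For reference, the out-of-the-box experience is the standard library's flag package, which gives you something like the following (the flag names here are made up for illustration, not Alpacka's or GoArgs' actual interface):

package main

import (
    "flag"
    "fmt"
)

func main() {
    // One name per flag, no built-in short/long pairing
    assumeYes := flag.Bool("y", false, "assume yes to prompts")
    manager := flag.String("manager", "", "force a specific package manager")
    flag.Parse()

    // Remaining positional arguments come back as a plain slice
    packages := flag.Args()

    fmt.Println(*assumeYes, *manager, packages)
}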

I know it's a solved problem, but I wrote my own argument parser because

  1. supply chain security considerations - if I can write it myself, why open myself up to supply chain attacks?
  2. go language self-learning - re-writing Alpacka was not my first Go-venture ;-)
  3. go version releasing - I wanted to learn how to release a go module for re-use (and I did!)
  4. customisation - I had a specific idea as to what features I wanted, and what API I wanted, and one is best served by oneself. I did have to add customisations as I went about writing Alpacka-ng

Publishing a Go module via hosted git repo

Publishing a Go module is as simple as using version control (typically Git; a few other version control systems are supported) and tagging a commit with v<x>.<y>.<z> - it explicitly needs the v at the front, and three numbers. Go does the rest, and retains a hash (in the consumer's go.sum) for repeat-deployment verification. Note that the library project needs to declare itself with its intended publishing URL: set it up with go mod init github.com/user/project
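
That go mod init call writes a minimal go.mod, roughly like this (the go directive will reflect whichever toolchain version you have):

module github.com/user/project

go 1.21 // or whichever toolchain version you target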

A project depending on that module then just needs to add it as it usually would - e.g.

go get github.com/user/project@v<x>.<y>.<z>

... and then start using it

import (
    "github.com/user/project"
)

func main() {
    project.CallFunction()
}

For comparison, Python has pip/conda/uv for adding external modules. Publishing them uses a setup.py (or, these days, a pyproject.toml), which is unwieldy, but so long as you have a template for re-use it's easy enough. You can similarly add git+https://github.com/user/project@tag to a requirements.txt file to depend directly on a git repo.

Yaml and Supply Chain Trust

Finally, I could implement the package list spec, which I designed around YAML. The first issue was to find a YAML package, because Go does not ship one in the standard library. I've found that many languages do this - support JSON, but not YAML. Why? I know JSON is a very useful representation for serialising data, but YAML is a much more comfortable format for similarly serialised data that also needs to be human-editable.

Plenty of tutorials exist online, all pointing to gopkg.in/yaml.v3 without a hint of explanation as to

  • what is gopkg.in? why should I trust it?
  • who wrote yaml/v3? why should I trust them?

After some digging, I found that the official Go project itself recognises gopkg.in, going so far as to make certain accommodations for its version notation. This is (at least in part) because the site was the main solution for consistent package naming and version notation before the Go project implemented modules. It is a redirector that points to GitHub repos via a distinct short URL. Packages at gopkg.in/<PKG>.<VER> can be seen as "official"/"blessed", whereas gopkg.in/<USER>/<PKG>.<VER> are "unofficial"/"we don't verify these guys" sources. gopkg.in/yaml.v3 is an "official" package then, although the GitHub project it points to has been archived (read-only) since April 2025 (this year). Who knows what shall happen next...

The official contents are published by Canonical, the company behind Ubuntu. Based on this sleuthing, I was finally able to satisfy myself that the package was not completely random, and my "trust" in the "supply chain" is... acceptable. I now have a target, and a hash, but I'm pretty sure that can be circumvented by a later update on the v3 branch if the redirect gets modified... ah, life on the open Internet...

Supply chain attacks are on the rise: the npm left-pad incident highlighted not only the yanking of packages, but also the ability of anyone to subsequently take over the name of a de-published package. Similarly, typo-squatting can lead to compromises lurking in the deep, or just outright hosing a system.

Beyond that, unmarshalling YAML/JSON in Go is a tad more cumbersome than parsing it in Python and descending dictionaries, but it does have the advantage of ensuring types all the way down.
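
As an illustration - my guess at the shape, not Alpacka-ng's actual types - a spec like the one above could be unmarshalled into structs along these lines:

package main

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

// Guessed types mirroring the spec shown earlier
type Variant struct {
    Release string `yaml:"release"` // condition string, e.g. "ID_LIKE=~debian"
    Groups  string `yaml:"groups"`  // comma-separated group names
}

type Alpacka struct {
    Variants      []Variant           `yaml:"variants"`
    PackageGroups map[string][]string `yaml:"package-groups"`
}

type Spec struct {
    Alpacka Alpacka `yaml:"alpacka"`
}

func main() {
    data, err := os.ReadFile("packages.yaml") // hypothetical file name
    if err != nil {
        panic(err)
    }

    var spec Spec
    // Types are enforced as the unmarshaller descends the document
    if err := yaml.Unmarshal(data, &spec); err != nil {
        panic(err)
    }

    fmt.Println(spec.Alpacka.PackageGroups["common"])
}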

And then... I was done.

Checking it on Alpine

One of the compile targets was Alpine, though I ended up dropping it, since a/ Alpine's package manager is easy to begin with, and b/ Alpine is usually only used in Dockerfile specs/IaC in the first place.

I discovered in this exercise, however, that binaries compiled on a typical GNU/Linux system will not run on Alpine, due to linkage against different system runtime libraries (glibc on most distros versus musl on Alpine). Huh. A runtime dependency in a "statically" compiled binary. There are nuances to everything after all.

The result is that an Alpine build of such a Go program must be compiled in an Alpine environment (or with cgo disabled, to get a fully static binary), otherwise you get a cryptic bin/paf: not found error, despite the file very much being there...! Using containers for the build helped with that.

Conclusion

Looking back at my work, I implemented this over the course of eight days, at a couple of hours per day (with one full day in the mix too). As one would expect, I became more at ease and fluent with the language as time went on. Could I have done this faster in Python? Probably, but I would still have a runtime dependency. Could I have done it in bash scripting? With the added manifest, not a chance.

It feels to me that Go is a credible alternative to both shell scripting and Python, given the requirements. However, I would still say that, with fluency, one goes faster with shell scripts for simple items (the build and install scripts are still shell), and faster in Python for "throw it together" operations.

Only You, however, can determine what's throwaway. At the risk of coining an abomination:

💩 Since bad code obviously needs rewritten, write bad python to get going; and if it must indeed grow: scrap it, and write good Go in its place. /jk

😱

Happy coding!
