News

My campaign to produce Shakespeare's Sonnets: A Graphic Novel Adaptation needs your help! Please sign up at https://www.patreon.com/fisherking for access to exclusive content and the opportunity to be a part of the magic!

I'm also producing a podcast discussing the sonnets, available on
Industrial Curiosity, iTunes, Spotify, Stitcher, TuneIn and YouTube!
For those who prefer reading to listening, the first 25 sonnets have been compiled into a book that is available now on Amazon and the Google Play store.

Wednesday, 5 May 2021

Crypto Matters - Just Not For The Reasons You Might Think

Photo by Suzy Hazelwood from Pexels

I watched Bill Maher’s recent diatribe against crypto a day or two ago, and suddenly my feeds seem to be filled with people decrying blockchain phenomena like NFTs as pyramid schemes and nonsense.

They’re not entirely wrong.

Fiat vs Crypto

It’s important, however, to take a good, long look at our existing “fiat” currencies before taking potshots at a technology that is fundamentally the same, but easily better in a myriad of ways.

Let’s begin by defining money, a fiction that enables us to transact across domains. It’s a fiction that’s sufficiently decoupled from reality that we can make fair transactions where that wouldn’t otherwise be possible: it’s not easy to determine the value of an item of clothing in coconuts, or units of electricity. Once upon a time the value of money was tied to scarce natural resources, but for decades it’s been completely artificial, controlled and manipulated by organizations and forces that generally do not have “the greater good” at heart.

Cryptocurrency, on the other hand, by design has no master. Anyone can mine, anyone can play. If I can earn it, I can spend it, and the requirements for setting up a wallet and transacting are so minimal that the most basic of smartphones can handle it with ease. It’s also much harder to steal than cash, and nobody needs a bank to let them participate in the economy, or to rob them of huge portions of their paycheques when sending funds to their families back home.

Fantasy vs Reality

Over the past ten years the idea of cryptocurrency has been creeping into the collective consciousness, and the enthusiasts who “get it” have been working tirelessly to usher in an envisioned utopia in which we all transact in a wide variety of crypto tokens and nobody is “unbanked”: a world in which our governments and credit card companies no longer enjoy the leverage they currently have, and we can live our lives in a virtual-cash-based economy where privacy reigns and nobody can freeze our bank accounts or make up silly fees and charges for using them.

A world where nobody can “cook the books” because everything is written into an open ledger. A world where reliable, secure, anonymous voting mechanisms are built in to the very fabric of the networks we use.

These dreams are all very well, but they clearly have not materialized… yet. For more than a decade Bitcoin has been considered the literal and figurative “gold standard” of crypto, and where its popularity meets with somehow unanticipated greed we see the energy invested in mining Bitcoin exceeding the consumption of small countries. Ethereum arrived later on the scene with its promise of smart contracts, an incredible innovation that opens up fintech and safe remittance, micropayments and the ad-free production and consumption of content… but its transaction volumes are severely limited, which is reflected in the ridiculously high “gas fees” that make it impractical to transfer anything less than a small fortune.

This is not a time to use crypto. This has been a great time to speculate about crypto, as evidenced in the crazy bubbles of the past couple of years, but this is not a time to use crypto.

The Irony

At present, there simply isn’t inherent value in crypto. Money isn’t worth anything if you can’t buy things with it. Most of the engineers who work with crypto are biding their time building wallets and exchanges because that’s what the market will pay them for, but that’s not what makes them excited about crypto. In fact, hoarding and HODLing are holding crypto back from its true purpose — seamless traceless borderless digital payments for everyone — which means that the behaviour of investors is actually preventing crypto from developing the inherent value that speculators have been banking on!

Working vs Staking

For those of you who aren’t familiar: the underlying reason why blockchain mining is so power-hungry, and why transaction volumes are so limited and fees so high, is that the mechanism protecting the blockchain is what’s known as “Proof of Work”. To make it nigh-impossible to cheat the system and manipulate the blockchain, miners are required to perform computationally expensive calculations that are simple to validate, and whoever succeeds first earns the right to write [sorry] the transaction block.
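As a rough illustration of the idea (a toy Python sketch, not any real chain’s implementation), the “work” is a brute-force search for a nonce that produces a hash with enough leading zeros, while checking a claimed solution takes a single hash:

import hashlib

def mine(block_data, difficulty=5):
    # the expensive part: try nonces until the hash starts with `difficulty` zeros
    nonce = 0
    while True:
        digest = hashlib.sha256(f'{block_data}{nonce}'.encode()).hexdigest()
        if digest.startswith('0' * difficulty):
            return nonce, digest
        nonce += 1

def validate(block_data, nonce, difficulty=5):
    # the cheap part: anyone can verify the winner's claim with a single hash
    digest = hashlib.sha256(f'{block_data}{nonce}'.encode()).hexdigest()
    return digest.startswith('0' * difficulty)

Raising the difficulty makes mining exponentially more expensive while validation stays cheap, which is exactly the property that keeps cheaters out - and the electricity bills up.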

Proof of Work is an extremely clever concept that made perfect sense ten years ago but, sadly, its creator(s) never foresaw just how poorly it would scale.

The New Thing in blockchain tech is Proof of Stake, and by “new” I mean almost as old as blockchain technology itself but not implemented where it matters most. Unlike Proof of Work, Proof of Stake requires “staking” your crypto to buy the right to validate the transactions — in Ethereum’s case, stake 32 ETH and you get to play miner, only you get paid for doing your part without having to set the Earth on fire. Or your brain.
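As a deliberately simplified sketch (this is not Ethereum’s actual validator-selection algorithm, and the 32-token minimum is just the figure mentioned above), the hash-grinding disappears and the right to validate is assigned in proportion to what’s been staked:

import random

MINIMUM_STAKE = 32  # e.g. Ethereum 2.0's 32 ETH validator requirement

def choose_validator(stakes):
    # stakes: {validator_address: amount_staked}
    eligible = {validator: stake for validator, stake in stakes.items()
                if stake >= MINIMUM_STAKE}
    validators = list(eligible)
    # selection weighted by stake: more skin in the game, more turns to validate
    return random.choices(validators, weights=[eligible[v] for v in validators])[0]

Validators who misbehave can have their stake “slashed”, so the economic deterrent becomes locked-up capital rather than burnt electricity.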

For a long time (in technological terms) Ethereum has been promising to evolve into Ethereum 2.0, but the first real measures were only put in place towards the end of 2020, and according to today’s news things are finally speeding ahead towards this Brand New Day.

Where to with crypto?

After all this preamble, what’s the real takeaway?

It doesn’t matter whether Bitcoin’s value hits $100,000, $1,000,000, or crashes and burns and hits $1, nor does it matter what a single Ether is valued at. It doesn’t matter if you bought in early and made your fortune, or if you missed the boat completely and even now believe it’s too late for your first foray into crypto (it’s not).

What does matter is that crypto has a function, and that function is desperately needed these days, especially for the billions of people who aren’t being served by the existing financial institutions. Personally, I cannot wait for a time when I can be paid and pay safely and instantly, whether for groceries, rent or coffee, and the idea of being able to transact outside of my government’s reach is hugely empowering. I’m excited that we’re so close to money markets that are fair and inherently non-discriminatory. I’m excited to start diving into new tech that solves the currently-inconceivable problems of living in societies that don’t run on borders and taxes.

Things may get weird (like the current NFT craze) while we learn how to use crypto, but with a brief look back over our shoulders it becomes apparent that no technology ever got introduced without us experiencing some kind of adjustment phase.

At least, I hope people’s obsession with selfies is just a phase.

Monday, 5 April 2021

Choosing the right password manager to keep your secrets safe

Photo by George Becker from Pexels

If you’re not using a password manager by now, you should be. Ever since reading the xkcd: Password Strength comic many years ago, I’ve become increasingly frustrated by how the software industry has continued to enforce bad password practices, and by how few services and applications apply best practices in securing our credentials. 

The main reason for password reuse, or for using poor passwords in the first place, is that it’s way too hard to remember lots of good ones.

Forced to remember more and more passwords under outdated rules (demanding symbols, numbers and a mix of uppercase and lowercase characters), most people have turned to weak passwords, or to reusing the same passwords or patterned recombinations of them, leaving us all vulnerable to simple exploits.

I recently learned about ';--have i been pwned?, and I was shocked to discover that some of the breaches involving my personal data included passwords that I had no idea had been compromised… for years. Then I looked up my wife’s email address, and together we were horrified.

Lots of those compromised credentials were on platforms we didn’t even remember we had accounts on, so asking us what those passwords were and whether we’ve reused them elsewhere is futile.

A developer’s perspective

As an experienced software engineer, I understand just enough about security to be keenly aware of how little most of us know and how important it is to be familiar with security best practices and the latest security news in order to protect my clients.

I will never forget that moment a few years back when, while working for a well-established company with many thousands of users and highly sensitive data, I came across their password hashing solution for the first time: my predecessor had “rolled his own” security by MD5 hashing the password before storing it… a thousand times in a loop. As ignorant as I was myself regarding hashing, a quick search made it clear that this was making the system less secure, not more.
This was a professional who thought he was caring for his customers.
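For contrast, a minimal sketch of one standard approach (in Python for illustration, though that system was built in C#): a per-user random salt and a deliberately slow key-derivation function, rather than looping a fast general-purpose hash like MD5.

import hashlib
import hmac
import os

def hash_password(password):
    # a unique random salt per user defeats precomputed (rainbow table) attacks
    salt = os.urandom(16)
    # PBKDF2 with a high iteration count makes brute-forcing expensive by design
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600_000)
    # constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(candidate, digest)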

In 2019 I put together an open-source javascript package, the simple-free-encryption-tool, for simple but standard javascript encryption that’s compatible with C#, after finding the learning curve for system security to be surprisingly steep for something so critical to the safe operation of the interwebs.

The biggest takeaways from my little ventures into information security are as follows:

  1. Most websites, platforms and services that we trust with our passwords cannot be relied upon to protect our most sensitive information.
  2. Companies should not be relying exclusively on their software developers to protect customer credentials and personal data.
  3. As consumers, customers or clients, we need to take our passwords and secrets into our own hands.
  4. Trust (almost) no-one.

What’s wrong with writing down my passwords on paper?

It’s so hard to remember and share passwords that lots of people have taken to recording them on sticky notes, or in a notebook, and I cannot stress enough just how dangerous a practice this is.

First, any bad actor who has physical access to your desk or belongings and (in their mind) an excuse to snoop on you or hurt you, will generally be privy to more of your personal data than some online hacker who picks up a couple of your details off an underground website. This means that it will be far easier for them to get into your secrets and do you harm.

Second, and far more likely, if those papers are lost or damaged you’re probably going to find yourself in hot water. For example, I’ve run into trouble with my Google credentials before and locked myself out of my account, and even after providing all the correct answers it was still impossible for me to get back in. There are many faceless services like this, so even a simple accident (or just a misplacement) could leave you in a very uncomfortable position.

What is a password manager?

A password manager is an encrypted database that securely stores all of your secrets (credentials or others) and enables you to retrieve them with a single set of credentials and authentication factors. Modern password managers tend to provide the ability to synchronize these databases on multiple devices and even inject your credentials directly where you need them.

Things to consider when picking a password manager

Standalone, cloud-based, or self-hosted

For individuals who aren’t prepared to trust the internet (or even their local networks) with their secrets, there are password managers that are designed to be stored and accessed locally. These are essentially interfaces to encrypted database files that reside on your local hard disk, and you are responsible for backing them up and copying them between devices. A word of caution: if you’re synchronizing these databases by uploading them to a file sharing service like Dropbox, you’re operating in a way that’s likely less secure than using a cloud-based service.

Cloud-based solutions are services provided by an organization that allows you to store your secrets on their platform and trust their experts to secure them. While user costs may vary, they don’t require any effort when it comes to maintenance, syncing between devices and backing up, and they usually provide great interfaces with integrations for desktops, browsers and mobile phones.

An important aspect to take into consideration when it comes to cloud-based solutions is the provider’s reputation and history of breaches. Nobody’s perfect in the world of security — security is a perpetual arms race between the white hats and the black hats — but what speaks volumes is how an organization comports itself when things go wrong. Do they consistently apply best practices and upgrades? Do they react to breaches quickly, transparently, and in their clients’ best interests?

Self-hosted solutions are those where you or your organization install and maintain the service on a web server, preferably on a secure internal network, so that your users (your family or coworkers) can operate as if it’s a cloud-based solution. These are generally cheaper for businesses, but somewhat more difficult to maintain and often less secure than cloud-based solutions (depending on the competence of whoever’s responsible for your network); from a user’s point of view, though, it amounts to the same thing.

Password sharing for family and teams

Some people need to share credentials more than others. In my family, my wife and I are consistently sharing accounts, so it doesn’t make sense for us to keep duplicate copies of our shared credentials in each of our own password databases, and the same goes for me and my coworkers when it comes to our developer and administrator passwords for some of our products and service accounts. For these uses, it’s a good idea to choose a solution that facilitates password sharing, and some of the services make it easy to set up groups and group ownership of credentials.

Mobile, OS and desktop browser support

Many password managers provide varying levels of integration for the wide variety of devices and browsers available — some solutions simply won’t give you any more than the barest essentials. Some people prefer to be able to unlock their passwords using biometrics, some prefer not to use their mobile devices at all, so before looking at the feature comparisons it’s worth giving a minute or two of thought towards how you intend to use it.

The good news is that most of the major solutions allow exporting and importing of your secrets, so if you have any doubts about your decisions you probably won’t have to worry too much about being locked in.

Free vs Paid

While pricing is obviously an important factor, I feel like one should first have an idea of what features one needs before comparing on pricing. Most of the solutions offer similar prices per user, with some exceptions.

This is one of those rare situations where, depending on your requirements, you might actually be better off with a free product!

The Feature Comparison

(Feature comparison across the categories above: standalone/cloud-based/self-hosted, password sharing for family and teams, mobile/OS and desktop browser support, and free vs paid.)

Summary

With the wide variety of needs and options available, each solution listed above has its benefits and its tradeoffs. I hope you’ve found this helpful; if you have any questions, corrections, comments or suggestions, I look forward to reading them in the comments below!

Friday, 2 April 2021

Simple safe (atomic) writes in Python3

 In sensitive circumstances, trusting a traditional file write can be a costly mistake - a simple power cut before the write is completed and synced may at best leave you with some corrupt data, but depending on what that file is used for you could be in for some serious trouble.

While there are plenty of interesting, weird, or over-engineered solutions available to ensure safe writing, I struggled to find one online that was simple, correct, easy to read and able to run without installing additional modules, so my teammates and I came up with the following solution:

Explanation:

temp_file = tempfile.NamedTemporaryFile(delete=False,
                                        dir=os.path.dirname(target_file_path))

The first thing to do is create a temporary file in the same directory as the file we're trying to create or update. We do this because move operations (which we'll need later) aren't guaranteed to be atomic when they're between different file systems. Additionally, it's important to set delete=False, as the standard behaviour of NamedTemporaryFile is to delete itself as soon as it's not in use.

# preserve file metadata if it already exists
if os.path.exists(target_file_path):
    copyWithMetaData(target_file_path, temp_file.name)

We needed to support both file creation and updates, so in the case that we’re overwriting or appending to an existing file, we initialize the temporary file with the target file’s contents and metadata.

with open(temp_file.name, mode) as f:
    f.write(file_contents)
    f.flush()
    os.fsync(f.fileno())

Here we write or append the given file contents to the temporary file, and we flush and sync to disk manually to prepare for the most critical step:

os.replace(temp_file.name, target_file_path)

This is where the magic happens: os.replace is an atomic operation (when the source and target are on the same file system), so we're now guaranteed that if this fails to complete, no harm will be done.

We use the finally clause to remove the temporary file in case something did go wrong along the way, but now the very worst thing that can happen is that we end up with a temporary file ¯\_(ツ)_/¯
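Putting the pieces together, here's a minimal consolidated sketch (the copyWithMetaData helper isn't shown above, so shutil.copy2, which copies both contents and metadata, stands in for it):

import os
import shutil
import tempfile

def safe_write(target_file_path, file_contents, mode='w'):
    # create the temporary file alongside the target so the final move
    # never crosses file systems (which would break atomicity)
    temp_file = tempfile.NamedTemporaryFile(delete=False,
                                            dir=os.path.dirname(target_file_path))
    try:
        # preserve existing contents and metadata when updating a file
        if os.path.exists(target_file_path):
            shutil.copy2(target_file_path, temp_file.name)
        with open(temp_file.name, mode) as f:
            f.write(file_contents)
            f.flush()
            os.fsync(f.fileno())
        # the atomic swap: readers see either the old file or the new one
        os.replace(temp_file.name, target_file_path)
    finally:
        # if anything above failed, don't leave the temporary file behind
        if os.path.exists(temp_file.name):
            os.remove(temp_file.name)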


Saturday, 13 February 2021

Approaching Software Engineering as an evolutionary process

A year or two ago I had an opportunity to sit down with AWS's Marcin Kowalski in a cafeteria and discuss the problems of software development at almost-unimaginable scale. I walked away with a new (for me) conception of software engineering that is part engineering, part organic biology, and I've found this perspective has shifted my approach to software development in a powerful and immensely helpful way.

As Computer Scientists and Software Engineers, we've been trained to employ precision in algorithm design, architecture and implementation: Everything must be Perfect. Everything must be Done Right.

For smaller, isolated projects, this engineering approach is critical, sound and practical, but once we begin to creep into the world of integrated solutions and micro-services it rapidly begins to break down.

Saturday, 23 January 2021

Towards a Python version of my Javascript AWS CDK guide

After successfully using a Typescript CDK project to deploy a python lambda on Thursday, I decided to spend some time this evening creating a Python CDK guide. It's very limited at the moment (just a simple function and a basic lambda layer), but it's a start!

Sunday, 25 October 2020

Keeping Track of your Technical Debt

The impact of technical debt

Over the years "technical debt" has become a phrase that can generate anxiety and a lack of trust, as well as setting up developers and their managers for failure. The metaphor might not be perfect (Ralf Westphal makes a strong case for treating it like an addiction), but I feel it's pretty apt if you think of the story of The Pied Piper of Hamelin - if you don't pay the piper promptly for keeping you alive, he'll come back to steal your future away.

Maybe I'm being a bit dramatic, but I value my time on this planet and over the course of two decades as a software developer in various sectors of the industry I have (along with countless others) paid with so much of my own precious lifetime (sleep hours, in particular) for others' (and occasionally my own) quick fixes, rushed decisions and hacky workarounds that it pains me to even think about it.

It's almost always worthwhile investing in doing things right the first time, but we rarely have the resources to invest and we are often unable to accurately predict what's right in the first place.

So let's at least find a way to mitigate the harm.

Why TODOs don't help

The traditional method of initiating technical debt is the TODO. You write a well-meaning (and hopefully descriptive) comment starting with TODO, and "Hey, presto!" you have a nice, easy way to find all those little things you meant to fix. Right? Except that's usually not the case. We are rarely able to make time to go looking for more things to do, and even when we can, with modern software practices it's unlikely that searching across all of our different repositories will be effective.

What generally ends up happening, then, is that we only come across TODOs by coincidence, when we happen to be working with the surrounding code, and chances are it'll have been written by a different developer in a different time period and explained in... suboptimal language.

"Why hasn't this been done?", you may well ask.

"Leave that alone! We don't remember why it works," you may well be told.

Track your technical debt with this one simple trick!

No matter the size of your team (even a team of one), you should always be working with some kind of issue tracking software. There's decent, free software out there, so there's really no excuse.

The fix? All you need to do is log a ticket for each TODO. Every time you find yourself writing a TODO, and I mean every single time, create a ticket. In both the ticket and the TODO comment, explain what you're doing, what needs to be done, why it needs to be deferred, and how urgent completing the TODO task is.

Then reference the ticket in the TODO comment.
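For example (the ticket ID and details here are invented):

# TODO(PROJ-1234): this retry loop papers over intermittent timeouts from the
# billing service; replace it with the queue-based approach described in
# PROJ-1234 once the new endpoint ships. Deferred because it blocks the release.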

This way you'll have a ticket - which I like to think of as an IOU - that can be added to the backlog and remembered during grooming and planning. It also gives the developer who encounters the TODO in the code a way to review any details and subsequent conversations in the ticket.

One interesting side effect of this approach? I often find that it's more effort to create the TODO ticket than to do the right thing in the first place, which can be a great incentive for the juniors to avoid the practice.

Another?


After establishing with my team that I will not approve a TODO unless there's a ticket attached, I must admit... I do sleep just a little bit better at night.

Saturday, 24 October 2020

Reading and writing regular expressions for sane people

Your regular expressions need love. Reviewers and future maintainers of your regular expressions need even more.

No matter how well you've mastered regex, regex is regex and is not designed with human-readability in mind. No matter how clear and obvious you think your regex is, in most cases it will be maintained by a developer who a) is not you and b) lacks context. Many years ago I developed a simple method for sanity checking regex with comments, and I'm constantly finding myself demonstrating its utility to new people.

There are some great guides out there, like this one, but what I'm proposing takes things a step or two further. It may take a minute or two of your time, but it almost invariably saves a lot more than it costs. I'm not even discussing flagrant abuse or performance considerations.


Traditional regex: the do-it-yourself pattern


The condescending regex. Here you're left to your own devices. Thoughts and prayers.

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

(example taken from this question)


Kind regex: intention explained


It's the least you can do! A short line explaining what you're matching with an example or two (or three).

// parse a url, only capture the host part
// eg. protocol://host
//     protocol://host:port/path?querystring#anchor
//     host
//     host/path
//     host:port/path
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;


Careful regex: a human-readable breakdown


Here we ensure that each element of the regex pattern, no matter how simple, is explained in a way that makes it easy to verify that it's doing what we think it's doing and to modify it safely. If you're not an expert with regex, I recommend using one of the many available tools such as regexr.com.

// parse a url, only capture the host part
//     (?:([A-Za-z]+):)?
//         protocol - an optional alphabetic protocol followed by a colon
//     (\/{0,3})(0-9.\-A-Za-z]+)
//         host - 0-3 forward slashes followed by alphanumeric characters
//     (?::(\d+))?
//         port - an optional colon and a sequence of digits
//     (?:\/([^?#]*))?
//         path - an optional forward slash followed by any number of
//         characters not including ? or #
//     (?:\?([^#]*))?
//         query string - an optional ? followed by any number of
//         characters not including #
//     (?:#(.*))?
//         anchor - an optional # followed by any number of characters
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

Now that we've taken the time to break this down, we can identify the intention behind the patterns and ask better questions: why is the host the only matched group? Was this tested? (Because (0-9.\-A-Za-z] is clearly an error, and there are almost no restrictions on invalid characters)

Unless you're a sadist (or a masochist), this is definitely a better way to operate: be careful, and if you can't be careful then at least be kind.

Wednesday, 19 August 2020

Priority pinning in apt-preferences with different versions and architectures

I'm posting this because I've lost too many hours figuring it out myself; the documentation is missing several important notes and I haven't found any forum posts that really relate to this:

Question: How do I prioritize specific package versions for multiple architectures? In particular, I have a number of different packages which I would like to download for multiple architectures, and I would like to prioritize the versions so that if they're not explicitly provided, apt will try to get a version matching my arbitrary requirements (in my case, the current git branch name) and fall back to our develop branch versions.

eg. I would like to download package my-package for both i386 and amd64 architectures and I would like to pull the latest version that includes my-git-branch-name before falling back to the latest that includes develop.

Answer:

The official documentation is here.

1. In order to support multiple architectures, all packages being pinned must have their architecture specified, and there must be an entry for each architecture. A pinning for the package name without the architecture specified will only influence the default (platform) architecture:

Package: my-package
Pin: version /your regex here/
Pin-Priority: 1001

2. The entries are whitespace-sensitive, although no errors will be reported if you add leading whitespace. The following pinning will be silently disregarded:

Package: my-package:amd64
    Pin: version /your regex here/
    Pin-Priority: 1001

3. apt update must be called in order for the preferences to be respected, both after updating the preferences file and after adding additional architectures using (for example) dpkg --add-architecture i386.

The following excerpt from /etc/apt/preferences solves the stated problem:


Package: my-package:amd64
Pin: version /-my-git-branch-name-/
Pin-Priority: 1001

Package: my-package:i386
Pin: version /-my-git-branch-name-/
Pin-Priority: 1001

Package: my-package:amd64
Pin: version /-develop-/
Pin-Priority: 900

Package: my-package:i386
Pin: version /-develop-/
Pin-Priority: 900

It may be worth noting that to download or install a package with a specified architecture and version, you can use the command apt download package-name:arch=version.
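For example, with an invented version string:

apt download my-package:i386=1.2.3-my-git-branch-name-7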

Sunday, 16 August 2020

Enabling Programmable Bank Cards To Do All The Things

A little while back, OfferZen and Investec opened up their Programmable Bank Card Beta for South Africans, taking over from Root, and I was excited to be able to sign up!

My long-term goal with programmable banking is to integrate crypto transactions directly with my regular banking - which might be possible quite soon as Investec is starting to roll out its open API offering in addition to the card programme - and as far as I can tell that could be a real game-changer in terms of making crypto trading practical for non-investment purposes.
When I joined, though, what I found was a disparate group of really interesting projects and only a two-second time window in which to influence whether your card transaction would be approved or not.

I decided to bide my time by building a bridge for everyone else's solutions, one which would provide a central location to determine transaction approval and enable card holders (including myself, of course) to forward their transactions securely to as many services as they like without costing them running time on their card logic.

It was obvious to me that this needed to be a cloud solution, and as someone with zero serverless experience¹ I spent some time evaluating AWS, Google and Microsoft's offerings. My priorities were simple:
  • right tool for the job
  • low cost
  • high performance
  • good user experience
In the end AWS was the clear winner with the first two priorities, so much so that I didn't even bother worrying about any potential performance tradeoffs (there probably aren't any) and I was prepared to put up with their generally mixed bag of user experience. I also had the added incentive that my current employer uses AWS in their stack so I would be benefiting my employer at the same time.

Overall, I'm very glad that I chose AWS in spite of the trials and tribulations, which you can follow in my earlier post about building a template for CDK for beginners (like myself). In addition to learning how to use CDK, this project has given me solid experience in adapting to cloud architecture and patterns that are substantially different from anything I've worked with before, and I've already been able to effectively apply my learnings to my contract work.

One of the obligations of the Programmable Bank Card beta programme is sharing your work and presenting it; I was happy to do so, but for months I've been locked down and working in the same space as my wife and four-year-old (we don't have room for an office or much privacy in our apartment); with all the distractions of home schooling and having to work while my wife and kid play my favourite video games, I've barely been able to keep up my hours with my paid gig, so making time for extra stuff? That hasn't been so easy.

A couple of weeks ago my presentation was due, and I spent a lot of the preceding two weekends in a mad scramble to get my project as production-ready as I could - secure, reliable, usable. Half an hour before "go time" I finally deployed my offering, and I was a little bit nervous doing a live demo... I mean, sure, I'd tested all the endpoints after each deployment - but as we all know: Things Happen. Usually During Live Demos.

Fortunately, the presentation went very well! I spent the following weekend adding any missing "critical" functionality (such as scheduling lambda warmups and implementing external card usage updates), and I'm hoping that some of my fellow community members get some good use out of it (whether on their own stacks or mine).

The code can be found here on GitLab, and a few days ago OfferZen published the recording of my presentation on their blog along with its transcript² and my slide deck.

Thank you for joining me on my journey!



¹ Although I was employed by one of the major cloud providers for a while, I've only worked "on" their cloud rather than "in", and while I do have extensive experience with Azure none of it included serverless functions.

² The transcript has a few minor transcription errors, but they're more amusing than confusing.

Tuesday, 28 July 2020

Resolving private PyPI configuration issues

The Story

I've spent the last few days wrestling with the serverless framework, growing fonder and fonder of AWS' CDK at every turn. I appreciate that serverless has been, until recently, very necessary and very useful, but I feel confident in suggesting that when deployment mechanisms become more complex than the code they're deploying, things aren't moving in the right direction.

That said, this post is about overcoming what for us was the final hurdle: getting Docker to connect to a private PyPI repository for our Python lambdas and lambda layers.

Python is great, kind of, but it is marred by two rather severe complications: the split between python and python3, and pip… so it's a great language with an awful tooling experience. Anyway, our Docker container couldn't communicate with our PyPI repo, and it took way too long to figure out why. Here's what we learned:

The Solution

If you want to use a private PyPI repository without typing in your credentials at every turn, there are two options:

1.


~/.pip/pip.conf:
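Something along these lines, where the repository host and credentials are placeholders (note the URL-encoded @ in the password, per the warning further down):

[global]
index-url = https://deploy-user:p%40ssw0rd@pypi.example.com/simple/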



2.


~/.pip/pip.conf:
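Roughly as follows, this time leaving the credentials out of the URL entirely (again, the host is a placeholder):

[global]
index-url = https://pypi.example.com/simple/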



~/.netrc:
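And the matching credentials for the same placeholder host, with the password in its raw, non-URL-encoded form:

machine pypi.example.com
login deploy-user
password p@ssw0rd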



It is important that ~/.netrc has the same owner:group as pip.conf, and that its permissions are 0600 (chmod 0600 ~/.netrc).

What's not obvious - or even discussed anywhere - is that special characters are problematic and are handled differently by the two mechanisms.

In pip.conf, the password MUST be URL encoded.
In .netrc, the password MUST NOT be URL encoded.

The Docker Exception

For whatever reason, solution 2 (the combination of pip.conf and .netrc) does NOT work with Docker.

Conclusion

Amazon's CDK is excellent, and unless you have a very specific use-case that it doesn't support, it really is worth trying out!

Oh! And that Python is Very Nice, but simply isn't nice enough to justify the cost of its tooling.

Friday, 3 July 2020

the rights and wrongs of git push with tags

My employer uses tags for versioning builds.

git push && git push --tags

I'm guessing this is standard practice, but we've been running into an issue with our build server - Atlassian's Bamboo - picking up commits without their tags. The reason was obvious: we've been using git push && git push --tags, pushing tags after the commits, and Bamboo doesn't trigger builds on tags, only on commits. This means that every build would be versioned with the previous version's tag!

The solution should have been obvious, too: push tags with the commits. Or before the commits? This week, we played around with an experimental repo and learned the following:

git push --tags && git push

To the uninitiated (like us), the result of running git push --tags can be quite surprising! We created a commit locally, tagged it*, and ran git push --tags. This resulted in the commit being pushed to the server (Bitbucket, in our case) along with its tag, but the commit was rendered invisible: not even git ls-remote --tags origin would return it, and it was not listed under the commits on its branch, although it showed up, along with its commit, quite clearly in Bitbucket's search-commits-by-tag feature.



* All tags are annotated, automatically, courtesy of SourceTree. If you're not using SourceTree then you should be, and I promise you I'm not being paid to say that. If you must insist on the barbaric practice of using the command line, just add -a (and -m to specify the message inline if you don't want the editor to pop up; check out the documentation for more details).

Pushing a tag first isn't the end of the world - simply pushing the commit with git push afterwards puts everything in order - unless someone else pushes a different commit first. At that point the original commit will need to be merged, with merge conflicts to be expected. Alternatively - and relevant for us - any scripts that perform commits automatically might fail between pushing the tags and their commits, leading to "lost" commits.

git push --tags origin refs/heads/develop:refs/heads/develop

This ugly-to-type command does what we want it to do: it pushes the commit and its tags together. From the documentation:

When the command line does not specify what to push with <refspec>... arguments or --all, --mirror, --tags options, the command finds the default <refspec> by consulting remote.*.push configuration, and if it is not found, honors push.default configuration to decide what to push (See git-config[1] for the meaning of push.default).

Okay, so that works. But there's no way I'm typing that each time, or asking any of my coworkers to. I want English. I'm funny that way.

git push --follow-tags

Here we go: the real solution to our problems. As long as the tags are annotated - which they should be anyway, see the earlier footnote on tagging - running git push --follow-tags will push any outstanding commits from the current branch along with their annotated tags. Any tags not annotated, or not attached to a commit being pushed, will be left behind.
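In practice, with a made-up version tag, the whole flow is simply:

git tag -a v1.2.3 -m "build 1.2.3"
git push --follow-tags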

Resolved!