News

My campaign to produce Shakespeare's Sonnets: A Graphic Novel Adaptation needs your help! Please sign up at https://www.patreon.com/fisherking for access to exclusive content and the opportunity to be a part of the magic!

I'm also producing a podcast discussing the sonnets, available on Industrial Curiosity, iTunes, Spotify, Stitcher, TuneIn and YouTube!
For those who prefer reading to listening, the first 25 sonnets have been compiled into a book that is available now on Amazon and the Google Play store.

Monday, 5 April 2021

Choosing the right password manager to keep your secrets safe

Photo by George Becker from Pexels

If you’re not using a password manager by now, you should be. Ever since reading the xkcd: Password Strength comic many years ago, I’ve become increasingly frustrated by how the software industry has continued to enforce bad password practices, and by how few services and applications apply best practices in securing our credentials. 

The main reason for reusing passwords, or for using poor passwords in the first place, is that it's way too hard to remember lots of good ones.

By forcing us to remember more and more passwords under outdated rules (demanding symbols, numbers and a mix of uppercase and lowercase characters), the industry has pushed most people into using weak passwords, or into reusing the same passwords or patterned recombinations of them, leaving us vulnerable to simple exploits.

I recently learned about ';--have i been pwned?, and I was shocked to discover that some of the breaches involving my personal data included passwords that I had no idea had been compromised… for years. Then I looked up my wife's email address, and together we were horrified.

Lots of those compromised credentials were on platforms we didn’t even remember we had accounts on, so asking us what those passwords were and whether we’ve reused them elsewhere is futile.

A developer’s perspective

As an experienced software engineer, I understand just enough about security to be keenly aware of how little most of us know and how important it is to be familiar with security best practices and the latest security news in order to protect my clients.

I will never forget that moment a few years back when, while working for a well-established company with many thousands of users and highly sensitive data, I came across their password hashing solution for the first time: my predecessor had “rolled his own” security by MD5 hashing the password before storing it… a thousand times in a loop. As ignorant as I was myself regarding hashing, a quick search made it clear that this was making the system less secure, not more.
This was a professional who thought he was caring for his customers.

In 2019 I put together an open-source JavaScript package, the simple-free-encryption-tool, for simple but standard JavaScript encryption that's compatible with C#, after finding the learning curve for system security surprisingly steep for something so critical to the safe operation of the interwebs.

The biggest takeaways from my little ventures into information security are as follows:

  1. Most websites, platforms and services that we trust with our passwords cannot be relied upon to protect our most sensitive information.
  2. Companies should not be relying exclusively on their software developers to protect customer credentials and personal data.
  3. As consumers, customers or clients, we need to take our passwords and secrets into our own hands.
  4. Trust (almost) no-one.

What’s wrong with writing down my passwords on paper?

It’s so hard to remember and share passwords that lots of people have taken to recording them on sticky notes, or in a notebook, and I cannot stress enough just how dangerous a practice this is.

First, any bad actor who has physical access to your desk or belongings and (in their mind) an excuse to snoop on you or hurt you, will generally be privy to more of your personal data than some online hacker who picks up a couple of your details off an underground website. This means that it will be far easier for them to get into your secrets and do you harm.

Second, and far more likely, if those papers are lost or damaged you're probably going to find yourself in hot water. For example, I've run into trouble with my Google credentials before and locked myself out of my account, and even after providing all the correct answers it was still impossible for me to get back in. There are many faceless services like this, so even a simple accident (or just a misplacement) could leave you in a very uncomfortable position.

What is a password manager?

A password manager is an encrypted database that securely stores all of your secrets (credentials or others) and enables you to retrieve them with a single set of credentials and authentication factors. Modern password managers tend to provide the ability to synchronize these databases on multiple devices and even inject your credentials directly where you need them.

Things to consider when picking a password manager

Standalone, cloud-based, or self-hosted

For individuals who aren’t prepared to trust the internet (or even their local networks) with their secrets, there are password managers that are designed to be stored and accessed locally. These are essentially interfaces to encrypted database files that reside on your local hard disk, and you are responsible for backing them up and copying them between devices. A word of caution: if you’re synchronizing these databases by uploading them to a file sharing service like Dropbox, you’re operating in a way that’s likely less secure than using a cloud-based service.

Cloud-based solutions are services provided by an organization that allows you to store your secrets on their platforms and trust their experts to secure them. While user costs may vary, they don't require any effort when it comes to maintenance, syncing between devices and backing up, and they usually provide great interfaces with integrations for desktops, browsers and mobile phones.

An important aspect to take into consideration when it comes to cloud-based solutions is the provider’s reputation and history of breaches. Nobody’s perfect in the world of security — security is a perpetual arms race between the white hats and the black hats — but what speaks volumes is how an organization comports itself when things go wrong. Do they consistently apply best practices and upgrades? Do they react to breaches quickly, transparently, and in their clients’ best interests?

Self-hosted solutions are where you or your organization install and maintain the service on a web server, preferably on a secure internal network, so that your users (your family or coworkers) can operate as if it were a cloud-based solution. These are generally cheaper for businesses, but somewhat more difficult to maintain and often less secure than cloud-based solutions (depending on the competence of whoever's responsible for your network); from a user's point of view, though, it amounts to the same thing.

Password sharing for family and teams

Some people need to share credentials more than others. In my family, my wife and I are constantly sharing accounts, so it doesn't make sense for each of us to keep duplicate copies of our shared credentials in our own password databases, and the same goes for me and my coworkers when it comes to our developer and administrator passwords for some of our products and service accounts. For these uses, it's a good idea to pick a solution that facilitates password sharing, and some of the services make it easy to set up groups and group ownership of credentials.

Mobile, OS and desktop browser support

Many password managers provide varying levels of integration for the wide variety of devices and browsers available, while some solutions simply won't give you any more than the barest essentials. Some people prefer to be able to unlock their passwords using biometrics, while others prefer not to use their mobile devices at all, so before looking at the feature comparisons it's worth giving a minute or two of thought to how you intend to use your password manager.

The good news is that most of the major solutions allow exporting and importing of your secrets, so if you have any doubts about your decisions you probably won’t have to worry too much about being locked in.

Free vs Paid

While pricing is obviously an important factor, I feel like one should first have an idea of what features one needs before comparing on pricing. Most of the solutions offer similar prices per user, with some exceptions.

This is one of those rare situations where, depending on your requirements, you might actually be better off with a free product!

The Feature Comparison

(Feature comparison table: hosting options, password sharing, platform and browser support, and pricing for each product.)

Summary

With the wide variety of needs and options available, each solution listed above has its benefits and its tradeoffs. I hope you've found this helpful; if you have any questions, corrections, comments or suggestions, I look forward to reading them in the comments below!

Friday, 2 April 2021

Simple safe (atomic) writes in Python3

In sensitive circumstances, trusting a traditional file write can be a costly mistake: a simple power cut before the write is completed and synced may at best leave you with some corrupt data, but depending on what that file is used for you could be in for some serious trouble.

While there are plenty of interesting, weird, or over-engineered solutions available to ensure safe writing, I struggled to find a solution online that was simple, correct and easy to read, and that could be run without installing additional modules, so my teammates and I came up with the following solution:
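Putting the pieces together first, here's a minimal sketch of the full helper based on the snippets explained below (the safe_write name and the shutil.copy2-based implementation of copyWithMetaData are my own assumptions):

import os
import shutil
import tempfile

def copyWithMetaData(source, target):
    # assumption: copy both contents and metadata
    # (shutil.copy2 preserves timestamps and permissions)
    shutil.copy2(source, target)

def safe_write(target_file_path, file_contents, mode="w"):
    # create the temporary file alongside the target so the final move
    # never crosses file system boundaries
    temp_file = tempfile.NamedTemporaryFile(delete=False,
                                            dir=os.path.dirname(target_file_path))
    try:
        # preserve file contents and metadata if the target already exists
        if os.path.exists(target_file_path):
            copyWithMetaData(target_file_path, temp_file.name)

        # write (or append) the contents and force them to disk
        with open(temp_file.name, mode) as f:
            f.write(file_contents)
            f.flush()
            os.fsync(f.fileno())

        # atomically replace the target with the fully written temporary file
        os.replace(temp_file.name, target_file_path)
    finally:
        # if anything failed before the replace, clean up the temporary file
        if os.path.exists(temp_file.name):
            os.remove(temp_file.name)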

Explanation:

temp_file = tempfile.NamedTemporaryFile(delete=False,
                                        dir=os.path.dirname(target_file_path))

The first thing to do is create a temporary file in the same directory as the file we're trying to create or update. We do this because move operations (which we'll need later) aren't guaranteed to be atomic when they're between different file systems. Additionally, it's important to set delete=False, as the standard behaviour of the NamedTemporaryFile is to delete itself as soon as it's not in use.

# preserve file metadata if it already exists
if os.path.exists(target_file_path):
    copyWithMetaData(target_file_path, temp_file.name)

We needed to support both file creation and updates, so in the case that we’re overwriting or appending to an existing file, we initialize the temporary file with the target file’s contents and metadata.

with open(temp_file.name, mode) as f:
    f.write(file_contents)
    f.flush()
    os.fsync(f.fileno())

Here we write or append the given file contents to the temporary file, and we flush and sync to disk manually to prepare for the most critical step:

os.replace(temp_file.name, target_file_path)

This is where the magic happens: os.replace is an atomic operation (when the source and target are on the same file system), so we're now guaranteed that if this fails to complete, no harm will be done.

We use a finally clause to remove the temporary file in case something did go wrong along the way, so now the very worst thing that can happen is that we end up with a stray temporary file ¯\_(ツ)_/¯


Saturday, 13 February 2021

Approaching Software Engineering as an evolutionary process

A year or two ago I had an opportunity to sit down with AWS's Marcin Kowalski in a cafeteria and discuss the problems of software development at almost-unimaginable scale. I walked away with a new (for me) conception of software engineering that is part engineering, part organic biology, and I've found this perspective has shifted my approach to software development in a powerful and immensely helpful way.

As Computer Scientists and Software Engineers, we've been trained to employ precision in algorithm design, architecture and implementation: Everything must be Perfect. Everything must be Done Right.

For smaller, isolated projects, this engineering approach is critical, sound and practical, but once we begin to creep into the world of integrated solutions and micro-services it rapidly begins to break down.

Saturday, 23 January 2021

Towards a Python version of my Javascript AWS CDK guide

After successfully using a Typescript CDK project to deploy a python lambda on Thursday, I decided to spend some time this evening creating a Python CDK guide. It's very limited at the moment (just a simple function and a basic lambda layer), but it's a start!

Sunday, 25 October 2020

Keeping Track of your Technical Debt

The impact of technical debt

Over the years, "technical debt" has become a phrase that can generate anxiety and a lack of trust, as well as setting developers and their managers up for failure. The metaphor might not be perfect (Ralf Westphal makes a strong case for treating it like an addiction), but I feel it's pretty apt if you think of the story of The Pied Piper of Hamelin - if you don't pay the piper promptly for keeping you alive, he'll come back to steal your future away.

Maybe I'm being a bit dramatic, but I value my time on this planet and over the course of two decades as a software developer in various sectors of the industry I have (along with countless others) paid with so much of my own precious lifetime (sleep hours, in particular) for others' (and occasionally my own) quick fixes, rushed decisions and hacky workarounds that it pains me to even think about it.

It's almost always worthwhile investing in doing things right the first time, but we rarely have the resources to invest and we are often unable to accurately predict what's right in the first place.

So let's at least find a way to mitigate the harm.

Why TODOs don't help

The traditional method of initiating technical debt is the TODO. You write a well-meaning (and hopefully descriptive) comment starting with TODO, and "Hey, presto!" you have a nice, easy way to find all those little things you meant to fix. Right? Except that's usually not the case. We are rarely able to make time to go looking for more things to do, and even when we can, with modern software practices it's unlikely that searching across all of our different repositories will be effective.

What generally ends up happening, then, is that we only come across TODOs by coincidence, when we happen to be working with the code around it, and the chances are that it'll be written by a different developer from a different time period and be explained in... suboptimal language.

"Why hasn't this been done?", you may well ask.

"Leave that alone! We don't remember why it works," you may well be told.

Track your technical debt with this one simple trick!

No matter the size of your team (even a team of one), you should always be working with some kind of issue tracking software. There's decent, free software out there, so there's really no excuse.

The fix? All you need to do is log a ticket for each TODO. Every time you find yourself writing a TODO, and I mean every single time, create a ticket. In both the ticket and the TODO comment, explain what you're doing, what needs to be done, why it needs to be deferred, and how urgent completing the TODO task is.

Then reference the ticket in the TODO comment.
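For example (the ticket reference and wording here are invented), the comment might end up looking something like this:

# TODO: PROJ-1234 - using an in-memory dict as a session cache because the
# Redis cluster hasn't been provisioned yet; fine for current load, but it
# won't survive a restart or scale past a single instance. Low urgency.
session_cache = {}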

This way you'll have a ticket - which I like to think of as an IOU - which can be added to the backlog and remembered when grooming and planning. This also provides the developer who encounters the TODO in the code a way to review any details and subsequent conversations in the ticket.

One interesting side effect of this approach? I often find that it's more effort to create the TODO ticket than to do the right thing in the first place, which can be a great incentive for the juniors to avoid the practice.

Another?


After establishing with my team that I will not approve a TODO unless there's a ticket attached, I must admit... I do sleep just a little bit better at night.

Saturday, 24 October 2020

Reading and writing regular expressions for sane people

Your regular expressions need love. Reviewers and future maintainers of your regular expressions need even more.

No matter how well you've mastered regex, regex is regex and is not designed with human-readability in mind. No matter how clear and obvious you think your regex is, in most cases it will be maintained by a developer who a) is not you and b) lacks context. Many years ago I developed a simple method for sanity checking regex with comments, and I'm constantly finding myself demonstrating its utility to new people.

There are some great guides out there, like this one, but what I'm proposing takes things a step or two further. It may take a minute or two of your time, but it almost invariably saves a lot more than it costs. I'm not even discussing flagrant abuse or performance considerations.


Traditional regex: the do-it-yourself pattern


The condescending regex. Here you're left to your own devices. Thoughts and prayers.

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

(example taken from this question)


Kind regex: intention explained


It's the least you can do! A short line explaining what you're matching with an example or two (or three).

// parse a url, only capture the host part
// eg. protocol://host
//     protocol://host:port/path?querystring#anchor
//     host
//     host/path
//     host:port/path
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;


Careful regex: a human-readable breakdown


Here we ensure that each element of the regex pattern, no matter how simple, is explained in a way that makes it easy to verify that it's doing what we think it's doing and to modify it safely. If you're not an expert with regex, I recommend using one of the many available tools such as regexr.com.

// parse a url, only capture the host part
//     (?:([A-Za-z]+):)?
//         protocol - an optional alphabetic protocol followed by a colon
//     (\/{0,3})(0-9.\-A-Za-z]+)
//         host - 0-3 forward slashes followed by alphanumeric characters
//     (?::(\d+))?
//         port - an optional colon and a sequence of digits
//     (?:\/([^?#]*))?
//         path - an optional forward slash followed by any number of
//         characters not including ? or #
//     (?:\?([^#]*))?
//         query string - an optional ? followed by any number of
//         characters not including #
//     (?:#(.*))?
//         anchor - an optional # followed by any number of characters
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

Now that we've taken the time to break this down, we can identify the intention behind the patterns and ask better questions: why is the host the only matched group? Was this tested? (Because (0-9.\-A-Za-z] is clearly an error, and there are almost no restrictions on invalid characters)

Unless you're a sadist (or a masochist), this is definitely a better way to operate: be careful, and if you can't be careful then at least be kind.

Wednesday, 19 August 2020

Priority pinning in apt-preferences with different versions and architectures

I'm posting this because I've lost too many hours figuring it out myself; the documentation is missing several important notes, and I haven't found any forum posts that really address the problem:

Question: How do I prioritize specific package versions for multiple architectures? In particular, I have a number of different packages which I would like to download for multiple architectures, and I would like to prioritize the versions so that if they're not explicitly provided, apt will try to get a version matching my arbitrary requirements (in my case, the current git branch name) and fall back to our develop branch versions.

eg. I would like to download package my-package for both i386 and amd64 architectures and I would like to pull the latest version that includes my-git-branch-name before falling back to the latest that includes develop.

Answer:

The official documentation is here.

1. In order to support multiple architectures, all packages being pinned must have their architecture specified, and there must be an entry for each architecture. A pinning for the package name without the architecture specified will only influence the default (platform) architecture:

Package: my-package
Pin: version /your regex here/
Pin-Priority: 1001

2. The entries are whitespace-sensitive, but no errors will be reported if you get the whitespace wrong. The following pinning will be silently disregarded:

Package: my-package:amd64
    Pin: version /your regex here/
    Pin-Priority: 1001

3. apt update must be called after updating the preferences file in order for the pinnings to be respected, and again after adding additional architectures using (for example) dpkg --add-architecture i386

The following excerpt from /etc/apt/preferences solves the stated problem:


Package: my-package:amd64
Pin: version /-my-git-branch-name-/
Pin-Priority: 1001

Package: my-package:i386
Pin: version /-my-git-branch-name-/
Pin-Priority: 1001

Package: my-package:amd64
Pin: version /-develop-/
Pin-Priority: 900

Package: my-package:i386
Pin: version /-develop-/
Pin-Priority: 900

It may be worth noting that to download or install a package with a specified architecture and version, you use the command apt download package-name:arch=version

Sunday, 16 August 2020

Enabling Programmable Bank Cards To Do All The Things

A little while back, OfferZen and Investec opened up their Programmable Bank Card Beta for South Africans, taking over from Root, and I was excited to be able to sign up!

My long-term goal with programmable banking is to integrate crypto transactions directly with my regular banking - which might be possible quite soon as Investec is starting to roll out its open API offering in addition to the card programme - and as far as I can tell that could be a real game-changer in terms of making crypto trading practical for non-investment purposes.
When I joined, though, what I found was a disparate group of really interesting projects and only a two-second time window to influence whether your card transaction will be approved or not.

I decided to bide my time by building a bridge for everyone else's solutions, one which would provide a central location to determine transaction approval and enable card holders (including myself, of course) to forward their transactions securely to as many services as they like without costing them running time on their card logic.

It was obvious to me that this needed to be a cloud solution, and as someone with zero serverless experience[1], I spent some time evaluating AWS, Google and Microsoft's offerings. My priorities were simple:
  • right tool for the job
  • low cost
  • high performance
  • good user experience
In the end AWS was the clear winner with the first two priorities, so much so that I didn't even bother worrying about any potential performance tradeoffs (there probably aren't any) and I was prepared to put up with their generally mixed bag of user experience. I also had the added incentive that my current employer uses AWS in their stack so I would be benefiting my employer at the same time.

Overall, I'm very glad that I chose AWS in spite of the trials and tribulations, which you can follow in my earlier post about building a template for CDK for beginners (like myself). In addition to learning how to use CDK, this project has given me solid experience in adapting to cloud architecture and patterns that are substantially different from anything I've worked with before, and I've already been able to effectively apply my learnings to my contract work.

One of the obligations of the Programmable Bank Card beta programme is sharing your work and presenting it; I was happy to do so, but for months I've been locked down and working in the same space as my wife and four year-old (we don't have room for an office or much privacy in our apartment); with all the distractions of home schooling and having to work while my wife and kid play my favourite video games I've barely been able to keep up my hours with my paid gig, so making time for extra stuff? That hasn't been so easy.

A couple of weeks ago my presentation was due, and I spent a lot of the preceding two weekends in a mad scramble to get my project as production-ready as I could - secure, reliable, usable. Half an hour before "go time" I finally deployed my offering, and I was a little bit nervous doing a live demo... I mean, sure, I'd tested all the end points after each deployment - but as we all know: Things Happen. Usually During Live Demos.

Fortunately, the presentation went very well! I spent the following weekend adding any missing "critical" functionality (such as scheduling lambda warmups and implementing external card usage updates), and I'm hoping that some of my fellow community members get some good use out of it (whether on their own stacks or mine).

The code can be found here on GitLab, and a few days ago OfferZen published the recording of my presentation on their blog along with its transcript[2] and my slide deck.

Thank you for joining me on my journey!



[1] Although I was employed by one of the major cloud providers for a while, I've only worked "on" their cloud rather than "in" it, and while I do have extensive experience with Azure, none of it included serverless functions.

[2] The transcript has a few minor transcription errors, but they're more amusing than confusing.

Tuesday, 28 July 2020

Resolving private PyPI configuration issues

The Story

I've spent the last few days wrestling with the serverless framework, growing fonder and fonder of AWS' CDK at every turn. I appreciate that serverless has been, until recently, very necessary and very useful, but I feel confident in suggesting that when deployment mechanisms become more complex than the code they're deploying things aren't moving in the right direction.

That said, this post is about overcoming what for us was the final hurdle: getting Docker able to connect to a private PyPI repository for our Python lambdas and lambda layers.

Python is great, kind of, but is marred by two rather severe complications: the split between python and python3, and pip… so it's a great language with an awful tooling experience. Anyway, our Docker container couldn't communicate with our PyPI repo, and it took way too long to figure out why. Here's what we learned:

The Solution

If you want to use a private PyPI repository without typing in your credentials at every turn, there are two options:

1.


~/.pip/pip.conf:
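Something along these lines, with the credentials embedded directly in the index URL (the repository URL, username and password are placeholders):

[global]
extra-index-url = https://your-username:your-url-encoded-password@pypi.example.com/simple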



2.


~/.pip/pip.conf:
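For this option, pip.conf only needs to point at the repository, with the credentials living in ~/.netrc (again, the URL is a placeholder):

[global]
extra-index-url = https://pypi.example.com/simple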



~/.netrc:
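And a sketch of the matching ~/.netrc (hostname and credentials are placeholders; note that the password is stored here as-is, not URL-encoded):

machine pypi.example.com
login your-username
password your-password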



It is important that ~/.netrc has the same owner:group as pip.conf, and that its permissions are 0600 (chmod 0600 ~/.netrc).

What's not obvious - or even discussed anywhere - is that special characters are problematic and are handled differently by the two mechanisms.

In pip.conf, the password MUST be URL encoded.
In .netrc, the password MUST NOT be URL encoded.
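If you're not sure what the URL-encoded form of your password looks like, Python itself can produce it (a quick sketch; the password is obviously a placeholder):

import urllib.parse

# percent-encode every reserved character, including '/' (which quote() would otherwise leave alone)
print(urllib.parse.quote("p@ssw/rd!", safe=""))  # prints p%40ssw%2Frd%21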

The Docker Exception

For whatever reason, solution 2 (the combination of pip.conf and .netrc) does NOT work with Docker.

Conclusion

Amazon's CDK is excellent, and unless you have a very specific use-case that it doesn't support it really is worth trying out!

Oh! And that Python is Very Nice, but simply isn't nice enough to justify the cost of its tooling.

Friday, 3 July 2020

the rights and wrongs of git push with tags

My employer uses tags for versioning builds.

git push && git push --tags

I'm guessing this is standard practice, but we've been running into an issue with our build server - Atlassian's Bamboo - picking up commits without their tags. The reason was obvious: we've been using git push && git push --tags, pushing tags after the commits, and Bamboo doesn't trigger builds on tags, only on commits. This means that every build would be versioned with the previous version's tag!

The solution should have been obvious, too: push tags with the commits. Or before the commits? This week, we played around with an experimental repo and learned the following:

git push --tags && git push

To the uninitiated (like us), the result of running git push --tags can be quite surprising! We created a commit locally, tagged it*, and ran git push --tags. This resulted in the commit being pushed to the server (Bitbucket, in our case) along with its tag, but the commit was rendered invisible: not even git ls-remote --tags origin would return it, and it was not listed under the commits on its branch, although the tag (and its commit) showed up quite clearly in Bitbucket's search-commits-by-tag feature:



* All tags are annotated, automatically, courtesy of SourceTree. If you're not using SourceTree then you should be, and I promise you I'm not being paid to say that. If you must insist on the barbaric practice of using the command line, just add -a (and -m to specify the message inline if you don't want the editor to pop up; check out the documentation for more details).

Pushing a tag first isn't the end of the world - simply pushing the commit with git push afterwards puts everything in order - unless someone else pushes a different commit first. At that point the original commit will need to be merged, with merge conflicts to be expected. Alternatively - and relevant for us - any scripts that perform commits automatically might fail between pushing the tags and their commits, leading to "lost" commits.

git push --tags origin refs/heads/develop:refs/heads/develop

This ugly-to-type command does what we want it to do: it pushes the commit and its tags together. From the documentation:

When the command line does not specify what to push with <refspec>... arguments or --all, --mirror, --tags options, the command finds the default <refspec> by consulting remote.*.push configuration, and if it is not found, honors push.default configuration to decide what to push (See git-config[1] for the meaning of push.default).

Okay, so that works. But there's no way I'm typing that each time, or asking any of my coworkers to. I want English. I'm funny that way.

git push --follow-tags

Here we go: the real solution to our problems. As long as the tags are annotated - which they should be anyway, see the earlier footnote on tagging - running git push --follow-tags will push any outstanding commits from the current branch along with their annotated tags. Any tags not annotated, or not attached to a commit being pushed, will be left behind.
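As a side note, if you'd rather not remember the flag, git also has a push.followTags configuration option (git config --global push.followTags true) that applies the same behaviour to every git push.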

Resolved!

Monday, 22 June 2020

A templated guide to AWS serverless development with CDK

If all you're looking for is a no-frills, quick, step-by-step guide, just scroll on down to The Guide!


The Story

Or, how I learned to let go of my laptop and embrace The Cloud...


Motivation


A while ago I outlined a project that I intend to build, one that I'm rather enthusiastic about, and after spending a few hours evaluating my options I decided that an AWS solution was the right way to go: their options are configurable and solid, their pricing is very reasonable and, as an added bonus, I get to familiarize myself with tech I'll be using for my current contract!

All good.

Once I'd made the decision to use AWS, becoming generally familiar with the products on offer was easy enough and it took me very little time to flesh out a design. I was ready to get coding!

Or so I thought...


Hurdle 1

Serverless. And possibly other frameworks; I'm pretty sure I looked at a few, but it seems like forever ago. Serverless shows lots of promise, and I know people who swear by it, but it's only free up to a point and I immediately ran into strange issues with my first deployment. I found the interface surprisingly unhelpful, and it looks like once you're using it you're somewhat locked in.

Pass.


Hurdle 2

Cognito. Security first. Cognito sounds like a great solution, but once I'd gotten a handle on how it works I was severely disappointed by how limiting it is, and getting everything set up takes real developer and user effort (and I suffer enough from poor interface fatigue). After playing around with user pools and looking at various code samples, I realized that I'd rather allow users to register using their email addresses / phone numbers as second-factor authentication (Mailgun and Twilio are both straightforward options), or use OAuth providers like Facebook, Google and GitHub, and I certainly want to encourage my users to use strong, memorable passwords (easily enforced with zxcvbn, and why is this still a thing?!), which Cognito doesn't allow for.

You'll need to configure Lambda authorizers either way, so I really don't think Cognito adds much value.


Hurdle 3

Lambda / DynamoDB. Okay, so writing a lambda function is really easy, and guides and examples for code that reads from and writes to DynamoDB abound. Great! Except for the part where you need to test your functions before deploying them.

My first big mistake - and so far it's proved to be the most expensive one - was not understanding at this point that "testing locally" is simply not a feasible strategy for a cloud solution.


Hurdle 4

The first code I wrote for my project was effectively a lambda environment emulator to test my lambda functions. It was far from perfect, and it did take me a couple of hours to cobble together, but it did the job and I used it to successfully test lambda functions against DynamoDB running in Docker.


Hurdle 5

Lambda Layers. Why do most guides not touch on these? Why are there so few layers guides written for Javascript? It took me a little while to get a handle on layers and build a simple script to create them from package.json files, but as far as hurdles go this was a relatively short one.


Hurdle 6

Deployment. It's nice to have code running locally, but uploading it to the cloud? Configuring API Gateway was a mixed bag of Good Interface / Bad Interface, same with the Lambda functions, and what eventually stumped me was setting up IAM to make everything play nicely together. What's the opposite of intuitive? Not counter-intuitive, in this case, as I don't feel that that word evokes nearly enough frustration.

Anyway, it became abundantly clear at that point that manual deployment of AWS configurations and components is not a viable strategy.


Hurdle 7

A coworker introduced me to two tools that could supposedly Solve All My Problems: CDK and SAM. This seemed like a worthy rabbit-hole to crawl into, but I couldn't find any examples of the two working together!

I began to build my own little framework, one that would allow me to configure my stack using CDK, synthesize the CloudFormation templates, and test locally using SAM. Piece by piece I put together this wonderful little tool, hour by hour, first handling DynamoDB, then Lambda functions, then Lambda Layers...

It was at that point that realization dawned: not only are SAM and CDK not interoperable by design, but SAM does not, in fact, provide meaningful "local" testing. Sure, you can invoke your lambda functions on your local machine, but the intention is to invoke them against your deployed cloud infrastructure. Once I got that through my head, it was revelation time: testing in the cloud is cheaper and better than testing locally.


The Guide

If you're like me, and you intend your first foray into cloud computing to be simple, yet reasonably production-ready, CDK is the easiest way forward and it's completely free (assuming you don't factor in your time figuring it out as valuable, but that's what I'm here for!).

Over the course of the past couple of weeks, I've put together the aws-cdk-js-dev-guide. It's a work in progress (next stop, lambda authenticators!), but at the time of writing this guide it's functional enough to put together a simple stack that includes Lambda Layers, DynamoDB, functions using both of those, API Gateway routes to those functions and the IAM permissions that bind them.

And that's just the tip of the CDK iceberg.


Step 1 - Tooling

It is both valuable and necessary to go through the following steps prior to creating your first CDK project:

  • Create a programmatic user in IAM with admin permissions
  • If you're using Visual Studio Code (recommended), configure the AWS Toolkit
  • Set up credentials with the profile ID "default"
  • Get your 12-digit account ID from My Account in the AWS console
  • Follow the CDK hello world tutorial


Step 2 - Creating a new CDK project

The first step to creating a CDK project is initializing it with cdk init, and a CDK project cannot be initialized if the project directory isn't empty. If you would like to use an existing project (like aws-cdk-js-dev-guide) as a template, bear in mind that you will have to rename the stack in multiple locations and it would probably be safer and easier to create a new project from scratch and copy and paste in whatever bits you need.


Step 3 - Stack Definition

There are many viable approaches to setting up stages, mine is to replicate my entire stack for development, testing and production. If my stack wasn't entirely serverless - if it included EC2 or Fargate instances, for example - then this approach might not be feasible from a cost point of view.

Stack definitions are located in the /lib folder, and this is where the stacks and all their component relationships are configured. Here you define your stacks programmatically, creating components and connecting them.



I find that the /lib folder is a good place to put your region configurations.



Once you have completed this, the code to synthesize the stack(s) is located in the /bin folder. If you intend, like I do, to deploy multiple replications of your stack, this is the place to configure that.



See the AWS CDK API documentation for reference.


Step 4 - Lambda Layers

I've got much more to learn about layers at the time of writing, but my present needs are simple: the ability to make npm packages available to my lambda functions. CDK's Code.fromAsset method requires the original folder (as opposed to an archive file, which is apparently an option), and that folder needs to have the following internal structure:

.
./nodejs
./nodejs/package.json
./nodejs/node_modules/*

You can manually create this folder anywhere in your project, maintain the package.json file and remember to run npm install and npm prune every time you update it... or you can just copy in the build-layers script, maintain a package.json file in the layers/src/my-layer-name directory and run the script as part of your build process.


Step 5 - Lambda Functions

There are a number of ways to construct lambda functions; I prefer to write mine as promises. There are two important things to note when putting together your lambda functions:

1. Error handling: if you don't handle your errors, lambda will handle them for you... and not gracefully. If you do want to implement your lambda functions as promises, as I have, try to use "resolve" to return your errors.

2. Response format: your response object MUST be in the following structure:
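For an API Gateway proxy integration this boils down to a status code, optional headers, and a stringified body, something like the following (the CORS header is just an illustrative example):

{
    "statusCode": 200,
    "headers": { "Access-Control-Allow-Origin": "*" },
    "body": "{ \"message\": \"note that the body must be a string, not a raw object\" }"
}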


This example lambda demonstrates using a promise to return both success and failure responses:



Step 6 - Deployment

There are three steps to deploying and redeploying your stacks:
  1. Build your project
    If you're using Typescript, run
    > tsc
    If you're using layers, run the build-layers script (or perform the manual steps)
  2. Synthesize your project
    > cdk synth
  3. Deploy your stacks
    > cdk deploy stack-name
    Note: you can deploy multiple stacks simultaneously using wildcards
At the end of deployment, you will be presented with your endpoints. Unless your lambda has its own routing configured - see the sample dynamodb API examples that follow - simply make your HTTP requests to those URLs as is.


    Sample dynamodb API call examples:
  • List objects
    GET https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects
  • Get specific object
    GET https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects/b4e01973-053c-459d-9bc1-48eeaa37486e
  • Create a new object
    POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects
  • Update an existing object
    POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects/b4e01973-053c-459d-9bc1-48eeaa37486e

Step 7 - Debugging

Error reports from the API Gateway tend to hide useful details. If a function is not behaving correctly or is failing, go to your CloudWatch dashboard and find the log group for the function.

Step 8 - Profit!!!

I hope you've gotten good use of this guide! If you have any comments, questions or criticism, please leave them in the comments section or create issues at https://github.com/therightstuff/aws-cdk-js-dev-guide.