Rust/WebAssembly on AWS Lambda@Edge (CloudFront)
Ever felt limited by the languages on AWS Lambda@Edge? Wanted to run Rust for your CloudFront triggers, but re:Invent 2020 disappointed you in that regard? Let me show you one way you can still get it done.
tl;dr:
Check out the repo and give it a spin for yourself. You can also jump ahead to the code part and skip the story section.
At the last AWS re:Invent 2020, the company with a smile announced quite a few changes, improvements, and new products/services, notably also a lot of interesting stuff in the AWS Lambda space. Yet one area—one very dear to me—was left out completely, total radio silence, nothing, nada: no improvements or new features for Lambda@Edge (L@E). This made me very sad. 😢
I tinkered with a custom solution in the past and also got some experiments running, but it was a pretty hand-rolled approach and nothing I could give to others if they wanted to do the very same.
I sat down yet again and put some of my learnings into a repository, so we have a starting point for Wasm-based serverless apps on the edge.
As of today (January 2021) AWS only offers two languages for Lambda@Edge: Python and Node.js. Both are available in relatively decent versions, too.
Since I have never bothered much with Python in my life, but have my fair share of experience with JavaScript (JS) and node, I stuck to the latter as the trampoline environment for the setup. I also know that it comes with WebAssembly support out of the box, which wouldn't be the case for Python anyway.
So what is it gonna be anyway?
As you've maybe guessed, the project will be some Rust, which gets compiled down to WebAssembly. This artefact can then be loaded and executed in the node environment.
A word about performance: I have not done a "JS vs Wasm" comparison, and it's also not part of this exercise. There have been people and articles in the past vouching for one side or the other, all with their own benchmarks. So I won't bother you with that and advise you to take your own measurements.
WebAssembly will not beat all JavaScript code, especially very fine-tuned code. V8 (the underlying JavaScript engine for both Chrome and Node.js) is a very performant beast and comes with just-in-time (optimizing) compilation for further boosts.
The Rust code in Wasm clothing can probably give you certain guarantees you miss in JavaScript, but again, you have to evaluate whether the benefits are worth the effort.
Potentially you might also consider switching to Python as your runtime instead. At least that language has real integers, as far as I know. 😉
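To illustrate the quip with a standalone Rust snippet (not part of the repo): JavaScript's Number is an IEEE-754 double, so integers above 2^53 silently lose precision, while Rust's integer types stay exact.

```rust
fn main() {
    // 2^53 + 1 is the first integer a JS Number (IEEE-754 double) cannot
    // represent exactly; in JS, `2 ** 53 + 1 === 2 ** 53` is true.
    let big: u64 = 9_007_199_254_740_993;
    assert_eq!(big + 1, 9_007_199_254_740_994); // exact in Rust
    println!("{}", big);
}
```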
No doubt you can build and deliver very fast and also safe programs with Rust/WebAssembly, especially if you need specific algorithms and computations where JS/node might not be the best fit and you would resort to some (probably C-based) native libraries anyway.
There are only a few issues with that:
You do not have full control over the execution environment of your edge functions. Sure, you can introspect what you're dealing with via a test function, but how sure can you really be that the environment on CloudFront provides the exact same system and system dependencies as your local development environment (or the non-edge Lambda environment, for that matter)? AWS has a track record of not providing you with reproducible local environments. In fact, it looks like they get away with it even more since the announcement of container support for regular AWS Lambda. People who know me also know that I'm not a big fan of big docker images, but I'm afraid that's what we will see happening there. I hope AWS promotes good containerization guidelines to prevent that waste-of-storage mess. Furthermore, for that very reason I really don't want to see docker on the edge. One can only hope, right?
You work in a very constrained environment. Check the current limits for Lambda@Edge: the zipped code can use up to 50 MB on the origin side, but only 1 MB if it is deployed on the viewer side. Of course, this is usually still plenty of space for most use cases; packaging up plain JS results in very small archives. But once you take the first issue into consideration, this could actually become another problem for you.
The size restriction can be mitigated for JS-only code pretty easily by bundling the code with webpack, Parcel, or Rollup. General advice is to never deploy the unoptimized package anyway, especially when you want to push it to the edge. The node_modules folder grows very big and can still carry quite some bloat even after an npm prune --production, because pruning only looks at packages, not at their contents. Yep, my minimalism brain kicked in again.
The system dependency problem can only be solved by using solely pure JavaScript libraries and packages. That might work for a while, but eventually some use case might demand a non-JS solution (either a native library or some executable).
For example, let's say you want to build an image transformation function and want to use sharp, a very well-known package in the JS ecosystem: you already end up with around 37 MiB of data in your node_modules folder alone. Zipped up it's still around 13 MiB. That might be fine for a trigger on the origin side of your CloudFront distribution; the point is just to show how quickly a node project can grow.
If size and dependency management are not an issue
- Maybe you love Rust (or any frontend language capable of being compiled to WebAssembly).
- Maybe you love WebAssembly.
- Maybe you do not have good JavaScript/Node.js expertise in-house.
- Maybe you want to build your product with better safety.
- Maybe it should be more robust, too.
- Maybe you want to show AWS that we need more than just Python and Node.js on the edge.
- Maybe you have some other valid reason to escape that limiting cage.
Whatever your reasons are, I hear you.
AWS is improving on one side, but losing ground on another. When it comes to CDNs (Content Delivery Networks) and Edge Computing, the competition is now sprinting ahead of AWS.
I cannot say a lot about Fastly's offering; it's mostly behind beta doors, and mere mortals like myself are not allowed to peek. They have their fastlylabs, but that's for experimentation, not the final offering. So I haven't even bothered to check it out.
I can tell a bit more about Cloudflare though, because their edge computing offering is available and affordable to small businesses and individuals (even free in most cases). Check out Workers, really do! I have already played around with Workers and KV storage, and it's quite a pleasant experience. I might write about a setup for them in the future as well.
Let's get started
GitHub repository to follow along:
https://github.com/asaaki/rust-wasm-on-lambda-edge
$ tree -L 1 -a -I .git # output truncated for brevity
.
├── .github - GitHub Actions workflow and dependabot
├── .gitignore
├── Makefile - very convenient make targets
├── README.md
├── fixtures - simple test fixtures and script
├── node - Node.js wrapper
├── rust - Big Business Here!
└── <and some more …>
Ingredients
- Makefile for project level management
- TypeScript (TS) for the Node.js part
- Type definitions for AWS Lambda
- Rollup as the bundler
- Rust for the, well, Rust part
- wasm-pack for WebAssembly building
- zip to package up the function for upload
- Example fixtures and code to have a very quick and dirty request test
- GitHub Actions workflow for continuous integration (CI) purposes
On your machine you need to install Rust, node, wasm-pack, and zip, if they're not present yet. The GitHub Actions workflow has that already sorted out for you.
This article won't walk you through setting up your local development environment; please use a search engine of your choice and look up how to do it.
Node.js wrapper
I adopted a Rollup-based approach, since it's quite easy to configure and also something we use at work. I always found webpack a little too cumbersome, and Parcel is just yet another new kid on the block. I'm pretty sure you can adjust the project to your needs. All we need here is to compile TS to JS and bundle everything up into a single JS file.
In the past I found the WebAssembly dependency management very tricky; in the end I used a plain "move the .wasm file into the final bundle" approach, which works just fine, because I did not want to inline the WebAssembly code (as most plugins try to do). Maybe you have a smarter solution for that; please open a pull request in the repo. Just keep in mind: wasm-bindgen already creates a pretty decent module loader, so there is no need to work around that, but I failed to get any of these bundlers to move the .wasm files along into the bundle output directory.
I use TypeScript here, because it gives you some nice support during development. The aws-lambda type definitions were also useful for creating the Rust structs and adjusting the serialization. (AWS is actually very strict about the JSON you pass around: "something": null for absent data does not work; either a field is present with a strict type like a string, or it should not be in the JSON at all.)
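A minimal sketch of how that strictness maps to serde on the Rust side; the fields are roughly modeled on the CloudFront request body, but the struct here is illustrative, not copied from the repo:

```rust
use serde::{Deserialize, Serialize};

// Optional fields are dropped from the JSON entirely when they are `None`,
// so CloudFront never sees an invalid `"encoding": null`.
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Body {
    input_truncated: bool,
    action: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    encoding: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    data: Option<String>,
}
```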
In general, for any bigger frontend or node backend project I would recommend using TS nowadays. While not every one of your dependencies might come with type definitions, at least within your own codebase you can enforce strict rules and type checks.
To make the node wrapper as slim as possible, we pass the event and context data directly into the WebAssembly and return whatever it returns.
Btw, if you return a specific struct instead of a generic JsValue, then TS checks will also kick in and use the auto-generated type definitions from the wasm-bindgen process.
For a quick baseline test and project I did not go down that road yet, as it would require replicating all the TS-specific stuff in the Rust code (wasm-bindgen cannot do everything fully automated yet). This is a great opportunity to create a utility crate for that, basically like rs2ts but in the reverse direction. I wish aws-lambda-events already had those CloudFront events in it, but sadly it doesn't.
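For illustration, a hypothetical typed export could look like this; the struct name and fields are made up and not part of the repo:

```rust
use wasm_bindgen::prelude::*;

// wasm-bindgen emits a matching TypeScript class definition for this struct,
// so the Node.js wrapper gets compile-time checks instead of an opaque value.
#[wasm_bindgen]
pub struct HandlerStats {
    pub status: u16,     // plain `pub` fields must be Copy types
    pub cache_hit: bool,
}

#[wasm_bindgen]
pub async fn typed_handler(_event: JsValue) -> Result<HandlerStats, JsValue> {
    // ... deserialize the event and run the business logic here ...
    Ok(HandlerStats { status: 200, cache_hit: false })
}
```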
Yet also be aware of certain type limitations, read on to learn more about them.
Rust business logic
The Rust code is also nothing special so far.
// src/lib.rs
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;
mod types;
use std::panic;
use types::cloudfront as cf;
use wasm_bindgen::{intern, prelude::*};
use web_sys::console;
type JsValueResult = Result<JsValue, JsValue>;
#[wasm_bindgen(start, final)]
pub fn start() {
panic::set_hook(Box::new(console_error_panic_hook::hook));
console::log_1(&intern("(wasm module start)").into());
}
#[wasm_bindgen(final)]
pub async fn handler(event: JsValue, _context: JsValue) -> JsValueResult {
console::log_1(&intern("(wasm handler request call)").into());
let request = cf::Event::request_from_event(event)?;
// TODO: Fancy biz logic here ...
request.to_js()
}
Note: The displayed code might not be up-to-date with the repository version.
There is one function (start) which is triggered when the Wasm module gets loaded. You can use it to set up some internal state if needed. We only use it here to configure the panic handler: whenever an unrecoverable error happens, it gets logged via console.error, which helps immensely with debugging. And as we do console logging anyway, there shouldn't be any significant overhead for that part. The compilation output will probably be a bit bigger, because it needs to store more information for the better panic output.
The other—probably way more interesting—function is handler, which takes the inputs from the JS side, does … a lot of nothing, and returns a request JSON blob for CloudFront to deal with.
Currently the machinery reads the arbitrary JsValue and tries to deserialize it into a struct, so we can deal with it in code. This is definitely not the most efficient way of doing it, but the conversions in and out avoid some currently existing pain points.
For example, wasm-bindgen does not have a great story around Rust enums; for now only very simple C-style enums are allowed. Meaning: our CloudFront (CF) event data, which could be strictly typed as either a CF request or response event, does not play well with this, as we cannot convince wasm-bindgen to use Rust's richer enums. There is an open issue around this topic, but it was created only recently, so no work has been done on it yet. Similarly, Rust's Vec is also not fully supported yet (see issue 111), which might be the even bigger issue for some of us.
The workaround is a lot of Options and serialization skips, which is what I do internally anyway.
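Sketched out, that workaround could look like this; Request and Response are heavily simplified stand-ins for the richer repo structs:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Request { uri: String, method: String }

#[derive(Serialize, Deserialize)]
struct Response { status: String }

// Ideally this would be `enum Event { Request(Request), Response(Response) }`,
// but since wasm-bindgen cannot express rich Rust enums yet, we flatten the
// either/or into two optional fields and skip whichever side is absent.
#[derive(Serialize, Deserialize)]
struct RecordData {
    #[serde(skip_serializing_if = "Option::is_none")]
    request: Option<Request>,
    #[serde(skip_serializing_if = "Option::is_none")]
    response: Option<Response>,
}
```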
Some transformation overhead can be addressed by using serde-wasm-bindgen, but in my example repo I use it only for the input side (deserialization). On serialization, a collection like HashMap or BTreeMap gets turned into an ES2015 Map, which is unfortunate as well, because Maps cannot be JSON-stringified.
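Roughly, that split looks like this; a minimal sketch with a simplified Request type (JsValue::from_serde requires wasm-bindgen's serde-serialize feature):

```rust
use serde::{Deserialize, Serialize};
use wasm_bindgen::prelude::*;

#[derive(Serialize, Deserialize)]
struct Request { uri: String, method: String }

#[wasm_bindgen]
pub fn roundtrip(event: JsValue) -> Result<JsValue, JsValue> {
    // Input: serde-wasm-bindgen reads the JS object directly, no JSON detour.
    let request: Request = serde_wasm_bindgen::from_value(event)
        .map_err(|e| JsValue::from(e.to_string()))?;

    // Output: JsValue::from_serde goes through JSON and therefore produces
    // plain JS objects instead of ES2015 Maps for map-like collections.
    JsValue::from_serde(&request).map_err(|e| JsValue::from(e.to_string()))
}
```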
As you can see, currently there are trade-offs to be made in all corners, but that shouldn't stop us from exploring further.
In the current state of the project I provide pretty strict structs and enums for the CloudFront event data; they even surpass the TypeScript interfaces now, which makes my point from the previous section pretty obsolete. I still wish it were easier to autogenerate Rust types from TS definitions. The only good thing about CloudFront-related data is that it won't change much … if at all. Some APIs in AWS have been stable for years now, so a "write once, use forever" approach might be sufficient.
Performance
I tested against a simple CloudFront distribution with an S3 origin; within that bucket sits a small static HTML file to be served.
I live in Berlin, Germany; my closest AWS region is eu-central-1 (Frankfurt), and the CloudFront POP is usually FRA2-C1 as well.
I have a pretty stable and fast connection: 100 Mbit/s or more in download speed and a ping around 20 to 30 ms are common for me.
Keep in mind: the following numbers are only true for me. You might observe completely different performance. I therefore recommend measuring the different baselines yourself.
All tests will use the following command:
wrk -d 30 -c 5 -t 5 -R 5 -L https://<MY_DISTRIBUTION_DOMAIN>/
- -d 30 - run it for 30 seconds
- -c 5 - only 5 connections, it's not supposed to be a heavy load test
- -t 5 - just aligning it to the connection count
- -R 5 - rate (throughput) of 5 requests/s, also to avoid unnecessary contention
- -L - latency statistics; I will only use the shorter summary
The numbers are fairly low because I don't want to test the overall performance of CloudFront in my area, but want to get consistent and repeatable numbers for L@E in general. Under high load the performance is impacted by many other factors, which we cannot really control.
To bypass the cache, all functions will be deployed as Viewer Request triggers, with "Include body" enabled to give the full picture (no body payload will be sent or used, though).
Furthermore, after deployment I will run a warm-up round to eliminate the cold-start period; I'm not interested in those bad numbers. I know they are horrible, but we cannot really do anything about it; eventually a function will be torn down and a new one needs to be spawned. So each wrk test will be run twice, and only the second run is used here.
"No trigger" baseline
What's in it?
Bare CloudFront distribution with S3 origin, small HTML file as index (~1.4 kB filesize).
Results
Running 30s test @ https://<MY_DISTRIBUTION_DOMAIN>/
5 threads and 5 connections
Thread calibration: mean lat.: 29.903ms, rate sampling interval: 56ms
Thread calibration: mean lat.: 29.821ms, rate sampling interval: 53ms
Thread calibration: mean lat.: 29.696ms, rate sampling interval: 51ms
Thread calibration: mean lat.: 30.232ms, rate sampling interval: 65ms
Thread calibration: mean lat.: 30.708ms, rate sampling interval: 66ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.39ms 1.30ms 29.10ms 84.00%
Req/Sec 0.98 4.00 20.00 94.23%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 23.09ms
75.000% 23.73ms
90.000% 24.64ms
99.000% 29.04ms
99.900% 29.12ms
99.990% 29.12ms
99.999% 29.12ms
100.000% 29.12ms
Let's remember a p99 of around 30 ms. Depending on internet weather it varies a bit; ±5 ms is not uncommon.
"Empty Node.js handler" baseline as Viewer Request trigger
What's in it?
Empty here means only that the handler is not doing anything.
'use strict';
exports.handler = async (event) => {
return event.Records[0].cf.request;
};
Yep, that's all: take the request object out of the event and pass it down. It's the smallest possible function you could write. It would also be the most useless one.
Results
Running 30s test @ https://<MY_DISTRIBUTION_DOMAIN>/
5 threads and 5 connections
Thread calibration: mean lat.: 43.793ms, rate sampling interval: 81ms
Thread calibration: mean lat.: 47.265ms, rate sampling interval: 106ms
Thread calibration: mean lat.: 44.360ms, rate sampling interval: 90ms
Thread calibration: mean lat.: 44.558ms, rate sampling interval: 84ms
Thread calibration: mean lat.: 44.164ms, rate sampling interval: 81ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 37.21ms 3.76ms 56.29ms 87.00%
Req/Sec 0.97 3.15 12.00 91.21%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 36.32ms
75.000% 37.95ms
90.000% 40.19ms
99.000% 49.98ms
99.900% 56.32ms
99.990% 56.32ms
99.999% 56.32ms
100.000% 56.32ms
So the p99 baseline with Lambda@Edge active is now 50 ms. On rainy days that's basically twice as slow as without any triggers. Again, account for some variance of around ±5 ms.
The reported duration in the CloudWatch logs for the function is between 0.84 ms and 1.53 ms, so simplified, around 1 ms on average. This begs the question of where the rest of the overhead went. There are roughly 20 ms unaccounted for and missing. A tribute to the performance gods? I don't know. 🤷
Just keep that gap in mind. I guess this is the overhead between the CloudFront request handling and the call out to the L@E execution environment; somehow all this stuff needs to be orchestrated behind the scenes. It's just sad that I cannot find those timings anywhere. The pure CloudFront logs are not conclusive either.
The maximum memory used is 67 to 68 MB.
The simple Rust/WebAssembly module as Viewer Request trigger
What's in it?
See the code in the repo; I used the exact same version of it.
Results
Running 30s test @ https://<MY_DISTRIBUTION_DOMAIN>/
5 threads and 5 connections
Thread calibration: mean lat.: 44.980ms, rate sampling interval: 90ms
Thread calibration: mean lat.: 50.256ms, rate sampling interval: 167ms
Thread calibration: mean lat.: 44.296ms, rate sampling interval: 92ms
Thread calibration: mean lat.: 45.668ms, rate sampling interval: 87ms
Thread calibration: mean lat.: 49.321ms, rate sampling interval: 154ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 36.71ms 2.23ms 47.78ms 82.00%
Req/Sec 0.95 2.82 11.00 89.07%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 36.38ms
75.000% 37.34ms
90.000% 38.62ms
99.000% 44.32ms
99.900% 47.81ms
99.990% 47.81ms
99.999% 47.81ms
100.000% 47.81ms
No, the Wasm function didn't magically get faster than the "no-op" node version. With a p99 around 45 ms it is just in the same ±5 ms corridor.
We can conclude from this that the module incurs no significant performance hit.
The reported duration is usually somewhere between 1.25 ms and 1.50 ms. I haven't seen it drop any lower, so let's say there are about 250 µs on top of the average node baseline.
The maximum memory used is between 75 and 77 MB. I guess this is the additional allocation for the WebAssembly module, yet I'm not too worried about that. I assume the overhead can be amortized by running more memory-efficient code within the module instead of the node environment. I'm pretty sure that a plain old JavaScript object needs more memory than a Rust struct.
Conclusion
This is all great news: you can run WebAssembly on AWS Lambda@Edge without a noticeable performance penalty. Now write your Rust code and run it on the edge.
Of course I do hope that in the future this will become more native. There's a lot of development happening in the WebAssembly space.
But maybe I've also given AWS a reason not to move any faster, because we can solve the problem ourselves. And for a behemoth organization like them it can take many years to deliver even the smallest improvements we consider no-brainers … and then we wonder why it took them so long in the first place.
Yet I stay optimistic in general. I know that they know that Edge Computing is hot stuff right now. They even launched a very specialized offering called AWS Wavelength. I'm looking forward to testing it once it's more widely available.