I’m going to talk about my recent experience with serverless.
I’ve recently had the pleasure of working with serverless, namely AWS Lambda, and, to cut a long story short, I’m impressed. There are some very particular and significant benefits to going down the serverless route, but there are also some key issues that need to be overcome. I’m going to discuss some of those here, along with some of the architecture decisions that were made and whether I think they were the right ones.
So, lambdas, or serverless units, what are they?
They are small functions which operate for a short amount of time and then die.
So how should I treat them?
Treat them as small, stateless bash scripts or something similar. You aren’t going to be loading an IoC framework, you aren’t going to be loading some magic do-everything framework; you are going small, tiny, minuscule. For example, whilst you can program some lambdas using Java, I wouldn’t hesitate to banish Spring and Hibernate from the toolchain: they are too big and too clunky to start pulling into your lambdas.
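To make that concrete, here’s a sketch of the shape a lambda should have, in Python (the handler name and event fields are illustrative, not from any particular project):

```python
import json

def handler(event, context):
    """A minimal, stateless lambda: take input, do one small job, return."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

No framework bootstrapping, no global wiring: everything the function needs arrives in the event, and everything it produces goes out in the return value.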
Ok, no frameworks, I get it…
No, not no to all frameworks, just the big and clunky ones. You need to be pragmatic here, and you need to understand what a framework is pulling in.
… okay, but why?
This is where we need to understand some key constraints of serverless.
The first time a lambda is run, it does some initialisation: think Docker containers, there is a bit of stuff going on under the hood. Getting an answer from a lambda for the first time will take a lot longer than any immediately subsequent invocation, because a container has to be instantiated with your code before it gets executed. The container doesn’t immediately die and is available for subsequent executions if, and only if, those executions occur within some unspecified amount of time (cloud provider dependent), say 5 minutes. So, if you have a request every second then you are good: the first invocation will probably take up to 5 seconds (depending on your code) and subsequent invocations will take circa 500ms (all numbers are made up but realistic; the timings completely depend on your code and what you are doing).
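You can see the cold/warm split directly in how handler code is laid out. In this sketch (the “expensive setup” is hypothetical), the module-level code runs once when the container is created, and every warm invocation in the same container reuses the result:

```python
import time

# Module-level code runs once, at cold start, when the container is created.
print("cold start: initialising...")  # stands in for expensive setup
EXPENSIVE_CONFIG = {"loaded_at": time.time()}

def handler(event, context):
    # Warm invocations skip the setup above and start here.
    return {"config_loaded_at": EXPENSIVE_CONFIG["loaded_at"]}
```

Calling the handler twice in the same container returns the same `loaded_at` timestamp, which is exactly the reuse behaviour described above.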
If you are going to start pulling in a framework, Spring for example, then each time Spring is initialised, and depending on the amount of RAM and CPU you allocate to your lambda, you could potentially be looking at several minutes of pure initialisation. You would need to adjust your lambda timeout from the default, 6 seconds in AWS Lambda, to something like 180 seconds to be safe. Then, if you haven’t got into the habit of tidying up after yourself, exiting properly, and so on, you are suddenly looking at an ugly situation. If something is failing, or you end up doing eager fetching and start loading your entire DB into memory, then it’s going to be 3 minutes before you fail out; with multiple invocations, that’s bad news. You want small timeouts, and you just don’t have the headroom to start loading in those kinds of clunky frameworks. Don’t get me wrong, they have their place, but it’s not here.
… so, I can share state right?
Technically, yes, but you shouldn’t. At least, you shouldn’t share domain state. A good practice is to share DB connection state, which can seriously help your database, but if you start sharing and depending on domain state then just stop: it’s going to end badly. You can’t guarantee that your lambda container is going to live long enough, and you can’t guarantee that the provider isn’t going to kill all alive-but-inactive containers because of “reasons”. Don’t share state.
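The distinction is easy to express in code. In this sketch, sqlite3 stands in for your real database client; the connection lives at module level so warm invocations reuse it, while domain data is deliberately never cached:

```python
import sqlite3

# Connection state at module level: warm invocations reuse it.
# (sqlite3 is a stand-in here for your real database client.)
_connection = None

def get_connection():
    global _connection
    if _connection is None:
        _connection = sqlite3.connect(":memory:")
    return _connection

def handler(event, context):
    # Fine: reuse the connection.
    # Not fine: caching domain state (user records, feature flags, ...)
    # and trusting it to survive into the next invocation.
    cursor = get_connection().execute("SELECT 1")
    return {"ok": cursor.fetchone()[0] == 1}
```

If the container dies between invocations, the worst case is a fresh connection on the next cold start; cached domain state, by contrast, silently disappears.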
Yeah, ok, I get you. What’s this about RAM and CPU?
These are just typical pieces of config which determine the amount of RAM and CPU allocated to your lambda for the duration of its execution (in AWS you set the memory, and CPU is allocated proportionally to it). The more you use, the more you are charged.
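With the serverless.com framework, this (along with the timeout discussed above) is a couple of lines in `serverless.yml`; the function name and values here are illustrative:

```yaml
functions:
  processOrder:
    handler: orders.handler
    memorySize: 256   # MB; in AWS, CPU share scales with memory
    timeout: 6        # seconds; keep this small
```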
Ok. So, what else do I need to worry about?
One of the other big challenges with serverless is how you handle data and your database(s). This is a difficult problem to solve. In microservice land, the idea is that each microservice has its own data. That is a bit too granular for individual lambdas; however, if you use something such as serverless.com then you can create a serverless stack, which is just a collection of lambdas/functions/whatever the cloud provider chooses to call them, all able to share similar code. You then want to think about how you group these together, and typically I would suggest vertical slices of the business domain. The one exception I’d make is the UI set of lambdas: it can be extremely useful to treat the way the data is visualised as a separate business slice, which will save you some interesting headaches later on with permissions and so forth.
Woah there, slow down a bit.
Ok, the key things to take away from that last part are:
- Group individual lambdas into vertical business slices and serverless stacks
- Make each stack manage its own data, don’t share the DB between stacks
- Consider making the GET requests for the UI its own stack
- serverless.com makes this super easy
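As a sketch, a vertical-slice stack in serverless.com terms might look like the following `serverless.yml` (the service, handlers, and routes are all hypothetical):

```yaml
# orders-stack/serverless.yml - one stack per vertical business slice
service: orders

provider:
  name: aws
  runtime: python3.12

functions:
  createOrder:
    handler: orders.create
    events:
      - http:
          path: orders
          method: post
  cancelOrder:
    handler: orders.cancel
    events:
      - http:
          path: orders/{id}/cancel
          method: post

# A separate ui-stack would own the GET/read endpoints,
# keeping visualisation as its own slice.
```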
Ok, I’ve got you. So, why should I use them then?
This is a good question; lambdas and serverless implementations do present interesting challenges. It is important to realise that it’s a different set of challenges to microservices and monoliths; each has pros and cons, and it’s not a simple decision to go with any of the available options.
These are the main reasons why I’d go for lambdas:
- Pay for what you need
- Insane scaling capacity
- Zero server maintenance
- Easier to visually isolate functions
- Very quick to get going with serverless.com
I cannot overstate how important the above points are, so I’m going to elaborate a little here. You literally pay only for what you use, and unless you are running an insanely chatty application or service there is a good chance you’ll get a significant amount of capacity for free. Scaling just happens: you don’t need to worry about maxing out a server’s CPU or RAM, and you can invoke as many concurrent lambdas as you like (there may be some soft limits you need to have raised first; it’s worth checking each cloud provider for the exact nature of this). The next one is incredibly important: zero server maintenance. You don’t need to pay a team of people to manage bare metal (if you roll your own servers), or spend a significant amount of team time patching VMs and recovering from outages when VMs go down for some reason. Seriously, this is one of the big ones, especially for non-IT-focused companies. And because functions need to be so small, there is better motivation for teams to keep things small and simple; it requires discipline, but the benefits are a lot easier to see.
There are some issues with serverless computing that require thought. Architecting is a different beast now, and getting things split the right way is suddenly a lot more important. However, if you manage to work through those issues then there are significant advantages to going down the serverless route, as I’ve highlighted above. One additional thing to consider: if any part of your chain isn’t serverless in some way then it will instantly be the main bottleneck, i.e. if you have a DB instance rather than a serverless alternative, then the boxes/VMs backing it are likely to be the point of contention. You can manage this with good performance testing, but it’s worth keeping in mind.