Spawning workers based on queue workload
Main Thread • 5 min read
Earlier this week I deployed a feature which automatically spawns new worker servers for Shift based on the job queue workload. This reduced server costs by 97% and wait times by 80%.
In this post, I want to share the backstory leading up to this feature. If you're interested in the development of this feature, I wrote most of the code in this series of live streams.
The feature
When someone runs a Shift, it is pushed onto a job queue. The job is then picked up and processed by one of the worker servers. All of this is managed by Laravel and Horizon.
The challenge is that these jobs are pretty intense. They are very heavy on file I/O and require a lot of CPU and memory. Each job might take anywhere from 20 seconds to 10 minutes, and multiple jobs may run concurrently.
Depending on the worker's specs, there are only so many jobs I can process at a time. Processing more would require increasing the (v)CPUs and memory of the server. That's easily solved by paying more money.
So, the feature would add more workers based on the job queue workload. Although these workers would be smaller (less CPU and memory), there would be more of them. The classic: scale horizontally, instead of vertically. When the workload dropped, the extra workers would be removed.
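The post doesn't show the scaling logic itself, so here is a rough Python sketch of the decision it describes: add workers as the queue workload grows, remove them as it drains. `ScalingPolicy`, `desired_workers`, `scaling_delta`, and every threshold below are hypothetical, not Shift's actual code. Reading the workload from Horizon and calling a server provider to spawn or destroy machines are also left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # Hypothetical knobs; the post does not give Shift's real values.
    jobs_per_worker: int  # jobs one small worker can run concurrently
    min_workers: int      # workers that always stay up
    max_workers: int      # upper bound, so a spike can't spawn unlimited servers


def desired_workers(workload: int, policy: ScalingPolicy) -> int:
    """Workers needed for `workload` jobs (queued plus running), clamped to the policy bounds."""
    needed = math.ceil(workload / policy.jobs_per_worker)
    return max(policy.min_workers, min(needed, policy.max_workers))


def scaling_delta(current_workers: int, workload: int, policy: ScalingPolicy) -> int:
    """Positive: workers to spawn. Negative: workers to destroy. Zero: do nothing."""
    return desired_workers(workload, policy) - current_workers


if __name__ == "__main__":
    policy = ScalingPolicy(jobs_per_worker=2, min_workers=1, max_workers=4)
    print(scaling_delta(current_workers=1, workload=5, policy=policy))  # 2 (spawn two)
    print(scaling_delta(current_workers=4, workload=0, policy=policy))  # -3 (destroy three)
```

In this sketch, `max_workers` caps the cost of a spike and `min_workers` keeps a baseline worker around for incoming Shifts. A real system would also need to let a worker finish its in-flight jobs before destroying it, which this sketch ignores.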
YAGNI, YAGNI, OK
Y'all know me, I'm all about YAGNI (you aren't gonna need it). I call YAGNI on a lot of things. Sometimes I call YAGNI simply because other developers don't call YAGNI enough.
I called YAGNI on this feature. It bounced around my Todo List for over two years. Each time it bubbled up, it was deprioritized. Either other features were more pressing or I called YAGNI.
I felt comfortable doing so because it truly wasn't needed. Two years ago there weren't many subscribers to a Shifty Plan. So aside from a few days after a new Laravel release, I wasn't maxing out my current workers.
Last year, while subscribers had grown, there was no new Laravel release. So the workers only maxed out during the weekly subscriber automation. This automation runs on a separate worker, so it doesn't prevent incoming Shifts from running immediately.
This year, with the release of Laravel 9 (which had been postponed for 18 months), there was an increase in runs. Instead of the usual few-day surge following the release, it lasted over a week. In addition, the new release led to more subscriptions. I'd also added services like the Test Generator, CI Generator, and Workbench, all of which were more intensive to run.
So when this feature bubbled up to the top of the list again, I said, "OK".
It's not about cost, it's about time
The additional worker servers were $15/month. While the feature interested me, it was impossible to justify the development cost. Sure, Shift is just me, so I can spend as much or as little time on it as I want.
It's true, Shift is a Company of One. But it's important to remember the “Company” part. Companies need to be profitable to survive. It's easy to think I individually could spend my time building any feature I want. But within the context of the company, I could be wasting resources. One little feature doesn't seem like a big deal. But enough wrong features and Shift might not be what it is today.
All this is to say that it's important to value your time. Not valuing their time is actually one of Shift's biggest competitors. Despite Shift's incredible value, devs want to upgrade their applications manually, simply to avoid spending $19. Some might think they're saving money, but that's because they don't value their time.
I do. Time is the most precious resource I own. I'd rather spend time with my family, doing live streams, or woodworking than on a feature that no one needs.
Said another way, I don't like wasting time. And bringing that back to the servers, when you pay for a server by the month, it sits unused the vast majority of that time. One of the things that finally sold me on building this feature was efficiency. I'd add servers when I needed them, then remove them when I didn't.
A win-win
From the intro hook, we know building this feature was a win-win for both savings and user experience. So, what were the wins?
Shifts run in real time. However, some of the subscriber automation takes longer to run, specifically the weekly automation, which runs any time Laravel tags a new release. This normally happens every Tuesday, but really they can tag a release at any time. Some weeks they tag multiple.
Getting through the automation for all of the subscribers could take up to 4 hours, which initially justified giving it its own worker server. But going back to the intensity of these jobs, reducing that time would have required increasing the size of the server, and therefore paying more.
This feature allowed me to scale horizontally, instead of vertically. In this case, instead of one larger server processing all the jobs, multiple smaller servers would process them. This switch actually reduced the runtime of the weekly automation from 4 hours to 32 minutes.
In addition, because I'm spawning new workers during demand and destroying them during lulls, I pay the hourly rate. Scaling vertically, I'd pay a continually increasing monthly amount ($30/month, $35/month, etc.), with the server sitting idle most of that time.
Scaling horizontally, I use a smaller server, but spawn up to 3 of them. These servers are roughly $0.04/hour. That means instead of $30/month, I pay $0.48/month ($0.04/hour/server x 3 servers x 4 hours/month).
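The arithmetic behind those numbers can be checked in a few lines. This Python sketch only reuses the post's own figures ($30/month for the vertically scaled server, roughly $0.04/hour per smaller server, 3 servers, about 4 hours/month); the function name is hypothetical, and `Decimal` is used so the dollar amounts don't pick up floating-point rounding noise.

```python
from decimal import Decimal


def monthly_hourly_cost(rate_per_hour: Decimal, servers: int, hours_per_month: int) -> Decimal:
    """Monthly cost when servers are billed only for the hours they exist."""
    return rate_per_hour * servers * hours_per_month


# Figures taken from the post.
VERTICAL_MONTHLY = Decimal("30")
horizontal_monthly = monthly_hourly_cost(Decimal("0.04"), servers=3, hours_per_month=4)
savings = VERTICAL_MONTHLY - horizontal_monthly

print(f"horizontal: ${horizontal_monthly}/month")  # horizontal: $0.48/month
print(f"savings:    ${savings}/month")             # savings:    $29.52/month
```

The difference of $29.52/month lines up with the roughly $29/month saving mentioned in the post.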
Shift is all about automation
The savings were impressive. They surpassed even the best-case scenario from the calculations I ran to justify the feature. It's really blown me away.
Yet aside from the user experience improvements, the cost savings are a bit laughable. Shift has made over $1,000,000 in revenue. Saving $29/month is arguably inconsequential. Heck, the fact that I was only paying $30/month was already good. So paying $0.48/month is ridiculous.
That kind of fits Shift, though: ridiculous automation for a ridiculous value. Automatically spawning and destroying servers based on queue workload means one less thing I have to worry about. Less worry keeps working on Shift and running the business fun.