AWS Lambda taught me to write better queries.
If you have worked in a startup, especially a bootstrapped one with no eye-watering valuation or seed funding, you know that the scarcest resource is money, and that a lot of time goes into saving as much of it as possible. Our stack is Django, MySQL and VueJS.
We had a dozen customers using one of our products, and our server fees were growing by the day, already hitting $100/mo. For context, that's over 36,000 naira. That's not a lot, but we were struggling with our business model and could barely break even, let alone make a profit. Only 2 of our customers, both considerably large, were profitable; the other clients merely consumed more resources. We needed to find ways to cut our server cost, fast. Looking at our server configuration, we were only fully utilizing our servers between 1PM and 4PM, when all clients are very active. After that, utilization drops, and by 8PM it sits flat below 5% until it picks up again around 8AM the next day.
So it was time for a shift, and we began moving to serverless with AWS Lambda. The offer seemed juicy: our estimates showed we could cut our server fees by over 60%, i.e. stop paying for servers at night and on weekends, and we would get better availability too. Meanwhile, our diamond customer had a data conversion from their service providers coming up in a few weeks, one that required importing over 30,000 pages of PDF documents into the database. By the end of the conversion, in a single day, we had run nearly 1 million database insert queries.
With AWS Lambda and Aurora you pay for what you use (execution time, memory, I/O and queries), and we were still testing before fully migrating to Lambda. Within a few days of repeatedly converting data from PDFs, our bill jumped to over $20 USD. Here are a few tricks we used to bring that cost down.
1. Read operations are cheaper
Read operations are cheaper than write operations, so we used reads to reduce the total number of writes.
Here's what I mean: whenever a new PDF document was uploaded for conversion, we used to empty our DB records and upload entirely new ones. Why? Because each new upload invalidates the previous conversion. It's a bit complex to explain, but imagine a hospital that receives a list of people with health insurance coverage, and every month this list changes (people stop renewing their insurance, are deceased, have been fired or have retired from service, or new people are enrolled); other details such as address and age can change too. The fastest and most convenient approach was simply to invalidate the last upload and recreate all records from scratch.
Instead, we used read operations to first check the database for records that were missing or had changed, and then inserted only the new rows, as sketched below. This cut the number of writes drastically and hence reduced our cost, though not by a huge amount.
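Here's a minimal sketch of that idea, assuming a hypothetical InsuredPerson model with a unique policy_number field and a list of dicts parsed from the uploaded PDF (the names here are illustrative, not our actual code):

from myapp.models import InsuredPerson  # hypothetical app and model

def insert_missing(parsed_rows):
    # parsed_rows: list of dicts from the PDF, each with a unique 'policy_number'
    incoming_keys = {row["policy_number"] for row in parsed_rows}

    # One cheap read: fetch the keys that already exist in the database.
    existing_keys = set(
        InsuredPerson.objects
        .filter(policy_number__in=incoming_keys)
        .values_list("policy_number", flat=True)
    )

    # Write only the rows we don't already have.
    new_rows = [row for row in parsed_rows if row["policy_number"] not in existing_keys]
    for row in new_rows:
        InsuredPerson.objects.create(**row)

Updated records (say, a changed address) would still need a separate comparison and update pass, which I've left out here for brevity.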
2. Bulk operations instead of Transactions
Transactions can speed up database writes, especially when you're writing across several tables. However, with Django there's something even faster.
from django.db import transaction

with transaction.atomic():
    [Person.objects.create(**person) for person in people]
The snippet above is cool, since everything goes out in one transaction, but it still issues one INSERT query per person. This is even better:
Person.objects.bulk_create([Person(**person) for person in people])
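bulk_create collapses the whole list into a handful of INSERT statements instead of one per row. For very large lists it also accepts a batch_size argument to split the work into chunks; a rough sketch, reusing the same hypothetical people list (500 is just an illustrative value):

# One INSERT per 500 rows instead of one query per row.
Person.objects.bulk_create(
    [Person(**person) for person in people],
    batch_size=500,
)

Fewer queries means less of what Lambda and Aurora actually bill you for: execution time, I/O and query count.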
There's a lot more that can be done to improve; I'm still actively learning, and if this sort of optimization interests you, a good resource is Two Scoops of Django 3.x: Best Practices for the Django Web Framework.