Sales Visit Tracking System Design: Exploring EC2 vs. Serverless Architectures

Balancing Cost, Performance, Reliability and Complexity Between EC2-Based and Serverless Architectures for Handling Critical Data

Zaid Akel
6 min read · Sep 21, 2024

Problem Statement

A customer asked us to build a sales visit tracker application for the sales department of a medical device company. The company has 30 salesmen; each visits 15 medical clinics a day and has to record the visit details in the new application. The customer already has a database on their on-premises server, which must be used to store the visit details. Visit data is crucial: the customer can't afford to lose any visit details, and they want to keep the hosting cost as low as possible.

Expected Traffic

  • Transactions per day: 30 salesmen × 15 visits = 450
  • Transactions per month: 450 × 26 working days = 11,700
  • Seconds per working day: 8 hours × 60 minutes × 60 seconds = 28,800
  • Average TPS: 450 (daily transactions) / 28,800 (daily seconds) ≈ 0.016 (see the sanity check below)
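
As a quick sanity check, here is the same arithmetic in Python. Note that the average TPS divides the daily transactions by the daily working seconds, since both figures cover the same period:

salesmen = 30
visits_per_salesman_per_day = 15
working_days_per_month = 26
working_seconds_per_day = 8 * 60 * 60             # 28,800

daily_tx = salesmen * visits_per_salesman_per_day  # 450
monthly_tx = daily_tx * working_days_per_month     # 11,700
avg_tps = daily_tx / working_seconds_per_day       # ~0.016

print(daily_tx, monthly_tx, round(avg_tps, 3))     # 450 11700 0.016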

Payload

{
  "clientId": "102",
  "userId": "23",
  "proposedDevices": [1, 2, 5],
  "soldDevices": [1, 5],
  "dateTime": "2023-09-20T15:45:30Z",
  "clientNotes": "",
  "additionalNotes": "",
  .....
}
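
For illustration, the payload could be modeled with a Python dataclass like the sketch below. Only the fields shown above are included; the elided fields are left out, and the class name is my own:

from dataclasses import dataclass
from typing import List

@dataclass
class Visit:
    clientId: str
    userId: str
    proposedDevices: List[int]   # device IDs proposed during the visit
    soldDevices: List[int]       # device IDs actually sold
    dateTime: str                # ISO 8601, e.g. "2023-09-20T15:45:30Z"
    clientNotes: str = ""
    additionalNotes: str = ""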

System Design Acceptance Criteria

In addition to meeting the functional requirements, the system must satisfy the following non-functional requirements:

  • Fault tolerance: failover mechanisms should be in place to ensure that the system continues to operate seamlessly even during outages.
  • Scalability: the system should account for a higher number of transactions. Assume the company hires 5 more salesmen and each increases their productivity to 20 visits a day; the system should continue to operate normally or require only minimal infrastructure modifications.
  • Performance: response times for adding new visits should be minimal, even if all salesmen add a new visit in the same second (30 TPS).
  • Cost efficiency: the system must be up and running during working hours; the customer wishes to avoid incurring additional costs outside working hours.

Option #1: Three-Tier Architecture

Build a new web application, deploy it on an EC2 instance, and leverage the customer's database by creating new tables for users and configuration. Salesmen access the application, which in turn stores new transactions synchronously in the customer's database and confirms to the salesman on the other end that their visit was stored successfully.
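
As a sketch of this synchronous write path, assuming a Flask application and a hypothetical store_visit_on_premises() helper that performs the INSERT against the customer's database:

from flask import Flask, jsonify, request

app = Flask(__name__)

def store_visit_on_premises(visit: dict) -> None:
    # Hypothetical helper: INSERT the visit into the on-premises database
    # using the customer's DB driver; raises an exception on failure.
    raise NotImplementedError

@app.route("/visits", methods=["POST"])
def add_visit():
    visit = request.get_json()
    # Synchronous: the request blocks until the on-premises INSERT returns.
    store_visit_on_premises(visit)
    return jsonify({"status": "stored"}), 201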

[X] Fault-tolerant: Any failure to connect to the on-premises server will lead to data loss, unless the application detects the failure and either asks the user to retry later or stores the visit locally to be resent to the database later. The first option is poor user experience, and the latter adds complexity to the application by requiring a retry mechanism.
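
The "store locally and resend" option could look like the sketch below: a failed insert appends the visit to a local spool file, and a background loop replays it later. This is a simplified, hypothetical sketch; a production version would also need locking, deduplication, and backoff:

import json
import os
import time

SPOOL_FILE = "unsent_visits.jsonl"   # hypothetical local buffer

def store_visit_on_premises(visit: dict) -> None:
    raise NotImplementedError        # hypothetical on-premises INSERT, as above

def store_with_fallback(visit: dict) -> bool:
    # Try the synchronous insert; buffer the visit locally if it fails.
    try:
        store_visit_on_premises(visit)
        return True
    except Exception:
        with open(SPOOL_FILE, "a") as f:
            f.write(json.dumps(visit) + "\n")
        return False

def retry_loop(interval_seconds: int = 60) -> None:
    # Background job: periodically replay buffered visits, keeping failures.
    while True:
        if os.path.exists(SPOOL_FILE):
            with open(SPOOL_FILE) as f:
                pending = [json.loads(line) for line in f if line.strip()]
            remaining = []
            for visit in pending:
                try:
                    store_visit_on_premises(visit)
                except Exception:
                    remaining.append(visit)
            with open(SPOOL_FILE, "w") as f:
                for visit in remaining:
                    f.write(json.dumps(visit) + "\n")
        time.sleep(interval_seconds)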

[X] Performance: Given the synchronous nature of this architecture, salesmen must wait until the data is stored in the on-premises server, which might take long due to network latency or a slower-than-expected response from the on-premises server.

[X] Scalability: This architecture can easily be scaled vertically by adding more resources to the server, or horizontally by replicating the EC2 instance and adding a load balancer in front. However, scalability is still limited by the on-premises database, which might struggle to handle traffic spikes.

[✓] Cost: Given the low traffic, an on-demand t4g.small EC2 instance should be enough, which costs about $12.26 per month (as of the time of writing this article). Reserved EC2 instances are even cheaper, but committing to 1–3 years is not an option.
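
For reference, that monthly figure is just the hourly on-demand rate times roughly 730 hours in a month (the ~$0.0168/hour rate is an assumption based on us-east-1 pricing at the time of writing; rates vary by region and change over time):

# t4g.small on-demand, ~730 hours per month
print(0.0168 * 730)   # ≈ 12.26 USD per month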

[✓] Complexity: This design is straightforward and introduces only one new component (EC2). However, to achieve the required reliability, we would have to store visit data temporarily and build a retry mechanism, which increases the complexity of the new system.

Option #2: Serverless Architecture

A loosely coupled system that doesn't directly depend on the customer's on-premises server. The proposed architecture consists of compute and storage resources, and stores the visits asynchronously in the customer's database.

  1. CloudFront: the entry point for the application. It has two behaviors configured based on the requested URL: the first serves frontend assets from an S3 bucket, and the second forwards visit requests to an API Gateway.
  2. S3: two buckets, one for storing frontend assets and a second for storing visit data as a backup mechanism.
  3. DynamoDB: stores configuration and user data.
  4. API Gateway: triggers the Lambda function to process user requests.
  5. Lambda: handles salesmen's requests to log in, add new visits, and return lookups (see the sketch after this list).
  6. SQS: acts as the integration point between the new application and the customer's on-premises database. The maximum retention time for unconsumed visits is 14 days.
  7. Visits Listener Application: a background job that listens to the SQS queue for new visits, consumes them, and inserts them into the customer's database.
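
A minimal sketch of the add-visit Lambda handler in Python with boto3. The queue URL and bucket name are placeholders (in practice they would come from environment variables), and authentication and validation are omitted:

import json
import uuid

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.<region>.amazonaws.com/<account>/visits-queue"  # placeholder
BACKUP_BUCKET = "visits-backup"                                          # placeholder

def handler(event, context):
    # Triggered by API Gateway: enqueue the visit and store a backup copy.
    visit = json.loads(event["body"])
    body = json.dumps(visit)

    # 1. Enqueue the visit for the on-premises listener to consume.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)

    # 2. Back up to S3 in case the message expires (14-day retention).
    key = "visits/{}/{}.json".format(visit["userId"], uuid.uuid4())
    s3.put_object(Bucket=BACKUP_BUCKET, Key=key, Body=body)

    return {"statusCode": 201, "body": json.dumps({"status": "queued"})}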

[✓] Fault-tolerant: With a 99.9% or higher uptime SLA for the AWS services used, we shouldn't worry about the system's availability in terms of infrastructure. If the customer's on-premises server is down for more than 14 days, visits can still be recovered from the S3 bucket. Lastly, if the visits listener application fails to consume a visit, it is automatically returned to the queue a predefined number of times as a retry mechanism, and eventually moved to a dead-letter queue if it is never processed successfully.
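
The visits listener could be a long-polling loop like the hypothetical sketch below. Deleting a message only after a successful insert is what makes SQS redeliver failed visits and, after the configured number of receives, route them to the dead-letter queue:

import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.<region>.amazonaws.com/<account>/visits-queue"  # placeholder

def insert_visit(visit: dict) -> None:
    # Hypothetical: INSERT into the customer's database; raises on failure.
    raise NotImplementedError

def run() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            try:
                insert_visit(json.loads(msg["Body"]))
            except Exception:
                continue   # leave it on the queue; SQS redelivers, then DLQ
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])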

[✓] Performance: Since visits are sent asynchronously to the customer's database, salesmen only wait for the Lambda processing time, which covers inserting a new message into SQS and storing a new file in S3.

[✓] Scalability: Leveraging serverless services, all the AWS resources used scale automatically.

[✓] Cost: Assuming 15,000 transactions per month, based on the anticipated 11,700 transactions per month with room for expansion, the monthly cost should be less than $1. Let's check the breakdown, which is based on the AWS Pricing Calculator and takes Always Free services into account (a quick sanity check in code follows the list).

  • CloudFront ($0): the free tier includes 1 TB of data transfer out to the internet and 10 million HTTP(S) requests
  • Lambda ($0): the free tier includes 1 million free requests per month and up to 3.2 million seconds of compute time (at the minimum 128 MB memory setting)
  • DynamoDB ($0): the free tier includes 25 GB of storage, plus read/write capacity enough to handle up to 200M requests per month
  • API Gateway ($0.53): for simplicity, let's assume other API calls, such as login and lookups, increase API requests by 10X, reaching 150,000 REST calls per month
  • SQS ($0.10): pushing 15,000 requests to SQS and consuming up to 1 GB per month over the internet
  • S3 ($0.17): let's assume 150,000 GET requests to serve frontend assets transferring a total of 10 GB, 15,000 PUT requests to store visits, and a total storage of 1 GB monthly
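
As a quick sanity check of the two paid line items, using the published us-east-1 rates at the time of writing (treat the exact rates as assumptions; the calculator adds data transfer and rounding on top):

# API Gateway REST calls: $3.50 per million requests (first pricing tier)
api_gateway = 150_000 * 3.50 / 1_000_000                       # ≈ $0.53

# S3: $0.0004 per 1,000 GETs, $0.005 per 1,000 PUTs, ~$0.023 per GB-month
s3_cost = (150_000 * 0.0004 + 15_000 * 0.005) / 1_000 + 0.023  # ≈ $0.16

print(round(api_gateway, 2), round(s3_cost, 2))  # 0.53 0.16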

[X|✓] Complexity: This design adds 5 more resources compared to the previous one. Its complexity depends on the team's familiarity with the cloud and AWS services. However, you get some features out of the box, such as retries on failed messages, throttling through API Gateway, and caching of frontend assets at the CloudFront distribution level. Also, the operational overhead should be lower, since the engineering team doesn't need to manage infrastructure directly.

To conclude, I believe the second option adds slightly more complexity, but it is justified by all the other non-functional requirements. Comparing both options, the cost saving reaches up to 12X for 15,000 transactions when leveraging the AWS free tier, and the second option would remain cheaper even if the number of transactions quintupled. But what if we were building software that receives 15 million transactions per month? Would a serverless architecture remain cheaper?
