Learn the advantages of scaling at the frontend to deliver user experiences faster, more consistently, and with higher quality.
When you are dealing with an application visited by millions of people every day, handling hundreds of transactions per second, the focus is always on scaling the backend: resiliency, virtual machine tuning, and so on. Very few people actually talk about using cache tiers to scale at the frontend.
With the scale-the-backend approach, we carry a huge operations and infrastructure cost in order to keep the application at world-class availability. We always talk about capacity planning and scalability to handle an ever-increasing amount of traffic. We look at CPU cycles, threads, and the number of HTTP requests, all very important things to measure, monitor, and plan around, but we rarely focus on scaling the frontend; in most cases, that subject is simply off the table.
Scaling at the backend involves allocating enough machines (physical and/or virtual) or enough resources to handle the expected traffic. This means a large infrastructure to monitor and maintain, and if we have many nodes, we need a load balancer too, which adds yet another layer to maintain and monitor.
We all know that caching allows us to serve content faster, with fewer origin hits, and even with higher levels of availability, but we often use it as-is rather than in an intentional, intelligent way. The good news is that we do not need a huge infrastructure to maintain and monitor a cache, because far fewer requests end up hitting the origin.
When working with the frontend, we have three main types of cache, defined by where the cache lives.
1. Browser cache
With browser cache, the browser reads the server response, checks the cache control rules, and stores the response on the user's computer. For subsequent requests, the browser does not need to go to the server; it serves the content from the local copy.
Browser cache is the fastest cache to retrieve from and the easiest to use, but it is the one we have the least control over. Because we cannot invalidate browser cache on demand, end users have to clear their own cache. In addition, certain browsers may choose to ignore rules that specify not to cache certain content in favor of their own strategies, such as offline browsing.
Figure 1 – Firefox cache
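As an illustration, here is a minimal sketch of the server side of this contract, assuming a Node.js origin written in TypeScript (the content type and the one-hour TTL are arbitrary example values): the Cache-Control header tells the browser it may serve its local copy for that long without contacting the server again.

```typescript
import http from "node:http";

// Minimal sketch: the server attaches browser cache rules to the response.
// The TTL (one hour) is an arbitrary example value.
http.createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/css",
    // The browser may reuse its local copy for up to 3600 seconds.
    "Cache-Control": "public, max-age=3600",
  });
  res.end("body { margin: 0; }");
}).listen(8080);
```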
2. Content Delivery Network (CDN) cache – Proxy cache
With a proxy cache, the request for content hits this cache layer first, and the cached content is served instead of going to the origin server.
It is time to think about a Content Delivery Network (CDN) and Edge caching. A CDN's purpose is to provide availability and performance for content served over the Internet. There are several ways this is accomplished, from providing Global Traffic Management (GTM) services that route requests to the closest or fastest datacenter, to providing Edge serving.
When a user in San Francisco tries to reach your site in your datacenter in Hong Kong, the request travels across numerous hops. Each hop is a router connected to the Internet, and each hop adds tens or even hundreds of milliseconds of latency. The routers are intelligent enough to find the fastest route, but you can minimize the latency with Edge caching.
Edge caching
Edge caching is where a CDN provides a network of geo-distributed servers that, in theory, reduce load time by moving the serving of the content closer to the end user. This is called Edge serving, because the serving of the content has been pushed to the edge of the network, and the servers that serve the content are sometimes called Edge nodes.
With Edge caching, your datacenter is still located in Hong Kong, but the CDN has an Edge node near San Francisco that mirrors your content. You have cut the long trip from San Francisco to Hong Kong down to just a few hops and a few milliseconds of latency, therefore reaching the end user faster.
Now, you just need to set the right Cache-Control and ETag headers to serve your content. When you combine the benefits of GTM and Edge caching, you drastically increase your potential uptime.
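As a sketch of what that might look like at the origin, again assuming a Node.js server in TypeScript (the TTL values and payload are placeholders): max-age governs the browser, s-maxage governs shared caches such as Edge nodes, and the ETag lets either layer revalidate cheaply with a 304.

```typescript
import http from "node:http";
import { createHash } from "node:crypto";

const body = JSON.stringify({ products: ["..."] });
// A validator derived from the content; the hashing scheme is an example.
const etag = `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;

http.createServer((req, res) => {
  // max-age: browsers; s-maxage: shared caches such as CDN Edge nodes.
  res.setHeader("Cache-Control", "public, max-age=60, s-maxage=600");
  res.setHeader("ETag", etag);

  // A revalidation request with a matching ETag gets a body-less 304.
  if (req.headers["if-none-match"] === etag) {
    res.writeHead(304).end();
    return;
  }
  res.writeHead(200, { "Content-Type": "application/json" }).end(body);
}).listen(8080);
```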
If you have two or more datacenters that host the content and one of them fails, the CDN will notice that the origin is not responding and automatically switch to your healthy datacenter. Even if all your datacenters go offline, the CDN will serve the last successful response it cached, better known as the Last Known Good.
Another benefit of using a CDN is traffic offload, expressed as the CDN cache hit ratio: the share of traffic absorbed by the CDN that never touches the origin.
The ratio is calculated by dividing the number of offloaded requests (cached responses) by the total number of requests. If your CDN offload is 90% and you have 25 million requests daily, the CDN absorbs 22.5M requests, and your origin responds to just 2.5M. That means instead of handling roughly 289 requests per second, with the CDN your origin handles only about 29, drastically reducing the amount of infrastructure you need at the origin.
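The arithmetic, using the example numbers above and averaging over a day:

```typescript
// Offload arithmetic with the example numbers from the text.
const dailyRequests = 25_000_000;
const offloadRatio = 0.9; // 90% CDN cache hit ratio

const originRequests = dailyRequests * (1 - offloadRatio); // 2.5M reach the origin
const secondsPerDay = 24 * 60 * 60; // 86,400

console.log((dailyRequests / secondsPerDay).toFixed(0)); // ~289 rps without the CDN
console.log((originRequests / secondsPerDay).toFixed(0)); // ~29 rps at the origin
```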
3. Application cache
With the application cache layer, that is, the cache available to your application, you can make web calls, API calls, or database calls without having to repeat those calls for every request. This is generally implemented on the server side and makes your web server respond to requests faster because it does not have to wait for an upstream system to respond with data. Examples include memcached, caching databases like Redis or Couchbase, your local disk, and so on.
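A minimal cache-aside sketch in TypeScript, using an in-memory Map as a stand-in for memcached or Redis (the key naming and the loader function are illustrative, not a real API):

```typescript
// Cache-aside: check the cache first, fall back to the upstream call on a miss.
type Entry<T> = { value: T; expiresAt: number };
const cache = new Map<string, Entry<unknown>>();

async function cached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const hit = cache.get(key) as Entry<T> | undefined;
  if (hit && hit.expiresAt > Date.now()) return hit.value; // hit: no upstream call
  const value = await load();                              // miss: pay the cost once
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Hypothetical usage: calls within 30 seconds skip the database entirely.
// const user = await cached(`user:${id}`, 30_000, () => db.findUser(id));
```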
Cache rules
The most important thing to think about when caching your content is how frequently that content is updated. All sorts of content can benefit from caching, even if the TTL is small. It is therefore worth analyzing how frequently content is used and updated, and setting your cache rules based on that.
Types of cache
- Cold cache: an empty cache that results in mostly cache misses.
- Warm cache: a cache that has started receiving requests and has begun retrieving objects and filling itself up.
- Hot cache: a cache in which all cacheable objects have been retrieved, are stored, and are up to date.
A cache always starts cold, either with no objects stored or with only stale objects stored. When the requests start coming in, the server retrieves the objects and fills the cache. A hot cache will start to cool and eventually go cold again as time goes by and the content starts to expire.
A cached object is considered fresh if the time since its creation is within the max-age. That window is the object's Time To Live (TTL).
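That freshness check is simple enough to express directly (a sketch; it assumes the cache records a timestamp when it stores each object):

```typescript
// An object is fresh while its age is within max-age; that window is its TTL.
function isFresh(storedAtMs: number, maxAgeSeconds: number): boolean {
  const ageSeconds = (Date.now() - storedAtMs) / 1000;
  return ageSeconds < maxAgeSeconds;
}
```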
Types of content
- Static content: This is the most obvious content to cache because it is shared across users and does not change often. It includes fonts, images, CSS, and JavaScript files that are shared and will not be updated frequently. Adjusting their cache-control rules gives an immediate and noticeable improvement in your application's performance (see the sketch after this list).
- Personalized content: The most challenging scenario is how to use caching on sites that are primarily made up of personalized content. Conduct a user study to see how your test group uses your application and which pieces of content could still be shared and cached.
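One way to encode this split is a per-path cache policy. The sketch below is illustrative only; the paths, extensions, and TTLs are assumptions, not a prescription:

```typescript
// Illustrative cache rules per content type.
function cacheControlFor(path: string): string {
  if (/\.(css|js|png|jpg|svg|woff2?)$/.test(path)) {
    // Static, rarely-updated assets: cache aggressively, everywhere.
    return "public, max-age=31536000, immutable";
  }
  if (path.startsWith("/account")) {
    // Personalized content: never store it in shared caches.
    return "private, no-store";
  }
  // Safe default for everything else: a short shared TTL.
  return "public, max-age=60";
}
```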
Common problems
While caching is a great approach to addressing scale and performance, it comes with its own unique set of potential issues. The following are the most common:
- Bad response cached: Your application returned an HTTP 500 error, the response was cached, and somehow no alarms were raised. Even your Application Performance Monitoring (APM) tool reports no errors, while your CDN serves the cached error response for the next seven days.
Sometimes we serve errors with the wrong response code: the application crashes, yet we respond with an HTTP 200 OK. Be careful about the status codes and headers you return on your responses; respecting HTTP status codes avoids this kind of problem.
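A defensive sketch of that rule, again assuming a Node.js origin (renderPage, the error body, and the TTL are placeholders): successful responses may be cacheable, but failures carry a 5xx status and an explicit no-store.

```typescript
import http from "node:http";

http.createServer((req, res) => {
  try {
    const body = renderPage(); // hypothetical application logic; may throw
    res.writeHead(200, { "Cache-Control": "public, max-age=3600" });
    res.end(body);
  } catch {
    // Report the failure honestly and forbid any cache from storing it.
    res.writeHead(500, { "Cache-Control": "no-store" });
    res.end("internal error");
  }
}).listen(8080);

// Hypothetical page renderer standing in for the real application.
function renderPage(): string {
  return "<html>ok</html>";
}
```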
- Storing private content: You set the cache rules up wrong, and now you are caching private content for 15 minutes. Everybody goes crazy over this kind of error, and you start getting calls from angry customers complaining that they are seeing someone else's information.
Your whole team must be educated about how your CDN works, how caching works, and how important customer privacy is. Test your CDN, and keep a test environment with the same cache rules as your production environment.
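One safeguard, sketched here assuming an Express application (the session-cookie check is illustrative; adapt it to your real authentication mechanism): anything generated for a signed-in user is marked private so shared caches will not store it.

```typescript
import express from "express";

const app = express();

// Defensive default: responses for signed-in users are never stored in
// shared caches. The session-cookie check is illustrative only.
app.use((req, res, next) => {
  if (req.headers.cookie?.includes("session=")) {
    res.set("Cache-Control", "private, no-store");
  }
  next();
});

app.get("/account", (req, res) => res.json({ balance: "..." }));
app.listen(8080);
```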
- GTM is ping-ponging between datacenters: You are in charge of a site with a lot of visits, you trust your CDN, and everything works perfectly until something goes wrong. Someone changed the cache rules in the last deploy and forgot to turn caching on for certain assets. The number of origin requests spikes, effectively DDoSing your own servers, which run out of memory, go down, reboot, and keep flapping up and down while GTM bounces traffic between them.
You have to be aware of how cache impacts your infrastructure. If your infrastructure is too lean, you may suffer a full system failure when something goes wrong with your CDN. Capacity testing and planning ahead of time, with caching turned off, will tell you your upper limits of scale.
You must have a contingency plan that includes the following:
- Invalidating the cache
- Fingerprinting the URLs
- A kill switch
If you invalidate the cache, be sure the problem is actually solved before doing so, because once it is invalidated, the cache goes cold and the warming process starts over. If you fingerprint your URLs (adding a unique ID to each one, as sketched below), you can invalidate just the affected part of the cache without deleting all of it.
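A minimal fingerprinting sketch (the file names and hash length are arbitrary): embedding a content hash in the asset URL means a new deploy produces a new URL, so stale cached copies are simply never requested again.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Embed a content hash in the file name: app.js -> app.3f2a9c1b.js.
// A changed file yields a changed URL, bypassing every cached copy.
function fingerprint(path: string): string {
  const hash = createHash("md5").update(readFileSync(path)).digest("hex").slice(0, 8);
  return path.replace(/(\.\w+)$/, `.${hash}$1`);
}

console.log(fingerprint("static/app.js")); // e.g. static/app.3f2a9c1b.js
```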
Your CDN provider must give you a kill switch to purge all your cache in case of an emergency. This is an extreme and radical call: nuke all your cache and send everything back to your origin.
To sum up
Every architecture and system is different. Now that you have the information you need to start implementing frontend scaling, you can enjoy the benefits of frontend caching, but be sure to follow these simple steps:
1. Evaluate your architecture
Which architectural philosophy is your application leaning toward? Are you backend heavy? Does every click on your site trigger a full page refresh and an HTTP round-trip? If so, you will get minimal benefit from scaling at the frontend as-is. Your first step should be to migrate to a more modern, web-friendly architecture where you can take advantage of innovations from the last five years, like asynchronous loading of content and RESTful APIs.
This will make your frontend highly cacheable, with round-trips needed only to call APIs asynchronously, without refreshing the page and interrupting the experience.
2. Cache your static content
When you look at each page of your website using the web developer tools, on the Network tab, do you see any 304 (Not Modified) responses? No? Why not?
The best way to make sure all your static content gets cached is to store all of it under a certain directory and apply the appropriate cache rules.
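As a sketch, assuming an Express server (the directory name and TTL are arbitrary): Express's static handler sends ETag and Last-Modified validators by default, so revisits that revalidate get the 304 responses mentioned above.

```typescript
import express from "express";

const app = express();

// All static content lives under one directory with one set of cache rules.
// ETag/Last-Modified are sent by default, enabling 304 revalidations.
app.use("/static", express.static("public", { maxAge: "7d" }));

app.listen(8080);
```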
3. Evaluate a CDN provider
Google is your friend, and most of the CDN providers out there have free trial accounts. Sign up and try their benefits for yourself.
- Test it with a segment of your traffic on the CDN-hosted application,
- Add the CDN as one of your datacenters in your round-robin rotation, or
- Go all in and point all your traffic at the CDN.
Compare your data, look at your web performance metrics, and look at the CPU usage on your origins with and without the CDN.