Introduction
Caching is a core technique for building scalable systems. A system that serves frequently requested data from a cache usually responds faster, and puts less load on its backing services, than one that refetches or recomputes that data on every request. This article introduces some basic concepts of caching.
What is Caching?
Before we define caching, let us define a cache. A cache is a software or hardware component that stores something, usually data, in a temporary computing environment.
Building on that definition, caching is the process of temporarily storing copies (snapshots) of data so that later requests for the same data can be served faster.
What type of data can be cached?
The types of data that can be cached include static and dynamic data.
Dynamic data changes often and is usually cached with an expiry time. Examples include the prices of stocks or digital currencies.
Static data rarely changes. Examples include images, CSS files, and script files.
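To make the idea of an expiry time concrete, here is a minimal sketch of an in-memory cache whose entries expire after a fixed time to live (TTL). The class name TTLCache, its parameters, and the "ACME" key are hypothetical and chosen only for illustration; this is not the API of any particular caching library.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock    # injectable so expiry can be exercised without sleeping
        self._entries = {}    # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            # The entry is too old to trust: drop it and report a miss.
            del self._entries[key]
            return default
        return value


# Dynamic data (a price) gets a short TTL; static data could use a much longer one.
prices = TTLCache(ttl_seconds=5)
prices.set("ACME", 101.5)
```

A short TTL suits dynamic data because a stale value is only served for a few seconds, while static data can safely live in the cache far longer.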
Where can we use caching?
Database caching: This allows systems to increase throughput and lower the data retrieval latency associated with databases. The cache acts as a layer adjacent to the database that the application checks first, so the database is only hit when the cache does not hold the requested data.
Content Delivery Network (CDN) caching: A CDN is a geographically distributed group of servers that work together to deliver internet content quickly. When an application is exposed to high traffic, HTML pages, JavaScript files, stylesheets, images, and videos can be cached on a CDN so that they load quickly for users.
Domain Name System (DNS) caching: Accessing a domain name on the internet involves querying DNS servers to resolve the IP address associated with that name. The results of DNS lookups are cached at several levels, including the operating system (OS), Internet Service Providers (ISPs), and DNS servers.
Application Programming Interface (API) caching: Responses from APIs (especially RESTful web services) that expose resources can also be cached. This helps when consumers send a high volume of requests to the API. An effective strategy to reduce the load on the API is to cache responses whose data does not change frequently.
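As a small illustration of caching responses that rarely change, the sketch below memoizes a hypothetical handler function with functools.lru_cache from the Python standard library. The function name list_countries and its data are made up for this example, and in-process memoization is only a stand-in here; a real API would more likely use a shared cache or HTTP caching.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def list_countries(region):
    """Hypothetical API handler whose result rarely changes."""
    # Stand-in for an expensive database query or downstream call.
    catalogue = {"west-africa": ("Ghana", "Nigeria", "Senegal")}
    return catalogue.get(region, ())


list_countries("west-africa")   # first call does the work (a miss)
list_countries("west-africa")   # second call is answered from the cache (a hit)
print(list_countries.cache_info())

# When the underlying data does change, the cached responses must be invalidated.
list_countries.cache_clear()
```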
Advantages of Caching
Some benefits of caching include:
improved system performance
reduced response time
reduced use of resources to return data
Disadvantages of Caching
Some drawbacks of caching include:
returning stale data
complexity of invalidating cached data
increased complexity of the overall system
Caching Strategies
Cache Aside: This strategy works alongside the database to reduce hits on it. When a user makes a request, the system first looks for the requested data/resource in the cache. If it is present (cache hit), the data is returned from the cache. If not (cache miss), the data is fetched from the database and the cache is updated with it. This strategy is suitable for read-heavy workloads.
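A minimal sketch of the cache-aside read path, in which a hit is served from the cache and only a miss reaches the database. FakeDatabase and get_with_cache_aside are hypothetical names, a plain dict plays the role of the cache, and the read counter exists only to make hits and misses visible.

```python
class FakeDatabase:
    """Stand-in for a real database; counts reads so hits and misses are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)


def get_with_cache_aside(key, cache, db):
    """The application owns the caching logic: check the cache, fall back to the db."""
    if key in cache:
        return cache[key]       # cache hit: the database is not touched
    value = db.read(key)        # cache miss: go to the database
    if value is not None:
        cache[key] = value      # populate the cache for the next request
    return value


db = FakeDatabase({"user:1": "Ada"})
cache = {}
get_with_cache_aside("user:1", cache, db)   # miss, reads the database
get_with_cache_aside("user:1", cache, db)   # hit, served from the cache
print(db.reads)
```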
Read Through: This strategy is similar to the cache aside strategy. The major difference is that the application only talks to the cache: when a cache miss occurs, the caching library or framework itself fetches the data from the backend or database, stores it, and is solely responsible for keeping the cache consistent with that backend.
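The difference from cache aside can be sketched as follows: the caller only ever asks the cache, and the cache loads missing entries on its own. ReadThroughCache and the loader function are hypothetical names for illustration, and the loader stands in for a database query.

```python
class ReadThroughCache:
    """Cache that loads missing entries itself using a caller-supplied loader."""

    def __init__(self, loader):
        self._loader = loader   # function key -> value, e.g. a database query
        self._store = {}

    def get(self, key):
        if key not in self._store:
            # Miss: the cache, not the application, fetches from the backend.
            self._store[key] = self._loader(key)
        return self._store[key]


def load_user(key):
    # Stand-in for a database lookup (an assumption for this example).
    return {"user:1": "Ada"}.get(key)


users = ReadThroughCache(load_user)
users.get("user:1")   # first call invokes the loader
users.get("user:1")   # second call is served from the cache
```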
Write Through: With this strategy, every write goes to the cache first and is then written to the database as part of the same operation. This can increase the time it takes to create or update data, but the system maintains high consistency between the database and the cache.
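A minimal sketch of a write-through cache, assuming a dict-like object stands in for the database. WriteThroughCache is a hypothetical name, and error handling (for example, what to do if the database write fails after the cache was updated) is deliberately left out.

```python
class WriteThroughCache:
    """Every write updates the cache and the database in the same call."""

    def __init__(self, db):
        self._db = db       # any dict-like store standing in for the database
        self._store = {}

    def put(self, key, value):
        self._store[key] = value   # 1. update the cache
        self._db[key] = value      # 2. update the database before returning

    def get(self, key):
        if key in self._store:
            return self._store[key]
        return self._db.get(key)


database = {}
cache = WriteThroughCache(database)
cache.put("order:7", "paid")
# The database already holds the value as soon as put() returns.
print(database)
```

The extra database write inside put() is where the added write latency comes from.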
Write Back: In this strategy, information is written to the cache and, after some delay, it is written to the database. The risk is that if the cache fails before the database is updated, the pending data is lost.
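The delayed write can be sketched like this, again with a dict-like object standing in for the database. WriteBackCache and its flush method are hypothetical names; a real system would typically trigger the flush from a timer or a background worker rather than by hand.

```python
class WriteBackCache:
    """Writes land in the cache first and reach the database only on flush()."""

    def __init__(self, db):
        self._db = db           # any dict-like store standing in for the database
        self._store = {}
        self._dirty = set()     # keys changed in the cache but not yet persisted

    def put(self, key, value):
        self._store[key] = value
        self._dirty.add(key)    # the database write is deferred

    def get(self, key):
        if key in self._store:
            return self._store[key]
        return self._db.get(key)

    def flush(self):
        """Persist all pending writes and return how many keys were written."""
        for key in self._dirty:
            self._db[key] = self._store[key]
        flushed = len(self._dirty)
        self._dirty.clear()
        return flushed


database = {}
cache = WriteBackCache(database)
cache.put("counter", 1)
cache.put("counter", 2)   # two cache writes, still nothing in the database
cache.flush()             # one database write carries the latest value
```

Anything still in the dirty set when the cache process dies never reaches the database, which is the data-loss risk described above.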
Conclusion
Now that we know some of the basics of caching, we can apply them in our applications, knowing why to cache, which strategies to use, and some of the pitfalls that come with caching.