HTTP Caching
Caching in HTTP can be tricky. Getting it right by following the spec is not all that difficult, but in practice, differences between browsers and browser versions often get in the way.
Browsing through Stack Overflow, you can easily find many people with the same struggle. We just can’t, or don’t have time to, figure out all the edge cases ourselves.
So here are the hard and fast rules I have figured out and will follow in the future:
Static Resources
Content that never changes: JS and CSS files, images, and binary files of any kind all fall into this category.
By never, I really mean never. It’s common best practice to version static resources: whenever their content changes, so do their URLs.
Here are the simple rules for static resources:
Embed a fingerprint in either the file name or the path. Avoid using the query string for the fingerprint. Also, ensure the generated URLs differ on more than 8-character boundaries.
Use these HTTP headers:
Cache-Control: public, max-age=31536000
Expires: (a year from now)
ETag: (based on content)
Last-Modified: (some time in the past)
Vary: Accept-Encoding
It’s that simple for static resources.
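As a rough illustration of the two rules above, here is a minimal Python sketch. The function names `fingerprinted_path` and `static_headers`, the choice of SHA-256, and the 12-character digest length are all assumptions made for this example, not something the rules prescribe.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from pathlib import PurePosixPath

ONE_YEAR = 31536000  # seconds; the max-age value used above


def fingerprinted_path(path: str, content: bytes, digest_len: int = 12) -> str:
    """Return the path with a content hash inserted before the extension.

    Shape: js/app.js -> js/app.<hash>.js (digest_len is an assumption).
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def static_headers(content: bytes, last_modified: datetime, now: datetime) -> dict:
    """Build the response headers listed above for a fingerprinted resource.

    Both datetimes must be timezone-aware UTC values.
    """
    return {
        "Cache-Control": f"public, max-age={ONE_YEAR}",
        "Expires": format_datetime(now + timedelta(seconds=ONE_YEAR), usegmt=True),
        "ETag": '"' + hashlib.sha256(content).hexdigest() + '"',
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Vary": "Accept-Encoding",
    }
```

Because the fingerprint is derived from the content, any change to the file yields a new URL, which is what makes the one-year lifetime safe.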
Dynamic Resources
Depending on the application’s requirements for freshness and privacy, different cache control settings should be used.
For non-private and constantly changing resources (think of a stock ticker), the following could be used:
Cache-Control: public, max-age=0
Expires: (now)
ETag: (based on content)
Last-Modified: (some time in the past)
Vary: Accept-Encoding
The effect is that the resource can be cached publicly (by browsers as well as by proxy servers). Each time before browsers use the resource, they check whether there’s a newer version and download it if there is.
Note that with this, browsers have some flexibility on revalidation. Typically, when users click the back/forward buttons, browsers do not revalidate but just use the cached version. If you’d like stricter control, say browsers must revalidate even when the back/forward buttons are clicked, use:
Cache-Control: public, no-cache, no-store
Not all dynamic resources become stale right away. If they can stay fresh for at least 5 minutes, use:
Cache-Control: public, max-age=300
With this, browsers only revalidate after 5 minutes. Before that, the cached content is used directly. If strict control over staleness is also required after 5 minutes, you can add must-revalidate:
Cache-Control: public, max-age=300, must-revalidate
For private or per-user content, replace public with private to avoid the content being cached by proxies:
Cache-Control: private, …
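The variants above can be collected into one small helper. This is only a sketch: the function name `cache_control` and its parameters are hypothetical, and the `strict` branch simply reproduces the `public, no-cache, no-store` value given earlier.

```python
def cache_control(
    private: bool = False,
    max_age: int = 0,
    must_revalidate: bool = False,
    strict: bool = False,
) -> str:
    """Compose a Cache-Control value for a dynamic resource.

    private         -> per-user content; keeps shared proxies from caching it
    max_age         -> seconds the response may be used without revalidation
    must_revalidate -> forbid using the response once it has gone stale
    strict          -> the no-cache, no-store variant (other options ignored)
    """
    scope = "private" if private else "public"
    if strict:
        return f"{scope}, no-cache, no-store"
    parts = [scope, f"max-age={max_age}"]
    if must_revalidate:
        parts.append("must-revalidate")
    return ", ".join(parts)
```

For example, `cache_control(max_age=300, must_revalidate=True)` yields the five-minute policy with strict staleness control, and passing `private=True` swaps only the first directive.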
Cache-Control and Expires
When both Cache-Control and Expires are used, Cache-Control takes precedence.
Using both Cache-Control and Expires is meant to gain wider support (across different browsers and versions). Of course, they should be configured to express the same freshness to avoid any confusion.
See Expires: vs. Cache-Control: max-age.
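To make the precedence concrete, here is a hedged sketch of how a cache might compute a response's freshness lifetime. The function name `freshness_lifetime` is hypothetical, and real caches handle more cases (for example `s-maxage` and heuristic freshness) than this simplified version does.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict, now: datetime) -> Optional[int]:
    """Return how many seconds a response stays fresh, or None if unspecified.

    max-age in Cache-Control wins; Expires is only consulted without it.
    `now` must be a timezone-aware datetime.
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    expires = headers.get("Expires")
    if expires is not None:
        try:
            delta = parsedate_to_datetime(expires) - now
        except (TypeError, ValueError):
            return 0  # an unparseable Expires is treated as already expired
        return max(0, int(delta.total_seconds()))
    return None
```

With conflicting headers, say `max-age=300` alongside an Expires value one hour ahead, this sketch returns 300, which is why the two headers should be kept in agreement.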
ETag and Last-Modified
These headers are used when browsers revalidate. Basically, browsers just blindly store the values of these headers as received from the server, and later, when validating, they send a conditional request with these values to the server (via the If-None-Match and If-Modified-Since headers, respectively).
Note that validation only occurs after the resource has expired.
How to handle a conditional request that carries both If-None-Match and If-Modified-Since is up to the server. However, since it is the server that generates ETag and/or Last-Modified in the first place, in practice this is rarely a problem. Most browsers send both if available.
See What takes precedence: the ETag or Last-Modified HTTP header?
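The server side of revalidation can be sketched as follows. The function name `not_modified` is hypothetical, and the sketch assumes one particular policy: when both request headers are present, If-None-Match is checked and If-Modified-Since is ignored. It also splits the ETag list naively on commas, which is good enough for illustration only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    """Drop the weak-validator prefix so weak and strong tags compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(request_headers: dict, etag: str, last_modified: datetime) -> bool:
    """Return True when the server may answer 304 Not Modified.

    `last_modified` must be a timezone-aware datetime.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        return "*" in candidates or _opaque(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # ignore a date that cannot be parsed
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since

    return False
```

When this returns True the server replies 304 with no body; otherwise it sends the full response with fresh ETag and Last-Modified values.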
One frequent suggestion is to avoid the use of ETag. This is not always a valid suggestion. ETag does provide more precise control over whether content has really changed. The default Apache method for generating ETag takes the file inode, size, and last-modified time as input. This makes the generated ETag value pretty useless in a load-balanced environment, because each server will generate a different ETag value for the same file. This is probably the only issue that causes a lot of people to disable ETag completely, which is not really necessary as long as a single unique ETag value is generated for exactly matching file content.
See Should your site be using etags or not?
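The difference between the two approaches can be shown with a small sketch. Both function names are hypothetical, and `inode_style_etag` only imitates the idea of mixing inode, size, and modification time into the value; it does not reproduce Apache's exact format.

```python
import hashlib


def content_etag(content: bytes) -> str:
    """Derive a quoted ETag from the bytes alone, so every server agrees."""
    return '"' + hashlib.sha256(content).hexdigest()[:32] + '"'


def inode_style_etag(inode: int, size: int, mtime: int) -> str:
    """Illustrative only: an ETag built from per-server file metadata."""
    return f'"{inode:x}-{size:x}-{mtime:x}"'


if __name__ == "__main__":
    body = b"body { margin: 0 }"
    # The same file deployed on two servers usually gets different inodes.
    print(inode_style_etag(1001, len(body), 1700000000))
    print(inode_style_etag(2002, len(body), 1700000000))
    # The content-based value is identical wherever it is computed.
    print(content_etag(body))
```

Hashing the content costs more than reading file metadata, so a real deployment would probably compute the value once at build or deploy time rather than on every request.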
Manually hitting Ctrl-R
When you hit Ctrl-R, the browser sends a request with the following headers to check whether the cached content needs to be refreshed:
Cache-Control: max-age=0
If-None-Match: …
If-Modified-Since: …
Note that this request is not addressed only to the origin server; it is also meant for any proxy servers along the way. Essentially, it revalidates the content. If the reply is 304, the browser uses the cached content.
Vary: Accept-Encoding
This header might be unfamiliar to some.
When gzip compression is enabled for a resource and the resource is cached by proxy servers, clients that do not support gzip would get incorrect (that is, compressed) data without this header. It instructs proxy servers to cache two versions of the resource: one compressed and one uncompressed. The correct version is then delivered based on the Accept-Encoding request header.
Another reason is the reality: Internet Explorer does not cache any resources that are served with a Vary header naming any fields other than Accept-Encoding and User-Agent. So adding this header in exactly this way ensures these resources are cached by IE.
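The two-version behaviour of proxy caches described above can be modelled with a toy cache. The class `VaryAwareCache` and its methods are hypothetical, and the sketch compares raw header values, whereas real caches may normalize them first.

```python
from typing import Dict, Optional, Tuple


class VaryAwareCache:
    """Toy shared cache that keys entries on the headers named by Vary."""

    def __init__(self) -> None:
        self._vary: Dict[str, str] = {}       # URL -> last seen Vary value
        self._bodies: Dict[Tuple, bytes] = {}  # (URL, varying headers) -> body

    @staticmethod
    def _key(url: str, vary: str, request_headers: dict) -> Tuple:
        lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
        names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
        return (url, tuple((n, lowered.get(n, "")) for n in names))

    def store(self, url: str, request_headers: dict,
              response_headers: dict, body: bytes) -> None:
        vary = response_headers.get("Vary", "")
        self._vary[url] = vary
        self._bodies[self._key(url, vary, request_headers)] = body

    def lookup(self, url: str, request_headers: dict) -> Optional[bytes]:
        if url not in self._vary:
            return None
        return self._bodies.get(self._key(url, self._vary[url], request_headers))


if __name__ == "__main__":
    cache = VaryAwareCache()
    cache.store("/app.js", {"Accept-Encoding": "gzip"},
                {"Vary": "Accept-Encoding"}, b"<gzipped bytes>")
    # A client that sent no Accept-Encoding misses instead of getting gzip.
    print(cache.lookup("/app.js", {}))
    print(cache.lookup("/app.js", {"Accept-Encoding": "gzip"}))
```

Without the Vary header the key collapses to the URL alone, so in this model the compressed body would be handed to every client, which is exactly the failure the header prevents.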