Tuesday 10 September 2013

Optimizing the Squid Cache Hit Rate

To optimize the Squid cache and get a higher cache HIT ratio, we need to tune some configuration. The default configuration just runs, without any optimization for cache usage or bandwidth saving.

First, if our target is bandwidth saving, we need to configure the cache_replacement_policy directive.

The options are :

Least Recently Used (LRU)
This is the default method used by Squid for cache management. Squid starts by removing the cached objects that have gone the longest without a HIT. The LRU policy uses a list data structure, but there is also a heap-based implementation of LRU known as heap lru.

Greedy Dual Size Frequency (GDSF)
GDSF (heap GDSF) is a heap-based removal policy. In this policy, Squid tries to keep popular objects with a smaller size in the cache. In other words, if there are two cached objects with the same popularity, the object with the larger size will be purged so that we can make space for more of the less popular objects, which will eventually lead to a better HIT ratio. While using this policy, the HIT ratio is better, but overall bandwidth savings are small.

Least frequently used with dynamic aging (LFUDA)
LFUDA (heap LFUDA) is also a heap-based replacement policy. Squid keeps the most popular objects in the cache, irrespective of their size. So, this policy compromises a bit of the HIT ratio, but may result in better bandwidth savings compared to GDSF. For example, if a cached object with a large size encounters a HIT, it'll be equal to HITs for several small sized popular objects. So, this policy tries to optimize bandwidth savings instead of the HIT ratio. We should keep the maximum object size in the cache high if we use this policy to further optimize the bandwidth savings.

So the configuration can be tuned depending on whether you need bandwidth savings or a higher HIT ratio; it's up to you, the administrator.
Here is the config for saving more bandwidth:

memory_replacement_policy    lru
cache_replacement_policy     heap LFUDA
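
As noted above, LFUDA works best when large objects are allowed to stay in the cache, so it is worth raising the maximum object size alongside the policy. A sketch of a bandwidth-oriented fragment (the cache_dir path and the size values here are only illustrative, adjust them to your disk and traffic):

memory_replacement_policy    lru
cache_replacement_policy     heap LFUDA
cache_dir aufs /var/spool/squid 10000 16 256
maximum_object_size 512 MB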
  
Then we need to cache the static files usually found on websites, such as .css, .js, .jpg, .png, and .gif files. These files rarely change, even on dynamic websites. Some websites provide caching information in the webserver response headers, but sometimes they do not.

We can also override the caching information returned by a website, so we can utilize our cache server more optimally.

The config directive is refresh_pattern. With this, we can enforce the caching of certain file extensions, because Squid uses a regex to match the rules we create in squid.conf.

The signature is:

refresh_pattern [-i] regex min percent max [OPTIONS]

So, for example, if we want to cache all .jpg files and ignore the caching options provided in the webserver's response headers, the config will be:

refresh_pattern -i \.jpg$ 0 60% 1440 ignore-no-cache ignore-no-store reload-into-ims

The meaning is: it will match .jpg files case-insensitively; the minimum time an object is considered fresh is 0 minutes; the object stays fresh while its age is less than 60% of the time since the Last-Modified date in the header; and once its age exceeds 1440 minutes, the file is considered stale in the cache.
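The freshness rule above can be sketched in a few lines. This is a simplified model of Squid's refresh algorithm, not the exact implementation (the real one also honors Expires and Cache-Control headers); the function name and parameters are my own for illustration:

```python
def is_fresh(age_min, lm_age_min, min_min, percent, max_min):
    """Simplified sketch of Squid's refresh_pattern freshness check.

    age_min    -- minutes since the object entered the cache
    lm_age_min -- minutes between Last-Modified and the time it was cached
    min_min    -- refresh_pattern 'min' value (minutes)
    percent    -- refresh_pattern 'percent' value (60 means 60%)
    max_min    -- refresh_pattern 'max' value (minutes)
    """
    if age_min <= min_min:       # younger than min: always fresh
        return True
    if age_min > max_min:        # older than max: always stale
        return False
    # otherwise fresh while age is under percent of the Last-Modified age
    return age_min < lm_age_min * (percent / 100.0)

# A .jpg cached 100 minutes ago, last modified 500 minutes before it was
# cached, under the rule "0 60% 1440": 100 < 500 * 0.6, so still fresh.
print(is_fresh(100, 500, 0, 60, 1440))   # True
# The same object at age 400 minutes: 400 >= 300, so stale.
print(is_fresh(400, 500, 0, 60, 1440))   # False
```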

The other parameters tell Squid to ignore caching headers: ignore-no-cache ignores the no-cache directive, and ignore-no-store ignores the no-store directive.
The reload-into-ims option makes Squid convert no-cache/reload requests into If-Modified-Since requests. Note that this only works when the webserver response includes a Last-Modified header; without it, Squid cannot build an If-Modified-Since request.
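
Following the same pattern, we can cover the other static extensions mentioned above. The min/percent/max values here are only illustrative starting points, tune them to your traffic (and note that ignore-no-cache was dropped in newer Squid versions, so remove it there):

refresh_pattern -i \.(css|js)$            0 20% 2880  ignore-no-cache ignore-no-store reload-into-ims
refresh_pattern -i \.(jpg|jpeg|png|gif)$  0 60% 10080 ignore-no-cache ignore-no-store reload-into-ims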

