Hierarchical caching

2009-01-09

Tobias Schlitt

One of the cool new features in the 2008.1 release of the eZ Components library is hierarchical caching.

Until now, we supported several types of cache storage. Some of these use the file system to store data; others use APC or Memcache. File-based caches can usually be large, since disk space keeps getting cheaper, but they are also much slower than memory-based caches. RAM caches, in contrast, are blazingly fast, but memory is still much more limited than disk space. It therefore makes sense to store the most important data for a website in memory, while less important data gets cached on disk.

image_1

The new ezcCacheStack class in the eZ Cache component provides an automatic way of doing this. You simply stack together an arbitrary number of storages. The stack stores every item in all of the stacked caches. You can configure how many items may reside in each storage. A replacement strategy class takes care of purging a certain number of items when a storage runs full. On restore, the stack fetches the desired item from the topmost cache in which it is still stored.
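To make the store and restore behaviour described above concrete, here is a minimal sketch in Python. It is not the ezcCacheStack API: the names BoundedStorage and CacheStack, their methods, and the item limits are all hypothetical, and the sketch purges only the single oldest item when a storage is full, which is a simplification.

```python
from collections import OrderedDict


class BoundedStorage:
    """Hypothetical stand-in for one cache storage with an item limit."""

    def __init__(self, name, item_limit):
        self.name = name
        self.item_limit = item_limit
        self.items = OrderedDict()  # insertion order doubles as item age

    def store(self, key, value):
        if key in self.items:
            del self.items[key]  # re-storing moves the item to the newest position
        elif len(self.items) >= self.item_limit:
            self.items.popitem(last=False)  # storage is full: purge the oldest item
        self.items[key] = value

    def restore(self, key):
        return self.items.get(key)


class CacheStack:
    """Hypothetical stack: writes go to every storage, reads try the top first."""

    def __init__(self, storages):
        self.storages = list(storages)  # index 0 is the topmost (fastest) storage

    def store(self, key, value):
        for storage in self.storages:
            storage.store(key, value)

    def restore(self, key):
        for storage in self.storages:
            value = storage.restore(key)
            if value is not None:
                return value, storage.name
        return None, None


memory = BoundedStorage("memory", item_limit=2)   # small and fast
disk = BoundedStorage("disk", item_limit=100)     # large and slow
stack = CacheStack([memory, disk])

for key in ("a", "b", "c"):
    stack.store(key, key.upper())

hit_c = stack.restore("c")  # still held by the small memory storage
hit_a = stack.restore("a")  # purged from memory, so the disk storage answers
miss = stack.restore("z")   # never stored anywhere
```

In this sketch the caller only ever talks to the stack; which storage answers a restore depends solely on what each storage has already purged.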

The replacement strategies shipped with the eZ Cache component provide two well-known cache algorithms: Least Recently Used (LRU) and Least Frequently Used (LFU). LRU keeps track of when a cache item was last used and, when a storage runs full, discards the items that have gone unused for the longest time. LFU, in contrast, purges the items that have been used least frequently. If neither strategy fits your needs, you can quite easily implement your own.
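The difference between the two strategies is easiest to see when both process the same access pattern. The following Python sketch is illustrative only and does not mirror the eZ Cache strategy classes: LruCache, LfuCache and replay are hypothetical names, a store is assumed to count as one use, and each cache purges exactly one item when full.

```python
from collections import Counter, OrderedDict


class LruCache:
    """Evicts the item whose last use lies furthest in the past."""

    def __init__(self, item_limit):
        self.item_limit = item_limit
        self.items = OrderedDict()  # ordered from least to most recently used

    def store(self, key, value):
        if key not in self.items and len(self.items) >= self.item_limit:
            self.items.popitem(last=False)  # drop the least recently used item
        self.items[key] = value
        self.items.move_to_end(key)

    def restore(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a read counts as a use
        return self.items[key]


class LfuCache:
    """Evicts the item with the lowest use count."""

    def __init__(self, item_limit):
        self.item_limit = item_limit
        self.items = {}
        self.uses = Counter()

    def store(self, key, value):
        if key not in self.items and len(self.items) >= self.item_limit:
            victim = min(self.items, key=lambda k: self.uses[k])
            del self.items[victim]
            del self.uses[victim]
        self.items[key] = value
        self.uses[key] += 1  # assumption: storing counts as one use

    def restore(self, key):
        if key not in self.items:
            return None
        self.uses[key] += 1
        return self.items[key]


def replay(cache):
    """Run one fixed access pattern and report which keys survive."""
    for key in ("a", "b"):
        cache.store(key, key.upper())
    cache.restore("a")
    cache.restore("a")
    cache.restore("b")     # "b" is now most recent, "a" is most frequent
    cache.store("c", "C")  # the cache is full, so one item has to go
    return sorted(cache.items)


lru_survivors = replay(LruCache(item_limit=2))
lfu_survivors = replay(LfuCache(item_limit=2))
```

With this pattern the LRU cache drops "a", because "b" was touched more recently, while the LFU cache drops "b", because "a" was used more often.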

Using the cache stack with an appropriate replacement strategy lets you ignore which items are stored where and use the stack as your only cache storage.

There are lots of other cool things in the 2008.1 package, which was released last Monday. We have three new components: Document, to parse and render different document formats; Feed, for the creation and aggregation of XML feeds; and Search, a search engine abstraction layer modelled after the PersistentObject component. Besides that, some other components got major new features. Feedback is, as usual, highly appreciated. Enjoy!

Comments

Hello Tobias,
I was just looking at the overview, but wouldn't it also make sense to take into account the size of what you are going to cache before sending it through the layers? Also, without having looked into the actual code, it seems to limit the number of items to cache rather than their size.
Wouldn't you want to take the largest items and store them on the slowest cache (hard disk), since they would take up most of the memory cache if put there?

Mike Willbanks at 2008-06-23

Hi Mike!
Thanks for your feedback.
Yes, we also considered taking the size into account. However, it is quite hard to determine the size of an object/array in PHP if you don't know anything about its structure and cannot write it to a file. AFAIK you'd need to serialize the structure first and then check the string length. This would be quite some overhead and would usually result in the serialization being invoked twice, e.g. with the APC storage.
We chose the number of items as a good indicator, since you normally know what kind of objects are to be stored (e.g. models or pure text).
Further ideas in this direction are highly appreciated! :)
Regards, Toby

Toby at 2008-06-23