A Newb Learns Python

Python Pickle File Caching

I recently wrote a script that included an expensive database query. Every time I ran the script, it ran the query again. Rather than hammer the database, I decided to cache the result in a file until I had finished writing and debugging the script. Below is a simplified version of how it worked.

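import os
try:
    import cPickle as pickle  # Python 2: the faster C implementation
except ImportError:
    import pickle  # Python 3: pickle uses its C accelerator automatically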

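def cached_data():
    cache_path = "/tmp/cached-data.pickle"
    if not os.path.exists(cache_path):
        # The cache doesn't exist yet: generate the data and populate it
        result = generate_data()
        # Write the result to the file as a pickled object, using the
        # highest (binary) protocol available for better performance
        with open(cache_path, 'wb') as cache_file:
            pickle.dump(result, cache_file, protocol=pickle.HIGHEST_PROTOCOL)
        return result
    # The cache exists: load the previously pickled result
    # (the with block closes the file, which the one-liner open() did not)
    with open(cache_path, 'rb') as cache_file:
        return pickle.load(cache_file)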

def generate_data():
    """
    This function actually generates the data when it isn't cached.
    The data generated is often expensive to compute so caching helps with performance.
    """
    # Placeholder: replace this with the expensive query or computation
    return []

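Just fill in the generate_data() function with the expensive operation you are performing and then call cached_data(). The first call generates the data and writes it to the file; every later call loads it from the file cache instead. The cache never expires on its own, so delete /tmp/cached-data.pickle whenever you want fresh data from the database.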
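To see the caching in action, here is a small self-contained sketch in Python 3. It assumes a few things that are not in the original: the cache lives in the system temp directory under a made-up name (cached-data-demo.pickle), a hypothetical global counter `calls` records how often the "expensive" function runs, and a hypothetical clear_cache() helper deletes the file to force regeneration. It also returns the freshly generated result directly on a cache miss rather than reading back the file it just wrote.

```python
import os
import pickle
import tempfile

# Hypothetical demo location; the post itself uses /tmp/cached-data.pickle
CACHE_PATH = os.path.join(tempfile.gettempdir(), "cached-data-demo.pickle")

calls = 0  # hypothetical counter: how many times the "expensive" work runs


def generate_data():
    """Stand-in for the expensive query; counts each real invocation."""
    global calls
    calls += 1
    return [1, 2, 3]


def cached_data(cache_path=CACHE_PATH):
    """Return the cached result if the file exists, else generate and cache it."""
    if not os.path.exists(cache_path):
        result = generate_data()
        with open(cache_path, "wb") as cache_file:
            pickle.dump(result, cache_file, protocol=pickle.HIGHEST_PROTOCOL)
        return result
    with open(cache_path, "rb") as cache_file:
        return pickle.load(cache_file)


def clear_cache(cache_path=CACHE_PATH):
    """Hypothetical helper: delete the cache so the next call regenerates it."""
    if os.path.exists(cache_path):
        os.remove(cache_path)


clear_cache()               # start clean in case an old demo file is lying around
first = cached_data()       # cache miss: runs generate_data() and writes the file
second = cached_data()      # cache hit: loads from the file, no regeneration
print(first, second, calls)  # [1, 2, 3] [1, 2, 3] 1

clear_cache()               # e.g. once debugging is done or the data has changed
third = cached_data()       # cache miss again: generate_data() runs a second time
print(calls)                # 2
clear_cache()               # tidy up the demo file
```

The counter makes the behavior visible: the second call never touches generate_data(), and only deleting the file brings the expensive work back.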