Python Pickle File Caching
I recently wrote a script that included an expensive database query, and every time I ran the script it ran the query again. Rather than hammer the database, I decided to cache the result in a file until I had finished writing and debugging the script. Below is a simplified version of how it worked.
```python
import os

try:
    import cPickle as pickle  # Python 2: the faster C implementation
except ImportError:
    import pickle  # Python 3: pickle uses its C accelerator automatically


def cached_data():
    cache_path = "/tmp/cached-data.pickle"
    if os.path.exists(cache_path):
        # The cache exists: load the pickled result instead of regenerating it
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    # The cache doesn't exist: generate the data and write it to the file
    result = generate_data()
    with open(cache_path, "wb") as cache_file:
        # Use the highest available binary protocol for better performance
        pickle.dump(result, cache_file, protocol=pickle.HIGHEST_PROTOCOL)
    return result


def generate_data():
    """
    This function actually generates the data when it isn't cached.
    The data generated is often expensive to compute, so caching helps with performance.
    """
    return []
```
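Just fill in the generate_data() function with the expensive operation you are performing and then call cached_data(). The first call generates the data and writes it to the file; every later call, even in a new run of the script, loads it from the file cache instead. The cache never expires, so delete /tmp/cached-data.pickle whenever you want to force a fresh query.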
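If you need this for more than one function, the same pattern can be wrapped in a decorator. The sketch below is illustrative and not part of the original script: `pickle_cache` is a hypothetical name, it assumes Python 3 (it uses `os.replace`), and like `cached_data()` it only handles zero-argument functions, so the cache is not keyed on arguments. It also writes to a temporary file and renames it into place, so a script interrupted mid-write does not leave a truncated pickle that later runs would fail to load.

```python
import functools
import os
import pickle
import tempfile


def pickle_cache(cache_path):
    """Hypothetical helper: cache a zero-argument function's result in a pickle file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            if os.path.exists(cache_path):
                # Cache hit: load the previously pickled result.
                with open(cache_path, "rb") as cache_file:
                    return pickle.load(cache_file)
            # Cache miss: compute the result the expensive way.
            result = func()
            # Write to a temp file in the same directory, then rename it into
            # place so the cache file is never seen half-written.
            directory = os.path.dirname(os.path.abspath(cache_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, "wb") as tmp_file:
                    pickle.dump(result, tmp_file, protocol=pickle.HIGHEST_PROTOCOL)
                os.replace(tmp_path, cache_path)
            except BaseException:
                # Clean up the temp file if pickling or renaming failed.
                os.remove(tmp_path)
                raise
            return result
        return wrapper
    return decorator


# Usage: a fresh temporary directory keeps the demo independent of old runs.
cache_dir = tempfile.mkdtemp()
calls = []


@pickle_cache(os.path.join(cache_dir, "data.pickle"))
def expensive_query():
    calls.append(1)  # record each real computation
    return [1, 2, 3]


print(expensive_query())  # computes and writes the cache -> [1, 2, 3]
print(expensive_query())  # loaded from the pickle file -> [1, 2, 3]
print(len(calls))         # the function body ran once -> 1
```

Because `pickle.load` can execute arbitrary code while unpickling, only point a cache like this at a path nobody else can write to; a shared directory such as /tmp is fine for a quick debugging session on your own machine, but not much beyond that.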