A Newb Learns Python

Careful with your set operators

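Python's conditional syntax (e.g. “if not even” or “if foo in bar”) usually reads very naturally. So naturally, in fact, that you may find yourself wanting to use “and” and “or” to perform set operations. That is a bad idea: they test truthiness and hand back one of their operands instead of combining the sets.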

>>> by_2 = set([i for i in range(10) if i % 2 == 0])
>>> by_2
set([0, 8, 2, 4, 6])
>>> by_3 = set([i for i in range(10) if i % 3 == 0])
>>> by_3
set([0, 9, 3, 6])
>>>
>>> # get the intersection of the two sets
>>> by_2 & by_3
set([0, 6])
>>> 
>>> # get the union of the two sets
>>> by_2 | by_3
set([0, 2, 3, 4, 6, 8, 9])
>>> 
>>> # PROBABLY NOT WHAT YOU WANT
>>> by_2 and by_3
set([0, 9, 3, 6])
>>> # behaves the same as
>>> 3 and 2
2
>>> # similarly...
>>> by_2 or by_3
set([0, 8, 2, 4, 6])
>>> # behaves the same as
>>> 3 or 2
3
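
The reason is that `and` and `or` return one of their operands based on truthiness, and any non-empty set is truthy. Here is a small sketch of that rule, reusing the names from the session above. It is written for Python 3, where the sets would print as `{0, 6}` rather than `set([0, 6])`.

```python
by_2 = {i for i in range(10) if i % 2 == 0}
by_3 = {i for i in range(10) if i % 3 == 0}

# `and` returns its second operand when the first is truthy;
# `or` returns its first operand when it is truthy.
assert (by_2 and by_3) is by_3
assert (by_2 or by_3) is by_2

# An empty set is falsy, so the result flips.
empty = set()
assert (empty and by_3) is empty
assert (empty or by_3) is by_3

# The named methods are equivalent to & and | and read more explicitly.
both = by_2.intersection(by_3)
either = by_2.union(by_3)
print(sorted(both))    # [0, 6]
print(sorted(either))  # [0, 2, 3, 4, 6, 8, 9]
```

If the operator symbols feel too terse, the `intersection()` and `union()` methods say the same thing in words.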

Searching Your Python Command History

It took me far too long to realize that the Python interpreter supports ‘reverse-i-search’ with the same “Ctrl+R” shortcut as Bash (at least when Python is built with readline support).

For those not familiar with it, reverse-i-search lets you interactively search your command history. Instead of hitting the up arrow a bunch of times to find a command you used a while ago, just hit Ctrl+R and start typing some text from the command. It will match the most recent command you typed that contains that text. If you want to go to the previous match, just hit Ctrl+R again.

>>> a = 1
>>> b = 2
>>> c = 3
>>> d = a + b + c
>>> 
>>> e = 4
>>> f = 5
(reverse-i-search)`d': d = a + b + c

How did I compute “d” again? The above is the result of me typing Ctrl+R and “d”.

Easily Readable Unix Timestamps In Python

If you need readable Unix timestamps, stop mucking with strftime and its impossible-to-remember format codes. Just use time.ctime(). With no argument it formats the current time; pass it a timestamp and it formats that instead.

>>> import time
>>> time.time()
1334182778.770024
>>> time.ctime()
'Wed Apr 11 22:20:29 2012'
>>> t = time.time()
>>> time.ctime(t)
'Wed Apr 11 22:20:34 2012'
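
One thing to keep in mind is that time.ctime() always renders local time. The sketch below (Python 3) shows what it is shorthand for, plus a UTC variant and a strftime fallback for when you really do need a custom layout. The timestamp 0 is just a convenient value with a predictable result.

```python
import time

t = 0  # the Unix epoch, used here so the output is predictable

# time.ctime(t) is shorthand for time.asctime(time.localtime(t)).
assert time.ctime(t) == time.asctime(time.localtime(t))

# Rendering in UTC instead of local time avoids timezone surprises.
utc_text = time.asctime(time.gmtime(t))
print(utc_text)  # Thu Jan  1 00:00:00 1970

# When you do need a custom layout, strftime is still the tool.
iso_text = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(t))
print(iso_text)  # 1970-01-01 00:00:00
```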

Python Pickle File Caching

I recently worked on writing a script that included an expensive query on the database. Every time I ran the script it did the query over again. Rather than hammer the database I decided I would cache the result in a file until I had finished writing and debugging the script. Below is the simplified version of how it worked.

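import os
try:
    import cPickle as pickle  # Python 2: the faster C implementation
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

def cached_data():
    cache_path = "/tmp/cached-data.pickle"
    if not os.path.exists(cache_path):
        # The cache doesn't exist, so generate the data and populate it
        result = generate_data()
        # Write the result to the file as a pickled object, using the
        # newest binary protocol for better performance
        with open(cache_path, 'wb') as cache_file:
            pickle.dump(result, cache_file, protocol=pickle.HIGHEST_PROTOCOL)
    # Read the result back from the cache; "with" closes the file for us
    with open(cache_path, 'rb') as cache_file:
        return pickle.load(cache_file)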

def generate_data():
    """
    This function actually generates the data when it isn't cached.
    The data generated is often expensive to compute so caching helps with performance.
    """
    return []

Just fill in the generate_data() function with the expensive operation you are performing and then call cached_data(). After the first generation it will pull from the file cache.
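
If you want the same trick for more than one query, the path and the generator function can be passed in as parameters. What follows is only a sketch for Python 3: `cached` and `expensive_query` are hypothetical names, and the temporary directory stands in for wherever you would keep the cache.

```python
import os
import pickle
import tempfile

def cached(cache_path, generate):
    """Return the pickled result at cache_path, calling generate() only on a miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    result = generate()
    # Write to a temporary name first so an interrupted run
    # never leaves a half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as cache_file:
        pickle.dump(result, cache_file, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, cache_path)
    return result

calls = []

def expensive_query():
    # Hypothetical stand-in for the real database query.
    calls.append(1)
    return [("row", 1), ("row", 2)]

cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, "cached-data.pickle")
first = cached(path, expensive_query)   # runs the query, writes the cache
second = cached(path, expensive_query)  # reads the cache, no query
print(first == second, len(calls))  # True 1
```

To force a refresh, delete the cache file. Since unpickling can execute arbitrary code, only load cache files that you wrote yourself.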

Flatten a list

>>> nested = ["I", ["have", "some", "nested"], ["entries"], ["eek"]]
>>> import itertools 
>>> list(itertools.chain(*nested))
['I', 'have', 'some', 'nested', 'entries', 'eek']

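This only flattens a single level of nesting. Also note that the bare string "I" only survives because it is one character long: chain() iterates over each element, so a longer string at the top level would be split into its characters.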
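
For deeper nesting, a small recursive generator is one option. The sketch below (Python 3) first shows `itertools.chain.from_iterable`, which does the same job as `chain(*nested)` without unpacking the outer list, and then a hypothetical `flatten` helper (not part of itertools) that assumes only lists and tuples should be descended into.

```python
from itertools import chain

nested = [["I"], ["have", "some", "nested"], ["entries"], ["eek"]]

# chain.from_iterable avoids unpacking the outer list into arguments.
flat = list(chain.from_iterable(nested))
print(flat)  # ['I', 'have', 'some', 'nested', 'entries', 'eek']

def flatten(items):
    """Yield leaves from arbitrarily nested lists and tuples, keeping strings whole."""
    for item in items:
        if isinstance(item, (list, tuple)):
            for leaf in flatten(item):
                yield leaf
        else:
            yield item

deep = ["I", ["have", ["some", ["deeply"]], "nested"], ("entries",)]
deep_flat = list(flatten(deep))
print(deep_flat)  # ['I', 'have', 'some', 'deeply', 'nested', 'entries']
```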

Create a dictionary from a list of keys

>>> keys = ["C","G","F#"]
>>> # create the dictionary, value defaults to None
... {}.fromkeys(keys)
{'C': None, 'F#': None, 'G': None}
>>> # give it a default value
... {}.fromkeys(keys, True)
{'C': True, 'F#': True, 'G': True}
>>> 
>>> # you can also use dict() in place of {}
... dict().fromkeys(keys, True)
{'C': True, 'F#': True, 'G': True}
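
One catch worth knowing: fromkeys() stores the very same default object under every key, which bites as soon as that default is mutable. A short sketch of the pitfall and one way around it (Python 3.7 or later, where dictionaries keep insertion order):

```python
keys = ["C", "G", "F#"]

# dict.fromkeys is a classmethod, so no throwaway instance is needed.
flags = dict.fromkeys(keys, True)
print(flags)  # {'C': True, 'G': True, 'F#': True}

# Pitfall: every key shares the *same* default object.
shared = dict.fromkeys(keys, [])
shared["C"].append("major")
print(shared["G"])  # ['major']

# A dict comprehension builds a fresh list per key instead.
separate = {key: [] for key in keys}
separate["C"].append("major")
print(separate["G"])  # []
```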

Python Interpreter Utility Files

I keep a Python file around with a bunch of imports, some helpful utility methods and a bunch of code to set up database connections. Rather than copy-pasting that code into my interpreter each time I start a new one, I just use “execfile” (a Python 2 built-in) to bring it in all at once.

For example, say my utils file looks like this:

import os, sys
from time import time
from datetime import datetime
def timeago(t):
    return str(datetime.now() - datetime.fromtimestamp(t))

I can just pop open a Python interpreter and type:

$ python
>>> execfile("/home/jonathan/bin/pyutils.py")
>>> time()
1321479210.8605671
>>> timeago(time() - 900000)
'10 days, 10:00:00.000008'
>>>

It's a really simple way of bringing in a bunch of imports you may need or a set of methods you use all the time.
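
Python 3 dropped execfile, but the same effect takes only a few lines. In the sketch below, `run_file` is a hypothetical helper name and the throwaway file is a stand-in for a real utils file.

```python
import os
import tempfile

def run_file(path, namespace):
    """Execute the Python source at path inside namespace (a dict)."""
    with open(path) as source_file:
        source = source_file.read()
    # compile() with the real path keeps filenames accurate in tracebacks.
    exec(compile(source, path, "exec"), namespace)

# Demonstrate with a throwaway utils file (a stand-in for pyutils.py).
utils_path = os.path.join(tempfile.mkdtemp(), "pyutils.py")
with open(utils_path, "w") as utils_file:
    utils_file.write("from datetime import timedelta\n"
                     "def describe(seconds):\n"
                     "    return str(timedelta(seconds=seconds))\n")

namespace = {}
run_file(utils_path, namespace)
print(namespace["describe"](900000))  # 10 days, 10:00:00
```

In an interactive session you would pass `globals()` as the namespace so the names land at the prompt. Two built-in alternatives are starting the interpreter with `python -i yourfile.py`, or pointing the `PYTHONSTARTUP` environment variable at the file so it runs at the start of every interactive session.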