Fast union/intersection of two sequences?

Skip Montanaro (skip@automatrix.com)
Tue, 24 Jan 1995 02:47:38 -0500

Is there any faster way to compute the union/intersection of two sequences
than by looping over one list testing for inclusion in the other? For
example:

l1 = [1, 2, 3, 4, 5]
l2 = [1, 3, 5, 7, 9]

union = l1
intersection = []
for e in l2:
if e not in l1: union.append(e)
if e in l1: intersection.append(e)

This works fine for short sequences, but fails miserably for long ones. (My
lists are each several hundred elements long and I wind up often with a
resulting union of well over 1000 elements. The elements are themselves
Python classes, so there is a __cmp__ function to contend with that may well
be what's killing me.)

I sped things up somewhat (read: I was willing to wait for the process to
complete :-) by sorting the lists and using a binary insert:

def binsert(list, e):
if len(list) == 0: list.append(e)
if type(list[0]) != type(e):
raise TypeError
l = 0
h = len(list) - 1
while l <= h:
m = l + (h-l)/2
mid = list[m]
if mid < e:
l = m + 1
elif mid > e:
h = m - 1
else:
return

list[l:l] = [e]

Before I go and reimplement someone else's wheel or wander off in fruitless
search of the unobtainable, can somebody answer:

1. Is it possible to speed this process up by recoding it in C or will
accessing the list and its components be the bottleneck?

2. Has somebody already implemented reasonably fast set
intersection/union for Python lists?