Re: Fast union/intersection of two sequences?

Reimer Behrends (behrends@buran.fb10.tu-berlin.de)
25 Jan 1995 03:49:23 GMT

Skip Montanaro (skip@automatrix.com) wrote:

: Is there any faster way to compute the union/intersection of two sequences
: than by looping over one list testing for inclusion in the other? For
: example:
[...]

Well, here's the code for intersection. Computing the union is slightly
longer, but works the same way. Time complexity should be O(n*log(n)),
where n = max(len(l1),len(l2)).

def intersect(l1,l2):
l1 = l1[:]
l2 = l2[:]
l1.sort()
l2.sort()

i1 = 0; i2 = 0; i = 0
n1 = len(l1); n2 = len(l2)
l = range(min(n1,n2))

while i1<n1 and i2<n2:
p = cmp(l1[i1],l2[i2])
if p<0:
i1 = i1+1
elif p>0:
i2 = i2+1
else:
l[i] = l1[i1]
i = i+1; i1 = i1+1; i2 = i2+1

return l[:i]

A possible (slight) bottleneck might be qsort() (which is used for
l.sort()), if it's implemented using 'quicksort'. If you can keep the
arrays sorted all the time, you can not only get rid of the bottleneck,
but also get down to O(n) time complexity.

As an aside, one might try to generalize the above scheme by converting
the lists to a better kind of priority queue, thus possibly reducing
the number of comparisons further.

Reimer Behrends