I had some fun today; one of the responses I got after yesterday's
question (how to use "filter" with arguments) stated what
I had overlooked: why not just loop over all the elements in
the dictionary, see if they are in bounds, and if so, append
to the new list??
Just to remind you: I have a file which looks like the following:
independent variable # counts
-------------------- --------
253.0 8.000
255.0 7.000
257.0 9.000
259.0 4.000
261.0 11.00
263.0 14.00
265.0 16.00
267.0 40.00
269.0 75.00
271.0 90.00
273.0 138.0
275.0 158.0
277.0 129.0
279.0 141.0
281.0 123.0
283.0 94.00
285.0 74.00
287.0 58.00
289.0 60.00
291.0 43.00
293.0 52.00
295.0 35.00
[ ... ]
What I wanted to do was to find the region (of a user-specified
width) which contained the largest number of counts. You
can imagine sliding a window of width 10 over this list, and
adding up all the y-values in this window; then find the position
of the window where this sum is maximum.
What I had was a class defintion as follows:
class checklims:
def init(self,lolim,hilim):
self.lolim = lolim
self.hilim = hilim
return self
def checkit(self,val):
if val >= self.lolim and val < self.hilim :
return 1
else :
return 0
which was used in the following program fragment. Here _counts_ is a
dictionary where the values are the y-values in the data example above,
and the keys are the x-values (both converted to floats.)
_gat_hw_ is one-half the width of the window, and I simply slide the
window from one end of the spectrum to the other.
binlist = counts.keys()
binlist.sort()
start_center = binlist[0] + gat_hw
end_center = binlist[-1] - gat_hw
binwid = binlist[1] - binlist[0]
curr_center = start_center
sum = { }
while curr_center <= end_center :
filter_obj = checklims().init(curr_center-gat_hw, curr_center+gat_hw)
filter_func = filter_obj.checkit
newlist = filter(filter_func,binlist)
current_data = { } # data dict for current window
for i in newlist :
current_data[i] = counts[i]
stats = sstat(current_data)
# print curr_center, stats
sum[curr_center] = stats[0]
curr_center = curr_center + binwid
Then later I ask for the maximum of sum's values, and find the corresponding
index.
This program was taking about 29 seconds of user time to run on one
data file. I then tried the "simple" solution discussed at the
beginning of this post. Here is the same relevant program fragment:
binlist = counts.keys()
binlist.sort()
start_center = binlist[0] + gat_hw
end_center = binlist[-1] - gat_hw
binwid = binlist[1] - binlist[0]
curr_center = start_center
sum = { }
while curr_center <= end_center :
lolim = curr_center - gat_hw
hilim = curr_center + gat_hw
current_data = { } # data dict for current window
for i in binlist :
if lolim <= i < hilim :
current_data[i] = counts[i]
stats = sstat(current_data)
sum[curr_center] = stats[0]
curr_center = curr_center + binwid
This version of the program ran almost three times as fast!! (10.5 user
seconds). The speedup is even more significant when you realize
that just loading python, optimizing the code, and executing the
pipe-based read are already contributing about four or so seconds.
Lesson: believe what the FAQ says: you can really speed up your
program by looking at the Python itself; some operations are much
more expensive than others.
JT