# [Dev] Profiling techniques

Phillip J. Eby pje at telecommunity.com
Fri Sep 9 14:38:08 PDT 2005

Due to frequent net outages during my attempt at an IRC profiling clinic
earlier this week, we didn't get to go through the basic stats-printing
functions that day.  So Katie asked if I would write up a summary of the
stuff that didn't get covered, so that folks could bring profiling data for
part 2 of the clinic.

We had gotten as far as running Python with "-im profile_tests testname" in
order to profile one or more unit tests, and get into a Python prompt with
a stats object:

\$ python -im profile_tests
application.tests.TestSchemaAPI.SchemaTests.testDeriveFromCore
.
----------------------------------------------------------------------
Ran 1 test in 1.922s

OK
347794 function calls (345724 primitive calls) in 9.728 CPU seconds

Ordered by: internal time
List reduced from 627 to 10 due to restriction <10>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
204/6    1.738    0.009    6.966    1.161
release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
1    0.475    0.475    0.486    0.486 repository\schema\types.py:2(?)
1    0.262    0.262    0.400    0.400 application\parcel.py:4(?)
36    0.259    0.007    0.331    0.009
repository\item\item.py:28(__init__)
4491/3539    0.219    0.000    1.928    0.001
repository\schema\kind.py:418(iterAttributes)
3316    0.189    0.000    0.642    0.000
1    0.173    0.173    0.175    0.175 repository\schema\kind.py:2(?)
450    0.157    0.000    0.429    0.001
10459    0.155    0.000    0.223    0.000

>>>

At this point, the local variable 's' is a "Stats" object:

>>> s
<pstats.Stats instance at 0x025AC760>

The Stats class is documented in detail at:

http://python.org/doc/2.4.1/lib/profile-stats.html

But the key features you need to know about are 'sort_stats()' and
'print_stats()'.  Here, I'll sort the calls by "cumulative time", and print
the top 20 functions/methods:

>>> s.sort_stats('cum').print_stats(20)
347794 function calls (345724 primitive calls) in 9.728 CPU seconds

Ordered by: cumulative time
List reduced from 627 to 20 due to restriction <20>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    9.728    9.728 <string>:1(?)
1    0.000    0.000    9.728    9.728
release\bin\lib\unittest.py:743(__init__)
3    0.000    0.000    7.433    2.478
application\schema.py:1201(reset)
6    0.039    0.007    7.267    1.211
204/6    0.007    0.000    6.966    1.161
repository\util\sax.py:183(parseFile)
204/6    1.738    0.009    6.966    1.161
release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
3    0.000    0.000    5.935    1.978
repository\persistence\repositoryview.py:1053(__init__)
3    0.000    0.000    5.934    1.978
repository\persistence\repositoryview.py:39(__init__)
3    0.000    0.000    5.934    1.978
repository\persistence\repositoryview.py:92(openView)
3    0.001    0.000    5.933    1.978
1    0.046    0.046    5.104    5.104
release\bin\lib\unittest.py:766(parseArgs)
1    0.000    0.000    5.058    5.058
release\bin\lib\unittest.py:789(createTests)
1    0.000    0.000    5.058    5.058
1    0.082    0.082    5.058    5.058
1    0.053    0.053    4.976    4.976
application\tests\testschemaapi.py:1(?)
1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
1    0.000    0.000    4.623    4.623
release\bin\lib\unittest.py:793(runTests)
1    0.000    0.000    4.623    4.623
release\bin\lib\unittest.py:692(run)
1    0.000    0.000    4.623    4.623
release\bin\lib\unittest.py:427(__call__)
1    0.000    0.000    4.623    4.623
release\bin\lib\unittest.py:420(run)

<pstats.Stats instance at 0x025AC760>
>>>

Details on how to read these printouts, by the way, can be found at:

http://python.org/doc/2.4.1/lib/module-profile.html

Different sort orders are useful for different kinds of analysis.  If you
are trying to find out what code to speed up, a sort by "time" is more
useful, but I find that if you're trying to make your program *faster* -
i.e., make it run less code, then a cumulative sort is usually more
useful.  Nonetheless, I always look at the "time" sort first, because
sometimes one or two functions are using the lion's share of the program's
run time, and you want to pay attention to that.  However, usually that
lion's share isn't because the function is slow, but because it's being run
thousands or tens of thousands of times.  (This is one way that the
'percall' columns come in handy; if the percall is 0.000, then the function
is going to be hard to optimize, and you need to look further up the call
tree in order to stop calling the function so often.)

So, sorting by cumulative time usually gives you a better grasp on the
overall situation.  If you look at the first stats printout above, you'll
get the impression that there are a couple of seconds spent on XML
processing, and that everything else is a quarter-second here, half second
there, no big deal.  However, if you look at the second printout, you'll
see that over 70% of this test's runtime is spent in loading repository
packs.  Looking at the 'ncalls' column, you can see that schema.reset() is
called 6 times, and ends up invoking loadPack() 6 times, but loadPack() is
then recursively reinvoked a total of 204 times, indicating that each
initial loadPack() causes 34, maybe 35 XML files to be loaded.  And you can
figure this out just from the profile.

However, if you go down the chart a little, you'll also see that 5.1
seconds is spent in code called from the unittest module itself, including
4.976 seconds in TestSchemaAPI.py itself.  Where is this time coming
from?  The ":1(?)" at the end of the line tells us that this is the module
initialization code, and most of the time in module initialization code is
spent doing imports.  This suggests that almost half of the profiled
runtime is spent simply importing the repository and the libraries it uses,
and that perhaps much of the loadPack() call time is attributable to
importing as well.

What does this tell us?  Well, what it tells me is that profile_tests is
broken.  :)  Including the import time in this profile is making it
impossible to see what's really going on.  Mostly, I've been using
profile_tests to profile things that take so long as to make the import
times moot, or else things where I *wanted* the import time included, like
profiling Chandler startup.  This example, however, makes it clear that
unittest's gathering of tests needs to occur before the profiler is
started, and I will be changing profile_tests to do exactly that.  So,
rerunning the statistics with the new and improved profile_tests program:

>>> s.sort_stats('cum').print_stats(20)
238484 function calls (237037 primitive calls) in 4.221 CPU seconds

Ordered by: cumulative time
List reduced from 441 to 20 due to restriction <20>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    4.221    4.221
release\bin\lib\unittest.py:793(runTests)
1    0.000    0.000    4.220    4.220
release\bin\lib\unittest.py:692(run)
1    0.000    0.000    4.220    4.220
release\bin\lib\unittest.py:427(__call__)
1    0.000    0.000    4.220    4.220
release\bin\lib\unittest.py:420(run)
1    0.000    0.000    4.220    4.220
release\bin\lib\unittest.py:280(__call__)
1    0.000    0.000    4.220    4.220
release\bin\lib\unittest.py:245(run)
2    0.000    0.000    3.806    1.903
application\schema.py:1201(reset)
4    0.000    0.000    3.700    0.925
136/4    0.005    0.000    3.698    0.924
repository\util\sax.py:183(parseFile)
136/4    0.490    0.004    3.698    0.924
release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
2    0.000    0.000    2.826    1.413
repository\persistence\repositoryview.py:1053(__init__)
2    0.000    0.000    2.825    1.413
repository\persistence\repositoryview.py:39(__init__)
2    0.000    0.000    2.825    1.413
repository\persistence\repositoryview.py:92(openView)
2    0.000    0.000    2.825    1.412
82    0.002    0.000    2.177    0.027
repository\persistence\packhandler.py:38(startElement)
78    0.008    0.000    2.172    0.028
repository\persistence\packhandler.py:98(itemStart)
132    0.017    0.000    2.152    0.016
1    0.000    0.000    2.139    2.139
application\tests\testschemaapi.py:29(tearDown)
1    0.000    0.000    1.861    1.861
application\tests\testschemaapi.py:26(setUp)
82    0.002    0.000    1.502    0.018
repository\persistence\packhandler.py:52(endElement)

This gives us a different picture.  Now, loadPack() is over 87% of the
runtime, with the actual XML parsing (in libxml2.SAXParseFile) accounting
for 10%.  So, somewhere around 77% of the runtime is in processing the
loaded pack data.  Looking at the cumulative times for startElement and
endElement -- 3.679 when put together -- appears to confirm this hypothesis.

(By the way, in case I didn't mention it earlier, the times shown in the
profiler are *relative*, not real time.  The test takes under 1 second in
"real life", so you can't treat these times as having something to do with
wall clock time.)

To go deeper, we can now begin to look at call trees:

>>> s.print_callees('startElement')
Ordered by: internal time
List reduced from 441 to 3 due to restriction <'startElement'>

Function
called...
repository\item\itemhandler.py:753(startElement)
repository\item\itemhandler.py:77(startDocument)(306)    0.007

repository\item\itemhandler.py:85(startElement)(2434)    0.248

repository\item\itemhandler.py:415(__init__)(306)    0.037

repository\item\itemhandler.py:85(startElement)
repository\item\itemhandler.py:160(attributeStart)(396)    0.056

repository\item\itemhandler.py:176(valueStart)(116)    0.033

repository\item\itemhandler.py:427(refStart)(554)    0.081

repository\item\itemhandler.py:468(itemStart)(306)    0.038

repository\schema\types.py:1424(valueStart)(20)    0.001

repository\util\sax.py:173(errorOccurred)(2434)    0.048

repository\persistence\packhandler.py:38(startElement)
repository\persistence\packhandler.py:61(packStart)(4)    0.004

repository\persistence\packhandler.py:98(itemStart)(78)    2.137

repository\util\sax.py:173(errorOccurred)(82)    0.048

<pstats.Stats instance at 0x0257A5D0>
>>>

This actually gave more data than we wanted; we wanted the "startElement"
function on line 38 of packhandler.py:

>>> s.print_callees('packhandler.py:38')
Ordered by: internal time
List reduced from 441 to 1 due to restriction <'packhandler.py:38'>

Function
called...
repository\persistence\packhandler.py:38(startElement)
repository\persistence\packhandler.py:61(packStart)(4)    0.004

repository\persistence\packhandler.py:98(itemStart)(78)    2.137

repository\util\sax.py:173(errorOccurred)(82)    0.048

<pstats.Stats instance at 0x0257A5D0>
>>>

Ah, that's better.  print_callees() and print_callers() accept a string
that's matched against the output lines, so you can use all or part of a
name or filename/line # to narrow down the output.  Anyway, we can now see
that most of the time spent in startElement() is actually being spent in
itemStart (line 98 of packhandler.py).  And we can drill into that:

>>> s.print_callees('packhandler.py:98')
Ordered by: internal time
List reduced from 441 to 1 due to restriction <'packhandler.py:98'>

Function
called...
repository\persistence\packhandler.py:98(itemStart)

repository\persistence\repositoryview.py:282(find)(4)    0.114

c:\cygwin\home\pje\c6\chandler\repository\util\path.py:19(__init__)(4)
0.032

c:\cygwin\home\pje\c6\chandler\repository\util\sax.py:173(errorOccurred)(64)
0.048

release\bin\lib\ntpath.py:59(join)(140)    0.011

release\bin\lib\sre.py:178(compile)(6)    0.000

<pstats.Stats instance at 0x0257A5D0>
>>>

So, 2.114 seconds are spent in loadItem(), which is very close to the total
time spent in startElement and is 50% of the total test execution time.

Since I only chose to analyze this test because it's very very short, and
because I've now shown nearly all of the ways to use a stats object, I'm
going to stop analyzing it here, and transition to profiling the Chandler UI.

Profiling in the Chandler UI is pretty simple.  From the "Test" menu select
"Start Profiler", and then do whatever it is you want to profile.  Then
select "Stop Profiler".  This creates a file called "Events.prof", which
you can analyze by opening a Python shell from the "Test" menu, and then
doing something like this (it helps if you maximize the window):

>>> s.sort_stats("cum").print_stats(20)
187634 function calls (170697 primitive calls) in 3.675 CPU seconds

Ordered by: cumulative time
List reduced from 874 to 20 due to restriction <20>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
185/56    0.004    0.000    3.675    0.066
parcels\osaf\framework\blocks\block.py:563(callNamedMethod)
1    0.001    0.001    3.666    3.666
parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)
2/1    0.000    0.000    3.662    3.662
parcels\osaf\framework\blocks\block.py:122(postEventByName)
2/1    0.000    0.000    3.662    3.662
parcels\osaf\framework\blocks\block.py:94(post)
2/1    0.000    0.000    3.662    3.662
parcels\osaf\framework\blocks\block.py:539(dispatchEvent)
129/1    0.005    0.000    3.662    3.662
129/126    0.002    0.000    3.639    0.029
parcels\osaf\framework\blocks\block.py:542(callProfiledMethod)
1    0.000    0.000    3.600    3.600
parcels\osaf\framework\blocks\trunk.py:65(onSelectItemEvent)
2/1    0.000    0.000    3.600    3.600
parcels\osaf\framework\blocks\trunk.py:33(wxSynchronizeWidget)
2/1    0.001    0.000    3.598    3.598
parcels\osaf\framework\blocks\trunk.py:81(installTreeOfBlocks)
68/1    0.031    0.000    3.521    3.521
parcels\osaf\framework\blocks\block.py:199(render)
2698/1110    0.104    0.000    2.618    0.002
219/136    0.020    0.000    2.602    0.019
4154/1175    0.069    0.000    2.561    0.002
8791/1640    0.037    0.000    2.549    0.002
792/574    0.005    0.000    1.850    0.003
140    0.003    0.000    1.722    0.012
repository\persistence\dbitemio.py:475(<lambda>)
140    0.008    0.000    1.719    0.012
repository\schema\kind.py:63(_setupClass)
20    0.014    0.001    1.711    0.086
repository\schema\kind.py:93(_setupDescriptors)
1429/667    0.111    0.000    1.686    0.003
repository\schema\kind.py:418(iterAttributes)

As you can see, it's exactly the same procedure to manipulate the stats
object as with run_tests, but you need to start by loading the
'Events.prof' file.

One last technique, which works with either way of getting profiling
data...  Suppose we wanted to find out who's calling __iter__ in
'print_callees':

Ordered by: internal time
List reduced from 874 to 1 due to restriction <'linkedmap.py:389'>

Function
was called by...
parcels\osaf\framework\blocks\block.py:199(render)(134)    3.521
parcels\osaf\framework\blocks\block.py:267(unRender)(9)    0.018
parcels\osaf\framework\blocks\block.py:467(pushView)(49)    0.499
parcels\osaf\framework\blocks\containerblocks.py:23(wxSynchronizeWidget)(228)
0.079
0.002
parcels\osaf\framework\blocks\containerblocks.py:345(wxSynchronizeWidget)(2)
0.007
parcels\osaf\framework\blocks\detail\detail.py:90(synchronizeDetailView)(7)
0.111
parcels\osaf\framework\blocks\detail\detail.py:105(reNotifyInside)(122)
0.108
parcels\osaf\framework\blocks\detail\detail.py:203(showReentrant)(3)
0.000
0.001
parcels\osaf\pim\items.py:516(getSharedState)(62)    0.004
parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)(10)
3.666
repository\item\query.py:60(_run)(24)    0.031
repository\item\sets.py:106(_iterSource)(8)    0.019
repository\schema\kind.py:418(iterAttributes)(1678)    1.686
repository\schema\kind.py:567(_kindOf)(46)    0.113

<pstats.Stats instance at 0x09A66BC0>
>>>

As you can see, compared to not having any data, the Python profiling tools
are better than nothing, but on an absolute scale they sort of suck.  The
Python profiler hasn't received much development attention for many years;
in fact the stats facility is so old that it goes back to before Python had
*packages*, so the output is pretty sucky.  If we're going to be doing a
lot of profiling, it might be worthwhile to invest some effort in creating
better reporting/analysis tools to let us crunch the raw data better.