[Dev] Profiling techniques

Phillip J. Eby pje at telecommunity.com
Fri Sep 9 14:38:08 PDT 2005


Due to frequent net outages during my attempt at an IRC profiling clinic 
earlier this week, we didn't get to go through the basic stats-printing 
functions that day.  So Katie asked if I would write up a summary of the 
stuff that didn't get covered, so that folks could bring profiling data for 
part 2 of the clinic.

We had gotten as far as running Python with "-im profile_tests testname" in 
order to profile one or more unit tests, and get into a Python prompt with 
a stats object:

$ python -im profile_tests 
application.tests.TestSchemaAPI.SchemaTests.testDeriveFromCore
.
----------------------------------------------------------------------
Ran 1 test in 1.922s

OK
          347794 function calls (345724 primitive calls) in 9.728 CPU seconds

    Ordered by: internal time
    List reduced from 627 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     204/6    1.738    0.009    6.966    1.161 
release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
         1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
         1    0.475    0.475    0.486    0.486 repository\schema\types.py:2(?)
         1    0.262    0.262    0.400    0.400 application\parcel.py:4(?)
        36    0.259    0.007    0.331    0.009 
repository\item\item.py:28(__init__)
4491/3539    0.219    0.000    1.928    0.001 
repository\schema\kind.py:418(iterAttributes)
      3316    0.189    0.000    0.642    0.000 
repository\util\linkedmap.py:156(__setitem__)
         1    0.173    0.173    0.175    0.175 repository\schema\kind.py:2(?)
       450    0.157    0.000    0.429    0.001 
repository\util\classloader.py:11(loadClass)
     10459    0.155    0.000    0.223    0.000 
repository\util\linkedmap.py:398(iterkeys)


 >>>

At this point, the local variable 's' is a "Stats" object:

 >>> s
<pstats.Stats instance at 0x025AC760>
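
(If you're not going through profile_tests, you can get hold of the same kind 
of object by hand.  A minimal sketch, where 'main()' and the 'myprofile.out' 
filename are just placeholders for your own code and dump file:)

 >>> import profile, pstats
 >>> profile.run('main()', 'myprofile.out')   # run the code under the profiler, dump raw data
 >>> s = pstats.Stats('myprofile.out')        # load the dump into a Stats object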

The Stats class is documented in detail at:

     http://python.org/doc/2.4.1/lib/profile-stats.html

But the key features you need to know about are 'sort_stats()' and 
'print_stats()'.  Here, I'll sort the calls by "cumulative time", and print 
the top 20 functions/methods:

 >>> s.sort_stats('cum').print_stats(20)
          347794 function calls (345724 primitive calls) in 9.728 CPU seconds

    Ordered by: cumulative time
    List reduced from 627 to 20 due to restriction <20>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    9.728    9.728 <string>:1(?)
         1    0.000    0.000    9.728    9.728 
release\bin\lib\unittest.py:743(__init__)
         3    0.000    0.000    7.433    2.478 
application\schema.py:1201(reset)
         6    0.039    0.007    7.267    1.211 
repository\persistence\repositoryview.py:392(loadPack)
     204/6    0.007    0.000    6.966    1.161 
repository\util\sax.py:183(parseFile)
     204/6    1.738    0.009    6.966    1.161 
release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
         3    0.000    0.000    5.935    1.978 
repository\persistence\repositoryview.py:1053(__init__)
         3    0.000    0.000    5.934    1.978 
repository\persistence\repositoryview.py:39(__init__)
         3    0.000    0.000    5.934    1.978 
repository\persistence\repositoryview.py:92(openView)
         3    0.001    0.000    5.933    1.978 
repository\persistence\repositoryview.py:116(_loadSchema)
         1    0.046    0.046    5.104    5.104 
release\bin\lib\unittest.py:766(parseArgs)
         1    0.000    0.000    5.058    5.058 
release\bin\lib\unittest.py:789(createTests)
         1    0.000    0.000    5.058    5.058 
release\bin\lib\unittest.py:552(loadTestsFromNames)
         1    0.082    0.082    5.058    5.058 
release\bin\lib\unittest.py:510(loadTestsFromName)
         1    0.053    0.053    4.976    4.976 
application\tests\testschemaapi.py:1(?)
         1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
         1    0.000    0.000    4.623    4.623 
release\bin\lib\unittest.py:793(runTests)
         1    0.000    0.000    4.623    4.623 
release\bin\lib\unittest.py:692(run)
         1    0.000    0.000    4.623    4.623 
release\bin\lib\unittest.py:427(__call__)
         1    0.000    0.000    4.623    4.623 
release\bin\lib\unittest.py:420(run)


<pstats.Stats instance at 0x025AC760>
 >>>

Details on how to read these printouts, by the way, can be found at:

     http://python.org/doc/2.4.1/lib/module-profile.html


Different sort orders are useful for different kinds of analysis.  If you're 
trying to find out what code to speed up, a sort by "time" is more useful; but 
if you're trying to make your program *faster*, i.e., make it run less code, 
then a cumulative sort is usually more useful.  Nonetheless, I always look at 
the "time" sort first, because sometimes one or two functions are using the 
lion's share of the program's run time, and you want to pay attention to 
that.  Usually, though, that lion's share isn't because the function is slow, 
but because it's being run thousands or tens of thousands of times.  (This is 
one way the 'percall' columns come in handy: if percall is 0.000, the function 
is going to be hard to optimize, and you need to look further up the call tree 
to stop calling it so often.)
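
(For reference, the sort keys I end up using most are 'time', 'cumulative', 
and 'calls'; unambiguous abbreviations like 'cum' are accepted, so all of 
these work:)

 >>> s.sort_stats('time').print_stats(20)    # biggest self-time consumers
 >>> s.sort_stats('calls').print_stats(20)   # most frequently called functions
 >>> s.sort_stats('cum').print_stats(20)     # where the time goes, top-down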

So, sorting by cumulative time usually gives you a better grasp of the 
overall situation.  If you look at the first stats printout above, you'll 
get the impression that there are a couple of seconds spent on XML 
processing, and that everything else is a quarter-second here, half a second 
there, no big deal.  However, if you look at the second printout, you'll 
see that over 70% of this test's runtime is spent loading repository 
packs.  Looking at the 'ncalls' column, you can see that schema.reset() is 
called 3 times and ends up invoking loadPack() 6 times, but the parseFile() 
underneath loadPack() is then recursively reinvoked a total of 204 times, 
indicating that each loadPack() call ends up loading around 34 XML files.  
And you can figure this out just from the profile.

However, if you go down the chart a little, you'll also see that 5.1 
seconds is spent in code called from the unittest module itself, including 
4.976 seconds in testschemaapi.py.  Where is this time coming from?  The 
":1(?)" at the end of the line tells us that this is module initialization 
code, and most of the time in module initialization code is spent doing 
imports.  This suggests that almost half of the profiled runtime is spent 
simply importing the repository and the libraries it uses, and that perhaps 
much of the loadPack() call time is attributable to importing as well.
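
(One way to check that from the stats alone, by the way: the arguments to 
print_stats() can also be regular expressions, which filter the listing, so 
you can restrict it to module-initialization frames.  Something along these 
lines should work, though I haven't verified the exact pattern:)

 >>> s.sort_stats('cum').print_stats(r':1\(\?\)')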

What does this tell us?  Well, what it tells me is that profile_tests is 
broken.  :)  Including the import time in this profile makes it impossible 
to see what's really going on.  Mostly, I've been using profile_tests to 
profile things that take so long that the import times are moot, or else 
things where I *wanted* the import time included, like profiling Chandler 
startup.  This example, however, makes it clear that unittest's gathering of 
tests needs to happen before the profiler is started, and I will be changing 
profile_tests to do exactly that.
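
In case it helps to picture the change, the essential shape is to let 
unittest import the test modules and build the suite first, and only then 
turn the profiler on around the actual run.  Here's a rough sketch; this is 
not the real profile_tests code, just an illustration of the idea:

import sys, profile, pstats, unittest

def main(test_names):
    # Collect the tests *before* profiling starts, so that module import
    # time stays out of the numbers.
    suite = unittest.defaultTestLoader.loadTestsFromNames(test_names)
    runner = unittest.TextTestRunner()

    # Profile only the actual test run.
    p = profile.Profile()
    p.runcall(runner.run, suite)

    # Hand back a Stats object to poke at interactively (e.g. via "python -i").
    return pstats.Stats(p)

if __name__ == '__main__':
    s = main(sys.argv[1:])
    s.sort_stats('time').print_stats(10)

So, rerunning the statistics with the new and improved profile_tests program: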

 >>> s.sort_stats('cum').print_stats(20)
          238484 function calls (237037 primitive calls) in 4.221 CPU seconds

    Ordered by: cumulative time
    List reduced from 441 to 20 due to restriction <20>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    4.221    4.221 
release\bin\lib\unittest.py:793(runTests)
         1    0.000    0.000    4.220    4.220 
release\bin\lib\unittest.py:692(run)
         1    0.000    0.000    4.220    4.220 
release\bin\lib\unittest.py:427(__call__)
         1    0.000    0.000    4.220    4.220 
release\bin\lib\unittest.py:420(run)
         1    0.000    0.000    4.220    4.220 
release\bin\lib\unittest.py:280(__call__)
         1    0.000    0.000    4.220    4.220 
release\bin\lib\unittest.py:245(run)
         2    0.000    0.000    3.806    1.903 
application\schema.py:1201(reset)
         4    0.000    0.000    3.700    0.925 
repository\persistence\repositoryview.py:392(loadPack)
     136/4    0.005    0.000    3.698    0.924 
repository\util\sax.py:183(parseFile)
     136/4    0.490    0.004    3.698    0.924 
release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
         2    0.000    0.000    2.826    1.413 
repository\persistence\repositoryview.py:1053(__init__)
         2    0.000    0.000    2.825    1.413 
repository\persistence\repositoryview.py:39(__init__)
         2    0.000    0.000    2.825    1.413 
repository\persistence\repositoryview.py:92(openView)
         2    0.000    0.000    2.825    1.412 
repository\persistence\repositoryview.py:116(_loadSchema)
        82    0.002    0.000    2.177    0.027 
repository\persistence\packhandler.py:38(startElement)
        78    0.008    0.000    2.172    0.028 
repository\persistence\packhandler.py:98(itemStart)
       132    0.017    0.000    2.152    0.016 
repository\persistence\packhandler.py:144(loadItem)
         1    0.000    0.000    2.139    2.139 
application\tests\testschemaapi.py:29(tearDown)
         1    0.000    0.000    1.861    1.861 
application\tests\testschemaapi.py:26(setUp)
        82    0.002    0.000    1.502    0.018 
repository\persistence\packhandler.py:52(endElement)

This gives us a different picture.  Now, loadPack() is over 87% of the 
runtime, with the actual XML parsing (in libxml2.SAXParseFile) accounting 
for about 10%.  So, somewhere around 77% of the runtime is in processing the 
loaded pack data.  The cumulative times for startElement and endElement -- 
3.679 seconds put together -- appear to confirm this hypothesis.

(By the way, in case I didn't mention it earlier, the times shown in the 
profiler are *relative*, not real time.  The test takes under 1 second in 
"real life", so you can't treat these times as having something to do with 
wall clock time.)
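
(If you want a sanity check against the wall clock, a couple of lines of 
ordinary timing will do; 'run_the_test()' here is just a stand-in for 
whatever code you're profiling:)

 >>> import time
 >>> start = time.time()
 >>> run_the_test()                            # stand-in for the profiled code
 >>> print "%.3f real seconds" % (time.time() - start)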

To go deeper, we can now begin to look at call trees:

 >>> s.print_callees('startElement')
    Ordered by: internal time
    List reduced from 441 to 3 due to restriction <'startElement'>

Function
        called...
repository\item\itemhandler.py:753(startElement)
         repository\item\itemhandler.py:77(startDocument)(306)    0.007

         repository\item\itemhandler.py:85(startElement)(2434)    0.248

         repository\item\itemhandler.py:415(__init__)(306)    0.037

repository\item\itemhandler.py:85(startElement)
         repository\item\itemhandler.py:160(attributeStart)(396)    0.056

         repository\item\itemhandler.py:176(valueStart)(116)    0.033

         repository\item\itemhandler.py:427(refStart)(554)    0.081

         repository\item\itemhandler.py:468(itemStart)(306)    0.038

         repository\schema\types.py:1424(valueStart)(20)    0.001

         repository\util\sax.py:173(errorOccurred)(2434)    0.048

repository\persistence\packhandler.py:38(startElement)
         repository\persistence\packhandler.py:61(packStart)(4)    0.004

         repository\persistence\packhandler.py:98(itemStart)(78)    2.137

         repository\util\sax.py:173(errorOccurred)(82)    0.048


<pstats.Stats instance at 0x0257A5D0>
 >>>

This actually gave more data than we wanted; we wanted the "startElement" 
function on line 38 of packhandler.py:

 >>> s.print_callees('packhandler.py:38')
    Ordered by: internal time
    List reduced from 441 to 1 due to restriction <'packhandler.py:38'>

Function
        called...
repository\persistence\packhandler.py:38(startElement)
         repository\persistence\packhandler.py:61(packStart)(4)    0.004

         repository\persistence\packhandler.py:98(itemStart)(78)    2.137

         repository\util\sax.py:173(errorOccurred)(82)    0.048


<pstats.Stats instance at 0x0257A5D0>
 >>>

Ah, that's better.  print_callees() and print_callers() accept a string 
(treated as a regular expression) that's matched against the printed 
function names, so you can use all or part of a name or filename:lineno to 
narrow down the output.  Anyway, we can now see that most of the time spent 
in startElement() is actually being spent in itemStart() (line 98 of 
packhandler.py).  And we can drill into that:

 >>> s.print_callees('packhandler.py:98')
    Ordered by: internal time
    List reduced from 441 to 1 due to restriction <'packhandler.py:98'>

Function
     called...
repository\persistence\packhandler.py:98(itemStart)
      repository\persistence\packhandler.py:144(loadItem)(132)    2.114

      repository\persistence\repositoryview.py:282(find)(4)    0.114

      c:\cygwin\home\pje\c6\chandler\repository\util\path.py:19(__init__)(4) 
0.032

      c:\cygwin\home\pje\c6\chandler\repository\util\sax.py:173(errorOccurred)(64) 
0.048

      release\bin\lib\ntpath.py:59(join)(140)    0.011

      release\bin\lib\sre.py:178(compile)(6)    0.000


<pstats.Stats instance at 0x0257A5D0>
 >>>

So, 2.114 seconds are spent in loadItem(), which is very close to the total 
time spent in startElement and is 50% of the total test execution time.

I only chose this test to analyze because it's very short, and since I've 
now shown nearly all of the ways to use a stats object, I'm going to stop 
analyzing it here and move on to profiling the Chandler UI.

Profiling in the Chandler UI is pretty simple.  From the "Test" menu select 
"Start Profiler", and then do whatever it is you want to profile.  Then 
select "Stop Profiler".  This creates a file called "Events.prof", which 
you can analyze by opening a Python shell from the "Test" menu, and then 
doing something like this (it helps if you maximize the window):

 >>> from hotshot.stats import load
 >>> s=load('Events.prof')
 >>> s.sort_stats("cum").print_stats(20)
          187634 function calls (170697 primitive calls) in 3.675 CPU seconds

    Ordered by: cumulative time
    List reduced from 874 to 20 due to restriction <20>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    185/56    0.004    0.000    3.675    0.066 
parcels\osaf\framework\blocks\block.py:563(callNamedMethod)
         1    0.001    0.001    3.666    3.666 
parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)
       2/1    0.000    0.000    3.662    3.662 
parcels\osaf\framework\blocks\block.py:122(postEventByName)
       2/1    0.000    0.000    3.662    3.662 
parcels\osaf\framework\blocks\block.py:94(post)
       2/1    0.000    0.000    3.662    3.662 
parcels\osaf\framework\blocks\block.py:539(dispatchEvent)
     129/1    0.005    0.000    3.662    3.662 
parcels\osaf\framework\blocks\block.py:602(broadcast)
   129/126    0.002    0.000    3.639    0.029 
parcels\osaf\framework\blocks\block.py:542(callProfiledMethod)
         1    0.000    0.000    3.600    3.600 
parcels\osaf\framework\blocks\trunk.py:65(onSelectItemEvent)
       2/1    0.000    0.000    3.600    3.600 
parcels\osaf\framework\blocks\trunk.py:33(wxSynchronizeWidget)
       2/1    0.001    0.000    3.598    3.598 
parcels\osaf\framework\blocks\trunk.py:81(installTreeOfBlocks)
      68/1    0.031    0.000    3.521    3.521 
parcels\osaf\framework\blocks\block.py:199(render)
2698/1110    0.104    0.000    2.618    0.002 
repository\util\linkedmap.py:389(__iter__)
   219/136    0.020    0.000    2.602    0.019 
repository\item\refcollections.py:407(_load)
4154/1175    0.069    0.000    2.561    0.002 
repository\util\linkedmap.py:398(iterkeys)
8791/1640    0.037    0.000    2.549    0.002 
repository\util\linkedmap.py:124(_get)
   792/574    0.005    0.000    1.850    0.003 
repository\persistence\repositoryview.py:921(_setLoading)
       140    0.003    0.000    1.722    0.012 
repository\persistence\dbitemio.py:475(<lambda>)
       140    0.008    0.000    1.719    0.012 
repository\schema\kind.py:63(_setupClass)
        20    0.014    0.001    1.711    0.086 
repository\schema\kind.py:93(_setupDescriptors)
  1429/667    0.111    0.000    1.686    0.003 
repository\schema\kind.py:418(iterAttributes)

As you can see, manipulating the stats object works exactly the same way as 
it does with profile_tests; the only difference is that you have to start by 
loading the 'Events.prof' file.
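
(A related trick that sometimes comes in handy: once you have a Stats 
object, you can save it back out with dump_stats() and reload it later with 
plain pstats, or merge several runs together with add().  This reuses the 
load() we imported above; the 'Events2.prof' file here is hypothetical:)

 >>> import pstats
 >>> s.dump_stats('events.pstats')         # save in pstats' own format
 >>> s2 = pstats.Stats('events.pstats')    # reload later without hotshot
 >>> s2.add(load('Events2.prof'))          # merge in a second (hypothetical) run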

One last technique, which works with either way of getting profiling 
data...  Suppose we wanted to find out who's calling __iter__ in 
linkedmap.py those 1110 times.  We can use 'print_callers()' instead of 
'print_callees()':

 >>> s.print_callers("linkedmap.py:389")
    Ordered by: internal time
    List reduced from 874 to 1 due to restriction <'linkedmap.py:389'>

Function
     was called by...
repository\util\linkedmap.py:389(__iter__)
     parcels\osaf\framework\blocks\block.py:199(render)(134)    3.521
     parcels\osaf\framework\blocks\block.py:267(unRender)(9)    0.018
     parcels\osaf\framework\blocks\block.py:467(pushView)(49)    0.499
     parcels\osaf\framework\blocks\block.py:602(broadcast)(258)    3.662
     parcels\osaf\framework\blocks\containerblocks.py:23(wxSynchronizeWidget)(228) 
0.079
     parcels\osaf\framework\blocks\containerblocks.py:308(adjustSplit)(4) 
0.002
     parcels\osaf\framework\blocks\containerblocks.py:345(wxSynchronizeWidget)(2) 
0.007
     parcels\osaf\framework\blocks\detail\detail.py:90(synchronizeDetailView)(7) 
0.111
     parcels\osaf\framework\blocks\detail\detail.py:105(reNotifyInside)(122) 
0.108
     parcels\osaf\framework\blocks\detail\detail.py:203(showReentrant)(3) 
0.000
     parcels\osaf\framework\blocks\menusandtoolbars.py:861(selectTool)(10) 
0.001
     parcels\osaf\pim\items.py:516(getSharedState)(62)    0.004
     parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)(10) 
3.666
     repository\item\query.py:60(_run)(24)    0.031
     repository\item\sets.py:106(_iterSource)(8)    0.019
     repository\schema\kind.py:56(onItemLoad)(44)    0.122
     repository\schema\kind.py:418(iterAttributes)(1678)    1.686
     repository\schema\kind.py:567(_kindOf)(46)    0.113


<pstats.Stats instance at 0x09A66BC0>
 >>>

As you can see, the Python profiling tools are better than nothing, but on 
an absolute scale they sort of suck.  The Python profiler hasn't received 
much development attention for many years; in fact, the stats facility is so 
old that it goes back to before Python had *packages*, so the output is 
pretty sucky.  If we're going to be doing a lot of profiling, it might be 
worthwhile to invest some effort in creating better reporting/analysis tools 
to let us crunch the raw data.
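
(One small thing that helps with the unwieldy output in the meantime: 
strip_dirs() chops the leading path information off the filenames, which 
makes the listings a lot narrower and easier to scan:)

 >>> s.strip_dirs().sort_stats('cum').print_stats(10)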


