[pylucene-dev] fatal error in GC : too many heap sections.

Marc Weeber marc at weeber.net
Mon Nov 19 11:03:03 PST 2007


Hi Bill,

Earlier this year I heard through the rumor mill (the NXLucene email list)
that the memory leak problems were only solved in the GCC 4.2 and later
branches. Maybe Andi has a better answer than this.

best,

Marc

On Nov 19, 2007, at 7:49 PM, Bill Peverill wrote:

> Hi Everybody!
>
> Sorry this post is so long.
>
> We are currently in the final stage of implementing a new search
> capability over a database with ~10 million records at present, which
> will grow to ~30 million within a month. We are no strangers to Python
> and PyLucene, and naturally chose PyLucene for this project. It has been
> implemented over our present database, and it works great at low user
> levels.
>
> In a separate project, we have implemented a PyLucene solution for
> another [offline] product with a single-directory index of about 40 MB
> over a limited selection of the same data. This has been working well
> since May.
>
> ------- Our Problem
>
> Under load testing that simulates our online traffic, we see memory
> allocation mount steadily and never get released, until finally the
> process dies with a modal dialog from within PyLucene: "fatal error in
> GC : too many heap sections." This usually occurs once memory has grown
> to a little under 2 GB [of 4 GB available]. We have been unable to free
> this memory once it is allocated without restarting the job, which
> currently runs in a console window. We have not been able to keep an
> implementation stable for more than a matter of hours.
>
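
To put numbers on the growth described above, here is a minimal sketch of a
harness that samples process memory while queries run. Everything in it is
an assumption for illustration: run_load_test, growth_per_search, and the
read_memory callable are hypothetical names, not part of PyLucene, and
read_memory would have to wrap whatever process-memory counter the platform
offers.

```python
def run_load_test(search, queries, read_memory, report_every=1000):
    """Run each query through `search`, sampling memory periodically.

    `search` and `read_memory` are caller-supplied callables (assumptions,
    not PyLucene APIs). Returns a list of (searches_done, memory_bytes).
    """
    samples = [(0, read_memory())]  # baseline before any search runs
    done = 0
    for query in queries:
        search(query)
        done += 1
        if done % report_every == 0:
            samples.append((done, read_memory()))
    return samples


def growth_per_search(samples):
    """Average bytes gained per search between the first and last sample."""
    (first_n, first_mem), (last_n, last_mem) = samples[0], samples[-1]
    if last_n == first_n:
        return 0.0
    return float(last_mem - first_mem) / (last_n - first_n)
```

A roughly constant bytes-per-search figure across runs would suggest a
per-search leak rather than a one-time cache filling up.
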
> ------- Our Questions
>
> We would first like to establish that this is a solvable problem. What
> are the largest implementations out there so far with high traffic
> searching a large dataset? Any anecdotal evidence would be great. [Go
> on; brag.]
>
> Assuming PyLucene can support our requirements, has anybody encountered
> this particular issue? How did you solve it?
>
> The list archives suggest others have had this issue, but the threads
> have tended to go cold before a specific solution was documented. The
> bug list has a single reference to a couple of thread-based memory
> leaks, but these may or may not be similar to ours, and will not be
> fixed. [Making this an unsolvable problem.] Maybe we just didn't find
> it, so if there is a meaningful thread in the archives we'd love to hear
> about it.
>
> ------- A few things we've tried
>
> We removed CherryPy from the mix and had our PyLucene code perform
> searches directly, with no effect.
>
> We've tried creating a new IndexSearcher per search, and also tried
> creating a single IndexSearcher that was used over many searches; none
> of the variations we've tried prevented the memory from growing.
>
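
For reference, the two variations described above can be sketched as
follows. This is only an illustration under stated assumptions:
FakeSearcher is a hypothetical stand-in for the IndexSearcher mentioned
above (so the sketch runs without PyLucene), and
search_with_fresh_searcher and SharedSearcher are made-up names. It makes
no claim about fixing the leak; it only pins down the two patterns so they
can be compared under the same load.

```python
import threading


class FakeSearcher(object):
    """Hypothetical stand-in for an index searcher (not a PyLucene class)."""
    open_count = 0  # number of searchers currently open

    def __init__(self, index_path):
        self.index_path = index_path
        FakeSearcher.open_count += 1

    def search(self, query):
        return ["hit for %s" % query]

    def close(self):
        FakeSearcher.open_count -= 1


def search_with_fresh_searcher(index_path, query, factory=FakeSearcher):
    """Variation 1: open a searcher per search and always close it."""
    searcher = factory(index_path)
    try:
        return searcher.search(query)
    finally:
        searcher.close()


class SharedSearcher(object):
    """Variation 2: one lazily created searcher reused by every search."""

    def __init__(self, index_path, factory=FakeSearcher):
        self._index_path = index_path
        self._factory = factory
        self._lock = threading.Lock()
        self._searcher = None

    def search(self, query):
        # Guard creation so concurrent threads open only one searcher.
        self._lock.acquire()
        try:
            if self._searcher is None:
                self._searcher = self._factory(self._index_path)
            searcher = self._searcher
        finally:
            self._lock.release()
        return searcher.search(query)

    def close(self):
        self._lock.acquire()
        try:
            if self._searcher is not None:
                self._searcher.close()
                self._searcher = None
        finally:
            self._lock.release()
```

Passing the real searcher class as the factory argument would be the
obvious substitution, assuming its constructor and its search and close
methods line up with the stand-in.
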
> ------- A note on our environment
>
> We have two fast, dedicated, load-balanced servers running Windows
> Server 2003, with 4 GB RAM and a fast RAID stripe, to run PyLucene
> exclusively. PyLucene is called from CherryPy.
>
> The present index is 3.1 GB as a 10-directory index, or 2.8 GB as a
> single-directory index. We will likely implement a multi-directory
> index in anticipation of bumping up against file size restrictions at
> the OS level once we implement the full data set. We anticipate an
> index in the area of 10+ GB. We have not been optimizing our indices as
> part of our tests: our single-folder index consists of 34 files.
>
> For debugging purposes we run CherryPy in console mode, but the
> production release will run as a Windows service.
>
> The software versions we use for the search web service are:
>
> Python         version 2.4.4
> CherryPy       version 2.1.0 (with threading modified to use
>                PyLucene.PythonThread)
> PyLucene GCJ   version 2.2.0-1, built on Windows 2000 with mingw/gcj
>                3.4.6 and Python 2.4.3, obtained from the PyLucene
>                download page
>
> Note that because our codebase is not compatible with Python 2.5, we
> can't try the version of PyLucene that has been compiled with Python
> 2.5.
>
>
> We would be grateful for any advice we can get. (Our gratitude could
> include beer contributions or other compensation if appropriate.) We'd
> also love to hear from people who have NOT had this problem.
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev


