[pylucene-dev] indexing performance
andraz.tori1 at guest.arnes.si
Tue Jul 3 23:32:04 PDT 2007
On Wed, 2007-07-04 at 07:52 +0200, Filip de Waard wrote:
> Until today, I've never had a single worry about performance in my
> short but exciting Python experience.
Lucky you :). You obviously haven't had to deal with memory leaks in
long-running daemon processes working with big datasets :). IMHO, Python's
memory management leaves a lot to be desired.
Python is still a great tool though.
> However, now I'm trying to index over six million books from a MySQL
> database using PyLucene and I'd like to speed it up.
Cool dataset! :)
> I have posted my indexer script at http://pastie.textmate.org/75938.
> Tomorrow I'll start playing with a profiler, but in the meantime: does
> anyone have any recommendations as to how to be most efficient in
> regard to the Python code, database interaction and of course the
> PyLucene indexing process? Or maybe I'm doing something horribly wrong
> in my script?
I don't think dictcursor is the best option for you; what about trying
...? However, you are probably not losing much time in the Python code itself
but in the Python-to-Lucene call conversions and in Lucene.
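A minimal sketch of how to check where the time actually goes, under stated assumptions: it is not Filip's script (that one is only at the pastie link). It assumes a plain tuple cursor read in batches with fetchmany() as the alternative to a dict cursor, uses an in-memory SQLite table as a stand-in for MySQL, and replaces the real PyLucene work with a hypothetical index_book() placeholder. Timing the fetch phase and the indexing phase separately tells you whether to optimize the database side or the Lucene side.

```python
import sqlite3
import time
from collections import defaultdict


def make_demo_db(n_rows):
    # Assumption: in-memory SQLite stands in for the MySQL database.
    # The "books" table and its columns are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books (id, title, author) VALUES (?, ?, ?)",
        ((i, f"Title {i}", f"Author {i % 100}") for i in range(n_rows)),
    )
    conn.commit()
    return conn


def index_book(book_id, title, author):
    # Hypothetical placeholder for building a PyLucene Document and adding it
    # to the index; here it only builds a dict so the sketch runs anywhere.
    return {"id": str(book_id), "title": title, "author": author}


def run(conn, batch_size=1000):
    timings = defaultdict(float)
    indexed = 0
    cur = conn.cursor()  # default cursor: rows come back as plain tuples
    cur.execute("SELECT id, title, author FROM books")
    while True:
        t0 = time.perf_counter()
        rows = cur.fetchmany(batch_size)  # pull one batch instead of one row
        t1 = time.perf_counter()
        timings["fetch"] += t1 - t0
        if not rows:
            break
        for book_id, title, author in rows:  # unpack by fixed column order
            index_book(book_id, title, author)
            indexed += 1
        timings["index"] += time.perf_counter() - t1
    return indexed, dict(timings)


if __name__ == "__main__":
    conn = make_demo_db(10_000)
    count, timings = run(conn)
    print(count, {phase: round(sec, 4) for phase, sec in timings.items()})
```

If the "index" total dwarfs "fetch" once index_book() is replaced by the real PyLucene calls, the cursor choice matters little and the time is going into the Python-to-Lucene boundary and Lucene itself.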
Considering the simplicity of your program, wouldn't it be really easy
to take Python out of the equation and write it entirely in Java?
> Any pointer would be most appreciated.
> Filip de Waard
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org