[pylucene-dev] DbDirectory
Andi Vajda
vajda at osafoundation.org
Tue Dec 7 14:52:45 PST 2004
> Have you produced heavy load on your Chandler about indexing and
> DbDirectory?
No, we have not done much stress-testing yet, especially the PyLucene part,
and especially on Windows.
I did not try the Windows -db- binaries on my box, since I do most of my
development on Mac OS X, but I did verify that on my win2k Virtual PC I can
run the test_DbDirectory.py unit test reliably when I build my own python 2.4
with my own build of Berkeley DB 4.3.21 and my own PyLucene.
I think that it should be as stable with python 2.3.3 and Berkeley DB 4.2.52
which is what Chandler is built with.
The fact that it is not so easy to know what 'stock' windows python's _bsddb
extension is built against, or that it might be statically linked, seems to
indicate that providing PyLucene -db- binaries is not very useful since only
by building everything yourself from source will you be able to verify that
you're actually running the correct libraries.
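
As a rough illustration of the version question above, the Python-side
bindings can at least report which Berkeley DB release they were built
against. This is only a sketch: the helper name berkeley_db_version is made
up for this example, the list of module names to try is an assumption, and it
says nothing about which library PyLucene itself is linked with.

```python
def berkeley_db_version():
    """Return (module_name, (major, minor, patch)) for the first Berkeley DB
    binding that imports, or None if no binding is installed."""
    # Assumed candidates: the old stdlib module and its later standalone forks.
    for name in ("bsddb", "bsddb3", "berkeleydb"):
        try:
            # fromlist makes __import__ return the 'db' submodule itself.
            module = __import__(name + ".db", fromlist=["db"])
        except ImportError:
            continue
        # db.version() reports the Berkeley DB library version in use.
        return name, tuple(module.version())
    return None


if __name__ == "__main__":
    print(berkeley_db_version())
```

If the tuple printed here differs from the Berkeley DB release PyLucene was
compiled against, that mismatch would be the first thing to rule out.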
I'm currently focusing on Chandler repository performance; stress-testing
seems like a logical follow-on to that.
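
A minimal sketch of what such a stress test could look like: run the same
workload at increasing sizes and report the first size that fails. The names
first_failing_size and fake_workload are hypothetical; in a real run the
workload would be a function that indexes n documents into a DbDirectory (or
an FSDirectory, for comparison), which is not shown here.

```python
def first_failing_size(workload, sizes):
    """Call workload(n) for each n in sizes, in order.

    Return (n, exception) for the first size that raises,
    or None if every size completes."""
    for n in sizes:
        try:
            workload(n)
        except Exception as exc:
            return n, exc
    return None


# Stand-in workload for illustration only: it fails once n reaches 20.
def fake_workload(n):
    if n >= 20:
        raise ValueError("java.lang.NullPointerException")


result = first_failing_size(fake_workload, [1, 5, 10, 20, 40])
print(result[0])  # prints 20
```

Running the same sizes against two directory implementations would show
whether a failure depends on the directory type or only on the load.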
I think that the random instabilities you're seeing are most likely due to a
real bug in PyLucene having to do with things getting garbage collected too
soon, something I need to look into.
Andi..
>
> AV> Does the same code work reliably with a FSDirectory ?
> AV> I'm asking because random NullPointerExceptions are usually a sign of
> AV> something being garbage collected too soon, most likely a bug in PyLucene.
> AV> I haven't seen bugs like this in a while but definitely saw them in the past.
>
> AV> Andi..
>
> AV> On Tue, 7 Dec 2004, Yura Smolsky wrote:
>
>>> Hello.
>>>
>>> Guys,
>>> does it work on your Python-PyLucene-BerkeleyDB platform?
>>> Check commented place below.
>>>
>>> I think Berkeley DB support is not very stable. Tests for it don't work
>>> on my box.
>>>
>>> Maybe I am doing something wrong?..
>>>
>>> # create datadir
>>> import tempfile, os
>>> datadir = "indexdbok"
>>> if not os.path.exists(datadir):
>>>     os.mkdir(datadir)
>>>
>>> # initialize dbenv
>>> import bsddb
>>> dbenv = bsddb.db.DBEnv()
>>> dbenv.open(datadir,
>>>            bsddb.db.DB_CREATE |
>>>            bsddb.db.DB_INIT_TXN |
>>>            bsddb.db.DB_INIT_MPOOL |
>>>            bsddb.db.DB_THREAD)
>>>
>>> # create files and blocks databases
>>> txn = dbenv.txn_begin()
>>> filesDb = bsddb.db.DB(dbenv)
>>> filesDb.open('f.db', 'blocks', bsddb.db.DB_BTREE,
>>>              bsddb.db.DB_CREATE | bsddb.db.DB_THREAD, txn=txn)
>>> blocksDb = bsddb.db.DB(dbenv)
>>> blocksDb.open('b.db', 'blocks', bsddb.db.DB_BTREE,
>>>               bsddb.db.DB_CREATE | bsddb.db.DB_THREAD, txn=txn)
>>> txn.commit()
>>>
>>>
>>> # create an index writer
>>> txn = dbenv.txn_begin()
>>> from PyLucene import DbDirectory
>>> dir = DbDirectory(txn, filesDb, blocksDb, 0)
>>>
>>> from PyLucene import StandardAnalyzer, IndexWriter, Document, Field
>>> writer = IndexWriter(dir, StandardAnalyzer(), True)
>>> writer.setUseCompoundFile(False)
>>>
>>> # when I decrease the number of cycle then it works
>>> # when I increase then writer.optimize() produces
>>> # # def optimize(*args): return _PyLucene.IndexWriter_optimize(*args)
>>> # # ValueError: java.lang.NullPointerException
>>> for i in range(20):
>>>     doc = Document()
>>>     doc.add(Field.Keyword("id", "1"))
>>>     doc.add(Field.Text("title", "q"*100000))
>>>     writer.addDocument(doc)
>>>
>>> writer.optimize()
>>> writer.close()
>>>
>>> txn.commit()
>>>
>>>
>>> filesDb.close()
>>> blocksDb.close()
>>> dbenv.close()
>>>
>>>
>>>
>>> Yura Smolsky,
>>>
>>>
>>>
>
>
>
>
> Yura Smolsky,
>
> -> AlterVision.biz <- Alternative Vision of Web Design
>
>