[pylucene-dev] Downcast of TermFreqVector to TermPositionVector

Bernhard Jung bernhard at jung.name
Mon Jul 30 03:44:34 PDT 2007


hi everybody,

I stumbled across a problem when using term vectors with position and
offset information in PyLucene. I index fields with
Field.TermVector.WITH_POSITIONS_OFFSETS set and use the getTermFreqVector
method of IndexReader to retrieve the term vector, but the returned object
is of type TermFreqVector rather than TermPositionVector (a sub-interface
of TermFreqVector), which would provide the getTermPositions and
getOffsets methods that I want to use.

I patched lucene.cpp in the latest Subversion trunk (as of 2007-07-30) to
add an isTermPositionVector check and a toTermPositionVector downcast
from TermFreqVector to TermPositionVector.

I'd like to share this patch, or to be corrected if I am taking the wrong
approach to getting the positions and offsets of terms in a document.

Attached are the patch and an example script that uses the downcast
TermPositionVector.

bernhard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-20070730-termpositionvector-downcast.diff
Type: text/x-patch
Size: 1063 bytes
Desc: not available
Url : http://lists.osafoundation.org/pipermail/pylucene-dev/attachments/20070730/b61e9c21/patch-20070730-termpositionvector-downcast.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample_termpositionvector.py
Type: text/x-python
Size: 1128 bytes
Desc: not available
Url : http://lists.osafoundation.org/pipermail/pylucene-dev/attachments/20070730/b61e9c21/sample_termpositionvector.py

