Re: [pylucene-dev] HighFreqTerms from org.apache.lucene.misc
Dirk Rothe
d.rothe at semantics.de
Tue Apr 1 10:16:09 PDT 2008
On Wed, 26 Mar 2008 16:13:12 +0100, Andi Vajda <vajda at osafoundation.org>
wrote:
>
> On Mar 26, 2008, at 2:16, "Dirk Rothe" <d.rothe at semantics.de> wrote:
>
>> I cannot find the HighFreqTerms Class from [1] in the flattened lucene
>> Namespace. Any obvious reasons why?
>
> Probably because it's in a contrib jar file not currently on the list of
> jar files in the PyLucene build. Adding the jar file to the list in
> Makefile and rebuilding PyLucene should be enough to resolve the issue.
>
> Andi..
OK, but after inspecting the Java code, this was pretty trivial to implement
in Python. I'm just curious: do you think the Java version would be
(significantly) faster? I'm not sure I understand the performance
implications of the JCC bridge.
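from lucene import IndexReader  # assumes lucene.initVM() was already called

def getHighFreqTerms(indexPath, fieldName, topN):
    ''' get top n terms from field given by fieldName '''
    reader = IndexReader.open(indexPath)
    terms = reader.terms()
    result = []
    # terms.next() advances the enumeration, so call it only once per
    # iteration; a second call in the loop body would skip every other term
    while terms.next():
        if terms.term().field() == fieldName:
            result.append((terms.docFreq(), unicode(terms.term())))
    reader.close()
    # sort by document frequency, highest first
    result.sort(reverse=True)
    return result[:topN]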
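The final sort-and-slice step in the function above collects every matching term before sorting. As a rough sketch of an alternative (not something I have measured against the Java class), the standard library's heapq.nlargest can pick the top N from any iterable without sorting the full list. The helper name top_n_terms and the sample pairs are made up for illustration; no index is needed to run it.

```python
import heapq

def top_n_terms(freq_term_pairs, top_n):
    """Return the top_n (doc_freq, term) pairs, highest frequency first."""
    # nlargest consumes the iterable once and tracks only the best top_n
    # items, so the complete term list never has to be sorted.
    return heapq.nlargest(top_n, freq_term_pairs)

# Hypothetical (doc_freq, term) pairs standing in for the term enumeration
pairs = [(3, "body:apple"), (10, "body:the"), (1, "body:zebra"), (7, "body:lucene")]
top_two = top_n_terms(iter(pairs), 2)
```

For a given input this returns the same pairs, in the same order, as sorting in reverse and slicing, so it could replace those two lines if the term list grows large.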