Re: [pylucene-dev] HighFreqTerms from org.apache.lucene.misc

Dirk Rothe d.rothe at semantics.de
Tue Apr 1 10:16:09 PDT 2008


On Wed, 26 Mar 2008 16:13:12 +0100, Andi Vajda <vajda at osafoundation.org>  
wrote:

>
> On Mar 26, 2008, at 2:16, "Dirk Rothe" <d.rothe at semantics.de> wrote:
>
>> I cannot find the HighFreqTerms Class from [1] in the flattened lucene  
>> Namespace. Any obvious reasons why?
>
> Probably because it's in a contrib jar file not currently on the list of  
> jar files in the PyLucene build. Adding the jar file to the list in  
> Makefile and rebuilding PyLucene should be enough to resolve the issue.
>
> Andi..

Ok, but after looking at the Java code, this turned out to be pretty trivial
to implement in Python. Just out of curiosity: do you think the Java version
would be (significantly) faster? I'm not sure I understand the performance
implications of the JCC bridge.


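from lucene import IndexReader
# lucene.initVM(lucene.CLASSPATH) must have been called once before this runs.

def getHighFreqTerms(indexPath, fieldName, topN):
    ''' get top n terms from field given by fieldName '''
    reader = IndexReader.open(indexPath)
    terms = reader.terms()   # enumerates every term of every field
    result = []
    try:
        while terms.next():
            term = terms.term()
            if term.field() == fieldName:
                # (docFreq, text) tuples, so sorting orders by frequency first
                result.append((terms.docFreq(), term.text()))
    finally:
        terms.close()
        reader.close()

    result.sort(reverse=True)
    return result[:topN]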
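
Independent of the bridge, the loop visits every term of every field and
keeps the whole field in a list before sorting it. Since Lucene enumerates
terms ordered by field name and then by text, the scan can stop at the end
of the requested field, and a bounded heap can replace the full sort. A
minimal pure-Python sketch of that idea, assuming a plain iterable of
(field, text, docFreq) tuples as a stand-in for the TermEnum (the function
name top_terms_in_field is hypothetical, not part of PyLucene):

```python
import heapq
from itertools import dropwhile, takewhile

def top_terms_in_field(sorted_terms, field_name, top_n):
    """Return the top_n (doc_freq, text) pairs for field_name.

    sorted_terms: iterable of (field, text, doc_freq) tuples ordered by
    field, then text, as a Lucene TermEnum yields them. This is only a
    stand-in so the sketch runs without Lucene.
    """
    # Skip the terms of fields that come before field_name.
    in_field = dropwhile(lambda t: t[0] != field_name, sorted_terms)
    # Stop at the first term of the next field instead of scanning the rest.
    in_field = takewhile(lambda t: t[0] == field_name, in_field)
    # nlargest keeps only top_n candidates instead of sorting the whole field.
    return heapq.nlargest(top_n, ((freq, text) for _, text, freq in in_field))
```

With PyLucene the skipping step could be a seek instead: in the Lucene 2.x
API, reader.terms(Term(fieldName, '')) returns an enumeration already
positioned at the first term greater than or equal to that term, so term()
is valid before the first call to next().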
