[pylucene-dev] Getting list of unique field values?

Ken Kinder kkinder at gmail.com
Fri Sep 1 09:57:36 PDT 2006


On 8/31/06, Kevin Ollivier <kevino at tulane.edu> wrote:
>
> One thing I'd like to do with my indexes is provide a browsable list
> of various metadata fields, such as Subject, so that users could
> click on any subject in the index and get a list of documents which
> have that subject.

I do something similar. I found that using a MatchAllDocsQuery was
indeed too slow. Based on the Lucene in Action examples, I found that
walking a term enumerator was much faster: on my index of over a
million rows, it took just a few seconds. Adapting the LIA example for
distance sorting, try this:

from PyLucene import Term  # on JCC-based builds: from lucene import Term

# reader is an already-open IndexReader
fieldName = 'subject'
uniqueFieldValues = set()

# Position the enumerator at the first term of the field
enumerator = reader.terms(Term(fieldName, ""))
if reader.numDocs() > 0:
    termDocs = reader.termDocs()
    try:
        while True:
            term = enumerator.term()
            if term is None:
                raise RuntimeError("no terms in field %s" % fieldName)
            if term.field() != fieldName:
                break  # walked past the last term of this field
            termDocs.seek(enumerator)
            # Keep the term only if at least one document still uses it
            if termDocs.next():
                uniqueFieldValues.add(term.text())
            if not enumerator.next():
                break
    finally:
        termDocs.close()
        enumerator.close()
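
If it helps to see why this is so much cheaper than visiting every
document, here is a rough pure-Python model of the idea. It is not
PyLucene: the terms list and the unique_field_values function are
made-up names for illustration, and the only assumption is that the
index keeps its terms sorted by field and then by text, so seeking to
(field, "") and walking forward until the field changes touches only
that field's terms.

```python
from bisect import bisect_left

# Hypothetical stand-in for a term dictionary: (field, text) pairs
# held in sorted order. The values are invented for this sketch.
terms = sorted([
    ("author", "kinder"),
    ("subject", "history"),
    ("subject", "math"),
    ("subject", "physics"),
    ("title", "intro"),
])

def unique_field_values(terms, field_name):
    """Collect the distinct values of field_name from a sorted term list."""
    values = []
    # Seek to the first term >= (field_name, ""), which mirrors
    # reader.terms(Term(field_name, "")) in the listing above.
    start = bisect_left(terms, (field_name, ""))
    for field, text in terms[start:]:
        if field != field_name:
            break  # past the last term of this field
        values.append(text)
    return values

print(unique_field_values(terms, "subject"))
# prints ['history', 'math', 'physics']
```

The work done is proportional to the number of distinct values in the
field, not to the number of documents in the index.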

