[pylucene-dev] Getting list of unique field values?
Ken Kinder
kkinder at gmail.com
Fri Sep 1 09:57:36 PDT 2006
On 8/31/06, Kevin Ollivier <kevino at tulane.edu> wrote:
>
> One thing I'd like to do with my indexes is provide a browsable list
> of various metadata fields, such as Subject, so that users could
> click on any subject in the index and get a list of documents which
> have that subject.
I do something similar. I found that a MatchAllDocsQuery was indeed too
slow. Following the Lucene in Action examples, I found that walking the
terms with a term enumerator was much faster: on my index of over a
million documents it took just a few seconds. Adapting the LIA example
for distance sorting, try this:
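from PyLucene import Term  # module name as in the GCJ-based PyLucene of this era

# Assumes `reader` is an already-open IndexReader on your index.
fieldName = 'subject'
uniqueFieldValues = set()
# terms() positions the enumerator at the first term >= (fieldName, "")
enumerator = reader.terms(Term(fieldName, ""))
termDocs = reader.termDocs()
try:
    if reader.numDocs() > 0:
        while True:
            term = enumerator.term()
            if term is None:
                raise RuntimeError("no terms in field %s" % fieldName)
            # Terms are sorted by field, so stop once we leave fieldName
            if term.field() != fieldName:
                break
            # Keep the value only if at least one undeleted document has it
            termDocs.seek(enumerator)
            if termDocs.next():
                uniqueFieldValues.add(term.text())
            if not enumerator.next():
                break
finally:
    termDocs.close()
    enumerator.close()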
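To see why this is fast, here is a toy model of the same scan in plain Python. It is a hypothetical sketch, not PyLucene code: `unique_field_values`, the sorted `terms` list, `postings` and `deleted` are invented stand-ins for Lucene's term dictionary, postings lists and deletions. Because terms are sorted by field and then by text, the scan seeks to `(field, "")` and stops at the first term from another field, so it touches only that field's terms rather than every document.

```python
import bisect
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a Lucene term: (field, text). NOT the PyLucene API;
# it only models the sort order the real term dictionary uses.
TermKey = Tuple[str, str]


def unique_field_values(terms: List[TermKey],
                        postings: Dict[TermKey, List[int]],
                        deleted: Set[int],
                        field: str) -> List[str]:
    """Return the distinct values of `field` that still have a live document.

    `terms` must be sorted (by field, then text), like Lucene's term dictionary.
    """
    values = []
    # Like reader.terms(Term(field, "")): jump to the first term >= (field, "").
    i = bisect.bisect_left(terms, (field, ""))
    while i < len(terms):
        term_field, text = terms[i]
        # Terms are sorted by field first, so leaving the field ends the scan.
        if term_field != field:
            break
        # Like termDocs.seek()/termDocs.next(): one undeleted doc is enough.
        if any(doc not in deleted for doc in postings[terms[i]]):
            values.append(text)
        i += 1
    return values
```

For example, with subjects "lucene", "python" and "zope", where the only document tagged "zope" has been deleted, asking for `field="subject"` yields `["lucene", "python"]`, already in sorted order because the term dictionary is.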