[pylucene-dev] searching repeated and untokenized fields
lists at hubmed.org
Sun Apr 30 19:44:36 PDT 2006
I have a couple of questions regarding indexing and searching a
document that has repeated values for the same field (specifically,
the authors of a document, in this case):
Firstly, I'm adding the repeated field with this code:
    for creator in creators:
        doc.add(Field('creator', creator, Field.Store.YES,
but I can't find a way to read those fields back out of the index. If I use
    for author in hits[i]["creator"]:
then only the first "creator" entry is returned for that document, and it
gets split into individual letters. In other words, hits[i]["creator"]
is a string and not a list.
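One thing that might be worth trying, assuming PyLucene exposes Lucene's Document.getValues(name) (which returns all stored values for a field name), is to ask the document for every value instead of indexing it by name. The sketch below is untested against PyLucene 1.9.1; all_values and StubDocument are hypothetical names, and the stub exists only so the example runs without an index.

```python
def all_values(doc, name):
    """Return every stored value of a repeated field as a list of strings."""
    # Assumes doc offers Lucene's Document.getValues(name); untested on PyLucene 1.9.1.
    values = doc.getValues(name)
    # Some Lucene versions return null (None) rather than an empty array.
    return list(values) if values is not None else []


class StubDocument:
    """Hypothetical stand-in for a Lucene Document, only so the sketch runs alone."""

    def __init__(self, fields):
        self._fields = fields  # list of (name, value) pairs, in insertion order

    def getValues(self, name):
        found = [value for key, value in self._fields if key == name]
        return found or None


doc = StubDocument([("creator", "Doe J"), ("creator", "Smith A")])
authors = all_values(doc, "creator")

# Iterating over a single string walks its characters, which is the
# "individual letters" effect described above.
letters = [ch for ch in "Doe J"]
```

With a real index this would presumably be called as all_values(hits.doc(i), 'creator'), but that call is an assumption and has not been checked here.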
Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to
search an untokenized field using a term that contains spaces. For a
document that has a creator "Doe J", the query
doesn't return any results, and
doesn't match what it needs to.
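A guess at the mechanism, as a toy model in plain Python rather than PyLucene code: an untokenized field is stored as a single term that includes the space, while a query string that gets split on whitespace is looked up word by word. The function names below are hypothetical, and the whitespace split is only a simplified stand-in for whatever the query parser actually does.

```python
# Toy model only: this does not call PyLucene.

def index_untokenized(value):
    """An untokenized field is indexed as one term, spaces included."""
    return {value}


def parse_by_whitespace(query_text):
    """Simplified stand-in for a query parser that splits on spaces."""
    return query_text.split()


indexed_terms = index_untokenized("Doe J")
parsed_terms = parse_by_whitespace("Doe J")

# Neither "Doe" nor "J" equals the single indexed term "Doe J".
matches_after_parsing = [t for t in parsed_terms if t in indexed_terms]

# Looking up the whole string as one term does match.
matches_as_single_term = "Doe J" in indexed_terms
```

In Lucene terms, the single-term lookup would correspond to building a TermQuery from Term('creator', 'Doe J') directly instead of going through QueryParser. Whether that resolves the case described here is untested.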
Has anyone found solutions to these problems already? For the second I
could just replace spaces with underscores during indexing, but
that wouldn't be an ideal solution.