[pylucene-dev] searching repeated and untokenized fields

Alf Eaton lists at hubmed.org
Sun Apr 30 19:44:36 PDT 2006


I have a couple of questions regarding indexing and searching a  
document that has repeated values for the same field (specifically,  
the authors of a document, in this case):

Firstly, I'm adding the repeated field with this code:

for creator in creators:
    doc.add(Field('creator', creator, Field.Store.YES,
                  Field.Index.UN_TOKENIZED))

but can't find a way to read those fields back out from the index. If  
I use

for author in hits[i]["creator"]:
    print author

then just the first "creator" entry is returned for that document and
gets split into a list of individual letters; in other words,
hits[i]["creator"] is a string and not a list.
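The symptom can be reproduced in plain Python without PyLucene. The sketch below only illustrates why the loop prints letters; the sample values are hypothetical. Lucene's Java API does provide Document.getValues(name), which returns every stored value for a field, but whether and how PyLucene 1.9.1 exposes it through hits[i] is an assumption not verified here.

```python
# Plain-Python illustration of the symptom; no PyLucene required.

# A single-valued lookup hands back one string (hypothetical sample value).
stored_value = "Doe J"

# Iterating over a string yields its characters, which matches the
# "list of individual letters" behaviour described above.
letters = [ch for ch in stored_value]

# What the loop was meant to iterate over: one entry per creator
# (hypothetical sample values).
creators = ["Doe J", "Smith A"]
names = [name for name in creators]

print(letters)
print(names)
```

So the loop itself is behaving as Python normally does; what is missing is a call that returns all values of the field rather than only the first one.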


Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to  
search an untokenized field using a term that contains spaces. For a  
document that has a creator "Doe J", the query
creator:"Doe J"
doesn't return any results, and
creator:Doe J
doesn't match what it needs to.
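One possible explanation, offered as an assumption rather than a confirmed diagnosis: the query parser runs the query text through an analyzer, so the phrase is broken into separate (and possibly lowercased) tokens, while the untokenized field was indexed as the single term "Doe J". The toy model below is not PyLucene code; index_untokenized and analyze are hypothetical helpers that only mimic that mismatch.

```python
# Toy model of term matching; this is not PyLucene code.

def index_untokenized(value):
    # An untokenized field contributes its whole value as one term.
    return {value}

def analyze(text):
    # Assumed analyzer behaviour: split on whitespace and lowercase.
    return [token.lower() for token in text.split()]

indexed_terms = index_untokenized("Doe J")

# A parsed query looks for the analyzed tokens, not the raw string.
query_tokens = analyze("Doe J")
parsed_match = all(token in indexed_terms for token in query_tokens)

# An exact-term lookup compares the raw string against the indexed term.
exact_match = "Doe J" in indexed_terms

print(query_tokens, parsed_match, exact_match)
```

In Lucene's Java API an exact-term lookup corresponds to a TermQuery built from a Term, which bypasses the query parser; whether that resolves the problem in PyLucene 1.9.1 is untested here.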


Has anyone found solutions to these problems already? For the second I
could just replace spaces with underscores during the indexing, but
that wouldn't be the ideal solution.

alf.


