[pylucene-dev] searching repeated and untokenized fields

Andi Vajda vajda at osafoundation.org
Sun Apr 30 23:53:50 PDT 2006


On Sun, 30 Apr 2006, Alf Eaton wrote:

> I have a couple of questions regarding indexing and searching a document that 
> has repeated values for the same field (specifically, the authors of a 
> document, in this case):
>
> Firstly, I'm adding the repeated field with this code:
>
> for creator in creators:
>     doc.add(Field('creator', creator, Field.Store.YES,
>                   Field.Index.UN_TOKENIZED))
>
> but can't find a way to read those fields back out from the index. If I use
>
> for author in hits[i]["creator"]:
>       print author

I'm not sure I understand what you're trying to do in the code above.
In PyLucene 1.9.1, the way to iterate hits is:

   for i, doc in hits:
       print doc['creator']

If there is more than one field called 'creator', then you might want to try:

   for i, doc in hits:
       for creator in doc.getFields('creator'):
           print creator
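
To make the difference between the two loops above concrete, here is a toy
model in plain Python. It does not use PyLucene: ToyDocument and its methods
are hypothetical stand-ins, and 'Roe R' is made-up sample data. It assumes
that doc['creator'] yields only the first stored value, as Java Lucene's
Document.get() does; that behaviour is not confirmed in this thread.

```python
class ToyDocument(object):
    """Hypothetical stand-in for a Lucene document; not the PyLucene API."""

    def __init__(self):
        self._fields = []  # (name, value) pairs in insertion order

    def add(self, name, value):
        self._fields.append((name, value))

    def __getitem__(self, name):
        # Assumption: a single-value lookup returns the first match only.
        for field_name, value in self._fields:
            if field_name == name:
                return value
        return None

    def get_values(self, name):
        # Return every value stored under the given field name.
        return [value for field_name, value in self._fields
                if field_name == name]


doc = ToyDocument()
for creator in ['Doe J', 'Roe R']:
    doc.add('creator', creator)

first_only = doc['creator']
all_creators = doc.get_values('creator')

# Iterating over a single string walks its characters, not the authors.
chars = [c for c in first_only]
```

If the single-value lookup really does return one string, a loop over it
would print one character per line, which may be what the quoted loop over
hits[i]["creator"] ran into.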

In PyLucene 2.0rc1, you can also say:

   for hit in hits:
       for creator in hit.getDocument().getFields('creator'):
           print creator

If this doesn't work, please send in code that illustrates the problem (that 
would help in understanding and fixing the potential bug(s)).

> Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to search an 
> untokenized field using a term that contains spaces. For a document that has 
> a creator "Doe J", the query
> creator:"Doe J"
> doesn't return any results, and
> creator:Doe J
> doesn't match what it needs to.

Again, please send in code that reproduces the problem. If you can make sure 
that what you're trying to do works in Java Lucene, that's a plus.

Ideally, your sample code would be organized as unit tests.
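
As a rough sketch of what such a report could look like, the following uses
the standard unittest module. It assumes the flat PyLucene module of the
1.9.x/2.0 releases and the class names it mirrors from Java Lucene; the test
names and the second creator are made up for illustration. The tests only
state the expectations from the report, and whether they pass is exactly
what they are meant to reveal. unittest.skipUnless needs a Python version
that provides it, so on older interpreters the guard would have to go.

```python
import unittest

try:
    # Assumption: the 1.9.x/2.0 releases discussed here ship one flat
    # module named PyLucene that mirrors the Java Lucene classes.
    import PyLucene
except ImportError:
    PyLucene = None


@unittest.skipUnless(PyLucene, 'PyLucene is not installed')
class CreatorFieldTest(unittest.TestCase):
    CREATORS = ['Doe J', 'Roe R']  # made-up sample data

    def setUp(self):
        # Build a one-document, in-memory index as in the original report.
        self.store = PyLucene.RAMDirectory()
        self.analyzer = PyLucene.StandardAnalyzer()
        writer = PyLucene.IndexWriter(self.store, self.analyzer, True)
        doc = PyLucene.Document()
        for creator in self.CREATORS:
            doc.add(PyLucene.Field('creator', creator,
                                   PyLucene.Field.Store.YES,
                                   PyLucene.Field.Index.UN_TOKENIZED))
        writer.addDocument(doc)
        writer.close()
        self.searcher = PyLucene.IndexSearcher(self.store)

    def tearDown(self):
        self.searcher.close()

    def test_repeated_field_values_can_be_read_back(self):
        # Expectation: every 'creator' value added is stored and returned.
        doc = self.searcher.doc(0)
        values = [f.stringValue() for f in doc.getFields('creator')]
        self.assertEqual(self.CREATORS, values)

    def test_phrase_query_on_untokenized_field(self):
        # Expectation from the report: the quoted query finds the document.
        parser = PyLucene.QueryParser('creator', self.analyzer)
        query = parser.parse('creator:"Doe J"')
        hits = self.searcher.search(query)
        self.assertEqual(1, hits.length())
```

Saved as a module, this can be run with "python -m unittest" followed by the
module name; without PyLucene installed both tests are reported as skipped.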

Thanks!

Andi..

