[pylucene-dev] searching repeated and untokenized fields

Alf Eaton lists at hubmed.org
Mon May 1 18:42:12 PDT 2006


On 01 May 2006, at 02:53, Andi Vajda wrote:

>> Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to  
>> search an untokenized field using a term that contains spaces. For  
>> a document that has a creator "Doe J", the query
>> creator:"Doe J"
>> doesn't return any results, and
>> creator:Doe J
>> doesn't match what it needs to.
>
> Again, please send in code that reproduces the problem. If you can  
> make sure that what you're trying to do work in Java Lucene, that's  
> a plus.
> Ideally, your sample code would be organized as unit tests.

Good idea to do the tests: I realised that StandardAnalyzer was  
converting the search terms to lowercase when used in QueryParser,  
but not when adding untokenized fields to the document using  
IndexWriter, so the two weren't matching. Fixed now, thanks (and it's  
presumably not a PyLucene problem).
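
To illustrate the mismatch, here is a toy model in plain Python. It is not PyLucene code: the helper names are made up for illustration, and the analyzer is assumed to do nothing more than lowercase the text and split it on whitespace, which only roughly approximates StandardAnalyzer.

```python
# Toy model of the case mismatch. None of these helpers are PyLucene APIs;
# they are hypothetical stand-ins used only to illustrate the idea.

def analyze(text):
    """Assumed analyzer behaviour: lowercase, then split on whitespace."""
    return text.lower().split()

def index_field(value, tokenized):
    """Return the set of terms that would end up in the index for a field."""
    if tokenized:
        return set(analyze(value))   # analyzer applied at index time
    return {value}                   # untokenized: stored verbatim as one term

def parsed_query_terms(text):
    """Terms a query parser would look up after running the analyzer."""
    return analyze(text)

untokenized = index_field("Doe J", tokenized=False)  # {'Doe J'}
tokenized = index_field("Doe J", tokenized=True)     # contains 'doe' and 'j'

# Analyzed query terms are lowercase, so none of them equals the verbatim
# term 'Doe J' held by the untokenized field.
parsed_hits_untokenized = [t for t in parsed_query_terms("Doe J") if t in untokenized]

# The same analyzed terms do match the tokenized field.
parsed_hits_tokenized = [t for t in parsed_query_terms("Doe J") if t in tokenized]

# An exact term lookup that bypasses the analyzer matches the untokenized field.
exact_hit_untokenized = "Doe J" in untokenized
```

In this model the exact lookup plays the role of TermQuery, which skips the analyzer and so finds the untokenized value, while the analyzed terms play the role of a query built by QueryParser.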

alf.

--------

#!/usr/bin/env python

from PyLucene import *

filestore = FSDirectory.getDirectory("test", True)
analyzer = StandardAnalyzer()
filewriter = IndexWriter(filestore, analyzer, True)

doc = Document()

doc.add(Field('author-space', "Doe J",
              Field.Store.YES, Field.Index.UN_TOKENIZED))
doc.add(Field('author-space-tok', "Doe J",
              Field.Store.YES, Field.Index.TOKENIZED))
doc.add(Field('author-underscore', "Doe_J",
              Field.Store.YES, Field.Index.UN_TOKENIZED))
doc.add(Field('author-underscore-tok', "Doe_J",
              Field.Store.YES, Field.Index.TOKENIZED))

filewriter.addDocument(doc)
filewriter.close()

searcher = IndexSearcher("test")

for q in ("Doe J", "Doe_J"):
    for f in ("author-space", "author-space-tok",
              "author-underscore", "author-underscore-tok"):
        # only works for tokenized fields:
        #query = QueryParser.parse(q, f, analyzer)
        # only works for untokenized fields:
        query = TermQuery(Term(f, q))
        hits = searcher.search(query)
        print "\nQ: %s\nQuery: %s\n" % (q, query)
        for i, doc in hits:
            print "Result: %s\n" % doc[f]

