[pylucene-dev] searching repeated and untokenized fields
Alf Eaton
lists at hubmed.org
Mon May 1 18:42:12 PDT 2006
On 01 May 2006, at 02:53, Andi Vajda wrote:
>> Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to
>> search an untokenized field using a term that contains spaces. For
>> a document that has a creator "Doe J", the query
>> creator:"Doe J"
>> doesn't return any results, and
>> creator:Doe J
>> doesn't match what it needs to.
>
> Again, please send in code that reproduces the problem. If you can
> make sure that what you're trying to do works in Java Lucene, that's
> a plus.
> Ideally, your sample code would be organized as unit tests.
Good idea to write the tests: I realised that StandardAnalyzer
lowercases the search terms when it is used in QueryParser, but it is
not applied to untokenized fields when they are added to the document
with IndexWriter, so the indexed term and the query term weren't
matching. Fixed now, thanks (and it's presumably not a PyLucene
problem).
alf.
--------
#!/usr/bin/env python
from PyLucene import *

filestore = FSDirectory.getDirectory("test", True)
analyzer = StandardAnalyzer()
filewriter = IndexWriter(filestore, analyzer, True)

doc = Document()
doc.add(Field('author-space', "Doe J", Field.Store.YES,
              Field.Index.UN_TOKENIZED))
doc.add(Field('author-space-tok', "Doe J", Field.Store.YES,
              Field.Index.TOKENIZED))
doc.add(Field('author-underscore', "Doe_J", Field.Store.YES,
              Field.Index.UN_TOKENIZED))
doc.add(Field('author-underscore-tok', "Doe_J", Field.Store.YES,
              Field.Index.TOKENIZED))
filewriter.addDocument(doc)
filewriter.close()

searcher = IndexSearcher("test")
for q in ("Doe J", "Doe_J"):
    for f in ("author-space", "author-space-tok",
              "author-underscore", "author-underscore-tok"):
        # only works for tokenized fields:
        #query = QueryParser.parse(q, f, analyzer)
        # only works for untokenized fields:
        query = TermQuery(Term(f, q))
        hits = searcher.search(query)
        print "\nQ: %s\nQuery: %s\n" % (q, query)
        for i, doc in hits:
            print "Result: %s\n" % doc[f]
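
For what it's worth, the mismatch described above can be modelled
without Lucene at all. The following is only a toy sketch in plain
Python: analyze() is a rough stand-in for StandardAnalyzer (lowercase
plus whitespace split, which is an assumption; the real analyzer does
more), and none of the function names below are PyLucene API.

```python
# Toy model of the analyzer mismatch. Plain Python, NOT the PyLucene API;
# every function name here is hypothetical and for illustration only.

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def index_untokenized(value):
    """An untokenized field is indexed as a single term, exactly as given."""
    return [value]

def index_tokenized(value):
    """A tokenized field is run through the analyzer before indexing."""
    return analyze(value)

def term_matches(indexed_terms, term):
    """Exact term lookup, which is what a term query boils down to."""
    return term in indexed_terms

untok = index_untokenized("Doe J")  # one term, original case kept
tok = index_tokenized("Doe J")      # two lowercased terms

# Query-parser path: the query text is analyzed (lowercased) first.
parsed = analyze("Doe J")
parser_hits_tok = all(term_matches(tok, t) for t in parsed)
parser_hits_untok = any(term_matches(untok, t) for t in parsed)

# Term-query path: the query text is used verbatim as a single term.
term_hits_untok = term_matches(untok, "Doe J")
term_hits_tok = term_matches(tok, "Doe J")

print("parser vs tokenized:   %s" % parser_hits_tok)
print("parser vs untokenized: %s" % parser_hits_untok)
print("term vs untokenized:   %s" % term_hits_untok)
print("term vs tokenized:     %s" % term_hits_tok)
```

In this model the analyzed query only finds the tokenized field and
the verbatim term only finds the untokenized one, which is the same
split the two query lines in the script above are commented with.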