[pylucene-dev] Pager functionality for python

David Pratt fairwinds at eastlink.ca
Sat Feb 25 17:01:50 PST 2006


Hi Matthew. This is very helpful. I currently have something working 
with the hits object, but I was a bit concerned that I might be 
duplicating existing functionality. I appreciate what you found out 
about the original method, and also the approach for implementing it 
with BitSet.

I am also just getting into sorting and filtering myself, having only 
been using PyLucene for a short while.

I am also curious about the best way to bring together data from remote 
sources. I will have separate indexes on various servers, but I also need 
a central set of indexes that consolidates this information. I guess this 
is a likely scenario for other folks as well. I was thinking a simple web 
service that pulls the data to the central server for indexing might be 
best, but each server holds roughly 10K to 100K records, so these will be 
large transfers that use a lot of bandwidth. I don't know yet whether 
there is a better way. Lucene is interesting software.

Many thanks for your help.

Regards,
David

Matthew O'Connor wrote:
> David,
> 
> I was able to find the Java PageFilter source.  It seems
> that the Java code snippet you quoted came from this
> article:
> 
>     http://www.sys-con.com/read/37296.htm
> 
> The article provides code samples linked from the bottom of
> the article:
> 
>     http://res.sys-con.com/story/37296/Walls0712.zip
> 
> The Java PageFilter code is pretty short:
> 
>     import java.io.IOException;
>     import java.util.BitSet;
> 
>     import org.apache.lucene.index.IndexReader;
>     import org.apache.lucene.search.Filter;
> 
>     public class PageFilter extends Filter {
>       private int start;
>       private int end;
>       
>       public PageFilter(int pageNum, int pageSize) {
>         start = pageNum * pageSize;
>         end = (pageNum+1) * pageSize;
>       }
>       
>       public BitSet bits(IndexReader reader) throws IOException {
>         BitSet result = new BitSet(reader.maxDoc());
> 
>         for(int i=start; (i<end) && (i<result.size()); i++) {
>           result.set(i);
>         }
> 
>         return result;
>       }
>     }
> 
> You can implement this in PyLucene like this:
> 
>     import PyLucene
>     class PageFilter(object):
> 
>         def __init__(self, page_num=0, size=10):
>             self.start = page_num * size
>             self.end = self.start + size
> 
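>     def bits(self, reader):
>         results = PyLucene.BitSet(reader.maxDoc())
>         # Clip to the number of documents in the index; BitSet.size()
>         # rounds up to a multiple of 64, so it can exceed maxDoc().
>         for i in xrange(self.start, min(self.end, reader.maxDoc())):
>             results.set(i)
>         return results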
> 
> Then you can do what you originally tried:
> 
>     hits = searcher.search(query, PageFilter(1, 20))
> 
> HOWEVER, the PageFilter code in Java doesn't work right and
> neither does the PageFilter code in Python.  As far as I can
> tell this is because the article's author made a mistake.
> There's a comment on the article that shows Java Lucene
> users haven't been able to get the PageFilter example to
> work either:
> 
>     http://www.sys-con.com/read/37296_f.htm
> 
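> I'm not very experienced with Filters in Lucene (I just
> started with Lucene, via PyLucene, a few weeks ago).
> However, from reading the Java Lucene documentation it
> appears that the author's strategy isn't going to work.
> The BitSet returned by a Filter's bits() marks which
> document numbers are allowed into the results; it knows
> nothing about a hit's rank, so PageFilter(1, 20) keeps only
> the matching documents numbered 20 through 39 in the index,
> not the second page of hits.  You can read the Java Lucene
> docs yourself: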
>     http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Filter.html
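A small pure-Python simulation makes the problem concrete. It is only a sketch: it does not use PyLucene, the ten (doc_id, score) pairs are invented for illustration, and page_filter_bits() and search() are hypothetical helpers. page_filter_bits() mimics what PageFilter.bits() returns, namely the set of document numbers allowed through the filter.

```python
# Toy simulation, no PyLucene: a "search" returns (doc_id, score) pairs
# ranked by descending score. All data here is invented for illustration.

def page_filter_bits(page_num, page_size, max_doc):
    """Mimic PageFilter.bits(): allow doc ids start..end-1, clipped to max_doc."""
    start = page_num * page_size
    end = min((page_num + 1) * page_size, max_doc)
    return set(range(start, end))


def search(matches, allowed=None):
    """Rank matches by score, keeping only allowed doc ids if a filter is given."""
    hits = [m for m in matches if allowed is None or m[0] in allowed]
    return sorted(hits, key=lambda m: m[1], reverse=True)


# Ten matching documents scattered through a 100-document index.
matches = [(3, 0.2), (17, 0.9), (25, 0.4), (42, 0.8), (58, 0.1),
           (61, 0.7), (77, 0.3), (80, 0.6), (91, 0.5), (99, 0.05)]

page_size = 3
ranked = search(matches)

# What we want: the second page (page_num=1) of the ranked hits.
wanted = ranked[1 * page_size:2 * page_size]

# What PageFilter(1, 3) gives: only matches whose *doc id* is 3, 4 or 5.
filtered = search(matches, page_filter_bits(1, page_size, max_doc=100))

print("second page of ranked hits:", wanted)
print("hits left by PageFilter:   ", filtered)
```

The second page of ranked hits is documents 80, 91 and 25, while the filter lets through only document 3. Nothing raises an error; the search quietly returns a different, smaller result set because the filter selects by document number rather than by rank.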
> 
> You can read about extending Java Lucene objects from
> PyLucene in the README, here:
> 
>     http://svn.osafoundation.org/pylucene/trunk/README
> 
> Look for the section called "'Extending' Java classes from
> Python".  And you can see an example here:
> 
>     http://svn.osafoundation.org/pylucene/trunk/samples/LuceneInAction/lia/extsearch/filters/SpecialsFilter.py
> 
> FWIW, the logic for pagination is sufficiently simple that
> I'd probably just apply it directly to the hits object.  You
> should be able to figure that out from the examples above.
> Hope that helps.
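Paging directly over the hits object might look roughly like this. It is a sketch that assumes only the length(), doc(i) and score(i) methods of Lucene's Hits class; get_page and FakeHits are hypothetical names, and FakeHits merely stands in for a real Hits object so the example runs without an index.

```python
def get_page(hits, page_num, page_size):
    """Return (document, score) pairs for one page of a Hits-like object.

    Assumes `hits` provides length(), doc(i) and score(i), as Lucene's Hits
    does. Pages are numbered from 0; a page past the end yields [].
    """
    if page_num < 0 or page_size <= 0:
        raise ValueError("page_num must be >= 0 and page_size > 0")
    start = page_num * page_size
    # Clamp the end to hits.length() so the last page may be short.
    end = min(start + page_size, hits.length())
    return [(hits.doc(i), hits.score(i)) for i in range(start, end)]


class FakeHits(object):
    """Hypothetical stand-in for Hits, used only to exercise get_page."""

    def __init__(self, ranked):
        self._ranked = ranked  # list of (document, score), already ranked

    def length(self):
        return len(self._ranked)

    def doc(self, i):
        return self._ranked[i][0]

    def score(self, i):
        return self._ranked[i][1]


hits = FakeHits([("doc%d" % n, float(10 - n)) for n in range(7)])
print(get_page(hits, 0, 3))  # first page
print(get_page(hits, 2, 3))  # last, partial page
print(get_page(hits, 5, 3))  # beyond the end
```

With a real searcher you would pass the object returned by searcher.search(query) instead of FakeHits. Because the page is sliced from the already ranked hits, page N really is the Nth group of results, which is what the PageFilter approach cannot provide.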
> 
> -matthew
> 
> David Pratt [fairwinds at eastlink.ca] said:
> 
> 
>>Hi. I've read a few things about paging functionality for the searcher. 
>>I have already rolled my own batching and paging in the meantime, but I 
>>am still wondering whether this functionality already exists somewhere 
>>that I am just unaware of. I provide a start position and calculate an 
>>end position for xrange based on hits.length(), so the end position 
>>stays within the range of results. In any case, I read:
>>
>>Hits hits = searcher.search(query, new PageFilter(1,20));
>>
>>In another place, I saw this version:
>>
>>hits = searcher.search(query, 0, 10);
>>
>>I could not find a PageFilter class in the Java docs, and the second 
>>method throws an exception.
>>
>>Regards,
>>David
>>_______________________________________________
>>pylucene-dev mailing list
>>pylucene-dev at osafoundation.org
>>http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>>
> 

