[pylucene-dev] Re:pylucene-dev Digest, Vol 36, Issue 6

Liang Xing gorgonking at 163.com
Mon May 14 18:20:12 PDT 2007


Thanks. I admit it was a typing error. I have given up fetching "contents" in both the Python and Java programs and now fetch only the fields "name" and "path". However, I got nearly the same results. Moreover, ongoing experiments show that most of the time is spent fetching the fields "name" and "path" in PyLucene, while the original Java program does the same job in a much shorter time. I suspect this is caused by the switch from the Python caller into its native function.

Regards,
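One way to check the suspicion above would be to time the search call and the field-fetch loop separately, so the cost of crossing from Python into native code per field access shows up on its own. The following is only a minimal sketch: time_search_and_fetch, FakeDoc and FakeHits are hypothetical names introduced here for illustration, and the stand-in classes merely mimic the length() / doc(i).get(field) calls used in the sample programs so the snippet runs without an index.

```python
import time


def time_search_and_fetch(search, query, fields=("name", "path")):
    """Time one search and its field-fetch loop separately.

    `search` is a hypothetical callable returning a Hits-like object that
    exposes length() and doc(i).get(field), as the sample programs use.
    Returns (search_seconds, fetch_seconds, hit_count).
    """
    t0 = time.perf_counter()
    hits = search(query)
    t1 = time.perf_counter()
    # Everything below, including length(), is counted as fetch time.
    count = hits.length()
    for i in range(count):
        doc = hits.doc(i)
        for field in fields:
            doc.get(field)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, count


class FakeDoc:
    """Stand-in for a document; not part of PyLucene."""

    def __init__(self, values):
        self._values = values

    def get(self, field):
        return self._values.get(field)


class FakeHits:
    """Stand-in for a Hits object; not part of PyLucene."""

    def __init__(self, docs):
        self._docs = docs

    def length(self):
        return len(self._docs)

    def doc(self, i):
        return self._docs[i]


if __name__ == "__main__":
    docs = [FakeDoc({"name": "a.txt", "path": "/data/a.txt"})] * 1000
    s, f, n = time_search_and_fetch(lambda q: FakeHits(docs), "zope")
    print("hits=%d search=%.6fs fetch=%.6fs" % (n, s, f))
```

With a real index, `search` would wrap the parser and searcher calls from xSearchFiles.py; summing the two durations over all 6400 words would show whether the gap to Java sits in the search itself or in the per-field get() calls.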
On 2007-05-12, pylucene-dev-request at osafoundation.org wrote:
Send pylucene-dev mailing list submissions to
    pylucene-dev at osafoundation.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
or, via email, send a message with subject or body 'help' to
    pylucene-dev-request at osafoundation.org

You can reach the person managing the list at
    pylucene-dev-owner at osafoundation.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of pylucene-dev digest..."

Today's Topics:

   1. performance experiment of PyLucene vs Lucene (Liang Xing)
   2. Re: performance experiment of PyLucene vs Lucene (Brett Parker)

----------------------------------------------------------------------

Message: 1
Date: Fri, 11 May 2007 12:03:42 +0800 (CST)
From: "Liang Xing" <gorgonking at 163.com>
Subject: [pylucene-dev] performance experiment of PyLucene vs Lucene
To: pylucene-dev at osafoundation.org
Message-ID: <1542020201.108411178856222342.JavaMail.root at bj163app90.163.com>
Content-Type: text/plain; charset="gbk"

[Title]
My awful performance experiment of PyLucene vs Lucene

[Results]
PyLucene ≈ 0.5 Lucene (in terms of search capacity)

Using the sample program "SearchFiles.py" provided with PyLucene and a Java program tackling a similar task, I found that PyLucene shows an awful result: the average search time for PyLucene is about twice that of Java Lucene.

The best Java result: 365713 ms for 6400 searches (most results lie around 400000 ms)
The best PyLucene result: 662815 ms for 6400 searches (most results lie around 680000 ms)

[Prerequisites]
Intel Pentium D dual-core 2.8 GHz, 1 GB DDR RAM
CentOS (Linux), kernel 2.6.9
Lucene 2.1.0 (ant/java) vs PyLucene 2.1.0 (lucene-java-2.1.0-509013, "_Pylucene.so" obtained from OSAF)
(even worse results are obtained with lower PyLucene versions)
Python 2.5.1 vs Java2 1.5.0_10

[Object: index files]
The data source is a directory of about 27000 files, each between 0.5 KB and 20 KB in size.
The index is built by a PyLucene test program, namely IndexFile.py (under the path Pylucene-X.X/samples/, but revised a little by me to change the Store attribute of Field:Content to NO, since otherwise the memory cost of the original Python program would be huge).

[Object: test cases]
A file named "Zop3" containing 6400 English words (our search words), one per line.

[Major steps of the two programs: Search.java vs xSearchIndex.py]
A simple comparison of searching and retrieving performance between the two siblings.

[Peer actions that are summed up in our test]
1. Construct an index searcher object (SEARCH) in Java and in Python.
2. Use the searcher to obtain a search result (HITS) from the existing index.
3. Loop over the document objects in HITS, reading each field value of the result items.
4. Repeat steps 1-3 for the other 6399 similar test cases.
5. Record the total time consumed, which is needed to compute the average time.

Here are my programs (xSearchFiles.py) (Search.java):

---- important part: xSearchFiles.py (one complete search procedure) ----

    def RunSearch(searcher, parser, word):
        global logger, time_costing
        local_parse = parser.parse
        local_search = searcher.search
        start = datetime.now()
        hits = local_search(local_parse(word))
        #map(Processor, hits)
        for i in xrange(0, hits.length()):
            getMethod = hits.doc(i).get
            getMethod("name"), getMethod("path"), getMethod("contents")
        end = datetime.now()
        during = end - start
        wss = ["[Result]", "[Time]"]
        wss.insert(1, '\t'+ str(hits.length()))
        wss.append('\t'+ str(during)+ '\n')
        logger.writelines(wss)
        time_costing += during.microseconds/1000

---- important part: Search.java (one complete search procedure) ----

    clock.start();
    for (int i = 0; m_words != null && i < m_words.length; i++)
    {
        int testonly = 0;
        Query q = qp.parse(m_words[i]);
        Hits h = is.search(q);
        clock.suspend();
        System.out.println("\r" + i);
        clock.resume();
        for(int j = 0; j < h.length(); j ++)
        {
            h.doc(j).get("name");
            h.doc(j).get("path");
            h.doc(j).get("contens");
            testonly = j;
        }
    }
    clock.stop();
    System.out.println("Total: " + clock.getTime() + "ms.");
    ..

------------------------------

Message: 2
Date: Fri, 11 May 2007 08:37:49 +0100
From: Brett Parker <iDunno at sommitrealweird.co.uk>
Subject: Re: [pylucene-dev] performance experiment of PyLucene vs Lucene
To: pylucene-dev at osafoundation.org
Message-ID: <20070511073749.GM52222 at amnesiac.heapspace.net>
Content-Type: text/plain; charset=us-ascii

On Fri, May 11, 2007 at 12:03:42PM +0800, Liang Xing wrote:
<snip class="Description + Python Example" />
> ---- important part: Search.java (one complete search procedure) ----
> clock.start();
> for (int i = 0; m_words != null && i < m_words.length; i++)
> {
>     int testonly = 0;
>     Query q = qp.parse(m_words[i]);
>     Hits h = is.search(q);
>     clock.suspend();
>     System.out.println("\r" + i);
>     clock.resume();
>     for(int j = 0; j < h.length(); j ++)
>     {
>         h.doc(j).get("name");
>         h.doc(j).get("path");
>         h.doc(j).get("contens");
                        ^^^^^^^
Surely that should be contents - is this a typo in the mail or was this
a copy paste? Because if this is a copy paste, and you're really
fetching contens rather than contents, then that might well be why the
java is seeming to go twice as fast as the python.

>         testonly = j;
>     }
> }
> clock.stop();
> System.out.println("Total: " + clock.getTime() + "ms.");
> ..
>

Thanks,
--
Brett Parker

------------------------------

End of pylucene-dev Digest, Vol 36, Issue 6
*******************************************

