[pylucene-dev] status of pylucene & mod_python
Ofer Nave
ofer at smarter.com
Mon Mar 26 01:21:48 PST 2007
> -----Original Message-----
> From: pylucene-dev-bounces at osafoundation.org
> [mailto:pylucene-dev-bounces at osafoundation.org] On Behalf Of Ofer Nave
> Sent: Sunday, March 25, 2007 4:22 PM
>
> [snip] ...the lesson for me here is I need
> to rearchitect my app so PyLucene code never has to be in the
> apache process. :(
And so I have.
I just finished coding up a module I named ipc.py. It provides a function
called call_in_seperate_process that takes a module name, a function name,
and arbitrary positional and keyword arguments. It uses popen2.Popen3 to run
the ipc module as an executable, and yaml to serialize the request and its
arguments. The ipc module-as-executable deserializes the request, executes
it, serializes the function's return value with yaml, and prints it to
STDOUT, where the parent process deserializes it and returns it to the caller.
So in effect, if you had a function that does something with PyLucene that
you wanted to call in your apache process, and it used to look like this:
---
from search import do_search
hits = do_search(query)
---
It would now look like this:
---
from ipc import call_in_seperate_process
hits = call_in_seperate_process('search', 'do_search', query)
---
That way you never have to load PyLucene code in your apache process.
NOTE: For the sake of your sanity, debug the functions you want to call
before calling them this way, as I have not yet built in any mechanism for
detecting and propagating error messages across the IPC boundary. Also, do
not have your subprocess functions return any data types that aren't already
loaded in your parent process, such as PyLucene.Document. I've written a
wrapper for my Searcher class that unpacks all the data from the
PyLucene.Hits object and repacks it using native python data structures
(lists and dictionaries), so PyLucene-specific structures are never needed
anywhere else in my codebase.
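A minimal sketch of that kind of repacking, for illustration only. It is not the actual Searcher wrapper described above. It assumes the results object offers length(), doc(i) and score(i), and that each document offers get(name), as Lucene's Hits and Document do. The names hits_to_native, FakeDoc and FakeHits are hypothetical, and the two Fake classes are stand-ins so the example runs without PyLucene installed.
```python
def hits_to_native(hits, field_names):
    """Copy search results into plain lists and dicts.

    Assumes hits has length(), doc(i) and score(i), and that each document
    has get(field_name). All names here are hypothetical.
    """
    results = []
    for i in range(hits.length()):
        doc = hits.doc(i)
        fields = dict((name, doc.get(name)) for name in field_names)
        results.append({'score': float(hits.score(i)), 'fields': fields})
    return results


class FakeDoc(object):
    """Stand-in for a document; only get() is provided."""
    def __init__(self, **fields):
        self._fields = fields

    def get(self, name):
        return self._fields.get(name)


class FakeHits(object):
    """Stand-in for a hits collection; built from (score, doc) pairs."""
    def __init__(self, scored_docs):
        self._scored_docs = scored_docs

    def length(self):
        return len(self._scored_docs)

    def doc(self, i):
        return self._scored_docs[i][1]

    def score(self, i):
        return self._scored_docs[i][0]
```
The returned value contains only lists, dicts, floats and whatever get() returns, so the parent process can deserialize it without importing anything PyLucene-specific.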
Here's the ipc module (WARNING: some parts are customized for my project
directory structure):
---
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
from popen2 import Popen3
import sys
import yaml
ME = os.path.realpath(__file__)
LIB_ROOT = os.path.split(os.path.split(ME)[0])[0]  # project_root/lib/python/main
sys.path.append(LIB_ROOT)

def call_in_seperate_process(module, function, *args, **kwds):
    global ME
    request = dict(module=module, function=function, args=args, kwds=kwds)
    request_yaml = yaml.dump(request)
    # capturestderr=True: the child's stderr goes to child.childerr,
    # which is never read here
    child = Popen3(ME, True)
    # wait for the child to announce that it is ready
    line = child.fromchild.readline()
    if line != 'ready\n':
        raise Exception('child did not report ready: %r' % line)
    # send the request: a size line, then the yaml payload
    print >> child.tochild, len(request_yaml)
    print >> child.tochild, request_yaml
    child.tochild.flush()
    # read the response: a size line, then the yaml payload
    line = child.fromchild.readline()
    response_size = int(line.rstrip('\n'))
    response_yaml = child.fromchild.read(response_size)
    if len(response_yaml) != response_size:
        raise Exception('truncated response from child')
    return yaml.load(response_yaml)

def dyn_import(module_name, attribute):
    # import module_name and return the named attribute from it
    module = __import__(module_name, globals(), locals(), [attribute])
    return vars(module)[attribute]

def main():
    # child side: announce readiness, read one request, print one response
    print 'ready'
    sys.stdout.flush()
    line = sys.stdin.readline()
    request_size = int(line.rstrip('\n'))
    request_yaml = sys.stdin.read(request_size)
    if len(request_yaml) != request_size:
        sys.exit(10)
    request = yaml.load(request_yaml)
    function = dyn_import(request['module'], request['function'])
    response_yaml = yaml.dump(function(*request['args'], **request['kwds']))
    print len(response_yaml)
    print response_yaml
    sys.stdout.flush()

if __name__ == '__main__':
    main()
---
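The NOTE above mentions that errors are not yet propagated across the IPC boundary. One possible way to add that, sketched under assumptions: the child wraps the outcome of the call in a small envelope dict, and the parent unwraps it. The names run_request, unwrap and RemoteError and the envelope keys ('ok', 'result', 'error', 'traceback') are all hypothetical and not part of the module above.
```python
import importlib
import traceback


class RemoteError(Exception):
    """Raised in the parent when the child reports a failure (hypothetical)."""


def run_request(request):
    """Child side: call the requested function and wrap the outcome.

    The envelope holds only plain types, so any serializer can carry it.
    """
    try:
        module = importlib.import_module(request['module'])
        function = getattr(module, request['function'])
        result = function(*request.get('args', ()), **request.get('kwds', {}))
        return {'ok': True, 'result': result}
    except Exception as exc:
        # report the failure as data instead of letting the child die
        return {'ok': False,
                'error': '%s: %s' % (type(exc).__name__, exc),
                'traceback': traceback.format_exc()}


def unwrap(envelope):
    """Parent side: return the result, or raise with the child's traceback."""
    if envelope['ok']:
        return envelope['result']
    raise RemoteError(envelope['error'] + '\n' + envelope['traceback'])
```
With something like this, main() would serialize run_request(request) instead of the bare return value, and call_in_seperate_process would pass the deserialized response through unwrap() before returning it. Failures that kill the child outright would still need separate handling.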
-ofer