[Chandler-dev] Another push on MVA based automatic tagging

Markku Mielityinen mmmm at osafoundation.org
Fri Sep 22 16:33:41 PDT 2006


Hi Viksit,

Viksit Gaur wrote:
> Great to see the project's back on track. I started looking at this
> project when Xun was still on it through the SoC, and as far as I'm
> aware, his code resides in the sandbox and is only pluggable into alpha1.
>   

I have reviewed Xun's work from this summer, at least the part that is 
in his sandbox.

> As Philippe suggested earlier, we ought to come up with some sort of a
> plan as to how contributions can be made on this front - in terms of
> features, code, designs et al.

I will be happy to get suggestions and patches from you once I get the 
initial release ready.

>  An earlier email talked about integrating
> PyICU into the chandler libraries themselves.

I am missing something here: what does PyICU have to do with all this?

>  Then of course is the
> actual MVA code,

I have opened a new discussion in this forum on selecting a 
computational platform. I suggested using SciPy (or just NumPy), and as 
there has not been any feedback I think that we are going to include 
it in our Chandler distribution. Implementing the necessary MVA 
operations is an easy task (and we can always use other complementary 
libraries if necessary).
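
To give a rough idea of the effort involved, here is a minimal sketch 
of one such operation in NumPy. It assumes PCA (via SVD) as the model 
building step, which is only one possible choice and not a decision; 
the function names pca_model and project are hypothetical and the data 
layout (rows are items, columns are features such as term counts) is an 
assumption for illustration.

```python
import numpy as np

def pca_model(X, n_components):
    """Build a PCA model from a data matrix X (hypothetical helper).

    Rows of X are items, columns are features (e.g. term counts).
    Returns the column means, the loadings and the item scores.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                          # center each feature
    # Thin SVD of the centered data: Xc = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]           # (n_components, n_features)
    scores = U[:, :n_components] * s[:n_components]  # (n_items, n_components)
    return mean, loadings, scores

def project(X_new, mean, loadings):
    """Map new items into the model space built by pca_model."""
    return (X_new - mean) @ loadings.T
```

Clustering or tag prediction could then work on the low-dimensional 
scores instead of the raw feature matrix.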

>  which as I understand isn't as yet automated
> completely, but instead works on slashdot feeds as test data, and relies
> on external programs to work.
>   
> IMO, the first step should either be to convert existing code to work
> with alpha3 or 4, OR if the differences aren't too many, get the system
> running without significant external dependencies and then update it.
The current implementation is so different from the one that we are 
going for that I don't see taking this side track as a good way to spend 
development resources.

MY CURRENT TODO LIST:
[preliminary work]
1) Learn how to use PyLucene (I am currently reading a book: Lucene in 
Action).
2) Obtain a real world data set with tags (Philippe has agreed to 
prepare one).
3) Implement necessary MVA operations (model building, clustering, etc).
4) Play with the empirical data to see how well the system actually 
works (I am going to do a feasibility study).
5) Get tagging implemented in Chandler (I need to discuss this 
with Grant next week when he is back here in the office).
6) Select a computational platform that is capable of matrix 
operations and have it included in our Chandler distribution (I will 
discuss this with Bear and Heikki next week).
[after 1-6 have been taken care of]
7) Decide the best way to implement automatic tagging functionality in 
Chandler.
8) Make all the necessary changes to repository schema, GUI, etc...

There is plenty of work to do so I really hope that we get a quick start 
on items 1-6.

>  As
> for my background, my postgraduate work in Computer Science dealt with
> humanoid robots, machine learning/data mining and AI, and I have a fair
> understanding of the mechanisms underlying this work.
>   

:) It seems that we have somewhat similar training so it should be easy 
for us to understand each other.

Cheers,
    Markku

