[Chandler-dev] Topology in tags

Xun Luo sherwoodluo at gmail.com
Sun Aug 6 12:47:11 PDT 2006


At present, most research on automatically generating a hierarchy of tags
is still supervised, i.e. it works by training and applying text
classifiers. The "automation" comes from leveraging established linguistic
corpora, the most commonly used and acknowledged of which is WordNet.
WordNet organizes English vocabulary in a hierarchical manner, using the
concepts of synonyms, hypernyms, antonyms, subsumption, etc. Normally the
sense of a term becomes clearer when it is put in a context constructed
from WordNet. I read the CIKM paper, and that is basically its key idea.
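
To illustrate what "putting a term in a WordNet context" can look like, here
is a minimal sketch. It does not use WordNet itself: the HYPERNYMS table is
a tiny hand-written stand-in, and the function names are my own, purely for
illustration. The idea is only that walking up the hypernym chain gives each
tag a position in a hierarchy, and two tags can be related through their
most specific shared ancestor.

```python
# Toy hypernym table standing in for WordNet (hypothetical data, not
# real WordNet output). Each term maps to its more general parent.
HYPERNYMS = {
    "beagle": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "meeting": "event",
    "event": "entity",
}


def hypernym_path(term):
    """Return the chain from term up to the most general known concept."""
    path = [term]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path


def lowest_common_ancestor(a, b):
    """Return the most specific concept that subsumes both terms, or None."""
    ancestors_of_b = set(hypernym_path(b))
    for concept in hypernym_path(a):
        if concept in ancestors_of_b:
            return concept
    return None
```

With this table, "beagle" and "cat" meet at "animal", while a term missing
from the table has no ancestor at all, which is exactly the coverage problem
for highly personal tags.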

Putting aside all the computation cost and management overhead (a fully
static hierarchy model will not stay valid over time, and a hierarchy
generated on the fly will involve lots of computation and references to
external corpora/collections), a key requirement for creating a tag
hierarchy is data. This requirement is hard to satisfy in a PIM like
Chandler, which contains just thousands of content items and is highly
personalized. An alternative is, as mentioned, to use external data, such
as WordNet, or a model trained by querying established internet
categories. (The best result for a similar task, as seen in KDD-CUP 2005,
came from using the Yahoo online categories, the ODP online categories and
statistical methods together, and the external data set involved tens of
thousands of training terms and web pages.)

Tag clustering is much easier by comparison; in my opinion that's why it
is commonly used for folksonomies. Flickr is definitely using it, although
I am not quite clear about the underlying mechanism. I know the similar
functionality provided by del.icio.us works through a combination of
statistical methods (which require a large sample, something Cosmo might be
able to provide) and sense similarities (reported in an HPL paper on
del.icio.us tagging dynamics).
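
As a rough sketch of the statistical side of tag clustering, the snippet
below counts how often two tags appear on the same item and groups tags
that co-occur often enough. This is not how Flickr or del.icio.us actually
do it (I don't know their mechanisms); the function names and the min_count
threshold are assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(items):
    """Count how often each unordered pair of tags appears on one item."""
    pairs = Counter()
    for tags in items:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def cluster_tags(items, min_count=2):
    """Group tags linked by pairs seen together at least min_count times."""
    parent = {}

    def find(tag):
        # Union-find lookup with path halving.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for tags in items:
        for tag in tags:
            find(tag)  # register every tag, even ones that stay isolated
    for (a, b), count in cooccurrence_counts(items).items():
        if count >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for tag in parent:
        clusters.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

With only a few thousand items, most pair counts will be tiny, so the
threshold has to be low and the clusters will be noisy. That is the "large
sample" issue mentioned above.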

As for unsupervised tagging for Chandler, what I am currently planning is
keyword extraction with simple NLP. This is very similar to the time field
extraction project. In my humble opinion, a tagging mechanism similar to
Flickr's would already be very satisfying to Chandler users.
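
To give an idea of what keyword extraction with simple NLP could mean at
its simplest, here is a sketch that drops stop words and ranks the rest by
frequency. It is only an illustration, not the planned implementation: the
STOP_WORDS list is deliberately tiny, and extract_keywords and top_n are
names invented for this example.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "with",
    "is", "are", "at", "by", "we", "will",
}


def extract_keywords(text, top_n=3):
    """Suggest up to top_n tags: the most frequent non-stop-words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word for word in words if word not in STOP_WORDS and len(word) > 2
    )
    # Rank by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:top_n]]
```

The suggested tags could be offered to the user for confirmation rather
than applied automatically.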

Xun

On 8/5/06, Davor Cubranic <cubranic at cs.ubc.ca> wrote:
>
> Philippe Bossut wrote:
> > *But*, saying that there is no spelled out hierarchies between the
> > tags does not mean that there is no structure between them. Such a
> > structure will need to be deduced through how the tagged items relate
> > to each other. Segmentation techniques should be able to infer a local
> > hierarchy of tags even in the most tangled set. Once the local
> > hierarchy is deduced (and appropriately displayed), one can imagine to
> > turn "off" a whole node ("work" in the example given by Bobby).
> There was a paper at CIKM in 2005 about automatically generating
> hierarchies of tags based on their frequency of co-occurrence:
> http://pages.stern.nyu.edu/~panos/publications/cikm2005.pdf.
>
> Davor