[Chandler-dev] Collecting Usage Data/Central Logging

Mike Taylor bear at code-bear.com
Tue Jun 27 21:18:11 PDT 2006


On Jun 26, 2006, at 7:18 PM, Ashkan Soltani wrote:

> Questions:
>
> What would be the best way to 'collect' this data, given that users 
> may or may not have network access, or could possibly be firewalled, 
> etc. (*caveat: i'm looking for a quick low-hanging fruit such that 
> more time can be spent on the analysis side of the project)
>
> Possible ideas I've considered include using HTTP in real time (or via 
> implementing a buffering system) to log the data to a central server.  
> There's some direct support for this in the python logging module 
> under HTTPHandler: http://docs.python.org/lib/module-logging.html
>
> Alternatives would be to use a 'sync' procedure, either via rsync/ftp 
> or perhaps even the background sync module to upload the logdata to 
> our servers.  The implementation for this stuff would be a bit more 
> involved, especially since I'm not storing the logdata in the 
> repository and would need to figure out how to encapsulate it, but it 
> might be more compatible with the rest of the chandler methodology.
>
> The last/simplest approach would be to use simple SMTP to post the 
> information since we know that more often than not, users will have 
> 'at least' SMTP access from within Chandler since it is a mail app 
> after all.

Here are some questions and concerns I thought about while reading your 
post:

You're right to be concerned about outgoing ports and transport 
mechanisms - unless we have a server listening on one of the well-known 
ports (80, 8080, 443, etc.), the vast majority of users will not be 
able to use the service.  This may also be a concern if you decide to 
use email - a lot of people are behind email servers that limit 
outbound email to X messages per minute or X amount of traffic, and 
email would also require some sort of configuration step so Chandler 
could authenticate to their SMTP server.

My biggest concern is that, given we could be logging personal or other 
sensitive information, even indirectly, either the transport stream 
would have to be encrypted or the data itself would have to be 
encrypted and then sent as binary data.
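
To make the port and encryption points concrete, here is a minimal 
sketch built on the standard logging module mentioned above.  It 
assumes a current Python in which HTTPHandler accepts secure=True; the 
host name and URL path are placeholders, not a real OSAF service.

```python
import logging
import logging.handlers


def make_usage_handler(host="usage.example.org:443", url="/collect",
                       capacity=200):
    """Return a buffering handler that forwards records over HTTPS.

    host and url are hypothetical placeholders; no such server exists.
    """
    # secure=True makes HTTPHandler use HTTPS, so the stream is encrypted
    # and leaves on port 443, which most firewalls allow.
    https = logging.handlers.HTTPHandler(host, url, method="POST",
                                         secure=True)
    # MemoryHandler keeps records locally and forwards them only when the
    # buffer is full or a record at flushLevel or above arrives.
    return logging.handlers.MemoryHandler(
        capacity, flushLevel=logging.CRITICAL, target=https
    )
```

Note that HTTPHandler still issues one request per record when the 
buffer is flushed, so this delays the traffic rather than reducing it.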

We would also need to be concerned with the number of data packets 
being sent to this server so we can find out sooner rather than later 
whether front-end load (receiving the data) or back-end load 
(expanding the data and putting it somewhere that devs can access) 
will be the bottleneck.
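
One way to keep the request count down is to batch records on the 
client and send each batch as a single payload.  The sketch below is 
only an illustration: BatchingHandler and its send callable are 
hypothetical names, and the JSON layout is an assumption, not an agreed 
format.

```python
import json
import logging
import logging.handlers


class BatchingHandler(logging.handlers.BufferingHandler):
    """Collect records and hand them to `send` as one JSON payload per batch.

    `send` is a hypothetical callable (for example a function that does an
    HTTPS POST); it is injected so the transport can be swapped or measured.
    """

    def __init__(self, capacity, send):
        super().__init__(capacity)
        self.send = send

    def flush(self):
        # Swap the buffer out under the handler lock, then send outside it.
        self.acquire()
        try:
            batch, self.buffer = self.buffer, []
        finally:
            self.release()
        if batch:
            payload = json.dumps(
                [{"name": r.name, "level": r.levelname, "msg": r.getMessage()}
                 for r in batch]
            ).encode("utf-8")
            self.send(payload)
```

With a capacity of N, the server sees roughly one request per N events, 
which also makes it easier to measure front-end and back-end load 
separately.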

Your idea of using rsync doesn't seem practical to me, as all of this 
data is newly generated on the client side, so rsync would just end up 
sending it all anyway - we may as well stick to HTTP PUT or WebDAV or 
something like that.
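
As a rough sketch of the HTTP PUT option, the function below builds 
(but does not send) a request carrying a gzipped chunk of log data.  
The upload URL and the choice of gzip are assumptions for illustration 
only.

```python
import gzip
import urllib.request


def build_put_request(log_bytes, url):
    """Build (but do not send) an HTTPS PUT carrying a gzipped log chunk.

    url is a hypothetical upload location on a server we would run.
    """
    body = gzip.compress(log_bytes)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "text/plain",
            "Content-Encoding": "gzip",
        },
    )


# Sending would be something like:
#   urllib.request.urlopen(build_put_request(data, url), timeout=30)
```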

Using sftp or ftp is, IMO, also a non-starter, as those protocols have 
plenty of issues from a security standpoint.  Just watch how quickly 
our IP would be deluged by script-kiddies once they find out we have 
an FTP port that allows PUTs using computer-generated UserIDs.

One method I would propose would be to use XMPP and a pub/sub setup.
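
For illustration, a pub/sub publish request might look roughly like 
what the sketch below generates.  The pubsub namespace is the one 
defined by XEP-0060; the JIDs, node name, and the payload namespace and 
element names are all hypothetical, and a real client would send this 
through an XMPP library over an authenticated stream.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"  # namespace from XEP-0060
USAGE_NS = "urn:example:chandler:usage"          # hypothetical payload namespace


def build_publish_stanza(from_jid, service_jid, node, events,
                         stanza_id="pub1"):
    """Return the XML text of a pubsub publish request carrying usage events."""
    iq = ET.Element("iq", {"type": "set", "from": from_jid,
                           "to": service_jid, "id": stanza_id})
    pubsub = ET.SubElement(iq, "{%s}pubsub" % PUBSUB_NS)
    publish = ET.SubElement(pubsub, "{%s}publish" % PUBSUB_NS, {"node": node})
    item = ET.SubElement(publish, "{%s}item" % PUBSUB_NS)
    usage = ET.SubElement(item, "{%s}usage" % USAGE_NS)
    # One <event/> child per (name, value) pair.
    for name, value in events:
        event = ET.SubElement(usage, "{%s}event" % USAGE_NS, {"name": name})
        event.text = str(value)
    return ET.tostring(iq, encoding="unicode")
```

Each client would publish to a node on the pubsub service, and whatever 
does the analysis would subscribe to that node.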

---
Bear

Build and Release Engineer
Open Source Applications Foundation (OSAF)
bear at osafoundation.org
http://www.osafoundation.org

bear at code-bear.com
http://code-bear.com

PGP Fingerprint = 9996 719F 973D B11B E111  D770 9331 E822 40B3 CD29

