[Dev] risks [was (db policy) transparent persistence]

David McCusker david at osafoundation.org
Wed Nov 27 11:57:16 PST 2002

Ricardo M. Reyes wrote:
> I think Object Persistence looks really interesting, but there's
> something that bothers me, and maybe it's already taken care in the
> Persistence implementations.

Another thing to worry about, and perhaps the one that's bothering
Patrick Logan (in another thread I'm getting to on my weblog), is the
option to create spaghetti data with persistent objects, because the
freedom doesn't impose any discipline on developers.

It seems to be human nature to fall apart and lose discipline when
no boundaries are encountered to provide rules about what is good and
what's not, especially when coupled with inability to see the results
of no discipline.  Making data more inspectable might help.

But we should also admonish folks to think of their data as having an
actual specified structure to it, even if transparent persistence does
the legwork of storing and fetching.  Folks need to think about the
consequences of their choices to avoid creating messes in data.

> I guess that the storage of objects includes the code of the methods.

No, the storage of objects should include only the data and none of
the method code.  In this sense, we don't actually store objects, if
objects are defined as both code and data.

The code is usually factored out, then rejoined to the data when the
objects are instantiated again from their serialized data.
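
A minimal sketch of that factoring, in Python.  The Contact class, the
REGISTRY table, and the dump/load helpers are all hypothetical names
made up for illustration; this is not Chandler's storage code, and a
real persistence layer would do considerably more.

```python
import json


class Contact:
    """Hypothetical persistent class, used only for illustration."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

    def display(self):
        return f"{self.name} <{self.email}>"


# Assumed lookup table from stored class names to classes in the code.
REGISTRY = {"Contact": Contact}


def dump(obj):
    # Only the class name and the instance data are written out.
    return json.dumps({"class": type(obj).__name__, "state": vars(obj)})


def load(text):
    record = json.loads(text)
    cls = REGISTRY[record["class"]]       # rejoin the data to the code
    obj = cls.__new__(cls)                # skip __init__; restore state
    obj.__dict__.update(record["state"])
    return obj


stored = dump(Contact("Ada", "ada@example.org"))
restored = load(stored)
```

The stored text holds the field values and a class name, but nothing
of display(); the method only comes back because load() finds the
class in whatever code is installed at load time.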

There are two things to note about this factoring of code from data
when storing objects this way. First, the result is fragile, in the
sense that versioning the code can cause data to get out of sync.

Second, this factoring is likely the historical consequence of coding
environments that cannot easily store code with data, because having
code link and be operative in the system traditionally has byzantine
complications that are not strictly necessary.
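
The fragility in the first point can be sketched as follows, assuming
a hypothetical record written by version 1 of some code and read back
by version 2, where a field was renamed.  The version numbers, the
field names, and the upgrade() helper are assumptions for illustration
only.

```python
import json

# Record as a hypothetical version 1 of the code would have written it.
stored_v1 = json.dumps(
    {"version": 1, "state": {"name": "Ada", "email": "ada@example.org"}}
)


class Contact:
    """Version 2 of the class: 'email' became a list called 'emails'."""

    SCHEMA_VERSION = 2

    def __init__(self, name, emails):
        self.name = name
        self.emails = emails

    def primary_email(self):
        return self.emails[0]


def upgrade(record):
    # Bring old data in line with what the current code expects.
    if record["version"] == 1:
        state = dict(record["state"])
        state["emails"] = [state.pop("email")]
        record = {"version": 2, "state": state}
    return record


def load(text):
    record = upgrade(json.loads(text))
    if record["version"] != Contact.SCHEMA_VERSION:
        raise ValueError("unknown schema version %r" % record["version"])
    return Contact(**record["state"])


contact = load(stored_v1)
```

Without the upgrade step, passing the old state straight to the new
constructor fails, because the data and the code no longer agree on
the field names.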

Really dynamic languages like Python could actually store code with
data in a database and have this work without serious hitches.  But
it's probably not a good idea, because it's unfriendly to other systems
that want to use the data and are not using Python, and because we
don't have much practical experience troubleshooting the typical
problems.

> If this is true, then I think it's a huge security risk. If someone
> tampers with the stored representation of an object, it can replace
> an innocent method with something dangerous, and then Chandler would
> execute that trojanized code next time the object is loaded.

Yeah, that would be a big security risk.  It's also a bit of a security
risk when the data alone is stored, because many database systems
were implemented by developers not sufficiently paranoid about finding
corrupt bytes in storage (say, random bit patterns).  Some of those
systems will crash when they use such bytes without checking them.

In the past when I've written storage code, I never trusted the bytes
I read from disk (or some other remote location), and always checked
them before using them, so failures were soft exceptions rather than
hard crashes.  We might or might not find this level of due diligence
in storage libraries we use that already exist.
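
One way to sketch that kind of checking in Python is below.  The
record layout (a magic tag, a length, and a CRC32 ahead of the
payload), the size limit, and the CorruptRecord exception are all
assumptions for illustration, not a description of any existing
storage library.  A CRC catches accidental corruption; it would not
stop someone who tampers deliberately and recomputes the checksum.

```python
import struct
import zlib


class CorruptRecord(Exception):
    """Soft failure raised instead of crashing on bad bytes."""


MAGIC = b"REC1"                    # hypothetical record tag
HEADER = struct.Struct(">4sII")    # magic, payload length, CRC32
MAX_PAYLOAD = 1 << 20              # assumed sanity limit (1 MiB)


def pack_record(payload):
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def unpack_record(raw):
    # Check every field before trusting it.
    if len(raw) < HEADER.size:
        raise CorruptRecord("truncated header")
    magic, length, crc = HEADER.unpack_from(raw)
    if magic != MAGIC:
        raise CorruptRecord("bad magic")
    if length > MAX_PAYLOAD or length != len(raw) - HEADER.size:
        raise CorruptRecord("bad length")
    payload = raw[HEADER.size:]
    if zlib.crc32(payload) != crc:
        raise CorruptRecord("checksum mismatch")
    return payload
```

A caller can then catch CorruptRecord and report or skip the damaged
record, rather than going down with it.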

If Chandler is successful, and our storage has vulnerabilities that
permit attacks that make us crash, then I'd expect a game of core wars
with people who want folks to think Chandler is unstable, or otherwise
not as high quality as our goals state.

> Am I wrong? Maybe the code is not stored on disk? or it's encrypted
> somehow?
> I would really like to hear from someone who used Obj. Persistence
> about this.

No, the code is not stored on disk in the database.  But it is stored
on disk in the file system where the Python libraries are located. :-/
So the same attack on code in databases applies in part to code in the
file system.  But folks have more practice coping with file system
attacks.

--David McCusker
