Allow me to put on my arrogant academic hat here and say that
I am not only interested but I should be the ultimate and final
diety on the subject :). My qualifications are that (1) here at
NJIT I am the "teach databases slave" so I can lecture you silly
persons on the subject ad-nauseum (ask my students :); (2) my
dissertation title was "Extending Datalog for Federated Databases"
which goes on for almost 300 pages, including my own little
branch of mathematics I developed for the purpose (apparently,
I should name it after Dr. Suess's Zisser-Zoof seeds "which
nobody wants because nobody needs:); (3) I once exchanged nasty
insults with Jeff Ulman Himself (though he won't remember), and
have generally made an ass of myself at various DB conferences
worldwide :). NOTE THE SMILEYS! THE LAST PARAGRAPH WAS A JOKE
YOU HUMORLESS LITTLE NITS!
but seriously: (4) this possibility is what got me excited
about Python in the first place (after I decided that Common
Lisp would never work out and dbPerl wasn't such a good idea);
(5) I've done some halting experimentation in this area (I'm
working on a relational algebra++ in Python as we speak).
Now, please put on your muck-boots to wade through the following
BS -- after all, I _am_ an academic!
>1) Should the interface take a SQL literal only or should there be "helper"
>functions to allow non-SQL interface to SQL databases?
Yes and yes. I'd like to see
1) a direct interface to underlying SQL engines that just takes
straight SQL strings and returns "query objects".
2) a compositional (algebraic/nonSQL) approach that allows queries
to be constructed and derived directly in python, either with
no visible means of support or using an SQL engine underneith.
3) some sort of SQL implementation directly in python.
If we start with (1) and do it well then we could feed the result
directly into (2) which in turn could be used by (3). Viola!
Python becomes a killer DB glue language! Interoperability anyone?
>2) How should rows returned from the database be represented. My interface
>uses a dictionary, where the key is a string with the column name.
>.... Another approach is to have
>a Row object that hides the representation.
Yes and yes. Any row should "look like a dictionary", but python's
neat OO interface means we can implement it however we like. The only
reason I can think of for using something more complicated than a
dict for a row would be for "updatable cursors" which might need to hold more
information.
>3) Should a select return a cursor object that is interrogated for the
>rows? My implementation does not do this.
Why not? I'd like to see queries be objects that could be
passed around, duplicated, reset, advanced, indexed, and even algebraically
combined with other queries. Let's be expansive and general,
and if for some reason something's too hard to actually implement,
we can just 'raise "hell"' (Ie, punt using errors). A lot of
this would be best implemented in a Python layer which talks to
the C interface something closer to native mode (like Tkinter/tkinter).
>4) How should data be represented? I support strings, ints, and floats
>as native datatypes. What about BLOBs?
When possible a query object should encapsulate types for attributes.
Could use a "dunno" type to indicate that the programmer is "on his
own." I don't see any reason not to use dynamic allocation for big
stuff like the rest of python. But if it's REALLY big my ODBC ref
manual notes that "[SQLGetData] can be used to retrieve character
or binary data in parts" -- so we could wrap big objects in a python
BLOB/bigstring class, which only materializes the piece we want to
look at, eventually -- but push that one for the moment. Oracle
tries to convince customers to buy enough memory so the whole database
will fit "in core". Maybe we should follow their lead.
On the error messages: come on, that's too practical for me.
[I'll have to think about it.]
As an aside, hypothetically if we provide a python interface to
ODBC (and/or the emerging SQL standard API) programmers could use
it to talk to a lot more than just db's -- Excel spreadsheets, for
example. Thank God that Bill Gates (if the two can be distinguished)
is in a position to enforce some standards on the software community!
next step: maybe a moderately formal object architecture for
python/dbglue?
Aaron Watters
Department of Computer and Information Sciences
New Jersey Institute of Technology
University Heights
Newark, NJ 07102
phone (201)596-2666
fax (201)596-5777
home phone (908)545-3367
email: aaron@vienna.njit.edu
===
[A] Microsoft developer took a piece of IBM code that
required 33,000 characters ... and rewrote it in 200
characters.... IBM managers then began complaining....
Measured in lines of code, they said, Microsoft was
actually doing _negative_ work....
-- Paul Carroll, _Big Blues: The Unmaking of IBM_