.. started from awhtml/docs/design/awe-db.tex

Introduction
============

.. Assumption: Basic understanding of object oriented concepts in Python

Suppose you are working at the Python prompt. You create some class ``A``,
instantiate it, refer to the resulting object as ``a``, and give it an
attribute ``x``:

.. code-block:: python

   class A:
       pass

   a = A()
   a.x = 29

Would it not be nice if we could leave the Python session at this point
and, later, start a new Python session and continue with object ``a``
as we left it?
Assuming that we had many different instances of class ``A`` with
different values for the attribute ``.x``, would it not also be convenient
to find the subset of instances for which ``.x`` equals a certain value?

Persistent objects make this possible. Using the interface described in
this document, the example above becomes:


.. code-block:: python

   from common.database.DBMain import DBObject, persistent

   class A(DBObject):
       x = persistent('This is a persistent integer attribute [m]', int, 37)

   a = A()
   a.x = 29
   a.commit()


The module ``common.database.DBMain`` defines the class ``DBObject``, the
base class from which every persistent class is ultimately derived.
``DBObject`` delegates the backend-specific part of its implementation
to ``DBProxy``.
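The split between ``DBObject`` and ``DBProxy`` is an instance of delegation. The sketch below is not the actual code: ``ToyObject``, ``ToyProxy`` and ``MemoryProxy``, and their ``store`` and ``load`` methods, are hypothetical names chosen for illustration, and a dictionary plays the role of the storage backend. It only shows how a backend-independent base class can hand all storage work to a replaceable proxy object.

.. code-block:: python

   class ToyProxy:
       """Hypothetical backend interface: the storage-specific half of the design."""

       def store(self, class_name, object_id, state):
           raise NotImplementedError

       def load(self, class_name, object_id):
           raise NotImplementedError


   class MemoryProxy(ToyProxy):
       """Hypothetical backend keeping each object's state in a dictionary."""

       def __init__(self):
           self._rows = {}

       def store(self, class_name, object_id, state):
           self._rows[(class_name, object_id)] = dict(state)

       def load(self, class_name, object_id):
           return dict(self._rows[(class_name, object_id)])


   class ToyObject:
       """Hypothetical persistent base class; all storage goes through self.proxy."""

       proxy = MemoryProxy()   # replacing this object changes the backend
       _next_id = 1

       def __init__(self, object_id=None):
           if object_id is None:
               # New object: allocate a fresh identity.
               self.object_id = ToyObject._next_id
               ToyObject._next_id += 1
           else:
               # Existing object: ask the backend for its saved state.
               self.object_id = object_id
               self.__dict__.update(self.proxy.load(type(self).__name__, object_id))

       def commit(self):
           state = {k: v for k, v in vars(self).items() if k != 'object_id'}
           self.proxy.store(type(self).__name__, self.object_id, state)


   class A(ToyObject):
       pass


   a = A()
   a.x = 29
   a.commit()
   restored = A(object_id=a.object_id)   # a fresh instance, state read back via the proxy
   print(restored.x)  # 29

Because ``ToyObject`` never touches the dictionary directly, a different backend (for instance one built on a relational database) could be swapped in by replacing ``proxy`` alone.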


* ``DBObject``, ``persistent``
* ``DBObjectMeta``
* ``DBProxy`` -> ``DBSqlite3``
* ``Database``


.. _`app:awe-db`:

This document describes the specification and Python implementation of
persistent objects on top of a relational database back end. The aim
of this implementation is twofold:

1. Provide a transparent mapping from the definition of a persistent
   class to a table in a relational database, preserving inheritance
   relationships and allowing attributes to refer to other persistent
   objects.
2. Provide a native Python syntax to express queries, and leverage
   the advantages of the relational model (SQL) when using persistent
   objects.

In this document we first introduce a number of concepts from Object
Oriented Programming (OOP) and Relational Database Management Systems
(RDBMS), in order to clarify the problem we wish to solve. We then
provide the specification of the database interface provided by
the Astro-WISE prototype. Finally, we clarify some of the
implementation issues addressed by the current prototype.

Background
==========

Object Oriented Programming
---------------------------

It is difficult to give a meaningful definition of "object".
However, the following "definition" introduces some intimately
related terms that will be used throughout this document:

object
   An object is something that comprises a **type**, an **identity**
   and a **state**. The type of an object specifies what kind of object
   it is, in particular what kind of behavior the object is capable of.
   The identity is what distinguishes one object from another. The state
   of an object specifies the values of the properties of the object.

In Object Oriented Programming (OOP) we have an operational definition
of objects:

object
   An object is an **instance** of a **class**, and **encapsulates**
   both data and behavior.

The class defines what operations (**methods**) can be performed on
its instances, and what **attributes** those instances will have. In
general "class" and "type" are synonymous, as are "instance" and
"object". That is, when we talk about the type of an object we mean
the class of which it is an instance.

It is important to note that the values of the attributes of an object
will themselves be objects, although most programming languages
distinguish between (instances of) primitive data types (integers,
strings, etc.) and instances of classes.

**Inheritance** is the mechanism by which one can use the definition
of existing classes to build new classes. A child (derived) class
inherits behavior from its parent (base) class. In defining the child
class, the programmer can extend it with new methods and attributes,
and/or modify the implementation of methods defined in the parent
class. However, the child class is expected to conform to the interface
(specification) of the parent class, to the extent that instances of
the child class can behave as if they are instances of the parent
class. In particular, procedures taking an object of a base type as
argument are expected to work when given an object of a derived type
as well. This key property of objects is called **polymorphism**.
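A short plain-Python illustration of inheritance and polymorphism, with no persistence involved. The class names borrow the flatfield example from the discussion of relational databases; the ``band`` and ``lamp`` attributes and the ``describe`` method are illustrative assumptions.

.. code-block:: python

   class FlatfieldImage:
       """Base class: a flatfield image taken through some filter band."""

       def __init__(self, band):
           self.band = band

       def describe(self):
           return f"flatfield image, {self.band}-band"


   class DomeFlatImage(FlatfieldImage):
       """Derived class: adds an attribute and overrides a method."""

       def __init__(self, band, lamp):
           super().__init__(band)
           self.lamp = lamp

       def describe(self):
           return f"dome flat ({self.lamp} lamp), {self.band}-band"


   def summarize(image):
       # Written against the base-class interface only ...
       return image.band + ': ' + image.describe()


   # ... yet it also works for instances of the derived class.
   print(summarize(FlatfieldImage('R')))            # R: flatfield image, R-band
   print(summarize(DomeFlatImage('R', 'halogen')))  # R: dome flat (halogen lamp), R-band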


Persistency
-----------

An object is said to be **persistent** if it is able to "remember"
its state across program boundaries. This concept should not be
confused with the concept of a program saving and restoring its data
(or state). Rather, persistency implies that object identity is
meaningful across program boundaries, and can be used to recover
object state.

Persistency is usually implemented by an explicit mapping from
(user-defined) object identities to object states, and by saving and
restoring this mapping. However, this implementation assumes that the
identity of the object one is interested in can be obtained easily
and independently. For many applications this is not the case. On the
contrary, one usually has a (partial) specification of the state, and
is interested in the objects that satisfy this specification. That is,
many interesting applications depend on a mapping from a partially
specified object state to object identity (and then to object). This
is the domain of the relational database.
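The contrast can be made concrete with a small sketch that uses the standard-library ``shelve`` module as an explicit identity-to-state store. The identities, attribute names and file name are illustrative only. Retrieval by identity is a single lookup, but answering a question posed in terms of a partial state forces a scan over every stored state.

.. code-block:: python

   import os
   import shelve
   import tempfile

   with tempfile.TemporaryDirectory() as tmp:
       path = os.path.join(tmp, 'objects')

       # Session 1: save state under explicit, user-chosen identities.
       with shelve.open(path) as db:
           db['flat-001'] = {'band': 'R', 'exptime': 2.0}
           db['flat-002'] = {'band': 'V', 'exptime': 3.5}
           db['flat-003'] = {'band': 'R', 'exptime': 1.0}

       # Session 2: recovering state is easy if we know the identity ...
       with shelve.open(path) as db:
           state = db['flat-002']

           # ... but "all R-band objects" can only be answered by visiting
           # every stored state, since the mapping is keyed on identity.
           r_band = sorted(key for key, value in db.items() if value['band'] == 'R')

   print(state)   # {'band': 'V', 'exptime': 3.5}
   print(r_band)  # ['flat-001', 'flat-003']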

Relational Databases
--------------------

A relational database management system (RDBMS) stores, updates and
retrieves data, and manages the relations between different data. An
RDBMS has no concept of objects, inheritance or polymorphism, and it
is therefore not a priori obvious that one would want to use such a
database to implement object persistence. However, consider the
following mapping:

============  =============
Object        Database
============  =============
**type**      **table**
**identity**  **row index**
**state**     **row value**
============  =============

With this mapping one can, at least in principle, implement object
persistency using a relational database: given a type and an object
identity, one can store and retrieve state from the specified row in
the corresponding table.
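The mapping can be tried out with Python's standard ``sqlite3`` module. In this sketch the table and column names are illustrative and are not necessarily the schema the interface generates: the class ``Address`` becomes a table, the identity becomes the row's primary key, and the state becomes the values in that row.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(':memory:')

   # type -> table: one column per persistent attribute, plus the identity.
   conn.execute('CREATE TABLE Address '
                '(object_id INTEGER PRIMARY KEY, street TEXT, number INTEGER)')

   # Store: identity 42 -> state ('Main Street', 7).
   conn.execute('INSERT INTO Address (object_id, street, number) VALUES (?, ?, ?)',
                (42, 'Main Street', 7))
   conn.commit()

   # Retrieve: given the type (table) and identity (row index), recover the state.
   row = conn.execute('SELECT street, number FROM Address WHERE object_id = ?',
                      (42,)).fetchone()
   print(row)  # ('Main Street', 7)
   conn.close()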

Relational databases provide a powerful tool to view and represent
their content using structured queries. It would be extremely useful
if we were able to leverage this power to efficiently search for
objects whose state matches certain criteria. Special consideration
has to be given to inheritance in this case.

Assume, for example, that we define a persistent type
``DomeFlatImage``, derived from a more general type
``FlatfieldImage``. A query for all R-band flatfield images should
return a set that includes all R-band dome flat images. This behavior
of queries is what inheritance means in a relational database
context. Hence, a query for objects of a certain type maps to queries
(returning row indices, i.e. object identities) on the tables
corresponding to that type and to all of its subtypes. The results of
these queries are then combined into a single set of all objects, of
that type or one of its subtypes, that satisfy the selection.
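A sketch of such a query, again with ``sqlite3``. It assumes, purely for illustration, one table per class holding all attributes of that class (inherited ones included) and identities that are unique across the whole hierarchy; the ``lamp`` column is hypothetical. The same selection is run against the table of the type and the table of its subtype, and the partial results are combined with ``UNION``.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(':memory:')
   conn.executescript('''
       CREATE TABLE FlatfieldImage (object_id INTEGER PRIMARY KEY, band TEXT);
       CREATE TABLE DomeFlatImage  (object_id INTEGER PRIMARY KEY, band TEXT, lamp TEXT);
       INSERT INTO FlatfieldImage VALUES (1, 'R'), (2, 'V');
       INSERT INTO DomeFlatImage  VALUES (3, 'R', 'halogen'), (4, 'B', 'halogen');
   ''')

   # The type and all of its subtypes (assumed known from the class hierarchy).
   hierarchy = ['FlatfieldImage', 'DomeFlatImage']

   # One query per table, combined into a single set of identities.
   sql = ' UNION '.join(f'SELECT object_id FROM {table} WHERE band = ?'
                        for table in hierarchy)
   ids = sorted(oid for (oid,) in conn.execute(sql, ['R'] * len(hierarchy)))
   print(ids)  # [1, 3]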

Problem specification
=====================

The implementation of the interface should address the following
issues:

defining a persistent class
   Defining a persistent class (type) gives its instances the property
   of being persistent. The class definition should provide sufficient
   information about the attributes (possible state) of the objects to
   build the corresponding database table. This table should be present
   in the database when the first object of the class is instantiated.
   Presently, this is achieved by dynamically creating the table (if it
   does not yet exist) when the class definition is processed [#]_.

retrieving the state of persistent objects
   Instantiating a persistent object with an existing object identity
   should result in retrieval of its state from the database.

saving the state of persistent objects
   Persistent objects whose state has been modified should save their
   state to the database before they cease to exist.

references
   Persistent objects will contain references to (read: instances of)
   other persistent objects. Care has to be taken that instantiating a
   persistent object does not recursively instantiate all objects it
   refers to. A referenced object should be instantiated only when the
   attribute corresponding to the reference is accessed.

expressing selections
   It should be possible to express selections of the form

   .. math::

      \{x \mid x \in X \wedge (x.\mathrm{attr1} \in A \wedge x.\mathrm{attr2} \in B \vee x.\mathrm{attr3} \in C \ldots)\}

   i.e. the set of all objects of type :math:`X` whose attributes have
   certain properties. This set should be translated into an SQL query
   to the database, resulting in an iterable sequence of objects
   satisfying the selection.

.. [#] This implementation neatly avoids the problem of having to
   maintain both the class hierarchy and the corresponding database
   schema.
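The requirement on references can be met with a Python descriptor that stores only the identity of the referred object and instantiates it on first access. The sketch below is hypothetical: ``LazyLink``, ``load``, ``Flat`` and the ``STATES`` dictionary stand in for the real lookup in the database and only illustrate the idea.

.. code-block:: python

   class LazyLink:
       """Hypothetical descriptor: stores an object_id, instantiates on first access."""

       def __set_name__(self, owner, name):
           self.id_attr = f'_{name}_id'       # where the identity is kept
           self.cache_attr = f'_{name}_obj'   # where the loaded object is cached

       def __get__(self, instance, owner):
           if instance is None:
               return self
           obj = instance.__dict__.get(self.cache_attr)
           if obj is None:
               object_id = instance.__dict__.get(self.id_attr)
               if object_id is None:
                   return None                # link not set
               obj = load(object_id)          # instantiated only now
               instance.__dict__[self.cache_attr] = obj
           return obj

       def __set__(self, instance, value):
           # Accept an identity or an object, but keep only the identity.
           object_id = value if isinstance(value, int) else value.object_id
           instance.__dict__[self.id_attr] = object_id
           instance.__dict__.pop(self.cache_attr, None)


   STATES = {7: 'R-band flat'}   # stand-in for rows in the database
   LOADED = []                   # identities that have actually been instantiated


   class Flat:
       def __init__(self, object_id, name):
           self.object_id, self.name = object_id, name


   def load(object_id):
       """Stand-in for retrieving an object's state from the database."""
       LOADED.append(object_id)
       return Flat(object_id, STATES[object_id])


   class Image:
       flat = LazyLink()


   img = Image()
   img.flat = 7                  # only the identity is stored
   loaded_before = list(LOADED)  # [] : nothing instantiated yet
   name = img.flat.name          # first access instantiates the Flat object
   again = img.flat              # later accesses reuse the cached object
   print(loaded_before, name, LOADED)  # [] R-band flat [7]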

In addition, the following issues need to be addressed, though not
necessarily by the interface to persistent objects.

managing database connections
   The interface does not specify how or when the database connection
   is established.

transactions
   The interface does not specify if and how transactions are
   implemented.

efficiency
   No effort has yet been made to maximize performance and/or
   scalability. Initial efforts have focused on demonstrating the
   technology and on simplicity of implementation.

Interface Specification
=======================

In this section we describe how to implement and use persistent
objects, using the interface defined in the Astro-WISE prototype. This
section includes Python source code fragments. Readers not familiar
with Python are advised to have a look at the main web site at
http://www.python.org/ and at the Python tutorial at
http://www.python.org/doc/current/tut/tut.html.

Persistent classes
------------------

Persistent objects are instances of persistent classes, which specify
explicitly which attributes (properties) are saved in the database. We
call these attributes persistent properties. Executing a program
defining

\subsubsection{Defining persistent classes}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A new persistent class is defined by deriving from an existing
persistent class, or from the root persistent class ``DBObject``.
For example:

.. code-block:: python

   # example1.py
   from astro.database.DBMain import DBObject

   class A(DBObject):
       pass

   class B(A):
       pass

This program specifies two persistent classes, ``A`` and ``B``. Neither
of them extends its parent class, so instances of ``A`` and ``B``
behave exactly like instances of ``DBObject``.

\subsubsection{Defining persistent properties}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A persistent property is defined by using the following expression in
the class definition:

.. code-block:: python

   prop_name = persistent(prop_docs, prop_type, prop_default)

where ``prop_name`` is the name of the persistent property, and
``persistent`` is called with three arguments: the property
documentation, the type of the property, and the default value for
the property, respectively. For example:

.. code-block:: python

   # example2.py
   from astro.database.DBMain import DBObject, persistent

   class Address(DBObject):
       street = persistent('The street', str, '')
       number = persistent('The house number', int, 0)

This program defines a persistent class ``Address`` with two
persistent properties, ``street`` and ``number``, of type ``str``
(string) and ``int`` (integer), respectively.

We distinguish between five kinds of persistent properties, based on
the signature of the arguments to ``persistent()``:

descriptors
   If the type of the persistent property is a basic (built-in) type,
   then we call the persistent property a descriptor. Valid types are:
   integers (``int``), floating point numbers (``float``), date-time
   objects (``DateTime``), and strings (``str``).

descriptor lists
   Persistent properties can also be homogeneous variable-length arrays
   of basic built-in types, called descriptor lists. Valid types are the
   same as those for descriptors. Descriptor lists are distinguished
   from descriptors by the property default: if the default is a Python
   list, the property is a descriptor list, otherwise it is a simple
   descriptor.

links
   Persistent objects can refer to other persistent objects. The
   corresponding properties are called links. If the type of the
   persistent property is a subclass of ``DBObject``, then the property
   is a link.

link lists
   Persistent properties can also refer to arrays of persistent objects,
   in which case they are called link lists. Link lists are
   distinguished from links by the property default: if the default is
   a Python list, the property is a link list.

self-links
   A special case of links are links to other objects of the same
   type. These are called self-links. If neither a type nor a default
   is specified in the call to ``persistent()``, then the property is a
   self-link.
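The rules above depend only on the type and default passed to ``persistent()``, so they can be written down as a small function. This is a hypothetical sketch of the classification, not the interface's actual code; ``Base`` stands in for ``DBObject`` so that the snippet runs on its own.

.. code-block:: python

   class Base:
       """Stand-in for DBObject (hypothetical)."""


   def property_kind(prop_type=None, prop_default=None):
       """Classify a persistent property from its type and default (sketch)."""
       if prop_type is None and prop_default is None:
           return 'self-link'
       is_list = isinstance(prop_default, list)
       if isinstance(prop_type, type) and issubclass(prop_type, Base):
           return 'link list' if is_list else 'link'
       return 'descriptor list' if is_list else 'descriptor'


   class Address(Base):
       pass


   print(property_kind(int, 0))         # descriptor
   print(property_kind(float, []))      # descriptor list
   print(property_kind(Address, None))  # link
   print(property_kind(Address, []))    # link list
   print(property_kind())               # self-link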

Keys
^^^^

Persistent properties can be used as alternative object identifiers,
in addition to the default object identifier (``object_id``). Only
descriptors can be used as keys. Keys are always unique and indexed.

The special class attribute ``keys`` contains a list of attribute names
and tuples of attribute names, each specifying one key. For example:

.. code-block:: python

   # example3.py
   from astro.database.DBMain import DBObject, persistent

   class Employee(DBObject):
       ssi = persistent('Social Security Number', str, '')
       name = persistent('Name', str, '')
       # DateTime is the date-time descriptor type; its import is not shown
       birth = persistent('Birth date', DateTime, None)
       keys = [('ssi',), ('name', 'birth')]

In this example ``ssi`` is a key. The pair of attributes
``('name', 'birth')`` is also a key.
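In a relational back end a key naturally corresponds to a unique index. The following ``sqlite3`` sketch shows what the two keys of ``Employee`` could look like at the SQL level; the table layout and index names are assumptions made for illustration, not the DDL the interface actually emits. A second row with the same (``name``, ``birth``) pair is rejected even though its ``ssi`` differs.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(':memory:')
   conn.executescript('''
       CREATE TABLE Employee
           (object_id INTEGER PRIMARY KEY, ssi TEXT, name TEXT, birth TEXT);
       -- keys = [('ssi',), ('name', 'birth')]: one unique index per key
       CREATE UNIQUE INDEX Employee_key_0 ON Employee (ssi);
       CREATE UNIQUE INDEX Employee_key_1 ON Employee (name, birth);
   ''')
   conn.execute("INSERT INTO Employee (ssi, name, birth) "
                "VALUES ('111', 'Ada', '1815-12-10')")

   try:
       # Different ssi, but the same (name, birth) pair: violates the second key.
       conn.execute("INSERT INTO Employee (ssi, name, birth) "
                    "VALUES ('222', 'Ada', '1815-12-10')")
       duplicate_rejected = False
   except sqlite3.IntegrityError:
       duplicate_rejected = True

   print(duplicate_rejected)  # True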

Indices
^^^^^^^

Databases use indices to optimize queries. It is possible to specify
which persistent properties should be indexed. Only descriptors
can be used as indices.

The special class attribute ``indices`` contains a list of the names
of attributes that should be indexed. For example:

.. code-block:: python

   # example4.py
   from astro.database.DBMain import DBObject, persistent

   class Example(DBObject):
       attr = persistent('A measurement', float, 0.0)
       indices = ['attr']
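Similarly, each entry in ``indices`` could be translated into a plain, non-unique index. The helper ``index_statements`` below is hypothetical, as is its naming scheme ``<table>_<attr>_idx``; the statements it generates are executed against an in-memory ``sqlite3`` table to show the result.

.. code-block:: python

   import sqlite3


   def index_statements(table, indices):
       """Build one CREATE INDEX statement per indexed attribute (sketch)."""
       return [f'CREATE INDEX {table}_{attr}_idx ON {table} ({attr})'
               for attr in indices]


   conn = sqlite3.connect(':memory:')
   conn.execute('CREATE TABLE Example (object_id INTEGER PRIMARY KEY, attr REAL)')
   for statement in index_statements('Example', ['attr']):
       conn.execute(statement)

   # PRAGMA index_list returns one row per index; column 1 is the index name.
   names = [row[1] for row in conn.execute('PRAGMA index_list(Example)')]
   print(names)  # ['Example_attr_idx']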

Persistent Objects
------------------

Having specified persistent classes, we can now use these classes to
instantiate and manipulate persistent objects. In most respects these
objects behave just like instances of ordinary classes. There are two
exceptions: special rules for instantiation, and special rules for
assigning values to persistent properties.

\subsubsection{Object instantiation}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We can distinguish between three modes of instantiating a
persistent object:

New
   We are creating a new persistent object, for which an ``object_id``
   needs to be generated. This is accomplished by instantiating the
   object without specifying ``object_id``.

Existing
   We are using an existing object. If the object has already been
   instantiated in this application, we want a reference to that same
   instance; otherwise we want an instance whose state has been
   retrieved from the database. This is accomplished by instantiating
   the object with an existing ``object_id``.

Transient
   It may be useful to build an object of a persistent type that is not
   itself persistent (its state will not be saved to the database).
   This is accomplished by instantiating the object with an
   ``object_id`` equal to 0 (zero).

Or, in code:

.. code-block:: python

   a = MyObject()                # A new instance of MyObject
   b = MyObject(object_id=1000)  # An existing instance of MyObject
   c = MyObject(object_id=0)     # A transient instance of MyObject
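The three modes, together with the rule that an object already instantiated in the application is returned as the same reference, can be sketched with a cache keyed on identity (often called an identity map). Everything here is hypothetical: ``ToyPersistent``, its ``_store`` dictionary and its id counter stand in for what the real implementation does with the database.

.. code-block:: python

   import itertools


   class ToyPersistent:
       """Hypothetical persistent base class illustrating the instantiation modes."""

       _store = {}                  # stand-in for the database: object_id -> state
       _cache = {}                  # identity map: object_id -> live instance
       _ids = itertools.count(1)    # stand-in for object_id generation

       def __new__(cls, object_id=None):
           if object_id == 0:
               # Transient: an ordinary instance that is never cached or stored.
               obj = super().__new__(cls)
               obj.object_id = 0
               return obj
           if object_id is None:
               # New: generate a fresh identity.
               object_id = next(cls._ids)
           elif object_id in cls._cache:
               # Existing and already instantiated: return the same reference.
               return cls._cache[object_id]
           obj = super().__new__(cls)
           obj.object_id = object_id
           # Existing but not yet in memory: restore its state (none for new objects).
           obj.__dict__.update(cls._store.get(object_id, {}))
           cls._cache[object_id] = obj
           return obj


   ToyPersistent._store[1000] = {'x': 29}   # pretend this was saved earlier

   a = ToyPersistent()                  # new: gets a generated object_id
   b = ToyPersistent(object_id=1000)    # existing: state restored from the store
   c = ToyPersistent(object_id=0)       # transient
   print(b.x, ToyPersistent(object_id=1000) is b, c.object_id)  # 29 True 0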

In practice, objects are rarely instantiated with an explicit
``object_id``, because we generally do not know the ``object_id``
of the objects we are interested in. Rather, objects are instantiated
using keys or as the result of a query (see below).

Instantiating an object using a key results in a restored object (if
an object with that key already exists) or in a new object. In code:

.. code-block:: python

   from astro.database.DBMain import DBObject, persistent

   class Filter(DBObject):
       band = persistent('the band name', str, '')
       keys = ['band']

   f = Filter(band='V')  # The V-band filter

\subsubsection{Assigning values to properties}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python is a dynamically typed language: variables have no declared
type, only the values bound to them do. However, since database values
(e.g. columns) are statically typed, the interface performs type
checks when binding values to object attributes. The type is specified
in the property definition, as outlined earlier.
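One way to implement such a check is a data descriptor that validates every assignment against the declared type. The ``TypedProperty`` class below is a hypothetical stand-in for ``persistent``, not its actual implementation, and it applies a strict ``isinstance`` test without any coercion.

.. code-block:: python

   class TypedProperty:
       """Hypothetical stand-in for persistent(): a type-checked attribute."""

       def __init__(self, doc, prop_type, default):
           self.__doc__ = doc
           self.prop_type = prop_type
           self.default = default

       def __set_name__(self, owner, name):
           self.name = name

       def __get__(self, instance, owner):
           if instance is None:
               return self
           return instance.__dict__.get(self.name, self.default)

       def __set__(self, instance, value):
           if not isinstance(value, self.prop_type):
               raise TypeError(f'{self.name} must be {self.prop_type.__name__}, '
                               f'not {type(value).__name__}')
           instance.__dict__[self.name] = value


   class Address:
       street = TypedProperty('The street', str, '')
       number = TypedProperty('The house number', int, 0)


   addr = Address()
   addr.number = 12              # accepted
   try:
       addr.number = 'twelve'    # rejected: the column holds integers
   except TypeError as exc:
       error = str(exc)
   print(error)  # number must be int, not str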

Queries
-------

In order to represent selections in native Python code, we have
defined a notation that is based on the idea that a class is in some
sense equivalent to the set of all its instances. To illustrate the
concept, let us give a few examples.

Given a persistent class ``X`` with persistent property ``y``,
the expression

.. code-block:: python

   X.y == 5

represents the set of all instances ``x`` of ``X``, or of subclasses
of ``X``, for which ``x.y == 5`` is true. To obtain these objects
the expression needs to be evaluated, which can be done by passing it
to the ``select`` function; this returns a list of the objects
satisfying the selection.

Given a class ``X`` with a descriptor ``desc``, a descriptor list
``dsc_lst``, and a link ``lnk``,

.. code-block:: python

   select((X.desc > 2.0) & (X.dsc_lst[2] == 'abc') & (X.lnk.attr == 5))

will return a list of instances ``x`` of ``X``, or of subclasses of
``X``, for which

.. code-block:: python

   x.desc > 2.0 and x.dsc_lst[2] == 'abc' and x.lnk.attr == 5

is true. The conditions are combined with ``&`` rather than ``and``,
because Python's ``and`` cannot be overloaded; the parentheses are
needed because ``&`` binds more tightly than the comparison operators.
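The notation relies on operator overloading: comparing a class attribute builds an expression object instead of returning ``True`` or ``False``, and that object can later be translated into SQL. The sketch below is hypothetical (``Column``, ``Condition`` and ``select_ids`` are illustrative names, not the interface's own classes). It handles only descriptors, ignores subclasses, descriptor lists and links, and runs the generated SQL against an in-memory ``sqlite3`` table.

.. code-block:: python

   import sqlite3


   class Condition:
       """A node in a query expression: SQL text plus its parameters."""

       def __init__(self, sql, params):
           self.sql, self.params = sql, params

       def __and__(self, other):
           return Condition(f'({self.sql} AND {other.sql})', self.params + other.params)

       def __or__(self, other):
           return Condition(f'({self.sql} OR {other.sql})', self.params + other.params)


   class Column:
       """Stand-in for a persistent property accessed on the class."""

       def __init__(self, table, name):
           self.table, self.name = table, name

       def _compare(self, op, value):
           return Condition(f'{self.table}.{self.name} {op} ?', [value])

       def __eq__(self, value):
           return self._compare('=', value)

       def __gt__(self, value):
           return self._compare('>', value)

       def __lt__(self, value):
           return self._compare('<', value)


   class X:
       """Stand-in for a persistent class with descriptors y and z."""
       y = Column('X', 'y')
       z = Column('X', 'z')


   def select_ids(conn, condition, table='X'):
       sql = f'SELECT object_id FROM {table} WHERE {condition.sql}'
       return [oid for (oid,) in conn.execute(sql, condition.params)]


   conn = sqlite3.connect(':memory:')
   conn.execute('CREATE TABLE X (object_id INTEGER PRIMARY KEY, y INTEGER, z REAL)')
   conn.executemany('INSERT INTO X VALUES (?, ?, ?)',
                    [(1, 5, 1.0), (2, 5, 3.0), (3, 4, 3.0)])

   query = (X.y == 5) & (X.z > 2.0)
   print(query.sql, query.params)  # (X.y = ? AND X.z > ?) [5, 2.0]
   print(select_ids(conn, query))  # [2]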

Functionality not addressed by the interface
--------------------------------------------

New persistent objects may have an owner. The owner can be defined as
the user running the process in which the persistent object is
created, or as an attribute of the persistent object. In either case,
it is the responsibility of the implementation of the interface for a
particular database to handle ownership of persistent objects.