Abstraction and Identity

This collection of notes on OOP was never meant to stand alone. It also represents a view of OO from the early to mid 1990s. Some people still find the notes useful, so here they are, caveat emptor. Special thanks to Gilbert Benabou for taking the time to compile the first printable version of this document and inspiring us to provide it.


Abstraction

An abstraction is a simplified description of a system which captures the essential elements of that system (from the perspective of the needs of the modeler) while suppressing all other elements.

The boundary of the abstraction must be well-defined so that other abstractions may rely on it. This cooperation between abstractions relies on the contract of responsibilities that an abstraction provides. This relationship is sometimes called client and server.

The protocol of a server is the set of services it provides to clients and the order in which they may be invoked.
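To make the "order in which they may be invoked" part concrete, here is a minimal sketch. The Sampler class and its start/sample/stop services are invented for illustration; they are not part of the greenhouse example. The assumed protocol is start(), then any number of sample() calls, then stop(), and each service reports whether it was invoked in a legal order.

```cpp
#include <cassert>

// Hypothetical server. Assumed protocol: start(), sample()..., stop().
class Sampler {
public:
    Sampler() : running(false), samples(0) {}
    // Each service returns false when it is invoked out of order.
    bool start()  { if (running)  return false; running = true;  return true; }
    bool sample() { if (!running) return false; ++samples;       return true; }
    bool stop()   { if (!running) return false; running = false; return true; }
    int  sampleCount() const { return samples; }
private:
    bool running;   // remembers where we are in the protocol
    int  samples;
};
```

A client that calls sample() before start() simply gets false back; a real design might prefer to report the protocol violation more loudly.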

Classes

In OO, we represent our abstractions mostly as classes.

Booch says,

"A class is a set of objects that share a common structure and a common behavior."

I don't like this definition, since it reverses the chicken/egg relationship between object and class. So, you have a bunch of similar objects? Then they are a class. But how did you get the objects to begin with? From a class definition, of course.

In OOA we will typically recognize the need for an active object from an analysis of the requirement specification. From that active object we can say immediately that we'll need a class. In practice you can't stop your brain from thinking of the two simultaneously. A class consists of two parts, an interface, and an implementation.

Interface

This is the outside view of the class: the part visible to everybody (or at least to any object that has the name of an object from this class).

Most OO languages also provide other interfaces to a class, for example a special one for subclasses, or for other classes in the package.

Implementation

This is the actual code that implements the behavior of the class. Objects are asked to do things in messages which include the name of one of the member functions.

What happens in here can't affect clients of objects of this class, as long as the interface doesn't change.

The instance variables that define what an object may know are part of the protected or private part of the interface. This encapsulates this information, allowing changes in a class without effect on its clients.
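A minimal sketch of what this buys us, under assumptions: RawTempSensor, rawCounts and isTooHot are hypothetical names, and storing the reading as integer hundredths of a degree is just one possible private representation. Clients only ever see currentTemp(), so the representation could be swapped for a plain float without touching them.

```cpp
#include <cassert>

// Hypothetical class: keeps the reading as integer hundredths of a degree.
class RawTempSensor {
public:
    explicit RawTempSensor(int counts) : rawCounts(counts) {}
    // Public interface: clients ask for a temperature as a float.
    float currentTemp() const { return rawCounts / 100.0f; }
private:
    // Private implementation detail; free to change later.
    int rawCounts;
};

// Client code: depends only on the public interface.
inline bool isTooHot(const RawTempSensor& s, float limit) {
    return s.currentTemp() > limit;
}
```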

UML Representation

Classes are usually found in class diagrams. They may have very little or a lot of detail in their representation.

Each class rectangle can have a name, attributes, and operations compartment.

Example - Temperature Sensor

Booch uses the example of a hydroponics greenhouse to illustrate the elements of the object model. Plants grow in a nutrient solution; all aspects of the greenhouse (temperature, humidity, nutrient content, light, pH, etc) must be carefully controlled. All of these conditions must be monitored and controlled to maximize the yield of the farm.

Since we need to control the various factors, it is clearly important to know what they are. If we had a requirement specification for this hydroponic farm, we would see the word "sensor" in many places.

Consider a sensor that measures the temperature at some location. A temperature is a number with a certain range, and known to a certain precision, given in a certain system of units.

A location is an identifiable, unique place on the farm.

What must a temperature sensor do for us? The requirement spec might look something like this:

"Sensors for temperature, pressure, and humidity will be located throughout the farm."
"Temperature sensors will have a unique location, be able to return the current temperature they are sensing, and be able to calibrate themselves to an actual temperature."

What data does our TempSensor class need to know?

the current temperature
where it lives (unique location)

What behavior does our TempSensor class need?

provide the temperature
calibrate itself

These actions are the contract established by our temperature sensor class for its clients.

How does this look in C++? First cut:

class TempSensor {
 void calibrate(float actualTemp);
 float currentTemp();
 float temp;
 int location;
};

This simple class should illustrate what we're studying in this aspect of the OO model, mainly the two aspects of implementation and interface that our abstractions support. Here's a second cut:

class TempSensor {
public:
 void calibrate(float actualTemp);
 float currentTemp() const;
private:
 float temp;
 int location;
};

The private part of the class is the set of variables and methods that the TempSensor class uses to perform its responsibilities (i.e. to be and act like a temperature sensor). These are the implementation of the abstraction, and are a private matter for the class.

The const keyword means that the function currentTemp won't modify the variables of the object. Const member functions can be applied to const objects, whereas other functions (without the const) cannot.
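Here is a small sketch of the const rule in action. It assumes a default constructor with an arbitrary starting temperature and an inline calibrate() that simply overwrites the reading; neither is given in the notes, and location is left out for brevity. readOnly is a hypothetical client function.

```cpp
#include <cassert>

class TempSensor {
public:
    TempSensor() : temp(20.0f) {}                            // assumed starting value
    void calibrate(float actualTemp) { temp = actualTemp; }  // modifies state
    float currentTemp() const { return temp; }               // promises not to
private:
    float temp;
};

// Through a const reference only const member functions may be called.
inline float readOnly(const TempSensor& s) {
    // s.calibrate(25.0f);   // would not compile: calibrate() is not const
    return s.currentTemp();  // fine: currentTemp() is const
}
```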

Should I even be talking about the type of variables we use to store the data at this point? No, since that's a private implementation detail.

All we have here is a definition for an abstraction that is a temperature sensor.

What other things might we know about the temperature sensors we employ in the greenhouse?

name of the sensor manufacturer, model number

We abstracted this detail away, even though we probably will know it, since it isn't germane to the problem. When might this be germane?

So far we also only have a passive, abstract thing (the class definition). This doesn't do any work. To do some work, we need to instantiate an actual, active, dynamic, living, breathing, TempSensor object:

TempSensor gh1Sensor(1);
TempSensor gh3Sensor (2);
float t = gh1Sensor.currentTemp();

Now we have two TempSensor objects that we can work with. The last line shows us asking the sensor in greenhouse 1 for the current temperature; it is an example of a message. Note the four parts.

How did our temperature sensor objects get their initial values? How did location get set? We passed an integer, but who consumed it?

Third cut:

class TempSensor {
public:
 TempSensor(int);
 void calibrate(float actualTemp);
 float currentTemp() const;
private:
 float temp;
 int location;
};

We added the declaration of a special function (or method), called a constructor, to our class so that we can initialize objects of that class. This method is invoked automatically every time an object of the TempSensor class is created.
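The class above only declares its methods. One possible set of definitions is sketched below, under assumptions: the temperature starts at 0 until calibrated, calibrate() simply overwrites the stored value, and locationId() is a hypothetical accessor added only so that we can see who consumed the integer.

```cpp
#include <cassert>

class TempSensor {
public:
    TempSensor(int loc);
    void calibrate(float actualTemp);
    float currentTemp() const;
    int locationId() const;   // hypothetical accessor, not in the notes
private:
    float temp;
    int location;
};

// The constructor consumes the integer and stores it as the location.
TempSensor::TempSensor(int loc) : temp(0.0f), location(loc) {}

// Assumed behavior: trust the supplied value as the true temperature.
void TempSensor::calibrate(float actualTemp) { temp = actualTemp; }

float TempSensor::currentTemp() const { return temp; }

int TempSensor::locationId() const { return location; }
```

With these definitions, TempSensor gh1Sensor(1); creates a sensor whose location is 1.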

What if we need to make our sensor smarter, more autonomous? Then it would be nice to have it be able to tell someone when things got too hot.

What's the classic, procedural means of doing this? Invoke a callback function, generate an interrupt, poll the sensor, etc.

OO way: send a message. All work is done this way. To send a message our object obviously needs the identity of who it should notify, and maybe some arguments (i.e. the temperature, or the sensor's location), since these describe the event that is happening.

class ActiveTempSensor {
public:
 ActiveTempSensor(int, Alert* helper);
 void calibrate(float actualTemp);
 void establishSetpoint(float setpoint, float delta);
 float currentTemp() const;
private:
 // internal details
};

Now our smart temp sensor can call someone when the temp set point is reached. What exactly is Alert* helper? In C++ (and ANSI C), Alert* means "a pointer to type Alert". helper is the name of the object we call for help, at least in terms of this function parameter. In OO terms we should think of this as the identity of some object who will be messaged when our sensor goes over set point.

Aside: what do we know (care) about this helper object?

Suppose the helper we give to a sensor is a complicated object like a Controller. This class has lots of other responsibilities, but one of them is to know what to do when a temperature sensor sends an alert message.

We could have written the ActiveTempSensor class to use a Controller, like this:

ActiveTempSensor(int, Controller* helper);

but we don't really care what the class of the helper object is, so long as it satisfies the Alert interface:

class Alert {
public:
 virtual ~Alert() {}
 // pure virtual, so that any class can supply its own response
 virtual void alert(int location) = 0;
};

All we really care about is that it will respond to our alert() message.

Consider testing the ActiveTempSensor class. If we had written ActiveTempSensor to use Controller directly, then we could not test an ActiveTempSensor without first constructing a Controller, which might be an expensive, complicated task.

By using the Alert interface, we can make a quick-and-dirty DummyController class just for testing purposes. Design for test often results in better design.
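The testing idea above can be sketched as follows. Several things here are assumptions: Alert is written with a virtual alert() so that a stand-in can respond, the sensor alerts when a reading exceeds setpoint + delta, calibrate() is treated as the only way a new reading arrives, and a sensor with no setpoint established never alerts. DummyController's counters are invented for the test.

```cpp
#include <cassert>

class Alert {
public:
    virtual ~Alert() {}
    virtual void alert(int location) = 0;
};

class ActiveTempSensor {
public:
    ActiveTempSensor(int loc, Alert* who)
        : location(loc), helper(who), temp(0.0f),
          setpoint(0.0f), delta(0.0f), armed(false) {}
    void establishSetpoint(float sp, float d) { setpoint = sp; delta = d; armed = true; }
    void calibrate(float actualTemp) {
        temp = actualTemp;
        // Assumed policy: message the helper when we go over set point.
        if (armed && helper && temp > setpoint + delta) helper->alert(location);
    }
    float currentTemp() const { return temp; }
private:
    int location;
    Alert* helper;     // identity of the object we message for help
    float temp;
    float setpoint;
    float delta;
    bool armed;        // false until a setpoint has been established
};

// Quick-and-dirty stand-in for a real Controller, for testing only.
class DummyController : public Alert {
public:
    DummyController() : alerts(0), lastLocation(-1) {}
    void alert(int location) override { ++alerts; lastLocation = location; }
    int alerts;
    int lastLocation;
};
```

The sensor never learns that it is talking to a dummy; it only knows that its helper responds to alert().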

Notice that the setpoint is not included in the constructor method. This means that our sensor may never have its setpoint set. What should it do? Should we allow this?

Example - Pet class

A Pet should know its name and age. Every Pet should know how to tell you its name and age, and how to set its name and age. Each Pet must keep track of its location and posture (sitting, standing, lying, etc). Every Pet should be able to "come", but it is expected that different types of Pets implement this behavior in their own way.

class Pet {
public:
 Pet(String);
 void setName(String);
 void setAge(int);
 String getName() const;
 int getAge() const;
 void come();
private:
 String name;
 int age;
 int location;
 int posture;
};

Now that you've got your Pet class definition, you put minimal, dummy code into the implementation, and you can begin instantiating and testing your classes.

#include "Pet.h"
#include <libc.h>
Pet::Pet(String n)
{
 name = n;
}
void Pet::setName(String n)
{
 name = n;
}
void Pet::setAge(int a)
{
 age = a;
}
String Pet::getName() const
{
 return name;
}
int Pet::getAge() const
{
 return age;
}
void Pet::come()
{
}
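The requirement that different types of Pets "come" in their own way is usually met in C++ with a virtual function. The sketch below is one way to do it, not the notes' own design: it uses std::string in place of String, adds a starting location to the constructor, and invents Dog, Cat, getLocation, setLocation and callPet. It also assumes the owner stands at location 0.

```cpp
#include <cassert>
#include <string>

class Pet {
public:
    Pet(const std::string& n, int start) : name(n), location(start) {}
    virtual ~Pet() {}
    std::string getName() const { return name; }
    int getLocation() const { return location; }
    // virtual: each kind of Pet may respond to "come" in its own way.
    virtual void come() {}
protected:
    void setLocation(int where) { location = where; }
private:
    std::string name;
    int location;
};

// Hypothetical subclass: a Dog comes all the way to the owner.
class Dog : public Pet {
public:
    Dog(const std::string& n, int start) : Pet(n, start) {}
    void come() override { setLocation(0); }
};

// Hypothetical subclass: a Cat only comes halfway.
class Cat : public Pet {
public:
    Cat(const std::string& n, int start) : Pet(n, start) {}
    void come() override { setLocation(getLocation() / 2); }
};

// Client code messages a Pet without knowing its concrete class.
inline int callPet(Pet& p) {
    p.come();
    return p.getLocation();
}
```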

Identity

Identity is the property of a thing which distinguishes it from all other objects. Humans have id numbers, fingerprints, DNA profiles. All these are representations of the fact that we are each unique and identifiable.

This element of the object model can be confused with state. State is the set of values that an object encapsulates. Two objects may have identical state, and yet are still separate, distinct, identifiable objects.
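In C++ the address of an object is a practical stand-in for its identity. The following sketch uses a hypothetical Reading structure and two invented helper functions to keep the two questions apart: "do these hold the same values?" and "are these the very same object?".

```cpp
#include <cassert>

// Hypothetical record, used only to illustrate state versus identity.
struct Reading {
    float temp;
    int location;
};

// Same state: every stored value matches.
inline bool sameState(const Reading& a, const Reading& b) {
    return a.temp == b.temp && a.location == b.location;
}

// Same identity: both names refer to one object in memory.
inline bool sameIdentity(const Reading& a, const Reading& b) {
    return &a == &b;
}
```

Two Reading objects filled with identical values satisfy sameState but not sameIdentity; an object and a reference to it satisfy both.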

Objects in an OO system have distinct identity. It is part and parcel of what makes them an object.

Objects

An analogy: an object is to its class, as a toaster is to the toaster factory. Objects are living, breathing, dynamic entities. They get work done. We create them, often at runtime.

Classes are (mostly) static. They define the capabilities or behavior of an object. They (mostly) don't get work done.

The idea of being able to recognize objects and to think about them independent of seeing them develops in humans at a very early age.

Objects were first introduced in Simula. Objects in an OOD may be abstract representations of real things, or useful abstractions for solving our problem which don't have a real-world analog. Booch defines them very generally as anything with a "crisply defined boundary".

There are three major things an object has that we'll learn to look for (discover) and, as needed, invent: State, Behavior, Identity.

What an object knows (state) and what it can do (behavior) are determined by its classification. Identity is required so that we may talk to an object without confusing it with another object, and so that we may have more than one object of a given class alive at the same time.

A synonym for object is instance.

State

This is what an object knows. It is a set of properties, or variables (usually static) which can take different values at different times in the object's life (dynamic).

Since objects have state, their reaction to messages can vary depending on when the messages are sent, and what order they are sent in.

For example, an Employee object should refuse to generate a paycheck for itself if the Employee object has a state variable the value of which makes that Employee inactive.
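A minimal sketch of the Employee example, with assumptions stated up front: the state variable is a simple boolean called active, pay is kept in whole cents, and "refusing" is modeled by returning 0. None of these details come from the notes.

```cpp
#include <cassert>

class Employee {
public:
    explicit Employee(int cents) : active(true), salaryCents(cents) {}
    void deactivate() { active = false; }
    void reactivate() { active = true; }
    // The same message gets a different answer depending on state.
    int generatePaycheck() const { return active ? salaryCents : 0; }
private:
    bool active;       // state variable that makes the Employee inactive
    int salaryCents;
};
```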

State variables of an object are sometimes referred to as instance variables, or ivars.

The class definition in C++ defines the state of an object, much like a struct does, but allows for a finer degree of control (encapsulation).

Behavior

This is the set of actions that an object knows how to take. Your Pet object knows a set of tricks (behavior) that you can ask it (message it) to perform.

C++ calls the "tricks" that an object knows member functions. A better name is methods.

In general, we can group the methods of a class into five categories:

Modifier: changes the state of the object
Selector: accesses, but does not change, the state
Iterator: operates across all parts of an object
Constructor: creates an object and initializes its state
Destructor: frees the state and destroys the object cleanly
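The five categories can be seen side by side in one small class. ReadingLog is a hypothetical fixed-capacity log of temperatures, invented for this sketch; forEach is one simple way to play the iterator role, not the only one.

```cpp
#include <cassert>

class ReadingLog {
public:
    // Constructor: creates the object and initializes its state.
    ReadingLog() : count(0), readings(new float[kCapacity]) {}
    // Destructor: frees the state and destroys the object cleanly.
    ~ReadingLog() { delete[] readings; }
    // Copying is disabled to keep the sketch short.
    ReadingLog(const ReadingLog&) = delete;
    ReadingLog& operator=(const ReadingLog&) = delete;

    // Modifier: changes the state. Returns false when the log is full.
    bool add(float t) {
        if (count == kCapacity) return false;
        readings[count++] = t;
        return true;
    }
    // Selector: accesses, but does not change, the state.
    int size() const { return count; }
    // Iterator: visits every stored reading in order.
    template <class Visitor>
    void forEach(Visitor visit) const {
        for (int i = 0; i < count; ++i) visit(readings[i]);
    }
private:
    static const int kCapacity = 16;
    int count;
    float* readings;
};
```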

The responsibilities that an object has to its clients are key to defining the class that an object belongs to. When we do our OOA/D we'll often recognize a need for some action and that need will create some state and behavior in a class. Objects of that class can then fulfill this need for us.

Objects can be viewed as finite state automata, or simply, as machines.

Objects can be either passive, or active. Passive objects do nothing on their own; they only execute when they are messaged by some other object. Active objects have their own thread of control and may take action while other objects are working. This rather complicated business is considered in the Concurrency aspect of the object model.

UML Representation

Objects are shown in rectangles, with their names underlined, and their type or class after their name. Anonymous objects have no name.

Example: C++ GUI Framework

(modified from Booch, page 92)

Confusing the whole issue of identity is the use of pointers in high level languages. Pointers confuse the difference between an object itself and the name of the object.

Imagine some sort of GUI environment where objects that can be displayed on the screen must be described. One of the attributes of an object is obviously its location. A location is a point in a 2d Cartesian space. How do we represent points? We could use a class, so that we could have point objects, but a point really doesn't do all that much, so instead we use a simpler structure.

Booch Rule of Thumb: Structure versus Class

If something is just a record of other primitive data types, and has no really interesting behavior of its own, make it a structure.

struct Point {
 int x;
 int y;
 Point() : x(0), y(0) {}
 Point(int xValue, int yValue) : x(xValue), y(yValue)
 {}
};

The Point structure lets us allocate points with initial values, or if not provided, will initialize the point to (0,0).

Now we need to represent a class for anything that can be displayed. We'll anticipate the need for many different types of things that are to be displayed, and make this class the top level of this particular part of the class hierarchy. DisplayItem is an abstract class designed to be subclassed to be useful.

class DisplayItem {
public:
 DisplayItem();
 DisplayItem(const Point& location);
 void draw(); 
 void erase();
 void select();
 void unselect();
 void move(const Point& location);
 int isSelected() const;
 int isUnder(const Point& location) const;
 Point location() const;
};

Reminders of what the C++ means...

Two constructors, one of which takes a Point which is the location of the item.

C++ lets us dynamically allocate objects of classes and structures with the new operator. The new operator, together with a class constructor, is like malloc in C.

Now look carefully at these declarations:

DisplayItem item1;
DisplayItem* item2 = new DisplayItem(Point(75, 75));
DisplayItem* item3 = new DisplayItem(Point(100, 100));
DisplayItem* item4 = NULL;

Here is a picture representing the code fragment above.

The first line creates an instance (object) of class DisplayItem. The name of this object is item1. No matter what values this DisplayItem takes on over the course of its lifetime, its name will always be item1. Do we know where it is stored? The initial location for item1 is determined by the default constructor (0,0). How much memory is allocated by the compiler?

The second and third lines allocate two things each. The first is a pointer. A pointer is simply a primitive data type that can be used to hold the address of another data element (structure, object, int, float, etc).

The second thing these lines of code do is to dynamically allocate (from the heap) enough memory to hold a DisplayItem object. Do we know the name of this object? We can't know the names of these objects, because they are dynamically allocated (which is why in the picture the names are lines). The objects don't exist until run time. Clearly we can't refer, by name, in our program to an object that doesn't exist until run time. Instead we talk to or refer to these objects indirectly, by the names of the pointers that refer to them.

The fourth line creates a pointer, but doesn't allocate an object for it to point to.

What we did above was to create four names (1 object name and 3 pointer names), three DisplayItem objects, and three pointers.

Now we'll operate on our data space. We need to know a little more C++ syntax first. The operator "." is the C structure access operator. In C++ it is used as the messaging syntax. The form is "object.method()". The operator "->" is the C structure access operator when you must refer to a structure indirectly, through its pointer. In C++, we use "->" to message an object through its pointer. The form is "object_pointer->method()".

Re-draw the above picture showing the changes the following code fragment makes.

item1.move( item2->location() );
item4 = item3;
item4->move( Point(38,100) );

The first thing to notice is that item1 and item2 (or rather, the object that item2 refers to) have the same state. Having the same state doesn't change their identity; they are still unique, singular things.

The next thing we've done is to have item4 refer to the same object as item3. We have created an alias for this third DisplayItem object. We can now operate on this object with either the name item3, or item4. The third line changed the state of the object using its new name.

Being able to create name aliases for the same object can cause many problems. One of them is the problem of a dangling pointer. Consider what happens if the object that item3 refers to is destroyed. Now item4 points to an object that no longer exists.
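The aliasing described above can be tried out with a cut-down DisplayItem. This is a sketch under assumptions: only the location-related methods are implemented, the private member where is invented, and aliasDemo is a hypothetical wrapper that runs the four declarations and the first fragment, then cleans up.

```cpp
#include <cassert>
#include <cstddef>

struct Point {
    int x;
    int y;
    Point() : x(0), y(0) {}
    Point(int xValue, int yValue) : x(xValue), y(yValue) {}
};

// Minimal stand-in for DisplayItem: location handling only.
class DisplayItem {
public:
    DisplayItem() {}
    DisplayItem(const Point& location) : where(location) {}
    void move(const Point& location) { where = location; }
    Point location() const { return where; }
private:
    Point where;
};

// Returns the location of the third object as seen through item3.
inline Point aliasDemo() {
    DisplayItem item1;
    DisplayItem* item2 = new DisplayItem(Point(75, 75));
    DisplayItem* item3 = new DisplayItem(Point(100, 100));
    DisplayItem* item4 = NULL;

    item1.move(item2->location());   // same state as *item2, separate object
    item4 = item3;                   // item4 is now an alias for *item3
    item4->move(Point(38, 100));     // a change made through the alias...

    Point seenThroughItem3 = item3->location();   // ...shows up via item3
    assert(item4 == item3);          // one object, two names
    assert(&item1 != item2);         // equal state, distinct identity

    delete item2;
    delete item3;                    // item4 dangles from here on; don't use it
    return seenThroughItem3;
}
```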

I asked you if we knew the address, or memory location, of item1 when we declared it. We didn't need to know it, since we had its name. There is an operator in C/C++ that lets us find out the address of any data element. The & operator does this.

Re-draw the picture showing the changes the following code fragment makes.

item1.move(Point(0,0));
item2 = &item1;
item4->move( item2->location());

First we put item1 back to the origin.

Then we changed the object that the pointer item2 refers to. Item1 now has a new name ("item2").

Finally, we move item4 to where item2 is. Is that the point (75,75), as the image might imply? No, item2 doesn't refer to that object any longer. The move results in item4 going to the origin.

We've also lost track of the DisplayItem object that item2 originally referred to. This object, and the memory it lives in, is permanently lost to us. There is no way to get it back. This is known as a memory leak.
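One way to avoid losing the object is to destroy it before the pointer is given a new target. The sketch below is illustrative only: Tracked is a hypothetical class that counts how many of its objects are alive, so we can check that nothing is left behind, and repointSafely is an invented function name.

```cpp
#include <cassert>

// Hypothetical instrumented class: counts the objects currently alive.
class Tracked {
public:
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
    static int live;
};
int Tracked::live = 0;

// Returns the number of live objects just before the function ends.
inline int repointSafely() {
    Tracked item1;
    Tracked* item2 = new Tracked;   // two objects alive now
    delete item2;                   // destroy the heap object first...
    item2 = &item1;                 // ...then give item1 its second name
    (void)item2;
    // Without the delete above, the heap object would still be counted
    // here and could never be reached again.
    return Tracked::live;
}
```

After the function returns, the count is back where it started, which is what "no leak" means for this toy class.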