It seems every guide on "What is OOP" has the word 'abstract' in the first three paragraphs. No-one seems to be able to describe what OOP actually is or how it works or what it does.
They list off all the object-related C++ keywords and their functions in standard jargon, and somewhere between private and static your average programmer has lost the plot.
This guide will go back to basics and explain OOP from the ground up, from a language that doesn't have objects.
For this guide you'll need some intermediate knowledge of programming in C and/or C++. If you understand structs, pointers, and the difference between the stack and the heap you're probably good.
What is +?
What is 2 + 2? You would probably say 4.
What is "Hello" + "World"? You would probably say "HelloWorld".
And what is the operation used in those two expressions? This is where most people will catch on that addition and concatenation are not the same operation.
But the really interesting thing is that despite most languages using the same symbol for concatenation and addition, the two operations are completely incompatible with each other.
You can't add strings. That's common sense. You can parse strings into numbers and then add them, but that's an operation in and of itself.
Similarly, you can't concatenate numbers. You can cast numbers to strings and concatenate them (and languages with a dedicated concatenation operator like Perl and PHP do exactly that), but the numbers themselves can't be concatenated.
After all, while the number 2 could be cast to "2", it could just as easily be cast to "10" in base 2, or to "AAAAAg==" as a 32 bit integer in base64. Just because we usually want base 10 and our languages cast them to base 10 by default doesn't mean the numbers themselves can be concatenated.
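To make that concrete, here is a minimal sketch of a number-to-string conversion in C. The function name uint_to_string and its signature are made up for this guide. The thing to notice is that the base has to be passed in, because the number itself doesn't come with one.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write `value` into `buf` using the given base (2..36).
 * Returns buf, or NULL if the base is unsupported or the buffer is too small. */
char *uint_to_string(unsigned value, unsigned base, char *buf, size_t size) {
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(unsigned) * CHAR_BIT]; /* enough digits even for base 2 */
    size_t n = 0;

    if (base < 2 || base > 36) return NULL;

    /* Collect digits, least significant first. */
    do {
        tmp[n++] = digits[value % base];
        value /= base;
    } while (value != 0);

    if (n + 1 > size) return NULL;

    /* Reverse into the caller's buffer and terminate the string. */
    for (size_t i = 0; i < n; i++) buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

Calling it with the value 2 and base 10 fills the buffer with "2", while the same value with base 2 gives "10". Same number, different strings, so "concatenating a number" is really "pick a representation, then concatenate that".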
This gets even worse when we talk about different types of numbers. The operation for adding two integers in the standard 2's complement our computers use can be summed up (pun intended) as "Just add them in binary and ignore the carry". Conveniently, this means the operation for subtracting is just addition with a negative number (x - y can be done bitwise as x + ~y + 1).
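Here's a small sketch of that identity in C. The name sub_via_add is just for illustration. It uses unsigned 32 bit integers because unsigned arithmetic in C is defined to wrap around, which is exactly the "ignore the carry" behavior.

```c
#include <stdint.h>

/* Subtract using only bitwise NOT and addition.
 * ~y + 1 is the two's complement negation of y, and unsigned
 * arithmetic wraps modulo 2^32, so any carry out is simply dropped. */
uint32_t sub_via_add(uint32_t x, uint32_t y) {
    uint32_t negated_y = (uint32_t) ~y + 1u;
    return (uint32_t) (x + negated_y);
}
```

sub_via_add(7, 5) gives 2, and sub_via_add(5, 7) gives the same bit pattern as -2 stored in 32 bits.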
But floating point numbers are a whole different beast. It's basically fixed size base-2 scientific notation, which means that adding them together involves a lot more logic: lining up the exponents, adding the mantissas, then renormalizing and rounding the result.
The point I'm trying to make here is that regardless of what name it goes by or what symbol you use, the type of operation you can perform on a piece of data depends entirely on the type of the data itself.
Put simply, OOP extends this to user-supplied data structures.
What is a method?
Since OOP works perfectly well as a design pattern in a language that has no objects, I'm going to write some C-style pseudocode with the same idea.
typedef enum {
HEARTS,
SPADES,
CLUBS,
DIAMONDS,
} card_suite;
typedef enum {
ACE,
TWO,
THREE,
// ...
QUEEN,
KING,
} card_facevalue;
typedef struct {
card_suite suite;
card_facevalue value;
} card;
Now that we've defined our card data structure, we need to be able to do something with it. We can interact with the struct's members individually but we can't use the card as an individual piece of data.
We can't add or concatenate it. We can't do anything with it at all, because we haven't created any operations for it. The standard way to add reusable code is through functions, so let's add some functions for our card.
char * card_getSuiteName(card * this);
char * card_getValueName(card * this);
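// Returns a newly allocated string like "Ace of Spades". The caller must free() it.
// Needs <stdio.h> for snprintf and <stdlib.h> for malloc.
char * card_getString(card * this) {
char * suite = card_getSuiteName(this);
char * value = card_getValueName(this);
// With a NULL buffer, snprintf only measures the formatted length
size_t len = snprintf(NULL, 0, "%s of %s", value, suite);
char * out = malloc(len + 1);
if (out == NULL) return NULL;
snprintf(out, len + 1, "%s of %s", value, suite);
return out;
}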
Here we have some functions that take a pointer to a card struct and return pointers to strings. As is common in C where namespaces aren't a thing, we're going to manually namespace them by prefixing the name of our library and/or data structure.
Data structures with functions designed to operate on them are called "classes" and those functions are called "methods" – and this is the first bit of real OOP lingo in this guide.
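The guide only declares card_getSuiteName, so here is one way its body could look. This is a sketch under a few assumptions: the card struct is cut down to just the suite member, the return type is const char * instead of char * because the strings are literals, and the "Unknown" fallback is my own addition.

```c
#include <stddef.h>

typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;

/* Reduced card struct: only the suite is needed for this sketch. */
typedef struct {
    card_suite suite;
} card;

/* A method in the C sense: a plain function that takes the instance
 * as its first parameter and works on that instance's data. */
const char *card_getSuiteName(const card *this) {
    /* Lookup table indexed by the enum value. */
    static const char *const names[] = { "Hearts", "Spades", "Clubs", "Diamonds" };
    size_t count = sizeof names / sizeof names[0];

    if (this == NULL || (size_t) this->suite >= count) return "Unknown";
    return names[this->suite];
}
```

card_getValueName could follow the same pattern with a table of value names.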
What is this?
You'll notice that these methods take a pointer to the card called this. In C this is not a keyword, it's just a parameter name. We could just as easily call it self as in Python or Rust. It's a convention to have a pointer named this as the first parameter that allows the method to operate on the value.
Through this you can see how the following methods would be implemented. You would just alter the members of the class via the this pointer.
void card_setSuite(card * this, card_suite suite);
void card_setValue(card * this, card_facevalue value);
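For completeness, the bodies of those two setters could be as small as this. It's a sketch that repeats the type definitions so it stands on its own, and it keeps the enum elided the same way the guide does.

```c
typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;
typedef enum { ACE, TWO, THREE, /* ... */ QUEEN, KING } card_facevalue;

typedef struct {
    card_suite suite;
    card_facevalue value;
} card;

/* Each setter writes to the instance it was handed through `this`. */
void card_setSuite(card *this, card_suite suite) {
    this->suite = suite;
}

void card_setValue(card *this, card_facevalue value) {
    this->value = value;
}
```

Because the methods get a pointer rather than a copy, the change is visible to whoever owns the card after the call returns.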
Now let's detour into C++ for a minute. C++ handles most of the OOP boilerplate for you. This obfuscates the internals and probably contributes to the terrible OOP guides I complained about in the first paragraph.
But it's also true that most OOP languages take cues from C++ syntax the way most C-based languages use curly braces, so it's important to know.
Our class will look something like this in C++:
class card {
public:
card_suite suite;
card_facevalue value;
const char * getSuiteName();
const char * getValueName();
char * getString() {
const char * suite = this->getSuiteName();
const char * value = this->getValueName();
size_t len = snprintf(NULL, 0, "%s of %s", value, suite);
char * out = (char *) malloc(len + 1);
snprintf(out, len + 1, "%s of %s", value, suite);
return out;
}
void setSuite(card_suite suite);
void setValue(card_facevalue value);
};
There are a few benefits to C++ that should be immediately obvious. First off, C++ will create and pass your this pointer automatically.
Secondly, since the methods are syntactically called from the instance directly, we no longer need to namespace them. That means a lot less typing for us.
What is an instance?
We've defined a card class and a few operations we can perform on it (methods) but how do we actually make a card?
card ace_o_spades = {
SPADES,
ACE,
};
So far making a card is pretty easy. In this example ace_o_spades is an instance of the class card.
But what if we wanted our card class to also store a dynamically generated PNG of a card face? The process would be a lot more complicated.
What is a static method?
In order to handle this we'll create a static method. A static method is any method in a class that doesn't have a this pointer to an instance.
card create_a_card(card_suite suite, card_facevalue value) {
card output;
output.suite = suite;
output.value = value;
// Allocate our PNG here
return output;
}
As you can see the card doesn't exist when this method is called, so there's no pointer to pass in as this. Now if we want to make a card we call the create_a_card static method. This way if we need to malloc for a generated PNG or something we can put it in this static method and it will be handled.
But if we allocate memory when we create a card, we'll need to deallocate it when we're done with the card. We can't just let the card fall off the stack, that would cause a memory leak!
So we add another method to deallocate stuff, and then call it whenever we're done with the card:
void destroy_a_card(card * this) {
// Deallocate our PNG here
}
This way if we decide to allocate more stuff in create_a_card we only have to change destroy_a_card, and as long as it's called everywhere everything should work fine. We've essentially made a more complicated version of malloc and free specially for our class.
card ace_o_spades = create_a_card(SPADES, ACE);
destroy_a_card(&ace_o_spades);
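Here's a fuller sketch of that pair with the allocation actually filled in. The png and png_size members and the 1024 byte placeholder size are assumptions made for illustration, since the guide never says what the PNG data looks like.

```c
#include <stdlib.h>

typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;
typedef enum { ACE, TWO, THREE, /* ... */ QUEEN, KING } card_facevalue;

typedef struct {
    card_suite suite;
    card_facevalue value;
    unsigned char *png;  /* hypothetical: buffer owned by the card */
    size_t png_size;     /* hypothetical: size of that buffer in bytes */
} card;

/* Static method: there is no card yet, so there is no `this` to pass in. */
card create_a_card(card_suite suite, card_facevalue value) {
    card output;
    output.suite = suite;
    output.value = value;
    output.png_size = 1024;                  /* placeholder size */
    output.png = calloc(output.png_size, 1); /* stand-in for generating the image */
    if (output.png == NULL) output.png_size = 0;
    return output;
}

/* Releases everything create_a_card allocated. */
void destroy_a_card(card *this) {
    free(this->png);
    this->png = NULL; /* makes an accidental second call harmless */
    this->png_size = 0;
}
```

Note that destroy_a_card takes a pointer, so it's called with the address of the card, as in destroy_a_card(&ace_o_spades).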
Now if we're in C++ we don't have to add the this pointer to methods anyway. Since static methods are the exception, we add the static keyword to the create_a_card method so C++ knows that it's different.
class card {
public:
card_suite suite;
card_facevalue value;
static card create(card_suite suite, card_facevalue value) {
card output;
output.suite = suite;
output.value = value;
// Allocate our PNG here
return output;
}
void destroy() {
// Deallocate our PNG here
}
};
We can then call these methods like this:
card ace_o_spades = card::create(SPADES, ACE);
ace_o_spades.destroy();
Note the different syntax for calling static methods in C++.
What is a constructor?
Well you just made one actually. A constructor is a method called when you make a new instance, and a destructor is called when you destroy an instance. But unlike create_a_card and destroy_a_card, a more common naming convention is card_new and card_destroy.
card card_new(card_suite suite, card_facevalue value) {
card output;
output.suite = suite;
output.value = value;
// Allocate our PNG here
return output;
}
void card_destroy(card * this) {
// Deallocate our PNG here
}
Why did I have you do all that with the wrong convention? Because I wanted to show you how static methods work, and in particular how they're called in C++. You see in C++ the constructor and destructor are handled automatically by the language.
class card {
public:
// ...
card(card_suite suite_input, card_facevalue value_input) {
this->suite = suite_input;
this->value = value_input;
// Allocate our PNG here
}
~card() {
// Deallocate our PNG here
}
};
The constructor in C++ is named the same as the class, and the destructor is the same but prefixed with '~'. You don't have to define them if you don't want to, the language will create "empty" constructors and destructors if you leave them out.
You can instantiate the class like this:
card ace_o_spades(SPADES, ACE);
card * king_o_hearts = new card(HEARTS, KING);
delete king_o_hearts;
Notably, in C++ the constructor is not a static method. While it's called without a pre-existing instance, the allocation of the instance is handled by the language before the constructor runs.
Since the structure is already allocated in the constructor, you can use this to assign to members and call methods. You also don't have to return the instance in the constructor, because the language handles it for you.
If it's constructed like ace_o_spades here, it's allocated on the stack and the destructor will be called when it goes out of scope. The new keyword instead causes a dynamic allocation like malloc, and delete is the C++ OOP version of free.
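To tie this back to C, here's a rough sketch of what new and delete boil down to: allocate, then construct, and later destruct, then free. All four function names are made up for this example, and it's a simplification of what a real C++ compiler emits.

```c
#include <stdlib.h>

typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;
typedef enum { ACE, TWO, THREE, /* ... */ QUEEN, KING } card_facevalue;

typedef struct {
    card_suite suite;
    card_facevalue value;
} card;

/* Hypothetical in-place constructor: the memory already exists,
 * so `this` is valid and nothing needs to be returned. */
void card_init(card *this, card_suite suite, card_facevalue value) {
    this->suite = suite;
    this->value = value;
}

/* Hypothetical destructor: nothing to release in this reduced struct. */
void card_deinit(card *this) {
    (void) this;
}

/* Roughly what `new card(suite, value)` does. */
card *card_heap_new(card_suite suite, card_facevalue value) {
    card *this = malloc(sizeof *this);
    if (this == NULL) return NULL; /* C++ `new` would throw instead */
    card_init(this, suite, value);
    return this;
}

/* Roughly what `delete ptr` does. */
void card_heap_delete(card *this) {
    if (this == NULL) return; /* deleting a null pointer is a no-op in C++ too */
    card_deinit(this);
    free(this);
}
```

The stack case is the same minus the malloc and free: the compiler reserves space in the current stack frame, calls the constructor on it, and calls the destructor at the end of the scope.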
What is inheritance?
So far it's all fine and dandy if we have a 52 card deck, but lots of card decks come with jokers. Jokers have different behavior from normal cards. Most notably they have no suit and getString shouldn't return $value of $suite.
Inheritance allows you to reuse large swaths of code. While it's not strictly required for code to be OOP, it's widely considered the "killer feature" of OOP and allows for polymorphism (Code that works even on data types it doesn't know about!)
Let's implement a card54 subclass of card in C that adds a flag for whether the card is a joker, and let's do it in a way that reuses our old code:
typedef struct {
card _parent;
char joker;
} card54;
card54 card54_new(card_suite suite, card_facevalue value, char is_joker) {
card54 output = {
card_new(suite, value),
is_joker,
};
return output;
}
void card54_destroy(card54 * this) {
card_destroy((card *) this);
}
char * card54_getString(card54 * this) {
if (this->joker) {
char * out = malloc(6);
strcpy(out, "Joker");
return out;
} else {
return card_getString((card *) this);
}
}
char card54_isJoker(card54 * this) {
return this->joker;
}
Let's look at the structure first. The first element in the subclass is the structure of the _parent class card. The reason for this is found in card54_getString: If the parent class is at the start of the subclass then the parent and sub class have the same pointer!
That means that when you want a member from a parent class (of any arbitrary depth) you can just cast the pointer to that type and access the member as normal.
So if we have a card54 that isn't a joker and we call card54_getString then that will cast this to a card * and call card_getString with it, and that's how we can reuse methods from a parent class that doesn't need to know the children exist.
We make sure to call the parent constructor in our constructor, and the parent destructor in our destructor.
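The "same pointer" claim is easy to check for yourself. Here's a small sketch with the card struct cut down to one member. The helper names card_getSuite and card54_asCard are made up for the example.

```c
typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;

/* Reduced parent class. */
typedef struct {
    card_suite suite;
} card;

/* Subclass: the parent struct is the very first member. */
typedef struct {
    card _parent;
    char joker;
} card54;

/* Code that only knows about `card`. */
card_suite card_getSuite(const card *this) {
    return this->suite;
}

/* Upcast: C guarantees a pointer to a struct also points at its first
 * member, so this cast yields the address of _parent. */
card *card54_asCard(card54 *this) {
    return (card *) this;
}
```

Given a card54 c, the pointer card54_asCard(&c) compares equal to &c._parent, and card_getSuite happily reads the suite through it without knowing card54 exists.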
There are some problems with this though. First off, while we can call both card54_getString and card_getString on our card, card_getString will give us bad data if we have a joker, since it doesn't check for the joker field which doesn't exist on the card class to begin with.
Secondly, we can't stop another card_ method from calling card_getString assuming it will work. Ideally we would automatically have a call to card_getString go to card54_getString if this was a card54, but we have no way of knowing that it is.
void card_printString(card * this) {
char * name = card_getString(this);
puts(name);
free(name);
}
We can solve this issue using function pointers:
typedef struct card card;
typedef struct card {
card_suite suite;
card_facevalue value;
char * (*getString)(card *);
} card;
char * card_getString(card * this);
card card_new(card_suite suite, card_facevalue value) {
card output = {
suite,
value,
&card_getString,
};
return output;
}
void card_printString(card * this) {
char * name = this->getString(this);
puts(name);
free(name);
}
char * card54_getString(card54 * this);
card54 card54_new(card_suite suite, card_facevalue value, char is_joker) {
card54 output = {
card_new(suite, value),
is_joker,
};
((card *) &output)->getString = (char * (*)(card *)) &card54_getString;
return output;
}
Some notes about this code:
- This is why we use header files:
  - You have to declare the card typedef ahead of time since the function pointer inside the struct takes a pointer to the struct as an argument
  - You have to declare card_getString before card_new
  - You have to declare card54_getString before card54_new
- We have to either typecast the function pointer in card54_new, or just use void * everywhere
Now we've solved the original problem. If we call the getString function via the function pointer in the struct then we can call card54 code from card code that doesn't know card54 even exists!
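Here's a cut-down version you can compile in one piece to watch the dispatch happen. To keep it short it swaps getString for a hypothetical getKind method that returns a string literal (so there's nothing to free), and the is_face_up member is just a stand-in for real data. It also sidesteps the function pointer cast: the subclass method takes a card * and casts it back to card54 * inside.

```c
typedef struct card card;

struct card {
    int is_face_up;                          /* hypothetical stand-in member */
    const char *(*getKind)(const card *);    /* per-instance method pointer */
};

typedef struct {
    card _parent;
    char joker;
} card54;

const char *card_getKind(const card *this) {
    (void) this;
    return "card";
}

/* Takes a card * so it fits the function pointer type without a cast. */
const char *card54_getKind(const card *this) {
    const card54 *self = (const card54 *) this;
    return self->joker ? "joker" : "card";
}

/* Written purely against `card`: it has no idea card54 exists. */
const char *card_describe(const card *this) {
    return this->getKind(this);
}

card card_new(void) {
    card out = { 0, card_getKind };
    return out;
}

card54 card54_new(char is_joker) {
    card54 out = { card_new(), is_joker };
    out._parent.getKind = card54_getKind; /* override the parent's method */
    return out;
}
```

Passing a joker built with card54_new(1) to card_describe (as a card *) returns "joker", even though card_describe was written without any knowledge of jokers.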
But this has caused some new problems…
- We have to do this for all our non-static methods
- So now our struct just went from 8 bytes to 40 bytes in size (on a typical 64 bit platform)
- And we have to assign the pointers on every construct
typedef struct card {
card_suite suite;
card_facevalue value;
void (*destroy)(card *);
char * (*getSuiteName)(card *);
char * (*getValueName)(card *);
char * (*getString)(card *);
} card;
char * card_getString(card * this);
void card_destroy(card * this);
char * card_getSuiteName(card * this);
char * card_getValueName(card * this);
card card_new(card_suite suite, card_facevalue value) {
card output = {
suite,
value,
&card_destroy,
&card_getSuiteName,
&card_getValueName,
&card_getString,
};
// Allocate our PNG here
return output;
}
void card_printString(card * this) {
char * name = this->getString(this);
puts(name);
free(name);
}
void card54_printString(card54 * this) {
char * name = ((card *) this)->getString((card *) this);
puts(name);
free(name);
}
This is a nightmare for performance, maintenance, and more. The solution is vtables.
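If you want to see the memory cost for yourself, sizeof will tell you. This sketch compares the original two-member struct with the version carrying four function pointers. The names plain_card, fat_card and fat_card_overhead are made up so both layouts can live side by side.

```c
#include <stddef.h>

typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;
typedef enum { ACE, TWO, THREE, /* ... */ QUEEN, KING } card_facevalue;

/* The data we actually care about. */
typedef struct {
    card_suite suite;
    card_facevalue value;
} plain_card;

/* The same data plus one function pointer per method. */
typedef struct fat_card fat_card;
struct fat_card {
    card_suite suite;
    card_facevalue value;
    void (*destroy)(fat_card *);
    char *(*getSuiteName)(fat_card *);
    char *(*getValueName)(fat_card *);
    char *(*getString)(fat_card *);
};

/* Extra bytes every single instance pays for its method pointers. */
size_t fat_card_overhead(void) {
    return sizeof(fat_card) - sizeof(plain_card);
}
```

On a typical 64 bit platform I'd expect the overhead to come out at 32 bytes (four 8 byte pointers), but the exact numbers depend on your compiler and target. Multiply that by a full deck and by every method you add later, and it adds up.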
What is a vtable?
We only need a separate list of function pointers for each class, not for each instance. In other words, we can make a global static struct with the pointers for this class, and stick the pointer to that structure in our class.
And it's called a vtable.
typedef struct card card;
typedef const struct {
void (*destroy)(card *);
char * (*getSuiteName)(card *);
char * (*getValueName)(card *);
char * (*getString)(card *);
} card_vtable;
char * card_getString(card * this);
void card_destroy(card * this);
char * card_getSuiteName(card * this);
char * card_getValueName(card * this);
static card_vtable card_vtable_concrete = {
&card_destroy,
&card_getSuiteName,
&card_getValueName,
&card_getString,
};
typedef struct card {
union {
card_vtable * vtable;
} _;
card_suite suite;
card_facevalue value;
} card;
card card_new(card_suite suite, card_facevalue value) {
card output = {
&card_vtable_concrete,
suite,
value,
};
// Allocate our PNG here
return output;
}
Here we make a static const struct card_vtable called card_vtable_concrete with pointers to our methods, and assign a pointer to it inside our instance. Now we can call card_getString via this->_.vtable->getString, and if we point to a different vtable with a different function pointer then the function that's called will also change.
Yeah it means an extra pointer dereference when we need to call a method, and a lot of boilerplate, but it solves our problem. We only need a single pointer worth of memory per instance to make all our methods work, saving lots of memory.
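Here's a stripped-down sketch of the idea with a single hypothetical getKind method and no union, just to show the two properties that matter: every instance of a class shares one table, and swapping the table pointer swaps the behavior. None of these names come from the code above except card_vtable.

```c
typedef struct card card;

/* One table of method pointers per class. */
typedef struct {
    const char *(*getKind)(const card *);
} card_vtable;

/* Each instance only carries a pointer to its class's table. */
struct card {
    const card_vtable *vtable;
};

static const char *plain_getKind(const card *this) {
    (void) this;
    return "card";
}

static const char *joker_getKind(const card *this) {
    (void) this;
    return "joker";
}

/* The tables exist exactly once, no matter how many cards are made. */
static const card_vtable plain_vtable = { plain_getKind };
static const card_vtable joker_vtable = { joker_getKind };

card card_new(void) {
    card out = { &plain_vtable };
    return out;
}

card joker_new(void) {
    card out = { &joker_vtable };
    return out;
}

/* Dispatch: follow the instance's vtable pointer, then call through it. */
const char *card_describe(const card *this) {
    return this->vtable->getKind(this);
}
```

Two cards from card_new share the same vtable pointer, while a card from joker_new points somewhere else and so card_describe returns "joker" for it.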
We use a union for the benefit of child classes like card54:
typedef struct card54 card54;
typedef const struct {
card_vtable _parent;
void (*someCustomThing)(card54 *);
} card54_vtable;
char * card54_getString(card54 * this);
void someCustomThing(card54 * this) {}
static card54_vtable card54_vtable_concrete = {
{
&card_destroy,
&card_getSuiteName,
&card_getValueName,
(char * (*)(card *)) &card54_getString,
},
&someCustomThing,
};
typedef struct card54 {
union {
card54_vtable * vtable;
card parent;
} _;
char joker;
} card54;
card54 card54_new(card_suite suite, card_facevalue value, char is_joker) {
card54 output;
output._.parent = card_new(suite, value);
output._.vtable = &card54_vtable_concrete;
output.joker = is_joker;
return output;
}
void card54_printString(card54 * this) {
char * name = ((card_vtable *) this->_.vtable)->getString((card *) this);
puts(name);
free(name);
}
As you can see we're reusing the same techniques from our earlier adventures:
- The card54_vtable has the card_vtable at the start so that the offsets are the same
- By putting the pointer to the vtable in the class as the first element, it too is always at the same location
- We can then access parent struct members and the vtable pointer through the union
- And we assign the vtable pointer in the constructor
Unfortunately, try as I might I can't get it to compile by "dynamically" generating card54_vtable_concrete based on card_vtable_concrete. The compiler wants a constant expression and refuses to even consider struct assignment as constant, so you have to write out the full vtable of the parent class as well.
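One possible way around that limitation, if you're willing to give up const on the child vtable, is to build it at run time: copy the parent's table, then overwrite the slots you want to override. This is only a sketch of that alternative. The getKind and getRank methods and the card54_vtable_init and card_vtable_get functions are hypothetical and cut down from the real thing.

```c
typedef struct card card;

typedef struct {
    const char *(*getKind)(const card *);
    int (*getRank)(const card *);
} card_vtable;

static const char *card_getKind(const card *this) { (void) this; return "card"; }
static int card_getRank(const card *this) { (void) this; return 1; }
static const char *card54_getKind(const card *this) { (void) this; return "card54"; }

/* The parent table can still be a compile-time constant. */
static const card_vtable card_vtable_concrete = { card_getKind, card_getRank };

/* Not const, because it gets filled in at run time. */
static card_vtable card54_vtable_concrete;

/* Accessor so outside code can compare against the parent table. */
const card_vtable *card_vtable_get(void) {
    return &card_vtable_concrete;
}

/* Hypothetical one-time setup: call before creating any card54. */
const card_vtable *card54_vtable_init(void) {
    card54_vtable_concrete = card_vtable_concrete;   /* inherit every slot */
    card54_vtable_concrete.getKind = card54_getKind; /* override just one */
    return &card54_vtable_concrete;
}
```

The trade-off is that the table is now writable and somebody has to remember to call the init function first, which is exactly the kind of bookkeeping a compiler does better than a person.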
In the end that's a LOT of boilerplate. C just isn't designed for this kind of thing.
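C++ is designed for it. C++ does all of this for you: the pointer type casting, the vtables, everything is handled under the hood. The one thing you have to do yourself is mark the methods that subclasses may replace as virtual, because only virtual methods are dispatched through the vtable.
class card {
public:
card_suite suite;
card_facevalue value;
card(card_suite suite_input, card_facevalue value_input) {
this->suite = suite_input;
this->value = value_input;
// Allocate our PNG here
}
// virtual, so destroying a subclass through a card * runs the right destructor
virtual ~card() {
// Deallocate our PNG here
}
// virtual, so calls go through the vtable and subclasses can override it
virtual char * getString();
const char * getSuiteName();
const char * getValueName();
};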
class card54: public card {
public:
char joker;
card54(card_suite suite_input, card_facevalue value_input, char is_joker): card(suite_input, value_input) {
this->joker = is_joker;
}
void someCustomThing();
char * getString() {
if (this->joker) {
char * out = (char *) malloc(6);
strcpy(out, "Joker");
return out;
} else {
return card::getString();
}
}
void printString() {
char * name = this->getString();
puts(name);
free(name);
}
};
Note that C++ uses syntax like card::getString to explicitly call parent methods from subclasses, in the same way C calls the method directly. If you called this->getString() you'd just end up recursing.
Also note that the card constructor automatically runs before the card54 constructor, but if it has arguments you need to tell the language how to call it. The destructor in card54 in this example can be left out entirely. C++ will handle it for you.
It should be obvious now why C++ became so popular for this style of programming.
What is visibility?
If you know a bit of C you'll understand the difference between header and source files. The header files contain declarations and some definitions needed for your library to be used from outside code, while the source code is compiled to a shared object file (or a DLL on the dreaded Windows).
This lets C do something clever. It lets you write code in your source file that's only used internally, and not publicly available for others to use because the function signatures aren't in the header files.
This is called encapsulation, and it's key to presenting a pleasant interface to external code. No-one needs to know the million edge case related functions in an XML library, they just want to parse the file.
We would say that the functions declared in the header file are "visible" to outside code, and functions declared in the source files are hidden.
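Here's a single-file sketch of that split, with comments marking what would go where. The function names are made up for the example. One detail worth knowing: leaving a function out of the header only hides it by convention, while marking it static actually stops other source files from linking against it.

```c
/* ---- What would go in the header: the public interface ---- */

int deck_count_cards(int with_jokers);

/* ---- What would stay in the source file ---- */

/* `static` gives this helper internal linkage, so code in other
 * source files cannot call it even if it guesses the name. */
static int joker_count(int with_jokers) {
    return with_jokers ? 2 : 0;
}

/* The visible function is free to use the hidden helper. */
int deck_count_cards(int with_jokers) {
    return 52 + joker_count(with_jokers);
}
```

Outside code sees one function and gets 52 or 54 back. How the jokers are counted is nobody's business but the library's.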
In C++ visibility becomes a bit more complex because of inheritance. You may frequently have "internal" code that you want to be available to subclasses, but not to be callable from outside.
So we're left with 3 different visibility modes that you'll find in most programming languages: public, private, and the new protected, which can be accessed only from the class itself and its subclasses.
Additionally, C++ lets you mark class members as public, private, or protected, which can stop people changing specific members without your knowledge. Having private members that can only be accessed by methods is a standard pattern called "getters & setters".
class card {
protected:
card_suite suite;
card_facevalue value;
public:
void setSuite(card_suite s) {
this->suite = s;
}
void setValue(card_facevalue v) {
this->value = v;
}
card_suite getSuite() {
return this->suite;
}
card_facevalue getValue() {
return this->value;
}
};
In this example subclasses can access the suite and value members directly, that is they are "visible" to subclasses. But they are invisible to outside code, so it has to use the setSuite/setValue/getSuite/getValue methods to reach them. You can then add custom behavior to the getters and setters.
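As an example of that custom behavior, here's a sketch in C of a setter that refuses values outside the enum. The bool return value is my own choice for reporting failure. Keep in mind that plain C can't stop outside code from poking the member directly, so this relies on callers sticking to the methods.

```c
#include <stdbool.h>

typedef enum { HEARTS, SPADES, CLUBS, DIAMONDS } card_suite;

/* Reduced card struct: only the suite is needed for this sketch. */
typedef struct {
    card_suite suite;
} card;

/* Setter with validation: only accepts values that name a real suite.
 * Returns true if the card was updated, false if the value was rejected. */
bool card_setSuite(card *this, card_suite s) {
    if ((int) s < (int) HEARTS || (int) s > (int) DIAMONDS) return false;
    this->suite = s;
    return true;
}

card_suite card_getSuite(const card *this) {
    return this->suite;
}
```

A rejected call leaves the card exactly as it was, which is the whole point of routing writes through a setter.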
What is virtual? What is abstract? Operators? Overloaders? Multiple inheritance? The diamond problem?
Well now we're getting well beyond the scope of a basic guide, but lucky for you there are a lot of other OOP guides that seem to start off here. Good luck!