C. Keith Ray

C. Keith Ray writes about and develops software in multiple platforms and languages, including iOS® and Macintosh®.

Monday, January 27, 2014

Some thoughts on C, OO, ObjC, C++

C allows structs to be copied "by value", but it doesn't allow arrays to be copied directly. The following code will not compile:

#include <stdio.h>

typedef int FiveIntsArray[5];

// error: a function cannot return an array type
FiveIntsArray function5ints( FiveIntsArray arrIn )
{
    FiveIntsArray ret;
    ret = arrIn; // error: an array is not assignable
    return ret;
}

int main(int argc,char*argv[])
{
    FiveIntsArray arr = {1,2,3,4,5};
    FiveIntsArray bar;
    bar = function5ints(arr); // error: an array is not assignable
    printf("%d, %d, %d, %d, %d\n", bar[0], bar[1], bar[2], bar[3], bar[4]);
}

Yes, it does not compile: a function cannot return an array type, and an array cannot be the target of an assignment.

But put all this inside a struct, and it all works!

#include <stdio.h>

typedef struct FiveIntsStruct {
    int arr[5];
} FiveIntsStruct;

FiveIntsStruct function5ints( FiveIntsStruct arrIn )
{
    FiveIntsStruct ret;
    ret = arrIn;
    return ret;
}

int main(int argc,char*argv[])
{
    FiveIntsStruct arr = {{1,2,3,4,5}};
    FiveIntsStruct bar;
    bar = function5ints(arr);
    printf("%d, %d, %d, %d, %d\n", bar.arr[0], bar.arr[1], bar.arr[2], bar.arr[3], bar.arr[4]);
}

And I get this output:

1, 2, 3, 4, 5
Program ended with exit code: 0

Ta-da!

However, struct-copying, at least in the early days of C, was thought to be not very efficient, and was often avoided in C code. This tradition is so ingrained in C programmers that I have met some programmers who didn't know that struct-copying is allowed in C.

For small structs, and under modern compilers, struct-copying can be about as efficient as passing a single value, and sometimes more efficient than passing a pointer to a struct, which requires dereferencing the pointer to access its members. With a 4-byte int, a FiveIntsStruct is only 20 bytes, so a 64-bit CPU can copy it with a few register-sized moves.

If you want to do object-oriented programming in C, one way to do that is to start with structs. We can associate functions with structs, and we already have an example of something object-like in the C standard library: the file I/O functions.

typedef struct FILE { 
    // we don't care what's in here--consider it private.
} FILE;

FILE * fopen(const char *restrict filename, const char *restrict mode);
// fopen allocates & initializes, kind of like a C++ constructor.

int fprintf(FILE * restrict stream, const char * restrict format, ...);
// functions taking a FILE* argument are methods of the FILE "class".

int fclose(FILE *stream);
// cleans up and deallocates, kind of like a C++ destructor.

But a big part of OO is polymorphism. How can we accomplish that? Let's say we want a file-writer, and a network-writer, but the majority of the code should work with both kinds of writer. We can do it a bit like this:

struct Writer;

typedef int (* PrintFunc)(struct Writer * w, char const * format, ...);
// pointer to function that returns int
// and has Writer* as the type of its first argument.

typedef void (* CloseWriterFunc)(struct Writer * w);
// close and de-allocate

typedef struct Writer {
    PrintFunc Print;
    // ...other function pointers
    CloseWriterFunc Close;
} Writer;

Writer * NewFileWriter(const char *restrict filename);
Writer * NewNetworkWriter(const char *restrict serverName);

int main(int argc, char*argv[])
{
    Writer * w;
    int useFile = 1;
    
    if ( useFile )
        w = NewFileWriter("someFile.txt");
    else
        w = NewNetworkWriter("someServer");

    w->Print(w, "etc.");
    w->Close(w);
    w = NULL;
}

Implementing this in one or more .c files:

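#include <stdlib.h> // malloc, free

typedef struct {
    Writer wpart; // must be the first member, so a Writer* can be cast to a FileWriter*
    // file-writer specific data members go here.
} FileWriter;

typedef struct {
    Writer wpart; // must be the first member, so a Writer* can be cast to a NetworkWriter*
    // network-writer specific data members go here.
} NetworkWriter;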

// must match function-pointers...

static int FileWriterPrint(Writer * w, char const * format, ...)
{
    FileWriter * fw = (FileWriter *) w;
    // real code goes here.
    return 0;
}

static void FileWriterClose(struct Writer * w)
{
    FileWriter * fw = (FileWriter *) w;
    // real code goes here.
    free(fw);
}

static int NetworkWriterPrint(Writer * w, char const * format, ...)
{
    NetworkWriter * nw = (NetworkWriter *) w;
    // real code goes here.
    return 0;
}

// other NetworkWriter functions...

static void NetworkWriterClose(struct Writer * w)
{
    NetworkWriter * nw = (NetworkWriter *) w;
    // real code goes here.
    free(nw);
}

Writer * NewFileWriter(const char *restrict filename)
{
    FileWriter * fw = malloc( sizeof(FileWriter) );
    if ( fw == NULL )
        return NULL; // allocation failed
    fw->wpart.Print = FileWriterPrint;
    fw->wpart.Close = FileWriterClose;
    // real code goes here.
    return (Writer *) fw;
}

Writer * NewNetworkWriter(const char *restrict serverName)
{
    NetworkWriter * nw = malloc( sizeof(NetworkWriter) );
    if ( nw == NULL )
        return NULL; // allocation failed
    nw->wpart.Print = NetworkWriterPrint;
    nw->wpart.Close = NetworkWriterClose;
    // real code goes here.
    return (Writer *) nw;
}

The syntax isn't that nice, but a preprocessor could automate much of this.

And, in fact, both Objective-C and C++ started out as preprocessors generating C code.

There is one problem with this implementation besides its awkward syntax. Each instance of this struct contains its own copy of every function-pointer. This overhead can be reduced to a single pointer per instance by adding another level of indirection.

// there will be one instance of this struct per "class"
typedef struct WriterFunctions {
    PrintFunc Print;
    // ...other function pointers
    CloseWriterFunc Close; // close and de-allocate
} WriterFunctions;

typedef struct Writer {
    // writer data
    WriterFunctions* methods;
} Writer;

static WriterFunctions FileWriterFunctions = {
    FileWriterPrint, FileWriterClose
};

static WriterFunctions NetworkWriterFunctions = {
    NetworkWriterPrint, NetworkWriterClose
};

Writer * NewFileWriter(const char *restrict filename)
{
    FileWriter * fw = malloc( sizeof(FileWriter) );
    fw->wpart.methods = &FileWriterFunctions; // point at the shared table
    // real code goes here.
    return (Writer *) fw;
}

Writer * NewNetworkWriter(const char *restrict serverName)
{
    NetworkWriter * nw = malloc( sizeof(NetworkWriter) );
    nw->wpart.methods = &NetworkWriterFunctions; // point at the shared table
    // real code goes here.
    return (Writer *) nw;
}

This is probably not that far from the implementation that C++ uses, where polymorphic ("virtual") function-pointers are kept in "vtables".

C++ contorts the C language into pretending classes are just "special" structs. But a C++ class or struct may have many hidden things: a vtable pointer, an implicitly-defined default constructor calling constructors on its member variables, an implicitly-defined copy constructor calling copy constructors of its member variables, an implicitly-defined assignment operator calling assignment-operators on its member variables, and an implicitly-defined non-virtual destructor calling the destructors of its member variables.

And C++ has some gotchas: if the base class in a class hierarchy has an implicitly-defined, non-virtual destructor, then calling delete on a base-class pointer that actually points to a derived-class object is undefined behavior. In practice the derived-class destructor usually does not get called, which may destroy the correctness of the system.

Objective-C adds classes and objects as something different from structs, adding some pretty self-contained Smalltalk-like syntax to C. Without operator overloading and other contortions of syntax, what you see in Objective-C code is generally what you get. Not much invisible or implicit stuff in the language itself (though the implementation of method-lookup isn't normally visible). With ARC, retain, release, and autorelease are mostly invisible/implicit, and I'm pretty sure Apple has done some very interesting things at run-time for certain technologies like Core Data.

If you know where the sharp edges are, C++ can be pretty cool. And you can blend C++ and Objective-C if you want to. The constructors and destructors of C++ instance variables will be invoked (invisibly/implicitly) for you in Objective-C's equivalents of constructors and destructors.

Most of the functionality of ARC could have been implemented with appropriate C++ smart pointer template classes, but I'm sure most Objective-C programmers would not want to sweat the details of the required syntax.

So there it is.
