15 August 2014

Java has a deep Expression Problem for beginning students

There are many problems with Java as the first programming language to teach students if we wish to provide the most effective learning experience.  I've even written about this before, in Learn Python instead of Java as your first language.  So what now?

Newbie, meet the Expression Problem

Stuart Sierra provides a very lucid explanation of the Expression Problem, a classic problem in software programming, in Solving the Expression Problem with Clojure 1.2.  Needless to say, Clojure provides a very clean solution.

Java, however, is a quagmire and requires some heavy OOP software engineering concepts to solve the Expression Problem.  One wouldn't ordinarily think this has anything to do with beginning students just learning to program, but it does, and here's how.

Imagine our beginning student, "Sam", starts to learn Java and eventually starts to write a classic game of asteroids.  Sam plugs away and gets a decent game working: a single player ship shooting lasers at one kind of asteroid.  Not bad!  But Sam wants to do more.  Sam doesn't want just one kind of (big) asteroid; he also wants small asteroids to shoot at.

Alright, so Sam begins to modify the BigAsteroids class to also be able to represent a smaller sized kind of asteroids.  The teacher catches wind of this and tells Sam, "no, that's not good", and that Sam needs to use OOP principles to write a different class for SmallAsteroids.

Now most students would say, "Why, Mr. Teach?  My way works."  But Sam is a good student and does as he's told.

So Sam goes and creates a second class for SmallAsteroids.  Except his program was built presuming that the only things to draw, to shoot lasers at, and to move around, were BigAsteroids.  None of those methods he wrote to draw, to shoot lasers at, and to move around BigAsteroids work for SmallAsteroids.  Hmm...  Welcome to the Expression Problem, Sam.
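To make Sam's predicament concrete, here is a minimal sketch of what his code might look like at this point.  Only the class names BigAsteroids and SmallAsteroids and the hitPoint variable come from the story; the Game class, the fields x and y, and the move and draw methods are my assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of Sam's first class; fields are assumptions.
class BigAsteroids {
    double x, y;
    int hitPoint = 3;

    BigAsteroids(double x, double y) { this.x = x; this.y = y; }
}

// The second class the teacher asked for. It shares no declared type
// with BigAsteroids, so nothing written for BigAsteroids accepts it.
class SmallAsteroids {
    double x, y;
    int hitPoint = 1;

    SmallAsteroids(double x, double y) { this.x = x; this.y = y; }
}

class Game {
    // Every game method names BigAsteroids in its signature.
    static void move(BigAsteroids a, double dx, double dy) {
        a.x += dx;
        a.y += dy;
    }

    static String draw(BigAsteroids a) {
        return "big asteroid at (" + a.x + ", " + a.y + ")";
    }

    public static void main(String[] args) {
        List<BigAsteroids> field = new ArrayList<>();
        field.add(new BigAsteroids(0, 0));
        for (BigAsteroids a : field) {
            move(a, 1, 2);
            System.out.println(draw(a)); // prints: big asteroid at (1.0, 2.0)
        }

        SmallAsteroids s = new SmallAsteroids(5, 5);
        // move(s, 1, 2);  // does not compile: SmallAsteroids is not a BigAsteroids
        // field.add(s);   // does not compile for the same reason
    }
}
```

The two commented-out lines are the whole problem in miniature: the new class exists, but none of the existing operations know about it.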



Interfaces to the rescue, of course!  Sam is lucky in that he can side-step the Expression Problem in this case, since he can open up the classes he wrote earlier and recompile them against an interface if he wants, while the classic Expression Problem rules out this possibility.  Two problems arise right away for our beginning student, Sam.

First problem: having to reopen old classes breaks the Open-Closed Principle.  Second problem: Sam needs to learn to use Java Interfaces.  That means that not only does Sam need to learn a more advanced programming language feature just to write two classes of asteroids, he also gets to practice breaking an important OOP principle.  If OOP is so great, why does it require such heavy machinery just to write two classes while breaking one of its own principles?

Anyway, Sam is unfazed.  He writes an Asteroids Interface, opens up and modifies BigAsteroids to implement the Asteroids interface, then finishes writing SmallAsteroids which also implements the Asteroids interface.

And that still doesn't compile.  Because all the methods he wrote were written statically against BigAsteroids, not the Asteroids Interface.  So Sam has to modify pretty well every method he's ever written to use the Asteroids Interface instead.  All this just to add a second class of asteroids.
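Roughly, the end state Sam has to reach looks like the sketch below.  The Asteroids interface and the two class names are from the story; the method names moveBy, draw, moveAll and drawAll, and the Game class, are assumptions I've made for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the method names are assumptions.
interface Asteroids {
    void moveBy(double dx, double dy);
    String draw();
}

// The original class, reopened so it can declare "implements Asteroids".
class BigAsteroids implements Asteroids {
    private double x, y;

    BigAsteroids(double x, double y) { this.x = x; this.y = y; }

    @Override public void moveBy(double dx, double dy) { x += dx; y += dy; }
    @Override public String draw() { return "big asteroid at (" + x + ", " + y + ")"; }
}

// The new class, written against the same interface.
class SmallAsteroids implements Asteroids {
    private double x, y;

    SmallAsteroids(double x, double y) { this.x = x; this.y = y; }

    @Override public void moveBy(double dx, double dy) { x += dx; y += dy; }
    @Override public String draw() { return "small asteroid at (" + x + ", " + y + ")"; }
}

class Game {
    // Before: the parameter was List<BigAsteroids>.
    // After: it is the interface type, so both classes are accepted.
    static void moveAll(List<Asteroids> field, double dx, double dy) {
        for (Asteroids a : field) {
            a.moveBy(dx, dy);
        }
    }

    static List<String> drawAll(List<Asteroids> field) {
        List<String> lines = new ArrayList<>();
        for (Asteroids a : field) {
            lines.add(a.draw());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Asteroids> field = new ArrayList<>();
        field.add(new BigAsteroids(0, 0));
        field.add(new SmallAsteroids(5, 5));
        moveAll(field, 1, 2);
        for (String line : drawAll(field)) {
            System.out.println(line);
        }
    }
}
```

Every signature that used to say BigAsteroids had to change to Asteroids, which is exactly the "modify pretty well every method" step.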

You can forgive Sam if he thought the teacher ridiculous for suggesting that Sam's way of opening up his original BigAsteroids class to add support for a smaller size of asteroids was somehow considered worse than having to write an interface, write a second class, modify every method he's ever written, then still having to open up and modify his original BigAsteroids class anyway!

I'm not against interfaces, of course. I know their value as a tool in software development. My point is the student is learning and needs to learn things correctly, but the tools are so ridiculous that doing the right thing means learning ridiculous things, and students are fairly aware of what feels ridiculous.  In the above scenario, "Sam" will likely find the whole concoction ridiculous, perhaps even deciding right away that computing science just isn't for him.

Fortunately, Sam is a good student and does as he's told.

Soon after, Sam wants to make the game harder by having every asteroid, every laser, and the player, all have a concept of hit points.  Well, actually, from the start, Sam had a hitPoint instance variable in every relevant class as he knew he would eventually support a hit points system, but he just didn't write methods for it yet.  So now he needs to write some methods, like a getHitPoint method so that the game user interface can display the hit points of various objects the user clicks on (for example).

Of course, there is more than one method to implement.  In fact, Sam writes a HitPoints Interface as he's learned his lesson from before, and now tries hard to code to interfaces.

Now where does Sam write the getHitPoint and other HitPoints methods?  He has to open up every relevant class he's ever written, declare that they implement HitPoints and add in those methods.  That's where.
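As a sketch of what "opening up every relevant class" means here: the HitPoints interface and getHitPoint are from the story, while takeDamage, the Laser and Player class bodies, the starting values, and the label helper are assumptions for illustration.

```java
// Hypothetical interface: only getHitPoint is named in the story,
// takeDamage is an assumed second method.
interface HitPoints {
    int getHitPoint();
    void takeDamage(int amount);
}

// Reopened class: the hitPoint field was already there, the methods are new.
class Laser implements HitPoints {
    private int hitPoint = 1;

    @Override public int getHitPoint() { return hitPoint; }
    @Override public void takeDamage(int amount) {
        hitPoint = Math.max(0, hitPoint - amount);
    }
}

// Reopened class: the same edit, repeated.
class Player implements HitPoints {
    private int hitPoint = 5;

    @Override public int getHitPoint() { return hitPoint; }
    @Override public void takeDamage(int amount) {
        hitPoint = Math.max(0, hitPoint - amount);
    }
}

class HitPointsDemo {
    // A UI helper can now be written once, against the interface.
    static String label(HitPoints target) {
        return "HP: " + target.getHitPoint();
    }

    public static void main(String[] args) {
        Player player = new Player();
        player.takeDamage(2);
        System.out.println(label(player));      // prints: HP: 3
        System.out.println(label(new Laser())); // prints: HP: 1
    }
}
```

The same edit would be repeated in each asteroid class as well.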

But Sam is now a little more knowledgeable and knows how to write subclasses, so guess what Sam does?  He writes subclasses to add in his getHitPoint and other HitPoints methods instead.  Subclasses!  Sam goes on to argue that he's just trying to fulfill the Open-Closed Principle (such a keener, he read that on Wikipedia).  That's what subclasses are for, no?  To extend functionality?  Heck, in this case, he may even say it satisfies the Liskov Substitution Principle (what a keener).

And in so doing, Sam now has two laser classes, four asteroid classes, and two player classes, despite only having half that number of types of things actually visible to the player of the game.  That violates what may be called the Principle of Ontological Parsimony (POP).
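Here is roughly what the subclass route looks like for the asteroids alone.  The "WithHitPoints" class names, the protected visibility of hitPoint, and the label helper are assumptions of mine; the story only says Sam wrote subclasses.

```java
interface HitPoints {
    int getHitPoint();
}

// The original classes stay untouched ("closed"). This sketch assumes
// hitPoint is visible to subclasses; with a private field and no getter,
// the subclass route would not even be available.
class BigAsteroids {
    protected int hitPoint = 3;
}

class SmallAsteroids {
    protected int hitPoint = 1;
}

// One extra class per original class, just to add the methods.
class BigAsteroidsWithHitPoints extends BigAsteroids implements HitPoints {
    @Override public int getHitPoint() { return hitPoint; }
}

class SmallAsteroidsWithHitPoints extends SmallAsteroids implements HitPoints {
    @Override public int getHitPoint() { return hitPoint; }
}

class SubclassDemo {
    static String label(HitPoints target) {
        return "HP: " + target.getHitPoint();
    }

    public static void main(String[] args) {
        System.out.println(label(new BigAsteroidsWithHitPoints())); // prints: HP: 3

        // Objects of the original class still know nothing about HitPoints.
        BigAsteroids plain = new BigAsteroids();
        System.out.println(plain instanceof HitPoints);             // prints: false
        // label(plain);  // does not compile
    }
}
```

Two kinds of asteroid on screen, four asteroid classes in the source, and every place that used to say new BigAsteroids() has to be found and changed to create the subclass instead.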

Before we talk about POP, note that Sam of course could just open up all those classes and add in the methods he needs to support the HitPoints interface.  After all, he owns the code, it's not like he can't open them up.  This is obviously a little bit of a contrived example, but a beginning student to Java doesn't get to make these choices in an informed manner as, by definition, they're beginning students.

Beginning students have to plod along doing the best they can with very little knowledge or information.  The tool they use either makes it difficult or makes it easy for them to do the right thing --- and we want the students to do the right thing!  Java makes that exceedingly difficult.

Principle of Ontological Parsimony

POP is a principle with a deep philosophical root [1], going all the way back to Occam's Razor.  Technical definitions aside, it basically means that "Entities are not to be multiplied beyond necessity" [1].  This means if two theories have the same explanatory power, but one posits the existence of a thing, F, while the second theory does not, then the second theory is simpler and is to be preferred.

Recast for OOP software development, POP would say that if two programs do the same thing, but one requires the existence of a class, F, while the second program does not, then the second program is simpler and is to be preferred (all else being equal, etc.).

Now look back at Sam's program using subclasses to implement the hit point system.  It doubled the number of classes just to do the same thing as if he had opened up every class to add those same methods in.  Obviously using subclasses created a more complicated, less ontologically parsimonious program [2].

Of course, not every situation affords the chance to open up a previously written class and recompile it.  In those situations, perhaps writing subclasses would make sense, but wouldn't it be better if there were a way to add in those methods without multiplying the number of classes beyond necessity?  Clojure, Ruby, JavaScript, and other languages have found ways to do it.

The Real Expression Problem

In the real Expression Problem, recompiling existing code is not allowed, and static type safety must be maintained (no casts allowed) [3]. Those are very reasonable demands in modern software development where you may not have access to the source code of existing classes, and type safety is of course important for writing reliable software.

As demonstrated in the above example with "Sam's" asteroids game program, learning to write even a simple program correctly quickly explodes into writing many more classes and using more advanced features in Java.  To a language lawyer, that's a boon, as learning the tool is the goal.  If learning to write useful or fun programs is the goal, however, then the Java language itself is just incidental complexity [5] in this case that students should not have to be exposed to.

Clojure, Ruby, JavaScript, and other languages have found ways to avoid that incidental complexity that Java loves to foist on the programmer.  Nor is this a problem that a teacher can make go away by handing students some templates or boilerplate code.

The "correct" solution to the Expression Problem in Java can be found in a paper [4] by Dr. Mads Torgersen, associate professor in computing science at the University of Aarhus, Denmark.  The solution is not simple, requires generics, programming patterns, etc., that are likely beyond most first year undergraduate students, and definitely above what could reasonably be expected of high school students --- templates provided or not.  If "Sam" in our example is a high school student, he's toast trying to learn the "correct" solution.  Unless he was learning in another language like Clojure, of course.

Wait a moment, plenty of students learn Java as their first language: are they toast?  What's going on is that students are plodding along writing terribly structured programs.  Nothing in Java stops students from writing the entire game essentially in a single class using nothing but static methods, no different than writing everything in a global namespace.  Nothing stops teachers from teaching OOP terribly either, and still getting students to write programs that "work".  Students can still get some OO-ness in their programs without doing it properly throughout their programs, or without following OO design principles.
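For the sake of illustration, the "single class, nothing but static methods" style might look something like this.  Every name and number below is hypothetical; it's a sketch of the style, not anyone's actual program.

```java
// Hypothetical sketch of a "one class, all static" game.
class AsteroidsGame {
    // Parallel arrays stand in for objects; the state is effectively global.
    static double[] asteroidX = {0.0, 10.0};
    static double[] asteroidY = {0.0, 5.0};
    static int[] asteroidSize = {2, 1}; // 2 = big, 1 = small
    static int score = 0;

    static void moveAll(double dx, double dy) {
        for (int i = 0; i < asteroidX.length; i++) {
            asteroidX[i] += dx;
            asteroidY[i] += dy;
        }
    }

    static void shoot(int i) {
        // Size is checked by hand wherever behaviour differs.
        score += (asteroidSize[i] == 2) ? 10 : 25;
    }

    public static void main(String[] args) {
        moveAll(1.0, 1.0);
        shoot(0);
        shoot(1);
        System.out.println("score = " + score);
    }
}
```

It compiles, it runs, and it "works", with no classes to speak of and no OO design anywhere in sight.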

In short, students can do a terrible job and still get the job done as far as student projects go, despite learning some very bad habits along the way, but what's really unfortunate is that Java as a tool makes it difficult for even well-meaning students to do a good job of learning the "proper" techniques.  That's a deep problem Java has for beginning students.

Not just Java, but C#, C++, and probably other popular object-oriented programming (OOP) languages have the very same problem described above. I've only focused on Java though because I've been forced to teach it.



[1] "Ontological Parsimony", in Simplicity

[2] It's not just a simpler program that we want when we say parsimonious, but it must be ontologically simpler: it must require the existence of fewer entities, fewer classes, as in fewer things.  In science, those things (e.g. electrons) are found in nature.  In programs, we get to create and code up those things ourselves.

[3] The Expression Problem (Philip Wadler)

[4] The Expression Problem Revisited: Four new solutions using generics (Torgersen)

[5] Complexity arising from suboptimal language and infrastructure.  See Out of the Tar Pit (Moseley and Marks), where it is called "accidental complexity".
