Why database should be taught before programming in universities

Learn Database before Coding

Often students from initial semester ask me how do we store our data in our programming project? When students join university to learn about computer science and technology they are usually taught programming first in courses like introduction to programming. As part of coursework students are required to work on a project. Majority of the projects, in fact almost all projects involve data handling and that data needs to be stored somewhere.

Problem students face

As a novice students don’t know how to store data. One option is to store data in plain text files if filing is taught to them but in that case their project becomes too complex for them. In my opinion, file formats is an advanced topic for students that have just started learning how to program. So, students get stuck where and how to store data. They create variables and arrays to store data in memory but that is not very useful until they have option to store their data somewhere permanently that they can retrieve later. Otherwise, every time they run their project they have to feed data from the beginning.

Teach Database before Programming

If universities modify their courses and add database in first semester and replace programming course with it then it would be easier for students to get started in computer science degree. Introduction to database is relatively easier course then programming and students will know what is database, how to store data in database, and how to retrieve it later using SQL. Then in next semester if they do a programming course then it will require only one lecture to teach them how to access database from your code and how to store and retrieve data. That will not only make their projects more valuable but will make more sense to them and they can take it to advance level in forthcoming courses.

Your Take?

What is your opinion? Please, let me know in the comments.

One reason why you should refactor your code often

Once upon a time, a consultant made a visit to a development project. The consultant looked at some of the code that had been written; there was a class hierarchy at the center of the system. As he wandered through the hierarchy, the consultant saw that it was rather messy. The higher level classes made certain assumptions about how the classes would work, assumptions that were embodied in inherited code. That code didn’t suit all the subclasses, however, and was overridden quite heavily. If the superclass had been modified a little, then much less overriding would have been necessary. In other places some of the intention of the superclass had not been properly understood, and behavior present in the superclass was duplicated. In yet other places several subclasses did the same thing with code that could clearly be moved up the hierarchy.

The consultant recommended to the project management that the code be looked at and cleaned up, but the project management didn’t seem enthusiastic. The code seemed to work and there were considerable schedule pressures. The managers said they would get around to it at some later point.

The consultant had also shown the programmers who had worked on the hierarchy what was
going on. The programmers were keen and saw the problem. They knew that it wasn’t really their fault; sometimes a new pair of eyes are needed to spot the problem. So the programmers spent a day or two cleaning up the hierarchy. When they were finished, the programmers had removed half the code in the hierarchy without reducing its functionality. They were pleased with the result and found that it became quicker and easier both to add new classes to the hierarchy and to use the classes in the rest of the system.

The project management was not pleased. Schedules were tight and there was a lot of work to
do. These two programmers had spent two days doing work that had done nothing to add the
many features the system had to deliver in a few months time. The old code had worked just fine. So the design was a bit more “pure” a bit more “clean.” The project had to ship code that worked, not code that would please an academic. The consultant suggested that this cleaning up be done on other central parts of the system. Such an activity might halt the project for a week or two. All this activity was devoted to making the code look better, not to making it do anything that it didn’t already do.

How do you feel about this story? Do you think the consultant was right to suggest further clean
up? Or do you follow that old engineering adage, “if it works, don’t fix it”?

Six months later the project failed, in large part because the code was too complex to debug or to tune to acceptable performance. The consultant was brought in to restart the project, an exercise that involved rewriting almost the whole system from scratch. He did several things differently, but one of the most important was to insist on continuous cleaning up of the code using refactoring.

This is an excerpt from the book preface “Refactoring – by Martin Fowler”.

What J talked about in 2013

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 37,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 14 sold-out performances for that many people to see it.

Click here to see the complete report.

 

2012 in review

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 24,000 views in 2012. If each view were a film, this blog would power 6 Film Festivals

Click here to see the complete report.

 

How to kill your time by hardcoding?

My colleague asked me to help him solve an issue he was facing. I sat with him and after spending more than half an hour I realized that someone has hardcoded the server name :@ while we were assuming that we have already moved to another server.

The main thing to realize here is that if we have spent few more minutes in making the server name configurable at first place then we could have saved hours that we spent later just to find out that server name was hard-coded. Not doing this results in frustration, waste of time, de-motivation, brings down the moral of the team, and finally kills productivity and you lose credibility.

Why code bad?

I was reading the book Clean Code – A Handbook of Agile Software Craftsmanship, Rober C. Martin Series. A section in chapter one caught my attention and remind me of code and reasons that I got for writing bad code and designing bad solution. Now I think this is a good argument against it which I am quoting here from the book.

“Have you ever waded through a mess so grave that it took weeks to do what should have taken hours? Have you seen what should have been a one-line change, made instead in hundreds of different modules? These symptoms are all too common. Why does this happen to code? Why does good code rot so quickly into bad code? We have lots of explanations for it. We complain that the requirements changed in ways that thwart the original design. We bemoan the schedules that were too tight to do things right.
We blather about stupid managers and intolerant customers and useless marketing types and telephone sanitizers. But the fault, dear Dilbert, is not in our stars, but in ourselves. We are unprofessional.
This may be a bitter pill to swallow. How could this mess be our fault? What about the requirements? What about the schedule? What about the stupid managers and the useless marketing types? Don’t they bear some of the blame?
No. The managers and marketers look to us for the information they need to make promises and commitments; and even when they don’t look to us, we should not be shy about telling them what we think. The users look to us to validate the way the requirements
will fit into the system. The project managers look to us to help work out the schedule. We are deeply complicit in the planning of the project and share a great deal of the responsibility for any failures; especially if those failures have to do with bad code! “But wait!” you say. “If I don’t do what my manager says, I’ll be fired.” Probably not. Most managers want the truth, even when they don’t act like it. Most managers want good code, even when they are obsessing about the schedule. They may defend the schedule and requirements with passion; but that’s their job. It’s your job to defend the code with equal passion.
To drive this point home, what if you were a doctor and had a patient who demanded that you stop all the silly and-washing in preparation for surgery because it was taking too much time?2 Clearly the patient is the boss; and yet the doctor should absolutely refuse to comply. Why? Because the doctor knows more than the patient about the risks of disease and infection. It would be unprofessional (never mind criminal) for the doctor to comply with the patient.
So too it is unprofessional for programmers to bend to the will of managers who don’t understand the risks of making messes.”

Breaking problem into code

Let us consider an example. If you have two boxes say BoxA and BoxB, and there are few balls of different colors, let’s say 5 balls (red, blue, green, yellow, and white) in BoxA, and you want to move one ball, say red ball from BoxA to BoxB. What will be the steps? If you can make a flow chart of it then its good, but if you find it difficult to make a flow chart of it then you need to do some hard work to become a programmer.

I asked this to someone and guess how he solved it. Here is his solution. Take out all balls from BoxA, then pick the red ball, put it in BoxB, and then put all remaining balls back in BoxA.

Fine! it worked. But why would you take out all balls if you can only take out red ball and move it to next box?

Let’s move this to programming side. In the above mentioned scenario we will have two database tables, boxes, and balls. boxes will have boxid as PK, name, and other details. Keep it simple so we don’t get lost in other details. balls table has ballid as PK, name, boxid as FK, and other details. boxes table has two records with 1 and BoxA as its boxid and name. balls table will have 5 records with id between 1 to 5 and red, blue, green, yellow, and white as their names, and all balls will have 1 in boxid field which is FK to boxes table.

Now if we adopt the first solution then it will run following queries (these are pseudo queries not actual SQL queries):

– select all balls where boxid=1 (so we can have a list of all balls before deleting them from database)
– delete from balls where boxid=1
– insert into balls values ballid=1, name=red ball, boxid=2
– insert into balls values
(ballid=2, name=blue ball, boxid=1)
(ballid=3, name=green ball, boxid=1)
(ballid=4, name=yellow ball, boxid=1)
(ballid=5, name=white ball, boxid=1)

Or even worse if we use multiple insert queries. It will run from 4 to 7 queries.

Now let’s see what happens if we just perform operation on the red ball and don’t touch other balls.

– update balls set boxid=2 where ballid=1; /* ball with id 1 is the red ball */

Wow, just one query and we’re done! We saved 3 to 6 queries. Imagine this scenario for a high traffic website or any other application, we can improve the performance with a big difference by just implementing correct logic to solve a problem.

Some coding tips!

This article highlights some common pitfalls that are being made by developers. I’ll elaborate it with a simple example.

$obj2->FuncOne( $obj1->GetData() );
$obj2->FuncTwo( $obj1->GetData() );

The result of GetData is passed to FuncOne and FuncTwo in two subsequent calls. What is the benefit of this approach? Simple answer is to save memory, by not using any variable to save the result. This argument was very strong until there were memories with very less capacity. But now even home user has a system with memory in GBs. So this practice is not good in today’s world.

So, what’s the drawback of this approach? Performance degradation. But how can it degrades the performance? Let me explain a bit. Consider a situation where GetData runs a query on database to fetch some data from multiple tables by joining them. Joins thereselves are heavy by nature. So when GetData is called twice it will run query twice, and suppose that this code snippet is a part of a heavy process that can be called by multiple users on the web or in an enterprise application, just imagine what will happen to the database and application itself. The performance of the application will be degraded. Users of your application will get frustrated and at the end you will lose business.

Now let’s look at it from another perspective. This approach will also increase the CPU workload. When GetData will be called it will jump from one branch instruction to another and before that it has to save the current address to the stack so it can pop it back when it returns back to the caller function. This has to be done every time when function is called. And when function performs heavy computation it needs more memory and processing power, increasing the footprint of your application and execution time. So you’re wasting your servant’s (CPU) time and energy by assigning it the same task twice.

Other drawbacks can be that code is more error prone and is difficult to debug and troubleshoot, and code maintenance is high, especially when you have to modify the code to meet new requirements.

You can make your application and code much better and efficient by adhering to few simple best practices. In this case the rule is that

“If result of a function is needed more than once then don’t call that function multiple times. Save the result of that function in a variable and use that variable instead. “

In view of this, above code could be written like this.

my $result = $obj1->GetData();
$obj2->FuncOne( $result );
$obj2->FuncTwo( $result );

There is one extra line of code in above example but is more efficient and is much more readable than the previous one.

Let’s see another example.

my $result = $obj1->GetData();
$result = { %$result, %$someData };
$obj2->FuncOne( $result );
$obj2->FuncTwo( $obj1->GetData() );

In this example developer fetches the result, appends another previously got data to it and passes it to FuncOne. Then he needs the same result, so he called GetData one more time. Now again, we can write this code in much efficient manner.

my $result = $obj1->GetData();
my $result2 = $result;

$result = { %$result, %$someData };

$obj2->FuncOne( $result );
$obj2->FuncTwo( $result2 );

Here we have saved another call to GetData, which might be performing some heavy computation or running a heavy query with joining multiple tables in it.