Software Development PDF Manual

In trying to come up with better methods of organizing and presenting concepts related to DMMD’s Software Development practices, I am compiling a manual with all of the different ideas. The latest PDF manual (generated with LaTeX) will always be available for download here.

Certain chapters from the manual will also be presented in these blog pages.

– Darian Muresan

no comments

Intuitive Mathematics PDF Manual

In trying to come up with better methods of organizing and presenting concepts related to Intuitive Mathematics, nothing seems to work better than using LaTeX. Therefore, the latest PDF manual (generated with LaTeX) will always be available for download here.

Certain chapters from the Intuitive Mathematics manual will also be presented in these blog pages.

– Darian Muresan

no comments

Software Branching

Software branching is an important sub-process of the overall software development process. The practice that we follow at DMMD is shown in the following figure.

In the above figure, time progresses from left to right and it is recorded using a changelist. A changelist is a number that increases each time a new submission to the software repository is made. Thus, a repository submission with a changelist of 12023 means that it was made before a repository submission of changelist 12050. The branching methodology then follows these rules:

The Main Branch. The main branch, shown in black, is considered the golden standard. It is the branch from which all of the release branches are made and into which all the experimental branches are integrated. The main branch is a single branch. The main brunch is sometimes called the trunk, as in the trunk of a tree with branches.

The Release Branch. The release branch always starts off the main branch. A release branch starts off with a MAJOR release. Once a major release is done, no new features are added to the release branch. All new features are added only to the main or experimental branches. The release branch has a finite lifetime and it is extended only when a bug is fixed in the released version. When a bug is found in the release branch, the bug is fixed in the release branch first (changelist 12023) and then it is integrated into the main branch (changelist 12050). If new releases have been made, before the bug fix is integrated into the main branch (changelist 12050), then the bug fix is integrated into all the release branches (changelist 12051) that have been made before the integration of the bug fix (changelist 12050). The key is that all bug fixes are immediately integrated into the main branch and from the main branch they are integrated into all the other branches.

The Experimental Branch. The experimental branch is used when new features are tried, or when code experiments are made. The experimental branch must always be eventually integrated back into the main branch. An experimental branch cannot take a life of its own, independent of the main branch.

Branching when a project contains several different components, is a bit more complex and will be discussed in the next entry.

– Darian Muresan

Histogram Equalization

What exactly is histogram equalization and how can we think of it from an intuitive point of view, rather than the more abstract, mathematical point of view? A somewhat detailed technical point of view is given on the Wiki Page. Still, I believe there are some mathematical concepts, which I believe can be explained more intuitively. Here’s my take on histogram equalization.

Without being too rigorous about the mathematical details and the distinction between discrete and continuous cases, let’s understand how histogram equalization is implemented.

First, what exactly is histogram equalization? Ideally, histogram equalization would spread the pixel values uniformly across an entire image. Thus, if we had an image with 256 pixels in it, then 1 pixel would have value 0, 1 pixel would have a value of 1, …, and one pixel would have a value of 255. If the image had 1024 total pixels, then the normalized image would have 4 pixels of value 0, 4 pixels of value 1, …, and 4 pixels of value 255. However, knowing how to spread the pixel values around an image, such that we have a uniform distribution as described above, can be tricky. Instead, what histogram equalization does (in the discrete case) is that it spreads and adjusts the level of the pixel values (as a group) in the histogram such that the cumulative area under the new histogram (moving from left to right) increases linearly.

A simple example might be in order. Assume that image I contains 6 pixels: two of value 10, two of value 20 and two of value 199. Further, assume that each pixel can be represented using 8 bits (i.e. minimum value is 0 and maximum is 255). Then the histogram equalized image would have two pixels of value 0, two pixels of value 128 and two pixels of value 255.

Second, let’s understand histogram equalization in more details. The Figure above shows the histogram of some image I. The histogram is a distribution function that tells how many pixels are at a certain pixel value. The x − axis is the pixel value, such as pixel value i, and the y − axis is the count, or the number of pixels of value i. Histograms are not 1-to-1 with the image. For example, two different images, I and J can have the same histogram if the locations of the pixels in I are permutations of the pixel locations in J.

The cumulative distribution function (cdf) is the cumulative area under the histogram curve. In the continuous case, this area is an integral and in the discrete case, it is a sum. The cdf(i) is calculated as the area under the histogram curve, up to pixel i.

In the figure above, the cdf(i) is the red line in the bottom graph. (Notice that the line is monotonically increasing, since the area under the histrogram curve can only increase as we move from left to right. Also note, that the red line is an approximation of the cdf.) The top graph shows the histogram of image I. At step i (pixel value i) the cdf function increases by the sum Si.

In the following figure, the green line represents the cumulative distribution function for the normalized histogram, call it cdf_norm (shown in the figure as having a bar on top). The cdf_norm is a straight line because the integral of a constant function is a straight line and it has a slope of 1 because both axis have been normalized to [0, 1].

In order to convert cdf into cdf_norm, the sum contribution, which is Si along the y-axis, needs to be made at exactly the same time as when the step Si is taken along the x-axis (i.e. when we are adding the contribution of pixel Si). If this is true for all steps before step i, then step i needs to be moved to cdf(i). This is achieved by simply assigning all the pixels that were of value i to be of value cdf(i). Note that the number of pixels that were of value i is the SAME as the number of pixels that are now of value cfd(i). The sum Si, the contribution to the y-axis, does not change when the pixels of value i are changed to cdf(i).

Finally, if the desire is to convert cdf(i) to some new cumulative distribution function, call it cdf_new(i) (shown as cdf with a hat in the figure below) then this can be achieved by re-assigning i to the inverse of cdf_new of cdf(i). This mapping is shown in the figure below.

The figures above try to capture and explain the logic behind histogram equalization and histogram re-mapping. The DMA Manual has a more detailed mathematical explanation (in the chapter titled dmaHistogramEqualization).

– Darian Muresan

no comments

Software, Hardware and Firmware

In understanding the complexities of software development, let’s pause for a second and ask the question:

What is software?

The American Heritage Dictionary defines software as:  written or printed data such as programs, routines, and symbolic languages, essential to the operation of computers.

This definition is mostly correct, but I would go a step further and say that software is anything that controls hardware. Software does not exist solely inside computers. Instead, software controls parts of your car, phone, toaster, oven, coffee maker, and just about any piece of electronic hardware you have in your home or workplace.  A more complete definition of software would be:

Software is a set of hardware instructions and their representation, which we call programs.

Programs can be represented on punch cards, magnetic tapes, film, and other media. It is important to understand that software is a set of instruction, not the physical media on which it is recorded. Hardware, in contrast I would define it as:

Electronic hardware is anything that is electrical or current goes through.

Hardware consists of tangible objects such as: electronic circuits, input/output devices, cables and the likes and it does not consist of abstract ideas, algorithms or instructions. An intermediate form between hardware and software is called firmware and this is defined as:

Firmware is software embedded in hardware.

There is not much of a difference between software and firmware and in fact one could easily argue that they are equivalent.  Firmware tends to be embedded into the hardware at manufacturing time and it is not expected to change for the lifetime of the hardware device. In addition, firmware is not lost when hardware looses power, whereas software, in the traditional sense, does not remain in memory when the hardware loses power.

If there is one key idea to consider about software, is that unlike other engineering projects, a software project tends to be more of a living project. A software project is much more like a plant that needs continuous watering in order to remain relevant. Because of the fast moving pace of computer hardware new versions of the same software, with new features, need to be developed and these new versions are most often built on top of previous versions. Contrast this with building a bridge, where once a bridge is built, it is good for many years, without major maintenance.

Understanding this key difference between a software project and most other engineering projects is important in order to appreciate the difficulty of building good software, as it will be highlighted in future chapters. Finally, a more thorough discussion of hardware and software definitions can be found in [1].

Hardware – Software Interaction

In most cases hardware and software are logically equivalent, in the sense that most algorithms can be implemented in either hardware or software and the decision which way to implement is solely up to the designer. In general, hardware tends to be extremely fast and can perform only a limited number of instructions, such as:

  1. Additions
  2. Comparisons
  3. Moves and Copies

The software uses these basic instructions to create complex algorithms. The software that directly controls hardware is called machine language. Because most machine languages are so simple, a lot of instructions are required to create even the simplest of algorithms. This can in turn become very tedious and laborious. Therefore software to generate this machine language was developed. This process of different software levels of development is highlighted in the following Figure [1]:

Levels 1, 2, and 3 are numeric. Programs in these levels consist of a series of numbers, which are hard for people to interpret. Level 4 is the assembly language, which becomes a bit more user friendly. At this level the instructions contain words and abbreviations meaningful to people. Level 5 and higher is what most people think of when they think of software development. At this level you have the standard programming languages, such as C, C++, FORTRAN, etc. At levels higher than 5 you have virtual languages, such as .NET and Java. For comparison purposes, here is an example of an assembly language program (Level 4) and the equivalent C (Level 5).

The same program written in C would look approximately like this:

Notice how much more readable the program in C (Level 5) looks like when compared to assembly. The logic level is when you get down to the transistor level.

From left to right, we have a transistor (acting like an inverter), a NAND logic (high, high outputs low) and a NOR logic (either high produces a low). The resistors, the wiggly lines, are there to limit the current across the transistor’s gates, so that the transistors don’t burn. Interestingly enough, a transistor is used for power amplifiers as well. The difference is that in a power-amplifier, the transistor has to be biased such that it works in the transition region, whereas in the digital world, the transistor works in the saturated regions (high or low). More discussions on the transistor will be in another blog entry.

– D. Darian Muresan


[1] A. S. Tanenbaum, Structured Computer Organization. Prentice Hall, 1990.

show hide 1 comment