muhuk's blog

Nature, to Be Commanded, Must Be Obeyed

September 24, 2017

How to Write Easy to Read Code

A long time ago in a software sweatshop far, far away… Our code monkeys, haxx0rl33t and xXxDarkAssassiNxXx, were furiously typing away to meet their deadline that was yesterday. The fact that it was dictated by non-technical people is irrelevant to our story. Our story is about the consequences of this artificial urgency for our code mon… developers.

Our esteemed colleague Mr. haxx0rl33t is a results-oriented fellow. To be more specific, he is the kind of person who would argue that assembly is as expressive as any so-called high-level language, because they are all Turing complete. Code is purely a means to an end; triggering the right side effects at runtime is all that matters. Therefore haxxy could not care less about readability. Now, this may come as a surprise to you, but haxx0rl33t’s code occasionally has bugs.

Guess what happens when xXxDarkAssassiNxXx has to put out the fire after one of those bugs brings down the entire app? Will he read and understand the code haxx0rl33t has written? No. The code is a mess. And he does not have any time to lose. He is not paid to read code. xXxDarkAssassiNxXx is not as enthusiastic (!) about coding as haxx0rl33t. He cares about abstraction, maintainability, design patterns, etc. enough to enumerate them in his CV, but not enough to actually study them. He is destined to do great things. Like patching haxx0rl33t’s bugs with haste and apathy.

Broken Windows

For most teams, improving code readability is more about a mental shift than about learning specific techniques. Whatever your plan is, unless everyone is convinced that writing readable code is in their best interest, it will eventually fail. At the risk of stating the obvious: people respond to incentives. Step zero of your plan should be to make sure the right incentives are in place and understood by the team.

If no one is looking, the readability of a codebase will decrease. It helps if someone is championing readability. If your team is made up of responsible adults who have a few years of coding experience under their belts, you are in a better position than most teams. If code quality is frequently neglected in order to meet delivery targets, an urgency vs. importance analysis might help.

This post is an attempt to enumerate the most impactful practices that improve readability. Having said that, the importance of attitude towards readability cannot be overemphasized. I do not know the magic words that will guide someone towards the right attitude. I think it is something that mostly comes with experience. Regardless, if you are not already a champion of readability, I hope you read on and give it some thought.

Part I - Kihon Before Kata

Coming up with a universal manifesto of readability would be awesome. But it would have to apply to many different types of software projects and at least all popular programming languages. It should be possible to agree on a few basic principles though.

Readability matters most when things are on fire. It is like training martial arts in a safe, controlled environment, so that you can draw on your reflexes when you find yourself in an altercation. The practical value of readability is great, but for most people it is not immediately apparent.

I do not want to stretch the martial arts analogy too thin. But just like how one studies a single move before studying combinations… Like how one learns things at rest (pose) first before incorporating motion… A similar method is useful for improving readability.

There Is Order Before There Is Meaning

  1. Group declarations, imports and code and keep them ordered

    import (a.k.a. require, a.k.a. include) statements document the dependencies of the current module (or file or package) on other modules. These should already be covered in your coding standards:

    1. Imports and other declarations should come before code.

    2. There should be spacing between code and imports and other declarations.

    3. There should be an unambiguous ordering within these groups.

      For example, standard library imports, third party library imports and imports from the current library/application each form their own group. Each group is then sorted alphabetically.

  2. Group elements according to their visibility

    Just like how imports document the current module’s dependencies, grouping declarations of elements (e.g. a class variable, or a top-level function) gives a bird’s-eye view of what the module provides. Most editors also support collapsing these elements into a single line and/or provide an outline view of the module.

    1. Variable declarations and initializations should come before functions.

    2. Variables should be grouped based on their visibility: first private, then protected, then public. They should then be ordered alphabetically within each group.

      This ordering may seem wrong. Private variables come before public ones because they provide more insight about the class. Hopefully your code will not have many public variables anyway.

    3. Functions should be grouped as public then protected then private. They should be ordered alphabetically within the group.

  3. Use descriptive names in generic contexts and terse names in specific contexts

    Naming things is not that hard. Spending some extra effort in this area can help readers save their cognitive energy to understand higher level concepts in your code.

    1. Instead of Person.personName.firstName, prefer Person.name.first. Repetition like this does not add any value.
    2. If the library you are designing will be used by third parties, entry points should be self-documenting. Prefer more descriptive names like MessageFactory.createImmediateMessage() over Context.immediate(). Keep in mind that readers of third party code are likely encountering the code for the first time.
    3. Do not prefix interfaces with I or suffix classes with Impl. Everybody knows an interface is an interface and a class is an implementation.
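
The ordering and naming rules above can be sketched in a single hypothetical Python module. Everything in it (Name, Person, Inventory and their members) is invented for illustration. Python has no protected or private keywords, so the sketch assumes the usual leading underscore convention stands in for private.

```python
# Standard library imports come first, sorted alphabetically.
from dataclasses import dataclass
from typing import List

# Third party imports would form the next group, then imports from the
# current application; this sketch needs neither.


@dataclass
class Name:
    # Terse names: the context (Person.name) already says what these are.
    first: str
    last: str


@dataclass
class Person:
    name: Name  # Person.name.first, not Person.personName.firstName


class Inventory:
    # Variables: private before public, alphabetical within each group.
    _capacity = 100
    _separator = ", "
    label = "main"

    def __init__(self) -> None:
        self._items: List[str] = []

    # Functions: public before private, alphabetical within each group.
    def add(self, item: str) -> None:
        self._check_capacity()
        self._items.append(item)

    def describe(self) -> str:
        return self.label + ": " + self._separator.join(self._items)

    def _check_capacity(self) -> None:
        if len(self._items) >= self._capacity:
            raise ValueError("inventory is full")
```

A reader who collapses the method bodies sees the private state first, then the public surface, which is roughly the bird's-eye view described above.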

Shortest Distance Between Two Points

Our capacity for holding onto concretes while thinking is limited. That is why we need abstractions. By abstracting we can forget about the details and keep thinking about more complex ideas. When necessary we can get inside the abstraction and access its details, and then perhaps go more levels deeper. Note that this has nothing to do with computer science. This is pure thought[1].

  1. Less is more
    1. Keep your method bodies small. As a rule of thumb, it is good to keep functions to 10 lines or fewer. Methods longer than 50 lines are, again generalizing, difficult to read.
    2. Limit the length of lines (number of columns). The 80 character limit is historical. I am not saying you should follow it religiously, but it could be a good place to start.
  2. Take out the garbage
    1. Do not commit commented out code. It is not for “future use”, there is no “just in case”. We have version control for this. Commented out code will only confuse people. Who did this? Is it still necessary? What would happen if I uncomment?
    2. Do not write comments or documentation for obvious things. A Counter.reset method resets the counter; documenting this serves no purpose.
    3. Remove obsolete code. Delete code that is private and not called anywhere in your application/library. Deprecate and remove any code that you think consumers of your library do not need. It could be that the functionality is not supported or that there is a better way to do the same thing.
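
As a rough illustration of keeping bodies small, here is a hypothetical order total calculation split into short, named steps. The function names and the flat 10% member discount are assumptions made up for this sketch; amounts are integer cents.

```python
def subtotal(lines):
    # Each line is a (unit_price_in_cents, quantity) pair.
    return sum(price * quantity for price, quantity in lines)


def discount(amount, is_member):
    # Hypothetical rule: members get 10% off, everyone else gets nothing.
    return amount * 10 // 100 if is_member else 0


def order_total(lines, is_member):
    # Reads like a summary; the details live one level down.
    amount = subtotal(lines)
    return amount - discount(amount, is_member)
```

Each function fits on a screen many times over, and none of them needs a comment restating its name.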

Part II - Cache Miss Rate of Humans

Find Your Voice

This is the point in this post where I present a mind-blowing idea and cause a profound change in your thinking. Here it comes: readability is all about writing code that is easy to read.

I am joking of course. But on a serious note, studying essay writing can help provide the kind of perspective that improves readability. An essay is not executable, so its entire function is to convey an idea to readers. Code is executable and it should convey the idea of itself, its own paradigm.

  1. Avoid magic numbers and strings
    1. Religious avoidance of number and string literals within functions is a kind of cargo culting. Magic values confuse the reader because they are values, they are concrete. You can argue circumference r = r * 2 * 3.14 is not maintainable code, but it is readable code for anyone who knows basic math. On the other hand, warehouse.get("SKU" + id) is confusing in an order processing function. It raises many questions, starting with: what if codes are prefixed with something other than “SKU”?
    2. One advantage of creating constants for these literals is the convenience of changing one place in the code should the value itself change.
    3. 86400 does not mean anything to most programmers. secondsInADay does.
  2. Build a language, then code in it [2]
    1. Often the second step of make it work, make it right, make it fast is skipped in the name of pragmatism. It is not pragmatism, it is myopia. Making it right is finding the right abstractions and applying them with the right design. It is about taking a step back once you have something that (barely or merely) works.
    2. Building a language here means coming up with appropriate abstractions that make it easy to express the problem at hand. In other words, expressing the problem in the terms of its domain, rather than making the problem statement obscure and implicit.
    3. If in doubt add more primitives. Primitives here refer to the primitives of the language you are building for the problem at hand. Its nouns and verbs.
    4. If you are using a high level language, most problems can be solved with readily available abstractions. Apply these properly. For example, you should only write loops in Scala when you are in a hot section of your code and have benchmarked that all other methods perform poorly. Likewise, do not try to write Java code in Python.
    5. Some problems call for building interpreters. Interpreters have two main advantages. First, they let similar but not identical problems reuse code (otherwise the same abstraction could not be reused, since the problems are slightly different). Second, the execution strategy can be decoupled from the problem itself.
  3. Organization matters
    1. It might be useful to study some basic art concepts like flow and rhythm. If nothing else it will improve your taste in art and in code.
    2. The first things readers will see are likely the README file and the directory structure of your application. The top level structure of your codebase should document the functionalities of your application[3]. Also avoid having too many files and directories at the root of your project.
    3. Avoid unnecessarily long delegation chains, where A delegates to B, which delegates to C, and so on and so forth. Unless B and C, etc. are adding behavior, this will exhaust the reader and strain their concentration. This is usually done in the name of good architecture, but all it achieves is more spaghetti code.
    4. Avoid unnecessary coupling of modules. We all know circular dependencies are bad. But too many dependencies between modules are also bad. They defeat the purpose of modularization. They make the code monolithic.
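
A minimal sketch of the magic value items, assuming a hypothetical warehouse that is just a dictionary keyed by SKU codes. The names SECONDS_IN_A_DAY, SKU_PREFIX, sku_code, find_item and is_expired are invented here. They also serve as a tiny example of adding primitives (nouns and verbs) for the problem at hand.

```python
SECONDS_IN_A_DAY = 24 * 60 * 60  # 86400, but the name says why
SKU_PREFIX = "SKU"


def sku_code(item_id):
    # The one place that knows how warehouse codes are built.
    return SKU_PREFIX + str(item_id)


def find_item(warehouse, item_id):
    # warehouse is assumed to be a plain dict mapping SKU codes to items.
    return warehouse.get(sku_code(item_id))


def is_expired(age_in_seconds, shelf_life_in_days):
    return age_in_seconds > shelf_life_in_days * SECONDS_IN_A_DAY
```

If the prefix ever changes, only SKU_PREFIX moves, and an order processing function that calls find_item never has to mention the prefix at all.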

Least Surprises

  1. Do not try to cut watermelons with scissors

    1. Different languages lend themselves well to different programming paradigms. Do not try to apply metaprogramming in C or interfaces (à la Java) in Lua. Reading code written in a style that is alien to the programming environment and (less importantly) to the community of that environment is distracting. Even if the programming language used is not a very expressive one, it is better to limit oneself to suitable programming patterns.
    2. Caching is not a persistence mechanism. Wiping out a cache at any time should not have any effect on your application, save on performance. If your application relies on some cache key being available during execution, readers will be confused, because they will probably miss this implicit assumption.
    3. Configuration files should not contain any business logic. DSLs are good, configuration-cum-code is evil. Configuration should live in its own place (directory or maybe even separate repository) and code should live in its own place. To understand what an application does, readers should not need to see a single config file. Reading configuration should only be necessary when a particular deployment of an application is being inspected.
    4. Abstractions should make sense on their own. For example just reading the code of an interface should be enough to understand what it represents. If concretes of it must also be inspected in order to have an idea, then perhaps that interface is not a good abstraction. Conversely the interface should be named properly so that while reading a concrete class, one does not have to open all its interfaces to at least get a rough idea about what that class does.
  2. Remove redundancies in code

    1. Avoid boolean valued conditionals. if <predicate> then return true, else return false is equivalent to <predicate>.
    2. Try to replace conditionals with pattern matching (Haskell, Scala) or polymorphic calls (Java, C++).
    3. Pattern matching against boolean values should be converted into if statements, unless it is a sub-pattern.
    4. When defining data types use the most constrained representations. For example do not use String to store monetary values. If a number type like Long or Int will not do, use a Decimal.
    5. Avoid the possibility of inconsistent states/values in design. If only your leaves can contain values, use data Tree a = Leaf (Maybe a) | Node (Tree a) (Tree a) instead of data Tree a = Node (Maybe (Tree a)) (Maybe a) (Maybe (Tree a)). Note that the latter type can have a node that has neither a value nor children.
  3. Identify & Exploit Orthogonalities

    1. Code that has the least amount of implicit assumptions built in is the easiest to read. Make those assumptions explicit when structuring the code. For example, if you are logging entry and exit of a function within the function body, pull out the logging calls and instead wrap the function with a decorator that handles the logging.

    2. When writing code with asynchronous calls, try to separate actual CPU heavy work from asynchronous related bits. This makes code easier to read and easier to test.

      When it comes to asynchronous programming things are often taken to the extreme. Do you really need it to be that fine grained? A good rule of thumb here is to segregate the execution of CPU heavy parts and IO heavy parts with async boundaries.

    3. As you identify and refactor orthogonalities within smaller units, consider moving them into their own unit. For example a separate module for logging or a separate library for asynchronous communication[4].
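
Three of the items above, sketched together under assumptions: a logging decorator that keeps entry/exit logging out of the function body, a predicate returned directly instead of through if/else, and a Python rendering of the Leaf/Node tree. The names logged, is_in_stock and count_values are hypothetical, and the dataclasses are only an approximation of the Haskell type, since Python does not enforce the annotations at runtime.

```python
import functools
import logging
from dataclasses import dataclass
from typing import Optional, Union

logger = logging.getLogger(__name__)


def logged(func):
    # Wraps func so entry and exit are logged outside its body.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("entering %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            logger.info("leaving %s", func.__name__)
    return wrapper


@logged
def is_in_stock(quantity):
    # Return the predicate itself; no `if ...: return True else: return False`.
    return quantity > 0


@dataclass(frozen=True)
class Leaf:
    value: Optional[int]  # only leaves can carry a value


@dataclass(frozen=True)
class Node:
    left: "Tree"   # a node always has both children,
    right: "Tree"  # so a node with no value and no children cannot be built


Tree = Union[Leaf, Node]


def count_values(tree):
    # Counts the leaves that actually hold a value.
    if isinstance(tree, Leaf):
        return 0 if tree.value is None else 1
    return count_values(tree.left) + count_values(tree.right)
```

The body of is_in_stock now contains only the business rule; swapping the logging strategy means touching logged and nothing else.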

After finishing the post I realized I did not once mention avoiding state (think immutable data structures) and control (think declarative versus imperative). I have been doing functional programming for longer than I did OOP in my career, so I often take it for granted. Functional programming, done right, improves readability a lot.

I know some of you will read this post and say hey, just write clean code and it will be readable. I wholeheartedly agree with this. However, the starting points for the two are slightly different, even though they end up aiming for the same target.

I hope this post helps some of my readers. Instead of following my advice blindly I would prefer you spend more time thinking about the readability of your code. Let me know what you think of the items above. If anything does not make sense to you I would like to improve the post with further explanations.

There is a lot of similarity between readability and code quality and clean code.

[1]For more info: Introduction to Objectivist Epistemology

[2]From SICP:

Establishing new languages is a powerful strategy for controlling complexity in engineering design; we can often enhance our ability to deal with a complex problem by adopting a new language that enables us to describe (and hence to think about) the problem in a different way, using primitives, means of combination, and means of abstraction that are particularly well suited to the problem at hand.
[3]See this presentation and Developing Reusable Django Apps.
[4]You are probably already using a library for logging. Almost always we have some common patterns of usage in our applications; my suggestion is to abstract them away so that if/when you switch to another logging framework you do not need to change your code in many places.

If you have any questions, suggestions or corrections feel free to drop me a line.