Practical Computer Science: Naming Variables

One thing that contemporary Computer Science curricula get terribly wrong is the naming of variables. It's almost never explicitly mentioned in any course. You're lucky if your instructor even says something along the lines of "good variable names matter."

Most CS departments say nothing about naming. They let their students figure it out on their own. Students do the reasonable thing in this situation and look to their professors as role models. They observe how professors write real code and pseudocode in class and emulate what they see.

It's not a great idea to let students figure this out on their own. Even if professors did use good variable names in their code, we could still end up with a lot of students who don't really know how to write good variable names.

The current state of affairs is just about as bad as it could be: professors ignore variable naming as a topic and typically use bad variable names in their own code.

This is largely due to the divide between academic computer science and industry. Academics are typically more focused on getting something to work just once so they can publish a paper. The private sector needs its code to grow and evolve over time, so it needs to be easily modifiable.

What do bad variable names look like?

Bad variable names make the intent of a function almost impossible to discern. Take this example:

class Solution:
    def xxx(self, strs):
        if not strs:
            return ""
        shortest = min(strs,key=len)
        for i, ch in enumerate(shortest):
            for other in strs:
                if other[i] != ch:
                    return shortest[:i]
        return shortest 

I've removed comments and removed the function name. Can you tell what this function is doing?

How about this function?

class Solution:
    def xxx(self, words):
        if not words:
            return ""
        shortest_word = min(words,key=len)
        for index, letter in enumerate(shortest_word):
            for word in words:
                if word[index] != letter:
                    return shortest_word[:index]
        return shortest_word 

Hopefully this second function is easier for you to understand. All I've done is renamed the variables.

Spoiler: As of writing, this is the highest-voted discussion answer on Leetcode for problem 14, "Longest Common Prefix".
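For reference, here is a sketch of the renamed version as a standalone function with a name restored. The name longest_common_prefix is an assumption taken from the problem title, not from the original answer, and the sample inputs are made up for illustration.

```python
def longest_common_prefix(words):
    # Function name assumed from the problem title; the body matches
    # the renamed version above.
    if not words:
        return ""
    shortest_word = min(words, key=len)
    for index, letter in enumerate(shortest_word):
        for word in words:
            if word[index] != letter:
                return shortest_word[:index]
    return shortest_word


print(longest_common_prefix(["flower", "flow", "flight"]))  # prints "fl"
```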

I'm not super great at Python, so I found some parts of the original solution (like shortest = min(strs,key=len)) to be very cryptic. I didn't know that Python had a min() function that could be passed another function, which is applied to each element in the collection so that min() can select the element that produces the minimum value.

I think that if the original author had written shortest_word = min(words,key=len) instead, I could have simply read the variable name shortest_word and ignored the right half of the line. The variable name itself would have told me what the right half of the line is doing even if I didn't understand exactly how it worked.
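For anyone else who hasn't met the key argument before, here is a minimal sketch of what min() does with and without it. The sample words are made up for illustration.

```python
words = ["flower", "flow", "flight"]

# Without key, min() compares the strings themselves (lexicographic order).
alphabetically_first_word = min(words)

# With key=len, min() compares len(word) for each word,
# but it still returns the word itself, not its length.
shortest_word = min(words, key=len)

print(alphabetically_first_word)  # prints "flight"
print(shortest_word)              # prints "flow"
```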

I found the line for i, ch in enumerate(shortest): in the original function a bit mystifying at first, too. I didn't know that shortest was a word until I had looked up the min() function in Python. Even then, I was a bit unsure of what the i and ch variables returned by enumerate() were.

Not being super familiar with Python, I didn't even know the enumerate() function existed before, but had I known, I don't think I would have been confident that the first value returned by it is the index and the second value is the current element. It's easy to forget the order of return values.

Also, naming the return values i and ch didn't help me much. If you ask a random person on the street what a "ch" is, they will probably give you a blank stare.

Biased as I am, the rewritten line for index, letter in enumerate(shortest_word): is far clearer. I see we're enumerating, and the result is an index and a letter. Cool. I know what indices and letters are, and I know enumeration means to go through a collection (hopefully in order). Using that information, this line becomes self-explanatory. I don't even need the Python documentation anymore!
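To confirm the order of the values, here is a small sketch of what enumerate() produces for a word. The sample word is made up for illustration.

```python
shortest_word = "flow"

# enumerate() yields (index, element) pairs: index first, counting from 0.
indexed_letters = list(enumerate(shortest_word))
print(indexed_letters)  # prints [(0, 'f'), (1, 'l'), (2, 'o'), (3, 'w')]

# Unpacking each pair into two well-named variables documents the order.
for index, letter in enumerate(shortest_word):
    print(index, letter)
```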

You don't have to look hard in the discussion section of Leetcode to find a lot more examples of highly-voted answers with terrible naming practices like these. One- and two-letter variable names abound.

I think this is a side-effect of the type of people who frequent Leetcode: college students and recent college graduates who are preparing for interviews at software companies.

These people have never coded in industry before. All their experience has been writing code to solve homework problems where their professors never even give feedback on their variable names.

These students have spent years studying the pseudocode provided by their professors, where the professors commit the same crimes against variable naming again and again. Professors constantly give variables the shortest possible names in order to make their code fit on a slide. Professors use short variable names because they're easier to say in lecture. And, in an attempt to elevate themselves above their students, professors use short variable names as a shibboleth to weed out the students who can't keep track of dozens of arbitrarily named concepts.

None of these reasons hold up in industry, and that's why most production code looks nothing like the code on Leetcode. It's just bad practice.

How do you write good variable names?

In order to write good variable names, you first must understand what makes a variable name good.

Code should be easier to read than it is to write. Code is a means for communication. The audience for your code is not the computer; the audience is other people. Other people need to be able to read your code to find bugs, add features, and improve performance. Computers don't care about your code at all. Variable names disappear when code is executed by a computer, so don't choose variable names for your computer. Your computer will never love you for that.

The code you write will be read hundreds of times more than it is written. Your code will be read by your team when you ask for a code review. Your code will be read by you when a bug shows up in production that you need to fix. Your code will be read by your teammate when they need to add a feature that interacts with your code. Your code will be read by people you've never met after you leave your team/company for greener pastures.

So good variable names make your code easier to read. They impart clarity. If your variables are well named, your code will be so easy to read that comments will be unnecessary.

If you name your variables with this in mind, you'll be better at naming than 90% of people.

Read on for my secret process that will help you name variables better than the remaining 10%.

The adjective-noun process for naming variables

I follow a somewhat rigid process for naming my variables. I've found that consistency across a code base is valuable. If everything is done the same way (even if it's not always the best way), then the consistency makes everything easier to read. When you know what to expect, you can often skip over whole sections of code. (What could be easier to read than code that you don't even need to read?)

Step 0: Avoid abbreviations

Do not use abbreviations in your variable names. Different people abbreviate in different ways. We want to choose variable names that make sense for everyone. You might think the variable should be abbreviated msg while someone else prefers mesg. The only thing everyone can agree on is it's spelled message in the dictionary. Use the full word.

If I have to remember whether a variable contains any abbreviations or not, I'm wasting time instead of thinking about the problem at hand. If I have to stare at an abbreviation to try to decipher what the full word is, I'm wasting time instead of thinking about the problem at hand.

Abbreviations only waste time. They should not be used unless the abbreviation is clearer than the full word. Example: httpHeader is clearer than HypertextTransferProtocolHeader. It's clearer because people use the acronym more often than they use the full phrase.
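As a rough illustration of the rule, compare the two sets of names below. All of the variables and values are hypothetical, and the names are written in snake_case to match the Python examples in this post.

```python
# Abbreviated: the reader has to decode each name before using it.
msg = "Your order has shipped"
cust_nm = "Ada"

# Full words: nothing to decode, and only one possible spelling.
message = "Your order has shipped"
customer_name = "Ada"

# A widely used acronym stays, because people say "HTTP" far more often
# than they say the full phrase.
http_header = {"Content-Type": "text/plain"}

print(f"{customer_name}: {message}")
```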

Be careful of jargon. Some words in your industry might be used more often as acronyms and abbreviations than as full words. If the abbreviation is not something you'd expect a new college graduate to know, then it's probably jargon, and you should use the full word instead. The best software engineers write code that's clear enough for a new college graduate to understand.

As an example of jargon, a previous company I worked at used the acronym cta all over their codebase. The acronym came from the phrase "call to action". Typically a call to action is a suggestion to the user to do a single specific thing.

In this company's codebase, there were often many CTAs per page. These CTAs weren't really calling on the user to take a certain action. They were simply things the user could click on to perform an action. The word had become such entrenched jargon at that company that it took on a meaning different from what it meant to the outside world.

Had the engineers used the full name call_to_action in their variable names, they would have quickly gotten tired of writing so many letters, realized the thing they were naming wasn't actually a call to action, and would have likely settled on the simpler and more accurate name of button.

Step 1: Choose a noun

The first step is choosing a noun. The noun you choose should describe what your variable holds: a count (of apples), a participant (in a phone call), a customer (at a store). The noun that you pick should be as concrete as possible; you should be able to draw a picture of the noun.

The noun for most user-defined types is usually the same as the type. A good noun for an instance of the Participant class would be participant.

The noun for most primitive types (int, bool, etc.) is usually not the same as the type. A bad noun for an int would be int. Is the int a count of something? An index? A sum? Use those more descriptive nouns instead.
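Here is a short sketch of both cases. The Participant class and the apple data are hypothetical, invented only for this example.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical user-defined type, invented for this example.
    display_name: str


# User-defined type: the noun is the same as the type.
participant = Participant(display_name="Ada")

# Primitive types: the noun says what the value means, not what type it is.
apples = ["fuji", "gala", "honeycrisp"]
count = len(apples)           # an int that is a count of something
index = apples.index("gala")  # an int that is a position in a list

print(participant.display_name, count, index)  # prints "Ada 3 1"
```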

Step 2: Add adjectives to the front

Once you have a noun, you have to determine if that noun is descriptive enough. If your variable holds all things described by your noun, then you're probably done, and you can name your variable with just a noun. If your variable holds only a subset of things described by your noun, then add an adjective to describe which subset it is. For example, a variable holding a list of items on someone's wish list could be called wish_list_items.

(Yes, I know "wish list" is not an adjective. I'm lumping noun adjuncts with adjectives in this post.)

If you have multiple variables with the same noun in the same scope, add adjectives to distinguish them. An online store might have obsolete_products and popular_products.
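A minimal sketch of the online store example follows. The Product class, its fields, the sample data, and the 100-unit popularity threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical type with made-up fields.
    name: str
    units_sold: int
    is_discontinued: bool


# Holds every product, so the bare (plural) noun is enough.
products = [
    Product("floppy disk", 3, True),
    Product("phone case", 950, False),
    Product("water bottle", 40, False),
]

# Each of these holds only a subset, so an adjective says which subset.
obsolete_products = [product for product in products if product.is_discontinued]
popular_products = [product for product in products if product.units_sold >= 100]

print([product.name for product in obsolete_products])  # prints ['floppy disk']
print([product.name for product in popular_products])   # prints ['phone case']
```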

Step 3: Exceptions for lists, sets, and maps

Some data structures are just a bit different, so I have some extra rules for naming them.

Lists

Naming lists is easy. Start with the name you'd pick for a single element in your list and then make it plural.
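A small sketch with made-up data shows how the singular and plural names pair up in a loop:

```python
# The list gets the plural of the name you'd give one element.
participants = ["Ada", "Grace", "Linus"]

# The loop then reads almost like English: for each participant in participants.
greetings = []
for participant in participants:
    greetings.append(f"Hello, {participant}!")

print(greetings[0])  # prints "Hello, Ada!"
```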

Sets

Sets should be named similarly to lists: name an element of the set and add "_set" to the end. A set of letters like {a, b, c} would be called letter_set. A set of lists of letters like {[a, b, c], [x, y]} would be called letters_set. Only make the element name plural if it's a list.
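In Python the two examples might look like the sketch below. One caveat: Python lists are unhashable and can't be stored in a set, so this sketch swaps in tuples for the inner lists.

```python
# Each element is a single letter, so the element name stays singular.
letter_set = {"a", "b", "c"}

# Each element is a sequence of letters, so the element name is plural.
# Tuples stand in for lists here because set elements must be hashable.
letters_set = {("a", "b", "c"), ("x", "y")}

print("b" in letter_set)          # prints True
print(("x", "y") in letters_set)  # prints True
```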

Maps

Maps follow a similar strategy. Name your key, name your value, and then put them together like this: value_by_key. A map that you can use to look up the number of times a particular letter occurs in a text could be called count_by_letter. (g_count = count_by_letter['g']) Again, don't make your key or value names plural unless they are lists.
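Here is a sketch of the value_by_key pattern using plain dicts. The sample text and words are made up, and words_by_first_letter is a hypothetical second map added to show a plural value name.

```python
text = "giggle"

# Key: a letter. Value: a count. So the map is count_by_letter.
count_by_letter = {}
for letter in text:
    count_by_letter[letter] = count_by_letter.get(letter, 0) + 1

g_count = count_by_letter["g"]
print(g_count)  # prints 3

# When each value is a list, the value name becomes plural.
words = ["apple", "avocado", "banana"]
words_by_first_letter = {}
for word in words:
    words_by_first_letter.setdefault(word[0], []).append(word)

print(words_by_first_letter["a"])  # prints ['apple', 'avocado']
```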

I picked up this naming strategy for maps from some blog post I read a long time ago. Sorry that I don't know which one it was.

Conclusion

Professors have goals distinct from those of software engineers working in industry. Those goals cause professors to neglect variable names in the classes they teach. This leads recent college graduates to develop bad habits with regard to naming variables, habits that can take some time to break once they start working. By following a few simple steps, we can all start naming our variables better and write more easily comprehensible code.
