A Programming Style for Java (hyperlinked version)

Author: Paul Haahr

26 October 1999 (first draft: 26 May 1998) (second draft: 31 December 1998) (last revised: 24 July 2001)

I've been programming full-time in Java for several years, which hardly makes me an expert on the language, but I've developed a style I've become comfortable with and thought it would be worthwhile to write down. This is just my personal style, and while I've written this essay with lots of recommendations, it remains a distinctly personal view of how to write code.

This style has evolved from the style I used for C programs, with some influences from the years I spent working in Scheme and Dylan, along with the usual collection of ``little'' or domain-specific languages I've also used, such as YACC or various Unix shells. Much of my style is independent of the language being used; I believe that my Dylan and C programs have substantially the same feel as my Java programs, and that's natural to expect.

Most of my experience with Java has been in writing a bytecode-to-native
compiler, which is a moderate-sized Java program, containing
around 500 classes and 80,000 lines of code. That includes comments and
boilerplate code, but I think it's a good amount to have produced in a
year. My style has remained mostly consistent over that time, with a few
exceptions discussed below.

All style guides are inherently rules of thumb. Rules are meant to be broken: I'm sure there are many good reasons to violate every suggestion here. When I'm about to write code which goes against my (previously uncodified) style, I think about why I'm doing something different and whether that reason is sufficient.

I assume that the reader of this essay is familiar with Java; the more
general comments are independent of the language, but many are concerned
with the particulars of Java.

Basic Principles

These principles are, I believe, independent of any specific programming language and motivate the specific suggestions below. When a tension exists between a principle here and one of the other style rules, the principle should win out.

Readability is the most important attribute of style

This is probably obvious. A style, as a restriction on how one writes programs, does not make it much easier to write programs that are never read. Of course, a good style should make it easier to maintain or extend a program, but that's mostly because of the benefits in reading and understanding the program. (A consistent style does make programs easier to write by restricting the choices available for any given construct.)

When writing in a natural language, it is necessary to consider one's readers. This applies to programs as well; there are three readers I consider when writing code:

Choose good names

The most direct way of explaining what a program is about is by selecting good names for variables, types, procedures, etc. Where language syntax and keywords are punctuation and external libraries provide some vocabulary, the names you choose are the ``content words'' of your programs and provide the means for making the program clear.

My obsession with names can lead to paralysis as a programmer. If I'm trying to write some new code, and I can't figure out a good name for a class or method, I'll often be stuck until I have something. Many times, I find that the reason I can't pick a good name is that I don't understand enough about what I'm trying to say; with understanding come good names, and vice versa. A thesaurus sometimes helps, but I often wish I had domain-specific thesauri for the areas I work on. (See the section on naming for my thoughts on what makes a good name.)

Don't fight the language

Not all programming languages are right for all programs, and not all
techniques can easily be expressed in all languages. When working in Java,
try to use approaches that meet the least resistance from the language.
This rule has both affirmative (``do it this way'') and negative (``don't
do it that way'') aspects.

For example, Java is a class-based, object-oriented language. That makes classes the default structuring tool, and a good Java program will take advantage of them. A design which uses lots of public fields, empty constructors, and static methods spread out over bunches of classes not related to the objects they manipulate feels wrong for Java.

As a negative example, I've spent a lot of my career working in Lisp-derived languages, where recursion is often the most obvious and efficient form of control flow. This usually isn't true in Java, because nested local methods are awkward to use and restrictions in the language often prevent efficient implementation of recursion. So, while my natural instinct is to write recursive functions, if there really isn't a good reason for the recursion -- for example, taking advantage of the implicit stack -- I don't do it.

Be consistent

Whatever individual style decisions you make, stick with them. Consistency
means that a reader can ignore many details of typography and naming, because
they're exactly as would be expected, and concentrate on the substance
of a program. Purposeless variations in styles only serve to distract.

Of course, which variations are purposeless and which consistencies are foolish is a subjective matter. Taste and discretion are often required.

Comment appropriately

Comments (and other forms of in-program documentation) are essential tools for a reader trying to understand how to use a program, how it works, and how it is structured. And few people disagree -- it's hard to find anyone who doesn't advocate commenting code, even if they don't always practice what they preach.

Treating comments as a panacea is, of course, a mistake. A badly written program can be documented well, and it's still a bad program. Commenting is one tool among many for making programs clearer. But the number of programs which suffer from overdocumentation is probably not too large. More problematic is documenting the wrong things, so that documentation serves to confuse or distract a reader; the worst examples are comments that are incorrect or out-of-date and disagree with the code they refer to.

What to write comments about and how to write them are what people do reasonably disagree on. Broadly speaking, there are two forms of comments -- those aimed at people trying to understand the innards of a system and those aimed at people trying to understand its interface. My opinion is that every comment should answer some question intelligent readers of the program, with those goals in mind, would ask themselves, such as:

Don't let optimization interfere with readability

The best-known truism about programming is ``Premature optimization is the root of all evil.'' It's also probably the most often violated.

I've found that a good barrier to prematurely optimizing code is forcing myself to document anything that's subtle or uses optimization techniques beyond what might be expected for a given situation, both in terms of documenting the implementation and giving an explanation of why I bothered to optimize -- usually my reasoning is that a particular piece of code will be invoked very frequently or deal with large data sets. Often, trying to back this up with quantitative data in the comments will lead me to do more analysis of the problem, which at least gives better insight into the problem.

(One of the hazards of writing compilers, especially if you're writing a compiler for a language you use, is that it becomes natural to think about the code your compiler would generate for code as you type it. My reaction to this situation is a silly one: I edit myself while coding to ensure that my programs are easy for our compiler to optimize. For example, I often end up eliminating common subexpressions by hand, even if the compiler could do it for me. I'm a sinner and I know it.)

Be a chameleon

When you're dropped into a piece of existing code, don't rewrite it all in your own style. This should be obvious, but many people (myself included) find too many reasons for exceptions.
Besides wasting time and confusing source control systems, doing so will annoy everyone else working in the same code base.

Typography

Make your code look like the examples in the JavaSoft books

Rather than descending into a long discussion of what style of indentation and typography looks good, is more readable, etc., I'll state that I believe that there are many possible styles of indentation and code layout which are perfectly readable, but the virtues of uniformity trump all the minor differences about where, for example, opening braces go. Thus I strongly believe that all programmers should use a common typographic style for Java, so code can be freely interchanged.

The obvious source for this common style is the set of Java language books
from JavaSoft. I use The Java Language Specification as my template. The
other good choice would be The Java Programming Language; since the two use
rather similar styles, it doesn't really matter.

Further, consistent typography of names is important to allow seamless mixing of code from multiple sources, so the naming conventions of the core Java libraries should be followed in other programs.

I would summarize the formatting rules for this style as:

Write searchable code

It is often useful to be able to search through source code with an editor or other utility. Some code-writing practices can make that easier. In particular, consistent typographic style allows searching for given idioms. For example, when breaking apart long lines of code, do not put a line break between the keyword new and the class name. Thus, you can find all places where instances of class BigStuff are created by searching for ``new BigStuff''. (Contributed by Stan Chesnutt.)
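
A small illustration of that line-breaking advice follows. It is only a sketch: the constructor arguments, the BigStuffUser class, and the two constants are invented for the example, and only the name BigStuff comes from the text above. The long expression is broken at the commas of the argument list, so the string ``new BigStuff'' stays on one line.

```java
// Hypothetical class for illustration; only the name BigStuff comes from the text.
class BigStuff {
    final String label;
    final int width;
    final int height;

    BigStuff(String label, int width, int height) {
        this.label = label;
        this.width = width;
        this.height = height;
    }
}

// Hypothetical client class, invented for this sketch.
class BigStuffUser {
    static final int DEFAULT_WIDTH = 640;
    static final int DEFAULT_HEIGHT = 480;

    /** Creates an instance; the long expression is broken at the commas. */
    static BigStuff createDefault() {
        // "new BigStuff" stays on one line, so a plain-text search for
        // that string finds this creation site.
        return new BigStuff("a fairly long descriptive label",
                            DEFAULT_WIDTH,
                            DEFAULT_HEIGHT);
    }
}
```

Had the break fallen right after new, a search for ``new BigStuff'' would have missed this site.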

Break lines longer than eighty columns

Yes, everyone uses window systems these days. And, yes, everyone can stretch a window if they want to, to make it wide enough to fit your code, at least if they use a small enough font and are not working on a laptop. But wide lines of code -- like wide pages of text -- are hard to follow. For English text, the usual recommendation is to have pages with no more than sixty or seventy characters per line, because the reader's eye begins to get lost as it moves across longer lines.

Eighty columns is, of course, an arbitrary limit tied to prehistoric notions of what a terminal provided. But many people use editor windows exactly that wide because they have a reasonable expectation of being able to fit everything in that width. The simplest way to annoy them is to use very long lines.

When faced with long lines, try to treat line breaks as half-way between spaces and parentheses in terms of grouping related operations. That is, break lines at the highest reasonable level in the parse tree. For example, split lines between looser-binding rather than tighter-binding operators, or at commas in argument lists, rather than in the middle of a single argument.

Use four-space or narrower indentation

For more than ten years, when writing C code, I used the Unix convention of eight-space tabs for indenting code. I tried that for a few months with Java, but gave up. The straw that broke the camel's back was that typical Java method bodies are indented two levels vs. one level for C functions, because one level of indentation is (or should be) consumed by the class definition. In the end, I gave up on the equivalence between eight-space tabs and basic indentation levels, because that's just a silly historical artifact.

The change was a good one. I found it too easy with a large indentation level to have most of my code crammed along the right margin of my window, and was feeling frustrated staying within eighty columns.
And my eye bounced around too much from left to right as I was reading programs.

With four-space tabs, I've found that, in practice, 99% of the first line of statements in my compiler start in column 24 or before; that is, fewer than one percent of statements or declarations are indented more than four levels. With eight-space tabs, those statements would start at column 48, leaving fewer than 32 characters for content.

A single column of whitespace is probably insufficient as the basic level of indentation, because it offers too little contrast from one line to the next. But anything from two to four spaces should be clear and reasonable.

Use blank lines to identify related lines of code

Just as text with extremely large paragraphs is hard to read, functions with unseparated blocks of code much longer than ten or twenty lines are typically hard to read. Blank lines can turn an undifferentiated mass into visibly distinct regions, each with its own purpose.

If a body of code consists of several blank-separated regions, it's often a good idea to put a short comment at the beginning of each section to explain its purpose. Or, often even better, split it into several separate, well-named methods.

Parentheses always help

I used to know the rules for operator precedence in C. I've forgotten half of them, intentionally. Any time I have to think about operator precedence, I now just insert an extra set of parentheses and move on. In theory, it's possible to over-parenthesize Java code, but I've never seen it happen.

Naming

Use consistent naming conventions

A good name is consistent with other names in a program. If an access method is named getFoo, don't call the method which changes the value fooSetter or the method which reads the ``baz'' value anything other than getBaz.

Use names you can pronounce

People talk about programs. It's easier to talk about code if you can pronounce the words inside of it.
And, at least for those of us who talk to ourselves when we write code, it's also easier to think about programs if they're easy to read aloud and pronounce.

My business partner and I work in offices separated by about ten miles and a large body of water. We communicate mostly by email, but have done debugging-by-telephone a fair number of times. Without names we could say easily, that would have been much more difficult. (As is, case sensitivity is an issue. How do you pronounce CodeStream differently from codestream?)

Don't use abbreviations

I dislike most use of abbreviations and shorthands in names. Programs, like prose, are meant to be read, and pervasive use of artificially shortened names makes text harder to read. In particular, dropping vowels to shorten a name does not make it more readable.

There are exceptions to this, of course, especially for acronyms in widespread use in the relevant problem domain. So it seems perfectly reasonable to me that Sun uses ``URL'' as part of names in the java.net APIs and that a class in our compiler is called SSAVariable, because everyone working on modern compilers knows that ``SSA'' stands for ``Static Single Assignment.''

Types provide good roots for variable names

Often, it's a good idea to use a variable's type as part of its name.
For example, button and resetButton are good names for
instances of the Button class.

I strongly dislike the coding style which uses aButton or theButton as generic names. ``A button'' seems too vague to me, ``the button'' too precise, and in neither case do I find the extra information, if any, provided by the article to be worth the bother. (Using variables named both aButton and theButton in the same context seems particularly egregious.)

In Java, interfaces and abstract classes are often more suitable names for objects than concrete classes. I'll often use stream as the name for a local variable or parameter which is a PrintStream or DataInputStream, and depend on context for figuring out which one I mean. If there are two streams being used in a particular context, I'll try to use prefixes that indicate the purposes of the streams, such as inputStream and outputStream (or just input and output), or dataStream and controlStream.

Good names aren't necessarily long names

For a loop index, i is usually a better name than loopIndex. In general, local variables should have short names because there is an obvious context for them, instance variables or structure members have less context so they may need longer names, and global names (only classes in Java) may need more in their name to make up for their lack of context.

Steal names from external sources

I'm often coding up algorithms from research papers or textbooks; not everybody does this all the time, but it's usually the best way to go in a well-researched field like compilers. When doing so, I try to pick variable names which correspond to the names used in the published material, then cite a reference in a comment. Thus, anyone reading my code can go back to the original source and directly see the mapping to my code.

In addition to domain-specific sources, good names can be found in writing or folklore on programming methodology.
For example, the names of patterns in the ``Gang of Four'' Design Patterns book provide excellent roots for class or interface names, because they've become part of a common vocabulary for object-oriented programming. Thus the name of the interface java.net.ContentHandlerFactory from the core Java libraries immediately tells the reader that this type uses the ``Abstract Factory'' creational pattern, which leverages a lot of meaning from seven characters in the name.

Use sensible parts of speech

Parts of speech provide a good guide to how a word should be used, and this can be leveraged when naming entities in a program. When picking a name for a thing, use a noun. When looking for an action, use a verb. Typically, this means that classes and variables will be nouns and method names will be verbs or verb phrases. But classes or objects which fit the ``strategy'' or ``command'' design patterns might be best named with verbs. Similarly, an accessor method may simply name a noun corresponding to the value it returns.

Adjectives often make good names for interfaces and abstract classes -- classes which implement or inherit from the adjectival base have the named property. Adjectives are also potential names for boolean fields and methods. Finally, plurals are often good names for instances of collections, such as arrays or vectors.

Some examples of names in these various categories from the core Java
libraries or the Redshift compiler, grouping noun phrases as nouns, etc.,
are:

Use names with a positive sense

Often names, especially those of predicates or boolean values, can
be chosen either in a positive (e.g., isValid or
contains)
or negative (isInvalid or
doesNotContain) form. Always
use the positive form (and ensure that the implementation matches the name).
The negative form can always be constructed by using the logical negation
(!) operator. In contrast, deriving a positive form by negating
a negative sense name creates a potentially confusing double negative.
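
As a minimal sketch of this rule, consider a predicate on a hypothetical Order class (the class, its quantity field, and the OrderChecker helper are all invented for illustration). The predicate is named in the positive sense, and the negative case is built at the call site with the ! operator.

```java
// Hypothetical class, invented for this sketch.
class Order {
    private final int quantity;

    Order(int quantity) {
        this.quantity = quantity;
    }

    /** Positive-sense predicate: true if the quantity is greater than zero. */
    boolean isValid() {
        return quantity > 0;
    }
}

// Hypothetical caller showing how the negative form is derived.
class OrderChecker {
    static String describe(Order order) {
        // Reads as "if not valid"; no separate isInvalid method is needed.
        if (!order.isValid()) {
            return "rejected";
        }
        return "accepted";
    }
}
```

If the class had offered only isInvalid, a caller wanting the valid case would have to write !order.isInvalid(), which is the double negative the rule is meant to avoid.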

Name all magic numbers

Many programs need to use numeric constants. It is almost always best to assign names to these values rather than using raw numbers. The often cited reason is that the number may change and localizing the usage of the actual constant value makes it possible to modify the program without hunting in many places for the incorrect value.

Preparing for a program's evolution is a good idea, but, to me, an even more important reason for naming constants is that it's very hard to tell what a given integer means (number of eggs in a package? hours on a clock face?) and using a name provides context. Thus, even when working with quantities that really will never change, using an expression like months.length is better than the raw number 12.

An extreme version of this rule says ``name all constants other than zero or one.'' The exceptions can probably be safely generalized to include two and negative one, but it's a good way to think about the principle.

Be willing to change a bad name

Bad names are often obvious: they're the ones which are hard to remember or don't seem to describe their purpose well. Until an interface is frozen or exposed to a body of users, it's a good idea to repeatedly go back to the names that bother you. When a name is good, it probably won't stick out at you. Eventually, you'll find a better name.

To some degree, Java works against the ability to change names, because
of the connection (in most development environments) between class names
and source file names. When you come up with a better name for a class,
you need to rename the source file containing it, which may or may not
be an operation that is well-supported by a source-control system. That
leads to friction in changing names, where it should be a smooth operation.

Class structure

Write documentation comments for all classes and members

Java's documentation comment facility is a convention for writing comments which can be
extracted to create external documentation for a set of classes. At the
very least, every class, method, and field in a program should have a documentation
comment explaining what it does.

Unfortunately, documentation comments come in only one variety. It might make more sense to have different forms of documentation comments, to distinguish comments for clients of a library from those for programmers working on the library itself. To some degree, this can be obtained by treating documentation comments on public constructs as intended for clients, and using all other documentation comments for internal documentation. Then, when internal documentation is needed for a public class or method, normal comments could follow the documentation comment. I've been somewhat inconsistent on this matter, and tend to put more implementation documentation in public documentation comments than I probably should.

In general, I format documentation comments with the leading sequence ``/**'' and trailing sequence ``*/'' on separate lines, and use lines with leading asterisks for the content of the comment. (This is how such comments appear in the JavaSoft books.) But, for comments on members of inner classes and on individual entries in an interface which consists only of a series of constants, I use only single line comments. Thus:

/**
 * Constant values for the three bears.
 */
public interface Bears {
    /** Too big constant value. */
    int PAPA_BEAR = 1;
    /** Too small constant value. */
    int BABY_BEAR = -1;
    /** Just right constant value. */
    int MAMA_BEAR = 0;
}
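
For comparison, here is a sketch of the multi-line form on an ordinary class, together with the suggestion above of following a documentation comment with a normal comment for implementation notes. The BoundedCounter class and its members are hypothetical, made up for this illustration.

```java
/**
 * A counter that never goes below zero.
 */
class BoundedCounter {
    /**
     * Current count; never negative.
     */
    private int count;

    /**
     * Decrements the counter, stopping at zero.
     *
     * @return the count after the decrement
     */
    // Implementation note for maintainers (a normal comment, so it is not
    // part of the documentation comment): the bound is checked before the
    // decrement so that count never holds a negative value.
    int decrement() {
        if (count > 0) {
            count--;
        }
        return count;
    }

    /**
     * Increments the counter.
     *
     * @return the count after the increment
     */
    int increment() {
        count++;
        return count;
    }
}
```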

Group related declarations

Often, people will put all fields together, then all methods. Or all
public members, then protected, then package-accessible, and finally private.
This makes reading a class harder, as the reader will have to constantly
scroll up and down to make sense of a given routine. As much as possible,
I think it's best to put related definitions near each other. Large classes
(when they can't be broken into smaller classes) should have their members
divided among sections; title-style comments can be used to separate the
sections for a reader.
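
A sketch of one possible layout, assuming a hypothetical Account class: each field sits next to the methods that use it, and a title-style comment opens each section. The exact comment format shown is an assumption, not a prescribed one.

```java
// Hypothetical class, invented to show grouping by topic rather than by
// kind of member or access level.
class Account {
    //
    // Owner
    //

    private final String owner;

    Account(String owner) {
        this.owner = owner;
    }

    String getOwner() {
        return owner;
    }

    //
    // Balance
    //

    private long balanceInCents;

    long getBalanceInCents() {
        return balanceInCents;
    }

    void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
    }
}
```

A reader interested in the balance logic finds the field and every method that touches it in one place, without scrolling to a separate block of field declarations.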

Don't use package-qualified names except in import declarations

An important aspect of Java's package system is that references to classes outside the current package (or the java.lang set of core classes) can be easily identified by reading the list of imports at the beginning of a source file. Unfortunately, this attribute is lost if package-qualified names are used outside of import declarations.

The only exception I make to this rule is when a single class needs to use two classes with the same unqualified name from different packages. That is a common case in classes which bridge two packages, for example.

Import whole packages only in limited circumstances

By importing entire packages with the import package.* notation, as opposed to importing individual classes, the actual dependencies from one package to another become hard to see. This becomes especially true when multiple packages are imported in their entirety.

However, there are cases where importing whole packages is the right thing. For example, if all code in a given system uses some core classes from a locally defined package, importing all of that package everywhere does little harm; such a foundation can be thought of as an application-specific extension of the java.lang package -- definitions which are necessary for any of the rest of the system. Also, given a package which defines a framework of classes which are meant to be subclassed, a package which actually subclasses several of those classes can be thought of as a ``sub-package'' of the original, and it makes sense to import the entire base package.

In all cases where entire packages are imported, the message to the reader of the source is ``this class depends on the entirety of that package.''

Don't build circular package relationships

Packages encapsulate moderate-sized components of a system. The goals of using packages are the same as for any other encapsulation mechanism: to hide implementation details from clients and to enable reuse.
When a program is large enough to be decomposed into multiple packages, the actual partitioning requires some thought and effort. Much could be written about how to structure packages in Java, but I use a simple rule for determining whether package lines are drawn well: if there are circular relationships among packages, the partitioning is not clear and should be rethought.

Why do circularities indicate problems? First, it's usually a good idea to think about software as being stratified into levels of concern (or abstraction): the low-level building blocks, the high-level application code, etc. Circularities obscure which level various pieces of the system work at. Second, circularities prevent independent reuse of packages, by tangling up the dependencies. Finally, circularities often make it difficult to decide which package a new class belongs in, because the decision can't be easily made based on which level of abstraction the new class presents.

If, when writing a program, you find the need for circular references between packages A and B, there are two basic approaches for untangling the circularities:

Don't overuse inheritance (especially subclassing)

Many programmers, especially those new to object-oriented programming, will use inheritance simply because they can. This is usually a mistake. Inheritance has its purposes, and when it is the right thing, it is an essential language feature. The cases where I naturally think of using inheritance include:

In Java, my rule of thumb is to use interface inheritance if multiple classes need to serve the same role, and class inheritance only when a significant portion of the implementation of the classes can be shared by inheriting methods. The lack of full multiple inheritance in Java makes class inheritance a valuable resource, which may only be spent once per class, where interfaces may be added somewhat freely. If for no other reason, this urges a preference for using interfaces rather than subclasses when possible.

(I've started using the pattern of defining both an interface and an abstract class which provides default behavior, for the circumstances where I want an interface that any client can use, but I also want to provide an implementation which can be used. In this case, for an interface named Quux, I'll usually name the class providing default behavior SimpleQuux or BasicQuux.)

Don't subclass concrete classes

When subclassing, it's often unclear what aspects of a superclass are properly inherited and which were dragged along unintentionally. This is greatly complicated by inheriting from classes which are meant to be instantiated in their own right, where some fields, methods, and constructors are typically not relevant for subclasses. Therefore I make a simple rule of only subclassing abstract classes. In other words, I treat all classes as either abstract or final, though actually declaring concrete classes as final should be considered a separate decision.

A useful discipline is to initially treat all classes as final. When development calls for subclassing a given class (say Quux) for the first time, change it from final to abstract and create a new final subclass (for example, SimpleQuux) which inherits all the default methods.
After working with various subclasses for a little while, if there is behavior which is only inherited by a single subclass (typically, the ``simple'' one), move the implementation to that subclass and make the corresponding method abstract in the superclass.

I make an exception to this rule, in general, for exception classes, where I will use both a general exception class for reporting most exceptions, and specialized subclasses when there are particular cases I want to distinguish for use in catch clauses or as declared exceptions.

Keep classes, fields, and methods private until they're needed elsewhere

Information hiding is an essential technique for writing large systems and reusable code. By hiding the implementation details of one part of a system, internal changes to it will not break other components which depend on it. And by exposing the minimal amount of information from a given class, we reduce the dependencies that could develop against it.

There are many aspects to information hiding, but the central one in Java is the use of access declarations. By making fields and methods private by default and then widening their accessibility as needed (and by not declaring classes public until they are needed outside their package), the most information possible is kept hidden. Using this discipline forces programmers to actively make the decision to expose a feature only when it's needed, when the pros and cons of revealing more of a class can be debated in a real context.

A different view of the advantage of keeping access as narrow as useful
is that, by reducing the exposed interface of a class, the class has fewer
features that need to be learned by a potential user, and thus is simpler
to understand and reuse.
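
As a rough sketch of this discipline, the hypothetical TemperatureLog class below (all names invented here) starts with the narrowest access that works: the class is not public, its representation is private, and its one helper method is private until some other class turns out to need it.

```java
// Hypothetical class; not declared public because, in this sketch, nothing
// outside its package needs it yet.
class TemperatureLog {
    // Hidden representation: clients never see the array or the count.
    private final double[] readings;
    private int size;

    TemperatureLog(int capacity) {
        readings = new double[capacity];
    }

    /** Records one reading, in degrees Celsius. */
    void record(double celsius) {
        checkRoom();
        readings[size++] = celsius;
    }

    /** Returns the mean of the recorded readings. */
    double average() {
        if (size == 0) {
            throw new IllegalStateException("no readings recorded");
        }
        double sum = 0.0;
        for (int i = 0; i < size; i++) {
            sum += readings[i];
        }
        return sum / size;
    }

    // Private until some other class actually needs it.
    private void checkRoom() {
        if (size == readings.length) {
            throw new IllegalStateException("log is full");
        }
    }
}
```

A client has only two methods to learn, and the array could later be replaced by another structure without touching any caller.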

Use final for fields which shouldn't change

A final declaration on a field is used to ensure that the
field is assigned an initial value either in the declaration or in all
constructors, and that the field's value is never changed after that point.
Typically, an object is composed of mutable state -- fields which may change
over the lifetime of the object -- and immutable state -- fields which
won't change. It is useful to make this distinction explicit in a program,
both to allow a compiler to catch erroneous attempts to set a field and
to communicate the intent of the original programmer.
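
A minimal sketch of the distinction, using a hypothetical LabeledPoint class: the coordinates are immutable state and are declared final, while the label is mutable state. Every constructor path assigns the final fields exactly once.

```java
// Hypothetical class, invented for this sketch.
class LabeledPoint {
    // Immutable state: assigned once, on every constructor path.
    private final int x;
    private final int y;

    // Mutable state: may change over the object's lifetime.
    private String label;

    LabeledPoint(int x, int y) {
        // Delegates to the other constructor, which assigns the final fields.
        this(x, y, "unnamed");
    }

    LabeledPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    int getX() {
        return x;
    }

    int getY() {
        return y;
    }

    String getLabel() {
        return label;
    }

    void setLabel(String label) {
        this.label = label;
        // Writing "this.x = 0;" here would be rejected by the compiler,
        // because x is final.
    }
}
```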

Make all non-final fields private

In Java, if a field is not final and not private, it can be modified
by code in another class. Such modifications are often problematic, because
they violate the encapsulation and ownership assumptions of the object-oriented
style, so my rule of thumb is to disallow them.
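
A small sketch of the rule, assuming a hypothetical Thermostat class (the class, the field, and the 5 to 35 range are all invented for illustration): the mutable field is private, so code in other classes can change it only through a method that the owning class controls.

```java
// Hypothetical class, invented for this sketch.
class Thermostat {
    // Non-final, therefore private: no other class can assign it directly.
    private int targetCelsius = 20;

    int getTargetCelsius() {
        return targetCelsius;
    }

    void setTargetCelsius(int targetCelsius) {
        // The owning class decides which modifications are acceptable.
        // The range below is an arbitrary choice for the example.
        if (targetCelsius < 5 || targetCelsius > 35) {
            throw new IllegalArgumentException("target out of range: " + targetCelsius);
        }
        this.targetCelsius = targetCelsius;
    }
}
```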

In general, if I have a field named field which I need to use in other classes, I keep it private but add an accessor method named getField to return its value. If I need to externally modify the value, I add a corresponding setField method. Then, if I need to change how the field is implemented, add a type conversion or assertion check, or track modifications to the field, only a single class needs to be changed. (In fact, where possible, I use the get- and set-methods inside the defining class, because that reduces the number of places which need to be changed within the defining class.)

I typically make the accessor methods final unless and until some subclass does need to override them. This can help performance in some environments (by making inlining a simple and correct optimization) and enforces the notion that the accessors are simply controlled-access mechanisms for reading and writing the field.

Support anonymous instance creation when sensible

Anonymous instance creation, via Class.newInstance(), is the simplest way to take advantage of one of Java's most powerful features, dynamic class loading. (Until the addition of core reflection to the language, it was the only way.) This encourages Java programmers to write public constructors with empty argument lists, to support dynamic loading.

Unfortunately, the addition of a constructor for anonymous instance creation can complicate a class by requiring set-methods for fields which would otherwise just be initialized by a constructor in the non-anonymous case. Typically, doing so will prevent making those fields final. More importantly, it raises questions such as ``what are default values for this field?'' and ``what happens when operations are invoked on an instance where the fields haven't been set?''

Because of these complications, I prefer to use dynamic class loading in very restricted forms.
The main style I use is to dynamically load a single class which implements an abstract factory interface (or extands an abstract class) and can be created anonymously. This class then provides factory methods for creating specific instances with the necessary parameters, which are passed directly to constructors. If you find yourself using dynamic loading in many places, it may be
more appropriate to use an existing component technology.
Java
Beans and
Enterprise Java
Beans provide disciplined mechanisms for managing dynamically-loaded
components.
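The factory-loading style described above might look roughly like the following sketch. All of the names here (Greeter, GreeterFactory, DefaultGreeterFactory, loadFactory) are hypothetical and chosen only for illustration. The sketch calls getDeclaredConstructor().newInstance() rather than Class.newInstance(), because the latter is deprecated in recent Java releases; the idea is otherwise the same.

```java
/** Sketch: only the factory is created anonymously; products keep final fields. */
class FactoryLoadingSketch {
    /** Hypothetical product: its field stays final because a constructor sets it. */
    static final class Greeter {
        private final String greeting;

        Greeter(String greeting) {
            this.greeting = greeting;
        }

        String greet(String name) {
            return greeting + ", " + name;
        }
    }

    /** Hypothetical abstract factory interface implemented by the loaded class. */
    interface GreeterFactory {
        Greeter createGreeter(String greeting);
    }

    /** The one class that is loaded by name; it has a public no-argument constructor. */
    public static final class DefaultGreeterFactory implements GreeterFactory {
        public DefaultGreeterFactory() {
        }

        public Greeter createGreeter(String greeting) {
            // Parameters go straight to the product's constructor.
            return new Greeter(greeting);
        }
    }

    /** Loads the named factory class and creates it with its no-argument constructor. */
    static GreeterFactory loadFactory(String className) {
        try {
            Class<?> factoryClass = Class.forName(className);
            return (GreeterFactory) factoryClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // In this sketch a missing or unusable factory is a configuration error.
            throw new IllegalStateException("cannot load factory " + className, e);
        }
    }

    public static void main(String[] args) {
        GreeterFactory factory = loadFactory(DefaultGreeterFactory.class.getName());
        System.out.println(factory.createGreeter("Hello").greet("world"));
    }
}
```

Only the factory needs an empty constructor; Greeter never exists in a half-initialized state, so the questions about default values and unset fields do not come up for it.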
Methods and Statements

Each method should do one thing

A sure sign that a method is designed improperly is that its functionality
(as opposed to its implementation) can't be succinctly described as a single
action or the computation of a single value. If at all possible, the name
of the method should reflect this single action or value as closely as
possible.
On the other hand, it may just be the description that's wrong. For example, a description of a method as ``get a slice of bread, put it in the toaster, push down the button, and wait for the bread to pop up'' sounds like too much is going on; changing the description to ``make toast'' makes it clear that there is a single abstract action going on. The first description tells about how the method does its job, not what it does.

Smaller methods are easier to understand

It's probably self-evident, but several small methods, all with good names and documentation comments, are easier to understand than a single large method with the same functionality.

Smaller methods are also usually easier to debug than monolithic methods, because the correctness of a smaller piece of code can be verified more easily than a larger one. Further, method call boundaries are usually easy points at which to verify that behavior is as expected, with the addition of assertions or wrapper-methods which check inputs and outputs.

Using shorter methods also has the advantage that one is more likely to be able to reuse a smaller piece of functionality than a large one.

My preference is to keep methods to no more than five or ten statements.
This is not a hard-and-fast rule, but methods which are fewer than ten
lines can usually be understood in a glance, where longer ones will usually
require active concentration.
Partition methods to have short parameter lists

Long parameter lists are often an indication of poor separation of concerns -- too much data is flowing between methods as separate values, rather than being encapsulated in objects or within a single method.

In general, having more parameters also increases the burden on callers and makes a method more specific to a given situation, making it unlikely that it can be reused. Again, this indicates that the wrong piece of functionality may have been selected to isolate as a separate method.

When you come across a method with too many parameters, it's useful to consider whether several of the parameters should be consolidated into a single object (especially if the same set of parameters is being used for more than one method) or whether there's an object which already provides accessors for getting at those values.

Declare variables when you're first ready to use them

C programmers usually declare several variables at the start of a function, and then use them, for a variety of purposes, throughout the function; others will declare variables at the start of the block in which they are needed. These are the only options that C provides. Java (following an innovation from C++) allows declaration anywhere in a block, which introduces a local variable with scope extending to the end of the block. This allows programmers to declare variables when they're first used, with the type, name, and initial value all in one place. This is the right thing, because it allows a reader to focus on all the relevant properties of a local variable at the right time.

In general, if you find that you don't know what value to initialize a variable with, move the declaration later in the method until the point at which you do know its initial value.

A similar rule applies to for loops: if possible, declare and initialize the loop variable in the initialization clause of the for statement.
(Note that variables declared in for statements are limited in scope to the for itself in Java, which differs from the rules in older, pre-standard C++.)

Occasionally, the initial value of a local variable can't be known at the point it needs to be declared, because the value is computed in all of the branches of a conditional statement and used after the conditional. In that case, omit the initialization clause and put the declaration at the latest sensible point in the containing block. For example, this is a typical case where I don't use an initializer for a local variable:

    DependencyRecorder dependencies;
    if (options.getBooleanValue("dependencies"))
        dependencies = new DependencySet(method.getDefiningClass());
    else
        dependencies = NullDependencyRecorder.RECORDER;

In this case, it would have been fine to use the ternary conditional operator
(?:), but I usually prefer if statements when the arms
of the conditional are too long to fit on one or two lines. Another option
is to extract the conditional statement into its own method and use a call
to that method to initialize the local variable.
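As a rough illustration of that last option, the conditional can be moved into a small helper so that the local variable is declared and initialized in a single statement. The types below are hypothetical stand-ins for the compiler's classes (the real DependencySet takes a defining class, which this sketch leaves out), and chooseRecorder is an invented name.

```java
import java.util.HashSet;
import java.util.Set;

class RecorderSketch {
    /** Hypothetical stand-in for the compiler's DependencyRecorder type. */
    interface DependencyRecorder {
        void record(String dependency);

        int count();
    }

    /** Hypothetical recorder that remembers the distinct dependencies it has seen. */
    static final class DependencySet implements DependencyRecorder {
        private final Set<String> names = new HashSet<>();

        public void record(String dependency) {
            names.add(dependency);
        }

        public int count() {
            return names.size();
        }
    }

    /** Hypothetical do-nothing recorder, shared as a single instance. */
    static final class NullDependencyRecorder implements DependencyRecorder {
        static final DependencyRecorder RECORDER = new NullDependencyRecorder();

        private NullDependencyRecorder() {
        }

        public void record(String dependency) {
        }

        public int count() {
            return 0;
        }
    }

    /** The extracted conditional: each branch simply returns its value. */
    static DependencyRecorder chooseRecorder(boolean recordDependencies) {
        if (recordDependencies)
            return new DependencySet();
        else
            return NullDependencyRecorder.RECORDER;
    }

    public static void main(String[] args) {
        // The local is now declared and initialized in one place.
        DependencyRecorder dependencies = chooseRecorder(true);
        dependencies.record("java.lang.String");
        System.out.println(dependencies.count());
    }
}
```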
When I initially wrote this document, there were fourteen cases in the code I had written where local variables were declared without initialization, and all were immediately followed by compound statements which initialized the variables in question: six were followed by if statements, five by switch, and three by try blocks.

Create new variables rather than reassigning old ones

Local variables are useful for providing names for intermediate results. The value of doing so is diminished if a local variable is used to hold several different values with different interpretations during its lifetime. (This does not apply to loop or accumulator variables, which, by their nature, are meant to change throughout their lifetimes, but their meaning should always be the same, relative to the current iteration of a loop.)

Java makes it very easy to introduce new local variables at almost any point in a method. Use this freedom to create locals with good names for all the intermediate results you need to hold on to. (If you find that you want to use the same name for multiple local variables at the same scope level in different phases of a method, that may be an indication that it's time to split the method.)

Don't modify parameter variables

Parameter variables are useful to maintain, with their original values,
for the duration of a method. When modifying a parameter (even to ``clean
it up'' for use in the method), one introduces a possible confusion between
the original value and the modified version. In addition, the original
values are often useful for debugging, printing informational messages,
or checking results.
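A small sketch of the difference, using a hypothetical lookup class: the cleaned-up value gets its own local variable, so the caller's original argument is still available for the error message.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ParameterSketch {
    private final Map<String, Integer> ports = new HashMap<>();

    ParameterSketch() {
        ports.put("http", 80);
        ports.put("https", 443);
    }

    /** Looks up a port by service name, ignoring case and surrounding whitespace. */
    int portFor(String serviceName) {
        // A new local holds the cleaned-up value; the parameter is left alone.
        String normalizedName = serviceName.trim().toLowerCase(Locale.ROOT);
        Integer port = ports.get(normalizedName);
        if (port == null) {
            // The original spelling is still at hand for the message.
            throw new IllegalArgumentException("unknown service: \"" + serviceName + "\"");
        }
        return port;
    }
}
```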
As Martin Fowler observes in Refactoring, people often confuse a modified local parameter variable with the value in the caller, which remains unchanged thanks to Java's call-by-value semantics. Since this confusion is so easy to avoid, there's no point in letting it occur.

In other words, just treat all parameters as if they were final. (Actually making them final seems like overkill to me, but I would have been quite happy if Java parameters were implicitly final.)

Treat side-effects cautiously

Side-effects, by which I mean changes to the values stored in fields of objects or elements of arrays, are clearly intended to be used frequently in Java. However, the presence of side-effects can make it harder to reason about a program, because there is invisible state to the side of computations which changes. That means a reader needs to keep track of both the visible aspects of a program and the hidden values off to the side that may change.

In more technical terms, the presence of side-effects breaks ``referential transparency,'' which means that the same expression, given the same input values, should always have the same value. Note that this corresponds closely to the mathematical notion of functions.

Restricting the use of side-effects can also make it easier to take
a single-threaded computation and break its work among multiple threads,
because any shared data structure which undergoes modification requires
synchronization in both readers and writers. By leaving objects unchanged,
worries about which version of an object is seen by which thread go away.
Typically, a programmer in Java who is trying to avoid side-effects will create new objects rather than modify old ones. Good rules of thumb are that non-compound statements should have one side-effect each and expressions should rarely have side-effects.

Bertrand Meyer, in Object-Oriented Software Construction, argues that side-effects on visible state of an object (as opposed to side-effects on, say, a cache) should only occur in ``procedures'' (void methods) and never in ``functions'' (methods returning values). While I'm not as strict as Meyer -- I believe methods which return a value popped from a stack or read from an input stream are perfectly reasonable -- I think his style has strong merits.

Programmers who've worked in functional languages for any period of time instinctively gravitate towards a low side-effect style.
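A minimal sketch of this low side-effect style, using hypothetical Money and Account classes: operations on the value type return new objects instead of modifying the receiver, and the one method that does change visible state is a void ``procedure'' in Meyer's sense.

```java
class SideEffectSketch {
    /** Hypothetical immutable value: the field is final and no method changes it. */
    static final class Money {
        private final long cents;

        Money(long cents) {
            this.cents = cents;
        }

        /** A "function": returns a new value and leaves both operands unchanged. */
        Money plus(Money other) {
            return new Money(cents + other.cents);
        }

        long getCents() {
            return cents;
        }
    }

    /** Hypothetical mutable object that keeps procedures and functions apart. */
    static final class Account {
        private Money balance = new Money(0);

        /** A "procedure": exactly one side-effect and no return value. */
        void deposit(Money amount) {
            balance = balance.plus(amount);
        }

        /** A "function": no visible side-effects. */
        Money getBalance() {
            return balance;
        }
    }
}
```

Because Money objects never change, they can be handed to other threads without synchronization; only Account would need it.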
Errors and Exceptions

Don't use exceptions for normal control flow

Java is, intentionally, a safe language. The run-time system includes many safety- (and sanity-) preserving features, such as null checks, type checks, and array bounds checks, and signals errors with exceptions. Using those checks to catch your mistakes is a good thing. Relying on them for detecting normal situations isn't.

The most egregious abuse of Java is to use a try/catch clause combined with run-time checks, rather than simple conditionals, as in this sort of code:

    int sumArray(int[] array) {
        int sum = 0;
        try {
            for (int i = 0;; i++)    // BAD: no loop test
                sum += array[i];
        } catch (IndexOutOfBoundsException e) {
            return sum;
        }
    }

This example is similar to one from
BYTE
Magazine, which offered it as an optimization over the traditional
loop over an array, because it could save a comparison per iteration. On
style grounds, this is clearly awful, and turns into the worst form of
spaghetti control flow very quickly.

On performance, the author's claim might have been right in the case
of naïve interpreters, though the overhead for throwing and catching
an exception probably dominates the cost of the extra check for all but
the largest arrays. But, with even the simplest of JIT compilers, a loop
such as:
    for (int i = 0; i < array.length; i++)
        sum += array[i];

is going to contain only one array bounds check, because the reference to array[i] inside the loop is clearly safe. In general, with a compiler, extra checks are usually free and can even serve to help the compiler generate better code. For example, in the code fragment above, the code to create an exception does not have to be generated, so the result will be a smaller program.

Don't use declared exceptions for erroneous conditions

There are two categories of exceptions (more properly, subclasses of Throwable) in Java: those which need to be declared by any method which might throw them (I refer to these as ``declared exception types'' or ``declared exceptions'') and those which can be thrown from any method (``undeclared exceptions''). The undeclared exception types are subclasses of Error and RuntimeException; all other exception types require declarations.

There is a balance to be struck between the uses of declared and undeclared exceptions. Using only declared exceptions makes it very difficult to determine where exceptions that actually must be checked for are potentially thrown. On the other hand, using only undeclared exceptions makes it impossible to figure out where exceptions that should be caught and recovered from are thrown.

My rule of thumb for whether I make an exception declared or undeclared is whether localized recovery from the exception being thrown is sensible. If the calling method (or one of its recent callers) is the right place to handle a given failure type, I represent that failure with a declared exception. If, on the other hand, the failure case is best handled by a global handler which catches all the exceptions for a given component of a program and treats them all as failures of that subsystem, I use undeclared exceptions. When in doubt, I err on the side of using undeclared exceptions.
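This rule of thumb could be sketched as follows. Both exception classes and the surrounding methods are hypothetical: the declared exception marks a failure the immediate caller can reasonably recover from, while the undeclared one is left for a component-wide handler further up.

```java
import java.util.Map;

class ExceptionSketch {
    /** Declared: a caller can recover locally, for instance by using a default. */
    static class SettingNotFoundException extends Exception {
        SettingNotFoundException(String name) {
            super("no such setting: " + name);
        }
    }

    /** Undeclared: signals that the whole settings subsystem is unusable. */
    static class CorruptSettingsException extends RuntimeException {
        CorruptSettingsException(String message) {
            super(message);
        }
    }

    static String lookup(Map<String, String> settings, String name)
            throws SettingNotFoundException {
        if (settings == null)
            throw new CorruptSettingsException("settings table is missing");
        String value = settings.get(name);
        if (value == null)
            throw new SettingNotFoundException(name);
        return value;
    }

    /** Local recovery: the compiler makes this caller deal with the declared exception. */
    static String lookupOrDefault(Map<String, String> settings, String name, String fallback) {
        try {
            return lookup(settings, name);
        } catch (SettingNotFoundException e) {
            return fallback;
        }
        // CorruptSettingsException is deliberately not caught here; it propagates
        // to whatever handler is responsible for the component as a whole.
    }
}
```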
Declared exceptions -- because they are reflected in the signature of a method and require matching definitions -- can be used for enforcing static invariants of a program. For example, consider a method which may only be invoked at given times in a program, such as when the moon is full. That method may be declared with a throws MoonMustBeFull clause, referring to a declared exception reserved for that purpose. Then, the author of any code which calls this method is reminded of the static requirements. The method, of course, may check whether the moon is full when it is invoked, and throw the exception when it is not. But throwing the exception is almost incidental; more important is the fact that the caller must pay attention to the requirement.

Let the language find errors for you

Java is a safe, statically-typed, strongly-typed language. This means that it will point out many of your errors, both at compilation-time and at run-time.

One of the most intriguing decisions made by the designers of Java was to eliminate warnings from the compiler. Either a construct is correct, and you can use it, or it's incorrect, and you get an error. For example, the rules for definite assignment are specified to prevent bugs due to uninitialized local variables. This has the overall effect of making it more likely that programs will be correct when the compiler has finally agreed to compile them.

The run-time checks are used to guarantee that a program doesn't violate the constraints of the language by preventing references to, for example, a field of a null object or an out-of-bounds element of an array. These checks are essential to the safety and security Java promises.

However, getting such a run-time error is usually an indication of some upstream problem; very rarely is the right approach in such a case to merely insert a check to prevent the run-time error from occurring or to insert a try/catch statement which ignores such an error condition. Often such a problem is trivial to find given the run-time exception and stack traceback.

Use assertions

One unfortunate absence in Java is a built-in assertion statement.
Just as the language's run-time checks are used to enforce the invariants
of the language, user-written code should include assertions to ensure
that what the programmer believed to be true about the state of the program
is actually true at run-time.
Any individual piece of a large program necessarily makes assumptions about the state of the rest of the program and the context in which it is running. Assertions serve two fundamental purposes: to document those assumptions and to ensure that the conditions hold whenever the code relying on them is run.

Pervasive use of assertions makes programs easier to get correct; it's always better to have an error appear as an assertion failure than letting a program run for a time in an erroneous state before failing or, worse, producing incorrect results.

Like comments, good assertions are meaningful and up-to-date. Asserting that 1 != 0 is rarely helpful. Similarly, an erroneous assertion in rarely executed code may confuse more than help. When examining an assertion failure, a programmer has an obligation to determine whether the assertion or the rest of the program is correct.

I typically add a method along the lines of:

    public static void assert(boolean condition) {
        if (!condition)
            throw new Error("assertion failure");
    }

to any class where I want to write an assertion. (Often such a definition
in a widely used superclass is very useful.) I always use undeclared exceptions
for assertion failures.
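Since Java 1.4, assert is a reserved word and the language has a built-in assert statement, so with current compilers a helper like the one above needs a different name. A minimal sketch under that assumption follows; the names Check.that and average are made up for illustration.

```java
/** Hypothetical assertion helper; named "Check" because "assert" is now a keyword. */
final class Check {
    private Check() {
    }

    /** Throws an undeclared exception if the condition does not hold. */
    static void that(boolean condition, String description) {
        if (!condition)
            throw new AssertionError("assertion failure: " + description);
    }
}

class AssertionSketch {
    /** Computes the mean of a non-empty array. */
    static double average(int[] values) {
        // Documents the assumption and enforces it on every call.
        Check.that(values.length > 0, "values must not be empty");
        long sum = 0;
        for (int value : values)
            sum += value;
        return (double) sum / values.length;
    }
}
```

AssertionError is a subclass of Error, so failures remain undeclared. The built-in form, assert condition : message;, is only evaluated when the JVM is started with assertions enabled (the -ea flag), whereas a helper method like this one always runs.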
Idioms

All languages support constructs which might not be clear to a beginning user, but are frequently employed by experienced users. Some languages (including C, APL, and Lisp) often seem to encourage the development of idioms which are hard to follow.

The idioms I discuss here are typical of my usage of Java. None is, I hope, hard to follow, but all may seem somewhat odd to a programmer coming from another language, notably C.

Use ``while (true)'' for infinite loops

Many C programmers (yours truly sometimes included) use the construct
``for (;;)'' for infinite loops. My reasoning was that I couldn't
count on ``true'' being defined in all C programs, and ``while
(1)'' looks strange to my eyes.
Java has a boolean type, so even that specious argument doesn't hold water -- the right thing to use is simply ``while (true)''.

Set loop limits in for-initialization clauses

The initialization clause of a for statement is executed exactly once, where the termination test is executed every time around the loop. If the upper bound on a numeric (typically integer) loop could be changed by execution of the loop and one does not want to use the changed value (or if the upper bound is time-consuming to compute), the loop limit can be declared and set along with the iteration variable in the initialization clause. By using this form, neither variable's scope extends outside the loop.

For example, this is a typical loop from the compiler for visiting every node in the graph representing a program:

    for (int i = 0, n = flowgraph.getNodeCount(); i < n; i++) {
        Node node = flowgraph.getNode(i);
        if (node != null)
            visitNode(node);
    }

Roughly one-third of the integer for loops in our compiler take
this form. Some usages probably fall into the category of premature optimization,
but when I look at an integer for loop with a complex termination expression,
I end up wondering whether it was intentional or not to re-evaluate the
termination, and coding in this style answers that question implicitly.
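To illustrate why the distinction can matter, here is a small sketch (the method name appendDoubles is hypothetical) in which the loop body appends to the list being traversed. With the limit captured in the initialization clause, only the elements present at the start are visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopLimitSketch {
    /** Appends the double of each original element; the limit is fixed up front. */
    static List<Integer> appendDoubles(List<Integer> input) {
        List<Integer> items = new ArrayList<>(input);
        for (int i = 0, n = items.size(); i < n; i++) {
            // items.size() grows on every pass, but n keeps the original count.
            items.add(items.get(i) * 2);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(appendDoubles(Arrays.asList(1, 2, 3))); // prints [1, 2, 3, 2, 4, 6]
    }
}
```

Writing the test as i < items.size() instead would never terminate for a non-empty input, because the list gains an element on every iteration.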
Other Style Guides

Joshua Bloch's Effective Java is the best thing I've ever read on how to program in Java. The book is a collection of fifty-seven suggestions on how to do or not do things in Java. Bloch writes well -- the book is interesting and all the suggestions seem well motivated. I cannot recommend this book highly enough; I wish it had existed when I started writing Java code.

The Java Code Conventions from JavaSoft are less useful than one might hope. These guidelines, derived from Sun's C and C++ conventions, focus on typographic issues almost to the exclusion of everything else. The style they advocate is similar to, but not quite the same as, the style found in examples in the JavaSoft books. Some aspects of the guidelines (such as where to put copyright information) are more appropriate to an in-house corporate style document than a general guide.

Doug Lea's
Draft
Java Coding Standard contains a long list of great recommendations.
I agree with most of these and find them well-motivated. If you read just
one other style guide for Java, this is the one I'd recommend.

Peter Norvig's Infrequently Answered Questions about Java is not really a style guide, but it makes some excellent suggestions for cases where Java seems awkward to use. This is also a very good piece to read for someone who's coming to Java from another language (say C), understands the language as a specification, but feels lost in the zeitgeist of Java.

Scott Ambler's AmbySoft Java Coding Standard is also pretty good. I find these recommendations too cumbersome and I dislike Ambler's typographic style. But he's clearly thought about how to put together big programs which are worked on by teams.

Rob Pike's Notes on Programming in C is a good contrast to Ambler's guide, advocating a more minimalist style. While I find Rob's code often too terse and typographically cryptic, his comments on complexity and consistency are on target.

Martin Fowler's valuable Refactoring covers techniques for rewriting programs with better object-oriented design and style. The parts of the book most relevant to this document help identify areas of your programs which need work.

Anyone writing object-oriented programs who hasn't read the ``Gang of Four'' Design Patterns book should pick it up right away. Patterns provide a high-level way of thinking of program structure, a useful vocabulary for discussing design, and a catalog of examples and tools which can be used off-the-shelf. Once this book has seeped into your consciousness, your programs will become more readable.

Bertrand Meyer's Object-Oriented Software Construction provides a comprehensive, opinionated discussion of object-oriented style. The book focuses on Eiffel, but is relevant to all OO languages and systems. His ideas on Design by Contract and The Principle of Uniform Reference are central to how many people (including me) think about object-oriented software.

Finally, my favorite book on programming style is a twenty-year-old classic which uses FORTRAN and PL/I for its examples: The Elements of Programming Style, by Brian Kernighan and P. J. Plauger. Some of it is out of date, but most of the book is concerned with general principles that endure.