søndag den 26. september 2010

How I Learned to Stop Worrying and Love the Eclipse "Format Source" Save Action...

One of the most interesting things in Eclipse in recent years (besides Team Project Sets and the standalone version of the compiler) was when "Editor Save Actions" were introduced, and that it allowed to run "Format Source" on every Java file when saved from within Eclipse.

This has proven to be quite disruptive to the way I write Java programs for several reasons. First let me explain why I think this is a good idea in the first place:

When you have code which may have lived a few years in a source repository with a bugfix once in a while it is inevitable that the code will need to be reindented inside a {}-block, or an array assignment has become to wide, and you groan and manually do the reformatting. The reason for the groaning is because you cannot just reformat the whole source file with Ctrl-Alt-F as there is some code elsewhere in the source file which has been manually reformatted ages ago, and the automatic formatter breaks this manual formatting.

Breaking is one thing, what is much worse is that if you change the source by reformatting, it means that the source control system (CVS; Subversion etc.) dutifully records this and everytime you in the future wants to know when that line was changed you get the time of the reformatting, not the actual last change to the source file. I have actually worked on a project where we were prohibited to use the formatter because of exactly this side-effect.

Naturally it wouldn't be a problem if everybody reformatted their source before comitting as that would mean that a later reformat wouldn't change anything and not count as a revision with regards to the source repository, and now Eclipse can help by doing it automatically. You never have to think about it again! This is so big a thing that it is hard to understand before you have actually used it for a while.

  • When working with code you can always just reformat the whole file at any time if your code gets messy. Don't worry about breaking formatting elsewhere - just do it!
  • After a while you move from Ctrl-Alt-F to reformat to just saving the file with Ctrl-S. If anything moves unexpectantly, you give it a closer look.
  • Code gets a common look. The individual styles everybody picks up, like spaces or no spaces around equal signs, or should the + be at the end of the line or at the beginning of the next line when having a multi line string concatenation, vanish automatically when the source is reformatted. This common look is also beneficial because everybody is used to reading that style and does not get annoyed with small things like missing spaces.
  • For this to work for a whole team everybody on the team needs to have exactly the same settings. The easiest way in Eclipse to do this, is just to accept the standard Eclipse settings, but you can set up the formatter as desired to conform with your local standard, and save the preferences to a file, which others then can load when starting a new workspace. Very useful, but unnecessary if you just stick with the Eclipse default.
This makes all very good sense on paper, but after a while it shows that the Eclipse formatter on default settings occasionally does some very unexpected indentations when formatting a wide complex expressions and needing to break it over multiple lines. All in all I found it was time to introduce an additional rule to avoid the formatter having to do this.
  • All statements must fit on a single source line.
This proved to have a much wider impact than I originally thought.

The immediate consequences were that a lot of no-need-to-think refactoring was needed if a line was too wide.
  1. long method arguments could be extracted to a local variable
  2. assignments and definitions could be split ("String s = ...." were to be turned into two lines "String s;" and "s = ...."). Eclipse offers this with Ctrl-1 with the cursor on the equals sign. This is not necessarily a good thing but if it is enough it is acceptable.
  3. if blocks were nested more than about 3-4 levels the lines were still too short anyway and you had to extract a method to get the lines long enough.
In other words you have a very simple rule for when to refactor and how. If the formatter breaks the line, it is time to do it, and you are constantly reminded until you do so.

But here it is very common for people to say "All this just gives extra work, which is unnecessary. Why do it then?". I thought so too myself, but then I concluded that:

Extraction of method arguments to local variables allow you to give meaningful names to each argument! This is important for future programmers to understanding it when reading it. There is no names at the calling spot in Java (Math.substring(10,19) does not indicate whether 19 is the number of characters wanted or something else completely) . By doing this manual listing of parameters you get naming there too.

NullPointerExceptions are also easier to handle in this way. By listing each argument on its own line instead of something like "new String[] {f.getX(), g.getX(), h.getX()}" where you have no idea of whether f, g or h was null if you get a NullPointerException. The short lines help too here. (This goes for ArrayIndexOutOfBoundsExceptions too but they are rarer).

And I think that the resulting code is generally easier to read which is also an advantage in itself...













mandag den 24. maj 2010

A gentle introduction to logging in Java with slf4j

This blog entry is intended for starting Java programmers who have learned to get information from inside their program with

System.out.println("a = " + a);

wants to learn how experienced Java programmers do the same thing in production code by using

log.debug("a = {}", a);

The way of programming described in the following is exactly the same as is used in large Java projects, and is the best practice known to me. The only difference is that the setup described here, is kept small and simple in order to be useful to small, standalone programs as those typically written by starters. If you need more advanced features, you can change the setup without having to change a single line of your own code (but how to do that is beyond the scope of this blog entry).

It is my hope that by using the techniques described in this document you will have a greater chance to avoid some very embarrassing and exhausting experiences in your future programming jobs, and hopefully not having to unlearn a habit or two.

"And you try and tell the young people of today that ..... they won't believe you."

Java was designed by Sun Microsystems, which happened to make a living at the time selling computers running the Sun dialect of the Unix operating system. It is not a surprise then that Java programs tend to look like other Unix programs, and they have been designed to work on the computer you see in the picture below.

## PIC ###
Picture shows Ken Thompson and Dennis Ritchie working at the first real Unix machine. Please note that there is real paper in the "typewriters", the closet behind the men is the computer, and that the circular containers on top of the computer contain program and data on paper tape. Things were different in 1973.

Under Unix every program has three input/output connections available - namely:
  • standard input - this provides the things the user type to the program. (Java: System.in)

  • standard output - this is normal information (Java: System.out)

  • standard error - this is information about error situations (Java: System.err)
A central concept in Unix is that you can let the output of one program be the input of another program - called a pipe - encouraging small specialized programs being combined to complete a larger task, like creating a list of words in a file by first extracting all words, sorting them, and then eliminate duplicates. Unfortunately this has not caught on in Java, mostly because the Java environment takes too long to start for small programs to be really usable. The concepts are still the same, however.

So the simplest possible Java program uses System.out to print out the line "Hello World":

1: public class HelloWorld {
2: public static void main(String[] args) {
3: System.out.println("Hello World");
4: }
5: }


This should be named "HelloWorld.java", and is compilable from the command line with "javac HelloWorld.java", and the compiled class can then be invoked with "java HelloWorld".

Line 1 declares this to be the class named HelloWorld - there is no "package" line so it is just HelloWorld instead of e.g. com.bigcoorporation.HelloWorld.

Line 2 is the magic words needed to be able to invoke the class file from the command line as "java HelloWorld".

Line 3 prints the line "Hello World" to standard output.

Now, why did that happen?


One of the unfortunate side effects of programming for a living is that people are using your programs, and they tend to find that your programs do not work as they expect - this is traditionally called a "bug"##FIXME FOOTNOTE 2## - and it is then your responsibility to identify and correct the problem.

In most cases, all you have to work with is the user describing what happened and what was expected. The program has ended and is for all practical purposes unavailable to you. Hence a way of saving information from within the program for later is needed, much in the same way that logbooks are used on ships to aid in navigation. The term "logging" is used in general for the process of systematically saving information for later.

Several logging solutions are available to Java programmers. All of these generally share the ability to log:
  • The area of the code containing the log statement.

  • The time when the log statement was invoked.

  • An importance level of this log statement.

  • An exception (with accompanying stack trace).

  • A log message.


This has proven to be very helpful for logs intended for the human reader. Facilities available in some logging frameworks include
  • store events in a database

  • alert users by email, instant messaging or any other electronic means


(but that is too advanced to be covered in this introduction - it is just to give you an idea of what is possible)

The HelloWorld program rewritten to use the SLF4J logging solution (full instructions are in appendix A) and put in the "demo" package, looks like:

package demo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class HelloWorld {
public static void main(String[] args) {
Logger log = LoggerFactory.getLogger(HelloWorld.class);
log.info("Hello World");
}
}


and generates the following output:

2009-06-07 10:38:21 demo.HelloWorld main
INFO: Hello World

The first line contains a date and time stamp, the name of the logger - here "demo.HelloWorld" - and the name of the method containing the log.info statement - here "main".

The second line contains the importance level - here "INFO" - and the message passed - here "Hello World". Note how much of this information was derived automatically!

If we add a method call with a delay, we can see that the timestamp changes, and that the method names in the log correctly indicates if the log statement is done from "main" or "waitTwoSeconds".

package demo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class HelloWorld2 {
public static void main(String[] args) throws Exception {
Logger log = LoggerFactory.getLogger(HelloWorld2.class);
log.info("Hello World");
waitTwoSeconds(log);
log.info("Done");
}
private static void waitTwoSeconds(Logger log) throws Exception {
Thread.sleep(2000);
log.info("Slept 2 seconds");
}
}


with the following output:


2009-06-07 10:53:52 demo.HelloWorld2 main
INFO: Hello World
2009-06-07 10:53:54 demo.HelloWorld2 waitTwoSeconds
INFO: Slept 2 seconds
2009-06-07 10:53:54 demo.HelloWorld2 main
INFO: Done

If you were shown this log, you would be able to determine that the program had run as expected to the statement logging "Done" in the main method of demo.HelloWorld2 and you should not look for problems happening before that statement.

Logging information stored in variables



If you need to add information stored in a variable, slf4j allow you to specify this with {}. (Note: Example works with Java 5 and later)
package demo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ZeroToFour {
public static void main(String[] args) throws Exception {
Logger log = LoggerFactory.getLogger(ZeroToFour.class);
for (int i = 0; i < 5; i++) {
log.info("i = {}, i*i = {}", i, i * i);
}
}
}


which outputs

2009-06-07 13:23:57 demo.ZeroToFour main
INFO: i = 0, i*i = 0
2009-06-07 13:23:57 demo.ZeroToFour main
INFO: i = 1, i*i = 1
2009-06-07 13:23:57 demo.ZeroToFour main
INFO: i = 2, i*i = 4
2009-06-07 13:23:57 demo.ZeroToFour main
INFO: i = 3, i*i = 9
2009-06-07 13:23:57 demo.ZeroToFour main
INFO: i = 4, i*i = 16


All objects can be logged in this way, and the toString() method is called to get the string to log. Note that int, long, double, floats, chars are not objects, but that in Java 5 and later they are invisibly converted correctly.

Logging problems



All programs regardless of programming language must have a way to handle problems. In Java this is done by "throwing exceptions" which causes the Java Virtual Machine to stop the normal execution of your program, and start looking for code which can "catch the exception thrown" - if none is found in the current method, the method calling it is investigated and if none is found there the method which called it is investigated, etc. This continues until either a suitable "try-catch-clause" has been found or the program ends with an error message and a stack trace.

A stack trace looks like:

java.lang.ArithmeticException: / by zero
at demo.CatchException.doStuff(CatchException.java:20)
at demo.CatchException.main(CatchException.java:12)


The first line lists the full name of the exception, a colon, and the error message. The next line tells where the exception was thrown, here in the method doStuff() in the class demo.CatchException (the exact location is on line 20 in the file CatchException.java which contains the source code for the class). The rest of the lines is on the same form, but tells where the call to the previous line happened.

In the above example, main() calls doStuff() on line 12, and in doStuff() on line 20 the exception is thrown.

The full program looks like:
1: package demo;
2:
3: import org.slf4j.Logger;
4: import org.slf4j.LoggerFactory;
5:
6: public class CatchException {
7:
8: public static void main(String[] args) {
9: Logger log = LoggerFactory.getLogger(CatchException.class);
10:
11: try {
12: doStuff(log);
13: } catch (Exception e) {
14: log.error("Something happened", e);
15: }
16: }
17:
18: public static void doStuff(Logger log) {
19: log.info("1/1 = {}", 1 / 1);
20: log.info("1/0 = {}" ,1 / 0);
21: }
22:}


Here is a sample run with line numbers.

1: 2009-06-07 12:59:27 demo.CatchException doStuff
2: INFO: 1/1 = 1
3: 2009-06-07 12:59:27 demo.CatchException main
4: SEVERE: Something happened
5: java.lang.ArithmeticException: / by zero
6: at demo.CatchException.doStuff(CatchException.java:20)
7: at demo.CatchException.main(CatchException.java:12)


Line 19 in the program logs "1/1 = {}" with the argument 1. This results in line 1-2 in the output. Line 1 contains a timestamp and the class and method. Line 2 contains the severity of the log statement - here INFO - and the log message with {} replaced with arguments.

Line 20 tries to log "1/0 = {}" with the argument 1/0, but dividing by zero causes the JVM to throw an ArithmeticException which cancels the execution of line 20 and begins the search for a suitable try-catch-clause. There is not one in doStuff() so the JVM goes back to the place where doStuff was called - this is line 12 in main() - which is inside a suitable try-catch-clause!

The exception thrown is therefore assigned to the variable e in line 13, and an error is logged on line 14. This results in line 3-7 in the output. Line 3 contains the timestamp and the class and method of the logging statement. Line 4 contains the severity of the log statement - here SEVERE - and the message logged. Line 5-7 contains the stack trace contained in the exception.

What SHOULD I log?



When you think about what you should log, you might consider thinking of the worst case scenario, where you need to locate and fix a Heisenbug - a problem which you cannot reliably reproduce - only using the log from when it happened to the customer.

This means doing a "log.debug" call for every piece of work or decision taken with arguments describing what has been done, and a "log.info" call for every important piece of work with arguments.

Fortunately this is usually not necessary! But such production scenarios exist - all you have is your log. You might as well think about it when you write your code.

  • Usually the log.debug is used by the developers to debug methods, and they are switched off in production scenarios so they are just skipped (hence they must be fast to skip).

  • Use log.info about things you would consider important enough to warrant logging but not enough for any more than that.

  • Use log.warn to note unusual things - preferably in detail - that might warrant a second glance from a system operator.

  • Use log.error for those things that require immediate attention. Like waking up people at 03:00...

  • Log the output of "new java.util.Date()" when starting the program. Perhaps also with the name of the machine and its IP-number. This helps you identifying restarts, get the full timestamp of the restart, plus keeping track of multiple machines with identical configuration.

  • For files, consider logging the absolute file name when opening a file. This might help you identifying problem with the wrong file being read or written.


Appendix A - Setting up SLF4J



SLF4J is a frontend which allow the usage of any of several logging frameworks without changing code and recompiling. In order to run the examples, you need to download and unpack the latest release of SLF4J from http://slf4j.org/download.html. To get the same output as this document you will need to tell slf4j to use the logging framework built into Java from 1.4 onwards.

Amongst the unpacked files you will find slf4j-api-1.5.8.jar and slf4j-jdk14-1.5.8.jar (numbers vary slightly, please adjust accordingly) which need to be added to your classpath.

To compile a program with slf4j from the command line, use

javac -g -cp .../slf4j-api-1.5.8.jar demo/CatchException.java


and to run it with slf4j and the jdk14 backend on a Unix system (Linux, Mac etc.), use:

java -cp .:.../slf4j-api-1.5.8.jar:.../slf4j-jdk14-1.5.8.jar demo.CatchException


(Note that the argument after "-cp" is period-colon-pathtoslf4japi-colon-pathtoslf4jjdk. I.e. three locations separated with a colon)

To run it under Windows with slf4j and jdk14 backend, use:

java -cp .;.../slf4j-api-1.5.8.jar;.../slf4j-jdk14-1.5.8.jar demo.CatchException


(Note that the argument after "-cp" is period-semicolon-pathtoslf4japi-semicolon-pathtoslf4jjdk. I.e. three locations separated with a semicolon)

For Eclipse, right-click on a Java project in the Package Explorer, choose Properties, choose Java Build Path, choose Library pane, click "Add External Jars" and add each of the two jar files.

This allows you to use the logging framework built into Java with slf4j. Other frameworks exist and most can be used through slf4j by including the correct jar file instead of slf4j-jdk14-1.5.8.jar when the program is started. See http://www.slf4j.org/ for the latest list of native and wrapped implementations.

Regardless of which logging framework is being used, the usual rules of that framework apply - slf4j is just a simple, common frontend - so if you want to configure the jdk14 logging you must read the official Sun documentation. It is a bit hard to read, so I suggest you see if your specific problem is covered at http://www.exampledepot.com/egs/java.util.logging/pkg.html

Appendix B: Questions you might think



Q: Isn't it slow to have lots of "log.debug" calls if they have been switched off?
A: No. Logging frameworks are optimized to handle this very quickly, and it is not a problem.

Q: I have read that it is necessary to use "isDebugEnabled()" to make logger calls fast in log4j. Why?
A: Log4j predates slf4j, and this does not apply to slf4j if you use the {}-placeholders, as they delay the actual creation of the full string to be logged as much as possible, notably after the event has been determined to be logged.


Q: What is the recommended way to declare the logger variable inside an object (and not a static method)?
A: You should declare it like this

private final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(this.getClass());


Note: This will not work for static methods. Here you should start each method with requesting a Logger object similar to:

public static doStuff() {
final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(ThisClass.class);
log.info("in doStuff()");
}



Notes:

2) The origin of the word is that in 1945 a moth was found to cause a malfunction in the Mark I computer. It can be seen at http://www.history.navy.mil/photos/images/h96000/h96566kc.htm