Application logs

The symptoms

We have all been there. You are debugging an application and can’t find the problem because you don’t have enough information. So you add some additional debug log lines which provide this additional information.

But, since you are not the first developer doing some debugging, your new potentially life-saving information simply disappears in the flood of other debug lines. Since your log line is obviously more important than the others it needs to stand out. So you add some exclamation marks at the end. This makes it not only stand out more but also clearly conveys the message that this line contains something important. And of course, it relieves some of the stress caused by feeling powerless in hunting this bug. (You might notice that error messages written by developers in general contain a lot more exclamation marks than normal texts, probably due to the same causes).

Again you reboot your application only to find that some other developer with similar frustrations got a similar idea. Time to get creative! Why not add some exclamation marks at the beginning? Or maybe an additional line of “—————–” to underline your statement? And, while we are at it, we might make a nice ascii box around the statement so we are sure that the message pops up and is isolated from all the other irrelevant data in the log. Of course, since we as programmers have a compulsive habit of keeping this clean, we need to make sure that the message itself fits nicely in this box so we spend some time in correctly outlining the box. Maybe we should write some guidelines on how to properly format such a ascii art box. What is the minimal screen size for viewing it? And why is there no <blink> tag support in application log files?!

Stop.

The problem

Ascii art in application log files are a waste. They don’t solve a problem, instead they add to the problem. Imagine that there is this new bug and you have to wade through all these nicely formatted ascii art constructions only to find that none of them contain the information for your bug. Now imagine it is in the middle of the night and your bosses’ boss is on your phone pushing you to fix the bug. You open the log file and find this beauty:

2014-12-15 00:33:00,066 [ajp-bio-8009-exec-79][217.149.193.58:anonymous] 
INFO com.example.StartUp - =====================================
2014-12-15 00:33:00,067 [ajp-bio-8009-exec-80][217.149.193.58:anonymous] 
INFO com.example.Warmup - -----------------
2014-12-15 00:33:00,067 [ajp-bio-8009-exec-80][217.149.193.58:anonymous] 
INFO com.example.Warmup - Warming up caches
2014-12-15 00:33:00,068 [ajp-bio-8009-exec-79][217.149.193.58:anonymous] 
INFO com.example.StartUp - || The application has started!! ||
2014-12-15 00:33:00,070 [ajp-bio-8009-exec-79][217.149.193.58:anonymous] 
INFO com.example.StartUp - =====================================
2014-12-15 00:33:00,071 [ajp-bio-8009-exec-80][217.149.193.58:anonymous] 
INFO com.example.Warmup - -----------------

Maybe some developer added some error line containing something like:
2014-12-15 00:33:00,066 [ajp-bio-8009-exec-79][217.149.193.58:anonymous] ERROR com.example.IndexController- This should not happen
or some helpful random value without context:
2014-12-15 06:15:00,001 [taskScheduler-1][:] INFO com.example.SomeJob- 12
Or, adding insult to injury, some joke comment like:

2014-12-15 06:18:05,349 [localhost-startStop-1][:] 
FATAL Can't start service. Good luck debugging this!

If you see this kind of behaviour, it is time to take application log files seriously. Application logs are important. When log files are a mess, it seriously impacts maintenance of your application. Not only is keeping ascii art pretty quite wasteful, it doesn’t work. In the same way the quality of your tests should be on par with the quality of your production code, the same goes for your log files.

Log levels

So what does work?

First of all, all decent logging frameworks support log levels. These require guidelines just as much as any other programming construct. These log levels differ per framework but as a starting point, I’ll give some pointers:

TRACE: The free-for-all, no constraints, log whatever you want level. In general, these should be removed as soon as you have finished your feature or bugfix. It might be a good idea to add your initials or login name to these log lines so you can find them later.

DEBUG: Logs most entry/exit conditions, flow decisions and values of important variables. Each statement should be self-contained: e.g. “12” is not acceptable, but “the price is 12 EUR” is. The log at this level should read like a (very poorly written) story.

INFO: This log level is a bit more application specific. I tend to use it for events that can have a global impact (e.g. cross-thread or cross-deployment interaction). In general, start- and stop events of applications and batch processes should be logged on this level, as well as wide-scoped transactions or large write actions.

WARN: Some edge case or non-trivial non-happy flow condition triggered which can be managed by the application. It should log three things: what the application was doing, what went wrong and what will happen now. In general, nothing more is needed since it is managed. It might help explain why the application responded in a certain way and resolve the ticket as “not a bug” by explaining how the application works. An example: “When validating the users login attempt, a locked account was detected. Will show default message ‘username/password incorrect’.”

ERROR: Here things are getting interesting. This state should be triggered whenever the programmer cannot resolve a non-happy flow in a specific way. In general, if there is no specified error handling and the programmer has to do something, this is the level to use. Again, all three aspects should be logged: what the application was doing, what went wrong and what will happen now. In this case, logging a strack trace as well is mandatory. An example: “When retrieving the user account, an exception occurred: null. Will show generic ‘system error’ message to user’.”

If there is an alerting system in place (e.g. automatically wake people up at night), there might be a different level defined for that as well. Although, in general it is better to have a dedicated log analysis tool for that which can also analyse trends .

I tend to advocate going into production with the application log level on DEBUG for as long as the deployment can handle it. Switching log levels to a higher level can always be done quickly if it becomes necessary. (You are using a proper logging framework, are you?) Keeping it at DEBUG allows for quicker bugfixes and makes sure that even on this level the messages are treated seriously.

Solution

Coming back to our original problem: how can I find my desperately looked-for log line? If everyone is sticking to the above rules, it still doesn’t help in finding my log line quickly. Although I now know on which level it should be, there are still plenty of log lines in the way. The solution is not to pollute the log file, but simply use a proper log viewer. A plugin as Grep Console for Eclipse easily helps you do the trick. Create a match on your specific line and then it can highlight your specific line, hide others and it can even make sound if you want. The only downside? It doesn’t support blinking.

PS: If you want to automate log file monitoring, there are plenty of tools available at different levels of complexity. Having proper log files will help all of them:

unix command line tool for analysing logs: Simple Event Correlator
log aggregator: Logstash (ELK stack)
application analytics: NewRelic

Application logs

The symptoms

The problem

Log levels

Solution

Arjan Lamers

Previous PostOpenShift 3 beta

Next PostGrails 3 Released: Installing Gradle and Groovy 2.4 Support in Eclipse/GGTS

First8 | Conclusion en volg ons