Code formatting is an opinionated beast. It has always been a matter of taste, and it always will be. That is why professional formatting tools such as Eclipse JDT offer a gazillion options, and even those are not enough: you can still override them inline with tag comments that make the formatter shut up. Can't we do better than that? What if we could use machine learning techniques to detect the code style already used in a codebase? It turns out we can.
Last year we gave a talk at EclipseCon about the possibilities that come with the ANTLR Codebuff project (https://github.com/antlr/codebuff). We showed how it helps solve the problem with good results. The only ingredients you need are a small set of representative examples and a grammar written with ANTLR 4 (http://www.antlr.org/). That sounds great and works in many areas, including Eclipse Xtext, which uses ANTLR under the covers. But what if there is no grammar? What if writing one is hard, and doing it just for the purpose of formatting doesn’t make sense? There must be a smarter way, because machines are already able to learn from the huge mess in social networks. Compared to social networks, source code is far more structured, so a machine should be able to learn from it too. We trained a machine with Deep Learning so that it can format source code without any knowledge of the grammar.
In this talk, we’ll explain the problem of formatting and demo Codebuff and our Deep Learning prototype, including the concepts behind both technologies. Our goal is to convince you that writing a formatter by hand is no longer necessary.