Making Community · Keynote
Thursday, 09:00, 1 hour | Ballrooms ABCD
Over the last several years, businesses have seen an explosion in the volume, variety, and velocity of the data they must handle every day. This has been both a blessing and a curse: while the explosion in data has enabled new kinds of highly intelligent applications and insights, developers have found that the previous generation of data management tools and frameworks struggles to keep up with terabytes or petabytes of often ill-structured data.
In this talk, Todd will introduce Apache Hadoop, an open source framework for storing and analyzing large quantities of diverse data. He will first discuss the motivation for the system, and then dive into details about the main components of the stack, their overall architecture, and the programming paradigms used to express scalable and flexible computation on large datasets. Along the way, he will also highlight some interesting applications of the technology. Lastly, Todd will share some experiences and lessons the Hadoop team has learned from working on large scale systems software written primarily in Java.
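To give a flavor of the programming paradigm the talk will cover, here is a minimal, hypothetical sketch of MapReduce-style word counting in plain Java. It runs entirely in memory on one machine for illustration; in Hadoop, the map phase (emitting key–value pairs), the shuffle (grouping by key), and the reduce phase (aggregating per key) would each be distributed across a cluster. The class and method names are illustrative, not part of the Hadoop API.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// A local, in-memory sketch of the MapReduce word-count paradigm.
// Hadoop would run the map, shuffle, and reduce steps across many
// machines; here each step is a stage in a single stream pipeline.
public class WordCountSketch {
    public static Map<String, Long> wordCount(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+")) // "map": split input into words
                     .filter(w -> !w.isEmpty())
                     .collect(Collectors.groupingBy(           // "shuffle": group by word (the key)
                             w -> w,
                             Collectors.counting()));          // "reduce": sum the counts per word
    }

    public static void main(String[] args) {
        System.out.println(wordCount("big data big insights"));
    }
}
```

The appeal of the paradigm is that the programmer writes only the per-record map logic and the per-key reduce logic; the framework handles partitioning, data movement, and fault tolerance.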
Todd is an engineer at Cloudera, where he works primarily on open source distributed systems in the Apache Hadoop ecosystem. He is a committer on the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects, and a member of the Apache HBase and Apache Thrift Project Management Committees. He has also contributed to JCarder, a dynamic analysis tool for detecting deadlock-prone lock cycles in complex Java applications.
Prior to Cloudera, Todd worked on backend infrastructure for several web sites, including Amie Street, Songza, and MyArtPlot, which he co-founded in 2006. Todd has been coding Java since 2001, but has also shipped production code in Python, Perl, PHP, Erlang, and C++. Todd received his bachelor's degree with honors from Brown University.