Organizers: A review of the year in Deep Learning
Lunch (pizza provided)
CNN Reductions for Low Power Computing –
Micah Bojrab is a seasoned software engineer at MDA and a PhD student under Ming Dong at Wayne State University, working on GPU-based convolutional neural nets for image recognition. Micah has an extensive commercial background in the parallelization of algorithms using CUDA.
Abstract: The world of computation is changing. This new world relies heavily on lightweight, battery-powered devices and commodity hardware. This ever-changing landscape requires many technologies to adapt or be surpassed, and Deep Learning is no exception. Classically, Deep Learning is a computationally intensive technique feasible only on GPUs and similar heavyweight vector processors. This presentation will review CNNs, their applicability to compute-limited devices, the inherent problems therein, and a few techniques to overcome these limitations.
Improving Generalization via Deep Reinforcement Learning –
, University of Michigan.
Junhyuk Oh is a third-year Ph.D. student in Computer Science and Engineering at the University of Michigan. He is supervised by Professor Honglak Lee and Professor Satinder Singh. His research focuses on deep learning and reinforcement learning.
Abstract: In this talk, I will briefly introduce the basic idea of deep reinforcement learning (Deep RL). I will also present my recent work that aims to improve the generalization ability of RL agents through deep learning. The first work focuses on how to generalize over unseen and larger topologies in a 3D world given navigation tasks. The second work discusses how to generalize over new tasks that are described by natural language.
Going Deeper in Semantics and Mid-Level Vision –
, University of Michigan.
Jia Deng is an Assistant Professor of Computer Science and Engineering at the University of Michigan. His research focus is on computer vision and machine learning, in particular, achieving human-level visual understanding by integrating perception, cognition, and learning. He received his Ph.D. from Princeton University and his B.Eng. from Tsinghua University, both in computer science. He is a recipient of the Yahoo ACE Award, a Google Faculty Research Award, the ICCV Marr Prize, and the ECCV Best Paper Award.
Abstract: Achieving human-level visual understanding requires extracting deeper semantics from images. In particular, it entails moving beyond detecting objects to understanding the relations between them. It also demands progress in mid-level vision, which extracts deeper geometric information such as pose and 3D structure. In this talk I will present recent work on both fronts. I will describe efforts on recognizing human-object interactions, an important type of relation between visual entities. I will present a state-of-the-art method for human pose estimation. Finally, I will discuss recovering 3D from a single image, a fundamental mid-level vision problem.
Object Detection Using Deep Neural Networks –
, University of Michigan.
Yuting Zhang is a postdoctoral fellow in the EECS Department at the University of Michigan. He has been working on deep learning and its application to computer vision with Prof. Honglak Lee since 2013. He received his Ph.D. from Zhejiang University in 2015, advised by Prof. Gang Pan. His research focuses on modeling visual recognition and generation problems with deep representation learning and probabilistic methods. His work has provided improved solutions to long-standing object detection and image classification problems. He won first place in the Computer Vision Community Top Paper Award (OpenCV People's Vote Winning Papers) at CVPR 2015.
Abstract: Accuracy and efficiency are the key performance indicators for object detection systems. A recent series of work on deep convolutional neural networks (CNNs) has made groundbreaking advances, dramatically pushing the state of the art in object detection. Deep neural representations adapted from large-scale image classification networks serve as strong backbones for achieving high recognition accuracy in the semantic space. Localization-sensitive training objectives, post-classification regression, and advanced search algorithms provide further driving force for more accurate localization in the spatial domain. For more practical end-user applications, feature sharing in convolutional architectures has been widely explored to cut down the heavy computational cost of detection systems. With all of the above efforts, reliable detection performance has been achieved for predefined object categories of interest. More recent studies have envisioned extending the semantic space to natural language, leading to greater flexibility and accessibility for humans.
Are Virtual Worlds the Present and Future of Visual Scene Understanding? –
, Toyota Research Institute (TRI).
German Ros' research focuses on visual scene understanding for driving scenarios, exploring the use of virtual worlds and domain adaptation to build effective, low-cost AI models that can perceive their environment and drive safely. Ros has pursued research in industry, helping companies such as Toshiba, Yandex, Audi, Intel, and Huawei exploit virtual worlds and deep learning technologies to develop new products and services related to autonomous driving. He recently joined Toyota Research Institute (TRI) as a researcher.
Abstract: Recently, supervised deep learning has become the preferred tool for visual scene understanding tasks, such as object detection and semantic segmentation, which are nowadays critical for autonomous vehicles. However, applying deep learning methods to these tasks requires large volumes of annotated images. Annotation is usually labor-intensive work carried out by several human operators, who provide imprecise annotations at high economic cost. As an alternative, virtual worlds allow us to automatically obtain large amounts of precise and rich annotations, but several questions arise: (i) Can a visual model trained in realistic virtual worlds successfully operate in real-world contexts? (ii) What setup is required to achieve good results? Experiments conducted on specific visual tasks show that virtual-world-based training can provide excellent testing accuracy when combined with simple domain adaptation techniques. Here we show how the combination of realistic virtual worlds and domain adaptation becomes a cost-effective alternative for training convolutional neural networks to produce state-of-the-art models. Are we entering the era of synthetic data?
Deep architectures for visual reasoning and decision-making –
, Google Brain & University of Michigan.
Honglak Lee is an Assistant Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor. He received his Ph.D. from the Computer Science Department at Stanford University in 2010, advised by Prof. Andrew Ng. His research focuses on deep learning and representation learning, spanning unsupervised and semi-supervised learning, supervised learning, transfer learning, structured prediction, graphical models, and optimization. His methods have been successfully applied to computer vision and other perception problems. He received best paper awards at ICML and CEAS. He has served as a guest editor of the IEEE TPAMI Special Issue on Learning Deep Architectures, as well as an area chair for ICML, NIPS, ICCV, AAAI, IJCAI, and ICLR. He received the Google Faculty Research Award (2011) and the NSF CAREER Award (2015), and was selected by IEEE Intelligent Systems as one of AI's 10 to Watch (2013).