---
layout: default
---
<script type="text/javascript">
function toggle_visibility(id) {
  var e = document.getElementById(id);
  // Toggle between the inline 'none' set in the markup and 'block'.
  e.style.display = (e.style.display === 'block') ? 'none' : 'block';
}
</script>
<script type="text/javascript">
document.getElementById('LNschedule').id='leftcurrent';
</script>
<div class="contents">
<h1>{{ site.conference.name }} {{ site.conference.year }} Program of Events</h1>
<!--<b>VENUE MAP</b>: A plan of the hotel and surrounding areas can be found <a href="Hyatt66.plan.pdf">here</a>.<br><br>-->
<h2>Best Paper Awards</h2>
<a href="http://proceedings.mlr.press/v54/newling17a.html">A Sub-Quadratic Exact Medoid Algorithm</a> <br>
<font color=red>James Newling, François Fleuret</font><br><br>
<a href="http://proceedings.mlr.press/v54/bahmani17a.html">Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation</a><br>
<font color=red>Sohail Bahmani, Justin Romberg</font><br><br>
<a href="http://proceedings.mlr.press/v54/naesseth17a.html">Reparameterization Gradients through Acceptance- Rejection Sampling Algorithms</a><br>
<font color=red>Christian Naesseth, Francisco Ruiz, Scott Linderman, David Blei</font><br><br>
{% for post in site.posts reversed %}
{% if post.layout == "singletrack" %}
{% include listsingle.html %}
{% endif %}
{% endfor %}
<h2>21-Apr (Fri)</h2>
<b>7:30-9:00</b> Breakfast, Windows on the Green & Chart Room <br><br>
<font color=blue><b>8:00-10:00</b> Registration Desk <br><br></font>
<b>9:00-10:00</b><font color="red"> Invited Talk, Cynthia Rudin, Crystal Ballroom 1, 2</font><br>
<b>What Are We Afraid Of?: Computational Hardness vs the Holy Grail of Interpretability in Machine Learning.</b>
<a href="#foo3" onclick="toggle_visibility('foo3');">See abstract.</a> <a href="http://prezi.com/6i5xnwf-snwf/?utm_campaign=share&rc=ex0share&utm_medium=copy">See slides</a>.
<div id="foo3" style="display:none;">
<i>
Is there always a tradeoff between accuracy and interpretability? This is a very old AI question. Many people have claimed to have investigated the answer to this question, but it is not clear that these attempts have been truly serious. If we try to investigate this claim by comparing interpretable modeling algorithms (like decision trees - say CART, C4.5) to a black box method that optimizes only accuracy (SVM or neural networks), we will not find the answer. This is not a fulfilling comparison - the methods for producing interpretable models are greedy myopic methods with no global objective, whereas the black box algorithms have global objectives and principled optimization routines. In order to actually answer this question, we would have to compare an "optimal" interpretable model to an optimal black box model. This means we actually need optimality for interpretable models. This, of course, leads to computational hardness, which scares us. On the other hand, we have computing power like never before. So do we truly know what we are afraid of any more?<br><br>
In this talk I will discuss algorithms for interpretable machine learning. Some of these algorithms are designed to create certificates of nearness to optimality. I will focus on some of our most recent work, including (1) work on optimal rule list models using customized bounds and data structures (these are an alternative to CART), and (2) work on optimal scoring systems (alternatives to logistic regression + rounding). Further, since we have methods that can produce optimal or near-optimal models, we can use them to produce interesting new forms of interpretable models. These new forms were simply not possible before, since they are almost impossible to produce using traditional techniques (like greedy splitting and pruning). In particular: (3) Falling rule lists, (4) Causal falling rule lists, and (5) Cost-effective treatment regimes. Work on (1) is joint with postdoc Elaine Angelino, students Nicholas Larus-Stone and Daniel Alabi, and colleague Margo Seltzer. Work on (2) is joint with student Berk Ustun. Work on (3) and (4) is joint with students Fulton Wang and Chaofan Chen, and (5) is an AISTATS 2017 paper that is joint work with student Himabindu Lakkaraju.<br>
<font color=blue>
<u>Bio:</u> Cynthia Rudin is an associate professor of computer science and electrical and computer engineering at Duke University, and directs the Prediction Analysis Lab. Her interests are in machine learning, data mining, applied statistics, and knowledge discovery (Big Data). Her application areas are in energy grid reliability, healthcare, and computational criminology. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo, where she received the College of Arts and Sciences Outstanding Senior Award in Sciences and Mathematics, and three separate outstanding senior awards from the departments of physics, music, and mathematics. She received a PhD in applied and computational mathematics from Princeton University. She is the recipient of the 2013 and 2016 INFORMS Innovative Applications in Analytics Awards and an NSF CAREER award, was named one of the "Top 40 Under 40" by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. Work from her lab has won 10 best paper awards in the last 5 years. Her work has been featured in Businessweek, The Wall Street Journal, the New York Times, the Boston Globe, the Times of London, Fox News (Fox & Friends), the Toronto Star, WIRED Science, U.S. News and World Report, Slashdot, CIO magazine, Boston Public Radio, and on the cover of IEEE Computer. She is past chair of the INFORMS Data Mining Section, and is currently chair-elect of the Statistical Learning and Data Science section of the American Statistical Association.
</font></i>
</div>
<br><br>
<b>10:00-10:30</b> Coffee Break, Crystal Atrium<br><br>
<b>10:30-12:10</b> <u>Theory</u>, Crystal Ballroom 1, 2<br>
<i>Session Chair: Sanjoy Dasgupta</i><br>
94 Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation<br>
68 A Sub-Quadratic Exact Medoid Algorithm<br>
456 On the Interpretability of Conditional Probability Estimates in the Agnostic Setting<br>
209 Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers<br><br>
<b>12:10-2:00</b> Lunch on your own<br><br>
<font color=blue><b>1:00-3:00</b> Registration Desk <br><br></font>
<b>2:00-3:40</b> <u>Approximate Inference and MCMC</u>, Crystal Ballroom 1, 2 <br>
<i>Session Chair: Simon Lacoste-Julien</i><br>
51 Annular Augmentation Sampling<br>
101 Removing Phase Transitions from Gibbs Measures<br>
170 Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms<br>
174 Asymptotically exact inference in differentiable generative models<br><br>
<b>3:40-4:10</b> Coffee Break, Crystal Atrium <br><br>
<b>4:10-7:00</b> Poster Session (with light snacks), Crystal Ballroom 3, 4 <br>
<a href="#foo4" onclick="toggle_visibility('foo4');">See poster list.</a>
<div id="foo4" style="display:none;">
<font size=1>
fP01: 82 Near-optimal Bayesian Active Learning with Correlated and Noisy Tests<br>
fP02: 9 Large-Scale Data-Dependent Kernel Approximation<br>
fP03: 86 Distance Covariance Analysis<br>
fP04: 228 Rank Aggregation and Prediction with Item Features<br>
fP05: 420 Signal-based Bayesian Seismic Monitoring<br>
fP06: 60 Learning Cost-Effective and Interpretable Treatment Regimes<br>
fP07: 170 Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms<br>
fP08: 174 Asymptotically exact inference in differentiable generative models<br>
fP09: 288 Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models<br>
fP10: 196 Local Perturb-and-MAP for Structured Prediction<br>
fP11: 51 Annular Augmentation Sampling<br>
fP12: 104 Performance Bounds for Graphical Record Linkage<br>
fP13: 180 Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets<br>
fP14: 273 CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC<br>
fP15: 298 On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior<br>
fP16: 345 Learning Optimal Interventions<br>
fP17: 419 Learning Structured Weight Uncertainty in Bayesian Neural Networks<br>
fP18: 429 Discovering and Exploiting Additive Structure for Bayesian Optimization<br>
fP19: 211 Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes<br>
fP20: 416 Prediction Performance After Learning in Gaussian Process Regression<br>
fP21: 129 A Framework for Optimal Matching for Causal Inference<br>
fP22: 523 Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness<br>
fP23: 182 Least-Squares Log-Density Gradient Clustering for Riemannian Manifolds<br>
fP24: 68 A Sub-Quadratic Exact Medoid Algorithm<br>
fP25: 117 Random Consensus Robust PCA<br>
fP26: 224 Adaptive ADMM with Spectral Penalty Parameter Selection<br>
fP27: 94 Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation<br>
fP28: 214 Data Driven Resource Allocation for Distributed Learning<br>
fP29: 278 Comparison-Based Nearest Neighbor Search<br>
fP30: 363 Generalization Error of Invariant Classifiers<br>
fP31: 456 On the Interpretability of Conditional Probability Estimates in the Agnostic Setting<br>
fP32: 141 ConvNets with Smooth Adaptive Activation Functions for Regression<br>
fP33: 404 Diverse Neural Network Learns True Target Functions<br>
fP34: 209 Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers<br>
fP35: 329 Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere<br>
fP36: 69 Minimax density estimation for growing dimension<br>
fP37: 540 Scalable Greedy Feature Selection via Weak Submodularity<br>
fP38: 449 Information Projection and Approximate Inference for Structured Sparse Variables<br>
fP39: 227 Dynamic Collaborative Filtering With Compound Poisson Factorization<br>
fP40: 242 Information-theoretic limits of Bayesian network structure learning<br>
fP41: 3 Conditions beyond treewidth for tightness of higher-order LP relaxations<br>
fP42: 347 A Lower Bound on the Partition Function of Attractive Graphical Models in the Continuous Case<br>
fP43: 531 Non-Count Symmetries in Boolean & Multi-Valued Prob. Graphical Models<br>
fP44: 504 Sequential Multiple Hypothesis Testing with Type I Error Control<br>
fP45: 22 Lower Bounds on Active Learning for Graphical Model Selection<br>
fP46: 498 Learning from Conditional Distributions via Dual Embeddings<br>
fP47: 13 Online Nonnegative Matrix Factorization with General Divergences<br>
fP48: 161 Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines<br>
fP49: 384 Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data<br>
fP50: 417 Communication-Efficient Learning of Deep Networks from Decentralized Data<br>
fP51: 520 Automated Inference with Adaptive Batches<br>
fP52: 190 Bayesian Hybrid Matrix Factorisation for Data Integration<br>
fP53: 192 Co-Occurring Directions Sketching for Approximate Matrix Multiply<br>
fP54: 205 Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)<br>
fP55: 442 Active Positive Semidefinite Matrix Completion: Algorithms, Theory and Applications<br>
fP56: 245 Markov Chain Truncation for Doubly-Intractable Inference<br>
fP57: 101 Removing Phase Transitions from Gibbs Measures<br>
fP58: 484 Distribution of Gaussian Process Arc Lengths<br>
fP59: 76 Estimating Density Ridges by Direct Estimation of Density-Derivative-Ratios<br>
fP60: 213 Minimax Approach to Variable Fidelity Data Interpolation<br>
fP61: 132 Stochastic Rank-1 Bandits<br>
fP62: 26 Sparse Accelerated Exponential Weights<br>
fP63: 479 Efficient Online Multiclass Prediction on Graphs via Surrogate Losses<br>
fP64: 124 Frank-Wolfe Algorithms for Saddle Point Problems<br>
fP65: 167 Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot<br>
fP66: 175 Decentralized Collaborative Learning of Personalized Models over Networks<br>
fP67: 20 ASAGA: Asynchronous Parallel SAGA<br>
fP68: 264 A Stochastic Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization<br>
fP69: 282 A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe<br>
fP70: 284 Faster Coordinate Descent via Adaptive Importance Sampling<br>
fP71: 375 Tracking Objects with Higher Order Interactions via Delayed Column Generation<br>
fP72: 399 Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage<br>
fP73: 367 Optimistic Planning for the Stochastic Knapsack Problem<br>
fP74: 410 Thompson Sampling for Linear-Quadratic Control Problems<br>
fP75: 239 Robust and Efficient Computation of Eigenvectors in a Generalized Spectral Method for Constrained Clustering<br>
fP76: 302 Minimax-optimal semi-supervised regression on unknown manifolds<br>
fP77: 2 Minimax Gaussian Classification & Clustering<br>
fP78: 372 Identifying groups of strongly correlated variables through Smoothed Ordered Weighted L_1-norms<br>
fP79: 494 Spectral Methods for Correlated Topic Models<br>
fP80: 131 Quantifying the accuracy of approximate diffusions and Markov chains<br>
fP81: 267 Hierarchically-partitioned Gaussian Process Approximation<br>
fP82: 459 Linking Micro Event History to Macro Prediction in Point Process Models<br>
fP83: 507 A Maximum Matching Algorithm for Basis Selection in Spectral Learning<br>
</font>
</div>
<br><br>
<b>7:15-9:00</b> <font color=red>Dinner Buffet, Panorama Ballroom<br><br></font>
<h2>22-Apr (Sat)</h2>
<b>7:30-9:00</b> Breakfast, Panorama Ballroom C, D & Terrace <br><br>
<font color=blue><b>8:00-10:00</b> Registration Desk <br><br></font>
<b>9:00-10:00</b> <font color=red>Invited Talk, Sanjoy Dasgupta, Panorama Ballroom A, B <br></font>
<b>Towards a Theory of Interactive Learning.</b>
<a href="#foo5" onclick="toggle_visibility('foo5');">See abstract.</a> <a href="sanjoy-talk.pdf">See slides</a>.
<div id="foo5" style="display:none;">
<i>"Interactive learning" refers to scenarios in which a learning agent (human or machine) engages with an information-bearing agent or system (for instance, a human expert) with the goal of efficiently arriving at a useful model. Examples include: active learning of classifiers; automated teaching systems; augmenting unsupervised learning with interactive post-editing; and so on. In particular, such interaction is a basic mechanism by which we can communicate our needs and preferences to the computers that play an increasing role in our lives.
It would be helpful to have unifying mathematical frameworks that can provide a basis for evaluating interactive schemes, and that supply generic interaction algorithms. I will describe one such mathematical framework that covers a fairly broad range of situations, and illustrate how it yields algorithms for interactive hierarchical clustering and interactive topic modeling.<br>
<font color=blue><u>Bio</u>: Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from UC Berkeley in 2000. He works on algorithms for machine learning, with a focus on unsupervised and interactive learning. He is the author of a textbook, 'Algorithms', with Christos Papadimitriou and Umesh Vazirani. He was program co-chair for the Conference on Learning Theory (COLT) in 2009 and for the International Conference on Machine Learning (ICML) in 2013.
</font></i></div>
<br><br>
<b>10:00-10:30</b> Coffee Break, Panorama Foyer<br><br>
<b>10:30-12:10</b> <u>Bayesian Methods</u>, Panorama Ballroom A, B<br>
<i>Session Chair: Rebecca Steorts</i><br>
420 Signal-based Bayesian Seismic Monitoring<br>
180 Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets<br>
82 Near-optimal Bayesian Active Learning with Correlated and Noisy Tests<br>
298 On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior<br><br>
<b>12:10-1:30</b> Lunch on your own <b>(note shorter lunch)</b> <br><br>
<b>1:30-3:10</b> <u>Large-scale learning</u>, Panorama Ballroom A, B <br>
<i>Session Chair: Pradeep Ravikumar</i><br>
417 Communication-Efficient Learning of Deep Networks from Decentralized Data<br>
520 Automated Inference with Adaptive Batches<br>
224 Adaptive ADMM with Spectral Penalty Parameter Selection<br>
372 Identifying groups of strongly correlated variables through Smoothed Ordered Weighted L_1-norms<br><br>
<b>3:10-3:40</b> Coffee Break, Panorama Foyer <br><br>
<b>3:40-5:20</b> <u>Sketching</u>, Panorama Ballroom A, B <br>
<i>Session Chair: Anastasios (Tasos) Kyrillidis</i><br>
384 Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data<br>
399 Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage<br>
192 Co-Occurring Directions Sketching for Approximate Matrix Multiply<br>
117 Random Consensus Robust PCA<br><br>
</div>