Zhaohan Daniel Guo
DeepMind
Verified email at google.com
Title
Cited by
Year
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ...
Advances in Neural Information Processing Systems 33, 21271-21284, 2020
6985 2020
Agent57: Outperforming the Atari human benchmark
AP Badia, B Piot, S Kapturowski, P Sprechmann, A Vitvitskyi, ZD Guo, ...
International Conference on Machine Learning, 507-517, 2020
698 2020
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ...
Advances in Neural Information Processing Systems 33, 21271-21284, 2020
508 2020
Never give up: Learning directed exploration strategies
AP Badia, P Sprechmann, A Vitvitskyi, D Guo, B Piot, S Kapturowski, ...
arXiv preprint arXiv:2002.06038, 2020
359 2020
A general theoretical paradigm to understand learning from human preferences
MG Azar, ZD Guo, B Piot, R Munos, M Rowland, M Valko, D Calandriello
International Conference on Artificial Intelligence and Statistics, 4447-4455, 2024
304 2024
Joint semantic utterance classification and slot filling with recursive neural networks
D Guo, G Tur, W Yih, G Zweig
2014 IEEE Spoken Language Technology Workshop (SLT), 554-559, 2014
250 2014
Bootstrap latent-predictive representations for multitask reinforcement learning
ZD Guo, BA Pires, B Piot, JB Grill, F Altché, R Munos, MG Azar
International Conference on Machine Learning, 3875-3886, 2020
160 2020
Neural predictive belief representations
ZD Guo, MG Azar, B Piot, BA Pires, R Munos
arXiv preprint arXiv:1811.06407, 2018
93 2018
Nash learning from human feedback
R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ...
arXiv preprint arXiv:2312.00886, 2023
78 2023
A PAC RL algorithm for episodic POMDPs
ZD Guo, S Doroudi, E Brunskill
Artificial Intelligence and Statistics, 510-518, 2016
71 2016
BYOL-Explore: Exploration by bootstrapped prediction
Z Guo, S Thakoor, M Pîslar, B Avila Pires, F Altché, C Tallec, A Saade, ...
Advances in Neural Information Processing Systems 35, 31855-31870, 2022
68 2022
Generalized preference optimization: A unified approach to offline alignment
Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ...
arXiv preprint arXiv:2402.05749, 2024
52 2024
Using options and covariance testing for long horizon off-policy policy evaluation
Z Guo, PS Thomas, E Brunskill
Advances in Neural Information Processing Systems 30, 2017
52 2017
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, PH Richemond, E Buchatskaya, ...
arXiv preprint arXiv:2006.07733, 2020
44 2020
Geometric entropic exploration
ZD Guo, MG Azar, A Saade, S Thakoor, B Piot, BA Pires, M Valko, ...
arXiv preprint arXiv:2101.02055, 2021
40 2021
Understanding self-predictive learning for reinforcement learning
Y Tang, ZD Guo, PH Richemond, BA Pires, Y Chandak, R Munos, ...
International Conference on Machine Learning, 33632-33656, 2023
32 2023
Concurrent PAC RL
Z Guo, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
31 2015
Understanding the performance gap between online and offline alignment algorithms
Y Tang, DZ Guo, Z Zheng, D Calandriello, Y Cao, E Tarassov, R Munos, ...
arXiv preprint arXiv:2405.08448, 2024
29 2024
PAC continuous state online multitask reinforcement learning with identification
Y Liu, Z Guo, E Brunskill
Proceedings of the 2016 International Conference on Autonomous Agents …, 2016
22 2016
Human alignment of large language models through online preference optimisation
D Calandriello, D Guo, R Munos, M Rowland, Y Tang, BA Pires, ...
arXiv preprint arXiv:2403.08635, 2024
18 2024