Misha Bilenko

Research | Personal

Update: I work at Meta on making large models useful and delightful to users.

Before that, I was at Google DeepMind working on Gemini assistant and shipping large models in products.

Before that, I was (back) at Microsoft working on deploying, applying and creating large ML models. I worked on Phi-3 models, Azure AI Services (Speech, Vision, Responsible AI, FormRecognizer, Language and Machine Translation), GitHub Copilot, Bing Chat, Azure OpenAI - as well as a few other Microsoft products, and with many amazing people.

Before that, I lived in Moscow and led the Machine Intelligence and Research (MIR) division in Yandex. We had all sorts of serious fun, including but not limited to: intelligent assistants (go Alice!), smart speakers (go Station!), speech recognition and synthesis, computer vision, machine translation, machine learning algorithms and platforms, and research in all of the above (go numerous Yandex products with all of the above!). Watching Speech, MT and Assistants move from statistical ML stacks to DNNs and ship lots of features and quality gains was really cool, launching Alice on all surfaces and several Station models was even better.

Before that, I lived in Seattle area and led the Machine Learning Algorithms team in Cloud+Enterprise division at Microsoft. Our ML tools were used in many products, from Microsoft AzureML, to SQL Server, to numerous others across all divisions of the company. We collaborated extensively with MSR and many applied ML/Data Science groups.

Before that, I was a researcher in the Machine Learning Department at Microsoft Research. I enjoyed building ML systems and tools, and working on large-scale prediction problems on behavioral, transactional and textual data. Specific applications on which I focused were high-throughput ML, click probability prediction, relevant ad selection, constructing user profiles for targeting, and improving search relevance with logs of user behavior. Earlier, I worked on semi-supervised clustering and record linkage (a.k.a. entity resolution a.k.a. de-duplication a.k.a. identity uncertainty a.k.a. co-reference resolution...).

Before that, I completed my Ph.D. in the Department of Computer Science at the University of Texas at Austin in 2006, where I was a member of the Machine Learning Group. Along the way, I spent the summer of 2002 at IBM T.J. Watson Research Center, and the summer/fall of 2004 at Google.

Besides all kinds of ML applications, I'm generally interested in adaptive similarity(distance, kernel, divergence, embedding, ...) functions, implementing learning algorithms on parallel/distributed platforms, and tools for machine learning practitioners. Evaluation methodology/metrics for all of these, offline and online, is always fun to understand and improve.

Research

Large-scale Learning
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Abdin et al.
  [arXiv] [Azure AI Model Catalog] [Hugging Face]
- Time Adapative Sketches (Ada-Sketches) for Summarizing Data Streams
  Anshumali Shrivastava, Christian Konig, Mikhail Bilenko. In Proceedings of ACM SIGMOD 2016.
  [PDF]
- Scaling Up Stochastic Dual Coordinate Ascent
  Kenneth Tran, Saghar Hosseini, Lin Xiao, Thomas Finley, Mikhail Bilenko. In Proceedings of KDD-2015.
  [PDF]
- Crypto-nets: Neural networks over encrypted data
  Pengtao Xie, Mikhail Bilenko, Tom Finley, Ran Gilad-Bachrach, Kristin Lauter, Michael Naehrig
  [PDF]
- Lazy Paired Hyper-Parameter Tuning
  Alice X. Zheng and Mikhail Bilenko. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI-2013), Beijing, China, August 2013.
  [PDF] [Slides (PPT)]
- Fast Prediction of New Feature Utility
  Hoyt Koepke and Mikhail Bilenko. In Proceedings of the 29th International Conference on Machine Learning (ICML-2012), Edinburgh, Scotland, June 2012.
  [PDF] [Slides (PPT)]
- Scaling Up Machine Learning. Edited by Ron Bekkerman, Mikhail Bilenko, and John Langford. Cambridge University Press, 2012.
- NIPS 2011 Workshop on Big Learning
- Predictive Client-side Profiles for Personalized Advertising
  Mikhail Bilenko and Matthew Richardson. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2011), San Diego, August 2011.
  [PDF] [Slides (PPT)]
- SIGIR-2009 Workshop on Information Retrieval and Advertising
- Catching the Drift: Learning Broad Matches from Clickthrough Data
  Sonal Gupta, Mikhail Bilenko, and Matthew Richardson. Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2009), Paris, June 2009.
  [PDF] [Slides (PPT)]
- Enhancing Web Search by Promoting Multiple Search Engine Usage
  Ryen W. White, Matthew Richardson, Mikhail Bilenko, and Allison Heath. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2008), Singapore, July 2008.
  [PDF]
- Talking the Talk vs. Walking the Walk: Salience of Information Needs in Querying vs. Browsing
  Mikhail Bilenko, Ryen W. White, Matthew Richardson, and G. Craig Murray. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2008), Singapore, July 2008.
  [PDF]
- Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites From User Activity
  Mikhail Bilenko and Ryen W. White. In Proceedings of the 17th International World Wide Web Conference (WWW-2008), Beijing, April 2008.
  [PDF]
- Leveraging Popular Destinations to Enhance Web Search Interaction
  Ryen W. White, Mikhail Bilenko, and Silviu Cucerzan. ACM Transactions on the Web (TWEB), 2(3), pp.1-30, 2008.
  [PDF]
  - Earlier version: Studying the Use of Popular Destinations to Enhance Web Search Interaction
    Ryen W. White, Mikhail Bilenko, and Silviu Cucerzan. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2007), Amsterdam, July 2007.
    (Winner of Best Paper Award)
    [PDF]
Learnable similarity functions and their applications in information integration (e.g., record linkage/identity uncertainty) and text mining
- RIDDLE: Repository of Information on Duplicate Detection, Record Linkage, and Identity Uncertainty
- Adaptive Blocking: Learning to Scale Up Record Linkage and Clustering
  Mikhail Bilenko, Beena Kamath, and Raymond J. Mooney. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM-2006), pp.87-96, Hong Kong, December 2006.
  [PDF]
- Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
  Mikhail Bilenko, Sugato Basu, and Mehran Sahami. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM-2005), pp.58-65, Houston, TX, November 2005.
  [PDF]
- Adaptive Duplicate Detection Using Learnable String Similarity Measures
  Mikhail Bilenko and Raymond J. Mooney. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp.39-48, Washington, DC, August 2003.
  [PDF]
- On Evaluation and Training-Set Construction for Duplicate Detection
  Mikhail Bilenko and Raymond J. Mooney. In Proceedings of the KDD-2003 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, pp.7-12, Washington, DC, August 2003.
  [PDF]
Semi-supervised clustering
- Probabilistic Semi-Supervised Clustering with Constraints
  Sugato Basu, Mikhail Bilenko, Arindam Banerjee, and Raymond J. Mooney. In Semi-Supervised Learning, O. Chapelle, B. Sch�lkopf, and A. Zien (eds.), MIT Press, 2006.
  Note: this chapter summarizes the KDD and ICML papers below
  [PDF]
- A Probabilistic Framework for Semi-Supervised Clustering
  Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp.59-68, Seattle, WA, August 2004.
  (Winner of Best Research Paper Award)
  [PDF]
- Integrating Constraints and Metric Learning in Semi-Supervised Clustering
  Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney. In Proceedings of the 21st International Conference on Machine Learning (ICML-2004), pp.81-88, Banff, Canada, July 2004.
  [PDF]
- A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
  Mikhail Bilenko and Sugato Basu. In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), pp.17-22, Banff, Canada, July 2004.
  [PDF]
Indirect learning in information integration (record linkage, information extraction), text classification, and clustering
- Two Approaches to Handling Noisy Variation in Text Mining
  Un Yong Nahm, Mikhail Bilenko, and Raymond J. Mooney. In Proceedings of the ICML-2002 Workshop on Text Learning (TextML'2002), pp.18-27, Sydney, Australia, July 2002.
  [PDF]

Personal

In my leisure time I enjoy applying hill-climbing search and gradient descent algorithms to real-world domains, which are almost as cool as the cool stuff that my sister does.
Update: all outdoorsy hills-climbing searches and gradient descents are of the kid-friendly kind now. Much smaller step sizes and more regularized, still very fun!