Comparison of machine learning textbooks
This page is a comparison of machine learning textbooks, especially at the so-called introductory level. It includes books that present multiple learning methods, and excludes books devoted to a single subfield (e.g. reinforcement learning).
Comparison table
The page count excludes any appendices.
Not sure what other columns would be useful. Level of mathematical rigor? The approach taken (e.g. the probably approximately correct framework)? Topics covered? Code samples (e.g. whether code for plots or implementations is provided, and in which language)? How amenable the book is to self-study? Ultimately what I care about is how easily I can understand the book — how much "fit" I have with it — but this is difficult to generalize to others, who have different backgrounds and preferences.
| Title | Author | Length (pages) | Prerequisites | Recommendations |
|---|---|---|---|---|
| Machine Learning: A Probabilistic Perspective | Kevin P. Murphy | 1008 | "This book is suitable for upper-level undergraduate students and beginning graduate students in computer science, statistics, electrical engineering, econometrics, or anyone else who has the appropriate mathematical background. Specifically, the reader is assumed to already be familiar with basic multivariate calculus, probability, linear algebra, and computer programming. Prior exposure to statistics is helpful but not necessary." | ^{[1]} |
| Introduction to Machine Learning | Alex Smola and S.V.N. Vishwanathan | 196 | ? | |
| Understanding Machine Learning: From Theory to Algorithms | Shai Shalev-Shwartz and Shai Ben-David | 368 | "We made an attempt to keep the book as self-contained as possible. However, the reader is assumed to be comfortable with basic notions of probability, linear algebra, analysis, and algorithms. The first three parts of the book are intended for first year graduate students in computer science, engineering, mathematics, or statistics. It can also be accessible to undergraduate students with the adequate background. The more advanced chapters can be used by researchers intending to gather a deeper theoretical understanding." | ^{[2]}^{[3]} |
| Pattern Recognition and Machine Learning | Christopher M. Bishop | 676 | "It is aimed at advanced undergraduates or first year PhD students, as well as researchers and practitioners, and assumes no previous knowledge of pattern recognition or machine learning concepts. Knowledge of multivariate calculus and basic linear algebra is required, and some familiarity with probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory." | ^{[4]}^{[5]}^{[1]}, ^{[6]} (complaints) |
| Introduction to Machine Learning (second edition) | Ethem Alpaydin | 516 | "This is an introductory textbook, intended for senior undergraduate and graduate-level courses on machine learning, as well as engineers working in the industry who are interested in the application of these methods. The prerequisites are courses on computer programming, probability, calculus, and linear algebra. The aim is to have all learning algorithms sufficiently explained so it will be a small step from the equations given in the book to a computer program. For some cases, pseudocode of algorithms are also included to make this task easier." | |
| The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition) | Trevor Hastie, Robert Tibshirani, and Jerome Friedman | 698 | "This book is designed for researchers and students in a broad variety of fields: statistics, artificial intelligence, engineering, finance and others. We expect that the reader will have had at least one elementary course in statistics, covering basic topics including linear regression." | ^{[7]}^{[5]}^{[3]}^{[8]}^{[9]} |
| An Introduction to Statistical Learning with Applications in R | Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani | 418 | "One of the first books in this area—The Elements of Statistical Learning (ESL) (Hastie, Tibshirani, and Friedman)—was published in 2001, with a second edition in 2009. ESL has become a popular text not only in statistics but also in related fields. One of the reasons for ESL's popularity is its relatively accessible style. But ESL is intended for individuals with advanced training in the mathematical sciences. An Introduction to Statistical Learning (ISL) arose from the perceived need for a broader and less technical treatment of these topics. In this new book, we cover many of the same topics as ESL, but we concentrate more on the applications of the methods and less on the mathematical details. We have created labs illustrating how to implement each of the statistical learning methods using the popular statistical software package R. These labs provide the reader with valuable hands-on experience. This book is appropriate for advanced undergraduates or master's students in statistics or related quantitative fields or for individuals in other disciplines who wish to use statistical learning tools to analyze their data." | |
| Foundations of Machine Learning | Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar | 340 | "The book is intended for students and researchers in machine learning, statistics and other related areas. It can be used as a textbook for both graduate and advanced undergraduate classes in machine learning or as a reference text for a research seminar. […] The reader is assumed to be familiar with basic concepts in linear algebra, probability, and analysis of algorithms. However, to further help him, we present in the appendix a concise linear algebra and a probability review, and a short introduction to convex optimization. We have also collected in the appendix a number of useful tools for concentration bounds used in this book." | ^{[10]} |
| Machine Learning: A Bayesian and Optimization Perspective | Sergios Theodoridis | 1012 | "The book addresses the needs of advanced graduate, postgraduate, and research students as well as of practicing scientists and engineers whose interests lie beyond black-box solutions." | |
| Bayesian Reasoning and Machine Learning | David Barber | 618 | "The book is designed to appeal to students with only a modest mathematical background in undergraduate calculus and linear algebra. No formal computer science or statistical background is required to follow the book, although a basic familiarity with probability, calculus and linear algebra would be useful. The book should appeal to students from a variety of backgrounds, including Computer Science, Engineering, applied Statistics, Physics, and Bioinformatics that wish to gain an entry to probabilistic approaches in Machine Learning. In order to engage with students, the book introduces fundamental concepts in inference using only minimal reference to algebra and calculus. More mathematical techniques are postponed until as and when required, always with the concept as primary and the mathematics secondary." | |
| Information Theory, Inference, and Learning Algorithms | David J.C. MacKay | 596 | "This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers." | |
| A Probabilistic Theory of Pattern Recognition | Luc Devroye, László Györfi, and Gábor Lugosi | 574 | ? | |
| Learning From Data: A Short Course | Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin | 182 | ? | |
| Machine Learning: The Art and Science of Algorithms that Make Sense of Data | Peter Flach | 366 | ? | |
| Machine Learning | Tom M. Mitchell | 390 | "Because of the interdisciplinary nature of the material, this book makes few assumptions about the background of the reader. Instead, it introduces basic concepts from statistics, artificial intelligence, information theory, and other disciplines as the need arises, focusing on just those concepts most relevant to machine learning. The book is intended for both undergraduate and graduate students in fields such as computer science, engineering, statistics, and the social sciences, and as a reference for software professionals and practitioners. Two principles that guided the writing of the book were that it should be accessible to undergraduate students and that it should contain the material I would want my own Ph.D. students to learn before beginning their doctoral research in machine learning." | ^{[8]} |
| Statistical Learning Theory^{[11]} (course notes for CS229T/Stats231 at Stanford) | Percy Liang | 210 | Understanding of machine learning, linear algebra, and probability. Knowledge of convex optimization also helpful.^{[12]} | ^{[13]} |
References
1. https://www.reddit.com/r/MachineLearning/comments/2pk137/favorite_machine_learning_books/cmxeqk5/
2. https://intelligence.org/researchguide/
3. https://news.ycombinator.com/item?id=10591458
4. http://lesswrong.com/lw/3gu/the_best_textbooks_on_every_subject/3cp0
5. https://www.reddit.com/r/MachineLearning/comments/3u9sai/what_books_are_best_as_an_introduction_to_machine/cxd6xto/
6. https://www.reddit.com/r/MachineLearning/comments/3qutk7/must_read_books_for_beginners_on_machine_learning/cwijwu9/
7. https://www.reddit.com/r/MachineLearning/comments/5z8110/d_a_super_harsh_guide_to_machine_learning/
8. https://web.archive.org/web/20100118082135/https://measuringmeasures.blogspot.com/2010/01/learningaboutstatisticallearning.html
9. http://lesswrong.com/lw/8fd/transcription_of_eliezers_january_2010_video_qa/
10. https://www.reddit.com/r/MachineLearning/comments/1jeawf/machine_learning_books/cbdwhw9/
11. https://web.stanford.edu/class/cs229t/Lectures/percynotes.pdf
12. https://web.stanford.edu/class/cs229t/syllabus.html
13. https://agentfoundations.org/item?id=134