Comparison of machine learning textbooks

This page is a comparison of machine learning textbooks, especially at the so-called introductory level. It includes books that focus on presenting multiple learning methods, and excludes books that focus solely on e.g. reinforcement learning.

Comparison table

The page count excludes any appendixes.

Not sure what other columns would be useful. Level of mathematical rigor? The approach taken (e.g. probably approximately correct framework)? Topics covered? Code samples (e.g. code for plots provided, or code for implementations provided, and in which language)? How amenable the book is to self-study? Ultimately what I care about is how easily I can understand the book/how much "fit" I have with the book, but this is difficult to generalize to others (who have different backgrounds and preferences).

Title	Author	Length (pages)	Prerequisites	Recommendations
Machine Learning: A Probabilistic Perspective	Kevin P. Murphy	1008	"This book is suitable for upper-level undergraduate students and beginning graduate students in computer science, statistics, electrical engineering, econometrics, or any one else who has the appropriate mathematical background. Specifically, the reader is assumed to already be familiar with basic multivariate calculus, probability, linear algebra, and computer programming. Prior exposure to statistics is helpful but not necessary."	^[1]
Introduction to Machine Learning	Alex Smola and S.V.N. Vishwanathan	196	?
Understanding Machine Learning: From Theory to Algorithms	Shai Shalev-Shwartz and Shai Ben-David	368	"We made an attempt to keep the book as self-contained as possible. However, the reader is assumed to be comfortable with basic notions of probability, linear algebra, analysis, and algorithms. The first three parts of the book are intended for first year graduate students in computer science, engineering, mathematics, or statistics. It can also be accessible to undergraduate students with the adequate background. The more advanced chapters can be used by researchers intending to gather a deeper theoretical understanding."	^[2]^[3]
Pattern Recognition and Machine Learning	Christopher M. Bishop	676	"It is aimed at advanced undergraduates or first year PhD students, as well as researchers and practitioners, and assumes no previous knowledge of pattern recognition or machine learning concepts. Knowledge of multivariate calculus and basic linear algebra is required, and some familiarity with probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory."	^[4]^[5]^[1], ^[6] (complaints)
Introduction to Machine Learning (second edition)	Ethem Alpaydin	516	"This is an introductory textbook, intended for senior undergraduate and graduate-level courses on machine learning, as well as engineers working in the industry who are interested in the application of these methods. The prerequisites are courses on computer programming, probability, calculus, and linear algebra. The aim is to have all learning algorithms sufficiently explained so it will be a small step from the equations given in the book to a computer program. For some cases, pseudocode of algorithms are also included to make this task easier."
The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition)	Trevor Hastie, Robert Tibshirani, and Jerome Friedman	698	"This book is designed for researchers and students in a broad variety of fields: statistics, artificial intelligence, engineering, finance and others. We expect that the reader will have had at least one elementary course in statistics, covering basic topics including linear regression."	^[7]^[5]^[3]^[8]^[9]
An Introduction to Statistical Learning with Applications in R	Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani	418	"One of the first books in this area—The Elements of Statistical Learning (ESL) (Hastie, Tibshirani, and Friedman)—was published in 2001, with a second edition in 2009. ESL has become a popular text not only in statistics but also in related fields. One of the reasons for ESL's popularity is its relatively accessible style. But ESL is intended for individuals with advanced training in the mathematical sciences. An Introduction to Statistical Learning (ISL) arose from the perceived need for a broader and less technical treatment of these topics. In this new book, we cover many of the same topics as ESL, but we concentrate more on the applications of the methods and less on the mathematical details. We have created labs illustrating how to implement each of the statistical learning methods using the popular statistical software package R. These labs provide the reader with valuable hands-on experience. This book is appropriate for advanced undergraduates or master's students in statistics or related quantitative fields or for individuals in other disciplines who wish to use statistical learning tools to analyze their data."
Foundations of Machine Learning	Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar	340	"The book is intended for students and researchers in machine learning, statistics and other related areas. It can be used as a textbook for both graduate and advanced undergraduate classes in machine learning or as a reference text for a research seminar. […] The reader is assumed to be familiar with basic concepts in linear algebra, probability, and analysis of algorithms. However, to further help him, we present in the appendix a concise linear algebra and a probability review, and a short introduction to convex optimization. We have also collected in the appendix a number of useful tools for concentration bounds used in this book."	^[10]
Machine Learning: A Bayesian and Optimization Perspective	Sergios Theodoridis	1012	"The book addresses the needs of advanced graduate, postgraduate, and research students as well as of practicing scientists and engineers whose interests lie beyond black-box solutions."
Bayesian Reasoning and Machine Learning	David Barber	618	"The book is designed to appeal to students with only a modest mathematical background in undergraduate calculus and linear algebra. No formal computer science or statistical background is required to follow the book, although a basic familiarity with probability, calculus and linear algebra would be useful. The book should appeal to students from a variety of backgrounds, including Computer Science, Engineering, applied Statistics, Physics, and Bioinformatics that wish to gain an entry to probabilistic approaches in Machine Learning. In order to engage with students, the book introduces fundamental concepts in inference using only minimal reference to algebra and calculus. More mathematical techniques are postponed until as and when required, always with the concept as primary and the mathematics secondary."
Information Theory, Inference, and Learning Algorithms	David J.C. MacKay	596	"This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers."
A Probabilistic Theory of Pattern Recognition	Luc Devroye, László Györfi, and Gábor Lugosi	574	?
Learning From Data: A Short Course	Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin	182	?
Machine Learning: The Art and Science of Algorithms that Make Sense of Data	Peter Flach	366	?
Machine Learning	Tom M. Mitchell	390	"Because of the interdisciplinary nature of the material, this book makes few assumptions about the background of the reader. Instead, it introduces basic concepts from statistics, artificial intelligence, information theory, and other disciplines as the need arises, focusing on just those concepts most relevant to machine learning. The book is intended for both undergraduate and graduate students in fields such as computer science, engineering, statistics, and the social sciences, and as a reference for software professionals and practitioners. Two principles that guided the writing of the book were that it should be accessible to undergraduate students and that it should contain the material I would want my own Ph.D. students to learn before beginning their doctoral research in machine learning."	^[8]
Statistical Learning Theory^[11] (course notes for CS229T/Stats231 at Stanford)	Percy Liang	210	Understanding of machine learning, linear algebra, and probability. Knowledge of convex optimization also helpful. ^[12]	^[13]

References

[1] https://www.reddit.com/r/MachineLearning/comments/2pk137/favorite_machine_learning_books/cmxeqk5/

[2] https://intelligence.org/research-guide/

[3] https://news.ycombinator.com/item?id=10591458

[4] http://lesswrong.com/lw/3gu/the_best_textbooks_on_every_subject/3cp0

[5] https://www.reddit.com/r/MachineLearning/comments/3u9sai/what_books_are_best_as_an_introduction_to_machine/cxd6xto/

[6] https://www.reddit.com/r/MachineLearning/comments/3qutk7/must_read_books_for_beginners_on_machine_learning/cwijwu9/

[7] https://www.reddit.com/r/MachineLearning/comments/5z8110/d_a_super_harsh_guide_to_machine_learning/

[8] https://web.archive.org/web/20100118082135/https://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html

[9] http://lesswrong.com/lw/8fd/transcription_of_eliezers_january_2010_video_qa/

[10] https://www.reddit.com/r/MachineLearning/comments/1jeawf/machine_learning_books/cbdwhw9/

[11] https://web.stanford.edu/class/cs229t/Lectures/percy-notes.pdf

[12] https://web.stanford.edu/class/cs229t/syllabus.html

[13] https://agentfoundations.org/item?id=134

External links
