Read Practical Statistics for Data Scientists 50 Essential Concepts Peter Bruce Andrew Bruce 9781491952962 Books

By Madge Garrett on Tuesday, May 28, 2019

Product details

  • Paperback 318 pages
  • Publisher O'Reilly Media; 1 edition (May 28, 2017)
  • Language English
  • ISBN-10 1491952962




Practical Statistics for Data Scientists 50 Essential Concepts Peter Bruce Andrew Bruce 9781491952962 Books Reviews


  • A reasonable survey of core statistical methods, not super-clear, plus a slapdash review of a few machine-learning models, with very little explanation.

    Pros
    * Decent review of core concepts
    * Good coverage of importance of distinguishing between sample and population statistics
    * Better discussion of bootstrapping than I've seen anywhere else
    * Good ideas on dealing with non-normal data and avoiding the assumption that all data is normally distributed

    Cons
    * Assumes that you know R. Lots of code, no explanations of the code.
    * Inconsistent level of detail and depth. Detailed coverage of mean, range, quartile, but rampant hand-waving when you get to bagging and boosting
    * Many of the math explanations are unclear or incomplete. The authors make you do a lot of work to figure things out and you will need external resources
    * The last part of the book is a thin and purely practical survey of ML models. You don't get much understanding of how or why things work.
  • Excellent introductory text for a comprehensive overview of statistics! The github repository augments the content very well and provides added value for the statistical topics covered in the book. Both of the Bruce brothers are statistical gurus and this fact is evident in the writing, which is both informative and witty. Peter is the president of Statistics.com and is well-versed in providing statistical instruction to students of all ages and levels. He is also a proponent of resampling and one of the developers of the excellent Resampling Stats software package for Excel.

    It is true that the textbook does not provide in-depth coverage for all topics, but I don't think that was the intent of the authors. However, the text DOES provide an excellent introduction to topics relevant to students and data scientists. After reading the text and working through the examples, you will be equipped to further your knowledge in whichever topic you require for your data analysis task.

    Highly recommended!
  • I love this book as a reference. Clear, efficient but detailed explanations. It is not designed as a textbook but as a reference. When I wonder "what is that test used for again?" or "what was that formula?" this is the first thing I reach for. Sure, Google has become universal for that too, but I like having a single hard copy reference that I can get to know and that becomes a trustworthy old friend. This book is taking on that role for me.
  • First of all, this book is not for you if you want a deep and thorough explanation of statistical concepts. It serves a completely different purpose: to familiarize the reader with high-level concepts so that they can continue their statistics education elsewhere.

    I found this book a very engaging read. It sets itself apart from other books on statistics by clearly stating which concepts are not so relevant to the modern, computerized, exploratory analysis toolset. Many concepts presented in classic books on the subject are rooted in the 1920s and 1930s, when computing power wasn't available and researchers resorted to various pre-calculated distributions and formulas to do their work. A modern data scientist's approach would eschew some of the old ways and instead rely on randomization, resampling, and computing power.

    This book not only tells what something is, but also why it is that way and if a concept is still relevant today.
    I can recommend this book if your statistics knowledge is spotty or ephemeral. It serves its purpose well and doesn't bog the reader down with (sometimes) unnecessary mathematical concepts to demonstrate an idea.

    Why four stars:
    1. Lack of examples in programming languages.
    2. Complete lack of exercises (at least 1-2 exercises are necessary).
    3. The few examples that are available are all in R. No Python.
  • This book is well written and packs a substantial amount of information into a small number of pages. It is best used to get a survey and overview of many of the facets of the domain of data science. This book will not teach you anything in enough depth to actually execute it well — it will teach you just enough to be dangerous and not realize when you've gone off the rails. I recommend it for managers who may never go into technical depth, for people considering whether or not they are interested in data science, or as a preview book to create a framework from which to hang more detailed understanding. Although this is an introductory book, it assumes you can already program in R. If you can't, either accept that you won't be able to follow the specifics of the examples, or read The Art of R Programming and/or R for Data Science.

    I dislike that the authors make a number of categorical statements of the form "Data Scientists do this" or "Data Scientists don't need that". I disagree with many of these assertions and I think they have taken a definition of "data science" which is narrower than the prevailing consensus in the industry.

    This book has some errors (see, for example, the confusion matrix on page 196) but overall the accuracy is above average relative to recent norms.

    As other reviewers have noted, the author's github repository for the book is currently empty. If that's important to you, check it under "andrewgbruce" on github and make sure it's been updated before you buy the book.