Preface

Chapter 1 The Bazaar of Storytellers
Data Science: The Sexiest Job in the 21st Century
Storytelling at Google and Walmart
Getting Started with Data Science
Do We Need Another Book on Analytics?
Repeat, Repeat, Repeat, and Simplify
Chapters' Structure and Features
Analytics Software Used
What Makes Someone a Data Scientist?
Existential Angst of a Data Scientist
Data Scientists: Rarer Than Unicorns
Beyond the Big Data Hype
Big Data: Beyond Cheerleading
Big Data Hubris
Leading by Miles
Predicting Pregnancies, Missing Abortions
What's Beyond This Book?
Summary
Endnotes

Chapter 2 Data in the 24/7 Connected World
The Liberated Data: The Open Data
The Caged Data
Big Data Is Big News
It's Not the Size of Big Data; It's What You Do with It
Free Data as in Free Lunch
FRED
Quandl
U.S. Census Bureau and Other National Statistical Agencies
Search-Based Internet Data
Google Trends
Google Correlate
Survey Data
PEW Surveys
ICPSR
Summary
Endnotes

Chapter 3 The Deliverable
The Final Deliverable
What Is the Research Question?
What Answers Are Needed?
How Have Others Researched the Same Question in the Past?
What Information Do You Need to Answer the Question?
What Analytical Techniques/Methods Do You Need?
The Narrative
The Report Structure
Have You Done Your Job as a Writer?
Building Narratives with Data
"Big Data, Big Analytics, Big Opportunity"
Urban Transport and Housing Challenges
Human Development in South Asia
The Big Move
Summary
Endnotes

Chapter 4 Serving Tables
2014: The Year of Soccer and Brazil
Using Percentages Is Better Than Using Raw Numbers
Data Cleaning
Weighted Data
Cross Tabulations
Going Beyond the Basics in Tables
Seeing Whether Beauty Pays
Data Set
What Determines Teaching Evaluations?
Does Beauty Affect Teaching Evaluations?
Putting It All on (in) a Table
Generating Output with Stata
Summary Statistics Using Built-In Stata
Using Descriptive Statistics
Weighted Statistics
Correlation Matrix
Reproducing the Results for the Hamermesh and Parker Paper
Statistical Analysis Using Custom Tables
Summary
Endnotes

Chapter 5 Graphic Details
Telling Stories with Figures
Data Types
Teaching Ratings
The Congested Lives in Big Cities
Summary
Endnotes

Chapter 6 Hypothetically Speaking
Random Numbers and Probability Distributions
Casino Royale: Roll the Dice
Normal Distribution
The Student Who Taught Everyone Else
Statistical Distributions in Action
Z-Transformation
Probability of Getting a High or Low Course Evaluation
Probabilities with Standard Normal Table
Hypothetically Yours
Consistently Better or Happenstance
Mean and Not So Mean Differences
Handling Rejections
The Mean and Kind Differences
Comparing a Sample Mean When the Population SD Is Known
Left Tail Between the Legs
Comparing Means with Unknown Population SD
Comparing Two Means with Unequal Variances
Comparing Two Means with Equal Variances
Worked-Out Examples of Hypothesis Testing
Best Buy-Apple Store Comparison
Assuming Equal Variances
Exercises for Comparison of Means
Regression for Hypothesis Testing
Analysis of Variance
Significantly Correlated
Summary
Endnotes

Chapter 7 Why Tall Parents Don't Have Even Taller Children
The Department of Obvious Conclusions
Why Regress?
Introducing Regression Models
All Else Being Equal
Holding Other Factors Constant
Spuriously Correlated
A Step-By-Step Approach to Regression
Learning to Speak Regression
The Math Behind Regression
Ordinary Least Squares Method
Regression in Action
This Just In: Bigger Homes Sell for More
Does Beauty Pay? Ask the Students
Survey Data, Weights, and Independence of Observations
What Determines Household Spending on Alcohol and Food
What Influences Household Spending on Food?
Advanced Topics
Homoskedasticity
Multicollinearity
Summary
Endnotes

Chapter 8 To Be or Not to Be
To Smoke or Not to Smoke: That Is the Question
Binary Outcomes
Binary Dependent Variables
Let's Question the Decision to Smoke or Not
Smoking Data Set
Exploratory Data Analysis
What Makes People Smoke: Asking Regression for Answers
Ordinary Least Squares Regression
Interpreting Models at the Margins
The Logit Model
Interpreting Odds in a Logit Model
Probit Model
Interpreting the Probit Model
Using Zelig for Estimation and Post-Estimation Strategies
Estimating Logit Models for Grouped Data
Using SPSS to Explore the Smoking Data Set
Regression Analysis in SPSS
Estimating Logit and Probit Models in SPSS
Summary
Endnotes

Chapter 9 Categorically Speaking About Categorical Data
What Is Categorical Data?
Analyzing Categorical Data
Econometric Models of Binomial Data
Estimation of Binary Logit Models
Odds Ratio
Log of Odds Ratio
Interpreting Binary Logit Models
Statistical Inference of Binary Logit Models
How I Met Your Mother? Analyzing Survey Data
A Blind Date with the Pew Online Dating Data Set
Demographics of Affection
High-Techies
Romancing the Internet
Dating Models
Multinomial Logit Models
Interpreting Multinomial Logit Models
Choosing an Online Dating Service
Pew Phone Type Model
Why Some Women Work Full-Time and Others Don't
Conditional Logit Models
Random Utility Model
Independence From Irrelevant Alternatives
Interpretation of Conditional Logit Models
Estimating Logit Models in SPSS
Summary
Endnotes

Chapter 10 Spatial Data Analytics
Fundamentals of GIS
GIS Platforms
Freeware GIS
GIS Data Structure
GIS Applications in Business Research
Retail Research
Hospitality and Tourism Research
Lifestyle Data: Consumer Health Profiling
Competitor Location Analysis
Market Segmentation
Spatial Analysis of Urban Challenges
The Hard Truths About Public Transit in North America
Toronto Is a City Divided into the Haves, Will Haves, and Have Nots
Income Disparities in Urban Canada
Where Is Toronto's Missing Middle Class? It Has Suburbanized Out of Toronto
Adding Spatial Analytics to Data Science
Race and Space in Chicago
Developing Research Questions
Race, Space, and Poverty
Race, Space, and Commuting
Regression with Spatial Lags
Summary
Endnotes

Chapter 11 Doing Serious Time with Time Series
Introducing Time Series Data and How to Visualize It
How Is Time Series Data Different?
Starting with Basic Regression Models
What Is Wrong with Using OLS Models for Time Series Data?
Newey-West Standard Errors
Regressing Prices with Robust Standard Errors
Time Series Econometrics
Stationary Time Series
Autocorrelation Function (ACF)
Partial Autocorrelation Function (PCF)
White Noise Tests
Augmented Dickey Fuller Test
Econometric Models for Time Series Data
Correlation Diagnostics
Invertible Time Series and Lag Operators
The ARMA Model
ARIMA Models
Distributed Lag and VAR Models
Applying Time Series Tools to Housing Construction
Macro-Economic and Socio-Demographic Variables Influencing Housing Starts
Estimating Time Series Models to Forecast New Housing Construction
OLS Models
Distributed Lag Model
Out-of-Sample Forecasting with Vector Autoregressive Models
ARIMA Models
Summary
Endnotes

Chapter 12 Data Mining for Gold
Can Cheating on Your Spouse Kill You?
Are Cheating Men Alpha Males?
UnFair Comments: New Evidence Critiques Fair's Research
Data Mining: An Introduction
Seven Steps Down the Data Mine
Establishing Data Mining Goals
Selecting Data
Preprocessing Data
Transforming Data
Storing Data
Mining Data
Evaluating Mining Results
Rattle Your Data
What Does Religiosity Have to Do with Extramarital Affairs?
The Principal Components of an Extramarital Affair
Will It Rain Tomorrow? Using PCA For Weather Forecasting
Do Men Have More Affairs Than Females?
Two Kinds of People: Those Who Have Affairs, and Those Who Don't
Models to Mine Data with Rattle
Summary
Endnotes

Index
Murtaza Haider, Ph.D., is an Associate Professor at the Ted Rogers School of Management, Ryerson University, and the Director of Regionomics Inc., a consulting firm. He is also a visiting research fellow at the Munk School of Global Affairs at the University of Toronto (2014-15). In addition, he is a senior research affiliate with the Canadian Network for Research on Terrorism, Security, and Society, and an adjunct professor of engineering at McGill University.

Haider specializes in applying analytics and statistical methods to find solutions for socioeconomic challenges. His research interests include analytics; data science; housing market dynamics; infrastructure, transportation, and urban planning; and human development in North America and South Asia. He is an avid blogger and data journalist who writes weekly for the Dawn newspaper and occasionally for the Huffington Post.

Haider holds a master's degree in transport engineering and planning and a Ph.D. in Urban Systems Analysis from the University of Toronto.