{"id":12405,"date":"2025-02-14T12:37:33","date_gmt":"2025-02-14T12:37:33","guid":{"rendered":"https:\/\/metaschool.so\/articles\/?p=12405"},"modified":"2025-02-14T12:42:16","modified_gmt":"2025-02-14T12:42:16","slug":"beginners-guide-to-linear-regression","status":"publish","type":"post","link":"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/","title":{"rendered":"Beginner&#8217;s Guide to Linear Regression"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_56_1 ez-toc-wrap-left counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title \" >Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#What_is_Linear_Regression\" title=\"What is Linear Regression?\">What is Linear Regression?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#Breaking_Down_the_Terms\" title=\"Breaking Down the Terms\">Breaking Down the Terms<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#How_Does_Linear_Regression_Work\" title=\"How Does Linear Regression Work?\">How Does Linear Regression Work?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#When_Should_You_Use_Linear_Regression\" title=\"When Should You Use Linear Regression?\">When Should You Use Linear Regression?<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#Assumptions_of_Linear_Regression\" title=\"Assumptions of Linear Regression\">Assumptions of Linear Regression<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#Lets_Code_Predicting_Exam_Scores\" title=\"Let\u2019s Code: Predicting Exam Scores\">Let\u2019s Code: Predicting Exam Scores<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#Real-Life_Example_Pizza_Sales\" title=\"Real-Life Example: Pizza Sales\">Real-Life Example: Pizza Sales<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#Common_Mistakes_to_Avoid\" title=\"Common Mistakes to Avoid\">Common Mistakes to Avoid<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/metaschool.so\/articles\/beginners-guide-to-linear-regression\/#Key_Takeaways\" title=\"Key Takeaways\">Key Takeaways<\/a><\/li><\/ul><\/nav><\/div>\n\n<p>In the world of AI and ML, linear regression\u00a0is one of the most fundamental and widely used tools. It\u2019s often the first algorithm aspiring data scientists and AI enthusiasts learn, and for good reason. Linear regression is not just a stepping stone to more complex models\u2014it\u2019s a powerful technique in its own right, with applications spanning industries like finance, healthcare, marketing, and beyond. 
From predicting house prices to forecasting sales, linear regression helps us uncover patterns in data and make informed decisions.<\/p>\n\n\n\n<p>In this article, we will discuss what linear regression is, how it works, and some real-world use cases. We will also look at how you can implement it using existing libraries in Python.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_Linear_Regression\"><\/span>What is Linear Regression?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Imagine you want to predict something, like the price of a house. You might guess that bigger houses cost more. Linear regression is a way to turn that guess into a mathematical rule. It helps you find the straight line that best represents the relationship between two or more variables. This is what a linear regression graph looks like: a straight line of best fit across data points.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"226\" height=\"196\" src=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/linear.png\" alt=\"Linear Regression\" class=\"wp-image-10314\"\/><\/figure>\n<\/div>\n\n\n<p>For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you study <strong>2 hours<\/strong>, you might score <strong>65%<\/strong> on an exam.<\/li>\n\n\n\n<li>If you study <strong>5 hours<\/strong>, you might score <strong>80%<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>Linear regression creates a formula like this:<br><strong>Exam Score = Starting Point + (Study Hours \u00d7 How Much Each Hour Helps)<\/strong>.<br>This formula lets you predict scores for any number of study hours.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Breaking_Down_the_Terms\"><\/span><strong>Breaking Down the Terms<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>There are some terms that I will use throughout this 
article, so it&#8217;s important that you understand them:<\/p>\n\n\n\n<p><strong>Dependent Variable<\/strong>: This is what you want to predict (e.g., exam scores). Think of it as the &#8220;result&#8221; that depends on other factors.<\/p>\n\n\n\n<p><strong>Independent Variable<\/strong>: This is the factor you think affects the result (e.g., study hours). You use it to predict the dependent variable.<\/p>\n\n\n\n<p><strong>Slope (\u03b2\u2081)<\/strong> (or gradient of a line): How much the dependent variable changes when the independent variable increases by 1 unit. For example, if the slope is 2.5, studying <strong>1 extra hour<\/strong> adds <strong>2.5 points<\/strong> to your predicted exam score.<\/p>\n\n\n\n<p><strong>Intercept (\u03b2\u2080)<\/strong>: The value of the dependent variable when the independent variable is 0. For example, if the intercept is 60, the model predicts a score of <strong>60%<\/strong> if you studied <strong>0 hours<\/strong>.<\/p>\n\n\n\n<p><strong>Error Term (\u03b5)<\/strong>: The difference between the actual value and the predicted value. 
For example, if you predict 75% but actually score 72%, the error is 72 - 75 = <strong>-3 points<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_Does_Linear_Regression_Work\"><\/span><strong>How Does Linear Regression Work?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Let\u2019s say you collect data on study hours and exam scores:<\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table class=\"has-fixed-layout\"><thead><tr><th>Study Hours (x)<\/th><th>Exam Score (y)<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>62<\/td><\/tr><tr><td>3<\/td><td>68<\/td><\/tr><tr><td>5<\/td><td>75<\/td><\/tr><tr><td>7<\/td><td>82<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Step 1: Plot the Data<\/strong><br>If you graph these points, they fall roughly along a straight line.<\/p>\n\n\n\n<p><strong>Step 2: Find the Best-Fit Line<\/strong><br>Linear regression draws a straight line through these points. The &#8220;best&#8221; line is the one that keeps the <strong>total distance<\/strong> between the line and all the points as small as possible. The vertical distance between a point and the line is called a <strong>residual<\/strong> (or error).<\/p>\n\n\n\n<p><strong>Step 3: Minimize Errors<\/strong><br>The algorithm chooses the slope and intercept that minimize the sum of <strong>squared residuals<\/strong> (squaring stops positive and negative errors from cancelling out and penalizes large misses more heavily). 
This method is called <strong>Ordinary Least Squares (OLS)<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"When_Should_You_Use_Linear_Regression\"><\/span>When Should You Use Linear Regression?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Linear regression works best when:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>You Want to Predict a Number<\/strong>:<br>Like predicting temperature, prices, or sales.<br><em>Example<\/em>: A pizza shop might predict daily sales based on the number of coupons distributed.<\/li>\n\n\n\n<li><strong>The Relationship is Linear<\/strong>:<br>If you plot the data, the points should roughly form a straight line (not a curve).<br><em>Example<\/em>: More rainfall \u2192 higher crop yield (a straight-line relationship).<\/li>\n\n\n\n<li><strong>You Need a Simple Model<\/strong>:<br>Linear regression is easy to explain. You can say, &#8220;For every $100 spent on ads, sales increase by $500.&#8221;<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>When NOT to Use It<\/strong>?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If the relationship is curved (e.g., population growth over time is exponential).<\/li>\n\n\n\n<li>If there are extreme outliers (e.g., one house priced at $10 million in a neighborhood of $200k homes).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Assumptions_of_Linear_Regression\"><\/span>Assumptions of Linear Regression<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The model relies on these assumptions to work properly:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Linearity<\/strong>:<br>The relationship between variables should look like a straight line.<br><em>How to Check<\/em>: Plot the data. 
If the points form a cloud scattered around a straight line, you\u2019re good!<\/li>\n\n\n\n<li><strong>Independence<\/strong>:<br>Data points shouldn\u2019t influence each other.<br><em>Example<\/em>: If you\u2019re predicting student scores, make sure no two students copied each other\u2019s answers.<\/li>\n\n\n\n<li><strong>Constant Variance (Homoscedasticity)<\/strong>:<br>The spread of the errors should be roughly the same across all values of the independent variable.<br><em>Example<\/em>: If predictions for small houses have errors of \u00b1$10k, large houses should also have ~\u00b1$10k errors.<\/li>\n\n\n\n<li><strong>Normality of Residuals<\/strong>:<br>The errors should follow a bell-shaped curve (a normal distribution).<br><em>How to Check<\/em>: Plot a histogram of the residuals.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Lets_Code_Predicting_Exam_Scores\"><\/span>Let\u2019s Code: Predicting Exam Scores<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We\u2019ll use Python and the <a href=\"https:\/\/scikit-learn.org\/stable\/\" target=\"_blank\" rel=\"noopener\">scikit-learn<\/a> library to build a linear regression model that predicts exam scores from study hours.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create Sample Data<\/h3>\n\n\n\n<p>We\u2019ll generate fake data where exam scores depend on study hours. 
More formally, this is called <strong>synthetic data<\/strong> because we\u2019re making it up for practice.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.linear_model import LinearRegression\n\n# Generate study hours (1 to 10 hours, with slight randomness)\nstudy_hours = np.arange(1, 11) + np.random.normal(0, 0.5, 10)\n\n# Generate exam scores (2.5\u00d7study_hours + 60, with randomness)\nexam_scores = 2.5 * study_hours + 60 + np.random.normal(0, 2, 10)\n\n# Reshape the data for the model\nX = study_hours.reshape(-1, 1)  # Independent variable (study hours)\ny = exam_scores.reshape(-1, 1)  # Dependent variable (exam scores)\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 
2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #C586C0\">import<\/span><span style=\"color: #D4D4D4\"> numpy <\/span><span style=\"color: #C586C0\">as<\/span><span style=\"color: #D4D4D4\"> np<\/span><\/span>\n<span class=\"line\"><span style=\"color: #C586C0\">import<\/span><span style=\"color: #D4D4D4\"> matplotlib.pyplot <\/span><span style=\"color: #C586C0\">as<\/span><span style=\"color: #D4D4D4\"> plt<\/span><\/span>\n<span class=\"line\"><span style=\"color: #C586C0\">from<\/span><span style=\"color: #D4D4D4\"> sklearn.linear_model <\/span><span style=\"color: #C586C0\">import<\/span><span style=\"color: #D4D4D4\"> LinearRegression<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Generate study hours (1 to 10 hours, with slight randomness)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">study_hours = np.arange(<\/span><span style=\"color: #B5CEA8\">1<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">11<\/span><span style=\"color: #D4D4D4\">) + np.random.normal(<\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">0.5<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">10<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Generate exam scores (2.5\u00d7study_hours + 60, with randomness)<\/span><\/span>\n<span class=\"line\"><span style=\"color: 
#D4D4D4\">exam_scores = <\/span><span style=\"color: #B5CEA8\">2.5<\/span><span style=\"color: #D4D4D4\"> * study_hours + <\/span><span style=\"color: #B5CEA8\">60<\/span><span style=\"color: #D4D4D4\"> + np.random.normal(<\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">2<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">10<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Reshape the data for the model<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">X = study_hours.reshape(-<\/span><span style=\"color: #B5CEA8\">1<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">1<\/span><span style=\"color: #D4D4D4\">)  <\/span><span style=\"color: #6A9955\"># Independent variable (study hours)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">y = exam_scores.reshape(-<\/span><span style=\"color: #B5CEA8\">1<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">1<\/span><span style=\"color: #D4D4D4\">)  <\/span><span style=\"color: #6A9955\"># Dependent variable (exam scores)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Train the Model<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" 
r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"# Create a linear regression model\nmodel = LinearRegression()\n\n# Fit the model to the data (find the best slope and intercept)\nmodel.fit(X, y)\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #6A9955\"># Create a linear regression model<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">model = LinearRegression()<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Fit the model to the data (find the best slope and intercept)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">model.fit(X, y)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Make Predictions<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"# Predict exam scores for all study hours in the data\npredicted_scores = model.predict(X)\n\n# Get the slope and intercept\nslope = model.coef_[0][0]   # How much each hour affects the score\nintercept = model.intercept_[0]  # Expected score with 0 study hours\n\nprint(f&quot;Formula: Exam Score = {intercept:.2f} + {slope:.2f} \u00d7 Study Hours&quot;)\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #6A9955\"># Predict exam scores for all study hours in the data<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">predicted_scores = model.predict(X)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Get the slope and intercept<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">slope = model.coef_[<\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\">][<\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\">]   <\/span><span style=\"color: #6A9955\"># How much each hour affects the score<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">intercept = model.intercept_[<\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\">]  <\/span><span style=\"color: #6A9955\"># Expected score with 0 study hours<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">print<\/span><span style=\"color: #D4D4D4\">(<\/span><span style=\"color: #569CD6\">f<\/span><span style=\"color: #CE9178\">&quot;Formula: Exam Score = <\/span><span style=\"color: #569CD6\">{<\/span><span style=\"color: #D4D4D4\">intercept<\/span><span style=\"color: #569CD6\">:.2f}<\/span><span style=\"color: #CE9178\"> + <\/span><span style=\"color: #569CD6\">{<\/span><span style=\"color: #D4D4D4\">slope<\/span><span style=\"color: #569CD6\">:.2f}<\/span><span style=\"color: #CE9178\"> \u00d7 Study Hours&quot;<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Visualize the Results<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"plt.scatter(study_hours, exam_scores, color='blue', label='Actual Scores')\nplt.plot(study_hours, predicted_scores, color='red', label='Predicted Line')\nplt.xlabel('Study Hours')\nplt.ylabel('Exam Score')\nplt.title('How Study Hours Affect Exam Scores')\nplt.legend()\nplt.show()\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#D4D4D4\">plt.scatter(study_hours, exam_scores, <\/span><span style=\"color: #9CDCFE\">color<\/span><span style=\"color: #D4D4D4\">=<\/span><span style=\"color: #CE9178\">&#39;blue&#39;<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #9CDCFE\">label<\/span><span style=\"color: #D4D4D4\">=<\/span><span style=\"color: #CE9178\">&#39;Actual Scores&#39;<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">plt.plot(study_hours, predicted_scores, <\/span><span style=\"color: #9CDCFE\">color<\/span><span style=\"color: #D4D4D4\">=<\/span><span style=\"color: #CE9178\">&#39;red&#39;<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #9CDCFE\">label<\/span><span style=\"color: #D4D4D4\">=<\/span><span style=\"color: #CE9178\">&#39;Predicted Line&#39;<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">plt.xlabel(<\/span><span style=\"color: #CE9178\">&#39;Study Hours&#39;<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">plt.ylabel(<\/span><span style=\"color: #CE9178\">&#39;Exam Score&#39;<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">plt.title(<\/span><span style=\"color: #CE9178\">&#39;How Study Hours Affect Exam Scores&#39;<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">plt.legend()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">plt.show()<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-left\">Output<\/h3>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2025\/02\/Figure_1.png\" alt=\"Linear Regression Example 
Output\" class=\"wp-image-12410\" style=\"width:589px;height:auto\" srcset=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2025\/02\/Figure_1.png 640w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2025\/02\/Figure_1-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n<\/div>\n\n\n<p>The red line is the model\u2019s prediction. The slope\/gradient (e.g., 2.54) means <strong>each hour of study adds ~2.5 points<\/strong> to your score, and the intercept (e.g., 59.63) is the <strong>baseline score if you studied 0 hours<\/strong>. Because the data is randomly generated, your exact numbers will differ from run to run.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Real-Life_Example_Pizza_Sales\"><\/span>Real-Life Example: Pizza Sales<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Imagine you own a pizza shop. To attract more customers, you start giving away coupons. Over time, you notice that on days when you distribute <strong>50 coupons<\/strong>, you sell <strong>100 pizzas<\/strong>, and on days with <strong>100 coupons<\/strong>, you sell <strong>150 pizzas<\/strong>. You want to know how many pizzas you would sell if you distributed 500 coupons, so you turn to linear regression.<\/p>\n\n\n\n<p>Using linear regression, you can create a formula like:<br><strong>Pizzas Sold = 50 + (1 \u00d7 Number of Coupons)<\/strong>.<\/p>\n\n\n\n<p>This tells you that for every coupon distributed, you sell <strong>1 extra pizza<\/strong>. But even with <strong>0 coupons<\/strong>, you\u2019d still sell <strong>50 pizzas<\/strong> (maybe from regular customers). 
Now, when you substitute 500 for &#8220;Number of Coupons&#8221;, the formula predicts 50 + 500 = 550 pizzas. Keep in mind that 500 coupons is far outside the 50 to 100 range you actually observed, so treat this prediction with caution.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Common_Mistakes_to_Avoid\"><\/span><strong>Common Mistakes to Avoid<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Assuming All Relationships Are Linear<\/strong>:<br>If your data forms a curve, consider polynomial regression instead.<\/li>\n\n\n\n<li><strong>Ignoring Outliers<\/strong>:<br>A single outlier (e.g., a student who studied 1 hour but scored 90%) can skew the line.<\/li>\n\n\n\n<li><strong>Overcomplicating<\/strong>:<br>Start with one independent variable (simple regression) before adding more.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Key_Takeaways\"><\/span>Key Takeaways<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p><strong>What<\/strong>: Linear regression predicts a number using a straight-line relationship.<\/p>\n\n\n\n<p><strong>How<\/strong>: Find the line that minimizes the sum of squared differences between predicted and actual values.<\/p>\n\n\n\n<p><strong>When<\/strong>: Use linear regression when the relationship between the independent and dependent variables appears linear, meaning that each one-unit change in the independent variable changes the dependent variable by a roughly constant amount. 
It is ideal for forecasting and trend analysis in scenarios where the relationship is stable and straightforward.<\/p>\n\n\n\n<p><strong>Related Reading:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/metaschool.so\/articles\/what-is-regression-model\">What is a Regression Model \u2014 A Comprehensive Guide<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/best-machine-learning-libraries\">10 Best Machine Learning Libraries (With Examples)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/cost-function\">What is a Cost Function in Machine Learning? \u2014 Explained<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/what-is-rag-in-ai\">What is RAG in AI \u2013 A Comprehensive Guide<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":19,"featured_media":12407,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[344],"tags":[],"class_list":["post-12405","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence"],"_links":{"self":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts\/12405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/comments?post=12405"}],"version-hi
story":[{"count":15,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts\/12405\/revisions"}],"predecessor-version":[{"id":12473,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts\/12405\/revisions\/12473"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/media\/12407"}],"wp:attachment":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/media?parent=12405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/categories?post=12405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/tags?post=12405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}