{"id":10395,"date":"2024-11-27T12:37:43","date_gmt":"2024-11-27T12:37:43","guid":{"rendered":"https:\/\/metaschool.so\/articles\/?p=10395"},"modified":"2024-12-05T11:53:11","modified_gmt":"2024-12-05T11:53:11","slug":"cross-entropy-loss-function","status":"publish","type":"post","link":"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/","title":{"rendered":"Cross Entropy Loss Function in Machine Learning \u2014 Explained!"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_56_1 ez-toc-wrap-left counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title \" >Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#What_is_Entropy\" title=\"What is Entropy?\">What is Entropy?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#The_Cross_Entropy_Loss_Function\" title=\"The Cross Entropy Loss Function\">The Cross Entropy Loss Function<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#Why_Use_Cross_Entropy_Loss\" title=\"Why Use Cross Entropy Loss?\">Why Use Cross Entropy Loss?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#Limitations_of_Cross_Entropy_Loss\" title=\"Limitations of Cross Entropy Loss\">Limitations of Cross Entropy Loss<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" 
href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#Code_Implementation\" title=\"Code Implementation\">Code Implementation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#Conclusion\" title=\"Conclusion\">Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/metaschool.so\/articles\/cross-entropy-loss-function\/#FAQs\" title=\"FAQs\">FAQs<\/a><\/li><\/ul><\/nav><\/div>\n\n<p>In machine learning, one of the central goals is to create models that make accurate predictions by aligning closely with the true labels of the data. A crucial metric in achieving this objective is the cross entropy loss function. This function measures the dissimilarity between the predicted probability distribution and the true distribution, helping to optimize models by providing feedback on how far off predictions are from the actual values. To fully understand cross entropy, we first need to explore its foundational concept \u2014 entropy \u2014 and then discuss how it functions as a loss function.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>If you are unfamiliar with the concept of a loss function or just need some quick revision, don&#8217;t worry, we got you!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>What is a Loss Function?<\/strong><\/h3>\n\n\n\n<p>In machine learning, a loss function is a mathematical function that quantifies how well a model&#8217;s predictions match the actual data labels. It provides a scalar value that indicates the error or discrepancy between the predicted output and the true labels. 
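<\/p>

<p>As a quick illustration, here is a minimal sketch of the idea using mean squared error, one of the simplest loss functions (the helper name below is my own, for illustration only):<\/p>

```python
# A loss function maps predictions and true labels to a single scalar error.
# Mean squared error is one of the simplest examples; this helper name is
# a hypothetical one chosen for illustration.
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# The closer the predictions are to the labels, the smaller the loss.
print(mean_squared_error([1.0, 0.0, 1.0], [0.9, 0.2, 0.8]))  # about 0.03
print(mean_squared_error([1.0, 0.0, 1.0], [0.1, 0.9, 0.2]))  # about 0.75
```

<p>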
The purpose of the loss function is to guide the training process by allowing the model to adjust its parameters in a way that minimizes the error.<\/p>\n\n\n\n<p>Loss functions are critical because they allow machine learning algorithms to &#8220;learn&#8221; by penalizing incorrect predictions. The lower the loss, the better the model\u2019s predictions are aligned with the true labels. In classification tasks, the choice of loss function is essential to the model&#8217;s ability to converge to an optimal solution during training.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_Entropy\"><\/span>What is Entropy?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2454\" height=\"2454\" src=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited.jpg\" alt=\"\" class=\"wp-image-10663\" style=\"width:224px;height:auto\" srcset=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited.jpg 2454w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-300x300.jpg 300w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-1024x1024.jpg 1024w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-150x150.jpg 150w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-768x768.jpg 768w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-1536x1536.jpg 1536w, 
https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-2048x2048.jpg 2048w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/2002.i513.042_cyber_attack_security_set_isometric-14-edited-1320x1320.jpg 1320w\" sizes=\"auto, (max-width: 2454px) 100vw, 2454px\" \/><\/figure>\n<\/div>\n\n\n<p>Entropy refers to the measure of uncertainty or disorder within a system. In the context of machine learning, entropy quantifies the unpredictability or randomness of the true labels in a classification task. Essentially, it tells us how much &#8220;information&#8221; is needed to describe the true label.<\/p>\n\n\n\n<p>For example, in binary classification, where there are only two possible labels (e.g., 0 or 1), entropy is high when both classes are equally likely, indicating uncertainty. In contrast, entropy is low when one class is much more probable than the other, indicating less uncertainty. In an ideal scenario where the model perfectly predicts the true label, the entropy would be minimal, and the system would be certain. The goal is to allow the model to predict the labels with high confidence.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"The_Cross_Entropy_Loss_Function\"><\/span>The Cross Entropy Loss Function<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The cross entropy loss function builds upon the concepts of entropy and loss functions by measuring the difference between two probability distributions \u2014 the predicted probability distribution and the true distribution. It evaluates the performance of a classification model that outputs probability values between 0 and 1.<\/p>\n\n\n\n<p>Cross entropy loss increases as the predicted probability diverges from the actual label. For example, if the true label is 1 and the model predicts a probability of 0.2, the loss value will be high, indicating poor alignment. 
However, if the model predicts a probability of 0.9 for the true label, the loss will be low, reflecting better performance. A perfect model, where predicted probabilities exactly match the true labels, achieves a log loss of 0, making cross entropy an essential metric for optimizing classification tasks.<\/p>\n\n\n\n<p>One of its key strengths is its ability to amplify the gradient when the predicted probabilities deviate significantly from the actual labels. This characteristic ensures that the model receives a strong signal to update its weights, leading to faster convergence during training. Additionally, the penalty structure of cross entropy loss helps models avoid getting stuck in local minima, encouraging them to find more generalizable solutions.<\/p>\n\n\n\n<p>For binary classification, the cross entropy loss function is calculated using the formula:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[\\text { Loss }=-\\frac{1}{N} \\sum_{i=1}^N\\left[y_i \\log \\left(\\hat{y}_i\\right)+\\left(1-y_i\\right) \\log \\left(1-\\hat{y}_i\\right)\\right]\\]<\/div>\n\n\n\n<p>In this formula, <em>N<\/em> is the number of samples, <em>y<sub>i<\/sub><\/em> represents the true label of sample <em>i<\/em>, and <em>y<sup>^<\/sup><sub>i<\/sub><\/em> is the predicted probability for that sample.<\/p>\n\n\n\n<p>As noted above, the loss grows as the predicted probability diverges from the true label. 
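<\/p>

<p>The 0.2 and 0.9 predictions from the example above can be checked directly against the binary formula with a few lines of plain Python (a minimal sketch; the function name is my own):<\/p>

```python
import math

# Binary cross entropy for a single example:
#   loss = -[ y*log(y_hat) + (1 - y)*log(1 - y_hat) ]
# (hypothetical helper name, for illustration)
def binary_cross_entropy(y, y_hat):
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# True label is 1: predicting 0.2 is penalized heavily...
print(binary_cross_entropy(1, 0.2))  # about 1.61
# ...while a confident, correct prediction of 0.9 costs little.
print(binary_cross_entropy(1, 0.9))  # about 0.11
```

<p>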
It essentially quantifies how &#8220;surprised&#8221; the model is by the true label, penalizing it for making incorrect predictions.<\/p>\n\n\n\n<p>For multi-class classification, where each data point belongs to one of several classes, the cross entropy formula generalizes to:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[\\text { Loss }=-\\frac{1}{N} \\sum_{i=1}^N \\sum_{j=1}^C y_{i j} \\log \\left(\\hat{y}_{i j}\\right)\\]<\/div>\n\n\n\n<p>Here:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>N <\/em>is the number of data samples.<\/li>\n\n\n\n<li><em>C<\/em> is the total number of classes.<\/li>\n\n\n\n<li><em>y<sub>ij<\/sub><\/em>\u200b is a one-hot encoded true label.<\/li>\n\n\n\n<li><em>y<sup>^<\/sup><sub>ij<\/sub><\/em>\u200b is the predicted probability for class <em>j<\/em>.<\/li>\n<\/ul>\n\n\n\n<p>Cross entropy loss encourages the model to increase the probability for the correct class and decrease it for incorrect classes, optimizing the model\u2019s ability to make accurate predictions. By incorporating the concept of entropy and comparing the predicted and true distributions, this loss function provides a clear and effective way to measure and minimize errors, helping models become more accurate and reliable over time.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<h3 class=\"wp-block-heading\"><strong>How to Further Improve the Results of the Cross Entropy Loss Function<\/strong>?<\/h3>\n\n\n\n<p>The effectiveness of the cross entropy loss function can be further enhanced by integrating it with regularization techniques. Regularization helps prevent overfitting \u2014 a scenario where the model performs exceptionally well on training data but fails to generalize to unseen data. 
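<\/p>

<p>For example, in PyTorch (the library used later in this article), an L2 penalty can be combined with cross entropy training simply by setting the optimizer\u2019s <code>weight_decay<\/code> parameter. A minimal sketch, where the tiny model and random data are placeholders of my own:<\/p>

```python
import torch
import torch.nn as nn

# A tiny linear classifier; the 4-feature / 3-class sizes are placeholders.
model = nn.Linear(4, 3)
loss_ftn = nn.CrossEntropyLoss()

# weight_decay applies an L2 penalty to the weights at every update step,
# discouraging excessively large weights alongside the cross entropy objective.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

x = torch.randn(8, 4)          # batch of 8 random samples
y = torch.randint(0, 3, (8,))  # true class indices for each sample

optimizer.zero_grad()
loss = loss_ftn(model(x), y)   # cross entropy on raw logits
loss.backward()
optimizer.step()
print('loss:', loss.item())
```

<p>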
By discouraging overly complex models, regularization complements the role of cross entropy loss in building robust, reliable systems.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-1024x1024.png\" alt=\"\" class=\"wp-image-10672\" style=\"width:216px;height:auto\" srcset=\"https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-1024x1024.png 1024w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-300x300.png 300w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-150x150.png 150w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-768x768.png 768w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-1536x1536.png 1536w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1-1320x1320.png 1320w, https:\/\/metaschool.so\/articles\/wp-content\/uploads\/2024\/11\/rb_2149243387-1.png 2000w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>Here\u2019s how common regularization techniques improve the performance of models trained with cross entropy loss:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Dropout<\/strong><br>Dropout is a regularization method that temporarily &#8220;drops&#8221; (disables) random neurons during training, ensuring the model does not become overly reliant on specific neurons or connections. 
When combined with cross-entropy loss:\n<ul class=\"wp-block-list\">\n<li><strong>Improved Generalization<\/strong>: Dropout forces the model to learn more diverse and robust features, reducing overfitting and enhancing its ability to generalize to unseen data.<\/li>\n\n\n\n<li><strong>Reduced Complexity<\/strong>: By dynamically altering the network architecture during training, dropout simplifies the learned patterns and avoids capturing noise in the data.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>L1 Regularization (Lasso)<\/strong><br>L1 regularization adds a penalty proportional to the absolute values of the model&#8217;s weights to the loss function. This encourages sparsity, driving some weights to zero. When applied alongside cross entropy loss:\n<ul class=\"wp-block-list\">\n<li><strong>Simpler Models<\/strong>: Sparsity helps create simpler models that focus only on the most important features, reducing the risk of overfitting.<\/li>\n\n\n\n<li><strong>Feature Selection<\/strong>: L1 regularization inherently performs feature selection by eliminating irrelevant or less significant features.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>L2 Regularization (Ridge)<\/strong><br>L2 regularization adds a penalty proportional to the square of the weights to the loss function, discouraging excessively large weights. 
When combined with cross entropy loss:\n<ul class=\"wp-block-list\">\n<li><strong>Weight Stabilization<\/strong>: L2 regularization prevents the model from assigning excessive importance to any single feature, leading to more stable predictions.<\/li>\n\n\n\n<li><strong>Smoother Optimization<\/strong>: It ensures smoother gradients during optimization, aiding in convergence and reducing training instability.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_Use_Cross_Entropy_Loss\"><\/span><strong>Why Use Cross Entropy Loss?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Cross entropy loss has emerged as the go-to metric for classification tasks because of its versatility and effectiveness.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Handles Probabilistic Outputs<\/strong><br>Cross entropy loss works seamlessly with models that output probabilities, such as those using softmax activation. It aligns well with the probabilistic nature of classification tasks, ensuring that the model predicts meaningful probability distributions for each class.<\/li>\n\n\n\n<li><strong>Improves Model Confidence<\/strong><br>By heavily penalizing poorly predicted probabilities, cross entropy loss encourages sharper and more confident predictions. For example, a model predicting 0.8 for the correct class will be penalized much less than one predicting 0.3, pushing it to refine its outputs further.<\/li>\n\n\n\n<li><strong>Scales Well to Multi-Class Problems<\/strong><br>Cross entropy loss can handle both binary and multi-class classification tasks effectively. 
Its mathematical structure accommodates one-hot encoded true labels and adapts naturally to varying numbers of output classes, making it suitable for complex datasets.<\/li>\n\n\n\n<li><strong>Boosts Convergence Speed<\/strong><br>The gradient amplification provided by cross entropy loss accelerates the learning process, allowing models to converge faster compared to simpler loss functions like mean squared error (MSE).<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Use Cases<\/h3>\n\n\n\n<p>Cross entropy loss is widely used in classification tasks, largely due to its emphasis on accurate probability estimation. Here&#8217;s how it applies to different tasks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Image Classification<\/strong>: Cross entropy loss ensures that models generate precise probabilities for each class, which is crucial in applications like facial recognition, object detection, and medical imaging (e.g., identifying diseases from X-rays). This precision allows for confident and accurate decisions, even in multi-class settings.<\/li>\n\n\n\n<li><strong>Sentiment Analysis<\/strong>: In tasks like analyzing customer reviews, social media posts, or survey feedback, cross entropy loss helps refine predictions for sentiment classes (e.g., positive, negative, or neutral). 
This is essential for understanding nuanced opinions and generating reliable sentiment scores.<\/li>\n\n\n\n<li><strong>Natural Language Processing (NLP)<\/strong>: Cross entropy loss optimizes models for diverse language-related tasks, such as:\n<ul class=\"wp-block-list\">\n<li><strong>Text Classification<\/strong>: Categorizing articles or emails (e.g., spam detection, topic modeling).<\/li>\n\n\n\n<li><strong>Machine Translation<\/strong>: Ensuring accurate word-by-word probability alignment for translating sentences across languages.<\/li>\n\n\n\n<li><strong>Question-Answering Systems<\/strong>: Helping models predict the correct answers with high confidence by fine-tuning probability distributions over potential responses.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Limitations_of_Cross_Entropy_Loss\"><\/span><strong>Limitations of Cross Entropy Loss<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Despite its strengths, cross entropy loss has certain challenges:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Sensitivity to Noisy Labels<\/strong><br>Cross entropy loss assigns high penalties for incorrect predictions, which can lead to overfitting in the presence of noisy or mislabeled data. To address this, careful data preprocessing and noise detection strategies are essential.<\/li>\n\n\n\n<li><strong>Overfitting Risks<\/strong><br>Cross entropy loss, when combined with highly flexible models, can lead to overfitting, especially on small datasets. Regularization techniques like L1\/L2 penalties, dropout, or early stopping can help mitigate this issue.<\/li>\n\n\n\n<li><strong>High Loss for Poor Predictions<\/strong><br>When a model predicts probabilities far from the true labels, cross entropy loss can become very large, which might cause instability during the initial phases of training. 
Proper learning rate selection and gradient clipping are helpful strategies to manage this.<\/li>\n\n\n\n<li><strong>Computational Intensity in Large Datasets<\/strong><br>For large datasets with multi-class outputs, the computation of cross entropy loss can be resource-intensive, requiring optimized implementations and distributed training setups for efficiency.<\/li>\n<\/ol>\n\n\n\n<p>By understanding and addressing these challenges, practitioners can effectively leverage cross entropy loss to train robust and accurate machine learning models.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Code_Implementation\"><\/span>Code Implementation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Let&#8217;s look at a quick and simple Python implementation of a cross entropy loss function using the <a href=\"https:\/\/pytorch.org\/docs\/stable\/index.html\" target=\"_blank\" rel=\"noopener\">PyTorch library<\/a>.<\/p>\n\n\n\n<p>To run this code, make sure you have PyTorch installed:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" 
data-code=\"pip install torch\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">pip<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">install<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">torch<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Open your preferred IDE \u2014&nbsp;if you don&#8217;t have one installed, I recommend <a href=\"https:\/\/code.visualstudio.com\/download\" target=\"_blank\" rel=\"noopener\">VSCode<\/a> \u2014&nbsp;and paste the following code in a .py file:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" 
fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import torch\nimport torch.nn as nn\n\n# Initialize the cross-entropy loss function\nloss_ftn = nn.CrossEntropyLoss()\n\n# Predicted values (logits)\ny_pred = torch.tensor([[2.0, 1.0, 0.1]])  # Shape: [1, 3]\n\n# True class label (index of the correct class)\ny_true = torch.tensor([0])  # Shape: [1]\n\n# Compute the cross-entropy loss\nloss = loss_ftn(y_pred, y_true)\n\nprint(&quot;Cross Entropy Loss:&quot;, loss.item())\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #C586C0\">import<\/span><span style=\"color: #D4D4D4\"> torch<\/span><\/span>\n<span class=\"line\"><span style=\"color: #C586C0\">import<\/span><span style=\"color: #D4D4D4\"> torch.nn <\/span><span style=\"color: #C586C0\">as<\/span><span style=\"color: #D4D4D4\"> nn<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Initialize 
the cross-entropy loss function<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">loss_ftn = nn.CrossEntropyLoss()<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Predicted values (logits)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">y_pred = torch.tensor([[<\/span><span style=\"color: #B5CEA8\">2.0<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">1.0<\/span><span style=\"color: #D4D4D4\">, <\/span><span style=\"color: #B5CEA8\">0.1<\/span><span style=\"color: #D4D4D4\">]])  <\/span><span style=\"color: #6A9955\"># Shape: [1, 3]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># True class label (index of the correct class)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">y_true = torch.tensor([<\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\">])  <\/span><span style=\"color: #6A9955\"># Shape: [1]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #6A9955\"># Compute the cross-entropy loss<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">loss = loss_ftn(y_pred, y_true)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">print<\/span><span style=\"color: #D4D4D4\">(<\/span><span style=\"color: #CE9178\">&quot;Cross Entropy Loss:&quot;<\/span><span style=\"color: #D4D4D4\">, loss.item())<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Explanation:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Importing PyTorch<\/strong>: The code begins by importing the necessary PyTorch libraries. 
<code>torch<\/code> is used for tensor operations, while <code>torch.nn<\/code> provides the <code>CrossEntropyLoss<\/code> class.<\/li>\n\n\n\n<li><strong>Initialize Loss Function<\/strong>: The <code>CrossEntropyLoss<\/code> function is instantiated as <code>loss_ftn<\/code>. <\/li>\n\n\n\n<li><strong>Predicted Values<\/strong>: <code>y_pred<\/code> represents the raw output logits from a model for three classes. These logits are not probabilities but will be transformed internally by the loss function using softmax.<\/li>\n\n\n\n<li><strong>True Class Label<\/strong>: <code>y_true<\/code> specifies the correct class label. It is provided as an integer index corresponding to the correct class (e.g., 0 for the first class).<\/li>\n\n\n\n<li><strong>Compute Loss<\/strong>: The loss is computed by passing <code>y_pred<\/code> and <code>y_true<\/code> to the <code>loss_ftn<\/code> function. The result is a single scalar value representing the cross entropy loss.<\/li>\n\n\n\n<li><strong>Output<\/strong>: The computed loss is printed, giving insight into the model&#8217;s performance. Lower values indicate better predictions.<\/li>\n<\/ul>\n\n\n\n<p>This approach leverages PyTorch&#8217;s optimized functions, ensuring accuracy and computational efficiency.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span><strong>Conclusion<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Cross entropy loss stands as a cornerstone in machine learning, especially in classification tasks, offering a mathematically sound way to optimize models and improve their accuracy. By measuring the dissimilarity between predicted probabilities and true labels, it provides critical feedback that guides the training process. 
Its ability to handle probabilistic outputs, scale across binary and multi-class problems, and accelerate model convergence makes it an indispensable tool for practitioners.<\/p>\n\n\n\n<p>To achieve the best results, it\u2019s also important to address some real-world challenges \u2014 such as sensitivity to noisy labels and overfitting \u2014 through techniques like regularization, careful preprocessing, and optimized training practices. When combined with methods like dropout or L1\/L2 penalties, cross entropy loss not only mitigates these challenges but also enhances the model&#8217;s generalization ability, ensuring reliable performance on unseen data.<\/p>\n\n\n\n<p><strong>Related Reading:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/metaschool.so\/articles\/cost-function\/\">What is Cost Function in Machine Learning? \u2013 Explained<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/best-machine-learning-libraries\/\">10 Best Machine Learning Libraries (With Examples)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/moe-mixture-of-experts\/\">What is the Mixture of Experts \u2014 A Comprehensive Guide<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/nltk-sentiment-analysis\/\">NLTK Sentiment Analysis Guide for Beginners<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/metaschool.so\/articles\/what-is-generative-ai\/\">What is Generative AI, ChatGPT, and DALL-E? 
Explained<br><\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"FAQs\"><\/span>FAQs<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1732003222145\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>What is the difference between entropy and cross-entropy?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Entropy measures the uncertainty in a probability distribution, representing the minimum number of bits required to encode the information. Cross-entropy, on the other hand, measures the difference between two probability distributions\u2014the true distribution and the predicted distribution\u2014quantifying how well the predicted probabilities match the true labels.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1732003242998\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>What is the word cross-entropy?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Cross-entropy refers to a loss function used in machine learning to evaluate the performance of a model\u2019s predicted probabilities against the actual labels. It calculates the negative log-likelihood of the true labels under the predicted probability distribution, ensuring accurate predictions.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1732003253672\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>What is softmax and cross-entropy?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Softmax is an activation function that converts raw model outputs (logits) into probabilities by normalizing them to sum to 1. 
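<\/p>

<p>Their relationship is easy to verify numerically (a minimal sketch in plain Python, using the same example logits as the PyTorch snippet earlier in this article):<\/p>

```python
import math

# Softmax turns raw logits into probabilities that sum to 1.
def softmax(logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs, sum(probs))  # three probabilities summing to 1.0

# Cross entropy is then the negative log probability of the true class,
# which is what nn.CrossEntropyLoss computes from the raw logits.
true_class = 0
print(-math.log(probs[true_class]))  # about 0.417
```

<p>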
Cross-entropy is the corresponding loss function used alongside softmax to evaluate how close the predicted probability distribution is to the true labels in classification tasks.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1732004507547\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>What is cross-entropy in decision tree?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>In decision trees, cross-entropy can be used as a metric to measure impurity or uncertainty at each split. It evaluates how well the split separates the classes by minimizing the entropy of the resulting child nodes, leading to a more pure classification.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":19,"featured_media":10897,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[344],"tags":[],"class_list":["post-10395","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence"],"_links":{"self":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts\/10395","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/comments?post=10395"}],"version-history":[{"count":37,"href":"https:\/\/metaschool.so\/articl
es\/wp-json\/wp\/v2\/posts\/10395\/revisions"}],"predecessor-version":[{"id":10675,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/posts\/10395\/revisions\/10675"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/media\/10897"}],"wp:attachment":[{"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/media?parent=10395"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/categories?post=10395"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metaschool.so\/articles\/wp-json\/wp\/v2\/tags?post=10395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}