Commit c9cf997

Pushing the docs to dev/ for branch: main, commit 28c9f50991d04a3d00913dfa19048d095446bc73
1 parent 346853d commit c9cf997

1,333 files changed: +6924 -6924 lines changed


dev/_downloads/133f2198d3ab792c75b39a63b0a99872/plot_cost_sensitive_learning.ipynb

Lines changed: 9 additions & 9 deletions
@@ -422,14 +422,14 @@
 },
 "outputs": [],
 "source": [
-"fraud = target == 1\namount_fraud = data[\"Amount\"][fraud]\n_, ax = plt.subplots()\nax.hist(amount_fraud, bins=100)\nax.set_title(\"Amount of fraud transaction\")\n_ = ax.set_xlabel(\"Amount ($)\")"
+"fraud = target == 1\namount_fraud = data[\"Amount\"][fraud]\n_, ax = plt.subplots()\nax.hist(amount_fraud, bins=100)\nax.set_title(\"Amount of fraud transaction\")\n_ = ax.set_xlabel(\"Amount (\u20ac)\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Addressing the problem with a business metric\n\nNow, we create the business metric that depends on the amount of each transaction. We\ndefine the cost matrix similarly to [2]_. Accepting a legitimate transaction provides\na gain of 2% of the amount of the transaction. However, accepting a fraudulent\ntransaction result in a loss of the amount of the transaction. As stated in [2]_, the\ngain and loss related to refusals (of fraudulent and legitimate transactions) are not\ntrivial to define. Here, we define that a refusal of a legitimate transaction is\nestimated to a loss of $5 while the refusal of a fraudulent transaction is estimated\nto a gain of $50 dollars and the amount of the transaction. Therefore, we define the\nfollowing function to compute the total benefit of a given decision:\n\n"
+"### Addressing the problem with a business metric\n\nNow, we create the business metric that depends on the amount of each transaction. We\ndefine the cost matrix similarly to [2]_. Accepting a legitimate transaction provides\na gain of 2% of the amount of the transaction. However, accepting a fraudulent\ntransaction result in a loss of the amount of the transaction. As stated in [2]_, the\ngain and loss related to refusals (of fraudulent and legitimate transactions) are not\ntrivial to define. Here, we define that a refusal of a legitimate transaction is\nestimated to a loss of 5\u20ac while the refusal of a fraudulent transaction is estimated\nto a gain of 50\u20ac and the amount of the transaction. Therefore, we define the\nfollowing function to compute the total benefit of a given decision:\n\n"
 ]
 },
 {
@@ -505,14 +505,14 @@
 },
 "outputs": [],
 "source": [
-"from sklearn.dummy import DummyClassifier\n\neasy_going_classifier = DummyClassifier(strategy=\"constant\", constant=0)\neasy_going_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n easy_going_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our easy-going classifier: ${benefit_cost:,.2f}\")"
+"from sklearn.dummy import DummyClassifier\n\neasy_going_classifier = DummyClassifier(strategy=\"constant\", constant=0)\neasy_going_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n easy_going_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our easy-going classifier: {benefit_cost:,.2f}\u20ac\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"A classifier that predict all transactions as legitimate would create a profit of\naround $220,000. We make the same evaluation for a classifier that predicts all\ntransactions as fraudulent.\n\n"
+"A classifier that predict all transactions as legitimate would create a profit of\naround 220,000.\u20ac We make the same evaluation for a classifier that predicts all\ntransactions as fraudulent.\n\n"
 ]
 },
 {
@@ -523,14 +523,14 @@
 },
 "outputs": [],
 "source": [
-"intolerant_classifier = DummyClassifier(strategy=\"constant\", constant=1)\nintolerant_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n intolerant_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our intolerant classifier: ${benefit_cost:,.2f}\")"
+"intolerant_classifier = DummyClassifier(strategy=\"constant\", constant=1)\nintolerant_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n intolerant_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our intolerant classifier: {benefit_cost:,.2f}\u20ac\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Such a classifier create a loss of around $670,000. A predictive model should allow\nus to make a profit larger than $220,000. It is interesting to compare this business\nmetric with another \"standard\" statistical metric such as the balanced accuracy.\n\n"
+"Such a classifier create a loss of around 670,000.\u20ac A predictive model should allow\nus to make a profit larger than 220,000.\u20ac It is interesting to compare this business\nmetric with another \"standard\" statistical metric such as the balanced accuracy.\n\n"
 ]
 },
 {
@@ -559,7 +559,7 @@
 },
 "outputs": [],
 "source": [
-"from sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nlogistic_regression = make_pipeline(StandardScaler(), LogisticRegression())\nparam_grid = {\"logisticregression__C\": np.logspace(-6, 6, 13)}\nmodel = GridSearchCV(logistic_regression, param_grid, scoring=\"neg_log_loss\").fit(\n data_train, target_train\n)\n\nprint(\n \"Benefit/cost of our logistic regression: \"\n f\"${business_scorer(model, data_test, target_test, amount=amount_test):,.2f}\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model, data_test, target_test):.3f}\"\n)"
+"from sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nlogistic_regression = make_pipeline(StandardScaler(), LogisticRegression())\nparam_grid = {\"logisticregression__C\": np.logspace(-6, 6, 13)}\nmodel = GridSearchCV(logistic_regression, param_grid, scoring=\"neg_log_loss\").fit(\n data_train, target_train\n)\n\nprint(\n \"Benefit/cost of our logistic regression: \"\n f\"{business_scorer(model, data_test, target_test, amount=amount_test):,.2f}\u20ac\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model, data_test, target_test):.3f}\"\n)"
 ]
 },
 {
@@ -606,7 +606,7 @@
 },
 "outputs": [],
 "source": [
-"print(\n \"Benefit/cost of our logistic regression: \"\n f\"${business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(tuned_model, data_test, target_test):.3f}\"\n)"
+"print(\n \"Benefit/cost of our logistic regression: \"\n f\"{business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}\u20ac\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(tuned_model, data_test, target_test):.3f}\"\n)"
 ]
 },
 {
@@ -635,7 +635,7 @@
 },
 "outputs": [],
 "source": [
-"business_score = business_scorer(\n model_fixed_threshold, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our logistic regression: ${business_score:,.2f}\")\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model_fixed_threshold, data_test, target_test):.3f}\"\n)"
+"business_score = business_scorer(\n model_fixed_threshold, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our logistic regression: {business_score:,.2f}\u20ac\")\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model_fixed_threshold, data_test, target_test):.3f}\"\n)"
 ]
 },
 {
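The markdown cell edited above ends with "we define the following function to compute the total benefit of a given decision", but the function body itself is outside the hunks of this commit. As a point of reference only, a minimal sketch consistent with the cost matrix described there (a 2% gain on an accepted legitimate transaction, the full amount lost on an accepted fraud, a 5€ cost for refusing a legitimate transaction, and 50€ plus the amount gained for refusing a fraud) could look as follows; the signature matches the hunk context in the .py file below, but the exact implementation in the repository may differ.

import numpy as np


def business_metric(y_true, y_pred, amount):
    # y_true/y_pred are 0/1 arrays (1 = fraud, which the model refuses);
    # amount holds the transaction amounts, aligned with y_true.
    y_true, y_pred, amount = map(np.asarray, (y_true, y_pred, amount))
    refused_fraud = (y_true == 1) & (y_pred == 1)    # fraud correctly refused
    accepted_legit = (y_true == 0) & (y_pred == 0)   # legitimate correctly accepted
    refused_legit = (y_true == 0) & (y_pred == 1)    # legitimate wrongly refused
    accepted_fraud = (y_true == 1) & (y_pred == 0)   # fraud wrongly accepted
    gain = 50 * refused_fraud.sum() + amount[refused_fraud].sum()
    gain += (0.02 * amount[accepted_legit]).sum()
    gain -= 5 * refused_legit.sum()
    gain -= amount[accepted_fraud].sum()
    return gain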

dev/_downloads/9ca7cbe47e4cace7242fe4c5c43dfa52/plot_cost_sensitive_learning.py

Lines changed: 11 additions & 11 deletions
@@ -489,7 +489,7 @@ def plot_roc_pr_curves(vanilla_model, tuned_model, *, title):
 _, ax = plt.subplots()
 ax.hist(amount_fraud, bins=100)
 ax.set_title("Amount of fraud transaction")
-_ = ax.set_xlabel("Amount ($)")
+_ = ax.set_xlabel("Amount (€)")

 # %%
 # Addressing the problem with a business metric
@@ -501,8 +501,8 @@ def plot_roc_pr_curves(vanilla_model, tuned_model, *, title):
 # transaction result in a loss of the amount of the transaction. As stated in [2]_, the
 # gain and loss related to refusals (of fraudulent and legitimate transactions) are not
 # trivial to define. Here, we define that a refusal of a legitimate transaction is
-# estimated to a loss of $5 while the refusal of a fraudulent transaction is estimated
-# to a gain of $50 dollars and the amount of the transaction. Therefore, we define the
+# estimated to a loss of 5€ while the refusal of a fraudulent transaction is estimated
+# to a gain of 50€ and the amount of the transaction. Therefore, we define the
 # following function to compute the total benefit of a given decision:
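The hunks that follow call business_scorer with an extra amount= keyword. That scorer is constructed outside the changed lines; one plausible construction, assuming a business_metric function like the sketch shown earlier and using scikit-learn's metadata routing so the scorer can receive the amounts, is:

import sklearn
from sklearn.metrics import make_scorer

# Let scorers receive extra metadata (here, the per-transaction amounts).
sklearn.set_config(enable_metadata_routing=True)
business_scorer = make_scorer(business_metric).set_score_request(amount=True)

# Called as in the hunks below, e.g.:
# business_scorer(model, data_test, target_test, amount=amount_test)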
@@ -557,22 +557,22 @@ def business_metric(y_true, y_pred, amount):
 benefit_cost = business_scorer(
     easy_going_classifier, data_test, target_test, amount=amount_test
 )
-print(f"Benefit/cost of our easy-going classifier: ${benefit_cost:,.2f}")
+print(f"Benefit/cost of our easy-going classifier: {benefit_cost:,.2f}€")

 # %%
 # A classifier that predict all transactions as legitimate would create a profit of
-# around $220,000. We make the same evaluation for a classifier that predicts all
+# around 220,000.€ We make the same evaluation for a classifier that predicts all
 # transactions as fraudulent.
 intolerant_classifier = DummyClassifier(strategy="constant", constant=1)
 intolerant_classifier.fit(data_train, target_train)
 benefit_cost = business_scorer(
     intolerant_classifier, data_test, target_test, amount=amount_test
 )
-print(f"Benefit/cost of our intolerant classifier: ${benefit_cost:,.2f}")
+print(f"Benefit/cost of our intolerant classifier: {benefit_cost:,.2f}€")

 # %%
-# Such a classifier create a loss of around $670,000. A predictive model should allow
-# us to make a profit larger than $220,000. It is interesting to compare this business
+# Such a classifier create a loss of around 670,000.€ A predictive model should allow
+# us to make a profit larger than 220,000.€ It is interesting to compare this business
 # metric with another "standard" statistical metric such as the balanced accuracy.
 from sklearn.metrics import get_scorer
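The context line above imports get_scorer, and the following hunks call balanced_accuracy_scorer; it is presumably obtained along these lines:

from sklearn.metrics import get_scorer

# Standard scorer wrapping sklearn.metrics.balanced_accuracy_score.
balanced_accuracy_scorer = get_scorer("balanced_accuracy")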
@@ -607,7 +607,7 @@ def business_metric(y_true, y_pred, amount):

 print(
     "Benefit/cost of our logistic regression: "
-    f"${business_scorer(model, data_test, target_test, amount=amount_test):,.2f}"
+    f"{business_scorer(model, data_test, target_test, amount=amount_test):,.2f}€"
 )
 print(
     "Balanced accuracy of our logistic regression: "
@@ -645,7 +645,7 @@ def business_metric(y_true, y_pred, amount):
 # %%
 print(
     "Benefit/cost of our logistic regression: "
-    f"${business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}"
+    f"{business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}€"
 )
 print(
     "Balanced accuracy of our logistic regression: "
@@ -691,7 +691,7 @@ def business_metric(y_true, y_pred, amount):
 business_score = business_scorer(
     model_fixed_threshold, data_test, target_test, amount=amount_test
 )
-print(f"Benefit/cost of our logistic regression: ${business_score:,.2f}")
+print(f"Benefit/cost of our logistic regression: {business_score:,.2f}€")
 print(
     "Balanced accuracy of our logistic regression: "
     f"{balanced_accuracy_scorer(model_fixed_threshold, data_test, target_test):.3f}"

dev/_downloads/scikit-learn-docs.zip

3.08 KB
Binary file not shown.
