Commit c9cf997

Pushing the docs to dev/ for branch: main, commit 28c9f50991d04a3d00913dfa19048d095446bc73
1 parent 346853d commit c9cf997

1,333 files changed: +6924 -6924 lines changed


dev/_downloads/133f2198d3ab792c75b39a63b0a99872/plot_cost_sensitive_learning.ipynb

Lines changed: 9 additions & 9 deletions
@@ -422,14 +422,14 @@
 },
 "outputs": [],
 "source": [
-"fraud = target == 1\namount_fraud = data[\"Amount\"][fraud]\n_, ax = plt.subplots()\nax.hist(amount_fraud, bins=100)\nax.set_title(\"Amount of fraud transaction\")\n_ = ax.set_xlabel(\"Amount ($)\")"
+"fraud = target == 1\namount_fraud = data[\"Amount\"][fraud]\n_, ax = plt.subplots()\nax.hist(amount_fraud, bins=100)\nax.set_title(\"Amount of fraud transaction\")\n_ = ax.set_xlabel(\"Amount (\u20ac)\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Addressing the problem with a business metric\n\nNow, we create the business metric that depends on the amount of each transaction. We\ndefine the cost matrix similarly to [2]_. Accepting a legitimate transaction provides\na gain of 2% of the amount of the transaction. However, accepting a fraudulent\ntransaction result in a loss of the amount of the transaction. As stated in [2]_, the\ngain and loss related to refusals (of fraudulent and legitimate transactions) are not\ntrivial to define. Here, we define that a refusal of a legitimate transaction is\nestimated to a loss of $5 while the refusal of a fraudulent transaction is estimated\nto a gain of $50 dollars and the amount of the transaction. Therefore, we define the\nfollowing function to compute the total benefit of a given decision:\n\n"
+"### Addressing the problem with a business metric\n\nNow, we create the business metric that depends on the amount of each transaction. We\ndefine the cost matrix similarly to [2]_. Accepting a legitimate transaction provides\na gain of 2% of the amount of the transaction. However, accepting a fraudulent\ntransaction result in a loss of the amount of the transaction. As stated in [2]_, the\ngain and loss related to refusals (of fraudulent and legitimate transactions) are not\ntrivial to define. Here, we define that a refusal of a legitimate transaction is\nestimated to a loss of 5\u20ac while the refusal of a fraudulent transaction is estimated\nto a gain of 50\u20ac and the amount of the transaction. Therefore, we define the\nfollowing function to compute the total benefit of a given decision:\n\n"
 ]
 },
 {
@@ -505,14 +505,14 @@
 },
 "outputs": [],
 "source": [
-"from sklearn.dummy import DummyClassifier\n\neasy_going_classifier = DummyClassifier(strategy=\"constant\", constant=0)\neasy_going_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n easy_going_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our easy-going classifier: ${benefit_cost:,.2f}\")"
+"from sklearn.dummy import DummyClassifier\n\neasy_going_classifier = DummyClassifier(strategy=\"constant\", constant=0)\neasy_going_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n easy_going_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our easy-going classifier: {benefit_cost:,.2f}\u20ac\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"A classifier that predict all transactions as legitimate would create a profit of\naround $220,000. We make the same evaluation for a classifier that predicts all\ntransactions as fraudulent.\n\n"
+"A classifier that predict all transactions as legitimate would create a profit of\naround 220,000.\u20ac We make the same evaluation for a classifier that predicts all\ntransactions as fraudulent.\n\n"
 ]
 },
 {
@@ -523,14 +523,14 @@
 },
 "outputs": [],
 "source": [
-"intolerant_classifier = DummyClassifier(strategy=\"constant\", constant=1)\nintolerant_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n intolerant_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our intolerant classifier: ${benefit_cost:,.2f}\")"
+"intolerant_classifier = DummyClassifier(strategy=\"constant\", constant=1)\nintolerant_classifier.fit(data_train, target_train)\nbenefit_cost = business_scorer(\n intolerant_classifier, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our intolerant classifier: {benefit_cost:,.2f}\u20ac\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Such a classifier create a loss of around $670,000. A predictive model should allow\nus to make a profit larger than $220,000. It is interesting to compare this business\nmetric with another \"standard\" statistical metric such as the balanced accuracy.\n\n"
+"Such a classifier create a loss of around 670,000.\u20ac A predictive model should allow\nus to make a profit larger than 220,000.\u20ac It is interesting to compare this business\nmetric with another \"standard\" statistical metric such as the balanced accuracy.\n\n"
 ]
 },
 {
@@ -559,7 +559,7 @@
 },
 "outputs": [],
 "source": [
-"from sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nlogistic_regression = make_pipeline(StandardScaler(), LogisticRegression())\nparam_grid = {\"logisticregression__C\": np.logspace(-6, 6, 13)}\nmodel = GridSearchCV(logistic_regression, param_grid, scoring=\"neg_log_loss\").fit(\n data_train, target_train\n)\n\nprint(\n \"Benefit/cost of our logistic regression: \"\n f\"${business_scorer(model, data_test, target_test, amount=amount_test):,.2f}\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model, data_test, target_test):.3f}\"\n)"
+"from sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nlogistic_regression = make_pipeline(StandardScaler(), LogisticRegression())\nparam_grid = {\"logisticregression__C\": np.logspace(-6, 6, 13)}\nmodel = GridSearchCV(logistic_regression, param_grid, scoring=\"neg_log_loss\").fit(\n data_train, target_train\n)\n\nprint(\n \"Benefit/cost of our logistic regression: \"\n f\"{business_scorer(model, data_test, target_test, amount=amount_test):,.2f}\u20ac\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model, data_test, target_test):.3f}\"\n)"
 ]
 },
 {
@@ -606,7 +606,7 @@
 },
 "outputs": [],
 "source": [
-"print(\n \"Benefit/cost of our logistic regression: \"\n f\"${business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(tuned_model, data_test, target_test):.3f}\"\n)"
+"print(\n \"Benefit/cost of our logistic regression: \"\n f\"{business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}\u20ac\"\n)\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(tuned_model, data_test, target_test):.3f}\"\n)"
 ]
 },
 {
@@ -635,7 +635,7 @@
 },
 "outputs": [],
 "source": [
-"business_score = business_scorer(\n model_fixed_threshold, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our logistic regression: ${business_score:,.2f}\")\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model_fixed_threshold, data_test, target_test):.3f}\"\n)"
+"business_score = business_scorer(\n model_fixed_threshold, data_test, target_test, amount=amount_test\n)\nprint(f\"Benefit/cost of our logistic regression: {business_score:,.2f}\u20ac\")\nprint(\n \"Balanced accuracy of our logistic regression: \"\n f\"{balanced_accuracy_scorer(model_fixed_threshold, data_test, target_test):.3f}\"\n)"
 ]
 },
 {
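The markdown cell edited above ends with "we define the following function to compute the total benefit of a given decision", but the function body itself is outside the hunks of this commit. As a point of reference only, a minimal sketch consistent with the cost matrix described there (a 2% gain on an accepted legitimate transaction, the full amount lost on an accepted fraud, a 5€ cost for refusing a legitimate transaction, and 50€ plus the amount gained for refusing a fraud) could look as follows; the signature matches the hunk context in the .py file below, but the exact implementation in the repository may differ.

import numpy as np


def business_metric(y_true, y_pred, amount):
    # y_true/y_pred are 0/1 arrays (1 = fraud, which the model refuses);
    # amount holds the transaction amounts, aligned with y_true.
    y_true, y_pred, amount = map(np.asarray, (y_true, y_pred, amount))
    refused_fraud = (y_true == 1) & (y_pred == 1)    # fraud correctly refused
    accepted_legit = (y_true == 0) & (y_pred == 0)   # legitimate correctly accepted
    refused_legit = (y_true == 0) & (y_pred == 1)    # legitimate wrongly refused
    accepted_fraud = (y_true == 1) & (y_pred == 0)   # fraud wrongly accepted
    gain = 50 * refused_fraud.sum() + amount[refused_fraud].sum()
    gain += (0.02 * amount[accepted_legit]).sum()
    gain -= 5 * refused_legit.sum()
    gain -= amount[accepted_fraud].sum()
    return gain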

dev/_downloads/9ca7cbe47e4cace7242fe4c5c43dfa52/plot_cost_sensitive_learning.py

Lines changed: 11 additions & 11 deletions
@@ -489,7 +489,7 @@ def plot_roc_pr_curves(vanilla_model, tuned_model, *, title):
 _, ax = plt.subplots()
 ax.hist(amount_fraud, bins=100)
 ax.set_title("Amount of fraud transaction")
-_ = ax.set_xlabel("Amount ($)")
+_ = ax.set_xlabel("Amount (€)")

 # %%
 # Addressing the problem with a business metric
@@ -501,8 +501,8 @@ def plot_roc_pr_curves(vanilla_model, tuned_model, *, title):
 # transaction result in a loss of the amount of the transaction. As stated in [2]_, the
 # gain and loss related to refusals (of fraudulent and legitimate transactions) are not
 # trivial to define. Here, we define that a refusal of a legitimate transaction is
-# estimated to a loss of $5 while the refusal of a fraudulent transaction is estimated
-# to a gain of $50 dollars and the amount of the transaction. Therefore, we define the
+# estimated to a loss of 5€ while the refusal of a fraudulent transaction is estimated
+# to a gain of 50€ and the amount of the transaction. Therefore, we define the
 # following function to compute the total benefit of a given decision:
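The hunks that follow call business_scorer with an extra amount= keyword. That scorer is constructed outside the changed lines; one plausible construction, assuming a business_metric function like the sketch shown earlier and using scikit-learn's metadata routing so the scorer can receive the amounts, is:

import sklearn
from sklearn.metrics import make_scorer

# Let scorers receive extra metadata (here, the per-transaction amounts).
sklearn.set_config(enable_metadata_routing=True)
business_scorer = make_scorer(business_metric).set_score_request(amount=True)

# Called as in the hunks below, e.g.:
# business_scorer(model, data_test, target_test, amount=amount_test)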
@@ -557,22 +557,22 @@ def business_metric(y_true, y_pred, amount):
 benefit_cost = business_scorer(
     easy_going_classifier, data_test, target_test, amount=amount_test
 )
-print(f"Benefit/cost of our easy-going classifier: ${benefit_cost:,.2f}")
+print(f"Benefit/cost of our easy-going classifier: {benefit_cost:,.2f}€")

 # %%
 # A classifier that predict all transactions as legitimate would create a profit of
-# around $220,000. We make the same evaluation for a classifier that predicts all
+# around 220,000.€ We make the same evaluation for a classifier that predicts all
 # transactions as fraudulent.
 intolerant_classifier = DummyClassifier(strategy="constant", constant=1)
 intolerant_classifier.fit(data_train, target_train)
 benefit_cost = business_scorer(
     intolerant_classifier, data_test, target_test, amount=amount_test
 )
-print(f"Benefit/cost of our intolerant classifier: ${benefit_cost:,.2f}")
+print(f"Benefit/cost of our intolerant classifier: {benefit_cost:,.2f}€")

 # %%
-# Such a classifier create a loss of around $670,000. A predictive model should allow
-# us to make a profit larger than $220,000. It is interesting to compare this business
+# Such a classifier create a loss of around 670,000.€ A predictive model should allow
+# us to make a profit larger than 220,000.€ It is interesting to compare this business
 # metric with another "standard" statistical metric such as the balanced accuracy.
 from sklearn.metrics import get_scorer
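The context line above imports get_scorer, and the following hunks call balanced_accuracy_scorer; it is presumably obtained along these lines:

from sklearn.metrics import get_scorer

# Standard scorer wrapping sklearn.metrics.balanced_accuracy_score.
balanced_accuracy_scorer = get_scorer("balanced_accuracy")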
@@ -607,7 +607,7 @@ def business_metric(y_true, y_pred, amount):

 print(
     "Benefit/cost of our logistic regression: "
-    f"${business_scorer(model, data_test, target_test, amount=amount_test):,.2f}"
+    f"{business_scorer(model, data_test, target_test, amount=amount_test):,.2f}€"
 )
 print(
     "Balanced accuracy of our logistic regression: "
@@ -645,7 +645,7 @@ def business_metric(y_true, y_pred, amount):
 # %%
 print(
     "Benefit/cost of our logistic regression: "
-    f"${business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}"
+    f"{business_scorer(tuned_model, data_test, target_test, amount=amount_test):,.2f}€"
 )
 print(
     "Balanced accuracy of our logistic regression: "
@@ -691,7 +691,7 @@ def business_metric(y_true, y_pred, amount):
 business_score = business_scorer(
     model_fixed_threshold, data_test, target_test, amount=amount_test
 )
-print(f"Benefit/cost of our logistic regression: ${business_score:,.2f}")
+print(f"Benefit/cost of our logistic regression: {business_score:,.2f}€")
 print(
     "Balanced accuracy of our logistic regression: "
     f"{balanced_accuracy_scorer(model_fixed_threshold, data_test, target_test):.3f}"

dev/_downloads/scikit-learn-docs.zip

3.08 KB
Binary file not shown.
