[Building ML-Powered Applications] Chapter 7: Using Classifiers for Writing Recommendations

Published: 2022-12-13 15:02:26 | Category: Applications | Source: Reposted

Generating editing recommendations

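The ML Editor can benefit from any of the four methods we described for generating recommendations. In fact, all of these methods are showcased in the generating recommendations notebook in the book's GitHub repository. Because the model we are using is fast, we will illustrate the most elaborate approach here: using black-box explainers.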

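Let's start by looking at the entire recommendation function, which takes in a question and provides editing advice based on a trained model. Here is what this function looks like: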

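def get_recommendation_and_prediction_from_text(input_text, num_feats=10):
    # clf is the trained classifier and explainer the LIME explainer,
    # both defined at module level
    global clf, explainer
    feats = get_features_from_input_text(input_text)
    # Probability of class 1, used as the question's current score
    pos_score = clf.predict_proba([feats])[0][1]
    # Explain class 1 using the num_feats most important features
    exp = explainer.explain_instance(
        feats, clf.predict_proba, num_features=num_feats, labels=(1,)
    )
    parsed_exps = parse_explanations(exp.as_list())
    recs = get_recommendation_string_from_parsed_exps(parsed_exps)
    return recs, pos_score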

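Calling this function on an example input and pretty-printing its results produces recommendations such as the following. We can then display these recommendations to users so that they can iterate on their question.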

recos, score = get_recommendation_and_prediction_from_text(example_question)
print("%s score" % score)

0.4 score

print(*recos, sep="\n")

Increase question length
Increase vocabulary diversity
Increase frequency of question marks
No need to increase frequency of periods
Decrease question length
Decrease frequency of determiners
Increase frequency of commas
No need to decrease frequency of adverbs
Increase frequency of coordinating conjunctions
Increase frequency of subordinating conjunctions

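Let's break this function down. Starting with its signature, the function takes as arguments an input string representing a question, as well as an optional argument determining how many of the most important features to make recommendations for. It returns the recommendations, along with a score representing the current quality of the question.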

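Diving into the body of the function, the first line refers to two globally defined variables: the trained model and an instance of a LIME explainer like the one we defined earlier. The next two lines generate features from the input text and pass those features to the classifier for prediction. Then, exp is defined by using LIME to generate explanations.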

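The last two function calls turn these explanations into human-readable recommendations. Let's see how by looking at the definitions of these functions, starting with parse_explanations.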

def parse_explanations(exp_list):
    global FEATURE_DISPLAY_NAMES
    parsed_exps = []
    for feat_bound, impact in exp_list:
        conditions = feat_bound.split(" ")
        # We ignore doubly bounded conditions , e.g. 1 <= a < 3 because
        # they are harder to formulate as a recommendation
        if len(conditions) == 3:
            feat_name, order, threshold = conditions
            simple_order = simplify_order_sign(order)
            recommended_mod = get_recommended_modification(simple_order, impact)
            parsed_exps.append(
                {
                    "feature": feat_name,
                    "feature_display_name": FEATURE_DISPLAY_NAMES[feat_name],
                    "order": simple_order,
                    "threshold": threshold,
                    "impact": impact,
                    "recommendation": recommended_mod,
                }
            )
    return parsed_exps

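This function is long, but it accomplishes a relatively simple goal. It takes the array of feature importances returned by LIME and produces a more structured list of dictionaries that can be used in recommendations. Here is an example of this transformation: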

# exps is in the format of LIME explanations
>> exps = [('num_chars <= 408.00', -0.03908691525058592),
 ('DET > 0.03', -0.014685507408497802)]
>> parse_explanations(exps)
[{'feature': 'num_chars',
  'feature_display_name': 'question length',
  'order': '<',
  'threshold': '408.00',
  'impact': -0.03908691525058592,
  'recommendation': 'Increase'},
 {'feature': 'DET',
  'feature_display_name': 'frequency of determiners',
  'order': '>',
  'threshold': '0.03',
  'impact': -0.014685507408497802,
  'recommendation': 'Decrease'}]
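
The example above shows `<=` being reduced to `<`, but the simplify_order_sign helper that parse_explanations calls is not shown in this section. The following is a minimal sketch of what it might look like, assuming it only needs to merge the inclusive and strict comparison signs; the body is a guess, not the book's code. It also illustrates why the parser keeps only conditions that split into three tokens. The doubly bounded string used here is a made-up illustration.

```python
def simplify_order_sign(order_sign):
    """Hypothetical helper: collapse LIME's four comparison signs to two.

    Assumption: a recommendation only needs to know whether the feature sits
    below or above the threshold, so "<=" is treated as "<" and ">=" as ">".
    """
    if order_sign in ("<=", "<"):
        return "<"
    if order_sign in (">=", ">"):
        return ">"
    return order_sign


# Singly bounded conditions split into exactly three tokens...
single = "num_chars <= 408.00".split(" ")
# ...while doubly bounded ones split into five and are skipped by the parser.
double = "0.01 < DET <= 0.03".split(" ")

print(len(single), simplify_order_sign(single[1]))  # 3 <
print(len(double))  # 5
```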

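Notice that the function call converts the threshold value displayed by LIME into a recommendation of whether a feature value should be increased or decreased. This is done using the get_recommended_modification function shown here: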

def get_recommended_modification(simple_order, impact):
    bigger_than_threshold = simple_order == ">"
    has_positive_impact = impact > 0
    if bigger_than_threshold and has_positive_impact:
        return "No need to decrease"
    if not bigger_than_threshold and not has_positive_impact:
        return "Increase"
    if bigger_than_threshold and not has_positive_impact:
        return "Decrease"
    if not bigger_than_threshold and has_positive_impact:
        return "No need to increase"
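
The four branches can be easier to check when laid out as a lookup table keyed on the two booleans. The sketch below is an illustrative restatement under that assumption, not part of the original code; the names RECOMMENDATION_TABLE and recommended_modification_from_table are hypothetical.

```python
# Keys are (feature is above threshold, impact on score is positive).
RECOMMENDATION_TABLE = {
    (True, True): "No need to decrease",
    (False, False): "Increase",
    (True, False): "Decrease",
    (False, True): "No need to increase",
}


def recommended_modification_from_table(simple_order, impact):
    """Table-driven restatement of get_recommended_modification (illustrative)."""
    return RECOMMENDATION_TABLE[(simple_order == ">", impact > 0)]


# The two LIME entries from the earlier example, with rounded impacts:
print(recommended_modification_from_table("<", -0.039))  # Increase
print(recommended_modification_from_table(">", -0.015))  # Decrease
```

Read this way, a feature that is below the threshold and hurting the score should be increased, while one that is below the threshold and already helping needs no increase.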

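Once the explanations are parsed into recommendations, all that is left is to display them in an appropriate format. This is done by the last function call in get_recommendation_and_prediction_from_text, shown here: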

def get_recommendation_string_from_parsed_exps(exp_list):
    recommendations = []
    for feature_exp in exp_list:
        recommendation = "%s %s" % (
            feature_exp["recommendation"],
            feature_exp["feature_display_name"],
        )
        recommendations.append(recommendation)
    return recommendations
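
As a quick check of the formatting step, the snippet below applies the same "%s %s" pattern to the two parsed explanations from the earlier example. It is a standalone sketch: the dictionaries are trimmed to the two keys the formatter reads, and the list comprehension stands in for the function so the snippet runs on its own.

```python
# Parsed explanations in the shape produced by parse_explanations,
# trimmed to the two keys the formatter reads.
parsed_exps = [
    {"recommendation": "Increase", "feature_display_name": "question length"},
    {"recommendation": "Decrease", "feature_display_name": "frequency of determiners"},
]

# Equivalent one-line formatting, written here so the snippet runs standalone.
recommendations = [
    "%s %s" % (exp["recommendation"], exp["feature_display_name"])
    for exp in parsed_exps
]
print(*recommendations, sep="\n")
# Increase question length
# Decrease frequency of determiners
```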

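If you would like to experiment with this editor and iterate on it, feel free to refer to the generating recommendations notebook in the book's GitHub repository. At the end of the notebook, I have included an example of using the model's recommendations to rephrase a question multiple times and increase its score. I am reproducing the example here to demonstrate how such recommendations can guide users as they edit a question.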

# First attempt at a question
>> get_recommendation_and_prediction_from_text(
    """
I want to learn how models are made
"""
)
0.39 score
Increase question length
Increase vocabulary diversity
Increase frequency of question marks
No need to increase frequency of periods
No need to decrease frequency of stop words
# Following the first three recommendations
>> get_recommendation_and_prediction_from_text(
    """
I'd like to learn about building machine learning products.
Are there any good product focused resources?
Would you be able to recommend educational books?
"""
)
0.48 score
Increase question length
Increase vocabulary diversity
Increase frequency of adverbs
No need to decrease frequency of question marks
Increase frequency of commas
# Following the recommendations once more
>> get_recommendation_and_prediction_from_text(
    """
I'd like to learn more about ML, specifically how to build ML products.
When I attempt to build such products, I always face the same challenge:
how do you go beyond a model?
What are the best practices to use a model in a concrete application?
Are there any good product focused resources?
Would you be able to recommend educational books?
"""
)
0.53 score

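Voilà, we now have a pipeline that can take in a question and provide actionable recommendations to users. This pipeline is by no means perfect, but we now have a working end-to-end ML-powered editor. If you would like to try your hand at improving it, I encourage you to interact with the current version and identify failure modes to address. Interestingly, while models can always be iterated on, I would argue that the most promising aspect of this editor to improve is generating new features that are even clearer to users.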
