In sklearn, I used preprocessing.scale(x) when training a model. How should I then use the model's parameters for prediction?

x = preprocessing.scale(x)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
clf = LinearRegression()
clf.fit(x_train, y_train)
Once the model is trained, how do I predict on other data, and how do I use the model's parameters?

Thanks~

Replies: 3
Jason990420

Just call y_predict = clf.predict(X_test):

import numpy as np

from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
target_names = iris.target_names

X   = preprocessing.scale(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LinearRegression()
clf.fit(X_train, y_train)

coef        = clf.coef_
intercept   = clf.intercept_

score_train = clf.score(X_train, y_train)
score_test  = clf.score(X_test, y_test)

y_predict = clf.predict(X_test)
>>> coef
array([-0.06049182, -0.02313831,  0.37996834,  0.45398214])
>>> intercept
0.9930958853271531
>>> score_train
0.9209146175497155
>>> score_test
0.9623136309667657
>>> y_test
array([1, 2, 0, 2, 0, 1, 2, 2, 0, 0, 2, 2, 0, 1, 0, 0, 0, 2, 0, 0, 1, 1,
       2, 2, 0, 1, 2, 1, 1, 2])
>>> y_predict
array([ 1.19876816,  2.05668051,  0.0259493 ,  2.01938856, -0.04817772,
        1.20986557,  1.80263867,  1.74973674, -0.06816341, -0.01783932,
        1.90203304,  2.03571552,  0.09293602,  0.92162817, -0.02608713,
       -0.12170897, -0.00287997,  1.96105866, -0.02267151, -0.09347547,
        0.97004578,  1.15132506,  1.87500865,  2.04334094, -0.08614581,
        0.87721717,  1.564645  ,  1.1025101 ,  1.3396787 ,  1.5513618 ])
3 years ago

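@Jason990420 x_test was split off after X = preprocessing.scale(X), so it has already been scaled. What if I have a completely new dataset?

How do I use the existing model's parameters in that case? Thanks~

3 years ago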
Jason990420

A new dataset has to go through the same preprocessing, of course.
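
One caveat on "the same preprocessing": preprocessing.scale(X_new) standardizes with the mean and standard deviation of X_new itself, not with those of the training data. A minimal sketch of one way around this, assuming you swap preprocessing.scale for StandardScaler (a substitution, not what the code above does) and treat a held-out split as the "new" data; the names scaler, X_new and random_state=0 are only for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Split first, so the "new" data plays no part in computing the scaling statistics.
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Learn the mean and standard deviation from the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

clf = LinearRegression()
clf.fit(X_train_scaled, y_train)

# Reuse the training statistics on new data: transform, never fit again.
X_new_scaled = scaler.transform(X_new)
y_predict = clf.predict(X_new_scaled)
```

The fitted statistics are stored on the scaler as scaler.mean_ and scaler.scale_, so they can be inspected or saved along with the model.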

The model's parameters are rarely useful on their own, unless you need the explicit linear equation: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀
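
If you do want that equation, here is a small sketch of how the fitted parameters map onto it: intercept_ is 𝛽₀ and coef_ holds 𝛽₁ … 𝛽ᵣ, while the error term 𝜀 is not part of the prediction. It fits on the unscaled iris data just to keep the example short; the variable name manual is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

X, y = load_iris(return_X_y=True)
clf = LinearRegression().fit(X, y)

# Prediction by hand: beta_0 + beta_1 * x_1 + ... + beta_r * x_r for every row.
manual = clf.intercept_ + X @ clf.coef_

# The hand-computed values agree with clf.predict up to floating-point error.
print(np.allclose(manual, clf.predict(X)))
```

If the model was trained on scaled features, the same formula only holds for inputs scaled in the same way.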

What is worth keeping is the fitted model together with its parameters. You can save it with pickle and use it next time without retraining.

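import pickle

filename = 'model.pkl'  # example path, pick your own

# Save the fitted model (model is your trained estimator, e.g. clf above)
with open(filename, 'wb') as f:
    pickle.dump(model, f)

# Load the model back; no retraining needed
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

# X_data / Y_data: new data, preprocessed the same way as the training data
y_predict = loaded_model.predict(X_data)
result = loaded_model.score(X_data, Y_data)  # R^2 for a regressor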
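
Pickling the regressor alone does not store the scaling step, so new data would still have to be scaled separately before calling predict. One possible arrangement, sketched here under the assumption that a StandardScaler replaces preprocessing.scale, is to bundle scaler and regressor in a Pipeline and pickle that single object. The file name iris_pipeline.pkl and the sample row X_raw are made up for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline learns the scaling statistics and the regression in one fit call.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

# Hypothetical location for the saved model; any writable path works.
path = Path(tempfile.gettempdir()) / "iris_pipeline.pkl"
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    loaded_model = pickle.load(f)

# Raw, unscaled input: the loaded pipeline applies the stored scaling itself.
X_raw = np.array([[5.1, 3.5, 1.4, 0.2]])
y_predict = loaded_model.predict(X_raw)
```

Only unpickle files you trust, and load them with the same scikit-learn version that saved them where possible.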
3 years ago
