cph = CoxPHFitter(): pitfalls encountered during training, and plotting

Published: 2024-12-20 23:48

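Plotting error: plotting raises a ValueError, possibly because the plotting package is older than the required version.
Solution: (1) Update the plotting package (plt) by installing the latest version (0.17 in the original note); you may also need to install the latest lifelines.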
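Because this error depends on which versions are installed, it can help to print them before upgrading anything. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` and the package list are assumptions about what the script depends on, not something from the original post.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("matplotlib", "pandas", "lifelines")):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is missing from the current environment.
            found[name] = None
    return found


# Print one line per package so outdated or missing ones are easy to spot.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If one of the packages turns out to be old or missing, upgrading it with the environment's package manager is the next step.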
Plotting a DataFrame:
Reference: https://blog.csdn.net/grey_csdn/article/details/70768721
A DataFrame can be plotted as follows:

from pandas import Series, DataFrame
from numpy.random import randn
import numpy as np
import matplotlib.pyplot as plt

df = DataFrame(randn(10, 5), columns=['A', 'B', 'C', 'D', 'E'], index=np.arange(0, 100, 10))
df.plot()

Plotting with cph = CoxPHFitter():

import pandas as pd
from lifelines import CoxPHFitter
import matplotlib.pyplot as plt

cph = CoxPHFitter()
df1 = pd.read_csv('/home/sc/Downloads/tmp/shixin_cox_all_data_to_model_new.csv')

# Training option 1: train with only the following features
# c = ['defendant_judgedoc_cnt','network_share_zhixing_cnt','shixin_label', 'survival_time','regcap','judgedoc_cnt']
c = ['is_revoke', 'is_cancel', 'court_notice_is_no', 'established_year',
     'r1_subsidiary_invest_max_dx_zx', 'r2_controlled_invest_max_dx_zx',
     'r4_common_corporate_shi_xin', 'r4_common_corporate_zhi_xin', 'judgedoc_cnt',
     'network_share_judge_doc_cnt', 'network_all_link_defendant_judgedoc_cnt',
     'companyname_change_cnt', 'business_range_change_cnt', 'regcap_change_cnt',
     'share_change_cnt', 'fr_change_cnt', 'address_change_cnt', 'director_change_cnt',
     'network_fr_judgedoc_cnt', 'shixin_label', 'survival_time']  # 'is_cancel',
df1 = df1[c]

# Training option 2: drop the features that are all zeros.
# a =['company_name','r1_subsidiary_invest_max', 'r2_controlled_invest_max', 'r3_common_company_controlled_invest', 'r4_common_corporate']
# c_1 =['network_share_shixin_cnt','litigant_defendant_contract_dispute_cnt','litigant_defendant_bust_cnt','litigant_copyright_dispute_cnt']
# a.extend(c_1)
# df1 = df1.drop(a, axis=1)

df1 = df1.fillna(0)

# shixin_0 = df1[(df1['shixin_label'] == 0)][0:5000]
# shixin_1 = df1[(df1['shixin_label'] == 1)][0:2000]
# df1 = pd.concat([shixin_0,shixin_1])
# Take up to 100000 negative and 30000 positive samples
shixin_0 = df1[(df1['shixin_label'] == 0)][0:100000]
shixin_1 = df1[(df1['shixin_label'] == 1)][0:30000]
df1 = pd.concat([shixin_0, shixin_1])

# df1 = df1.sort_values(by="survival_time" , ascending=True)
# print(df1["survival_time"])
# df1['group'] =(df1.groupby(['survival_time','shixin_label']).size()).tolist()
# # print(df1['group'])

cph.fit(df1, duration_col='survival_time', event_col='shixin_label', show_progress=True, step_size=0.1)
cph.print_summary()

cph.plot()  # plots each covariate's fitted coefficient with its confidence interval
plt.show()

# In newer lifelines releases this method is named plot_partial_effects_on_outcome.
cph.plot_covariate_groups('established_year', [0, 5, 10, 15])
plt.show()

# harper= df1['established_year']
# ax = plt.subplot(2,1,1)
# df1.predict_cumulative_hazard(harper).plot(ax=ax)
#
# ax = plt.subplot(2,1,2)
# df1.predict_survival_function(harper).plot(ax=ax)

# from lifelines import CoxPHFitter
# from lifelines.datasets import load_regression_dataset
# from lifelines.utils import k_fold_cross_validation
# import numpy as np
# regression_dataset = load_regression_dataset()
# cph = CoxPHFitter()
# ### During k-fold cross-validation, some features can end up with all-zero values, which raises
# ### "ValueError: delta contains nan value(s). Convergence halted."
# scores = k_fold_cross_validation(cph, df1, duration_col='survival_time', event_col='shixin_label',k=3)
# print(scores)
# print(np.mean(scores))
# print(np.std(scores))
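The commented-out cross-validation lines in the script above mention that some features can become all zeros within a fold, which then triggers the "delta contains nan value(s)" error. One way to spot such features before fitting is sketched below. It is only an illustration: the helper `constant_columns_per_fold` and the tiny `demo` frame are hypothetical, and the folds are drawn with a plain shuffled split, which is an assumption and will not match the exact folds that `k_fold_cross_validation` draws.

```python
import numpy as np
import pandas as pd


def constant_columns_per_fold(df, k=3, seed=0, exclude=()):
    """For each of k shuffled folds, list the feature columns that take a
    single value in the training part (all rows except the held-out fold)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(df)), k)
    features = [col for col in df.columns if col not in exclude]
    report = []
    for held_out in folds:
        # Boolean mask selecting the training rows of this fold.
        mask = np.ones(len(df), dtype=bool)
        mask[held_out] = False
        train = df.iloc[mask]
        constant = [col for col in features if train[col].nunique(dropna=False) <= 1]
        report.append(constant)
    return report


# Hypothetical toy data: 'is_revoke' is non-zero in exactly one row, so the
# training part of whichever fold holds that row out sees only zeros.
demo = pd.DataFrame({
    "survival_time": [5, 8, 3, 9, 4, 7],
    "shixin_label": [1, 0, 1, 0, 1, 0],
    "judgedoc_cnt": [2, 0, 5, 1, 3, 0],
    "is_revoke": [0, 0, 0, 0, 0, 1],
})
report = constant_columns_per_fold(demo, k=3, exclude=("survival_time", "shixin_label"))
for fold_number, constant in enumerate(report, start=1):
    print(f"fold {fold_number}: constant columns = {constant}")
```

Columns flagged this way are candidates for dropping, or for merging with related features, before cross-validating.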

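(2) Pitfalls encountered during training:
Although the Concordance metric improved considerably compared with earlier runs, the significance of every feature was very low. This was traced to step_size being set too small: after changing step_size=0.00001 to step_size=0.1, some features showed strong significance (three stars: ***). The underlying reason is still not understood. In addition, the overall sample size and the ratio of positive to negative samples have a slight effect on the results.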
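One possible explanation, which the original post does not confirm, is that a Newton-type solver scales each update by step_size and stops once the update is smaller than a tolerance. With a very small step_size, the first scaled update can already be below that tolerance, so the solver stops while the coefficients are still close to their starting value of zero, and coefficients near zero look insignificant. The toy sketch below illustrates this on a one-dimensional quadratic. The function `newton_maximize`, the tolerance, and the stopping rule are assumptions made for illustration and are not the lifelines implementation.

```python
def newton_maximize(grad, hess, start=0.0, step_size=0.1, precision=1e-5, max_steps=500):
    """Damped Newton ascent in one dimension.

    Stops when the scaled update is smaller than `precision` (assumed rule)
    or after `max_steps` updates. Returns (estimate, number of updates)."""
    beta = start
    steps = 0
    for steps in range(1, max_steps + 1):
        delta = -step_size * grad(beta) / hess(beta)
        beta += delta
        if abs(delta) < precision:
            break
    return beta, steps


# Toy concave "log-likelihood" l(b) = -(b - 0.5)**2 with its maximum at b = 0.5.
def grad(b):
    return -2.0 * (b - 0.5)


def hess(b):
    return -2.0


beta_ok, steps_ok = newton_maximize(grad, hess, step_size=0.1)
beta_tiny, steps_tiny = newton_maximize(grad, hess, step_size=0.00001)

print(f"step_size=0.1     -> beta={beta_ok:.5f} after {steps_ok} updates")
print(f"step_size=0.00001 -> beta={beta_tiny:.7f} after {steps_tiny} updates")
```

In this toy setting the run with step_size=0.1 ends close to the true maximum at 0.5, while the run with step_size=0.00001 stops after a single update with the estimate still essentially at zero. Whether the same mechanism explains the behaviour reported here depends on the lifelines version and its convergence settings.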

URL: cph = CoxPHFitter(): pitfalls encountered during training, and plotting https://www.yuejiaxmz.com/news/view/528839
