Since the collapse of domestic P2P lending platforms, many small domestic lending institutions have moved into largely untapped markets in Southeast Asia and Africa, in countries such as Indonesia, India, the Philippines, Thailand, Vietnam, and Nigeria.
Looking at the market characteristics of Southeast Asian and African countries, financial inclusion is low (in 2017, 30.8% of people in Vietnam had a bank account) while demand for credit is high (49.0% of people borrowed in 2017). Combined with Internet penetration (66% in 2018) and mobile connectivity, this gave fintech lending in Southeast Asia very favorable conditions and set off a period of unchecked growth.
In these regional loan markets, both the credit infrastructure and the economy tend to be weak, and most users have poor credentials (they do not meet banks' lending requirements either). Given these factors, lenders' control over credit and fraud risk is relatively poor, and default rates on small loans are generally high: for some institutions the bad-debt rate on loans to new users can reach 20~30%, whereas bank bad-debt rates are usually around 10%.
Microcredit products offered in Southeast Asia are mostly of the "714 高炮" type: a loan term of 7-14 days, with high late fees or with interest deducted from the principal in advance at disbursement (砍頭息, "beheading interest"). Some actual annualized rates have reached 300%.
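To see how a short term and an upfront deduction add up to rates of this size, here is a minimal sketch of the arithmetic. The loan figures (1,000 nominal principal, 60 deducted up front, 7-day term) and the function name are hypothetical, chosen only for illustration, and the annualization is simple (non-compounding).

```python
def simple_annualized_rate(principal, upfront_deduction, term_days):
    """Simple annualized cost of a loan whose interest is deducted up front.

    The borrower receives principal - upfront_deduction but repays the full
    principal after term_days, so the deduction is the interest paid on the
    cash actually received.
    """
    if term_days <= 0:
        raise ValueError("term_days must be positive")
    cash_received = principal - upfront_deduction
    if cash_received <= 0:
        raise ValueError("upfront_deduction must be smaller than principal")
    period_rate = upfront_deduction / cash_received
    return period_rate * 365 / term_days


# Hypothetical 7-day loan: 1,000 nominal principal, 60 deducted at disbursement.
rate = simple_annualized_rate(principal=1000, upfront_deduction=60, term_days=7)
print(f"simple annualized rate: {rate:.0%}")
```

With these made-up numbers, a deduction of only 6% of the principal already works out to a simple annualized rate above 300%.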
High interest rates inevitably come with high risk, and this kind of business is also very vulnerable to being shut down by financial regulation.
With bad debt this high, a small lender that fails to get a grip on borrowers' credit risk may find that even higher interest rates cannot cover the losses.
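A back-of-the-envelope sketch makes the point concrete. It assumes a single-period loan, total loss of principal on default (no recovery), and no funding or operating costs. These simplifications, and the function name, are assumptions for illustration only.

```python
def break_even_rate(default_rate):
    """Per-loan interest rate at which good loans just offset lost principal.

    Solves (1 - p) * (1 + r) = 1 for r, where p is the share of loans that
    default with zero recovery.
    """
    if not 0 <= default_rate < 1:
        raise ValueError("default_rate must be in [0, 1)")
    return default_rate / (1 - default_rate)


for p in (0.04, 0.10, 0.20, 0.30):
    print(f"default rate {p:.0%} -> break-even rate per loan {break_even_rate(p):.1%}")
```

Under these assumptions, a 30% default rate needs roughly 43% interest per loan cycle just to break even, before any costs are counted.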
Clearly, risk control is the core of controlling losses in the small-loan business. A risk control system usually consists of anti-fraud checks (identity information verification, face recognition, blacklists) plus an application scoring model.
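The two-stage structure described above can be sketched as a simple decision function: anti-fraud gates run first, and only applications that pass them are judged on the model score. The field names, the cutoff of 0.5, and the convention that a higher score means a safer applicant are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Application:
    user_id: str
    id_verified: bool   # result of the identity information check
    face_match: bool    # result of the face recognition check
    score: float        # application scoring model output (assumed: higher = safer)


def decide(app, blacklist, score_cutoff=0.5):
    """Run the anti-fraud gates first, then apply the scoring-model cutoff."""
    if not app.id_verified:
        return "reject: identity verification failed"
    if not app.face_match:
        return "reject: face recognition failed"
    if app.user_id in blacklist:
        return "reject: blacklisted"
    if app.score < score_cutoff:
        return "reject: score below cutoff"
    return "approve"


blacklist = {"u002"}
print(decide(Application("u001", True, True, 0.72), blacklist))
print(decide(Application("u002", True, True, 0.90), blacklist))
```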
How good risk control is depends on data acquisition and accumulation. One clear sign of this: the default rate for a lender's new users is 20~30% (a fairly high share of which is presumably fraud), whereas for existing users (those who have borrowed before and borrow again) the bad-debt rate is only 4%.
In other words, for users whose borrowing history the institution already holds, the bad-debt rate is significantly lower. Differences in credit risk control ability are really an expression of a data monopoly advantage. For a small lender, once new users have been acquired through marketing, the key to profit is to use the risk control model to assess them as accurately as possible, grant a low initial limit, raise it once a good credit history has been established, and then retain and grow this group of repeat borrowers.
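As a rough illustration of the "start low, raise after good history" idea, the sketch below applies a made-up limit rule after each loan cycle. The 50% step, the 5,000 cap, and the rule itself are hypothetical and not taken from any real lender's policy.

```python
def next_credit_limit(current_limit, on_time_repayments, overdue_count,
                      step=0.5, max_limit=5000):
    """Hypothetical rule: raise the limit after clean repayments, freeze it otherwise."""
    if overdue_count > 0:
        return current_limit   # no increase for users with any overdue history
    if on_time_repayments == 0:
        return current_limit   # new user: stay on the low starting limit
    return min(current_limit * (1 + step), max_limit)


limit = 500  # low starting limit for a new user (hypothetical)
for repaid in (1, 2, 3):
    limit = next_credit_limit(limit, on_time_repayments=repaid, overdue_count=0)
    print(f"after {repaid} on-time repayment(s): limit = {limit}")
```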
The main data sources for the application scoring models of overseas small-loan institutions are:
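This project is based on nearly 500 microloan transactions from Southeast Asia (the data comes from the internet and will be removed on request if it infringes). The corresponding Experian credit report data is obtained, and Python is used to build sliding-window features from the credit reports, such as the number of loans in the last 30 days, the average amount, the interval since the most recent loan date, and the historical overdue count. An application scoring model is then built with LightGBM.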
The raw Experian credit report contains basic personal information, recent loan information, credit cards, loans, and historical repayment performance. The following code uses sliding time windows to extract the corresponding features.
# For the full code, follow the official account “算法進階” or visit https://github.com/aialgorithm/Blog
import datetime as ddt
import numpy as np

# con_list, diff_date, to_float_ornan and fun_dict are helpers from the full code (not shown here).
def add_fea_grids(fea_dict, mult_datas, apply_dt='20200101', dt_key='Open_Date', calc_key="data['Amount_Past_Due']", groupfun=['count', 'sum', 'median', 'mean', 'max', 'min', 'std'], dt_grids=[7, 30, 60, 360, 9999]):
    """
    Sliding time windows over a credit report: for the last N days, compute the count, mean, sum, etc. of field A.
    fea_dict: dictionary that stores the final features
    mult_datas: list of records (multiple values)
    calc_key: expression locating the field within a record
    """
    new_fea = {}  # raw values collected for each time window
    for _dt in dt_grids:
        new_fea.setdefault(_dt, [])  # initialize each time window
    fea_suffix = calc_key.split("'")[-2] + str(len(calc_key))  # feature-name prefix
    if mult_datas:
        mult_datas = con_list(mult_datas)
        for data in mult_datas:
            # Keep only records dated before the application; in production the report should be the one pulled in real time
            if len(data[dt_key]) >= 4 and data[dt_key] < apply_dt:
                for _dt in dt_grids:
                    # Keep records from the last N days; 9999 means no filtering
                    if (_dt == 9999) or (ddt.datetime.strptime(str(data[dt_key]), "%Y%m%d") >= (ddt.datetime.strptime(str(apply_dt), "%Y%m%d") + ddt.timedelta(days=-_dt))):
                        if "Date" in calc_key or "Year" in calc_key:  # date-type field: compute the interval to the application date
                            fea_value = diff_date(apply_dt, eval(calc_key))
                        elif "mean" in groupfun:  # numeric field: convert and add to the corresponding time window
                            fea_value = to_float_ornan(eval(calc_key))
                        else:  # other types: handled as strings
                            fea_value = eval(calc_key)
                        new_fea[_dt].append(fea_value)  # e.g. {30: [2767.0, 0.0]}
    for _k, data_list in new_fea.items():  # generate the aggregated features
        for fun in groupfun:
            fea_name = fea_suffix + '_' + fun + '_' + str(_k)
            fea_dict.setdefault(fea_name, [])
            if len(data_list) > 0:
                final_value = fun_dict[fun](data_list)
            else:
                final_value = np.nan
            fea_dict[fea_name].append(final_value)
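The function above calls `con_list`, `diff_date`, `to_float_ornan` and `fun_dict`, none of which are defined in this excerpt. The sketch below shows one plausible shape for them, using only the standard library. Their behavior is inferred from how they are called and is an assumption, not the author's actual implementation (which is in the full code).

```python
import datetime as ddt
import math
import statistics


def con_list(x):
    """Return x unchanged if it is a list, otherwise wrap it in a one-element list."""
    return x if isinstance(x, list) else [x]


def to_float_ornan(x):
    """Convert x to float; return NaN when it cannot be parsed."""
    try:
        return float(x)
    except (TypeError, ValueError):
        return math.nan


def diff_date(later, earlier, fmt="%Y%m%d"):
    """Days between two date strings (later - earlier); NaN if either fails to parse."""
    try:
        delta = ddt.datetime.strptime(str(later), fmt) - ddt.datetime.strptime(str(earlier), fmt)
    except ValueError:
        return math.nan
    return delta.days


def _agg(func):
    """Wrap an aggregation so it skips unparseable/NaN values and returns NaN if none remain."""
    def wrapped(values):
        clean = [v for v in (to_float_ornan(v) for v in values) if not math.isnan(v)]
        return func(clean) if clean else math.nan
    return wrapped


# Maps the names used in groupfun to aggregation functions (population std, like np.std).
fun_dict = {
    "count": len,
    "sum": _agg(sum),
    "median": _agg(statistics.median),
    "mean": _agg(statistics.mean),
    "max": _agg(max),
    "min": _agg(min),
    "std": _agg(statistics.pstdev),
}

print(fun_dict["mean"]([2767.0, 0.0]), diff_date("20200101", "20191202"))
```

Skipping unparseable values inside the aggregations is a design choice of this sketch. The original helpers may instead propagate NaN.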
Because credit reports are private, this project provides only one sample report for the feature processing step. After feature processing and selection, the features are joined with the overdue label to form the final wide feature table below.
import pandas as pd

df2 = pd.read_pickle('filter_feas_df.pkl')  # wide table of selected features plus the label
print(df2.label.value_counts())  # overdue samples are labeled label == 0
df2.head()
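import lightgbm
from sklearn.model_selection import train_test_split

# model_metrics2 is a helper from the full code (not shown here).
train_x, test_x, train_y, test_y = train_test_split(df2.drop('label', axis=1), df2.label, random_state=1)
lgb = lightgbm.LGBMClassifier()
# Early stopping is passed as a callback: the early_stopping_rounds and verbose
# arguments of fit() are no longer accepted in LightGBM 4.0 and later.
lgb.fit(train_x, train_y,
        eval_set=[(test_x, test_y)],
        eval_metric='auc',
        callbacks=[lightgbm.early_stopping(50, verbose=False)])
print('train ', model_metrics2(lgb, train_x, train_y))
print('test ', model_metrics2(lgb, test_x, test_y))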
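`model_metrics2` is not shown in this excerpt. As a rough illustration of the kind of numbers such a helper might report, the sketch below computes AUC and the KS statistic in plain Python. It is not claimed to be what `model_metrics2` does. The function names, the toy data, and the choice of label 1 as the positive class are assumptions.

```python
def _split_scores(y_true, y_score):
    """Split scores into those of positive (label 1) and negative (label 0) samples."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    return pos, neg


def auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos, neg = _split_scores(y_true, y_score)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(y_true, y_score):
    """KS statistic: the largest gap between TPR and FPR over all score thresholds."""
    pos, neg = _split_scores(y_true, y_score)
    best = 0.0
    for t in sorted(set(y_score)):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best


# Toy data for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(f"AUC={auc(y_true, y_score):.3f}  KS={ks(y_true, y_score):.3f}")
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a dataset of a few hundred loans but not for large ones.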
Using only features from the credit report data, the model's ability to identify overdue users is mediocre: the test AUC is only about 60% (adding SMS data, borrowing-history data and similar sources later would hopefully improve this). The most important features according to the model are mainly: