Loan default prediction - cleaning the data and intro EDA


The following dataset contains information on loans. Can you do the following to prepare the data set for analysis?

  • Create a new column called "loan_status_type" which will categorize "loan_status" into the following:

  • Current - loans currently outstanding

  • Closed - loans that are no longer open

  • Create a new column called "loan_status_standing" which will categorize "loan_status" into the following:

  • Good - customers who have (so far) successfully met the condition of their loan (e.g. no missed payments, no late fees accumulated)

  • Bad - customers who have missed payments / defaulted

With these 2 new columns, can you plot the month and year the loan was issued and the sum of the loan amounts by loan_status_type and loan_status_contract?

The data provided is a subset of a larger dataset. You can find more information about the larger dataset here.

LoanStatNew Description
zip_code The first 3 numbers of the zip code provided by the borrower in the loan application.
addr_state The state provided by the borrower in the loan application
annual_inc The annual income provided by the borrower during registration.
collection_recovery_fee post charge off collection fee
collections_12_mths_ex_med Number of collections in 12 months excluding medical collections
delinq_2yrs The number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years
desc Loan description provided by the borrower
dti A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
earliest_cr_line The month the borrower’s earliest reported credit line was opened
emp_length Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
emp_title The job title supplied by the Borrower when applying for the loan.
funded_amnt The total amount committed to that loan at that point in time.
funded_amnt_inv The total amount committed by investors for that loan at that point in time.
grade LC assigned loan grade
home_ownership The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
id A unique LC assigned ID for the loan listing.
initial_list_status The initial listing status of the loan. Possible values are – W, F
inq_last_6mths The number of inquiries by creditors during the past 6 months.
installment The monthly payment owed by the borrower if the loan originates.
int_rate Interest Rate on the loan
issue_d The month which the loan was funded
last_credit_pull_d The most recent month LC pulled credit for this loan
last_pymnt_amnt Last total payment amount received
last_pymnt_d Last month payment was received
loan_amnt The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
loan_status Current status of the loan
member_id A unique LC assigned Id for the borrower member.
mths_since_last_delinq The number of months since the borrower’s last delinquency.
mths_since_last_major_derog Months since most recent 90-day or worse rating
mths_since_last_record The number of months since the last public record.
next_pymnt_d Next scheduled payment date
open_acc The number of open credit lines in the borrower’s credit file.
out_prncp Remaining outstanding principal for total amount funded
out_prncp_inv Remaining outstanding principal for portion of total amount funded by investors
policy_code Publicly available policy_code=1, new products not publicly available policy_code=2
pub_rec Number of derogatory public records
purpose A category provided by the borrower for the loan request.
pymnt_plan Indicates if a payment plan has been put in place for the loan
recoveries post charge off gross recovery
revol_bal Total credit revolving balance
revol_util Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
sub_grade LC assigned loan subgrade
term The number of payments on the loan. Values are in months and can be either 36 or 60.
title The loan title provided by the borrower
total_acc The total number of credit lines currently in the borrower’s credit file
total_pymnt Payments received to date for total amount funded
total_pymnt_inv Payments received to date for portion of total amount funded by investors
total_rec_int Interest received to date
total_rec_late_fee Late fees received to date
total_rec_prncp Principal received to date
url URL for the LC page with listing data.


Access restricted

Subscribe to premium account to see the solution.

Get premium now