Loan default prediction - cleaning the data and intro EDA

Hard

The following dataset contains information on loans. Can you do the following to prepare the dataset for analysis?

  • Create a new column called "loan_status_type" which will categorize "loan_status" into the following:

      • Current - loans currently outstanding

      • Closed - loans that are no longer open

  • Create a new column called "loan_status_standing" which will categorize "loan_status" into the following:

      • Good - customers who have (so far) successfully met the conditions of their loan (e.g. no missed payments, no late fees accumulated)

      • Bad - customers who have missed payments / defaulted
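The two categorizations above can be sketched as a pair of small functions. This is only a minimal sketch: the status labels it matches ("Fully Paid", "Charged Off", "Default", "Late (...)", "In Grace Period") are assumed LendingClub-style values that the question does not list, and the function names are hypothetical. Check the distinct values of "loan_status" in the actual data before relying on it.

```python
# Assumption: loan_status holds LendingClub-style labels such as
# "Current", "Fully Paid", "Charged Off", "Default", "In Grace Period",
# "Late (16-30 days)", "Late (31-120 days)". Verify against the real data.

def loan_status_type(status: str) -> str:
    """Return "Closed" for loans that are no longer open, else "Current"."""
    s = status.strip().lower()
    # Substring matching also catches prefixed labels such as
    # "Does not meet the credit policy. Status:Fully Paid".
    if "fully paid" in s or "charged off" in s:
        return "Closed"
    return "Current"


def loan_status_standing(status: str) -> str:
    """Return "Bad" for missed payments or defaults, else "Good"."""
    s = status.strip().lower()
    if any(keyword in s for keyword in ("charged off", "default", "late")):
        return "Bad"
    # Judgment call: "In Grace Period" is treated as "Good" here;
    # a stricter reading could classify it as "Bad".
    return "Good"


# Quick check on a few assumed labels.
for label in ["Current", "Fully Paid", "Charged Off", "Late (31-120 days)"]:
    print(label, "->", loan_status_type(label), "/", loan_status_standing(label))
```

With pandas, the same functions could be applied column-wise, for example `df["loan_status_type"] = df["loan_status"].map(loan_status_type)`, where `df` is a hypothetical DataFrame holding the loan data.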

With these 2 new columns, can you plot the sum of the loan amounts by the month and year the loan was issued, broken down by loan_status_type and loan_status_standing?
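One possible way to build that plot with pandas is sketched below. It assumes the issue date lives in a column named `issue_d` formatted like "Dec-2015" and the loan amount in a column named `loan_amnt`; neither name appears in the visible part of the data dictionary, so treat both as assumptions. The function names and the toy rows are illustrative only and are not taken from the dataset.

```python
import pandas as pd


def monthly_loan_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum loan_amnt per issue month for each (type, standing) pair."""
    out = df.copy()
    # Assumption: issue_d looks like "Dec-2015"; adjust `format` if not.
    out["issue_month"] = pd.to_datetime(
        out["issue_d"], format="%b-%Y"
    ).dt.to_period("M")
    totals = (
        out.groupby(
            ["issue_month", "loan_status_type", "loan_status_standing"]
        )["loan_amnt"]
        .sum()
        # One column per (type, standing) pair, one row per month.
        .unstack(["loan_status_type", "loan_status_standing"], fill_value=0)
        .sort_index()
    )
    return totals


def plot_monthly_loan_totals(totals: pd.DataFrame):
    """Draw a stacked bar chart of the monthly totals."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    ax = totals.plot(kind="bar", stacked=True, figsize=(12, 5))
    ax.set_xlabel("Issue month")
    ax.set_ylabel("Sum of loan amounts")
    ax.set_title("Loan amounts by issue month, status type and standing")
    plt.tight_layout()
    return ax


# Toy rows for illustration only (not from the real dataset).
toy = pd.DataFrame(
    {
        "issue_d": ["Dec-2015", "Dec-2015", "Jan-2016", "Jan-2016"],
        "loan_amnt": [1000, 2500, 4000, 1500],
        "loan_status_type": ["Closed", "Closed", "Current", "Closed"],
        "loan_status_standing": ["Good", "Good", "Good", "Bad"],
    }
)
totals = monthly_loan_totals(toy)
print(totals)
```

Calling `plot_monthly_loan_totals(totals)` then draws the chart. A stacked bar chart is just one choice; a line per category or separate panels for "Current" and "Closed" would answer the same question.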

The data provided is a subset of a larger dataset. You can find more information about the larger dataset here.

LoanStatNew | Description
zip_code | The first 3 numbers of the zip code provided by the borrower in the loan application.
addr_state | The state provided by the borrower in the loan application
annual_inc | The annual income provided by the borrower during registration.
collection_recovery_fee | post charge off collection fee
collections_12_mths_ex_med | Number of collections in 12 months excluding medical collections
delinq_2yrs | The number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years
desc | Loan description provided by the borrower
dti | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
earlie...
