Fradulent retail accounts

Below is a daily table for an active acount at Shopify (an online ecommerce, retail platform). The table is called store_account and the columns are:

Column Name Data Type Description
store_id integer a unique Shopify store id
date string date
status string Possible values are: [‘open’, 'closed’, ‘fraud’]
revenue double Amount of spend in USD

Here's some additional information about the table:

  • The granularity of the table is store_id and day
  • Assume “close” and “fraud” are permanent labels
  • Active = daily revenue > 0
  • Accounts get labeled by Shopify as fraudulent and they no longer can sell product
  • Every day of the table has every store_id that has ever used Shopify

Question: What percent of active stores were fraudulent by day, for each day in the last month? Solution will be given using Python (Pandas) for premium users.

Some clarifications:

  • We want one value for each day in the month.
  • A store can be fraudulent and active on same day. E.g. they could generate revenue until 10AM, then be flagged as fradulent from 10AM onward.

Ace your next data science interview

Get better at data science interviews by solving a few questions per week