Formatting dirty data
Suppose you have the following dataset, which contains two tabs: (1st tab) a list of the items purchased by each user, and (2nd tab) a mapping from each item_id to the item's name and price.
Can you format the data into a matrix with one row per user and one column per item, where each cell holds the number of times that user purchased that item?
For example, if we have a user with the following row:
12345 | 1, 4, 4, 3, 5, 5, 5
We would want the output to look like the following:
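A minimal sketch of one way to build this matrix in Python. It assumes the first tab has already been loaded as a dictionary from user_id to the raw comma-separated item string. The function names, the second user 67890, and that user's row (including its stray trailing comma) are hypothetical and not part of the original dataset. The sketch uses only the standard library (`collections.Counter`) and handles two kinds of "dirt": extra whitespace and empty tokens.

```python
from collections import Counter

# Hypothetical sample of tab 1: user_id -> raw comma-separated item_ids.
# 12345 comes from the example above; 67890 is made up to show zero-filling
# and a stray trailing comma.
purchases = {
    12345: "1, 4, 4, 3, 5, 5, 5",
    67890: "2, 2, 1,",
}


def parse_items(raw: str) -> list[int]:
    """Split a comma-separated string into integer item_ids, skipping empty tokens."""
    # int() tolerates surrounding whitespace, so " 4" parses as 4.
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def build_matrix(purchases: dict[int, str]) -> tuple[list[int], list[list[int]]]:
    """Return (item_columns, rows), where each row is [user_id, count_for_each_column...]."""
    # Count how often each user bought each item.
    counts = {user: Counter(parse_items(raw)) for user, raw in purchases.items()}
    # Columns are every item_id seen for any user, sorted for a stable layout.
    columns = sorted({item for c in counts.values() for item in c})
    # Items a user never bought get a count of 0.
    rows = [[user] + [c.get(item, 0) for item in columns] for user, c in counts.items()]
    return columns, rows


columns, rows = build_matrix(purchases)
print(["user_id"] + columns)
for row in rows:
    print(row)
# ['user_id', 1, 2, 3, 4, 5]
# [12345, 1, 0, 1, 2, 3]
# [67890, 1, 2, 0, 0, 0]
```

For user 12345 this gives item 1 once, item 3 once, item 4 twice, and item 5 three times. The columns here are item_ids; the second tab's mapping could be used to relabel them with item names. In pandas, a comparable result can be obtained by splitting the string column, calling `DataFrame.explode`, and passing the result to `pd.crosstab`.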