If you're new to the data space, have recently learned a new skill, or are just trying to build a more robust data science/analyst portfolio, a great way to solidify your skills is to do some mini-projects focused on them. Below we outline a few places where you can find publicly available data for your next project.
If you're interested in practicing real data scientist and analyst interview questions, feel free to sign up for our email newsletter, where we send a few curated questions per week to help you prepare for interviews at top companies.
FiveThirtyEight is an interactive news and sports site that has some incredible data visualizations (which you should totally check out). They make a lot of their data open to the public, meaning you can download and play with the source data yourself!
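Once you have downloaded a CSV file from a source like this, a few lines of Python are enough to start exploring it. The snippet below is only a minimal sketch using the standard library: the column names `team` and `rating` and the in-memory sample are made up for illustration and do not come from any real FiveThirtyEight file.

```python
import csv
import io


def load_rows(csv_file):
    """Read an open CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def column_mean(rows, column):
    """Average a numeric column, skipping rows where the value is blank."""
    values = [float(row[column]) for row in rows if row[column] != ""]
    return sum(values) / len(values)


# Hypothetical stand-in for a downloaded file; the columns are assumptions.
sample = io.StringIO("team,rating\nA,1500\nB,1600\nC,\n")
rows = load_rows(sample)
print(len(rows), column_mean(rows, "rating"))
```

For a real download, you would pass `load_rows` a file opened with `open(path, newline="")` instead of the in-memory sample.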
Here are some examples:
BuzzFeed makes the data sets, analysis, libraries, tools, and guides used in its articles available on GitHub. Check them out to learn from some of the best!
Here are some examples:
Kaggle, recently acquired by Google, is a place where you can learn, practice, and fine-tune your data science/analytics skills. It has tons of data that's open to the public, and it allows users of the platform to share code so you can learn best practices within the data space. It also hosts competitions where you can win real money if you have a top-ranking model!
Here are some examples:
Socrata hosts cleaned, open-source data sets covering government, business, and education.
Here are some examples:
This GitHub repository hosts a library of awesome public datasets! They are sorted by category, and each entry links you straight to the hosting website.
Here are some examples:
Google lists all of the data sets on a single page. Google has a cloud hosting service called Google Cloud Platform (GCP), and you can explore these data sets by querying them with a tool called BigQuery. You'll need to sign up for a GCP account, but the first 1TB of queries you make is free. Be careful not to go over, or you'll have to pay!
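To avoid going over the free allowance by accident, it can help to estimate how much data a query will scan before you run it. The sketch below is a rough illustration and not an official budgeting tool: it assumes 1TB means 10^12 bytes, that you track your own usage in `used_bytes`, and that the `google-cloud-bigquery` client library is installed and authenticated if you call `estimate_query_bytes`. The helper names are hypothetical.

```python
# Assumption: the free allowance is treated as 10**12 bytes (1TB).
FREE_TIER_BYTES = 10**12


def remaining_free_bytes(used_bytes, limit=FREE_TIER_BYTES):
    """Bytes left in the free allowance, never below zero."""
    return max(limit - used_bytes, 0)


def fits_in_free_tier(used_bytes, query_bytes, limit=FREE_TIER_BYTES):
    """True if running the query keeps total usage within the allowance."""
    return used_bytes + query_bytes <= limit


def estimate_query_bytes(sql):
    """Ask BigQuery how many bytes a query would scan, without running it.

    Needs the google-cloud-bigquery package and GCP credentials, so the
    import is kept inside the function.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    # A dry run validates the query and reports bytes scanned at no cost.
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


# Example with made-up numbers: 900GB already used, 50GB query planned.
print(fits_in_free_tier(900 * 10**9, 50 * 10**9))
```

A typical flow would be to call `estimate_query_bytes` on your SQL, pass the result to `fits_in_free_tier`, and only run the real query if it returns `True`.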
Here are some examples:
The University of California, Irvine hosts 440 data sets as a service to the machine learning community. These data sets are nice because most of them are squeaky clean and ready for modeling!
Here are some examples:
Data.gov allows you to download and explore data from multiple US government agencies. Data can range from government budgets to climate data. The data is very well documented, so you should have an easy time navigating the sources.
You can browse the data sets on Data.gov directly, without registering. You can browse by topic area, or search for a specific data set.
Here are some examples:
Academic Torrents is a site geared around sharing the data sets from scientific papers. It has tons of interesting data sets. You can browse the data sets directly on the site and download any you find interesting!
Here are some examples:
Quandl is a repository of economic and financial data. Some of the datasets are free, while others are available for purchase.
Here are some examples:
Jeremy Singer-Vine collects awesome data sets across multiple sources. If you're interested in getting data sets straight to your inbox, you should consider signing up for his newsletter.
We send 3 questions each week to thousands of data scientists and analysts preparing for interviews or just keeping their skills sharp. You can sign up to receive the questions for free on our home page.