Dataset: https://www.kaggle.com/datasets/himanshu9712/nypd-…
1. Import the data to clean
The data must be imported using API code. We will cover most of the following options in class.
Kaggle API
o https://www.kaggle.com/code/donkeys/kaggle-python-…
Pandas_datareader
o https://pypi.org/project/pandas-datareader/
Pandas read_ functions (read_html, read_csv, read_json, read_table)
o *read_csv() should point to the ONLINE source, not a local file
Twitter, Reddit, and other APIs available to download from pypi.org
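As a minimal sketch of the read_csv() option, the snippet below wraps pandas.read_csv in a small helper. The helper name load_csv and the example.com URL are hypothetical placeholders, not part of the assignment; an in-memory sample stands in for the online file so the sketch runs without a network connection.

```python
import io

import pandas as pd


def load_csv(source):
    """Read a CSV into a DataFrame.

    `source` can be a URL string (the online source the assignment asks for)
    or any file-like object. `load_csv` is a hypothetical helper name.
    """
    return pd.read_csv(source)


# Online usage (placeholder URL, an assumption; substitute the real CSV link):
# df = load_csv("https://example.com/path/to/data.csv")

# Offline stand-in so the sketch can be run as-is.
sample = io.StringIO("id,borough\n1,BRONX\n2,QUEENS\n")
df = load_csv(sample)
print(df.shape)
print(df.head())
```

Passing a URL string works because read_csv accepts http and https paths directly, so no local copy of the file is needed.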
2. Cleanup
Identify and profile the data
o Are there incorrect column names?
Standardize the data
o Uppercase, lowercase, dates and times, dashes in SSN or phone numbers, etc.
Deal with missing data
o NaNs
Remove unnecessary columns or rows
Report how many records were deleted or updated
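The cleanup steps above could be sketched with pandas roughly as follows. The column names (arrest_date, borough, phone, notes), the sample rows, and the helper name clean are all hypothetical and chosen only for illustration; the real dataset's columns will differ, and dropping rows is only one of several ways to deal with NaNs.

```python
import pandas as pd


def clean(df):
    """Return (cleaned DataFrame, rows deleted, phone values updated).

    All column names used here are assumptions for illustration.
    """
    rows_before = len(df)
    out = df.copy()

    # Standardize column names: trim, lowercase, spaces to underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )

    # Standardize text case and parse dates; unparseable dates become NaT.
    out["borough"] = out["borough"].str.strip().str.upper()
    out["arrest_date"] = pd.to_datetime(
        out["arrest_date"], format="%m/%d/%Y", errors="coerce"
    )

    # Remove dashes from phone numbers and count how many values changed.
    phones_before = out["phone"]
    out["phone"] = phones_before.str.replace("-", "", regex=False)
    updated = int((phones_before.ne(out["phone"]) & phones_before.notna()).sum())

    # Deal with missing data: drop rows lacking a date or a borough.
    out = out.dropna(subset=["arrest_date", "borough"])

    # Remove a column that is not needed for the analysis.
    out = out.drop(columns=["notes"])

    deleted = rows_before - len(out)
    return out, deleted, updated


# Made-up sample rows, not taken from the real dataset.
raw = pd.DataFrame(
    {
        " Arrest Date ": ["01/15/2023", "not a date", "03/02/2023"],
        "Borough": ["bronx", "Queens ", None],
        "Phone": ["212-555-0100", "718-555-0101", "646-555-0102"],
        "Notes": ["a", "b", "c"],
    }
)

cleaned, deleted, updated = clean(raw)
print(f"Records deleted: {deleted}")
print(f"Phone values updated: {updated}")
print(cleaned)
```

Counting rows before and after each step, as done here for deletions, is one simple way to produce the report the last bullet asks for.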