Data Science Digest 5

Posted by Chisel Analytics on Dec 26, 2019 6:45:00 AM

Title: The Easy Way to Do Advanced Data Visualisation for Data Scientists

Author: George Seif, AI/Machine Learning Engineer, Kdnuggets
Source: kdnuggets.com/2019/08/advanced-data-visualisation-data-scientists.html
How: The Python library Plotly, which is built on D3.js.
When to use this: If data visualization isn't your primary area, yet you are tasked with providing data visualizations.
Why it's helpful: Plotly provides interactivity out of the box, which Matplotlib does not by default.
Suggested application: Fancy plots, scatter plots, box plots, heat maps.
Business impact or insights to be gained: Plotly is simpler to build with than Matplotlib, and its interactivity will be well received by stakeholders who are not data specialists.

Title: Version Control for Data Science — Tracking Machine Learning models and datasets

Author: Vipul Jain, The Journal Blog
Source: https://blog.usejournal.com/version-control-for-data-science-tracking-your-machine-learning-models-and-datasets-aaa61f20bb45
How: DVC (https://dvc.org/); detailed installation instructions are in a linked blog post. It works on top of Git and is system agnostic, supporting GCS, S3, Azure, and more.
When to use this: When you want to control and monitor different versions of large data files, such as datasets and trained model files, including the ability to roll back or switch among versions.
Why it's helpful: DVC improves productivity by letting you reproduce an earlier state of your data processing and models without maintaining a manual log.
Suggested application: Tracking machine learning models, datasets, label encodings, and similar artifacts.
Business impact or insights to be gained: DVC saves time and money and makes workflows more efficient. It can quickly reuse and reproduce files while you manage versions, run simulations, and test programs.
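
To make the idea of versioning large files alongside Git more concrete, here is a toy sketch of the general approach: keep the large file in a cache keyed by its checksum, and keep only a small pointer file under version control. This is not DVC's implementation or API. The function names, the `.pointer.json` suffix, and the cache layout are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_path, cache_dir):
    """Copy the file into a checksum-named cache slot and write a pointer file."""
    data_path = Path(data_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    checksum = file_checksum(data_path)
    cached = cache_dir / checksum
    if not cached.exists():
        shutil.copy2(data_path, cached)
    # Hypothetical pointer format: small enough to commit to Git.
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    pointer.write_text(json.dumps({"checksum": checksum, "path": data_path.name}))
    return pointer


def restore(pointer_path, cache_dir):
    """Rebuild the data file that a pointer file refers to."""
    pointer_path = Path(pointer_path)
    meta = json.loads(pointer_path.read_text())
    target = pointer_path.with_name(meta["path"])
    shutil.copy2(Path(cache_dir) / meta["checksum"], target)
    return target


def demo():
    """Track two versions of a dataset, then roll back to the first."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        cache = workdir / "cache"
        data = workdir / "train.csv"

        data.write_text("id,label\n1,cat\n")
        pointer = track(data, cache)
        pointer_v1 = pointer.read_text()  # Git would keep this version

        data.write_text("id,label\n1,cat\n2,dog\n")
        track(data, cache)

        # Simulate checking out the old pointer from Git, then restoring the data.
        pointer.write_text(pointer_v1)
        restore(pointer, cache)
        return data.read_text()


rolled_back = demo()
```

The design point is that Git only ever sees the tiny pointer file, so switching versions is a normal Git checkout followed by a copy out of the cache.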

Title: Stop Using Mean to Fill Missing Data

Author:  Dario Radecic, Towards Data Science
Source: https://towardsdatascience.com/stop-using-mean-to-fill-missing-data-678c0d396e22
How: Multivariate Imputation by Chained Equations (MICE), impyute library through p.
When to use this: When you need to fill missing data.
Why it's helpful: MICE fills in the missing data multiple times, so it can create multiple “complete” datasets. It handles different types of data, such as continuous and binary, and gives more accurate datasets than mean imputation.
Suggested application: When working on predictive models with incomplete datasets, MICE provides higher accuracy than filling gaps with mean values.
Business impact or insights to be gained: Improves the accuracy of datasets, so you can give stakeholders data that better describes real-world conditions, which in turn supports better outcomes.
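
The contrast between mean imputation and a chained-equations approach can be sketched in plain Python. This is a simplified, deterministic illustration of the chaining idea on two columns, not the impyute library and not full MICE, which also adds random draws and produces several completed datasets. The data and function names are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_impute(rows):
    """Replace each missing value (None) with the mean of its column."""
    means = []
    for col in range(len(rows[0])):
        observed = [row[col] for row in rows if row[col] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[c] if v is None else v for c, v in enumerate(row)] for row in rows]


def chained_impute(rows, n_iter=10):
    """Two-column chained imputation: regress each column on the other in turn."""
    missing = [[v is None for v in row] for row in rows]
    filled = mean_impute(rows)  # starting guess
    for _ in range(n_iter):
        for target in (0, 1):
            other = 1 - target
            # Fit only on rows where the target column was actually observed.
            observed = [i for i, m in enumerate(missing) if not m[target]]
            slope, intercept = fit_line(
                [filled[i][other] for i in observed],
                [filled[i][target] for i in observed],
            )
            # Overwrite the current guesses for the missing entries.
            for i, m in enumerate(missing):
                if m[target]:
                    filled[i][target] = slope * filled[i][other] + intercept
    return filled


# Illustrative data following y = 2x + 1, with one gap in each column.
data = [[1, 3], [2, 5], [3, None], [4, 9], [None, 11], [6, 13]]

by_mean = mean_impute(data)
by_chain = chained_impute(data)
```

In this toy dataset the mean fill ignores the relationship between the two columns, while the chained version uses it, so its imputed values land much closer to the underlying line.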

Don't forget to subscribe for more tips and tricks to stay on top of Data Science developments!

Topics: Professional Development, Data Science Developments
