Supervised and Unsupervised Machine Learning Primer
Supervised and unsupervised learning algorithms are often the first two ‘families’ of techniques introduced in machine learning classrooms and textbooks. So, what are they?
Topics: Skillset of Data Analysts, Professional Development, How Tos, Machine Learning
In my first year as a data scientist, I watched myself and others retype the same lines of code and retrace our work time and time again. Perhaps some of this did not warrant concern.
After all, how long does it take to type the standard imports,
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
and the like?
Yet there were also plenty of real concerns, as my colleagues and I performed many of the same tasks repeatedly: filling null values, standardizing column names, and creating dummy variables. Shouldn’t we be able to standardize these rote processes instead of recoding the entire preprocessing pipeline every time?
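As a rough sketch of what standardizing those rote steps might look like, the three tasks can be wrapped in small reusable functions. The function names, the median fill for numeric columns, and the "missing" placeholder for text columns are all assumptions made for illustration, not a prescribed pipeline.

```python
import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Trim, lowercase, and snake_case the column names (hypothetical helper)."""
    out = df.copy()
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    return out


def fill_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric nulls with the column median and other nulls with 'missing'.

    Both fill strategies are assumptions; choose whatever suits your data.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna("missing")
    return out


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Run the three rote steps in order: rename, fill, create dummy variables."""
    out = standardize_columns(df)
    out = fill_nulls(out)
    return pd.get_dummies(out)


# Toy data, invented purely for this example
raw = pd.DataFrame(
    {
        "Customer Age": [34.0, None, 50.0],
        " Region ": ["east", None, "west"],
    }
)
clean = preprocess(raw)
```

Keeping functions like these in a module you import, rather than retyping them in each notebook, means every analysis starts from the same preprocessing steps.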
Even worse, sometimes a day’s worth of exploratory analysis would surface fruitful insights, only for you to realize that the Jupyter notebook you’d been working in was a jumbled mess, because you had jumped around in it repeatedly, fixing errors and rerunning cells. How on earth are you supposed to repeat that process now?
It’s also funny to me that, despite proclaiming the immense value of object-oriented programming, none of my instructors showed how to put such a philosophy into practice in a daily workflow.
I hope this article helps you sidestep the pitfalls many of us have fallen into in order to develop a more productive and sensible workflow.
Topics: Skillset of Data Analysts, How Tos, Jupyter, Python, Efficiency
Assessing Sentiment and Other Insights with Twitter Data
How can you use the Twitter API to keep a pulse on your customer base or market trends? From tracking followers to analyzing brand affinity, we’ll take a look at various techniques that can be leveraged via the Twitter API, along with logistical considerations and the regulations in the Twitter API’s terms of service.
Topics: Skillset of Data Analysts, Data Science Developments
In building a data-driven organization, unifying disparate datasets is essential because it provides a comprehensive baseline for modeling and analysis.
But joining data together to establish this baseline can be messy.
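To give a sense of where the mess tends to show up, here is a minimal sketch of a join that surfaces unmatched records instead of silently dropping them. It assumes pandas and two invented tables; the table and column names are hypothetical.

```python
import pandas as pd

# Toy tables, invented for illustration only
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]}
)
orders = pd.DataFrame(
    {"customer_id": [1, 1, 4], "amount": [20.0, 35.0, 15.0]}
)

# how="outer" keeps rows from both sides, indicator=True adds a "_merge"
# column recording where each row came from, and validate raises an error
# if customer_id is unexpectedly duplicated in the customers table.
merged = customers.merge(
    orders,
    on="customer_id",
    how="outer",
    indicator=True,
    validate="one_to_many",
)

# Rows that exist on only one side are the ones worth investigating
unmatched = merged[merged["_merge"] != "both"]
```

Inspecting `unmatched` before settling on an inner or left join is one way to see how much data a join would otherwise discard.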
Topics: Skillset of Data Analysts, Data Science Developments, How Tos