Building a Dashboard: Framing the Problem and Getting Started

Posted by Matt Mitchell on Jul 19, 2020 7:49:05 AM

As mentioned in our prior article on dashboard development, framing the problem is a critical first step in your process, and one that will ultimately drive utilization and outcomes if done correctly.

It sounds simple, but it is equally important to outline and follow a project plan. Doing so lets you organize your efforts, keep the project moving, and stay grounded in your core goals and objectives.

Topics: Professional Development, BI, How Tos, Efficiency, Dashboards & Visualization

Supervised and Unsupervised Machine Learning Primer

Posted by Matt Mitchell on Jul 3, 2020 6:52:35 AM

Supervised and unsupervised learning algorithms are often the first two ‘families’ of techniques introduced in machine learning classrooms and textbooks. So, what are they?

Topics: Skillset of Data Analysts, Professional Development, How Tos, Machine Learning

Dashboard and Visualization Design Principles

Posted by Matt Mitchell on Jun 22, 2020 8:38:47 PM

Designing a meaningful dashboard or visualization can be a complex and difficult task.

Deciding how best to display data, on top of choosing which metrics to track and highlight, is a big ask, and doing it poorly can diminish the impact of your analytical insights.

This article will walk you through some design considerations and how to go about implementing your very own dashboard.

Topics: Professional Development, Data Science Developments, How Tos, Dashboards & Visualization

Clicking, Typing, Hovering and Scrolling with Selenium

Posted by Matt Mitchell on Jun 12, 2020 8:32:17 AM

So you've tried to scrape some data from the latest website, only to realize that your current toolset for parsing HTML pages no longer suffices.

With the rise of AJAX, many of today's websites (including the likes of Netflix and Airbnb) use React.js or similar frameworks to build interactive interfaces where the DOM itself is updated dynamically in response to user interactions. This contrasts with the older approach of navigating to a new URL and making an additional HTTP request for each page.

In these scenarios, older tools such as BeautifulSoup may not be enough.

Topics: Professional Development, How Tos, Web Scraping

Data Science Digest 10

Posted by Chisel Analytics on Mar 19, 2020 6:45:00 AM

As a data professional, time is at a premium. Here are some tips and trends you'll want to stay on top of!

Title: Towards open health analytics: our guide to sharing code safely on GitHub

Source: https://towardsdatascience.com/towards-open-health-analytics-our-guide-to-sharing-code-safely-on-github-5d1e018897cb
Author: Fiona Grimm
How: Provides step-by-step instructions and things to consider
When to use this: When preparing to create a GitHub page, especially one that may include sensitive data
Why it's helpful: Case study with tips, instructions, checklist and links from someone who has done this before
Suggested application: Contribute to and benefit from the input of the global community
Business impact or insights to be gained: Good reference to share with managers who might be resistant to or concerned about sharing company code or information

Topics: Professional Development, Data Science Developments

Data Science Digest 9

Posted by Chisel Analytics on Feb 20, 2020 6:45:00 AM

Keeping up is hard for data scientists to do. Chisel Analytics is happy to help!

Title: Pandas Version 1.0 is Out! Top 4 Features Every Data Scientist Should Know

Source: https://www.analyticsvidhya.com/blog/2020/01/pandas-version-1-top-4-features/
How: Make sure you have the current version of Pandas. If yours is an older version (including 0.x), please update with
$ pip install --upgrade pandas==1.0.0rc0
Also, "first upgrade to Pandas 0.25 and to ensure your code is working without warnings, before upgrading to pandas 1.0."
When to use this: When you want to: filter and "analyze categorical and text-based features;" do calculations with missing values to generate "null" versus false; present data about the info in your dataframe or markdown tables in a clear fashion; plus more enhancements.
Why it's helpful: Now this widely used library offers: Dedicated DataTypes for strings, New Scalar for Missing Values, Improved Data Information Table, Markdown format for Dataframes.
Suggested application: When sharing information with those not used to working in the datasets or keeping logs for future and quick reference, or running calculations that can incorporate more records by leveraging a "null" value versus "false".
Business impact or insights to be gained: As data professionals face more real-world challenges, this open source data analysis and manipulation tool continues to evolve to provide fast, flexible and expressive data structures for working with relational or labeled data.
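
To make the "null" versus "false" point above concrete, here is a minimal sketch of the dedicated string type and the new missing-value scalar. It assumes pandas 1.0 or later is installed; the sample values and variable names are made up for illustration and do not come from the linked article.

```python
import pandas as pd

# Dedicated string dtype (introduced as experimental in pandas 1.0).
names = pd.Series(["ann", None, "cy"], dtype="string")
upper = names.str.upper()  # the missing entry stays missing

# Nullable boolean dtype, which uses pd.NA for missing entries.
flags = pd.array([True, False, None], dtype="boolean")

# pd.NA follows three-valued logic: the result is "unknown" only when it
# genuinely depends on the missing value.
na_or_true = pd.NA | True     # True whatever the missing value was
na_and_false = pd.NA & False  # False whatever the missing value was
na_equals_one = pd.NA == 1    # unknown, so the result is pd.NA

n_missing_names = int(names.isna().sum())
n_true_flags = int(flags.sum())  # missing values are skipped by default
```

Because a missing flag stays "unknown" instead of being coerced to False, records with gaps can remain in a calculation without being silently counted as negatives.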

Topics: Professional Development, Data Science Developments

Data Science Digest 8

Posted by Chisel Analytics on Feb 6, 2020 6:45:00 AM

Keeping up is hard for data scientists to do. Chisel Analytics is happy to help!

Title: Karate Club consists of state-of-the-art methods to do unsupervised learning on graph structured data

Source: https://github.com/benedekrozemberczki/karateclub and https://karateclub.readthedocs.io/en/latest/notes/introduction.html
How: GitHub installation and documentation for data handling, full list of implemented methods, and datasets.
When to use this: When you need to perform "small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods."
Why it's helpful: Incorporates Overlapping Community Detection, Non-Overlapping Community Detection, Neighborhood-Based Node Level Embedding, Structural Node Level Embedding, Attributed Node Level Embedding, and Graph Level Embedding.
Suggested application: Use the clusterings and embeddings for downstream learning. Use case examples include: how well Facebook page clusters and group memberships are aligned, abuse of the platform Twitch, classification of threads on Reddit.
Business impact or insights to be gained: "Only quick and minimal changes to the code are needed when a model performs poorly."

Topics: Professional Development, Data Science Developments

Data Science Digest 7

Posted by Chisel Analytics on Jan 23, 2020 6:45:00 AM

A quick look at three of the major players and some of what they have to offer around machine learning.

Title: Amazon Forecast - Accurate time-series forecasting service, based on the same technology used at Amazon.com, no machine learning experience required

Source: https://aws.amazon.com/forecast/
How: Upload your historical and related data, and Amazon's machine learning and AI generate various forecasts.
When to use this: When you don't have the resources, tools or in-house talent to build out a forecasting model and system which can accommodate multiple data series which change over time.
Why it's helpful: Fully managed service "so no servers to provision or machine learning models to build, train or deploy." Pay as you go so workable for most budgets.
Suggested application: Product demand planning, financial planning, resource planning.
Business impact or insights to be gained: Leveraging machine learning developed by Amazon, forecasts are more accurate and prepared in much shorter time (e.g., from months to hours).

Topics: Professional Development, Data Science Developments

Data Science Digest 6

Posted by Chisel Analytics on Jan 9, 2020 6:45:00 AM

Title: The 5 Most Useful Techniques to Handle Imbalanced Datasets

Author: Rahul Agarwal, Senior Statistical Analyst at Walmart Labs
Source: https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html
How: Resampling; imbalanced-learn (imblearn); Tomek Links; SMOTE (Synthetic Minority Oversampling Technique); sklearn
When to use this: At the occurrence of imbalanced datasets, that is, when "you have such a small sample for the positive class in your dataset that the model is unable to learn".
Why it's helpful: Address the problem of an imbalanced dataset: Random undersampling and oversampling, Undersampling and Oversampling using imbalanced-learn, Class weights in the models, and Change your Evaluation Metric.
Suggested application: Finance, marketing/ ad serving, transportation/ airline, medical, content moderation, etc.
Business impact or insights to be gained: Models trained on imbalanced datasets "fail to capture the minority class, which is most often the point of creating the model in the first place." Thus, an analysis might miss fraudulent bank transactions, a patient with a rare disease, faulty structural integrity in an aircraft, etc.
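
Two of the techniques listed above, random oversampling and class weights, can be sketched in plain Python. This is an illustrative sketch only, not the article's code: the function names and the toy data are hypothetical, and in practice you would more likely reach for imbalanced-learn or sklearn. The weighting formula, n_samples / (n_classes * class_count), is one common "balanced" heuristic.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy data (hypothetical): 8 negatives and 2 positives.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
weights = balanced_class_weights(y)
```

Oversampling changes the training data itself, while class weights leave the data alone and instead make mistakes on the rare class cost more during training. Either way, resample or reweight only the training split so the evaluation data keeps its real class balance.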

Topics: Professional Development, Data Science Developments

Data Science Digest 5

Posted by Chisel Analytics on Dec 26, 2019 6:45:00 AM

Title: The Easy Way to Do Advanced Data Visualisation for Data Scientists

Author: George Seif, AI/Machine Learning Engineer, KDnuggets
Source: kdnuggets.com/2019/08/advanced-data-visualisation-data-scientists.html
How: Python library Plotly, D3.js
When to use this: If data visualization isn't your primary area... and yet you are tasked with providing data visualizations.
Why it's helpful: Plotly provides interactivity out of the box, versus Matplotlib.
Suggested application: Fancy plots, scatter plots, box plots, heat maps.
Business impact or insights to be gained: Simpler to build with than Matplotlib, with interactivity that will be well received by stakeholders who are not data specialists.

Topics: Professional Development, Data Science Developments

Chisel Analytics

The Benefits of Analytics

Expand your insights into the opportunities that analytics can offer. Chisel Analytics provides a platform that aims to break down the barriers to building or growing your data science and analytics programs. Our blog, tools and resources help companies, recruiters and data specialists stay informed, stay organized and stay engaged.
