Data at Work: 3 Real-World Problems Solved by Data Science
By Patrick Smith
At first glance, data science seems to be just another business buzzword: something abstract and ill-defined. While the field can indeed be abstract, it’s anything but a buzzword. Data science and its applications have been steadily changing the way we do business and live our day-to-day lives, and considering that an estimated 90% of the world’s data has been created in the past few years, there’s a lot of growth ahead for this exciting field.
While traditional statistics and data analysis have always focused on using data to explain and predict, data science takes this further. It uses data to learn: constructing algorithms and programs that collect data from various sources and apply hybrids of mathematical and computer science methods to derive deeper, actionable insights. Whereas traditional analysis uses structured data sets, data science dares to ask further questions, looking at unstructured “big data” derived from millions of sources and nontraditional mediums such as text, video, and images. This allows companies to make better decisions based on their customer data.
So how is this all manifesting in the market? Here, we look at three real-world examples of how data science drives business innovation across various industries and solves complex problems.
Airbnb uses data science and advanced analytics to help hosts set their prices.
The vacation broker Airbnb has always been a business informed by data. From understanding the demographics of renters to predicting availability and prices, Airbnb is a prime example of how the tech industry is leveraging data science. In fact, the company even has an entire section of its blog dedicated to the groundbreaking work its data team is doing. The team understands the importance of data quality, data mining, and data analytics.
Faced with a large amount of data from customers, hosts, locations, and demand for rentals, Airbnb went about using data science to create a dynamic pricing system called Aerosolve, which has since been released as an open-source resource.
Using a machine learning algorithm, Aerosolve’s predictive model estimates the optimal price for a rental based on its location, time of year, and a variety of other attributes. For Airbnb hosts, it revolutionized how to price a rental competitively and maximize returns. And that’s not all: Airbnb’s data scientists have also launched Airflow, an open-source workflow management platform for building data pipelines that ingest data easily.
There’s no shortage of need for these solutions, and for the foreseeable future, we’ll be seeing explosive growth in data science solutions for technology companies like Airbnb.
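Aerosolve itself is a Java library, but the underlying idea is easy to sketch. Below is a minimal, purely illustrative Python version that fits a regression model to synthetic listing data; every feature name and price effect here is invented for the example, not taken from Airbnb.

```python
# Purely illustrative dynamic-pricing sketch (NOT Aerosolve, which is a Java
# library): fit a regression model to synthetic listing data, then suggest
# a nightly rate for a hypothetical new listing.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
# Invented features: neighborhood index, month, bedrooms, average review score.
X = np.column_stack([
    rng.integers(0, 10, n),    # neighborhood
    rng.integers(1, 13, n),    # month of year
    rng.integers(1, 5, n),     # bedrooms
    rng.uniform(3.0, 5.0, n),  # average review score
])
# Synthetic "true" price: base + location + seasonality + size + quality + noise.
price = (60 + 8 * X[:, 0] + 15 * np.sin(2 * np.pi * X[:, 1] / 12)
         + 40 * X[:, 2] + 20 * X[:, 3] + rng.normal(0, 10, n))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, price)

# Suggested rate for a 2-bedroom listing in neighborhood 3 in July, rated 4.6.
suggested = model.predict([[3, 7, 2, 4.6]])[0]
print(f"suggested nightly rate: ${suggested:.0f}")
```

A real system would of course be trained on historical bookings rather than a formula, but the shape is the same: attributes in, suggested rate out.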
Data science revolutionizes sports analytics.
Since the 2003 book Moneyball (and the corresponding 2011 film) became a success, sports teams have realized that their data is more powerful than they ever imagined. Over the past few years, the Strategic Innovations Group at the consulting firm Booz Allen Hamilton has been working to transform the way teams use that data.
Using data science and machine learning tactics, Booz Allen’s team developed an application for MLB coaches to predict any pitcher’s throw with up to 75% accuracy, changing the way that teams prepare for a game. Looking at all pitchers who had thrown more than 1,000 pitches, the team developed a model that considers current at-bat statistics, in-game situations, and generic pitching measures to predict the next pitch.
Now, before a game starts, a coach can analyze an opposing team’s lineup and run predictive models to anticipate how to structure plays, giving the team an edge and changing how the game itself is played.
Nonprofits solve the most pressing social issues with data.
Founded in 2014, San Francisco-based Bayes Impact is a group of experienced data scientists assisting nonprofits in tackling some of the world’s heaviest data challenges. Since its founding, Bayes has helped the U.S. Department of Health make better matches between organ donors and those who need transplants, worked with the Michael J. Fox Foundation to develop better data science methods for Parkinson’s research, and created methods to help detect fraud in microfinance. Bayes is also developing a model to help the City of San Francisco harness data science to improve essential services such as emergency response. Through organizations like Bayes, data science has the power to make a significant social impact in our data-driven world.
So, what does all of this mean for the job market? With the ever-increasing need for data-driven solutions across every industry, the demand for data scientists has outpaced supply. According to a recent study by McKinsey, “By 2018, the United States will face a shortage of up to 190,000 data scientists with advanced training in statistics and machine learning as well as 1.5 million managers and analysts with enough proficiency in statistics to use big data effectively.”
It’s no wonder, then, that data scientist is one of the few non-managerial positions included by Glassdoor in the top 25 highest-paying jobs in America. Plus, in its annual list of the 25 Best Jobs in America, Glassdoor rated data scientist No. 1, thanks to the high median base salary, the number of openings, and the career opportunities.
Two things are certain: There is a serious need for data scientists in today’s job market, and no shortage of life-changing problems that data wranglers can solve.
33 unusual problems that can be solved with data science
- August 28, 2014 at 5:00 pm
Here is a non-exhaustive list of curious problems that could greatly benefit from data analysis. If you think you can’t get a job as a data scientist (because you only apply to jobs at Facebook, LinkedIn, Twitter, or Apple), here’s a way to find or create new jobs, broaden your horizons, and make Earth a better place, not just for human beings but for all living creatures, and perhaps even beyond Earth. Help us grow this list of 33 problems to 100+.
The actual number is higher than 33, as I’m adding new entries.

Figure 1: related to problem #33
- Automated translation, including translating one programming language into another one (for instance, SQL to Python – the converse is not possible)
- Spell checks, especially for people writing in multiple languages – lots of progress to be made here, including automatically recognizing the language as you type and not trying to correct the same word every single time (some browsers have tried to change Ning to Nong hundreds of times, and I have no idea why, after 50 failures, they keep trying – I call this machine unlearning )
- Detection of earth-like planets – focus on planetary systems with many planets to increase the odds of finding habitable ones, rather than only stars and planets matching our Sun and Earth
- Distinguishing between noise and signal on millions of NASA pictures or videos, to identify patterns
- Automated piloting (drones, cars without pilots)
- Customized, patient-specific medications and diets
- Predicting and legally manipulating elections
- Predicting oil demand, oil reserves, oil price, impact of coal usage
- Predicting chances that a container in a port contains a nuclear bomb
- Assessing the probability that a convict is really the culprit, especially when a chain of events resulted in a crime or accident (think about a civil airplane shot down by a missile)
- Computing correct average time-to-crime statistics for an average gun (using censored models to compensate for the bias caused by new guns not having a criminal history attached to them)
- Predicting iceberg paths: this occasionally requires icebergs to be towed to avoid collisions
- Oil well drilling optimization: how to dig as few test wells as possible to detect the entire area where oil can be found
- Predicting solar flares: timing, duration, intensity and localization
- Predicting earthquakes
- Predicting very local weather (short-term) or global weather (long-term); reconstructing past weather (say, 200 million years ago)
- Predicting weather on Mars to identify best time and spots for a landing
- Predicting riots based on tweets
- Designing metrics to predict student success, or employee attrition
- Predicting book sales, determining correct price, price elasticity and whether a specific book should be accepted or rejected by a publisher, based on projected ROI
- Predicting volcano risk, to evacuate populations or cancel flights, while minimizing expenses caused by these decisions
- Predicting 500-year floods, to build dams
- Actuarial science: predict your death, and health expenditures, to compute your premiums (based on which population segment you belong to)
- Predicting reproduction rate in animal populations
- Predicting food reserves each year (fish, meat, crops including crop failures caused by diseases or other problems). Same with electricity and water consumption, as well as rare metals or elements that are critical to build computers and other modern products.
- Predicting longevity of a product, or a customer
- Asteroid risks
- Predicting the duration, extent, and severity of droughts or fires
- Predicting racial and religious mix in a population, detecting change point (e.g. when more people speak Spanish than English, in California) to adapt policies accordingly
- Attribution modeling to optimize advertising mix, branding efforts and organic traffic
- Predicting new flu viruses to design efficient vaccines each year
- Explaining the hexagonal patterns in this Death Valley picture (see Figure 1)
- Road construction, HOV lanes, and traffic lights designed to optimize highway traffic. Major bottlenecks are caused by three-lane highways suddenly narrowing to two lanes on a short section, usually less than 100 yards long, for no reason. No need for big data to understand and fix this, though if you don’t know basic physics (fluid dynamics) and your job is traffic planning / optimization / engineering, then big data – if used smartly – will help you find the cause and compensate for your lack of good judgment. These bottlenecks should be your top priority, and they are not expensive to fix.
- Google’s algorithm to predict the duration of a road trip, which does much better than GPS systems not connected to the Internet. Potential improvement: when Google tells me that I will arrive in Portland at 5pm when I’m currently in Seattle at 2pm, it should incorporate forecasted traffic in Portland at 5pm (that is, congestion due to peak commuting time) rather than making computations based on Portland traffic at 2pm.
12 Data Science Projects To Try (From Beginner to Advanced)
From breast cancer detection to user experience design, businesses across the globe are leveraging data science to solve a wide range of problems. Every mobile/web-based product or digital experience today demands the application of data science for personalization, customer experience, and so on. This opens up a world of opportunities for data science professionals.
To land a data science job, however, early career professionals need more than just a strong theoretical foundation. Hiring managers today are looking for data scientists who have hands-on experience delivering projects that solve real-world problems. Even before you land your first job, you need ‘experience’ that demonstrates your ability to deliver such projects. No sweat; we’ve brought help.
A data science project is a practical application of your skills. A typical project lets you exercise skills in data collection, cleaning, analysis, visualization, programming, machine learning, and so on. It helps you apply your skills to solve real-world problems. On successful completion, you can also add it to your portfolio to show your skills to potential employers.
Whether you’re a complete beginner or one with advanced skills, you can gain hands-on experience by trying out projects on your own or working with peers. To help you get started, we’ve curated a list of interesting data science projects to try. See what catches your fancy and get started!
Beginner Data Science Projects
“Eat, Rate, Love”: An Exploration of R, Yelp, and the Search for Good Indian Food

When it comes time to eat, many people turn to Yelp to choose the best options for the type of food they’re looking for. They search, eat, rate, and leave reviews for the restaurants they’ve visited. This makes Yelp a great source of data to run data science projects.
Robert Chen, a Springboard Data Science Bootcamp graduate, chose this data to explore whether the best reviews led to the best Indian restaurants. While searching Yelp, Chen discovered many recommended Indian restaurants with similar scores. Certainly, not all the reviewers had the same knowledge of this cuisine, right? With this in mind, he took the following into consideration:
- The number of restaurant reviews by a single person of a particular cuisine (in this case, Indian food). He was able to justify this parameter by looking at reviewers of other cuisines, such as Chinese food.
- The apparent ethnicity of the reviewer in question. If the reviewer had an Indian name, he could infer that they might be of Indian ethnicity, and therefore more familiar with what constituted good Indian food.
Working in the Python and R programming languages, he modified the data and variables and found that reviewers with Indian names tended to give good reviews to only one restaurant per city across the 11 cities he analyzed, thus providing a clear choice per city for restaurant patrons.
Yelp’s data has become popular among newcomers to data science. You can access it here. Find out more about Robert’s project here.
Customer Segmentation with R, PCA, and K-Means Clustering

Marketers perform complex segmentation across demographic, psychographic, behavioral, and preference data for each customer to deliver personalized products and services. To do this at scale, they leverage data science techniques like supervised learning.
Data scientist Rebecca Yiu’s project on market segmentation for a fictional organization, using R, principal component analysis (PCA), and K-means clustering, is an excellent example of this. She uses data science techniques to identify the prospective customer base and applies clustering algorithms to group them. She classifies customers into clusters based on age, gender, region, interests, etc. This data can then be used for targeted advertising, email campaigns, and social media posts.
You can learn more about her data science project here.
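The original project was done in R; as a rough Python sketch of the same pipeline (standardize, reduce with PCA, cluster with K-means), here is a version run on synthetic customer data. The features and segment parameters are invented for illustration.

```python
# Minimal segmentation sketch: scale features, project with PCA, cluster
# with K-means. The customer data below is synthetic, built from two
# invented segments so the clustering has something real to find.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical features: age, income, visits/month, average basket size.
segment_a = rng.normal([25, 30_000, 12, 20], [3, 5_000, 3, 5], size=(100, 4))
segment_b = rng.normal([45, 80_000, 4, 90], [5, 10_000, 2, 20], size=(100, 4))
X = np.vstack([segment_a, segment_b])

X_scaled = StandardScaler().fit_transform(X)        # put features on one scale
X_2d = PCA(n_components=2).fit_transform(X_scaled)  # project to 2 components
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)

print("cluster sizes:", np.bincount(labels))
```

Scaling first matters: without it, the income column (tens of thousands) would dominate the distance computations and the other features would barely count.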
Road Lane Line Detection

To follow lane discipline, self-driving cars need to detect lane lines. Data science and machine learning can play a crucial role in making this happen. Using computer vision techniques, you can build an application that autonomously identifies lane lines from continuous video frames or image inputs. Data scientists typically use the OpenCV library, NumPy, the Hough transform, spatial convolutional neural networks (SCNN), and so on to achieve this.
You can access a sample video for this project from this Git repository here.
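OpenCV’s Canny edge detector followed by its Hough line transform is the standard pipeline here. To show the core Hough idea without assuming OpenCV is installed, here is a bare-bones NumPy accumulator run on a tiny synthetic edge image:

```python
# Bare-bones Hough transform: every edge pixel votes for all lines
# (rho, theta) that pass through it; the accumulator peak is the line.
import numpy as np

# Synthetic 100x100 edge image with one diagonal "lane line": y = x.
edges = np.zeros((100, 100), dtype=bool)
for i in range(100):
    edges[i, i] = True

thetas = np.deg2rad(np.arange(-90, 90))        # candidate line angles
diag = int(np.ceil(np.hypot(*edges.shape)))    # largest possible |rho|
accumulator = np.zeros((2 * diag, len(thetas)), dtype=int)

ys, xs = np.nonzero(edges)
for x, y in zip(xs, ys):
    # rho = x*cos(theta) + y*sin(theta) for every candidate theta at once.
    rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
    accumulator[rhos + diag, np.arange(len(thetas))] += 1

rho_idx, theta_idx = np.unravel_index(accumulator.argmax(), accumulator.shape)
best_theta = np.rad2deg(thetas[theta_idx])
print(f"strongest line: theta = {best_theta:.0f} degrees")  # -45 for y = x
```

In a real lane detector you would run this (via `cv2.HoughLinesP`) on Canny edges restricted to a trapezoidal region of interest in front of the car, then average the detected segments into left and right lane lines.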
Intermediate Data Science Projects
NFL Third and Goal Behavior

The intersection of sports and data is full of opportunities for aspiring data scientists. Divya Parmar, a lover of both, decided to focus on the NFL for his capstone project during Springboard’s Introduction to Data Science course. His goal was to determine the efficiency of various offensive plays in different tactical situations.
Parmar collected play-by-play data from Armchair Analysis, and used R and RStudio for analysis. He developed a new data frame and used conventional NFL definitions. Through this project, he learned to:
- Assess the problem
- Manipulate data
- Deliver actionable insights to stakeholders
You can access the dataset here.
Who’s a Good Dog? Identifying Dog Breeds Using Neural Networks

Image classification is one of the most popular and widely in-demand types of data science project, and identifying a dog’s breed from its image is a much-loved example. Garrick Chu, a graduate of Springboard’s Data Science Career Track, chose this challenge for his final project.
One of Garrick’s goals was to determine whether he could build a model that would be better than humans at identifying a dog’s breed from an image. Because this was a learning task with no benchmark for human accuracy, once Garrick optimized the network to his satisfaction, he went on to conduct original survey research to make a meaningful comparison.
Working with large image data sets (rather than traditional data structures), he tackled network design and tuning, avoiding overfitting, transfer learning (adapting neural nets trained on different data sets), and exploratory data analysis.
To do this, he leveraged neural networks with Keras through Jupyter notebooks. You can explore more of Garrick’s work here and access the data set he used here.
Uber’s Pickup Analysis

“Is Uber Making NYC Rush-Hour Traffic Worse?” This was one of four questions answered by FiveThirtyEight, a data-driven news website now owned by ABC. If you are looking to improve your data analysis and data visualization skills, this is a great data science project.
For this, FiveThirtyEight obtained Uber’s rideshare data and analyzed it to understand ridership patterns, how the service interacts with public transport, and how it affects taxis. They then wrote detailed news stories supported by this analysis. You can read their work of data journalism here. You can access the original data on GitHub.
Predicting Restaurant Success

Here is another Yelp-based project, but a more complex one than we discussed earlier. Data scientist Michail Alifierakis used Yelp data to build his “Restaurant Success Model” to evaluate the success/failure rates of restaurants. He chose a logistic regression model for its simplicity and interpretability, optimizing for the precision of the open-restaurant class using grid search with cross-validation.
This is a great data science use case for lenders and investors, helping them make profitable financial decisions. You can learn more about the project here and take a look at the code on GitHub.
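As a hedged sketch of that modeling choice (logistic regression tuned by grid search with cross-validation, scored on precision), here is a minimal version on synthetic stand-in data rather than real Yelp features:

```python
# Sketch of the modeling setup described above: logistic regression, grid
# search over regularization strength, cross-validated precision as the
# selection metric. The data is a synthetic stand-in for Yelp features.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# y = 1 plays the role of "restaurant still open".
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.4, 0.6], random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # regularization strength
    scoring="precision",                   # optimize precision of class 1
    cv=5,
)
grid.fit(X, y)
print("best C:", grid.best_params_["C"])
print(f"cross-validated precision: {grid.best_score_:.2f}")
```

Optimizing precision rather than accuracy is the interesting design choice: for a lender, a restaurant predicted “open” that then closes is the costly mistake, so the model is tuned to keep such false positives rare.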
Predictive Policing

Many law enforcement agencies worldwide are moving toward data-driven approaches to forecasting and preventing crime. They leverage data science technologies to automate pattern detection and reduce the burden on crime analysts. Data scientist Orlando Torres launched a data science project on predictive policing, albeit with unexpected results. Using data from an open data initiative, he trained models on 2016 data to predict crime incidents for a given zip code, day, and time in 2017. He tried linear regression, a random forest regressor, K-nearest neighbors, XGBoost, and a deep learning model (a multilayer perceptron).
With this data science project, he learned that it is very easy to lose explainability while building models. He writes, “if we start sending more police to the areas where we predict more crime, the police will find crime. However, if we start sending more police anywhere, they will also find more crime. This is simply a result of having more police in any given area trying to find crime.” Given the number of law enforcement agencies using data science for policing, it almost feels like a self-fulfilling prophecy.
You can read more about his project here.
Building Chatbots

Today, businesses are automating their customer service with chatbots, and creating your own chatbot can be a great data science project, too. The two types of chatbots available today are domain-specific chatbots and open-domain chatbots; both use Natural Language Processing (NLP) and Recurrent Neural Networks (RNNs). As an intermediate data scientist, you can take this up a notch: try creating a sentiment-aware chatbot with the capability to detect the user’s mood.
Patrick Meyer runs a data science project of this kind. He discusses using a polarity system to label messages as happy, neutral, or unhappy; Paul Ekman’s initial model of six emotions (anger, disgust, fear, joy, sadness, and surprise) or his extended list of sixteen; Robert Plutchik’s wheel of emotions; and the Ortony, Clore, and Collins (OCC) model.
You can learn more about his detection techniques here, and access the dataset here.
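As a toy illustration of the polarity idea (happy / neutral / unhappy), here is a tiny lexicon-based scorer. A production chatbot would use a trained NLP model, and the word lists below are invented for the example:

```python
# Minimal lexicon-based polarity scorer: count positive and negative words.
# The lexicons are tiny and invented; real systems use trained models.
POSITIVE = {"great", "love", "thanks", "happy", "perfect", "helpful"}
NEGATIVE = {"angry", "broken", "terrible", "hate", "refund", "worst"}

def polarity(message: str) -> str:
    """Label a message as happy, neutral, or unhappy via word counts."""
    words = [w.strip(".,!?") for w in message.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "happy"
    if score < 0:
        return "unhappy"
    return "neutral"

print(polarity("thanks, this was perfect"))           # happy
print(polarity("my order arrived broken. terrible"))  # unhappy
```

A sentiment-aware chatbot could branch on this label, for instance escalating “unhappy” conversations to a human agent.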
Advanced Data Science Projects
Amazon vs. eBay Analysis

Finding the lowest price for a product on the Internet makes up a significant part of online shopping, and Chase Roberts decided to make that easier. In support of a Chrome extension he was building, Roberts assembled a shopping cart of 3,520 products and compared their prices on eBay and Amazon. The results showed the potential for substantial savings. Here’s what he found:
- If you chose the wrong platform for each of these items (always buying from whichever site charges more), this cart would cost you $193,498.45. (Or you could pay off your mortgage.) This is the worst-case scenario for the shopping cart.
- The best-case scenario, assuming you found the lower price between eBay and Amazon on every item, is $149,650.94. That is a $44,000 difference, or 23%!
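The arithmetic behind those headline numbers can be checked directly from the two cart totals quoted above:

```python
# Checking the cart arithmetic from the figures quoted above.
worst_case = 193_498.45  # always buying from the pricier site
best_case = 149_650.94   # always buying from the cheaper site

savings = worst_case - best_case
pct = savings / worst_case * 100
print(f"savings: ${savings:,.2f} ({pct:.0f}% of the worst case)")
# -> savings: $43,847.51 (23% of the worst case)
```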
You can read more about his project, starting with how he gathered the data, along with the challenges he faced during the process.
Fake News Detection

A recent study revealed that false news spreads faster and reaches more people than the truth, and around 52% of Americans report regularly encountering fake news online. A four-person team from the University of California, Berkeley built a fake news classifier. The team focused on clickbait and propaganda, two common forms of fake news, and developed a classifier to detect both. Their process involved:
- Taking data from news sources listed on OpenSources
- Using NLP to do preliminary processing of articles for content-based classification
- Training various machine learning models to classify the news articles
- Developing a web application to act as the front end of their classifier
You can learn and try out more about this here.
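As a rough sketch of such a content-based classifier, here is a minimal TF-IDF plus logistic regression pipeline on a tiny invented corpus (the Berkeley team trained on real articles from OpenSources-listed sites, with far more elaborate processing):

```python
# Minimal content-based fake-news classifier: TF-IDF features feeding a
# logistic regression. The eight "articles" below are invented headlines,
# labeled 1 for clickbait/propaganda and 0 for ordinary news.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "you won't believe what this celebrity did next",
    "doctors hate this one weird trick revealed",
    "shocking secret the government is hiding from you",
    "miracle cure they don't want you to know about",
    "city council approves budget for road repairs",
    "researchers publish peer reviewed study on vaccines",
    "central bank holds interest rates steady this quarter",
    "local school opens new science laboratory",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["this one shocking trick doctors are hiding"])[0])
```

With a corpus this small the model just memorizes vocabulary; the real project’s value came from the scale and curation of its training data, not the pipeline itself.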
Audio Snowflake

When you think about interesting data science projects, chances are you think about how to solve a particular problem, as seen in the examples above. But what about creating a project for the sheer beauty of the data? For her Hackbright Academy project, Wendy Dherin did just that.
She developed Audio Snowflake to create a splendid visual representation of music as it played, capturing components like tempo, key, mood, and duration. Audio Snowflake mapped both quantitative and qualitative characteristics of songs to visual traits such as saturation, color, rotation speed, and the figures it produced.
Read more on this project here.
Visualizing Climate Change

2020 was recorded as the warmest year to date by NASA, and the last seven years have been the warmest seven years on record. Climate change is one of the most pressing issues humans face today. It is more important than ever to spread awareness and inform people of the magnitude of this problem. Data visualization can play a crucial role in that.
Data scientist Giannis Tolios did a project visualizing the changes in global mean temperature and the rise of CO2 levels in the atmosphere using Python. He used libraries such as pandas, Matplotlib, and Seaborn to prepare the data and visualize it in line graphs and scatterplots. If climate change is a topic you want to work on, you can learn more about the project here.
Democratizing Data Science at Uber

One of the key challenges in data science is that it traditionally requires a mathematician or statistician to make even basic predictions and forecasts. Uber’s data science platform overcomes this challenge by automating forecasting with pre-built algorithms and tools, enabling everyone on the team to get predictions as long as they have data.
Franziska Bell, Director of Data Science at Uber, talks about how the company plans to give the capabilities of a data scientist to every Uber employee. This way, Uber uses artificial intelligence, machine learning, and data science to solve real-world problems. Read more about it here.
Credit Card Fraud Detection

With online and digital transactions growing in popularity, fraudulent transactions are on the rise as well. Banks and financial institutions are therefore looking to data science techniques to identify fraudulent transactions and prevent them from being executed. By processing data on customer location, behavior, transaction value, network, payment method, and so on, you can train an algorithm to detect anomalies. You can build your own classification engine for fraud detection using decision trees, K-nearest neighbors, logistic regression, support vector machines, random forests, or XGBoost.
To get started, you can find datasets here.
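As a minimal sketch of that setup, here is a classifier trained on synthetic, heavily imbalanced transactions, using class weighting so the rare fraud class is not ignored. The features and the fraud-generating rule are invented for illustration.

```python
# Fraud-detection sketch on synthetic, imbalanced transactions. Class
# weighting ("balanced") keeps the rare fraud class from being ignored.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 5000
# Invented features: amount, hour of day, distance from home, new device?
X = np.column_stack([
    rng.exponential(50, n), rng.integers(0, 24, n),
    rng.exponential(5, n), rng.integers(0, 2, n),
])
# Rare fraud: driven by large amounts, large distances, and new devices.
fraud_score = 0.004 * X[:, 0] + 0.05 * X[:, 2] + 0.5 * X[:, 3]
y = (fraud_score + rng.normal(0, 0.3, n)) > 1.6

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)
rec = recall_score(y_te, clf.predict(X_te))
print(f"fraud recall on held-out data: {rec:.2f}")
```

Note that plain accuracy is a poor metric here: predicting “not fraud” for everything would already score in the high nineties, so recall (and precision) on the fraud class is what matters.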

Here are some online data sources that you can access and download for free for your data science projects:
VoxCeleb. A gender-balanced, audio-visual data set containing short clips of human speech from speakers of different ages, professions, accents, etc., extracted from interviews uploaded to YouTube. It can be used for applications like speech separation, speaker identification, and emotion recognition.
Boston Housing Data. A fairly small data set based on information collected by the U.S. Census Bureau regarding housing in Boston. This data set can be used for assessment, focusing on regression problems.
Kaggle. With over 50,000 public datasets on a wide range of topics, you can find all the data and code you need to pursue your data science project ideas. Kaggle also offers competition data sets that are clean, detailed, and curated.
National Centers for Environmental Information. The largest repository of environmental data in the world, providing oceanic, atmospheric, meteorological, geophysical, climatic information, and more.
Global Health Observatory. If you are interested in doing projects in the health industry, this is the best place to get the data you need. It also has some of the latest COVID-19 data.
Google Cloud Public Datasets. A place where you can access data sets hosted by BigQuery, Cloud Storage, Earth Engine, and other Google Cloud services.
Amazon Web Services Open Data Registry. An extensive repository of data sets that you can either download or analyze on Amazon Elastic Compute Cloud (Amazon EC2). You first need to create a free AWS account to access the data sets.

To help you navigate the world of data science projects, we asked Springboard mentors and instructors for their advice. Here’s what they had to say.
Choose the Right Problem
If you’re a data science beginner, it’s best to consider problems that have limited data and variables; otherwise, your project may get too complex too quickly, potentially deterring you from moving forward. Choose one of the data sets in this post, or look for something in real life that has a limited data set. Data wrangling can be tedious work, so it’s critical, especially when starting out, to make sure that both the data you’re manipulating and the larger topic are interesting to you. These are challenging projects, but they should be fun!
Breaking Up the Project Into Manageable Pieces
Your next task is to outline the steps you’ll need to take in order to create your data science project. Once you have your outline, you can tackle the problem and develop a model to prove your hypothesis. You can do this in six steps:
- Generate your hypotheses
- Study the data
- Clean the data
- Engineer the features
- Create predictive models
- Communicate your results
Generate Your Hypotheses
After you have your problem, you need to create at least one hypothesis to help solve it. A hypothesis is your belief about how the data behaves with respect to certain variables. If, for example, your problem concerns store sales across neighborhoods, testing your hypotheses will depend on obtaining the general demographics of those neighborhoods. Create as many hypotheses as you need to solve the problem.
Study the Data
Your hypotheses need to have data that will allow you to prove or disprove them. Look in the data set for variables that affect the problem. If you do not have the data, either dig deeper or change your hypothesis.
Clean the Data
As much as data scientists prefer clean, ready-to-go data, the reality is seldom neat or orderly. You may have outlier data that you can’t readily explain, like a sudden large, one-time purchase of an expensive item in a store in a lower-income neighborhood. Or maybe one store didn’t report data for a week.
These are all problems with data that isn’t the norm. In such cases, it’s up to you as a data scientist to remove those outliers and fill in the missing data so that the data set is more or less consistent. Without these changes, your results will be skewed, and the outlier data will affect the results, sometimes drastically.
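The two fixes just described, dropping an unexplainable outlier and filling a store’s missing week, can be sketched with pandas (the sales figures below are invented):

```python
# Clean a small synthetic sales table: flag an extreme one-off outlier with
# a robust rule (median absolute deviation), then fill the gaps.
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "week": [1, 2, 3, 4, 5, 6],
    "revenue": [1200.0, 1150.0, np.nan, 1300.0, 25000.0, 1250.0],
})  # week 3 went unreported; week 5 contains a one-time bulk purchase

# Flag values far outside the typical range: here, more than 3 median
# absolute deviations from the median (a common rule-of-thumb cutoff).
median = sales["revenue"].median()
mad = (sales["revenue"] - median).abs().median()
outliers = (sales["revenue"] - median).abs() > 3 * mad
sales.loc[outliers, "revenue"] = np.nan

# Fill both gaps (the missing week and the removed outlier) with the median.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].median())
print(sales)
```

The median-based rule is deliberately robust: unlike a mean-and-standard-deviation cutoff, the 25,000 spike cannot drag the threshold up and hide itself.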
Engineer the Features
At this stage, you need to start assigning variables to your data. You need to factor in what will affect your data. Does a heatwave during the summer cause sales to drop? Does the holiday season affect sales in all stores and not just middle-to-high-income neighborhoods? Things like seasonal purchases become variables you need to account for.
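As a sketch of what this can look like in code, here is one way to turn a raw sale date into a handful of candidate seasonal features. The feature names and the November-December holiday flag are illustrative assumptions, not fixed conventions:

```python
from datetime import date

def seasonal_features(day: date) -> dict:
    """Derive candidate seasonal variables from a single sale date."""
    month = day.month
    seasons = {12: "winter", 1: "winter", 2: "winter",
               3: "spring", 4: "spring", 5: "spring",
               6: "summer", 7: "summer", 8: "summer",
               9: "fall", 10: "fall", 11: "fall"}
    return {
        "month": month,
        "season": seasons[month],
        # Rough flag for the November-December holiday shopping rush.
        "is_holiday_season": month in (11, 12),
        "is_summer": seasons[month] == "summer",
    }

print(seasonal_features(date(2023, 12, 25)))
# {'month': 12, 'season': 'winter', 'is_holiday_season': True, 'is_summer': False}
```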
Create Your Predictive Models
At some point, you’ll have to come up with predictive models to support your hypotheses. For example, you’ll have to write code to predict sales. You may explore whether an after-Christmas sale increases profits and, if so, by how much. You may find that a certain percentage of sales earns more money than other sales, given the volume and overall profit.
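A predictive model does not have to start out complicated. As an illustrative baseline (with toy numbers, not real sales data), a closed-form least-squares line can extrapolate a sales trend:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Toy example: sales grow by 10 units a week.
weeks, sales = [1, 2, 3, 4], [10, 20, 30, 40]
a, b = fit_line(weeks, sales)
print(a + b * 5)  # predicted sales for week 5: 50.0
```

Once a simple baseline like this exists, you can judge whether a fancier model actually earns its added complexity.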
Communicate Your Results
In the real world, all the analysis and technical results you come up with are of little value unless you can explain to your stakeholders what they mean in a comprehensible and compelling way. Data storytelling is a critical and underrated skill that you must develop. To finish your project, you’ll want to create a data visualization or a presentation that explains your results to non-technical folks.
How Do You Measure the Success of Data Science Projects?
As a learner, the most critical measure of success is that you have put your skills and knowledge into practice. Good data science projects not only show that you can solve problems but also show a potential employer how you approach problem-solving. As long as you can add your project to your portfolio, consider it successful.
How Can You Find Interesting Data Science Projects To Try?
This blog post should get you started on various projects you could take up. Online courses like the Springboard Data Science Bootcamp include real-world projects that amplify your portfolio. You can contribute to open-source projects. You can also participate in competitions on platforms like Kaggle and Driven Data to improve your model-building skills.
How Can You Showcase Your Data Science Projects?
You can:
- Include them in your resume
- Link them to your LinkedIn profile
- Maintain an active GitHub account
- Create your own portfolio website
- Write case studies of your projects and publish them on a blog or Medium
Solving Problems with Data Science

Aakash Tandel, Former Data Scientist
Article Categories: #Strategy, #Data & Analytics
Posted on December 3, 2018
There is a systematic approach to solving data science problems and it begins with asking the right questions. This article covers some of the many questions we ask when solving data science problems at Viget.
A challenge that I’ve been wrestling with is the lack of a widely adopted framework or systematic approach to solving data science problems. In our analytics work at Viget, we use a framework inspired by Avinash Kaushik’s Digital Marketing and Measurement Model. We use this framework on almost every project we undertake at Viget. I believe data science could use a similar framework that organizes and structures the data science process.
As a start, I want to share the questions we like to ask when solving a data science problem. Even though some of the questions are not specific to the data science domain, they help us efficiently and effectively solve problems with data science.
Business Problem
What is the problem we are trying to solve?
That’s the most logical first step to solving any question, right? We have to be able to articulate exactly what the issue is. Start by writing down the problem without going into the specifics, such as how the data is structured or which algorithm we think could effectively solve the problem.
Then try explaining the problem to your niece or nephew, who is a freshman in high school. It is easier than explaining the problem to a third-grader, but you still can’t dive into statistical uncertainty or convolutional versus recurrent neural networks. The act of explaining the problem at a high school stats and computer science level makes your problem, and the solution, accessible to everyone within your or your client’s organization, from the junior data scientists to the Chief Legal Officer.
Clearly defining our business problem showcases how data science is used to solve real-world problems. This high-level thinking provides us with a foundation for solving the problem. Here are a few other framing questions we should think about when defining the business problem.
- Who are the stakeholders for this project?
- Have we solved similar problems before?
- Has someone else documented solutions to similar problems?
- Can we reframe the problem in any way?
And don’t be fooled by these deceptively simple questions. Sometimes more generalized questions can be very difficult to answer. But we believe answering these framing questions is the first, and possibly most important, step in the process, because it makes the rest of the effort actionable.
Say we work at a video game company — let’s call the company Rocinante. Our business is built on customers subscribing to our massive online multiplayer game. Users are billed monthly. We have data about users who have cancelled their subscription and those who have continued to renew month after month. Our management team wants us to analyze our customer data.
Well, as a company, Rocinante wants to be able to predict whether or not customers will cancel their subscription. We want to be able to predict which customers will churn, in order to address the core reasons why customers unsubscribe. Additionally, we need a plan to target specific customers with more proactive retention strategies.
Churn is the turnover of customers, also referred to as customer death. In a contractual setting - such as when a user signs a contract to join a gym - a customer “dies” when they cancel their gym membership. In a non-contractual setting, customer death is not observed and is more difficult to model. For example, Amazon does not know when you have decided to never-again purchase Adidas. Your customer death as an Amazon or Adidas customer is implied.

Possible Solutions
What are the approaches we can use to solve this problem?
There are many instances when we shouldn’t be using machine learning to solve a problem. Remember, data science is one of many tools in the toolbox. There could be a simpler, and maybe cheaper, solution out there. Maybe we could answer a question by looking at descriptive statistics around web analytics data from Google Analytics. Maybe we could solve the problem with user interviews and hear what the users think in their own words. This question aims to determine whether spinning up EC2 instances on Amazon Web Services is worth it. If the answer to “Is there a simple solution?” is “No,” then we can ask, “Can we use data science to solve this problem?” This yes-or-no question brings about two follow-up questions:
- “Is the data available to solve this problem?” A data scientist without data is not a very helpful individual. Many of the data science techniques highlighted in the media today — such as deep learning with artificial neural networks — require a massive amount of data. A hundred data points is unlikely to provide enough data to train and test a model. If the answer to this question is no, then we can consider acquiring more data and pipelining that data to warehouses, where it can be accessed at a later date.
- “Who are the team members we need in order to solve this problem?” Your initial answer to this question will be, “The data scientist, of course!” The vast majority of the problems we face at Viget can’t or shouldn’t be solved by a lone data scientist, because we are solving business problems. Our data scientists team up with UXers, designers, developers, project managers, and hardware developers to develop digital strategies, and solving data science problems is one part of that strategy. Siloing your problem and siloing your data scientists isn’t helpful for anyone.
We want to predict when a customer will unsubscribe from Rocinante’s flagship game. One simple approach to solving this problem would be to take the average customer life - how long a gamer remains subscribed - and predict that all customers will churn after X amount of time. Say our data showed that on average customers churned after 72 months of subscription. Then we could predict a new customer would churn after 72 months of subscription. We test out this hypothesis on new data and learn that it is wildly inaccurate. The average customer lifetime for our previous data was 72 months, but our new batch of data had an average customer lifetime of 2 months. Users in the second batch of data churned much faster than those in the first batch. Our prediction of 72 months didn’t generalize well. Let’s try a more sophisticated approach using data science.
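That naive baseline takes only a few lines to sketch. The numbers below are the hypothetical cohorts from the example, not real Rocinante data:

```python
from statistics import mean

# Hypothetical cohorts echoing the example above.
first_batch  = [60, 72, 84]   # old data: mean lifetime of 72 months
second_batch = [1, 2, 3]      # new data: mean lifetime of 2 months

# Naive baseline: predict every customer churns at the historical average.
prediction = mean(first_batch)                                 # 72
# How far off that prediction lands on the new cohort, on average:
error = mean(abs(prediction - life) for life in second_batch)  # 70
print(prediction, error)
```

A mean absolute error of 70 months on a cohort that actually churns in 2 is exactly the kind of failure to generalize the example describes.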
- Is the data available to solve this problem? The dataset contains 12,043 rows of data and 49 features. We determine that this sample of data is large enough for our use-case. We don’t need to deploy Rocinante’s data engineering team for this project.
- Who are the team members we need in order to solve this problem? Let’s talk with the Rocinante’s data engineering team to learn more about their data collection process. We could learn about biases in the data from the data collectors themselves. Let’s also chat with the customer retention and acquisitions team and hear about their tactics to reduce churn. Our job is to analyze data that will ultimately impact their work. Our project team will consist of the data scientist to lead the analysis, a project manager to keep the project team on task, and a UX designer to help facilitate research efforts we plan to conduct before and after the data analysis.

How do we know if we have successfully solved the problem?
At Viget, we aim to be data-informed, which means we aren’t blindly driven by our data, but we are still focused on quantifiable measures of success. Our data science problems are held to the same standard. What are the ways in which this problem could be a success? What are the ways in which this problem could be a complete and utter failure? We often have specific success metrics and Key Performance Indicators (KPIs) that help us answer these questions.
Our UX coworker has interviewed some of the other stakeholders at Rocinante and some of the gamers who play our game. Our team believes if our analysis is inconclusive, and we continue the status quo, the project would be a failure. The project would be a success if we are able to predict a churn risk score for each subscriber. A churn risk score, coupled with our monthly churn rate (the rate at which customers leave the subscription service per month), will be useful information. The customer acquisition team will have a better idea of how many new users they need to acquire in order to keep the number of customers the same, and how many new users they need in order to grow the customer base.

Data Science-ing
What do we need to learn about the data, and what analysis do we need to conduct?
At the heart of solving a data science problem are hundreds of questions. I attempted to ask these and similar questions last year in a blog post, Data Science Workflow. Below are some of the most crucial — they’re not the only questions you could face when solving a data science problem, but they are ones that our team at Viget thinks about on nearly every data problem.
- What do we need to learn about the data?
- What type of exploratory data analysis do we need to conduct?
- Where is our data coming from?
- What is the current state of our data?
- Is this a supervised or unsupervised learning problem?
- Is this a regression, classification, or clustering problem?
- What biases could our data contain?
- What type of data cleaning do we need to do?
- What type of feature engineering could be useful?
- What algorithms or types of models have been proven to solve similar problems well?
- What evaluation metric are we using for our model?
- What is our training and testing plan?
- How can we tweak the model to make it more accurate, increase the ROC/AUC, decrease log-loss, etc. ?
- Have we optimized the various parameters of the algorithm? Try grid search here.
- Is this ethical?
That last question raises the conversation about ethics in data science. Unfortunately, there is no Hippocratic Oath for data scientists, but that doesn’t excuse the data science industry from acting unethically. We should apply ethical considerations to our standard data science workflow. Additionally, ethics in data science as a topic deserves more than a paragraph in this article — but I wanted to highlight that we should be cognizant of, and practice only, ethical data science.
Let’s get started with the analysis. It’s time to answer the data science questions. Because this is an example, the answers to these data science questions are entirely hypothetical.
- We need to learn more about the time series nature of our data, as well as the format.
- We should look into average customer lifetime durations and summary statistics around some of the features we believe could be important.
- Our data came from login data and customer data, compiled by Rocinante’s data engineering team.
- The data needs to be cleaned, but it is conveniently in a PostgreSQL database.
- This is a supervised learning problem because we know which customers have churned.
- This is a binary classification problem.
- After conducting exploratory data analysis and speaking with the data engineering team, we do not see any biases in the data.
- We need to reformat some of the data and use missing data imputation for features we believe are important but have some missing data points.
- With 49 good features, we don’t believe we need to do any feature engineering.
- We have used random forests, XGBoost, and standard logistic regressions to solve classification problems.
- We will use ROC-AUC score as our evaluation metric.
- We are going to use a training-test split (80% training, 20% test) to evaluate our model.
- Let’s remove features that are statistically insignificant from our model to improve the ROC-AUC score.
- Let’s optimize the parameters within our random forests model to improve the ROC-AUC score.
- Our team believes we are acting ethically.
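The evaluation metric in that list can be made concrete. Below is a minimal, from-scratch version of the ROC-AUC score, using its Mann-Whitney formulation: the probability that a randomly chosen positive example is ranked above a randomly chosen negative one. This is fine for a sketch applied to the held-out 20% test split; in practice a library implementation would do this far more efficiently:

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count for half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:          # O(n^2) pairwise comparison: fine for a sketch
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# A model that ranks every churner above every non-churner scores 1.0:
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.4, 0.3]))  # 1.0
# Random scores hover around 0.5; fully reversed rankings score 0.0.
```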
This process may look deceivingly linear, but data science is often a nonlinear practice. After doing all of the work in our example above, we could still end up with a model that doesn’t generalize well. It could be bad at predicting churn in new customers. Maybe we shouldn’t have assumed this problem was a binary classification problem and instead used survival regression to solve the problem. This part of the project will be filled with experimentation, and that’s totally normal.

Communication
What is the best way to communicate and circulate our results?
Our job is typically to bring our findings to the client, explain how the process was a success or failure, and explain why. Communicating technical details and explaining them to non-technical audiences is important because not all of our clients have degrees in statistics. There are four ways in which communication of technical details can be advantageous:
- It can be used to inspire confidence that the work is thorough and multiple options have been considered.
- It can highlight technical considerations or caveats that stakeholders and decision-makers should be aware of.
- It can offer resources to learn more about specific techniques applied.
- It can provide supplemental materials to allow the findings to be replicated where possible.
We often use blog posts and articles to circulate our work. They help spread our knowledge and the lessons we learned while working on a project to peers. I encourage every data scientist to engage with the data science community by attending and speaking at meetups and conferences, publishing their work online, and extending a helping hand to other curious data scientists and analysts.
Our method of binary classification was in fact incorrect, so we ended up using survival regression to determine there are four features that impact churn: gaming platform, geographical region, days since last update, and season. Our team aggregates all of our findings into one report, detailing the specific techniques we used, caveats about the analysis, and the multiple recommendations from our team to the customer retention and acquisition team. This report is full of the nitty-gritty details that the more technical folks, such as the data engineering team, may appreciate. Our team also creates a slide deck for the less-technical audience. This deck glosses over many of the technical details of the project and focuses on recommendations for the customer retention and acquisition team.
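The pivot to survival analysis deserves a small illustration. A Kaplan-Meier curve, the standard first look at time-to-churn data and a simpler cousin of the survival regression mentioned above, estimates the fraction of subscribers still active at each month. The cohort below is a toy example, and in a real project a library such as lifelines would handle this:

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier estimate: S(t) = product over event times t_i <= t of
    (1 - d_i / n_i), where d_i customers churn at t_i out of n_i at risk.

    churned[i] is False when customer i is censored (still subscribed
    when the data was collected)."""
    event_times = sorted({t for t, c in zip(durations, churned) if c})
    surv, curve = 1.0, {}
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, churned) if c and d == t)
        surv *= 1.0 - events / at_risk
        curve[t] = surv
    return curve

# Toy cohort: churn at months 1, 2 and 3; one customer still active at 3.
print(kaplan_meier([1, 2, 3, 3], [True, True, True, False]))
```

The censored customer is the key detail: they count toward the at-risk pool without counting as a churn event, which is precisely what a plain binary classifier gets wrong.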
We give a talk at a local data science meetup, going over the trials, tribulations, and triumphs of the project and sharing them with the data science community at large.

Why are we doing all of this?
I ask myself this question daily — and not in the metaphysical sense, but in the value-driven sense. Is there value in the work we have done and in the end result? I hope the answer is yes. But, let’s be honest, this is business. We don’t have three years to put together a PhD thesis-like paper. We have to move quickly and cost-effectively. Critically evaluating the value ultimately created will help you refine your approach to the next project. And, if you didn’t produce the value you’d originally hoped, then at the very least, I hope you were able to learn something and sharpen your data science skills.
Rocinante has a better idea of how long our users will remain active on the platform based on user characteristics, and can now launch preemptive strikes in order to retain those users who look like they are about to churn. Our team eventually develops a system that alerts the customer retention and acquisition team when a user may be about to churn, and they know to reach out to that user, via email, encouraging them to try out a new feature we recently launched. Rocinante is making better data-informed decisions based on this work, and that’s great!
I hope this article will help guide your next data science project and get the wheels turning in your own mind. Maybe you will be the creator of a data science framework the world adopts! Let me know what you think about the questions, or whether I’m missing anything, in the comments below.
More From Forbes
How Data Science Will Help Solve Many of the World’s Most Pressing Challenges
Climate activists block Whitehall in central London. Reducing air pollution in the UK capital is one of the tasks to have been tackled via data analysis. (Photo: Amer Ghazzal/Barcroft Media via Getty Images)
An NGO providing free legal advice to underprivileged communities in an African country is swamped with requests: they simply do not have the capacity to directly respond to every question through their volunteer legal network. How can they successfully maximize the help they are able to give?
Machines. Or rather, machine learning. By learning from past questions and answers, modern software can automate even something as seemingly complex as legal advice – increasing the productivity and reach of the organization many times over.
It’s just one example of how 21st-century technologies have the potential to tackle pressing issues in less privileged environments. But it’s a potential that remains largely unrealized.
Sustainable Development Goals
In 2015, the United Nations set out a plan to tackle some of the world’s most pressing global challenges by the year 2030. It identified 17 individual issues that are impacting the global community and environment – labeling them its Sustainable Development Goals (SDGs). The 17 SDGs covered a wide range of areas including reversing the impacts of climate change – arguably the most pressing global issue of our time, threatening as it does the lives and livelihoods of billions of people worldwide.
Governments are tasked with much of the work of meeting the UN’s SDGs through implementing relevant and effective policies, but many argue they are not doing enough, and decisions such as that of President Trump to pull the US out of the Paris Climate Agreement are very concerning.
Though of critical importance, governments are not the only entities that can play a significant role in tackling these issues. Businesses, academics and researchers can also play their part through exploring new, efficient and innovative methods.
The role of data science
21st-century technologies – and data analysis tools in particular – have the biggest potential to effectively tackle the global issues identified by the UN. Not only do we have the largest amount of data ever available to us, we also have a much greater capacity to capture, analyze and utilize it to create products and services that tackle fundamental human issues.
Collecting strong data sets on a specific social, health or environmental issue will allow academics and researchers to truly understand the severity and impact of a particular issue. Collectively, academics, businesses, NGOs and governments can then mobilize their leadership, and entrepreneurial and innovative skills to create products and services that tackle the problems they identify – using the data sets to ensure the solutions are grounded in evidence.
This is something we have already been investing in at Imperial College Business School through our Gandhi Centre for Inclusive Innovation and hosting the Data Science for Social Good summer fellowship. This fellowship, run in collaboration with the Data Science for Social Good initiative based in the University of Chicago, is the first of its kind in the UK, and looks to provide organizations and non-profits with talent, capabilities and a focused effort to address critical, real-world problems that have the potential for high social impact.
Fellows from this summer's Data Science for Social Good Fellowship, hosted by Imperial College Business School. (Photo: Imperial College Business School)
Reducing air pollution in London
One of the projects developed during this year’s Data Science for Social Good programme uses data science to help tackle air pollution in London. This is a major undertaking, with the Mayor of London, Sadiq Khan, recently announcing £6m of funding to tackle air pollution in the capital . One critical dimension is understanding how traffic disruption and policies affect congestion and, in turn, how this affects vehicle emissions and air quality. Road transport represents around half of London’s air pollution, and congestion is the key driver of acute pollution hotspots.
Currently, traffic statistics are obtained by individuals standing next to the road and counting vehicles, which is costly and time-consuming. The statistics are reviewed as annual averages, but they are not detailed enough to evaluate traffic or air pollution initiatives, routinely underestimate emissions from vehicles, and cannot account for daily or seasonal variations. This underestimation is predicted to be as large as 30%, primarily because we do not have accurate junction-level data in real time.
This Imperial project analysed live traffic in London via video data provided by over 900 Transport for London jam cameras across Greater London. The algorithm generated an accurate count of unique vehicles by type (everything from a bike to a truck) in near real time. More importantly, it captured the number of stop-start events for each vehicle (the main reason for underestimation of air pollution).
This method has the benefit of generating improved estimates of air pollution in London via accurate air quality models, and makes possible the planning of “green” routes, the designing of accurate emission zones, and optimizing red-lights/roundabouts at appropriate junctions. The work will be open for others to build on and enhance, to better assess this critical issue. Policymakers will be able to utilize this data to make London air quality healthier.
Climate change and poverty
This is just one example of how data science can help to identify issues in more detail, and allow organizations and entrepreneurs to create products and services to tackle climate action and pollution. Data science could also be implemented to tackle challenges related to other key global issues.
For instance, other members of the fellowship programme used data science to develop better ambulance routes, to ensure the most vulnerable people get medical assistance as quickly as possible. Others looked at providing personalized interventions and job recommendations to the long-term unemployed, taking into account contextual information about the individuals’ desires and restrictions, as well as their socioeconomic context.
Data science is, so far, a fairly unexplored method of tackling the world’s most pressing issues. More effective collation and analysis of data, as well as strong leadership to create transformative products and services, could be the most viable and effective way of solving such extreme challenges as climate change, air pollution and poverty. This effort to use innovative technologies for social good is something Imperial continues to explore, and other academic institutions must do the same if they want to have any chance of solving the world’s biggest problems.
This article was written by Francisco Veloso , the Dean of Imperial College Business School . His research focuses on high tech innovation and entrepreneurship. He has several dozen publications in leading academic journals and has won several awards for his contributions. He regularly contributes as a consultant and advisor to a range of start-ups, established firms, universities and government around the world. He was also a member of the Research, Innovation and Science Experts High-Level Advisory Body to European Commissioner Carlos Moedas.

- Editorial Standards
- Reprints & Permissions
- July 6, 2020
- BI and Analytics
- By Shanawaz Sheriff
8 Major Challenges Faced By Data Scientists
Organizations across the globe are looking to organize, process and unlock the value of the torrential amounts of data they generate, and to transform it into actionable, high-value business insights. Hence, hiring data scientists – highly skilled data science professionals – has become critical. Today, there is virtually no business function that cannot benefit from them. In fact, the Harvard Business Review has labeled data science the “sexiest” career of the 21st century.
However, no career is without its own challenges, and being a data scientist, despite its “sexiness,” is no exception. According to the Financial Times, many organizations are failing to make the best use of their data scientists by being unable to provide them with the necessary raw materials to drive results. In fact, according to a Stack Overflow survey, 13.2% of data scientists are looking to jump ship in search of greener pastures – second only to machine learning specialists. Having helped several data scientists solve their data problems, we share some of their common challenges and how they can overcome them.
Challenges faced by Data Scientists
1) Data Preparation
Data scientists spend nearly 80% of their time cleaning and preparing data to improve its quality – i.e., to make it accurate and consistent – before utilizing it for analysis. However, 57% of them consider it the worst part of their job, labeling it as time-consuming and highly mundane. They are required to go through terabytes of data, across multiple formats, sources, functions, and platforms, on a day-to-day basis, whilst keeping a log of their activities to prevent duplication.
One way to solve this challenge is by adopting emerging AI-enabled data science technologies like Augmented Analytics and Auto feature engineering. Augmented Analytics automates manual data cleansing and preparation tasks and enables data scientists to be more productive.
Learn More: Augmented Analytics – Everything You Need To Know
2) Multiple Data Sources
As organizations continue to utilize different types of apps and tools and generate different formats of data, there will be more data sources that the data scientists need to access to produce meaningful decisions. This process requires manual entry of data and time-consuming data searching, which leads to errors and repetitions, and eventually, poor decisions.
Organizations need a centralized platform integrated with multiple data sources to instantly access information from all of them. Data in this centralized platform can be aggregated and controlled effectively, in real time, improving its utilization and saving data scientists huge amounts of time and effort.
3) Data Security
As organizations transition into cloud data management, cyberattacks have become increasingly common. This has caused two major problems –
- Confidential data becoming vulnerable
- As a response to repeated cyberattacks, regulatory standards have evolved, extending the data consent and utilization processes and adding to the frustration of data scientists.
Organizations should utilize advanced machine learning enabled security platforms and instill additional security checks to safeguard their data. At the same time, they must ensure strict adherence to the data protection norms to avoid time-consuming audits and expensive fines.
4) Understanding The Business Problem
Before performing data analysis and building solutions, data scientists must first thoroughly understand the business problem. Many data scientists, however, follow a mechanical approach, getting started with analyzing data sets without clearly defining the business problem and objective.
Therefore, data scientists must follow a proper workflow before starting any analysis. The workflow must be built after collaborating with the business stakeholders and consist of well-defined checklists to improve understanding and problem identification.
5) Effective Communication With Non-Technical Stakeholders
Data scientists must be able to communicate effectively with business executives who may not understand the complexities and technical jargon of their work. If the executive, stakeholder, or client cannot understand their models, their solutions will most likely not be executed.
This is something that data scientists can practice. They can adopt concepts like “data storytelling” to give a structured approach to their communication and a powerful narrative to their analysis and visualizations.
Learn More: Use Data and Analytics to Tell a Story
6) Collaboration with Data Engineers
Organizations usually have data scientists and data engineers working on the same projects. This means there must be effective communication between them to ensure the best output. However, the two usually have different priorities and workflows, which causes misunderstandings and stifles knowledge sharing.
Management should take active steps to enhance collaboration between data scientists and data engineers. It can foster open communication by establishing a common coding language and a real-time collaboration tool. Appointing a Chief Data Officer to oversee both departments has also proven to improve collaboration between the two teams.
7) Misconceptions about the role
In big organizations, a data scientist is expected to be a jack of all trades: cleaning data, retrieving data, building models, and conducting analysis. However, this is a big ask for any data scientist. For a data science team to function effectively, tasks should be distributed among individuals who specialize in data visualization, data preparation, model building, and so on.
It is critical for data scientists to have a clear understanding of their roles and responsibilities before they start working with any organization.
8) Undefined KPIs and metrics
A lack of understanding of data science among management teams leads to unrealistic expectations of data scientists, which affects their performance. Data scientists are expected to produce a silver bullet that solves every business problem, which is counterproductive.
Therefore, every business should have:
- Well-defined metrics to measure the accuracy of analysis generated by the data scientists
- Proper business KPIs to analyze the business impact generated by the analysis
Despite all these challenges, data scientists are among the most in-demand professionals in the market. With the data world changing at a rapid pace, being a successful data scientist is not just about having the right technical skills; it is also about having a clear understanding of the business requirements, collaborating with different stakeholders, and convincing business executives to act on the analysis provided.
If you're a data scientist facing any of these challenges and would like to learn more about overcoming them, feel free to get in touch with one of our data science and business intelligence experts for a personalized consultation. You might also be interested in exploring how we're helping data scientists across the world with our BI and analytics solutions.
Shanawaz is leading the Data and Analytics practice in Acuvate. He has been doing IT consulting in the data and analytics space for large CPG and BFSI companies for more than a decade. He manages the Data & AI services portfolio and ensures the technical deliverables are top-notch.

Analytics Vidhya

Feb 21, 2021
Data Science Case Studies: Solved and Explained
Data science case studies solved and explained using Python.
Solving a data science case study means analyzing a problem statement intensively and working it through to a solution. Solving case studies will help you showcase unique and compelling data science use cases in your portfolio. In this article, I'm going to introduce you to three data science case studies solved and explained using Python.
Data Science Case Studies
If you've learned data science by taking a course or certification program, you're still not that close to finding a job easily. The most important part of your data science interview is showing how you can use your skills in real use cases. Below are three data science case studies that will help you understand how to analyze and solve a problem. All of them are solved and explained using Python.
Case Study 1: Text Emotions Detection
If you're interested in natural language processing, this use case is for you. The idea is to train a machine learning model to generate emojis based on input text. This model can then be used when training artificially intelligent chatbots.
Use Case: Humans express emotions in many forms, such as facial expressions, gestures, speech, and text. Detecting text emotions is a content-based classification problem. Detecting a person's emotions is a difficult task in general, and detecting them from written text alone is even harder, precisely because emotions can be expressed in so many forms.
Recognizing emotions in text plays an important role in applications such as chatbots, customer support forums, and customer reviews. Your task is to train a machine learning model that identifies the emotion of a text by presenting the most relevant emoji for the input text.
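At its simplest, the classification idea can be sketched with a tiny keyword lexicon; the lexicon, emoji mapping, and example texts below are made up for illustration, and a real solution would train a proper model on labeled data:

```python
# Minimal sketch of text-emotion detection via a keyword lexicon.
# The lexicon and emoji mapping are illustrative assumptions, not a trained model.
LEXICON = {
    "joy": {"happy", "great", "love", "wonderful"},
    "anger": {"hate", "angry", "terrible", "awful"},
}
EMOJI = {"joy": "😊", "anger": "😠"}

def detect_emotion(text):
    """Return the emoji whose emotion keywords best match the text."""
    tokens = text.lower().split()
    scores = {emotion: sum(token in words for token in tokens)
              for emotion, words in LEXICON.items()}
    return EMOJI[max(scores, key=scores.get)]

print(detect_emotion("i love this happy day"))  # joy wins: two keyword hits
```

A trained classifier replaces the hand-built lexicon with weights learned from labeled examples, but the scoring-and-argmax shape of the prediction step stays the same.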
Solution: Machine Learning Project on Text Emotions Detection .
Case Study 2: Hotel Recommendation System
A hotel recommendation system typically works on collaborative filtering, making recommendations based on ratings given by other customers in the same category as the user looking for a product.
Use Case: We all plan trips, and the first thing to do when planning a trip is to find a hotel. There are many websites recommending the best hotel for a trip. A hotel recommendation system aims to predict which hotel a user is most likely to choose from among all available hotels, helping the user book the best one. We can do this using customer reviews.
For example, if you are planning a business trip, the recommendation system should show you the hotels that other customers have rated best for business travel. Our approach, therefore, is to build a recommendation system based on customer reviews and ratings: use the ratings and reviews given by customers who belong to the same category as the user to build the hotel recommendation system.
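A minimal sketch of that category-based idea, with hypothetical hotels and ratings, averages each hotel's ratings from reviewers in the user's category and recommends the highest:

```python
# Hypothetical review data: (hotel, traveler_category, rating out of 5).
REVIEWS = [
    ("Grand Plaza", "business", 4.5),
    ("Grand Plaza", "family", 3.0),
    ("Airport Inn", "business", 4.8),
    ("Beach Resort", "family", 4.9),
    ("Airport Inn", "business", 4.6),
]

def recommend_hotel(category, reviews):
    """Recommend the hotel rated highest by reviewers in the same category."""
    totals = {}
    for hotel, cat, rating in reviews:
        if cat == category:
            total, count = totals.get(hotel, (0.0, 0))
            totals[hotel] = (total + rating, count + 1)
    # Highest average rating within the category wins.
    return max(totals, key=lambda h: totals[h][0] / totals[h][1])

print(recommend_hotel("business", REVIEWS))  # Airport Inn averages 4.7
```

Full collaborative filtering also weights reviewers by their similarity to the user, but this category-average baseline captures the core recommendation step.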
Solution: Data Science Project on Hotel Recommendation System .
Case Study 3: Customer Personality Analysis
Customer analysis is one of the most important tasks for a data scientist working at a product-based company. So if you want to join a product-based company, this data science case study is for you.
Use Case: Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business to better understand its customers and makes it easier for them to modify products according to the specific needs, behaviours and concerns of different types of customers.
You have to perform an analysis that helps the business tailor its product to its target customers across different customer segments. For example, instead of spending money marketing a new product to every customer in its database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that segment.
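The targeting logic in that example can be sketched in a few lines; the segment names and campaign numbers below are hypothetical:

```python
# Hypothetical past-campaign results per customer segment:
# segment -> (buyers, customers contacted).
CAMPAIGN_HISTORY = {
    "students": (12, 400),
    "professionals": (90, 500),
    "retirees": (20, 300),
}

def best_segment(history):
    """Pick the segment with the highest conversion rate to market to first."""
    return max(history, key=lambda seg: history[seg][0] / history[seg][1])

print(best_segment(CAMPAIGN_HISTORY))  # professionals: 90/500 = 18% conversion
```

In a real personality analysis the segments themselves would come from clustering customer attributes, but the decision step of ranking segments by expected response stays the same.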
Solution: Data Science Project on Customer Personality Analysis .
So these three data science case studies are based on real-world problems. The first, text emotions detection, is based entirely on natural language processing, and the machine learning model you train could be used to power an AI chatbot. The second, the hotel recommendation system, also involves NLP, but here you will learn how to generate recommendations using collaborative filtering. The last, customer personality analysis, is for those who want to focus on the analysis side.
All these data science case studies are solved using Python, here are the resources where you will find these use cases solved and explained:
- Text Emotions Detection
- Hotel Recommendation System
- Customer Personality Analysis
I hope you liked this article on data science case studies solved and explained using the Python programming language. Feel free to ask your valuable questions in the comments section below.
Aman Kharwal
I write stories behind the data📈 | instagram.com/amankharwal.official/

How Data Science Solves Real Business Problems
- Automating the placement of a digital advertisement
- Use of data science and advanced analytics to revamp the search function
- Using data science for generating data-driven crime predictions
- Using data science to detect tax evasion
Statistics and data analysis have long leveraged the power of data to explain the current situation in a business and predict certain outcomes. With data science, this goes further. Data science solves real business problems by utilising data to construct algorithms and create programs that help provide optimal solutions to individual problems.
Data science solves real business problems using hybrid mathematical and computer science models to derive actionable insights. It takes the risk of venturing into the uncharted territory of 'unstructured' data and extracting meaningful insights that help businesses make better decisions.
Let’s talk about how data science solves real business problems. We will take examples of a few companies and some concepts that are used in data science to solve real business problems.
Great Learning offers the best data science courses and postgraduate programs that you can choose from. Learn from industry experts through online mentorship sessions and dedicated career support.
Let's start with Sovrn. This is a company whose vision speaks for itself: it works for both publishers and advertisers. If you are an advertiser, it will connect you to a passionate audience through its clean, certified exchange.
But what does it do?
Well, it brokers deals between advertisers and channels like ESPN, Encyclopedia, Bustle, and StarTribune. As these deals happen numerous times a day, Sovrn has access to a lot of data for insights. It uses this data to automate digital ad placement. Its interface is compatible with both Google's and Amazon's server-to-server bidding platforms and can monetize inventory by sending targeted campaigns to particular sets of customers.
Airbnb is the prime example of a technology company that has leveraged the power of data science to solve real business problems. It gets a million users each day searching for top-rated vacation rentals. Beyond that, it has data on customers, hosts, locations, and demand for rentals. Airbnb realised the importance of this data and created a dynamic pricing system called Aerosolve.
Released as an open-source resource, Aerosolve's predictive model considers a variety of attributes, such as a rental's location and the time of year it is most often booked, to estimate an optimal price. It then uses these insights to help Airbnb hosts set their prices and maximise their returns.
Data science solves real business problems not just at corporations and tech companies; US government agencies use it in multiple ways as well. For instance, the Northpointe software suite is widely used by the American judicial system and law enforcement. Designed by Equivant, an Ohio-based company, it attempts to predict an offender's risk of reoffending using data-driven algorithms. The algorithms assess risk on the basis of a questionnaire that asks about the offender's employment status, education, and so on.
The Internal Revenue Service in the US government has used data science to create evolved fraud-detection protocols for the digital age. Tax evasion costs the US government billions of dollars a year, which has been one of the main reasons the IRS has stepped up its game. It has improved efficiency by creating multidimensional taxpayer profiles, digging deep into the data citizens provide through multiple avenues, such as social media data, email analysis, and electronic payment records. Based on these profiles, the agency predicts individual tax returns; those whose predicted and actual returns don't match get picked out for auditing.
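The final matching step can be sketched as a simple deviation check; the taxpayer IDs, dollar amounts, and 10% tolerance below are hypothetical, and the IRS's actual criteria are far more sophisticated:

```python
# Hypothetical (predicted_return, filed_return) amounts in dollars per taxpayer.
RETURNS = {
    "taxpayer_a": (52_000, 51_500),
    "taxpayer_b": (80_000, 60_000),
    "taxpayer_c": (30_000, 30_200),
}

def flag_for_audit(returns, tolerance=0.10):
    """Flag taxpayers whose filed return deviates from the prediction by more than the tolerance."""
    return [tid for tid, (predicted, filed) in returns.items()
            if abs(predicted - filed) / predicted > tolerance]

print(flag_for_audit(RETURNS))  # only taxpayer_b deviates by more than 10%
```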
In this article, we aimed to cover some of the ways in which data science solves business problems. There are a lot more areas in which this can be applied. So by no means is this list exhaustive.
But I am sure that after reading this article, you will have realized the massive growth of data science in the US and around the world. You can read more about data science in the US in this ebook.
Why not invest some time learning more about data science with one of the top data science courses in US – The Post Graduate Program in Data Science & Business Analytics offered by the McCombs School of Business at The University of Texas at Austin.
Want to know what it is all about?
Sign up and get access to a Free Demo to know more about this data science program from one of the top universities in the world.
Who knows, a few months later you might just solve a real business problem with your data science skill-set!
Framing Data Science Problems the Right Way From the Start
Data science project failure can often be attributed to poor problem definition, but early intervention can prevent it.

The failure rate of data science initiatives — often estimated at over 80% — is way too high. We have spent years researching the reasons contributing to companies’ low success rates and have identified one underappreciated issue: Too often, teams skip right to analyzing the data before agreeing on the problem to be solved. This lack of initial understanding guarantees that many projects are doomed to fail from the very beginning.
Of course, this issue is not a new one. Albert Einstein is often quoted as having said , “If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.”
Consider how often data scientists need to “clean up the data” on data science projects, often as quickly and cheaply as possible. This may seem reasonable, but it ignores the critical “why” question: Why is there bad data in the first place? Where did it come from? Does it represent blunders, or are there legitimate data points that are just surprising? Will they occur in the future? How does the bad data impact this particular project and the business? In many cases, we find that a better problem statement is to find and eliminate the root causes of bad data .
Too often, we see examples where people either assume that they understand the problem and rush to define it, or they don’t build the consensus needed to actually solve it. We argue that a key to successful data science projects is to recognize the importance of clearly defining the problem and adhere to proven principles in so doing. This problem is not relegated to technology teams; we find that many business, political, management, and media projects, at all levels, also suffer from poor problem definition.
Toward Better Problem Definition
Data science uses the scientific method to solve often complex (or multifaceted) and unstructured problems using data and analytics. In analytics, the term fishing expedition refers to a project that was never framed correctly to begin with and involves trolling the data for unexpected correlations. This type of data fishing does not meet the spirit of effective data science but is prevalent nonetheless. Consequently, defining the problem correctly needs to be step one. We previously proposed an
About the Authors
Roger W. Hoerl ( @rogerhoerl ) teaches statistics at Union College in Schenectady, New York. Previously, he led the applied statistics lab at GE Global Research. Diego Kuonen ( @diegokuonen ) is head of Bern, Switzerland-based Statoo Consulting and a professor of data science at the Geneva School of Economics and Management at the University of Geneva. Thomas C. Redman ( @thedatadoc1 ) is president of New Jersey-based consultancy Data Quality Solutions and coauthor of The Real Work of Data Science: Turning Data Into Information, Better Decisions, and Stronger Organizations (Wiley, 2019).
3 Hard Python Coding Interview Questions For Data Science
No mercy today! I have three hard-level Python coding interview questions that require you to be on top of your game in Python and solve business problems.

In today’s article, I’ll focus on Python skills for data science. A data scientist without Python is like a writer without a pen. Or a typewriter. Or a laptop. OK, how about this: A data scientist without Python is like me without an attempt at humor.
You can know Python and not be a data scientist. But the other way around? Let me know if you know someone who made it in data science without Python. In the last 20 years, that is.
To help you practice Python and interviewing skills, I selected three Python coding interview questions. Two are from StrataScratch , and are the type of questions that require using Python to solve a specific business problem. The third question is from LeetCode , and tests how good you are at Python algorithms.
Python Coding Interview Question #1: Math in Python

Take a look at this question by Google.

Link to the question: https://platform.stratascratch.com/coding/10067-google-fit-user-tracking
Your task is to calculate the average distance based on GPS data using two approaches: one that takes the curvature of the Earth into consideration and one that does not.
The question gives you formulas for both approaches. As you can see, this Python coding interview question is math-heavy. Not only do you need to understand this level of mathematics, but you also need to know how to translate it into Python code.
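Since the question's exact formulas are not reproduced here, the sketch below uses the standard spherical law of cosines for the curved case and a flat planar approximation for the other; the Earth radius and the km-per-degree factor are assumptions:

```python
from math import sin, cos, acos, radians, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the prompt may specify another value

def curved_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines."""
    phi1, phi2 = radians(lat1), radians(lat2)
    delta = radians(lon2 - lon1)
    # Clamp to guard against floating-point values slightly above 1.
    central = min(1.0, sin(phi1) * sin(phi2) + cos(phi1) * cos(phi2) * cos(delta))
    return EARTH_RADIUS_KM * acos(central)

def flat_distance_km(lat1, lon1, lat2, lon2):
    """Planar (flat-Earth) approximation: treat degrees as Euclidean coordinates."""
    km_per_degree = 111.2  # rough conversion factor, an assumption
    return km_per_degree * sqrt((lat2 - lat1) ** 2 + (lon2 - lon1) ** 2)

# One degree of longitude along the equator is roughly 111 km either way.
print(curved_distance_km(0, 0, 0, 1), flat_distance_km(0, 0, 0, 1))
```

The difference between the two averages, computed over all sessions, is exactly the final metric the question asks for.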
Not that easy, right?
The first thing you should do is recognize there’s a math Python module that gives you access to the mathematical functions. You’ll use this module a lot in this question.
Let's start by importing necessary libraries and sine, cosine, arccosine, and radian functions. The next step is to merge the available DataFrame with itself on the user ID, session ID, and day of the session. Also, add the suffixes to IDs so you can distinguish between them.
Then find the difference between the two step IDs.
The previous step was necessary so we can exclude all the sessions that have only one step ID in the next step. That's what the question tells us to do. Here's how to do it.
Use the pandas idxmax() function to access the sessions with the biggest difference between the steps.
After we prepared the dataset, now comes the mathematics part. Create a pandas Series and then the for loop. Use the iterrows() method to calculate the distance for each row, i.e., session. This is a distance that takes the Earth's curvature into account, and the code reflects the formula given in the question.
Now, do the same thing but considering the Earth is flat. This is the only occasion being a flat-Earther is beneficial.
Turn the result into a DataFrame and start calculating the output metrics. The first one is the average distance with Earth's curvature. Then the same calculation without the curvature. The final metric is the difference between the two.
The complete code, and its result are given below.
Python Coding Interview Question #2: Graph Theory in Python

It’s a question by Delta Airlines. Let’s take a look at it.

Link to the question: https://platform.stratascratch.com/coding/2008-the-cheapest-airline-connection
This question asks you to find the cheapest airline connection with a maximum of two stops. This sounds awfully familiar, doesn't it? Yes, it's a somewhat modified shortest path problem: instead of minimizing path length, we minimize cost.
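Without pandas, the same idea can be sketched as a Bellman-Ford-style dynamic program that runs one relaxation round per allowed flight leg (at most two stops means at most three legs); the flight list and airport numbering below are hypothetical:

```python
# Hypothetical flights as (origin, destination, cost) tuples.
FLIGHTS = [(0, 1, 100), (1, 2, 100), (0, 2, 500)]

def cheapest_connection(n_airports, flights, src, dst, max_stops=2):
    """Cheapest cost from src to dst using at most max_stops intermediate stops.

    Bellman-Ford variant: one edge-relaxation round per allowed flight leg.
    """
    INF = float("inf")
    cost = [INF] * n_airports
    cost[src] = 0
    for _ in range(max_stops + 1):  # max_stops stops allow max_stops + 1 legs
        updated = cost[:]
        for origin, dest, fare in flights:
            if cost[origin] + fare < updated[dest]:
                updated[dest] = cost[origin] + fare
        cost = updated
    return cost[dst] if cost[dst] < INF else -1

print(cheapest_connection(3, FLIGHTS, 0, 2))               # 0 -> 1 -> 2 for 200
print(cheapest_connection(3, FLIGHTS, 0, 2, max_stops=0))  # direct only: 500
```

Copying `cost` into `updated` each round is what caps the leg count: a fare discovered in round k cannot be extended again until round k+1.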
The solution I’ll show you extensively uses the merge() pandas function. I’ll also use itertools for looping. After importing all the necessary libraries and modules, the first step is to generate all the possible combinations of the origin and destination.
Now, show only combinations where the origin is different from the destination.
Let’s now merge the da_flights with itself. I’ll use the merge() function, and the tables will be joined from the left on the destination and the origin. That way, you get all the direct flights to the first destination and then the connecting flight whose origin is the same as the first flight’s destination.
Then we merge this result with da_flights. That way, we’ll get the third flight. This equals two stops, which is the maximum allowed by the question.
Let’s now tidy the merge result by assigning the logical column names and calculate the cost of the flights with one and two stops. (We already have the costs of the direct flights!). It’s easy! The total cost of the one-stop flight is the first flight plus the second flight. For the two-stop flight, it’s a sum of the costs for all three flights.
I will now merge the DataFrame I created with the given DataFrame. This way, I’ll be assigning the costs of each direct flight.
Next, merge the above result with connections_2 to get the costs for the flights to destinations requiring one stop.
Do the same for the two-stop flights.
The result of this is a table giving you costs from one origin to a destination with direct, one-stop, and two-stop flights. Now you only need to find the lowest cost using the min() method, remove the NA values and show the output.
With these final lines of code, the complete solution is this.
Here’s the code output.
Python Coding Interview Question #3: Binary Tree in Python

Besides graphs, you'll also work with binary trees as a data scientist. That's why it would be useful to know how to solve this Python coding interview question, asked by the likes of DoorDash, Facebook, Microsoft, Amazon, Bloomberg, Apple, and TikTok.

Link to the question: https://leetcode.com/problems/binary-tree-maximum-path-sum/description/
The constraints are:

The first step towards the solution is defining a maxPathSum function. To determine if there is a path from the root down the left or right node, write the recursive function gain_from_subtree.
The recursion starts at the root of a subtree. If a node is empty (no child nodes), the gain from that subtree is 0. Then recurse into the left and the right node. If a path sum is negative, the question asks us not to take it into account; we do that by clamping it to 0.
Then compare the sum of the gains from a subtree with the current maximum path and update it if necessary.
Finally, return the path sum of a subtree, which is a maximum of the root plus the left node and the root plus the right node.
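Putting those steps together, a sketch of the solution looks like this (the minimal `TreeNode` class is included only to make it runnable; LeetCode supplies its own):

```python
class TreeNode:
    # Minimal node definition; LeetCode provides an equivalent class.
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_path_sum(root):
    """Maximum path sum in a binary tree (LeetCode 124)."""
    best = float("-inf")

    def gain_from_subtree(node):
        nonlocal best
        if node is None:
            return 0
        # Negative path sums are clamped to 0, i.e. not taken.
        left = max(gain_from_subtree(node.left), 0)
        right = max(gain_from_subtree(node.right), 0)
        # A path may bend through this node, joining both children.
        best = max(best, node.val + left + right)
        # But only one branch may continue upward to the parent.
        return node.val + max(left, right)

    gain_from_subtree(root)
    return best

# Tree [-10, 9, 20, null, null, 15, 7]: best path is 15 -> 20 -> 7 = 42.
tree = TreeNode(-10, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))
print(max_path_sum(tree))  # 42
```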
These are the outputs for Cases 1 & 2.

This time, I wanted to give you something different. There are plenty of Python concepts you should know as a data scientist. This time I decided to cover three topics I don’t see that often: mathematics, graph data structures, and binary trees.
The three questions I showed you seemed ideal for demonstrating how to translate these concepts into Python code. Check out "Python coding interview questions" to practice more such Python concepts. Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.
Towards Data Science

The Job-Shop Scheduling Problem: Mixed-Integer Programming Models
Mathematical modeling and Python implementation of the classical sequencing problem using Pyomo.
The job-shop scheduling problem (JSSP) is a widely studied optimization problem with several industrial applications. The goal is to define how to minimize the makespan required to allocate shared resources (machines) over time to complete competing activities (jobs). As for other optimization problems, mixed-integer programming can be an effective tool to provide good solutions, although for large-scale instances one should probably resort to heuristics.
Throughout this article, one may find two of the most usual mixed-integer programming formulations for the JSSP, with implementations in Python using the pyomo library (Bynum et al., 2021). Those interested in details can follow along with the complete code available in this git repository.
If you are unfamiliar with mixed-integer programming or optimization in general, you might have a better experience after reading this introduction on the subject.
An introduction to mixed-integer linear programming: The knapsack problem
Learn how to solve optimization problems in python using scipy and pyomo.
towardsdatascience.com
Now let us dive in!
Problem statement
Suppose a set of jobs J needs to be processed in a set of machines M , each in a given order. For instance, job number 1 might need to be processed in machines (1, 4, 3, 2), whereas job number 2 in (2, 3, 4, 1). In this case, before going to machine 4, job 1 must have gone to machine 1. Analogously, before going to machine 1, job 2 must have been processed in machine 4.
Each machine can process only one job at a time. Operations are defined by pairs (machine, job) and each has a specific processing time p . Therefore, the total makespan depends on how one allocates resources to perform tasks.
The figure below illustrates an optimal sequence of operations for a simple instance with 5 machines and 4 jobs. Notice that each machine processes just one job at a time and each job is processed by only one machine at a time.
As for other optimization problems, we must convert these rules into mathematical equations to obtain smart allocations of resources. Therefore, in the following section, let us see two usual formulations for the JSSP.
Mixed-integer models
Following the study of Ku & Beck (2016), two formulations for the JSSP will be presented: the disjunctive model (Manne, 1960) and the time-index model (Bowman, 1959; Kondili, 1993). Those interested might refer to Wagner (1959) for a third formulation (rank-based). The disjunctive model is surely the most efficient of the three for the original problem. However, others might be easier to handle when incorporating new constraints that might occur in real-world problems.
In the disjunctive model, let us consider a set J of jobs and a set M of machines. Each job j must follow a processing order (σʲ₁, σʲ₂, …, σʲₖ) and each operation (m, j) has a processing time p. The decision variables considered are: the time at which job j starts on machine m, xₘⱼ; a binary that marks precedence of job i before job j on machine m, zₘᵢⱼ; and the total makespan, C, which is itself the minimization objective.
We need to create constraints to ensure that:
- The starting time of job j in machine m must be greater than the starting time of the previous operation of job j plus its processing time.
- Each machine processes just one job at a time. To do this, we state that if i precedes j in machine m , the starting time of job j in machine m must be greater than or equal to the starting time of job i plus its processing time.
- Of every pair of jobs i , j one element must precede the other for each machine m in M .
- The total makespan is greater than the starting time of every operation plus its processing time.
And we get the following formulation:
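In standard notation, the four rules above can be sketched as follows (following Manne, 1960; this is a reconstruction consistent with the constraints listed, not the article's original rendering):

```latex
\begin{align*}
\min \quad & C \\
\text{s.t.} \quad
& x_{\sigma^j_h,\, j} \geq x_{\sigma^j_{h-1},\, j} + p_{\sigma^j_{h-1},\, j}
  && \forall \, j \in J, \; h = 2, \dots, |M| \\
& x_{m j} \geq x_{m i} + p_{m i} - V \, (1 - z_{m i j})
  && \forall \, m \in M, \; i, j \in J, \; i \neq j \\
& z_{m i j} + z_{m j i} = 1
  && \forall \, m \in M, \; i, j \in J, \; i \neq j \\
& C \geq x_{m j} + p_{m j}
  && \forall \, m \in M, \; j \in J \\
& x_{m j} \geq 0, \quad z_{m i j} \in \{0, 1\}
\end{align*}
```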
In which V is an arbitrarily large value (big-M) used in the “either-or” precedence constraints.
The next formulation explored will be the time-indexed model. It is limited in the sense that only integer processing times can be considered and one can notice that it produces a constraint matrix with several nonzero elements, which makes it computationally more expensive than the disjunctive model. Furthermore, as processing times increase, the number of decision variables increases as well.
In the time-indexed model, we will consider the same sets of jobs J and machines M, besides a set of discrete time intervals T. The size of T might be chosen in the same way as the definition of V: the sum of all processing times. The same parameters for the order of jobs and processing times will be used too. However, in this approach, we only consider binary variables that mark if job j starts at machine m at instant t, xₘⱼₜ, besides the real-valued (or integer) makespan C.
Let us formulate the constraints:
- Each job j at machine m starts only once.
- Ensure that each machine processes just one job at a time; this is the hard constraint in the time-indexed approach. To do this, we state that, for each machine m and time instant t, at most one job j can have started at machine m within the span between the current time t and the pₘⱼ previous instants.
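A sketch of the time-indexed formulation in standard notation (a reconstruction including the job-precedence and makespan constraints, which complete the model alongside the two rules above):

```latex
\begin{align*}
\min \quad & C \\
\text{s.t.} \quad
& \sum_{t \in T} x_{m j t} = 1
  && \forall \, m \in M, \; j \in J \\
& \sum_{j \in J} \; \sum_{t' = \max(0,\, t - p_{m j} + 1)}^{t} x_{m j t'} \leq 1
  && \forall \, m \in M, \; t \in T \\
& \sum_{t \in T} t \, x_{\sigma^j_h,\, j,\, t}
  \geq \sum_{t \in T} t \, x_{\sigma^j_{h-1},\, j,\, t} + p_{\sigma^j_{h-1},\, j}
  && \forall \, j \in J, \; h = 2, \dots, |M| \\
& C \geq \sum_{t \in T} \left( t + p_{m j} \right) x_{m j t}
  && \forall \, m \in M, \; j \in J \\
& x_{m j t} \in \{0, 1\}
\end{align*}
```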
Implementation
Before diving into the models, let us create a few utility classes to handle the parameters of this problem. The first will be JobSequence, a Python list child class with methods to find the previous and following elements in a sequence. This will be useful when referring to the sequence of machines for each job.
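The original snippet is not reproduced in this extract; a minimal sketch of such a class (the method names `prev` and `next` are assumptions) could be:

```python
class JobSequence(list):
    """A list subclass with helpers to navigate a job's machine sequence."""

    def prev(self, x):
        # Machine visited immediately before x (None if x is the first)
        i = self.index(x)
        return self[i - 1] if i > 0 else None

    def next(self, x):
        # Machine visited immediately after x (None if x is the last)
        i = self.index(x)
        return self[i + 1] if i < len(self) - 1 else None
```

For instance, with the sequence (1, 4, 3, 2) of job 1, `prev(4)` returns machine 1 and `next(4)` returns machine 3.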
Now let us create a white-label class for parameters. It must store the set of jobs J, the set of machines M, the sequence of operations of each job j in a dict of JobSequences, and the processing time of each pair (m, j) in a tuple-indexed dictionary p_times.
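A minimal sketch of such a container (the class and field names here are assumptions):

```python
from dataclasses import dataclass

@dataclass
class JobShopParams:
    """White-label container for JSSP parameters."""
    machines: list   # set M of machines
    jobs: list       # set J of jobs
    p_times: dict    # processing time of each (machine, job) pair
    seq: dict        # machine sequence (a list) for each job j
```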
And at last, a class to generate random problem instances from a given number of machines, jobs, and interval of processing times.
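A possible sketch of such a generator (the name and signature are assumptions; it assumes each job visits every machine exactly once, as in the classical JSSP):

```python
import random

def random_instance(n_machines, n_jobs, t_span=(1, 20), seed=None):
    """Generate a random JSSP instance: each job visits every machine
    in a random order, with integer processing times drawn from t_span."""
    rng = random.Random(seed)
    machines = list(range(n_machines))
    jobs = list(range(n_jobs))
    # A random permutation of machines per job
    seq = {j: rng.sample(machines, n_machines) for j in jobs}
    # One processing time per (machine, job) operation
    p_times = {(m, j): rng.randint(*t_span) for m in machines for j in jobs}
    return machines, jobs, seq, p_times
```

Fixing the seed makes instances reproducible across runs, which is handy when comparing the two formulations on the same problem.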
Now we can instantiate random problems at ease to validate our models.
In the following steps, we will create three classes that inherit from pyomo’s ConcreteModel . The first will be a white-label class for the MIP models. The second and the third will be the disjunctive and time-indexed model classes respectively.
One can notice the sets of jobs J and machines M are stored in the instance attributes of the same name. The attribute p holds processing times, and V is the reasonable upper limit for makespan.
Let us now create the disjunctive model, the DisjModel class.
Instances of DisjModel carry attributes x, z, and C, corresponding to the variables previously described. The objective is quite simple: minimize one of the decision variables, C. Notice we still need to define rules for the constraints; they are defined in the same order as previously listed when introducing the model. Let us now write them in pyomo style.
And we are ready to solve JSSP problems with the disjunctive model approach. Let us define the time-indexed model as well.
Once again, constraints were defined in the same order they were previously presented. Let us write them in pyomo style too.
And we are ready to test how these models perform in some randomly generated problems!
Let us instantiate a random 4x3 ( J x M ) problem and see how our models perform.
To solve these examples, I will use the open-source solver CBC. You can download CBC binaries from AMPL or from this link. You can also find an installation tutorial here. As the CBC executable is included in the PATH variable of my system, I can instantiate the solver without specifying the path to an executable file. If yours is not, pass the keyword argument “executable” with the path to your executable file.
Alternatively, one could have used GLPK to solve this problem (or any other solver compatible with pyomo ). The latest available GLPK version can be found here and the Windows executable files can be found here .
The solver had no trouble finding the optimal solution for the disjunctive model and proving optimality in less than one second.
However, we can see that even for this simple problem, the solver could not find the optimal solution for the time-indexed model within the limit of 20 seconds.
It is amazing to see the difference in performance between two models of the same problem, obtained just by rearranging the mathematical equations.
By the way, those interested might find the complete code (plots included) in this repository .
Further reading
For larger instances, due to the combinatorial aspects of this problem, even high-performance commercial solvers, such as Gurobi or Cplex, might face difficulties in providing good-quality solutions and proving optimality. In this context, metaheuristics can be an interesting alternative. I would suggest that the interested reader look at the papers “Parallel GRASP with path-relinking for job shop scheduling” (Aiex et al., 2003) and “An extended Akers graphical method with a biased random-key genetic algorithm for job-shop scheduling” (Gonçalves & Resende, 2014). I recently tried to implement simplified versions of these algorithms and had some interesting results, although a pure Python implementation is still time-expensive. You can find them in this repository.
Conclusions
In this article, two different mixed-integer programming approaches for the job-shop scheduling problem (JSSP) were implemented and solved using the Python library pyomo and the open-source solver CBC. The disjunctive model proved to be a better alternative for the original JSSP, although more complex real-world models might share similarities with the time-indexed formulation for incorporating additional rules. The complete code used in these examples is available for further use.
Aiex, R. M., Binato, S., & Resende, M. G. (2003). Parallel GRASP with path-relinking for job shop scheduling . Parallel Computing, 29(4), 393–430.
Bynum, M. L., et al. (2021). Pyomo: Optimization Modeling in Python. Springer.
Gonçalves, J. F., & Resende, M. G. (2014). An extended Akers graphical method with a biased random‐key genetic algorithm for job‐shop scheduling . International Transactions in Operational Research, 21(2), 215–246.
Kondili, E., & Sargent, R. W. H. (1988). A general algorithm for scheduling batch operations (pp. 62–75). Department of Chemical Engineering, Imperial College.
Ku, W. Y., & Beck, J. C. (2016). Mixed integer programming models for job shop scheduling: A computational analysis . Computers & Operations Research, 73, 165–173.
Manne, A. S. (1960). On the job-shop scheduling problem. Operations research , 8 (2), 219–223.
Wagner, H. M. (1959). An integer linear‐programming model for machine scheduling. Naval research logistics quarterly , 6 (2), 131–140.
Bruno Scalia C. F. Leite
Chemical Engineer, Researcher, Optimization Enthusiast, and Data Scientist passionate about describing phenomena using mathematical models.
6 Data Science Challenges in 2021 and How to Address Them
Data has become the new fuel for businesses. It is now an integral part of all the decision-making processes. Today, most industries are resorting to data and analytics to underscore their brand’s position on the market and increase revenue.
As the adoption of analytics methods like data science and big data analytics has increased, so have the challenges that come with them. Most data science (DS) issues are not company-specific. These challenges may include finding the right talent or solving basic issues revolving around getting raw data organized, unknown security vulnerabilities, and more.
In this blog post, we will discuss some of the key data science challenges in 2021 and solutions to address them.
1. Multiple Data Sources
Companies have started using various software and mobile applications like ERPs and CRMs to collect and manage information related to their customers, sales, or employees. Consolidating disparate, unstructured, or semi-structured information can be a complex process, and it leads to non-uniform formats, as each tool collects information in its own way. Moreover, this also means there is a variety of sources to handle and extract data from.
Heterogeneous sources often make it difficult for data scientists to understand the data and gather meaningful insights. Hence, they end up spending more time on filtering it, which leads to errors and unreliable decision-making. In such cases, it is crucial to standardize data for accurate analysis. To understand what format to use for DS, you need insights into the essentials of big data. Therefore, it is important to know the 4 Vs of big data:
- Volume: people often ask, is big data a problem? No, it's not. Even with data exchange growing exponentially, one can handle it with the help of technology. You'll just have to find the right technology vendor to help you cope with it.
- Velocity: along with volume, the speed at which the information is transferred also matters. The exchange happens in real time, so it is essential to analyze these data sets in real time, too.
- Variety: data comes in all shapes and sizes. It can be structured, unstructured, or semi-structured. As discussed above, setting a standardized format is a perfect way to handle the variety of data.
- Veracity: people ask, how much can your data be trusted? Before starting big analysis, it is crucial to choose the right data relevant to your business case.
In addition to this, another solution to this problem is to list the data sources that a company uses and look for a centralized platform that allows integrating data from those sources. Next step is to create a data strategy and quality management plan as the data gathered from these sources will be dynamic. Prioritizing and integrating datasets in a centralized system saves time and effort as well as it helps in aggregating data at a single location in real-time. This ultimately helps in running algorithms efficiently.
2. Data Security
Data science in business is used to identify business opportunities, improve overall business performance, and drive savvy decision-making. However, data security remains one of the top issues in data science that concerns businesses all over the world. Data security is an umbrella term that includes all security measures and tools applied to analytics and data processes. A few common data security breaches involve:
- Attack on data systems
Information theft is the most common data security concern, especially for organizations that have access to sensitive data like financial information or customers’ personal information. With the increase in the amount of information exchanged over the Internet, the threat to data travelling over the network has increased exponentially. Hence, companies need to follow the three fundamentals of data security:
- Confidentiality
- Integrity
- Accessibility
Using secure systems to access and store data is the first step towards ensuring the confidentiality of the accumulated information. With methods like penetration testing, data encryption, and pseudonymization, as well as privacy policies, businesses can make sure that their information remains protected. Since DS services are typically not designed for granular access, access must be managed deliberately: only the required personnel or teams should have access to sensitive information, and the purpose of the data should be determined in advance.
3. Lack of Clarity on Business Problem
First, one should study the business challenge for which data science solutions are to be implemented. Opting for the mechanical approach of identifying datasets and performing data analysis before getting a clear picture of what business issue to solve proves to be less effective. This is especially unhelpful when you are applying DS for effective decision-making. Moreover, even with a clear purpose in mind, if your expectations from the data science implementation are not aligned with the end goals, the efforts are futile.
Strategizing a flawless workflow is a winning solution to identify the right use case to solve. To create a workflow, it is important to collaborate with all the departments and design a checklist that enhances problem identification. This helps in identifying a business issue and its effects in a multidisciplinary environment.
4. Undefined KPIs and Metrics
Data scientists can design machine learning models and obtain accurate results with them. However, there is a chance that the metrics used do not serve the purpose of implementing DS. Learning data science includes not only knowing how to develop algorithms, but also a keen understanding of other practices, including the mix of metrics and KPIs that boost business growth.
Some of the methods to identify key metrics are:
- Clear goal and vision: a realistic goal, articulated clearly enough to bring success to the project. The goal should be quantifiable and should allow you to track the project's progress. This helps specialists rectify any errors before it is too late.
- Reusable artifacts: reusability is a boon. It helps improve the overall productivity of the DS-based project. Also, if you leverage reusable artifacts, you save a lot of time and gain lucrative benefits. A few of the artifacts that can be reused include frameworks, open-source software, artificial intelligence models, etc.
- Number of production deployments: after experimenting and creating the proof of concept, you'd want to deploy your ML models into production. If the models do not perform as expected, multiple iterations and modifications are required to ensure you get the desired results. It's okay to make small changes in production; this will help you gain insights into end-process bottlenecks in the early stages of production.
- Delivering actionable insights: a successful DS-based project produces actionable insights that help improve processes like inventory, sales, and production. These insights should guide you in taking fact-based decisions that meet the end goal.
- Return on Investment (ROI): while investing in DS projects, you'd want to know if the results will maximize your ROI or at least minimize losses. If the returns from your DS implementation are not exceeding or at par with your investments in time and cost, then it is better to re-evaluate the entire process.
5. Difficulty in Finding Skilled Data Scientists
Talent shortage is another issue in data science that companies are facing. Businesses often struggle to find the right data team with in-depth knowledge and domain expertise. Along with a deep understanding of ML and AI algorithms, specialists are required to also know about the business perspective of DS. Ultimately, a DS project is successful when it enables organizations to tell their business story through their data. Hence, an important skill to look for in analysts and scientists is the art of storytelling through data, along with problem-solving capabilities.
While not all departments understand the language of data, the expert team should be able to communicate with other teams efficiently. As different teams have different priorities and workflows, it is important for all of them to be on the same page. Professionals should be able to explain technical complexities in a comprehensible way, so business owners can understand them easily. However, finding such a team is difficult. Reaching out to a data science company is a viable option, as they not only have the required technical expertise but also understand the business aspect of the project and are ready to commit to it.
6. Getting Value Out of Data Science
Data experts believe that to support a business, the data analytics process needs to be more agile and in-sync with business during the decision-making process. Implementing DS allows you to build a culture of collaboration amongst team members and most importantly, empowers your employees to make better decisions.
DS can be used for various purposes like:
- Understanding customers
- Targeting the right customers
- Improving the quality of products
- Making teams more effective
Depending on the business case, the right datasets, and robust ML and AI models, you can get abundant value out of your DS project.
In this era of digitalization and big data competition, it becomes necessary for companies to adapt to changing market needs and develop a data science strategy in accordance with business needs. When pursuing analytics goals, professionals can be confronted by various types of DS challenges that hinder progress. If you follow a well-planned workflow that lets you strategize your business, analytical, and technological capabilities, these problems can be efficiently addressed. Below are the summarized solutions that can help you with a successful DS implementation:
- Create a list of possible initiatives with clear objectives
- Select a business use case that needs to be solved
- Analyze in-house capabilities
- Make a list of tech requirements
- Seek third-party expertise
- Prepare a realistic timeline
A comprehensive plan helps you to tackle data science blues. Also, consulting with data science experts allows you to gain insights, which lead to a successful implementation of the project.
Author Bio:
Ripal Vyas is the Owner of Softweb Solutions Inc – An Avnet Company. Having solid experience in bringing the latest technologies to the Midwest, he is now raising awareness on the importance of IoT, deep learning, AI, advanced data analytics, and digital experiences across the U.S.

What Problems Can Data Science Solve
Spell checks, especially for people writing in multiple languages: lots of progress to be made here, including automatically recognizing the language when you type, and no longer trying to correct the same word every single time (some browsers have tried to change Ning to Nong hundreds of times, and I have no idea why, after 50 failures, they continue to try; I call this machine unlearning).
Video advice: Very Panel: What Problems Can You Solve with Data Science?
Data science isn’t just a trend or a buzzword — although it’s mistaken for both, mostly because it’s hard to define and is always evolving. Data science means a lot of different things to a lot of different people. And to be fair, it actually is a lot of different things. It’s an entire field of scientific methods and processes, a combination of computer science, applied mathematics, and statistics that turns data into insights into solutions.

Road constructions, HOV lanes, and traffic lights designed to optimize highway traffic. Major bottlenecks are caused by 3-lane highways suddenly narrowing down to 2 lanes on a short section, for no reason, usually less than 100 yards long. No need for big data to understand and fix this, though if you don't know basic physics (fluid theory) and your job is traffic planning / optimization / engineering, then big data, if used smartly, will help you find the cause and compensate for your lack of good judgement. These bottlenecks should be your top priority, and they are not expensive to fix.
5 Steps on How to Approach a New Data Science Problem
85 percent of companies are trying to be data-driven. See how to approach a data science problem and what types of questions data science can answer.
- 5 Things You Need to Know to Truly Embrace a Data-Driven Culture
- 7 Ways to Motivate a Development Team
Liked this chapter?
24 Mar, 2021 · 6 min. Many companies struggle to reorganize their decision-making around data and implement a coherent data strategy. The problem certainly isn't lack of data but the inability to transform it into actionable insights. Here's how to do it right.
8 Major Challenges Faced By Data Scientists
Having helped several data scientists solve their data problems, In this article, we share some of their common challenges and how they can overcome them.
Organizations around the world are seeking to organize, process, and unlock the value of the torrential volumes of data they collect and transform them into actionable business insights. Hence, hiring data scientists, highly trained data science professionals, is becoming critical. Today, there is hardly a business function that can't take advantage of them. In fact, the Harvard Business Review has labeled data science the "sexiest" career of the twenty-first century. However, no career is without its own challenges, and being a data scientist, despite its "sexiness," is no exception. According to the Financial Times, many organizations are failing to make the best use of their data scientists by being unable to give them the raw materials they need to drive results. In fact, according to a Stack Overflow survey, 13.2% of data scientists are looking to jump ship in search of greener pastures, second only to machine learning specialists. Having helped several data scientists solve their data problems, we share some of their common challenges and how they can overcome them.
Using Data Science to Solve Human Problems: Abe Gong Interview
We recently caught up with Abe Gong, Data Scientist at Jawbone and thought-leader in the Data Science community. We were keen to learn more about his background, his work at Jawbone and his latest side projects – including thought-provoking insights on how the ROI on Science is evolving . . .
A – I am a hybrid social/computer scientist, interested in human problems and in how the right computational systems can sometimes solve them. I studied communications at BYU, then public policy, political science, and complex systems at the University of Michigan. I am currently a data scientist at Jawbone, working on the UP fitness tracker. Practically speaking, that means I get to spend time building data systems that nudge people to form good habits and live healthier.
9 unusual problems that can be solved using Data Science
Nine unusual problems can be tackled using big data and data science. Using data science to predict earthquakes is a challenging problem that researchers have been trying to solve for years, with little success. Data science can also be used to prevent illegal immigration, identify suspicious activities in crowded areas, predict the locations and movements of nuclear weapons in enemy countries, recognize and track terrorists, detect violence, fly drones, guide missiles, and more. Another example is the translation of one programming language into another, for example, conversion of Java into Python and vice versa. Such a technology can solve many problems, as libraries developed in one language become compatible with other languages, creating an open programming environment where people with different programming skills can collaborate to create fabulous applications.
Video advice: Problems Solved by Data Science – Intro to Data Science
This video is part of an online course, Intro to Data Science. Check out the course here: https://www.udacity.com/course/ud359. This course was designed as part of a program to help you and others become a Data Analyst.

How Data Science Solves Real-World Problems at Airbnb & More
From Airbnb to sports analytics and nonprofits, learn about three real-world problems solved by data science.
After the 2003 book Moneyball (and the corresponding 2011 film) became successful, teams recognized that their data is more powerful than they had ever imagined. In the last few years, the Strategic Innovations Group at the consulting firm Booz Allen Hamilton has been doing exactly that: trying to transform the way teams use data.
Using Data Science to Predict and Prevent Real World Problems
Do you have an interest in data science but lack an understanding of what, exactly, it can be used to accomplish in the real world? Read this article for a few examples of just how helpful data science can be for predicting and preventing real world problems.
That approach can also send the right amount of merchandise to the right locations, for example, retailers with national outlets. The data may show a rapid rise in demand for workout clothes in Colorado, while such sales decline or remain flat in Arkansas. Retailers can use that information to keep stores adequately stocked.
A venture-backed technology company called CIPIO helps gym owners convert data into actionable strategies. Records may indicate that a particular member has only attended yoga sessions, and their attendance has gradually become less consistent. The system could recommend that gym staff inform that person about a class that combines yoga with brief periods of intensive cardio. That suggestion could raise interest by presenting a different opportunity.
Solving Problems with Data Science
There is a systematic approach to solving data science problems and it begins with asking the right questions. This article covers some of the many questions we ask when solving data science problems at Viget.
We frequently use blog posts and articles to share our work. They help spread our knowledge, and the lessons we learned while working on a project, to peers. I encourage every data scientist to engage with the data science community by attending and speaking at meetups and conferences, publishing their work online, and extending a helping hand to other curious data scientists and analysts.
- What is the problem we are trying to solve?
- What are the approaches we can use to solve this problem?
- What is the best way to communicate and circulate our results?
Communication
Then try explaining the problem to your niece or nephew, who is a freshman in high school. It is easier than explaining the problem to a third-grader, but you still can’t dive into statistical uncertainty or convolutional versus recurrent neural networks. The act of explaining the problem at a high school stats and computer science level makes your problem, and the solution, accessible to everyone within your or your client’s organization, from the junior data scientists to the Chief Legal Officer.
How data science is used to solve real-world business problems
We delve into how data science provides business solutions.
Since many companies don’t store data in an organised manner, these exploratory projects frequently require a lot of data preparation. In these instances, the data scientist must do extensive work to turn disparate data sources into a coherent dataset they can explore to find previously unrecognised opportunities.
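The kind of preparation described above can be sketched in a few lines of pandas: two inconsistently formatted sources are normalised and merged into one coherent dataset. Every column name and record here is a hypothetical stand-in for real company data:

```python
# Hedged sketch: unifying two disparate sources into one dataset.
# Column names and values are invented for illustration.
import pandas as pd

crm = pd.DataFrame({"Customer ID": [1, 2], "Region": ["north", "SOUTH"]})
billing = pd.DataFrame({"cust_id": [1, 2], "revenue": ["1,200", "890"]})

crm = crm.rename(columns={"Customer ID": "cust_id"})   # align key names
crm["Region"] = crm["Region"].str.lower()              # unify casing
billing["revenue"] = (
    billing["revenue"].str.replace(",", "").astype(float)  # "1,200" -> 1200.0
)

dataset = crm.merge(billing, on="cust_id", how="left")
print(dataset)
```

Only after this kind of cleanup does the exploratory analysis itself become possible.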
- Innovation – replacing old solutions with new ones
- Prototyping – creating new services
- Continuous improvement
- Data-value exploration
- “Crisis” problem-solving
- Step 1: Finding the Business Case
- Step 2: Data Collection & Engineering
- Step 3: Data Modelling
- Step 4: Operations/Production
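The four steps above can be sketched end-to-end on a toy problem. The synthetic data, the scikit-learn model choice, and the metric are illustrative assumptions, not a prescription from the original pipeline:

```python
# Hedged, end-to-end sketch of the four steps on a toy problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Step 1: business case -- say, predicting which customers will churn.
# Step 2: data collection & engineering (synthetic stand-in data here).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 3: data modelling.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 4: operations/production -- evaluate, then serve predictions.
accuracy = model.score(X_test, y_test)
print(f"holdout accuracy: {accuracy:.2f}")
```

In a real project each step is far larger, of course, but the shape of the hand-offs between them is the same.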
How do data scientists create value for businesses?
The business world leverages data science for a wide variety of purposes. Between finance, retail, manufacturing, and other industries, the number of ways that businesses can leverage data science is huge, and growing; however, all businesses ultimately use data science for the same reason—to solve problems. Possessing both technical and practical skills, business-focused data scientists understand how to identify which business-relevant problems can best be solved by their particular abilities.
How Data Science Will Help Solve Many Of The World’s Most Pressing Challenges
With modern data analysis, we can reduce air pollution, widen access to legal aid and lower unemployment.
An NGO supplying free legal counsel to underprivileged communities in an African country is swamped with requests: it simply doesn’t have the capacity to respond directly to every question through its volunteer legal network. How can it effectively scale up the help it can give?
In 2015, the United Nations set out a plan to tackle some of the world’s most pressing global challenges by the year 2030. It identified 17 individual issues that are impacting the global community and environment – labeling them its Sustainable Development Goals (SDGs). The 17 SDGs covered a wide range of areas including reversing the impacts of climate change – arguably the most pressing global issue of our time, threatening as it does the lives and livelihoods of billions of people worldwide.
Data Science
The goal of the Data Science program is to prepare students for careers that explore patterns in large data sets and identify potential trends and insights. The program will teach students skills in programming, modeling, machine learning, data visualization, and database structures, and how to assess whether data can be used to solve novel problems. Students will also learn about the ethical, moral, and societal implications of data science. The program will focus on teaching students to uncover insights through the manipulation of large datasets. Graduates will know how to effectively expose complicated problems related to the management, analysis, and dissemination of vast amounts of information, and, once those problems are exposed, how to define, discuss, and solve them.
Video advice: Solving real world data science tasks with Python Pandas!
In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months’ worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
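A minimal sketch of the kind of question tackled in that video, such as "which month had the highest sales?". The tiny inline dataset and its column names are stand-ins for the real CSV files, so treat this as an illustration of the pandas pattern rather than a reproduction of the video's analysis:

```python
# Hedged sketch: best sales month from order records, pandas style.
# The inline data and column names are illustrative stand-ins.
import pandas as pd

orders = pd.DataFrame({
    "Order Date": ["01/05/19", "01/17/19", "02/03/19", "02/21/19"],
    "Quantity Ordered": [2, 1, 3, 1],
    "Price Each": [11.99, 700.00, 11.99, 1700.00],
})

orders["Month"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%y").dt.month
orders["Sales"] = orders["Quantity Ordered"] * orders["Price Each"]

monthly = orders.groupby("Month")["Sales"].sum()
best_month = monthly.idxmax()
print(best_month, monthly.max())
```

The same groupby-and-aggregate pattern answers most of the video's other questions (best city, best product, and so on) by swapping the grouping column.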

What problems does data science solve?
Data science solves real business problems by using data to construct algorithms and create programs that help provide optimal solutions to individual problems, applying hybrid models of mathematics and computer science to derive actionable insights.
How can data science be used to solve real world problems?
Whereas traditional analysis uses structured data sets, data science dares to ask further questions, looking at unstructured “big data” derived from millions of sources and nontraditional mediums such as text, video, and images. This allows companies to make better decisions based on their customer data.
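One small, hedged illustration of what "turning unstructured data into decisions" can mean in practice: scoring free-text customer reviews against a keyword lexicon. The reviews and the lexicon are toy assumptions, far simpler than the models real systems use:

```python
# Hedged sketch: a toy keyword lexicon turns free text into a structured score.
reviews = [
    "Great battery life, totally worth it",
    "Terrible screen, broke in a week",
    "Worth every penny, great camera",
]

positive = {"great", "worth", "love"}
negative = {"terrible", "broke", "awful"}

def score(text):
    """Positive keyword hits minus negative keyword hits."""
    words = set(text.lower().replace(",", "").split())
    return len(words & positive) - len(words & negative)

scores = [score(r) for r in reviews]
print(scores)
```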
What are some real world problems that need to be solved?
- Climate Change.
- Health Care.
- Food Insecurity.
- Homelessness.
- Sustainability.
What problems can be solved?
Solutions to the World's Issues
- End poverty.
- End hunger and improve nutrition and sustainable agriculture.
- Promote well being for all ages.
- Ensure equitable and quality education.
- Achieve gender equality.
- Ensure water and sanitation for all.
- Ensure access to modern energy for all.
What are the top 10 problems in the world?
The 10 biggest problems in the world today, according to...
- Climate change and destruction of natural resources (45.2%)
- Large-scale conflict and wars (38.5%)
- Religious conflicts (33.8%)
- Poverty (31.1%)
- Government accountability and transparency, and corruption (21.7%)
- Safety, security, and well-being (18.1%)