How to become a data scientist

Data science is a buzzword in the market, and many of us are interested in knowing more about it. Some working professionals are putting in hard effort to acquire the skills that, according to their “data scientist friends”, a resume needs in order to count as a “data scientist resume”. Having close to a decade of industry experience in data science, and having spent significant time training and mentoring people in it, I get these queries from all over.

When it comes to learning data science, I do not believe in the “one size fits all” model, because one’s learning journey should be personalized to one’s current skill set. For example, if someone with a master’s in statistics asks me about the learning path for data science, I would advise them to get their hands dirty with programming, coding skills, databases, SQL, etc. On the other hand, if a computer science graduate asks me a similar question, I would advise them to get a good grasp of statistics, mathematics, hypothesis testing, probability theory, etc.

Before deciding how to learn data science, though, my suggestion is to ask yourself a few questions that will help you understand whether data science is for you. The very first thing to observe is whether a correlation exists between what you do currently and what happens in the data science/analytics/machine learning space. Assuming you are starting from zero, let me put it simply: “Machine learning is a way to make machines learn from data”. For this learning to happen, data and methodologies are needed.

You can refer to this YouTube video from Andrew Ng to better understand machine learning: “Lecture 1.1: Introduction, What Is Machine Learning” [ Machine Learning | Andrew Ng ].
To summarize this part, as a data scientist, your life will revolve around “data” and “methods” used to make machines learn. Hence, if you aspire to be a data scientist, your affinity toward data and coding should be high.
Yes… But what next?
The learning path for data science has to be personalized. However, I am more than happy to give a generic structure that can help people kick off their journey. One important thing to ensure here is that you cover the breadth of the few areas listed below:
  • SQL: This is one of the most important skills you should have if you want to become a data scientist. Improving your SQL skills, or learning SQL from the beginning, takes effort. There are a lot of websites where you can run SQL queries and practice. w3schools is one of my favorites for beginners, though there are many more. If you consider yourself a level above beginner, you can install any RDBMS on your computer and play around with data sets. MySQL is a good open-source RDBMS, and its installer will help you install all the needed components.
  • Coding/Algorithm: You may come from a coding background or a non-coding background. In either case, the R language needs to be part of your data science resume. For people from non-coding backgrounds, the good news is that R is relatively easy to learn. You can install RStudio (one of the most sought-after tools in the industry) and start practicing the R language. There are useful posts that walk through the RStudio installation in detailed steps, and it is quite easy to do. If you do not want to install R and RStudio for now, you can practice online as well. There is also a book available for free that will help you start getting your hands dirty in R.
  • Statistics: Statistics is one of the skills you must not ignore before jumping into a data science use case. To make your journey smoother, and assuming you are a beginner, I advise you to read the ISLR book (An Introduction to Statistical Learning). Ensure you finish this book at least once before moving to the next step. The purpose of the ISLR book is to provide an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students, and Ph.D. students in the non-mathematical sciences. The book also has R practice materials, which will help you understand the statistical concepts and get better at R.
  • Visualization: You should also be good at data exploration using techniques such as charts, graphs, and distribution plots. For this part, try to get a good command of the R and Python libraries that support visualization, for example ggplot2 in R and matplotlib/seaborn in Python. If you can get your hands on dedicated visualization tools like Power BI or Tableau, it's an added advantage.
  • Model building: Ahhh!! So, you are equipped with a query language, R code, and an understanding of statistics, hence you qualify to touch your first data science use case. Congratulations! Do not stop learning in any of the above-mentioned fields from all the sources you have. In parallel, however, start to build some simple machine-learning models like linear regression, logistic regression, decision trees, etc. You will find packages in R that will run these models for you. Please try to understand what is going on internally when you run these models on your data. For example, you should be able to explain R-squared and adjusted R-squared if you are running a linear regression model. Do not depend too much on the “in-built library”. There are a few websites out there where you can find data to practice on. Please refer to link 1 and link 2.
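To make the point about understanding what happens internally more concrete, here is a minimal sketch that fits a one-variable linear regression by hand and computes R-squared and adjusted R-squared from their definitions. It is written in Python using only the standard library rather than R, the data (`hours`, `score`) is made up purely for illustration, and the function names are my own, not from any package. In R, `summary(lm(...))` reports the same two quantities for you.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(y, y_pred):
    """Share of the variance in y that the model explains."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Made-up toy data, for illustration only
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 71]

intercept, slope = fit_simple_linear_regression(hours, score)
predicted = [intercept + slope * h for h in hours]
r2 = r_squared(score, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(hours), n_predictors=1)

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"R-squared={r2:.4f}, adjusted R-squared={adj_r2:.4f}")
```

Adjusted R-squared comes out lower than R-squared here because it discounts the fit for every predictor used. That is why it is the safer number to compare when you start adding more features to a model.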
Once you start getting a grasp of how to run a machine learning model, go to forums where multiple people are working on the same dataset. Kaggle is a good platform to start from. Create a free account and start practicing on the data provided. The most important thing to learn on this platform is what others are doing with the same data. How are they approaching the same problem statement? How are they using the features? Are they able to think differently? How? Why? Please allow yourself time to digest this learning. Remember, learning is a gradual process. If you follow the above steps properly and regularly, and you are able to answer the questions below, then you can put data science as a skill on your resume.
  • What have you done in the entire model-building process?
  • Why did you do a particular step in model building, and what is its use?
  • How did you improve your model?
  • How is your model beneficial for the business?
Learning is a continuous, gradual process, so keep practicing, keep learning, and keep improving. There are always new challenges and concepts coming up in the data science world, so prepare yourself for them. Wish you all the best! Thanks for reading; share it with friends if you like the story. Join our data science community.

Cheers
