
How to be a good Data Scientist?

From the outside looking in, the world of data engineering, data analytics, and data science – in fact of all things data – can seem quite daunting. For those unfamiliar with the algorithms, the models, and the coding languages, it seems that these fields are “too technical” or “too scientific”.

 

While mathematics, statistics, computer science, and any number of other STEM subjects form the foundations of data science, there is still an entire creative and artistic side to this work.


A truly great data scientist is someone who embodies core aspects of all three pillars of the field.

 

Three Core Areas of Data Science




 

Theoretical: Mathematics and statistics provide the strong foundation on which most data science models are built. These are the deductive reasoning and logic skills needed to successfully navigate the complexities of a data science project.

 

Technical: Knowing what to do will only take you so far. Implementation is just as critical a part of any data science project. In this rapidly evolving landscape, the agility to leverage the variety of tools, software, and coding languages at one’s disposal is essential.

 

Creative: This is the pillar I would say is the most crucial of all. Creative problem solving is the crux of any data science project. A truly brilliant data scientist is one who can extract the most interesting and relevant story from the raw data, and who can go even further and influence stakeholders to leverage that analysis to make data-driven decisions.


The scientist is also a random variable.



 

While it may seem that the star data scientist is one who can code an optimizer in PySpark or who can explain the benefits of hyperparameter tuning for improving a model’s performance, the star that will shine brightest is the one who treats their work like art: the one who works at a fine level of detail and adjusts their approach with great precision to carve out a truly valuable message from the data in front of them.

 

Let’s consider two data scientists with the same raw dataset. On their analytics journey, they will face many, many decisions. Which variables should they consider for their analysis? Should they exclude missing data points or impute them? If imputing, which methodology should they use? In k-means clustering, which distance metric is the most apt? And the list goes on… While some of these questions do have technical approaches, they do not always have a single right answer, and the two scientists will not always arrive at the same one.
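To make the missing-data question concrete, here is a minimal sketch in Python. The dataset and the names in it (raw_spend, drop_missing, impute_median) are invented purely for illustration and are not taken from any real project. It shows one scientist excluding the missing points and the other imputing them with the median, after which the same summary statistic comes out differently.

```python
from statistics import mean, median

# Hypothetical raw data: spend per customer, where None marks a missing value.
raw_spend = [120.0, None, 80.0, 100.0, None, 300.0]


def drop_missing(values):
    """Scientist A: exclude the missing data points entirely."""
    return [v for v in values if v is not None]


def impute_median(values):
    """Scientist B: fill each missing point with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


data_a = drop_missing(raw_spend)
data_b = impute_median(raw_spend)

mean_a = mean(data_a)
mean_b = mean(data_b)

# Same raw dataset, two defensible choices, two different averages.
print(f"exclude missing: n={len(data_a)}, mean={mean_a:.2f}")
print(f"impute median:   n={len(data_b)}, mean={mean_b:.2f}")
```

Neither result is wrong. Which one is more useful depends on why the values are missing and on what story the analysis needs to tell, and that is a judgment call rather than a formula.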

 

Data science at its core is essentially a tool to understand the world better. It is without doubt a very technical and scientific field; however, it is just as creative and artistic.
