Questions to ask in your next data science interview 👍
Does the prospect of interviews give you the heebie-jeebies? You can’t always predict what a person is going to ask you in an interview room, and there are always some edge cases. But when it comes to the data field, we have come across resources and questions that can help you answer the question, “Do you have any questions for me?” I have also used the same rubric in interviews I had with a few people over the past weeks and months.
What’s the culture of the organization (o)?
This question asks about the characteristics that make the (o) what it is. Like most things, culture can have several definitions. We like the definition from the BambooHR YouTube channel: it is how you treat people. From our research, it is also about communication: are your ideas being taken into consideration? If something is done about them, then feedback is being taken. Working with others and seeing how your efforts contribute to the (o)’s mission is also worth considering, and celebrating colleagues’ efforts towards that mission, even in a small way, goes a long way. Another crucial point, one that could have led to the Great Resignation, is whether you are being given opportunities to grow. In other words, is there a growth structure? Is there a clear path from junior to senior based on your performance on the tasks you have been given, and then on to team leadership roles such as Lead Data Engineer?
What is your day-to-day like?
- What are your Key Performance Indicators?
Key Performance Indicators are a set of quantitative measures that indicate whether you are productive. We watched a sitcom that is related to this. Check it out here.
Just these things!? Well, it’s not that black and white. A statistical model will not always be the useful output, for example. Sometimes it will be a dashboard. Sometimes it will be adding a column to a dataset with quantile normalization, labeling clients as churned or not churned given a couple of properties, or running statistical tests, for example to help the marketing department review which marketing campaign is going well or to find where COVID-19 is most prevalent over time. If you get thank-you notes, or people from different departments keep asking for the recommendation system data, then you are on the right track. Overall, it’s about being a ninja at translating a business question into a data science question.
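As a minimal sketch of one such task, labeling clients as churned or not churned from a couple of properties could look like the following. The field names (`days_since_last_order`, `active_subscription`) and the 90-day threshold are hypothetical and chosen only for illustration; the real definition of churn has to come from the business.

```python
from dataclasses import dataclass

# Hypothetical rule: no order in the last 90 days and no active subscription.
CHURN_AFTER_DAYS = 90


@dataclass
class Client:
    client_id: str
    days_since_last_order: int
    active_subscription: bool


def label_churn(client, churn_after_days=CHURN_AFTER_DAYS):
    """Return 'churned' or 'not churned' from two illustrative properties."""
    inactive = client.days_since_last_order > churn_after_days
    if inactive and not client.active_subscription:
        return "churned"
    return "not churned"


# Made-up clients, for illustration only.
clients = [
    Client("a1", 12, True),
    Client("b2", 200, False),
    Client("c3", 120, True),
]
labels = {c.client_id: label_churn(c) for c in clients}
print(labels)
```

The point is less the rule itself than the translation step: "which clients are we losing?" becomes a column you can compute, check with stakeholders, and revise.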
- Do you have all the tools you need to do the job?
Sometimes you could give a suggestion: “We need an in-memory data infrastructure to achieve this.” If the (o) doesn’t have an existing data infrastructure, then you and your team members will be responsible for building it. That could require, but is not limited to, a database or object storage, as well as Docker and a container orchestrator and/or workflow scheduler; sometimes just Kubernetes CronJobs could do the job. Wait a minute, should a data scientist know how to use all of these? Well, yes, if you’re a full-stack data scientist, machine learning engineer and/or data engineer, given the tools mentioned. Here’s an example of a general-purpose infrastructure that you might need.
Example of general-purpose data science infrastructure from Effective Data Science Infrastructure by Ville Tuulos. What does yours look like? You can write it down and compare it with what you have. The x on local means that the component can be achieved on your local machine. EKS means Amazon Elastic Kubernetes Service; the tools are YAML files, for example deployment and service YAMLs.
Do you have a data infrastructure team?
It is often a red flag if this team is absent. If it’s not there, you’ll be part of it. Then comes the friction: am I supposed to be doing data science, data engineering, or machine learning engineering? If the team already exists, the next steps would be talking to engineering teams and stakeholders. In addition, there will be clear-cut objectives and expectations set beforehand.
How long have they been around?
In our opinion, figuring out data infrastructure takes time if the team is new. They may not have certain components of a data infrastructure, and you may be joining at a point where they are trying out a couple of options and haven’t settled on one. That means it will take a bit of time before you can be effective. Hacking together something flexible that can accommodate changes is often good.
What is the data stack?
A data stack is like a kitchen for data. An example would be grouping genes together to study their similarities and differences based on their FASTA sequences. The sequences on their own are not edible; in other words, they are not immediately useful to a non-data person.
You’re probably storing loads of sequences, and how many normally depends on what kind of study you are doing, for instance if you are fortunate enough to apply a perturbation and record the changes. You’ll need somewhere to store that data in one place. You can call that a data warehouse, since you have a specific goal in mind, and here we have established that we are using FASTA files. You could use BigQuery in this step, or object storage like Amazon S3.
Moving right along, the next problem is transforming the data into a feature table. You can organize the files into protein or RNA sequences, which will reduce the dimensionality of the dataset. There are several functions you can use to transcribe or translate the sequences. Along the way, maybe a Docker container can help. What would you do?
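One possible answer, as a minimal sketch using only the Python standard library: parse the FASTA text, transcribe or translate each sequence, and count amino acids to get one feature row per gene. The sequences, the gene names, and the choice of amino acid counts as features are assumptions made for illustration. The codon table is the standard genetic code.

```python
from collections import Counter

# Standard genetic code, built from the usual compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Made-up sequences, for illustration only.
FASTA_TEXT = """\
>gene_a example
ATGGCC
AAATAA
>gene_b example
ATGTTTTTCTAG
"""


def parse_fasta(text):
    """Map each record name (first word after '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def transcribe(dna):
    """DNA coding strand to mRNA (T becomes U)."""
    return dna.replace("T", "U")


def translate(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def feature_table(records):
    """One row per sequence: counts of each amino acid in its protein."""
    return {name: Counter(translate(seq)) for name, seq in records.items()}


records = parse_fasta(FASTA_TEXT)
table = feature_table(records)
for name, row in table.items():
    print(name, transcribe(records[name]), dict(row))
```

In practice a maintained library such as Biopython covers parsing, transcription and translation, so a hand-rolled version like this is mainly useful for seeing what the step does.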
Thus far, we’ve covered extracting, loading and transforming data. What follows? You got it, analysis. We have a feature table on which we can use clustering algorithms like hierarchical clustering, and t-SNE to visualize the data, to get the inferences we need to answer our question. Showing your results in a plot or dashboard follows, and the whole thing can be coupled with a workflow orchestrator like Apache Airflow or Luigi. Adding email alerts or presenting the results in a Slack channel could help too.
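To make the clustering step concrete, here is a minimal sketch of agglomerative (hierarchical) clustering with single linkage, written in plain Python. The feature vectors and gene names are hypothetical, and single linkage with Euclidean distance is just one reasonable choice, not a recommendation for real sequence data.

```python
from itertools import combinations
from math import dist

# Hypothetical feature table: one numeric vector per gene.
features = {
    "gene_a": (1.0, 1.0, 1.0, 0.0),
    "gene_b": (1.0, 1.0, 1.0, 0.0),
    "gene_c": (1.0, 0.0, 0.0, 2.0),
    "gene_d": (1.0, 0.0, 0.0, 3.0),
}


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [{name} for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(
                dist(points[a], points[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


groups = single_linkage(features, n_clusters=2)
print(sorted(sorted(g) for g in groups))
```

This brute-force version is fine for a handful of genes. For real datasets, a library implementation such as the one in `scipy.cluster.hierarchy` would likely be a better fit.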
How do they work with data scientists?
This question revolves around how people are working: together or in isolation. It shouldn’t be the latter, because at some point you will want to ask how to deploy something or how to access the data lake and/or data warehouse. The engineering directory in the wiki (if one exists) should let you know about that.
When you’re building out a new product, do you have a process for instrumenting the logs, building out the data tables, and putting them into your data warehouse?
This question puts it all together in case something was missed in the other questions. If the team has something in the works that involves the whole data infrastructure team, this question will shed some light on it. For example, software developers build something and its logs are written somewhere, which the DevOps team could help with; then the data engineer builds data pipelines to get the data into data tables and to offload them into a data lake or a data warehouse, which holds data for a specific purpose.
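A toy version of that flow, as a sketch under stated assumptions: the application writes JSON log lines, a small pipeline parses them into rows, and the rows land in a table. The log format, the `events` table, and the use of an in-memory SQLite database as a stand-in for the warehouse are all hypothetical.

```python
import json
import sqlite3

# Made-up application logs, one JSON object per line (one line is malformed).
RAW_LOGS = """\
{"ts": "2022-06-01T10:00:00", "user_id": "u1", "event": "signup"}
{"ts": "2022-06-01T10:05:00", "user_id": "u1", "event": "purchase"}
not valid json
{"ts": "2022-06-01T11:00:00", "user_id": "u2", "event": "signup"}
"""

REQUIRED_KEYS = {"ts", "user_id", "event"}


def parse_logs(raw):
    """Turn raw log lines into (ts, user_id, event) rows, skipping bad lines."""
    rows = []
    for line in raw.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would count or quarantine these
        if isinstance(record, dict) and REQUIRED_KEYS <= record.keys():
            rows.append((record["ts"], record["user_id"], record["event"]))
    return rows


rows = parse_logs(RAW_LOGS)

# In-memory SQLite stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, user_id TEXT, event TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

counts = dict(conn.execute("SELECT event, COUNT(*) FROM events GROUP BY event"))
print(counts)
```

Asking the interview question is a way to find out who owns each of these steps on the team, and whether the log format and table schema are agreed on before a product ships.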
An example of pulling everything together. The components can be changed though.
That’s all we wanted to share. We hope these questions help you in your next interview with whichever data team you want to be part of. Just know these are not hard requirements, and you might have other goals for joining a team. Whatever they are, we wish you all the best.
Build a Career in Data Science by Jacqueline Nolis & Emily Robinson
Effective Data Science Infrastructure by Ville Tuulos
Black tides Twitter Spaces Session
Panel Session PyConKe 2022