Top Data Science Trends in 2022 — According to an AI

January 10, 2022

Blog post by Milán Janosov

At this time of the year, we usually hear many expert opinions and predictions about the biggest trends for the coming year. The same goes for my profession, data science. This time, however, instead of reading review articles, I decided to put this question to OpenAI’s GPT-3 text completion engine. Here comes the answer, unchanged and unedited, and probably raising a few questions.

The seed sentence was the following: “Top data science trends in 2022.”

GPT-3’s top 10 predictions:

The data science trends in 2022 are likely to be quite different from the trends today. The current trends will evolve, and new trends will emerge. At the same time, there will be changes in the approach to data science problems, as well as in the tools.

The major trends for 2022 are:

1. Data science will steadily move from the experimental phase to the practice.

Data Science has already gone through several phases:
– The experimental phase (2006–2017)
– The learning phase (2017–2022)
– The practice phase (2022–2027)

At the experimental phase, we worked with a few data scientists and a limited amount of data. At the learning phase, we will already have a data scientist in each company, and companies will have a lot of data. In the practice phase, data science will be used for solving practical problems and for making decisions.

2. Data science will become a part of overall decision-making process.

The data science will not only help to make decisions, but it will also help companies to make decisions faster.

3. There will be increased demand for data scientists.

The number of data scientists will continue to grow, but there will not be a shortage of data scientists. Good data scientists will be in high demand, but this will not be a major problem. There will be a good supply of data scientists.

4. There will be a steady demand for tools for data science.

The tools for data science will not change rapidly, and the demand for these tools will remain high. This will be a good time for companies that are developing tools for data science.

5. There will be increased demand for data science consultants.

The demand for data science consultants will continue to grow. There will be a lot of companies that will be interested in hiring data science consultants.

6. Data science training will be a part of the curriculum in universities.

There will be a lot of courses on data science in universities, as well as online courses. There will be more people who will be interested in learning data science.

7. There will be increased demand for data scientists who can work with different tools.

There will be a lot of tools for data science, and the data scientists will have to be able to switch between the tools.

8. There will be many open source data science tools.

There will be a lot of tools for data science, and some of them will be open source. Data scientists will be able to create their own tools, and they will be able to share them with other data scientists.

9. There will be increased use of Artificial Intelligence (AI) in Data Science.

The use of AI will increase, but it will not replace data science. The AI will help to make decisions faster and to improve the results of the data science.

10. There will be increased use of collaborative platforms for data science.

The data scientists will be able to share the data and to work on the data together. This will help to solve the problems faster and to improve the results.

Author’s notes:

  • While there are a few inconsistencies among the top 10 predictions, some points are quite interesting, such as the different phases in the development of data science and its growing role in higher education, alongside the expected predictions about the general growth and expansion of the market.
  • To contrast these predictions with the current situation, I also took a snapshot of data science trends on Twitter in December 2021. First, I collected a sample of about 15k tweets containing the hashtag #datascience. Then I extracted the hashtags from each tweet and built a co-occurrence network of the 7,200 unique hashtags. In this network, every node represents a hashtag, and two nodes are linked if their hashtags co-occurred in the same tweet. The strength of a link is proportional to the number of co-occurrences. To reduce noise, I kept only edges with a weight of 2 or higher. This way I arrived at a network of 1,102 nodes and 7,088 links, which is visualized in the attached figure.

Figure 1: Data science trends on Twitter in December 2021 — tweet hashtag co-occurrence network. Each node represents a hashtag, while connections measure the number of times two tags were co-tweeted. Node size corresponds to node degree, while node color shows the network communities.
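
The network construction described in the notes above can be sketched in a few lines of Python. This is a minimal illustration on made-up tweets, not the code used for the figure. The function names, the hashtag regular expression, the lowercasing of tags, and the choice to count each hashtag pair at most once per tweet are all assumptions. Community detection, which determines the node colors in the figure, is left out.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercased hashtags found in one tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def build_cooccurrence_network(tweets, min_weight=2):
    """Count hashtag pairs that share a tweet; keep edges with weight >= min_weight."""
    weights = Counter()
    for text in tweets:
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        tags = sorted(extract_hashtags(text))
        for pair in combinations(tags, 2):
            weights[pair] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def node_degrees(edges):
    """Degree of each hashtag: the number of distinct neighbors it has."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "Learning #DataScience with #Python and #MachineLearning",
    "#datascience #python tips for beginners",
    "New #MachineLearning paper on #DataScience",
    "#python #bigdata pipelines",
]

edges = build_cooccurrence_network(sample_tweets, min_weight=2)
degrees = node_degrees(edges)
print(edges)
print(degrees)
```

With the weight threshold of 2, pairs that appear together in only one tweet are dropped, so hashtags that only occur in such pairs disappear from the network entirely. This is the kind of noise filtering that would shrink a few thousand unique hashtags down to a much smaller set of nodes.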