The Life of an AI Expert

To define the AI expert, one might start with the base vectors, the prototypes of such professionals. Some candidate prototypes: the AI expert who works as an employee in a well-paid position at a tech giant, a famous professor at a top university who is a well-known expert in machine learning, or the leader of a company that specializes in a particular field of AI. Like most of us, I am none of the above, but I am somewhat of a combination of these prototypes. I therefore hope that this article offers some insights and one example of the lifestyle of an AI expert.

To summarize the life of an AI expert in one word: analysis. In two words: constant analysis. The thirst for knowledge is a key motivating factor for me. To explain it with an analogy, understanding the physical, chemical, biological, ethological, sociological, artificial, and organic structures I encounter is to me what the Grand Unified Theory (GUT) is to a physicist. The main aim of GUT is to unify the electromagnetic, weak, and strong interactions between particles into a single model.

In my case, the difference can be found in the scale. While GUT focuses on a model at a single scale, at the level of particles, I try to be multimodal and multiscale. As we change the scale from physics through chemistry, biology, human ethology, to sociology, the magnitude or aspect changes. Chemistry is based on the rules of physics, but we need special techniques to describe chemical processes.

Biology is based more or less on chemistry, but we use special techniques to describe the genotype and the phenotype of living beings. With that, we have arrived at the level of individuals. Ethology provides the techniques to study and analyze the behavior of these complex and self-organizing structures built up from atoms. Some of the higher living beings do not operate individually but organize themselves into smaller or bigger groups, like an ant colony, a shoal of fish, a pack of wolves, a herd of cows, a village of people, or an office of people.

Physics is the first gate to pass. The world that we live in is based on more or less simple rules of physics. To be more exact, on the scale that we live on, most of the physical phenomena can be described with linear or quadratic relations and relatively simple distributions.

However, physics becomes much more complex when we change the scale to the atomic or the galactic level. How is this related to AI? There are already published results on estimating fluid dynamics with artificial neural networks. These days I work on estimating the properties of scattered light in the field of Raman spectroscopy in order to identify certain materials. The first thing to do here is to understand the underlying physical process, so that the algorithm we define captures its essence.

Another project I work on these days is in the field of petrochemistry. For context, it is not biochemistry, but it was a few hundred million years ago. These days it is more like organic chemistry, thanks to that huge meteorite. The task in this case is to optimize the production process, and I work closely with experts in this field. We work with a relatively complex model of the production line and later, hopefully, with the real-time process. The goal is to optimize the models according to an objective function, which can be cost minimization, profit maximization, quality improvement, etc.

Communication is an essential tool for us when organizing individuals into groups or a society. Its most adequate and long-lasting form is written text. I had the chance to work on a software product that relies on the latest results of computational linguistics and semantic modeling. The goal of semantic modeling is to develop algorithms that model the meaning of written natural-language text. To explain it with an example, let us take a look at the following two questions.

When is the restaurant open? What are the opening hours of the restaurant? The questions are formulated differently but have the same meaning. This is what the algorithms should capture. The task is solved with the help of semantic encoders. A semantic encoder assigns a semantic vector to a text (question, answer, sentence, paragraph, etc.) in such a way that the vector holds the meaning of the text. It means that two different texts with similar meanings should be mapped to similar semantic vectors. We apply such algorithms to implement a semantic search engine that helps companies whose information is an asset to manage their documents.
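As a minimal sketch of what "similar semantic vectors" means, the snippet below compares texts by the cosine similarity of their vectors. The four-dimensional vectors, the third question, and the helper names are made up for illustration; a real semantic encoder would produce vectors with hundreds of dimensions, and this is not the code of the product described above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical semantic vectors; a trained encoder would compute these from the text.
vectors = {
    "When is the restaurant open?": [0.9, 0.1, 0.8, 0.0],
    "What are the opening hours of the restaurant?": [0.8, 0.2, 0.9, 0.1],
    "How much does the pizza cost?": [0.1, 0.9, 0.2, 0.7],
}

def most_similar(query, vectors):
    """Return the stored text (other than the query) whose vector is closest to the query's."""
    candidates = [text for text in vectors if text != query]
    return max(candidates, key=lambda text: cosine_similarity(vectors[query], vectors[text]))

print(most_similar("When is the restaurant open?", vectors))
```

A semantic search engine can be built on the same idea: encode the query, then rank the stored documents by their similarity to it.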

A less business-focused but more society-oriented piece of software I worked on is the MediaBubble project. The project was funded by Google DNI. The main goal of MediaBubble is to help people extend their filter bubble. It is based on the phenomenon that most people, let us call them online news readers, acquire their daily set of information from a single news source.

The problem is that such readers can be easily influenced, as they have no chance to analyze the information more deeply and no chance to fact-check it. Our software worked on the Hungarian online media. We developed web scrapers that collected the news from the major news portals. Then semantic encoders and clustering techniques were applied to conduct real-time topic detection. With the help of this information, we were able to present readers with articles from outside their filter bubble. This way, the readers had the chance to extend their bubble and be less influenced.
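One simple way to detect topics in a stream of articles is to cluster their semantic vectors as they arrive. The sketch below is only an illustration under my own assumptions (a greedy scheme with a hypothetical similarity threshold of 0.8 and two-dimensional toy vectors); it is not the clustering technique used in MediaBubble.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def assign_topics(article_vectors, threshold=0.8):
    """Greedy online clustering: join the most similar topic, or open a new one."""
    centroids = []  # running mean vector of each topic
    counts = []     # number of articles in each topic
    labels = []     # topic index assigned to each article, in input order
    for vec in article_vectors:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No existing topic is similar enough: start a new topic.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the chosen topic.
            n = counts[best]
            centroids[best] = [(c * n + v) / (n + 1) for c, v in zip(centroids[best], vec)]
            counts[best] = n + 1
            labels.append(best)
    return labels

# Toy vectors: two articles about one topic, two about another.
print(assign_topics([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]))
```

Because each article is processed once, in arrival order, a scheme of this kind can run continuously as new articles are scraped.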

The main drawback of written text is that it misses an important communication channel, metacommunication. Fortunately, there are already techniques to analyze human metacommunication from multiple aspects. My company is involved in a project titled AIMLP. The goal of the project is to estimate leadership competencies based on video, audio, and EEG signals. The video signals are processed with emotion and gesture detection algorithms. I guess that facial expression detection algorithms are well known to everyone from the bounding boxes on faces indicating happiness, sadness, fear, disgust, neutral, etc.

Gesture detection is a plus from our side because, unlike facial expressions, gestures are more instinctive and harder to control consciously. We consulted with experts in this field and identified the essential gestures that can be relevant in this use case. Such gestures are hands in the pockets, touching the nose, hands kept near the body, straight posture, bent posture, etc. Once the list of relevant gestures had been identified, we developed video processing software that detects them.

A commercial EEG device is also used in the project to detect focused, stressed, or neutral states of mind. The audio processing is done by the partner company Sestek, as we work on this project in a consortium within the framework of a grant. The consortium lead is the Singaporean company 8nalytics, as they have expertise in the field of behavioral science. If you are interested in our technology, you may take a look at the project.

The rest of my observations and conclusions in the fields of ethology and sociology come from surviving in an office environment. According to one of my professors, individuals in such an environment follow a twofold strategy of cooperating and competing at the same time. It means that some good guesses about which to do, and when, can help to advance a career.

One may also think about the presence of these two forms of behavior in terms of an Evolutionarily Stable Strategy (ESS).

ESS is a term in the field of biological modeling and evolutionary game theory.

As an example, imagine a bird population. There are two types of behavior when two males meet in the mating period. Option one is to pose, option two is to fight. If all the birds take option one, then the population will be weak in the long term, as nothing ensures that the strongest survive. If all the birds take option two, then no males will be left for reproduction. The truth is somewhere in the middle, as evolution may aim more for adaptability. There will be posers and there will be fighters.
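In evolutionary game theory, the mix of posers and fighters is usually studied as the hawk-dove game. The sketch below assumes the textbook payoffs of that game: a contested resource worth `value`, an injury cost `cost` paid by the loser of a fight, and posers who share the resource on average. The numbers are hypothetical. The stable share of fighters is the one at which fighting and posing pay equally, so neither behavior can spread at the expense of the other.

```python
def expected_payoffs(p_fight, value, cost):
    """Average payoff of a fighter and of a poser when a share p_fight of rivals fight."""
    # Fighter: wins or loses a fight against another fighter, takes everything from a poser.
    fighter = p_fight * (value - cost) / 2 + (1 - p_fight) * value
    # Poser: gets nothing against a fighter, shares the resource with another poser.
    poser = (1 - p_fight) * value / 2
    return fighter, poser

def stable_fighter_share(value, cost):
    """Share of fighters at the evolutionarily stable state of the hawk-dove game."""
    if cost <= value:
        return 1.0  # fighting always pays when injuries are cheap
    return value / cost

share = stable_fighter_share(value=2.0, cost=10.0)
fighter, poser = expected_payoffs(share, value=2.0, cost=10.0)
print(share, fighter, poser)
```

With these assumed numbers one bird in five fights, and at that share both behaviors earn the same average payoff.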

The question lies in the ratio. What is the ideal ratio for a population to be the most effective? As with every concept, ESS can also be generalized to more options and can be applied to the human race in, e.g., an office environment.

On the industrial side, we have just finished the proof of concept of an Industry 4.0 project. The goal in this case is to monitor the production line with a smart camera.

We embedded state-of-the-art computer vision algorithms for object detection in the system. Such algorithms are capable of identifying the classes of objects that they have been shown before. The actual classes are forged material, molded material, spring, and rubber ring.

The output of the machine learning algorithm is then used to record the point in time when a particular part is mounted on the product, as we are talking about an assembly line. This way we can follow how the assembly progresses over time and can also detect if a part has been skipped. A typical problem in our case is when a rubber ring is omitted from the product, which can lead to a serious drop in pressure.
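The bookkeeping described above can be sketched in a few lines. This is an illustration under my own assumptions, not the project code: detections are taken to arrive as `(timestamp, class_name)` pairs from the object detector, the timestamps are invented, and `check_assembly` is a hypothetical helper name. The class names follow the list above, with "rubber ring" standing for the rubber (gumi) ring.

```python
EXPECTED_PARTS = ["forged material", "molded material", "spring", "rubber ring"]

def check_assembly(detections, expected=EXPECTED_PARTS):
    """Return the first time each part was seen and the list of expected parts never seen.

    detections: iterable of (timestamp_in_seconds, class_name) pairs for one product.
    """
    first_seen = {}
    for timestamp, name in sorted(detections):
        # Keep only the earliest detection of each class.
        first_seen.setdefault(name, timestamp)
    missing = [part for part in expected if part not in first_seen]
    return first_seen, missing

# Hypothetical detections for one product: the rubber ring never shows up.
detections = [
    (1.0, "forged material"),
    (3.5, "molded material"),
    (6.2, "spring"),
    (6.4, "spring"),
]
first_seen, missing = check_assembly(detections)
print(first_seen)
print(missing)
```

A non-empty `missing` list is the signal to stop the line or to flag the product for inspection.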

As the product is actually a train brake, this can be problematic, as the brake force decreases significantly in this case. Fortunately, the brake system is designed in such a way that the other brake devices can compensate for the breakdown.

And lastly, some words about the project of my heart, the electronic nose. The idea of our electronic nose is to develop a device that is able to identify smells and odors. The original plan was to develop a breathalyzer to identify COVID-19 infection. Although the prototype of our device was ready, we did not manage to find a medical partner to conduct the trials, because the system was overloaded. As we had the device, we started to measure the smells of objects and fluids that we found in the lab. The first trial was to identify whether there is sugar in a coffee or not, a task we managed to solve. Then we did experiments in the food industry to identify spoiled meat, milk, etc. We also work with the police to find drugs and explosives.

You may take a look at my homepage for more information on our results in this topic.

As you may have guessed, I am not an employee with a 9-to-5 job. I am an entrepreneur, and I manage my own company. It means that while being an AI expert I also have to manage my team and build the business. High-quality software has to be delivered. We have responsibilities, bugs, and warranty periods. This is where I learned how to build complex software systems, run a business, manage employees, manage projects, and manage clients.

Artificial intelligence can be found at the intersection of mathematics and computer science. It means that if you want to be good at it, you have to be both a mathematician and a computer scientist. Mathematics is needed to know what to implement. Computer science is needed to be able to implement it. This is what my degree is based on.

But I did not stop learning. I think that lifelong learning is important. This is the reason why I decided to keep a half-time position at the university. Here I work with students. They learn from me, and I learn from them. Fair deal.

The disadvantage of my hybrid life is the stress factor. Its advantage is that I have the opportunity to see the world from several different aspects. On balance, I find it worth it.

You may know the classic slogan: data is the new oil. To train a machine learning algorithm, you need data. You need a lot of data, and it has to be relevant. The essential property of machine learning algorithms is that they provide good-quality estimations in a familiar environment, in the domain where the training data resides. This property can be explained by the difference between interpolation and extrapolation. Interpolation usually provides better estimations than extrapolation. AI experts work in a similar way. You train them and feed them with information. The more domains the training data covers, the better the estimations and predictions they can provide. This mechanism is similar to image augmentation, where the images shown to a neural network are enriched artificially, a technique that leads to an improvement in the final quality.
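The difference between interpolation and extrapolation can be shown with a toy experiment. In the sketch below, a straight line is fitted by least squares to samples of a function on the interval from 0 to 1 and is then queried inside and far outside that interval. The choice of the function x squared, the sample points, and the query points are assumptions made only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def true_function(x):
    """The process we pretend not to know."""
    return x * x

# Training data covers only the interval [0, 1].
train_x = [0.0, 0.25, 0.5, 0.75, 1.0]
train_y = [true_function(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)

def abs_error(x):
    """Absolute error of the fitted line at x."""
    return abs(slope * x + intercept - true_function(x))

print("inside the training range :", abs_error(0.6))
print("outside the training range:", abs_error(3.0))
```

The fitted model is acceptable where it has seen data and becomes badly wrong far away from it, which is the property described above.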

You may conclude that I work too much. Yes, it is true. The reason is that I love my job. On the other hand, I have some experiences outside the matrix. I love my family, and I try to spend as much time as possible with my wife and daughter. I am pretty much into art, and sometimes I make art myself, sometimes together with my daughter. These days it is mostly photography; decades ago it was painting and clay sculpture. Sometimes it is music, guitar or saxophone. I managed to finish writing my book, which is a series of interconnected short novels. The illustrations are yet to be done. A few weeks ago I bought a Kiel Boat (Hungarian notation). The plan is to reduce my workload and do more sport.

We may finish with a quotation from Einstein: “Imagination is more important than knowledge.” I agree. Imagination is crucial in the life of a researcher. I would add one thing: “The stable ground”. I think that one can be creative in a field only if the essential knowledge is also available, is rich enough, and is represented properly.