Advancing Novel Textual Similarity-based Solutions in Software Development

Program for Development of Projects
in the field of Artificial Intelligence
(funded by Science Fund Republic of Serbia)

About project

News

The latest news from our project team will be published in this section.



4 scientific papers at the international IEEE conference TELFOR 2021

22.11.2021.

At the international conference "TELFOR 2021", researchers from the AVANTES project team will present 4 scientific papers from research on this project.
On Tuesday, November 23, 2021 (14:30, section: Software Tools 1), the paper "Software system for improving communication of children with disabilities in the Serbian language" (authors: Tamara Šekularac and Dr Dražen Drašković), will be presented.
On Wednesday, November 24, 2021 (17:00, section: Software Tools 6), an invited paper - "Semantic Similarity and Sentiment Analysis of Short Texts in Serbian", by Dr Vuk Batanović, will be presented. On the same day, the papers "US address classification based on text processing and machine learning" (17:20, section: Software Tools 6) by Ivana Munjas and Dr Vuk Batanović, and "Evaluation of text messages using convolutional neural networks" (8:30, section: Software tools 3) by Vladimir Otašević, Dr Dražen Drašković and Dr Boško Nikolić, will be presented.
All papers will be available on the IEEE Explore portal in December 2021, and abstracts can be read in the section "Results" > "Published Papers".



Our research team participates in the regional conference "DataScience 2021" and national summit

20.11.2021.

Six researchers from the AVANTES project team will take part in the "Data Science Conference 2021". A national "Data Science Summit" will be held in Belgrade on Monday, November 22, 2021, with the participation of researchers from relevant academic and scientific institutions, representatives of the IT industry, non-governmental organizations and international organizations, as well as representatives of the AI Institute. The topics of 11 round tables will include the following topics: NLP & Conversational AI, Ethical use of AI, Data-driven Sustainability, Model & Data Ops Implementation, Machine & Computer Vision, Collaboration possibilities between industry and academia, Commercial AI and ML, Open Data Projects and Possibilities, How to utilise AI Institute for ecosystem development, Impact of AI on Society, Seniority and benefits, where are the boundaries?

From 23th to 25th November, there will be a number of expert lectures by representatives of the IT industry from all over Europe, many workshops, and panel discussions.



AVANTES team in the network of European research institutions

06.10.2021.

Dražen Drašković, PhD, a member of our research team, received a Horizon 2020 cascading grant for the EUROPEAN FEDERATION OF DATA DRIVEN INNOVATION HUBS project (“EUHubs4Data”). In the two-year project for the period Jan. 2022-Dec. 2023, a research laboratory for data processing - "Belgrade Data Innovation Hub" will be formed at the University of Belgrade - School of Electrical Engineering, and two experiments on big data will be performed in collaboration with two other European research institutions. We hope that the national project AVANTES and the H2020 BEL-DIH are only the first step towards other national and European projects, in which our team will collaborate in the future.



Interview with our youngest member of the research team

01.10.2021.

Our youngest member of the AVANTES research team, Ms Marija Kostić is attending an internship for master and PhD students at the Google Research and Development Center in Amsterdam.In addition to engaging in a national research project in the field of artificial intelligence, she will complete her two-year master's degree in "Advanced Information Technology in Digital Transformation" (Master 4.0 program) this year, whereby she will achieve her second master's degree in information technologies.You can read the interview of the journalist of the weekly newspaper "Blic Žena" with Marija on this link.

TV show about science

27.08.2021.

In the show "From Science to Economy", we talked about our project AVANTES. You can watch the show at THIS LINK.

Annotation of the corpus

30.07.2021.

An annotation of the corpus for categorization of comments from the program code was published in the section "Resources".

Invitation to participate in the workshop

17.02.2021.

With the desire to make a short presentation about the goals and planned activities of our research team on the AVANTES project, but also other research and development projects in the field of natural language processing, machine learning, data analysis, we'll organize an online workshop on Thursday, February 25, 2021, at 12:00.

The workshop will consist of two sessions:
1) Presentation of the AVANTES project and research work in the above areas.
2) Round table, with discussions - representatives of ministries, NGOs, IT companies and academia.

We invite all interested researchers to apply via the following link.

Participation in the national conference "Serbian AI Meeting"

31.01.2021.

Members of the AVANTES project team participated in the national conference "Serbian AI Meeting" on December 18, 2020. More than 100 researchers from Serbia and our researchers, who work abroad, at universities, scientific institutes and well-known research and development centers of IT companies took part in this year's conference. The thematic areas that were covered: general artificial intelligence, formal logic and reasoning, machine learning and natural language processing. Slides of all lecturers can be found at the following link.

You can watch the full video of the event at the following link.



Vuk Batanovic defended his PhD dissertation

25.01.2021.

A member of our AVANTES team, Vuk Batanović, at the end of December 2020, defended his doctoral dissertation on "A methodology for solving semantic tasks in the processing of short texts written in natural languages with limited resources" under the mentorship of Prof. Boško Nikolić, PhD, and Prof. Miloš Cvetanović, PhD.

Congratulations to Vuk Batanović, PhD, on his dedicated work during his doctoral studies, and the preparation of his dissertation, and we wish him much success in his further research work.



Scientific article published in the prestigious scientific journal "PLOS ONE"

16.11.2020.

Scientific-research paper entitled "A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts" (authors: Vuk Batanović, Miloš Cvetanović, Boško Nikolić), was published in the prestigious scientific journal PLOS ONE. The abstract of the paper and the link to the scientific paper can be viewed in section Results - Published papers.

Virtual PSSOH conference

27.10.2020.

Members of our project team, Zaharija Radivojevic and Vuk Batanovic, participated in the third PSSOH conference, entitled "Application of free software and open hardware" organized by the University of Belgrade - School of Electrical Engineering, where they presented the results of their work, as part of AVANTES project activities .

Exhibition dedicated to scientific projects in the field of artificial intelligence

10.10.2020.

An exhibition dedicated to scientific projects in the field of artificial intelligence, organized by the Science Fund of the Republic of Serbia with the Center for the Promotion of Science, opened on the Sava Promenade at the Kalemegdan Fortress on Friday, October 9. Each scientific project is presented with a poster that describe the research. Members of the AVANTES project team attended the opening ceremony of the exhibition and talked to visitors. The exhibition is open until October 23.

Kick-off project team meeting

31.8.2020.

Мembers of the project team held the kick-off meeting where they took over their tasks for the period of 3 next months.



Project AVANTES highly ranked

15.8.2020.

Within the Program for the development of projects in the field of artificial intelligence, the Science Fund of the Republic of Serbia will finance 12 projects. Out of 70 project proposals in the public call closed on 31.1.2020, 6 projects were selected from basic research and 6 from applied research. Our scientific team and project proposal achieved an excellent result of 91 points on the final ranking list of projects and were ranked in a high second place out of 12 projects that will be funded over the next two years.



Project information

Acronym: AVANTES

The result of close cooperation between researchers from seemingly distant scientific fields will be a new system which will facilitate the work of software engineers, as well as linguists who study the Serbian language.

Period: Sept 2020 - Sept 2022

Budget: 198,261.12 €

An interdisciplinary research team will develop an intelligent tool for recognizing the semantic similarity between parts of a software system written in programming languages and comments in natural languages. The system will be able to recognize code clones, , while a special research focus will be directed towards solving the problem of cross-level semantic textual similarity primarily in Serbian, with comparison with the results obtained for the English language. Within the scope of this project, new methods for program code analysis will be used, which include the use of machine learning techniques and artificial intelligence.

In addition to the tool for determining the similarity between the parts of the software and the comments, a group of software engineers and linguists will develop a new semantic search algorithm for exploring code using natural language input. One of the goals is also to establish a collection of data and models for automatic Serbian language processing.

The AVANTES project is of great importance for Serbia because researchers will create complex datasets and introduce innovations into existing technologies for processing the Serbian language, for which far fewer resources are currently available than for other international languages such as English. This will facilitate not only the work of software engineering in Serbia, but also of linguists who study the Serbian language.

Team members

The team is multidisciplinary and consists of researchers from the School of Electrical Engineering, University of Belgrade, the Faculty of Philology, University of Belgrade and the Innovation Center of the School of Electrical Engineering.



Boško Nikolić, PhD

Principal Investigator

Zaharije Radivojević, PhD

Member of the project team

Dražen Drašković, PhD

Member of the project team

Vuk Batanović, PhD

Member of the project team

Vladimir Jocović, PhD candidate

Member of the project team

Tamara Šekularac, PhD candidate

Member of the project team

Marko Mićović, PhD candidate

Member of the project team

Uroš Radenković, PhD candidate

Member of the project team

Jelica Cincović, PhD candidate

Member of the project team

Adrian Milaković, PhD candidate

Member of the project team

Dušan Stojković, PhD candidate

Member of the project team

Aleksa Srbljanović, BSc EE

Member of the project team

Maja Miličević Petrović, PhD

Member of the project team

Radoslava Trnavac, PhD

Member of the project team

Tanja Samardžić, PhD

Member of the project team

Borko Kovačević, PhD

Member of the project team

Resources

This section will show the resources that will be published during the project.

Corpus annotation

objavljeno 30.07.2021.

Corpus annotation for categorization of comments from program code

Published papers

Papers from conferences and scientific journals will be published in this section.

  • V.Batanović et al., "Open Resources and Technologies for Serbian Language Processing"

    V.Batanović, N.Ljubešić, T. Samardžić, M. Miličević Petrović, "Open Resources and Technologies for Serbian Language Processing" (in Serbian), PSSOH conference, Belgrade, Oct. 2020
    Link: https://zenodo.org/record/4113230#.X6GcaohKiUk
    Abstract: The openness of language resources and tools is of great importance for increasing the quality and speed of development of technologies for computer processing of natural languages. This paper presents open resources for the processing of the Serbian language. Hand-annotated corpora are described, as well as a wider range of tools and computer models, including a web service that makes them easy to use.

  • V. Batanović, M.Cvetanović, B.Nikolić, "A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts", PLoS ONE 15(11): e0242050. https://doi.org/10.1371/journal.pone.0242050
    Link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242050
    Abstract:
    Choosing a comprehensive and cost-effective way of articulating and annotating the sentiment of a text is not a trivial task, particularly when dealing with short texts, in which sentiment can be expressed through a wide variety of linguistic and rhetorical phenomena. This problem is especially conspicuous in resource-limited settings and languages, where design options are restricted either in terms of manpower and financial means required to produce appropriate sentiment analysis resources, or in terms of available language tools, or both. In this paper, we present a versatile approach to addressing this issue, based on multiple interpretations of sentiment labels that encode information regarding the polarity, subjectivity, and ambiguity of a text, as well as the presence of sarcasm or a mixture of sentiments. We demonstrate its use on Serbian, a resource-limited language, via the creation of a main sentiment analysis dataset focused on movie comments, and two smaller datasets belonging to the movie and book domains. In addition to measuring the quality of the annotation process, we propose a novel metric to validate its cost-effectiveness. Finally, the practicality of our approach is further validated by training, evaluating, and determining the optimal configurations of several different kinds of machine-learning models on a range of sentiment classification tasks using the produced dataset.

  • S. Tubić, M. Cvetanović, Z. Radivojević, S. Stojanović, "Annotated functional decomposition", COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, pp. 1-13, March, 2021
    Link: https://onlinelibrary.wiley.com/doi/10.1002/cae.22394
    Abstract: Experiences gained from the domain‐specific courses showed that students focus mostly on how to implement solutions and less on what must be considered within the solution. In the case of information systems-related courses, students focus on system development using specific languages and frameworks while often disregard the required logical checks and constraints. This paper introduces annotated functional decomposition (AFD) to assist students in overcoming the challenge of understanding the logic of an information system. AFD leverages methodological concepts from computational thinking and represents a problem decomposition approach that is extended with additional levels of decomposition. These levels of decomposition are orthogonal and implemented with annotations that enrich a decomposition with information regarding control and data flow, as well as reuse and implementation details. AFD could be exercised with a supporting AFD Tool developed as an Eclipse IDE plugin that performs syntax and semantic checks along with the generation of UML sequential diagrams. The AFD Tool and its source code are available free of charge. Quantitative and qualitative evaluations of AFD Tool usage during an information systems course revealed that students who used AFD achieved higher average grades than those who used UML for solving the same problems, and moreover that students perceived AFD as easy to understand and use.

  • M. Kotlar, M. Punt, Z. Radivojević, M. Cvetanović, V.Milutinović, "Novel Meta-Features for Automated Machine Learning Model Selection in Anomaly Detection", IEEE ACCESS (Volume: 3), pp. 89675 - 89687, June, 2021
    Link: https://ieeexplore.ieee.org/document/9461173
    Abstract: A growing number of research papers shed light on automated machine learning (AutoML) frameworks, which are becoming a promising solution for building complex machine learning models without human expertise and assistance. The key challenge in enabling AutoML frameworks to build an efficient model for anomaly detection tasks is to determine the best underlying model for a given task and optimization metric. The meta-learning approaches based on a set of meta-features that describes data properties can enable efficient model selection in AutoML frameworks. The existing meta-learning approaches based on statistical and information-theoretic meta-features require large amounts of data and computational resources to extract data properties. This paper proposes a novel set of meta-features for model selection in anomaly detection tasks based on domain-specific properties of data which overcomes the shortcomings of existing meta-features by introducing simple but effective meta-features that can be efficiently extracted or estimated by using a low amount of data. Experiments with 63 datasets from different repositories with varying schemas show that the proposed set of meta-features achieves an accuracy of 87% for model selection, while the achieved accuracy for simple meta-features is 74%, for statistical meta-features 68%, for information theory meta-feature 70%, and for a comprehensive set of meta-features by pyMFE 73%. This demonstrates that the proposed set can be adopted by AutoML frameworks across a diverse range of domains.

  • V. Batanović, "Semantic Similarity and Sentiment Analysis of Short Texts in Serbian", 29th Telecommunications Forum "TELFOR 2021", IEEE Serbia & Montenegro, November 2021
    Link: https://www.telfor.rs/program-naucne-sekcije
    Abstract: This paper presents an overview of the open access datasets in Serbian that have been manually annotated for the tasks of semantic textual similarity and short-text sentiment classification. In addition, it describes several kinds of statistical models that have been trained and evaluated on these datasets and discusses their results.

  • V.Otašević, D.Drašković, B.Nikolić, "Evaluation of text messages using convolutional neural networks", 29th Telecommunications Forum "TELFOR 2021", IEEE Serbia & Montenegro, November 2021
    Link: https://www.telfor.rs/program-naucne-sekcije
    Abstract: The impossibility of defining the numerical value of text is a major issue when it is necessary to analyze users’ comments and feedbacks about provided services. This paper presents a tool able to overcome this issue. The developed tool, which evaluates text messages, is based on a convolutional neural network. The main goal of this paper is to present the result of the research and tool’s performance in solving the problem of determining the value of text messages.

  • I. Munjas, V.Batanović, "US address classification based on text processing and machine learning", 29th Telecommunications Forum "TELFOR 2021", IEEE Serbia & Montenegro, November 2021
    Link: https://www.telfor.rs/program-naucne-sekcije
    Abstract: Addresses represent a crucial type of textual data for real estate companies. In order to identify, fix, or remove incorrect entries, we categorize addresses into one of six predefined classes. In this context, we explore the effects of different text processing and classification methods. The best results are obtained by using non-linear classifiers with a combination of unigram and bigram features.

  • T.Šekularac, D.Drašković, "Software system for improving communication of children with disabilities in the Serbian language", 29th Telecommunications Forum "TELFOR 2021", IEEE Serbia & Montenegro, November 2021
    Link: https://www.telfor.rs/program-naucne-sekcije
    Abstract: This paper presents a software system designed as communication aid for children with disabilities. The system generates digitized speech using a dictionary of symbols in the matrix form, which can be extended and modified, as well as the corresponding audio recordings and visual representations of symbols. The creators of this system managed to eliminate the shortcomings of the existing software solutions, namely the lack of flexibility and accessibility, and inability to cater for individual differences. The system is realized as a multiplatform application using Xamarin technology.

Contact Us

Location:

Belgrade 11000, Bulevar kralja Aleksandra 73

Loading
Your message has been sent. Thank you!