dlcp21:abstracts

Differences

This shows you the differences between two versions of the page.

dlcp21:abstracts [22/06/2021 00:03] – [Identifying partial differential equations of land surface schemes in INM climate models with neural networks] admin
dlcp21:abstracts [22/06/2021 16:14] – [Performance of convolutional neural networks processing simulated IACT images in the TAIGA experiment] admin
Line 114: Line 114:
  
 The problem of forecasting the evolution of the Covid-19 pandemic is highly relevant because of the need to plan hospital bed demand and containment policies. Because the available time series of Covid-19 evolution are limited, machine learning algorithms for pandemic time series data must not be overly complex, and their efficiency depends mainly on the correct selection of highly meaningful features. We consider one such feature to be the number of tweets in which Internet users report having Covid-19. Extracting such tweets is complicated by the lack of a Russian tweet dataset for training machine learning models to identify this category of tweets. In this work we present a corpus of about 10,000 tweets labelled with the following classes: the Ill class, when the authors declare that they are ill; the Recovered class, when the authors declare that they have been ill and have recovered; and the Others class for all other cases. Using this corpus, a data-driven model based on the XLM-RoBERTa language model has been trained. It achieves an F1-macro score of 0.85 on the binary task (class 1: Ill / Recovered, class 2: Others) and 0.60 on the five-class task (Ill with high/low confidence, Recovered with high/low confidence, Others). These results outperform the RuDR-BERT model by 5 to 7%. The trained XLM-RoBERTa model has been applied to the binary classification of 486,000 unlabelled tweets. Based on the model's predictions, a curve of Covid-19 cases in Moscow from January 1, 2020 to March 1, 2021 has been plotted. This curve has been compared with the official statistics on confirmed cases, and a correlation analysis of the two curves, shifted relative to one another by 1 to 5 days, has been performed. The correlation is highest when the official curve is shifted 4 days into the future relative to the predicted one. Therefore, the data from the collected corpus lead the official statistics by 4 days. Thus, the number of collected tweets has proven to be a helpful input feature for pandemic forecasting models.
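 The shifted-curve comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the correlation coefficient, so Pearson correlation is assumed here, the function names `pearson` and `best_lag` are hypothetical, and the two series are synthetic (the "official" series is simply the "predicted" one delayed by 4 days).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def best_lag(predicted, official, max_lag=5):
    """Correlate predicted[t] with official[t + lag] for lag = 1..max_lag.

    Returns the lag with the highest correlation and all per-lag scores.
    """
    scores = {}
    for lag in range(1, max_lag + 1):
        # Drop the last `lag` predicted days and the first `lag` official days
        # so that day t of the prediction lines up with day t + lag officially.
        scores[lag] = pearson(predicted[:-lag], official[lag:])
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic daily counts (assumption, for illustration only).
predicted = [(7 * t + 3) % 11 + 0.1 * t for t in range(60)]
# The "official" curve repeats the predicted one with a 4-day delay.
official = [0.0] * 4 + predicted[:-4]

best, scores = best_lag(predicted, official)
print(best, round(scores[best], 3))
```

 On this synthetic input the search recovers the 4-day delay that was built into the data; with real tweet counts and case statistics the per-lag correlations would differ far less sharply.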
 +
 +===== Performance of convolutional neural networks processing simulated IACT images in the TAIGA experiment =====
 +
 +**Stanislav Polyakov**, SINP MSU, Russia \\ 
 +Alexander Kryukov, SINP MSU, Russia \\ 
 +Evgeny Postnikov, SINP MSU, Russia
 +
 +//Short presentation (15 min)//
 +
 +Extensive air showers created by high-energy particles interacting
 +with the Earth's atmosphere can be detected using imaging atmospheric
 +Cherenkov telescopes (IACTs). The IACT images can be analyzed to
 +distinguish between the events caused by gamma rays and by hadrons and
 +to infer the parameters of the event such as the energy of the primary
 +particle. We use convolutional neural networks (CNNs) to analyze Monte
 +Carlo-simulated images of the telescopes of the TAIGA experiment. The
 +analysis includes selection of the images corresponding to the showers
 +caused by gamma rays and estimates of the energy of the gamma rays. We
 +compare the performance of CNNs using images from a single telescope
 +and CNNs using images from two telescopes as inputs. \\ 
 +//Keywords: deep learning; convolutional neural networks; gamma astronomy;
 +extensive air shower; IACT; stereoscopic mode; TAIGA//
 ===== EVALUATION OF MACHINE LEARNING METHODS FOR RELATION EXTRACTION BETWEEN DRUG ADVERSE EFFECTS AND MEDICATIONS IN RUSSIAN TEXTS OF INTERNET USER REVIEWS =====
  
dlcp21/abstracts.txt · Last modified: 22/06/2021 19:54 by admin