Usage

Currently, we provide support to about 550 users involved in more than 100 research projects led by the communities of INFN and the University of Padua. Both numbers continue to grow over time.

Resource usage in the last month

As an indication of how CloudVeneto is used, brief descriptions of some of the projects follow.

AI for Efficient Cherenkov Image Analysis

Cherenkov telescope images of atmospheric showers produced by cosmic rays and gamma rays are well suited to AI analysis: classifying events, estimating the energy of the primary gamma ray, and determining its direction. Deep learning, especially convolutional neural networks, is being explored for its ability to detect rare events, outperforming traditional methods on challenging cases such as multiple gamma rays or showers from heavy nuclei.

Dr. Rubèn Lopez Coto

CMS experiment: precision measurements and new physics searches at the LHC

The CMS experiment at CERN searches for new particles, as in the discovery of the Higgs boson, and performs precision measurements such as particle-antiparticle asymmetries. The Padua CMS group is involved in these studies, analyzing large data sets from the detector. CloudVeneto's computing resources are used for data reconstruction, simulations, statistical analysis, and machine learning models that distinguish signals from background processes.

Dr. Jacopo Pazzini

Simulation of the interaction of laser beams with glass material

The QuantumFuture group at DEI has used CloudVeneto to study and simulate the interaction of high-power laser beams in the 10-micron band with glass material to optimize the process of drilling glass vials for medical purposes.

Dr. Marco Avesani

Bootstrap analysis for code validation

For the validation of BOOGIE2, software that predicts blood groups from genetic data, a bootstrap-analysis platform was set up on CloudVeneto to generate 10,000 virtual patients for testing the code.

Dr. Ivan Mičetić
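
A minimal sketch of the bootstrap step described above, assuming the virtual patients are generated by resampling a reference cohort with replacement; the toy genotype matrix below is invented for illustration, and the actual BOOGIE2 inputs and interfaces are not shown.

    # Hedged sketch: build "virtual patients" by bootstrap resampling.
    # The cohort matrix is toy data, not the actual BOOGIE2 inputs.
    import numpy as np

    rng = np.random.default_rng(42)
    cohort = rng.integers(0, 3, size=(500, 120))  # 500 patients x 120 genotype calls

    def bootstrap_patients(cohort, n_virtual=10_000):
        # Resample patient rows with replacement to build a virtual cohort.
        idx = rng.integers(0, cohort.shape[0], size=n_virtual)
        return cohort[idx]

    virtual = bootstrap_patients(cohort)
    print(virtual.shape)  # (10000, 120): inputs for exercising the predictor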

 

BLAST (sequence alignment) of plant genomes

This project, in collaboration with the E. Mach Institute, analyzed sequence similarity in a database of 300,000 plant protein sequences from commercially important plants like apple, strawberry, and coffee. Self-alignment studies were conducted to cluster biologically significant sequences. CloudVeneto's platform was used to parallelize the task, reducing computation time.

Dr. Ivan Mičetić
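
A minimal sketch of how such an all-against-all BLAST task can be parallelized on a CloudVeneto machine, assuming the NCBI BLAST+ blastp binary, a pre-built protein database named plants, and query files pre-split into chunks; all names and parameters are illustrative.

    # Hedged sketch: run blastp over pre-split query chunks in parallel.
    # Paths, database name, and chunk count are illustrative.
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    CHUNKS = [f"chunk_{i:03d}.fasta" for i in range(32)]  # pre-split queries

    def run_blast(chunk):
        # Align one chunk of queries against the full protein database.
        subprocess.run(
            ["blastp", "-query", chunk, "-db", "plants",
             "-outfmt", "6", "-evalue", "1e-5",
             "-out", chunk + ".tsv"],
            check=True,
        )
        return chunk

    with ProcessPoolExecutor(max_workers=8) as pool:
        for done in pool.map(run_blast, CHUNKS):
            print("finished", done)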

 

Simulation of chemical compound recognition with target proteins

The project running on the CloudVeneto platform simulates the recognition process between chemical compounds and target proteins using a molecular docking approach. Our laboratory (MMS) maintains MMsINC, a public chemical library of around 5 million compounds used in screening for new drug candidates. Each docking run generates approximately 5 plausible ligand-protein complexes per compound for a given target protein, so a full screen of the library yields around 25 million complexes per case study.

Prof. Stefano Moro

Nuclear reaction dynamics calculations using the AMD code

Antisymmetrized Molecular Dynamics (AMD) is a code that simulates nuclear reaction dynamics, incorporating particle structures and correlations. The NUCLEX collaboration runs AMD on a CloudVeneto virtual machine cluster with parallel processing via OpenMPI. This setup reduces the computation time for 50,000 events from several months to just 5-7 days.

Dr. Tommaso Marchi, Dr. Magda Cicerchia
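
AMD itself is a compiled simulation code; the sketch below only illustrates the parallelization pattern described (blocks of events distributed over MPI ranks) using mpi4py, with run_amd_events standing in for an invocation of the real executable. It would be launched with something like mpirun -np 16 python driver.py.

    # Hedged sketch: distribute 50,000 AMD events over MPI ranks.
    # run_amd_events() stands in for launching the real AMD executable.
    from mpi4py import MPI

    def run_amd_events(first, count):
        # Placeholder for running AMD on a contiguous block of events.
        return f"events {first}..{first + count - 1} done"

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    TOTAL = 50_000
    per_rank = TOTAL // size
    first = rank * per_rank
    count = per_rank + (TOTAL % size if rank == size - 1 else 0)

    for line in comm.gather(run_amd_events(first, count), root=0) or []:
        print(line)  # only rank 0 receives the gathered results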

Data analysis software for CTA's LST telescopes

The Padua group, involved in building components for the Large-Sized Telescopes of CTA project, is refining the data analysis software. Using CloudVeneto resources, tools for event reconstruction are being developed, simulating how the telescope observes atmospheric showers and defining methods to determine the direction and energy of gamma rays. CloudVeneto will be used to analyze the first scientific data from the telescope inaugurated in La Palma.

Dr. Rubèn Lopez Coto

Innovative data analysis systems for CMS

The CMS experiment at CERN produces tens of PB of data annually. The Padua CMS group aims to redefine high-energy physics computing by integrating modern Big Data technologies. The group also develops innovative real-time data acquisition techniques based on fast data streaming with Apache Kafka. CloudVeneto provides dedicated clusters for these activities, improving the efficiency of both data analysis and acquisition.

Dr. Jacopo Pazzini

Post-processing and analysis of data from a quantum random number generator

The QuantumFuture group at DEI has utilized CloudVeneto for a project involving a quantum random number generator with data rates of tens of Gbps. On CloudVeneto, post-processing and analysis of data from the physical generator were performed. We appreciated having total control over the resources and the ability to dynamically manage the number of machines and resources.

Dr. Marco Avesani
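
The group's exact post-processing chain is not described here; a common post-processing step for quantum random number generators is randomness extraction by Toeplitz hashing, sketched below with illustrative block sizes.

    # Hedged sketch: Toeplitz-hashing randomness extraction, a standard
    # post-processing step for quantum RNGs. Block sizes are illustrative.
    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)
    N_IN, N_OUT = 1024, 512            # raw bits in, extracted bits out

    # A random seed defines the Toeplitz matrix (first column, first row).
    col = rng.integers(0, 2, N_OUT)
    row = rng.integers(0, 2, N_IN)
    row[0] = col[0]
    T = toeplitz(col, row)

    raw = rng.integers(0, 2, N_IN)     # stand-in for raw generator output
    extracted = T.dot(raw) % 2         # matrix-vector product over GF(2)
    print(extracted[:16])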

Prediction of predisposition to obesity

Using genomic data from the Personal Genome Project, we tested and optimized various classifiers on CloudVeneto (SVM and random forest, tuning their respective hyperparameters) to predict body mass index and, consequently, predisposition to obesity.

Dr. Ivan Mičetić
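
A minimal scikit-learn sketch of this kind of classifier comparison; the synthetic data and parameter grids below are illustrative, not the actual genomic features or the tuned grids.

    # Hedged sketch: grid-search an SVM and a random forest, as in the
    # classifier comparison described above. The data here is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=400, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    searches = {
        "svm": GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}),
        "rf": GridSearchCV(RandomForestClassifier(random_state=0),
                           {"n_estimators": [100, 300], "max_depth": [None, 10]}),
    }
    for name, gs in searches.items():
        gs.fit(X_tr, y_tr)
        print(name, gs.best_params_, round(gs.score(X_te, y_te), 3))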

Disorder prediction over large databases

The Computational Biology Lab of the Department of Biomedical Sciences maintains a database of structural annotations for disordered regions in protein sequences, originally covering 80 million sequences. An update is underway to recalculate annotations for over 130 million sequences. CloudVeneto's platform is used to parallelize the annotation pipeline, reducing computation time and speeding up the release cycle.

Dr. Ivan Mičetić
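
A hedged sketch of the parallelization pattern: sequences are streamed in batches through a pool of workers, with annotate standing in for the real disorder predictors of the pipeline.

    # Hedged sketch: parallelize a per-sequence annotation pipeline.
    # annotate() is a toy stand-in for the real disorder predictor.
    from concurrent.futures import ProcessPoolExecutor
    from itertools import islice

    def annotate(seq):
        # Placeholder scoring; real pipelines run predictors per sequence.
        return sum(1 for aa in seq if aa in "PEQS")

    def batches(iterable, size=1000):
        it = iter(iterable)
        while chunk := list(islice(it, size)):
            yield chunk

    sequences = ["MKTAYIAKQR", "PPEEQQSSPP"] * 5000  # stand-in for 130M entries
    with ProcessPoolExecutor() as pool:
        for batch in batches(sequences):
            for score in pool.map(annotate, batch, chunksize=100):
                pass  # write each annotation to the database here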

Analysis of differential RNAseq data and pathway mapping

We have a cloud-hosted platform in place. The installation was smooth, and the bioinformatics service published on it is progressing well, serving the scientific community interested in differential RNAseq data analysis and pathway mapping.

Prof. Stefano Toppo


Some information about current vCPU usage

Fuller descriptions of several of these projects follow.

Using AI for efficient analysis of Cherenkov Telescope Images

The images of atmospheric showers produced by cosmic rays and gamma rays captured by Cherenkov telescopes are well-suited for analysis by artificial intelligence for the purpose of:

  1. Classifying an event as originating from a gamma ray or a proton (or heavier nucleus)
  2. Determining the most probable energy of the primary gamma ray
  3. Determining the direction of the primary gamma ray with good precision.

Several deep learning methodologies are under study, particularly convolutional neural networks trained on simulated samples, whose performance will be compared with more traditional analysis methods. The advantage of the “deep learning” approach is the ability to conduct searches for rare and specific events such as multiple gamma rays (bosonic condensates) or images of showers produced by “heavy nuclei” in cosmic rays, which are difficult to classify with analytical methods.

Dr. Rubèn Lopez Coto
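
As a minimal illustration of the approach (not the group's actual network), a small convolutional classifier separating gamma-ray from proton images is sketched below, assuming camera images preprocessed into fixed-size arrays; the layer sizes and random training data are illustrative.

    # Hedged sketch of a CNN gamma/proton classifier (illustrative sizes).
    import numpy as np
    from tensorflow.keras import layers, models

    def build_classifier(input_shape=(64, 64, 1)):
        # Small convolutional stack; real analyses tune depth and filters
        # on simulated shower images.
        model = models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dense(1, activation="sigmoid"),  # P(gamma) vs proton
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

    # Train on simulated camera images (here: random placeholders).
    x = np.random.rand(128, 64, 64, 1).astype("float32")
    y = np.random.randint(0, 2, size=128)
    build_classifier().fit(x, y, epochs=1, batch_size=32)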

Development of data analysis software for the Large-Sized Telescopes of the Cherenkov Telescope Array

 

CTA is a new project for particle astrophysics. The Padua group is involved in the construction of the Large Sized Telescopes (LST): in addition to building telescope components, Padua, along with the Barcelona group, is responsible for refining the data analysis software. The first telescope was recently inaugurated in the Canary Islands, on the island of La Palma. Through CloudVeneto resources, we are currently preparing tools for event reconstruction, which involves simulating how the telescope observes atmospheric showers and defining strategies for determining the direction and energy of gamma rays impacting the atmosphere. Soon, when experimental data becomes available, we will also utilize CloudVeneto resources for the analysis of the initial scientific data.

Dr. Rubèn Lopez Coto

CMS Experiment: Precision Measurements and Searches for New Physics in Proton-Proton Collisions at the LHC

 

The CMS (Compact Muon Solenoid) experiment focuses on detecting particles produced in proton collisions generated by the Large Hadron Collider (LHC) at CERN, with the aim of conducting an extensive campaign of measurements. Among these, notable examples include searches for ‘new’ particles, such as the discovery of the Higgs boson, or precision measurements of fundamental properties of nature, as in the case of measuring particle-antiparticle asymmetries.

The CMS group at the Department of Physics and Astronomy of Padua and the INFN section of Padua is actively involved in many of these studies, based on the analysis of vast amounts of data collected by the detector’s sensors.

For the reconstruction and subsequent analysis of this data, the use of computing tools capable of tackling computationally intensive tasks is necessary. These tasks include simulating numerous interactions between particles and the detector’s response, reconstructing events with highly complex topologies, performing statistical analysis for estimating confidence intervals, or developing classifiers based on neural networks for discriminating possible signals from the expected background from known processes in the detector.

To this end, the resources provided by the CloudVeneto infrastructure are extensively utilized through dedicated clusters, including clusters with ‘elastic’ resource allocation, to run the experiment’s reconstruction software (CMSSW, based on Python and C++) and development environments for multivariate algorithms such as TMVA, TensorFlow, and Theano.

Dr. Jacopo Pazzini
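
As a minimal illustration of the last point, a small Keras network discriminating signal from background events is sketched below; the input features, architecture, and random data are invented for illustration and are not the experiment's actual configuration.

    # Hedged sketch: a small dense network separating "signal" from
    # "background" events, in the spirit of the classifiers described above.
    import numpy as np
    from tensorflow.keras import layers, models

    # Toy kinematic features; real analyses use reconstructed quantities.
    x = np.random.rand(1000, 12).astype("float32")
    y = np.random.randint(0, 2, size=1000)        # 1 = signal, 0 = background

    model = models.Sequential([
        layers.Input(shape=(12,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),    # signal probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=2, batch_size=64, validation_split=0.2)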

Development of innovative systems for data analysis and acquisition using ‘Big Data’ techniques for the CMS experiment

 

The CMS (Compact Muon Solenoid) experiment at CERN studies the outcomes of proton-proton collisions produced by the Large Hadron Collider: having to deal with the analysis of signals from over 70 million detector readout channels every 25 ns, tens of PB of data are produced annually. Analyzing these large datasets to search for rare and highly elusive signals requires intensive use of extensive computing resources, both for the selection and processing of data collected ‘online’ during acquisition, and in the subsequent ‘offline’ data analysis.

 

The CMS group at the Department of Physics and Astronomy of Padua, and the INFN section of Padua collaborate with CERN with the aim of redefining the computing paradigm in high-energy physics analyses through the integration of modern technologies for processing large datasets, commercially known as Big Data. To achieve this goal, software infrastructures (Apache Spark, Apache Mesos, Kubernetes) are employed to optimize the utilization of available computing resources, resulting in a reduction of several orders of magnitude in data processing time. Additionally, the group is involved in developing innovative techniques for real-time (online) acquisition and processing of the enormous amount of signals directly from experiment sensors through the integration of software systems capable of fast data streaming (based on Apache Kafka) towards computing clusters based on Apache Spark.

The resources of the CloudVeneto infrastructure are used in both of these activities through the creation of dedicated Apache Spark/Mesos and Kubernetes clusters.

Dr. Jacopo Pazzini
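
A minimal PySpark sketch of the streaming pattern described: reading an event stream from a Kafka topic with Spark Structured Streaming. The broker address, topic name, and console sink are illustrative, and the spark-sql-kafka connector must be available on the cluster.

    # Hedged sketch: consume a Kafka event stream with Spark Structured
    # Streaming. Broker, topic, and sink are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .appName("kafka-stream-sketch")
             .getOrCreate())

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "detector-events")
              .load())

    # Kafka delivers key/value as bytes; decode before downstream analysis.
    decoded = events.select(col("value").cast("string").alias("payload"))

    query = (decoded.writeStream
             .format("console")   # real pipelines write to storage/analysis
             .outputMode("append")
             .start())
    query.awaitTermination()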

Simulation of the recognition process between chemical compounds and target proteins

The project we are carrying out using the CloudVeneto platform involves simulating the recognition process between chemical compounds and target proteins through a molecular docking approach. Currently, in our laboratory (MMS), we have virtually archived a chemical library of approximately 5 million compounds commonly used in screening for identifying new drug candidates.

The archive, called MMsINC, is in the public domain. A molecular docking simulation typically produces 5 plausible ligand-protein complexes per compound for each selected target protein, for a maximum total of approximately 25 million complexes per case study. In our laboratory, this screening campaign is managed by distributing the docking runs across many parallel processes.

We are also planning to launch a web service, which we have already named MMSDockCloud, to serve as an access portal to the computation service described above.

Prof. Stefano Moro
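
A hedged sketch of how such a screening campaign can be distributed across processes; AutoDock Vina is used here purely for illustration (the text does not specify the MMS laboratory's docking engine), with --num_modes 5 matching the 5 poses per compound described above, and all file names invented.

    # Hedged sketch: distribute docking jobs over worker processes.
    # AutoDock Vina and the file names are illustrative stand-ins.
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    LIGANDS = [f"ligand_{i:07d}.pdbqt" for i in range(1000)]  # subset of ~5M

    def dock(ligand):
        # One docking run: up to 5 poses per ligand, as in the text.
        subprocess.run(
            ["vina", "--receptor", "target.pdbqt", "--ligand", ligand,
             "--num_modes", "5",
             "--out", ligand.replace(".pdbqt", "_out.pdbqt"),
             "--center_x", "0", "--center_y", "0", "--center_z", "0",
             "--size_x", "20", "--size_y", "20", "--size_z", "20"],
            check=True,
        )
        return ligand

    with ProcessPoolExecutor(max_workers=16) as pool:
        for done in pool.map(dock, LIGANDS):
            print("docked", done)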
