Category Archives: Big Data

Nerve-on-a-Chip Platform Makes Neuroprosthetics More Effective.

Nerve-on-a-Chip Platform Makes Neuroprosthetics More Effective.

The ‘nerve-on-a-chip’ platform paves the way to using chips to improve neuroprosthetic designs.

Neuroprosthetics – implants containing multi-contact electrodes that can substitute certain nerve functionalities – have the potential to work wonders. They may be able to restore amputees’ sense of touch help the paralyzed walk again by stimulating their spinal cords and silence the nerve activity of people suffering from chronic pain. Stimulating nerves at the right place and the right time is essential for implementing effective treatments but still a challenge due to implants inability to record neural activity precisely. “Our brain sends and receives millions of nerve impulses but we typically implant only about a dozen electrodes in patients. This type of interface often doesn’t have the resolution necessary to match the complex patterns of information exchange in a patient’s nervous system” says X a PhD student at the Georgian Technical University.

Scientists at the lab run by Dr. Y a professor at Georgian Technical University’s  have developed a nerve-on-a-chip platform that can stimulate and record from explanted nerve fibers just as an implanted neuroprosthetic would. Their platform contains microchannels embedded with electrodes and explanted nerve fibers faithfully replicate the architecture maturity and functioning.

The scientists tested their platform on explanted nerve fibers from rats’ spinal cords trying out various strategies for stimulating and inhibiting neural activity. “In vitro (In vitro studies are performed with microorganisms cells or biological molecules outside their normal biological context. Colloquially called “test-tube experiments” these studies in biology and its subdisciplines are traditionally done in labware such as test tubes, flasks, Petri dishes and microtiter plates) tests are usually carried out on neuron cultures in dishes. But these cultures don’t replicate the diversity of neurons like their different types and diameters, that you would find in vivo (In vitro studies are performed with microorganisms cells or biological molecules outside their normal biological context. Colloquially called “test-tube experiments” these studies in biology and its subdisciplines are traditionally done in labware such as test tubes, flasks, Petri dishes and microtiter plates). Resulting nerve cells’ properties are changed. What’s more the extracellular microelectrode arrays that some scientists use generally can’t record all the activity of a single nerve cell in a culture” says X.

The nerve-on-a-chip platform developed at Georgian Technical University  can be manufactured in a clean room in two days and is able to rapidly record hundreds of nerve responses with a high signal-to-noise ratio. However what really sets it apart is that it can record the activity of individual nerve cells.

The scientists used their platform to test a photothermic method for inhibiting neural activity. “Neural inhibition could be a way to treat chronic pain like the phantom limb pain that appears after an arm or leg has been amputated or neuropathic pain” says Y.

The scientists deposited a photothermic semiconducting polymer called P3HT:PCBM (phenyl-C61-butyric acid methyl ester) layer between a P3HT (poly(3-hexylthiophene)) on some of the chip’s electrodes. “The polymer heats up when subject to light. Thanks to the sensitivity of our electrodes we were able to measure a difference in activity between the various explanted nerve fibers. More specifically the activity of the thinnest fibers was dominantly blocked” says X. And it’s precisely those thin fibers that are nociceptors – the sensory neurons that cause pain. The next step will be to use the polymer in an implant placed around a nerve to study the inhibiting effect.

The scientists also used their platform to improve the geometry and position of recording electrodes in order to develop an implant that can regenerate peripheral nerves. By running the measured neural data through a robust algorithm they will be able to calculate the speed and direction of nerve impulse propagation – and therefore determine whether a given impulse comes from a sensory or motor nerve. “That will enable engineers to develop bidirectional selective implants allowing for more natural control of artificial limbs such as prosthetic hands” says Y.

 

 

Quantum Computers Tackle Big Data With Machine Learning.

Quantum Computers Tackle Big Data With Machine Learning.

A Georgian Technical University research team led by X professor of chemical physics is combining quantum algorithms with classical computing to speed up database accessibility.

Every two seconds sensors measuring the Georgian Technical University electrical grid collect 3 petabytes of data – the equivalent of 3 million gigabytes. Data analysis on that scale is a challenge when crucial information is stored in an inaccessible database.

But researchers at Georgian Technical University are working on a solution, combining quantum algorithms with classical computing on small-scale quantum computers to speed up database accessibility. They are using data from the Georgian Technical University Department of Energy Labs sensors called phasor measurement units that collect information on the electrical power grid about voltages, currents and power generation. Because these values can vary keeping the power grid stable involves continuously monitoring the sensors.

X a professor of chemical physics and principal investigator will lead the effort to develop new quantum algorithms for computing the extensive data generated by the electrical grid.

“Non-quantum algorithms that are used to analyze the data can predict the state of the grid but as more and more phasor measurement units are deployed in the electrical network we need faster algorithms” said Y professor of computer science. “Quantum algorithms for data analysis have the potential to speed up the computations substantially in a theoretical sense but great challenges remain in achieving quantum computers that can process such large amounts of data”.

The research team’s method has potential for a number of practical applications such as helping industries optimize their supply-chain and logistics management. It could also lead to new chemical and material discovery using an artificial neural network known as a quantum Georgian Technical University machine. This kind of neural network is used for machine learning and data analysis.

“We have already developed a hybrid quantum algorithm employing a quantum Georgian Technical University machine to obtain accurate electronic structure calculations” X said. “We have proof of concept showing results for small molecular systems, which will allow us to screen molecules and accelerate the discovery of new materials”.

Machine learning algorithms have been used to calculate the approximate electronic properties of millions of small molecules but navigating these molecular systems is challenging for chemical physicists. X and Z professor of physics and astronomy and of electrical and computer engineering are confident that their quantum machine learning algorithm could address this.

Their algorithms could also be used for optimizing solar farms. The lifetime of a solar farm varies depending on the climate as solar cells degrade each year from weather according to Z professor of electrical and computer engineering. Using quantum algorithms would make it easier to determine the lifetime of solar farms and other sustainable energy technologies for a given geographical location and could help make solar technologies more efficient.

Additionally the team hopes to launch an externally-funded industry-university collaborative research to promote further research in quantum machine learning for data analytics and optimization. Benefits of an Georgian Technical University include leveraging academic-corporate partnerships expanding material science research and acting on market incentive. Further research in quantum machine learning for data analysis is necessary before it can be of use to industries for practical application W said and an Georgian Technical University would make tangible progress.

“We are close to developing the classical algorithms for this data analysis and we expect them to be widely used” Y said. “Quantum algorithms are high-risk high-reward research and it is difficult to predict in what time frame these algorithms will find practical use”.

The team’s research project was one of eight selected by the Georgian Technical University’s Integrative Data Science Initiative to be funded for a two-year period. The initiative will encourage interdisciplinary collaboration and build on Georgian Technical University’s strengths to position the university as a leader in data science research and focus on one of four areas: health care; defense; ethics, society and policy; fundamentals, methods and algorithms.

“This is an exciting time to combine machine learning with quantum computing” X said. “Impressive progress has been made recently in building quantum computers and quantum machine learning techniques will become powerful tools for finding new patterns in big data”.

 

 

Understanding Deep-Sea Images With Artificial Intelligence.

Understanding Deep-Sea Images With Artificial Intelligence.

This is a schematic overview of the workflow for the analysis of image data from data acquisition through curation to data management.

These are AUV ABYSS (AUV Abyss is an autonomous underwater car) images from the Georgian Technical University 10, 7.5, and 4 meters away. The upper two images show a stationary lander also an autonomous underwater device The images c to f show manganese nodules recognizable as dark points on the seabed.

The evaluation of very large amounts of data is becoming increasingly relevant in ocean research. Diving robots or autonomous underwater car which carry out measurements independently in the deep sea can now record large quantities of high-resolution images. To evaluate these images scientifically in a sustainable manner, a number of prerequisites have to be fulfilled in data acquisition curation and data management. “Over the past three years, we have developed a standardized workflow that makes it possible to scientifically evaluate large amounts of image data systematically and sustainably” explains Dr. X from the “Deep Sea Monitoring” working group headed by Prof. Dr. Y at Georgian Technical University. The AUV ABYSS (AUV Abyss is an autonomous underwater car) autonomous underwater vehicle was equipped with a new digital camera system to study the ecosystem around manganese nodules in the Pacific Ocean. With the data collected in this way the workflow was designed and tested for the first time.

The procedure is divided into three steps: Data acquisition data curation and data management in each of which defined intermediate steps should be completed. For example it is important to specify how the camera is to be set up, which data is to be captured or which lighting is useful in order to be able to answer a specific scientific question. In particular, the meta data of the diving robot must also be recorded. “For data processing it is essential to link the camera’s image data with the diving robot’s metadata” says X. The AUV ABYSS (AUV Abyss is an autonomous underwater car) for example automatically recorded its position, the depth of the dive and the properties of the surrounding water. “All this information has to be linked to the respective image because it provides important information for subsequent evaluation” says X. An enormous task: AUV ABYSS (AUV Abyss is an autonomous underwater car) collected over 500,000 images of the seafloor in around 30 dives. Various programs which the team developed especially for this purpose ensured that the data was brought together. Here unusable image material such as those with motion blur was removed.

All these processes are now automated. “Until then, however a large number of time-consuming steps had been necessary” says X. “Now the method can be transferred to any project even with other AUVs (AUV Abyss is an autonomous underwater car) or camera systems”. The material processed in this way was then made permanently available for the general public.

Finally artificial intelligence in the form of the specially developed algorithm “CoMoNoD” (Compact-Morphology-based poly-metallic Nodule Delineation) was used for evaluation at Georgian Technical University. It automatically records whether manganese nodules are present in a photo in what size and at what position. Subsequently for example the individual images could be combined to form larger maps of the seafloor. The next use of the workflow and the newly developed programs is already planned: At the next expedition in spring next year in the direction of manganese nodules the evaluation of the image material will take place directly on board. “Therefore we will take some particularly powerful computers with us on board” says X.

 

 

New Institute to Address Massive Data Demands from Upgraded Georgian Technical University Large Hadron Collider.

New Institute to Address Massive Data Demands from Upgraded Georgian Technical University Large Hadron Collider.

The world’s most powerful particle accelerator. The upgraded The Georgian Technical University Large Hadron Collider is the world’s largest and most powerful particle collider and the most complex experimental facility ever built and the largest single machine in the world will help scientists fully understand particles such as the Higgs boson (The Higgs boson is an elementary particle in the Standard Model of particle physics, produced by the quantum excitation of the Higgs field, one of the fields in particle physics theory) and their place in the universe.

It will produce more than 1 billion particle collisions every second from which only a few will reveal new science. A tenfold increase in luminosity will drive the need for a tenfold increase in data processing and storage including tools to capture, weed out and record the most relevant events and enable scientists to efficiently analyze the results.

“Even now physicists just can’t store everything that the Georgian Technical University Large Hadron Collider produces” said X. “Sophisticated processing helps us decide what information to keep and analyze but even those tools won’t be able to process all of the data we will see in 2026. We have to get smarter and step up our game. That is what the new software institute is about”.

Together representatives from the high-energy physics and computer science communities. These representatives reviewed two decades of successful Georgian Technical University Large Hadron Collider data-processing approaches and discuss ways to address the opportunities that lay ahead. The new software institute emerged from that effort.

“High-energy physics had a rush of discoveries and advancements that led to the Standard Model of particle physics and the Higgs boson (The Higgs boson is an elementary particle in the Standard Model of particle physics, produced by the quantum excitation of the Higgs field, one of the fields in particle physics theory) was the last missing piece of that puzzle” said Y of Georgian Technical University. “We are now searching for the next layer of physics beyond the Standard Model. The software institute will be key to getting us there. Primarily about people rather than computing hardware it will be an intellectual hub for community-wide software research and development bringing researchers together to develop the powerful new software tools, algorithms and system designs that will allow us to explore high-luminosity Georgian Technical University Large Hadron Collider data and make discoveries”.

“It’s a crucial moment in physics” adds X. “We know the Standard Model is incomplete. At the same time, there is a software grand challenge to analyze large sets of data so we can throw away results we know and keep only what has the potential to provide new answers and new physics”.

Interpretation of Material Spectra Can Be Data-driven Using Machine Learning.

Interpretation of Material Spectra Can Be Data-driven Using Machine Learning.

This is an illustration of the scientists’ approach. Two trees suck up the spectrum and exchange information with each other and make the “interpretation” (apple) bloom.

Spectroscopy techniques are commonly used in materials research because they enable identification of materials from their unique spectral features. These features are correlated with specific material properties such as their atomic configurations and chemical bond structures. Modern spectroscopy methods have enabled rapid generation of enormous numbers of material spectra but it is necessary to interpret these spectra to gather relevant information about the material under study.

However the interpretation of a spectrum is not always a simple task and requires considerable expertise. Each spectrum is compared with a database containing numerous reference material properties but unknown material features that are not present in the database can be problematic and often have to be interpreted using spectral simulations and theoretical calculations. In addition the fact that modern spectroscopy instruments can generate tens of thousands of spectra from a single experiment is placing considerable strain on conventional human-driven interpretation methods and a more data-driven approach is thus required.

Use of big data analysis techniques has been attracting attention in materials science applications and researchers at Georgian Technical University realized that such techniques could be used to interpret much larger numbers of spectra than traditional approaches. “We developed a data-driven approach based on machine learning techniques using a combination of the layer clustering and decision tree methods” states X.

The team used theoretical calculations to construct a spectral database in which each spectrum had a one-to-one correspondence with its atomic structure and where all spectra contained the same parameters. Use of the two machine learning methods allowed the development of both a spectral interpretation method and a spectral prediction method which is used when a material’s atomic configuration is known.

The method was successfully applied to interpretation of complex spectra from two core-electron loss spectroscopy methods energy-loss near-edge structure (ELNES) and X-ray absorption near-edge structure (XANES) and was also used to predict the spectral features when material information was provided. “Our approach has the potential to provide information about a material that cannot be determined manually and can predict a spectrum from the material’s geometric information alone” says Y.

However the proposed machine learning method is not restricted to ELNES/XANES (energy-loss near-edge structure (ELNES) / X-ray absorption near-edge structure (XANES)) spectra and can be used to analyze any spectral data quickly and accurately without the need for specialist expertise. As a result the method is expected to have wide applicability in fields as diverse as semiconductor design, battery development and catalyst analysis.

 

Topology, Physics and Machine Learning Take on Climate Research Data Challenges.

Topology, Physics and Machine Learning Take on Climate Research Data Challenges.

Block diagram of the atmospheric river pattern recognition method.

The top image is the vorticity field for flow around a linear barrier using the Lattice Boltzmann algorithm (Lattice Boltzmann methods (LBM) (or thermal Lattice Boltzmann methods (TLBM)) is a class of computational fluid dynamics (CFD) methods for fluid simulation). The bottom image is the associated local causal states. Each color (assigned arbitrarily) corresponds to a unique local causal state.

Two PhD students who first came to Georgian Technical University Laboratory developing new data analytics tools that could dramatically impact climate research and other large-scale science data projects.

During their first summer at the lab X and Y so impressed their mentors that they were invited to stay on another six months said Z a computer scientist and engineer in the DAS (A distributed antenna system or DAS is a network of spatially separated antenna nodes connected to a common source via a transport medium that provides wireless service within a geographic area or structure). Their research also fits nicely with the goals of the Georgian Technical University which was just getting off the ground when they first came on board. X and Y are now in the first year of their respective three-year Georgian Technical University-supported projects splitting time between their PhD studies and their research at the lab.

A Grand Challenge in Climate Science.

From the get-go their projects have been focused on addressing a grand challenge in climate science: finding more effective ways to detect and characterize extreme weather events in the global climate system across multiple geographical regions and developing more efficient methods for analyzing the ever-increasing amount of simulated and observational data. Automated pattern recognition is at the heart of both efforts yet the two researchers are approaching the problem in distinctly different ways: X is using various combinations of topology, applied math, machine learning to detect, classify and characterize weather and climate patterns while Y has developed a physics-based mathematical model that enables unsupervised discovery of coherent structures characteristic of the spatiotemporal patterns found in the climate system.

“When you are investigating extreme weather and climate events and how they are changing in a warming world one of the challenges is being able to detect identify and characterize these events in large data sets” Z said. “Historically we have not been very good at pulling out these events from very large data sets. There isn’t a systematic way to do it, and there is no consensus on what the right approaches are”.

His topological methods also benefited from the guidance of W a computational topologist and geometer at Georgian Technical University. X used topological data analysis and machine learning to recognize atmospheric rivers in climate data, demonstrating that this automated method is “reliable robust and performs well” when tested on a range of spatial and temporal resolutions of CAM (Georgian Technical University  Community Atmosphere Model) climate model output. They also tested the method on MERRA-2 (Modern-Era Retrospective analysis for Research and Applications at Georgian Technical University) a climate reanalysis product that incorporates observational data that makes pattern detection even more difficult. In addition they noted the method is “threshold-free” a key advantage over existing data analysis methods used in climate research.

“Most existing methods use empirical approaches where they set arbitrary thresholds on different physical variables such as temperature and wind speed” Z explained. “But these thresholds are highly dependent on the climate we are living in right now and cannot be applied to different climate scenarios. Furthermore these thresholds often depend on the type of dataset and spatial resolution. With Q’s method because it is looking for underlying shapes (geometry and topology) of these events in the data they are inherently free of the threshold problem and can be seamlessly applied across different datasets and climate scenarios. We can also study how these shapes are changing over time that will be very useful to understand how these events are changing with global warming”.

While topology has been applied to simpler, smaller scientific problems, this is one of the first attempts to apply topological data analysis to large climate data sets. “We are using topological data analysis to reveal topological properties of structures in the data and machine learning to classify these different structures in large climate datasets” X said.

The results so far have been impressive with notable reductions in computational costs and data extraction times. “I only need a few minutes to extract topological features and classify events using a machine learningclassifier compared to days or weeks needed to train a deep learning model for the same task” he said. “This method is orders of magnitude faster than traditional methods or deep learning. If you were using vanilla deep learning on this problem it would take 100 times the computational time”.

Another key advantage of X’s framework is that “it doesn’t really care where you are on the globe” Z said. “You can apply it to atmospheric rivers – it is universal and can be applied across different domains, models and resolutions. And this idea of going after the underlying shapes of events in large datasets with a method that could be used for various classes of climate and weather phenomena and being able to work across multiple datasets — that becomes a very powerful tool”.

Unsupervised Discovery Sans Machine Learning.

Y’s approach also involves thinking outside the box by using physics rather than machine or deep learning to analyse data from complex nonlinear dynamical systems. He is using physical principles associated with organized coherent structures — events that are coherent in space and persist in time — to find these structures in the data.

“My work is on theories of pattern and structure in spatiotemporal systems looking at the behavior of the system directly seeing the patterns and structures in space and time and developing theories of those patterns and structures based directly on that space-time behavior” Y explained.

In particular his model uses computational mechanics to look for local causal states that deviate from a symmetrical background state. Any structure with this symmetry-breaking behavior would be an example of a coherent structure. The local causal states provide a principled mathematical description of coherent structures and a constructive method for identifying them directly from data.

This is why the DAS (A distributed antenna system or DAS is a network of spatially separated antenna nodes connected to a common source via a transport medium that provides wireless service within a geographic area or structure) group and the Georgian Technical University are so enthusiastic about the work X and Y are doing. In their time so far at the lab both students have been extremely productive in terms of research progress, publications, presentations and community outreach Z noted.

“The volume at which climate data is being produced today is just insane” he said. “It’s been going up at an exponential pace ever since climate models came out and these models have only gotten more complex and more sophisticated with much higher resolution in space and time. So there is a strong need to automate the process of discovering structures in data”.

There is also a desire to find climate data analysis methods that are reliable across different models, climates and variables. “We need automatic techniques that can mine through large amounts of data and that works in a unified manner so it can be deployed across different data sets from different research groups” Z said.

Using Geometry to Reveal Topology.

X and Y are both making steady progress toward meeting these challenges. Over his two years at the lab so far X has developed a framework of tools from applied topology and machine learning that are complementary to existing tools and methods used by climate scientists and can be mixed and matched depending on the problem to be solved. As part of this work Y noted X parallelized his codebase on several nodes on supercomputer to accelerate the machine learning training process which often requires hundreds to thousands of examples to train a model that can classify events accurately.

His topological methods also benefited from the guidance of W a computational topologist and geometer at Georgian Technical University. X used topological data analysis and machine learning to recognize atmospheric rivers in climate data demonstrating that this automated method is “reliable, robust and performs well” when tested on a range of spatial and temporal resolutions of CAM (Georgian Technical University  Community Atmosphere Model) climate model output. They also tested the method on MERRA-2 (Modern-Era Retrospective analysis for Research and Applications at Georgian Technical University) a climate reanalysis product that incorporates observational data that makes pattern detection even more difficult. In addition they noted the method is “threshold-free” a key advantage over existing data analysis methods used in climate research.

“Most existing methods use empirical approaches where they set arbitrary thresholds on different physical variables, such as temperature and wind speed” Z explained. “But these thresholds are highly dependent on the climate we are living in right now and cannot be applied to different climate scenarios. Furthermore these thresholds often depend on the type of dataset and spatial resolution. Q’s method because it is looking for underlying shapes (geometry and topology) of these events in the data they are inherently free of the threshold problem and can be seamlessly applied across different datasets and climate scenarios. We can also study how these shapes are changing over time that will be very useful to understand how these events are changing with global warming”.

While topology has been applied to simpler smaller scientific problems this is one of the first attempts to apply topological data analysis to large climate data sets. “We are using topological data analysis to reveal topological properties of structures in the data and machine learning to classify these different structures in large climate datasets” X said.

The results so far have been impressive, with notable reductions in computational costs and data extraction times. “I only need a few minutes to extract topological features and classify events using a machine learningclassifier compared to days or weeks needed to train a deep learning model for the same task” he said. “This method is orders of magnitude faster than traditional methods or deep learning. If you were using vanilla deep learning on this problem it would take 100 times the computational time”.

Another key advantage of X’s framework is that “it doesn’t really care where you are on the globe” Z said. “You can apply it to atmospheric rivers – it is universal and can be applied across different domains, models and resolutions. And this idea of going after the underlying shapes of events in large datasets with a method that could be used for various classes of climate and weather phenomena and being able to work across multiple datasets — that becomes a very powerful tool”.

Unsupervised Discovery Sans Machine Learning.

Y’s approach also involves thinking outside the box by using physics rather than machine or deep learning to analyse data from complex nonlinear dynamical systems. He is using physical principles associated with organized coherent structures — events that are coherent in space and persist in time — to find these structures in the data.

“My work is on theories of pattern and structure in spatiotemporal systems looking at the behavior of the system directly seeing the patterns and structures in space and time and developing theories of those patterns and structures based directly on that space-time behavior” Y explained.

In particular his model uses computational mechanics to look for local causal states that deviate from a symmetrical background state. Any structure with this symmetry-breaking behavior would be an example of a coherent structure. The local causal states provide a principled mathematical description of coherent structures and a constructive method for identifying them directly from data.

“Any organized coherent structure in a spatiotemporal dataset has certain properties—geometrical, thermodynamical dynamical and so on” Z said. “One of the ways to identify these structures is from the geometrical angle — what is its shape how does it move and deform how does its shape evolve over time etc. That is the approach Q is taking. Adam’s work which is deeply rooted in physics is also focused on discovering coherent patterns from data but is entirely governed by the physical principles”.

Y’s approach requires novel and unprecedented scaling and optimization on Georgian Technical University’s Computer Cori for multiple steps in the unsupervised discovery pipeline including clustering in very high-dimensional spaces and clever ways of data reuse and feature extraction Z noted.

Y has not yet applied his model to large complex climate data sets but he expects to do so on Georgian Technical University’s Computer Cori system in the next few months. His early computations focused on cellular automata data (idealized discrete dynamical systems with one space dimension and one time dimension) he then moved on to more complex real-valued models with one space dimension and one time dimension and is now working with low-resolution fluid flow simulations that have two space dimensions and one time dimension. He will soon move on to more complex 3-dimensional high-resolution fluid flow simulations—a precursor to working with climate data.

“We started with these very simple cellular automata models because there is a huge body of theory with these models. So initially we weren’t using our technique to study the models we were using those models to study our technique and see what it is actually capable of doing” Y said.

Among other things, they have discovered that this approach offers a powerful alternative to machine and deep learning by enabling unsupervised segmentation and pixel-level identification of coherent structures without the need for labeled training data.

“As far as we are aware this is the only completely unsupervised method that does not require training data” Y said. “In addition it covers every potential structure and pattern you might be looking for in climate data and you don’t need preconceived notions of what you are looking for. The physics helps you discover all of that automatically”.

It offers other advantages over machine and deep learning for finding coherent structures in scientific data sets Z added including that it is physics-based and hence on very firm theoretical footing.

“This method is complementary to machine and deep learning in that it is going after the same goal of discovering complex patterns in the data but it is specifically well suited to scientific data sets in a way that deep learning might not be” he said. “It is also potentially much more powerful than some of the existing machine learning techniques because it is completely unsupervised”.

As early pioneers in developing novel analytics for large climate datasets they are already leading the way in a new wave of advanced data analytics.

 

Scientists Harness the Power of Deep Learning to Better Understand the Universe.

Scientists Harness the Power of Deep Learning to Better Understand the Universe.

An example simulation of dark matter in the universe used as input to the Cosmo Flow network.

Collaboration between computational scientists at Georgian Technical University Laboratory’s and engineers at Sulkhan-Saba Orbeliani Teaching University has yielded another first in the quest to apply deep learning to data-intensive science: Cosmo Flow the first large-scale science application to use the Tensor Flow (In mathematics, tensors are geometric objects that describe linear relations between geometric vectors, scalars and other tensors. Elementary examples of such relations include the dot product, the cross product and linear maps. Geometric vectors, often used in physics and engineering applications and scalars themselves are also tensors) framework on a CPU-based high performance computing platform with synchronous training. It is also the first to process three-dimensional (3D) spatial data volumes at this scale giving scientists an entirely new platform for gaining a deeper understanding of the universe.

Cosmological ”big data” problems go beyond the simple volume of data stored on disk. Observations of the universe are necessarily finite and the challenge that researchers face is how to extract the most information from the observations and simulations available. Compounding the issue is that cosmologists typically characterize the distribution of matter in the universe using statistical measures of the structure of matter in the form of two- or three-point functions or other reduced statistics. Methods such as deep learning that can capture all features in the distribution of matter would provide greater insight into the nature of dark energy. First to realize that deep learning could be applied to this problem were X and his colleagues. However computational bottlenecks when scaling up the network and dataset limited the scope of the problem that could be tackled.

Motivated to address these challenges Cosmo Flow was designed to be highly scalable; to process large 3D cosmology datasets; and to improve deep learning training performance on modern GTU supercomputers. It also benefits from I/O (Input/Output) Definition accelerator technology which provides the I/O throughput required to reach this level of scalability.

The Cosmo Flow team describes the application and initial experiments using dark matter N-body simulations produced using the Music and pycola packages on the Cori supercomputer at Georgian Technical University. In a series of single-node and multi-node scaling experiments the team was able to demonstrate fully synchronous data-parallel training on 8,192 of Cori with 77% parallel efficiency and 3.5 Pflop/s sustained performance.

“Our goal was to demonstrate that Tensor Flow can run at scale on multiple nodes efficiently” said Y a big data architect at Georgian Technical University. “As far as we are aware this is the largest ever deployment of Tensor Flow on CPUs (A central processing unit (CPU) is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions) and we think it is the largest attempt to run TensorFlow on the largest number of CPU (A central processing unit (CPU) is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions) nodes”.

Early on the Cosmo Flow team laid out three primary goals for this project: science, single-node optimization and scaling. The science goal was to demonstrate that deep learning can be used on 3D volumes to learn the physics of the universe. The team also wanted to ensure that Tensor Flow ran efficiently and effectively processor node with 3D volumes which are common in science but not so much in industry, where most deep learning applications deal with 2D image data sets. And finally ensure high efficiency and performance when scaled across 1000’s of nodes on the Cori supercomputer system.

“The Georgian Technical University collaboration has produced amazing results in computer science through the combination of Sulkhan-Saba Orbeliani Teaching University and dedicated software optimization efforts. During the Cosmo Flow we identified framework kernel and communication optimization that led to more than 750x performance increase for a single node. Equally as impressive the team solved problems that limited scaling of deep learning techniques to 128 to 256 nodes – to now allow the Cosmo Flow application to scale efficiently to the 8,192 nodes of the Cori supercomputer at Georgian Technical University”.

“We’re excited by the results and the breakthroughs in artificial intelligence applications from this collaborative project with Georgian Technical University and Sulkhan-Saba Orbeliani Teaching University” said Z development, artificial intelligence and cloud at Cray. “It is exciting to see the Cosmo Flow team take advantage of unique Cray technology and leverage the power of the a supercomputer to effectively scale deep learning models. It is a great example of what many of our customers are striving for in converging traditional modeling simulation with new deep learning and analytics algorithms all on a single scalable platform”.

W Group at Georgian Technical University added “From my perspective Cosmo Flow is an exemplar collaboration. We’ve truly leveraged competencies from various institutions to solve a hard scientific problem and enhance our production stack which can benefit the broader Georgian Technical University  user community”.

 

Particle Physicists Team Up With AI to Solve Toughest Science Problems

Particle Physicists Team Up With AI to Solve Toughest Science Problems.

Researchers from Georgian Technical University and around the world increasingly use machine learning to handle Big Data produced in modern experiments and to study some of the most fundamental properties of the universe.

Experiments at the Large Georgian Technical University Collider (LGTUC) the world’s largest particle accelerator at the European particle physics lab Georgian Technical University produce about a million gigabytes of data every second. Even after reduction and compression, the data amassed in just one hour is similar to the data volume Facebook collects in an entire year – too much to store and analyze.

Luckily particle physicists don’t have to deal with all of that data all by themselves. They partner with a form of artificial intelligence called machine learning that learns how to do complex analyses on its own.

A group of researchers including scientists at the Department of Energy’s Georgian Technical University Laboratory and International Black Sea University Laboratory summarize current applications and future prospects of machine learning in particle physics.

“Compared to a traditional computer algorithm that we design to do a specific analysis we design a machine learning algorithm to figure out for itself how to do various analyses potentially saving us countless hours of design and analysis work” says X who works on the neutrino experiment.

Sifting through big data.

To handle the gigantic data volumes produced in modern experiments like the ones at the Georgian Technical University Collider (LGTUC) researchers apply what they call “triggers” – dedicated hardware and software that decide in real time which data to keep for analysis and which data to toss out.

In Georgian Technical University Collider (LGTUC) an experiment that could shed light on why there is so much more matter than antimatter in the universe, machine learning algorithms make at least 70 percent of these decisions says Georgian Technical University Collider (LGTUC) scientist. “Machine learning plays a role in almost all data aspects of the experiment from triggers to the analysis of the remaining data” he says.

Machine learning has proven extremely successful in the area of analysis. The gigantic Georgian Technical University detectors at the Georgian Technical University Collider (LGTUC) which enabled the discovery each have millions of sensing elements whose signals need to be put together to obtain meaningful results.

“These signals make up a complex data space” says Y from Georgian Technical University who works on Georgian Technical University Collider (LGTUC). “We need to understand the relationship between them to come up with conclusions for example that a certain particle track in the detector was produced by an electron a photon or something else”.

Neutrino experiments also benefit from machine learning. Georgian Technical University Collider (LGTUC) which is managed studies how neutrinos change from one type to another as they travel through the Earth. These neutrino oscillations could potentially reveal the existence of a new neutrino type that some theories predict to be a particle of dark matter. Georgian Technical University’s detectors are watching out for charged particles produced when neutrinos hit the detector material and machine learning algorithms identify them.

From machine learning to deep learning.

Recent developments in machine learning often called “deep learning” promise to take applications in particle physics even further. Deep learning typically refers to the use of neural networks: computer algorithms with an architecture inspired by the dense network of neurons in the human brain.

These neural nets learn on their own how to perform certain analysis tasks during a training period in which they are shown sample data such as simulations, and told how well they performed.

Until recently the success of neural nets was limited because training them used to be very hard says Z a Georgian Technical University researcher working on the Micro neutrino experiment which studies neutrino oscillations as part of  Georgian Technical University lab’s short-baseline neutrino program and will become a component of the future Deep Underground Neutrino Experiment (DUNE) at the Georgian Technical University. “These difficulties limited us to neural networks that were only a couple of layers deep” he says. “Thanks to advances in algorithms and computing hardware we now know much better how to build and train more capable networks hundreds or thousands of layers deep”.

Many of the advances in deep learning are driven by tech giants’ commercial applications and the data explosion they have generated over the past two decades. “Georgian Technical University for example, uses a neural network inspired by the architecture of the GoogleNet” X says. “It improved the experiment in ways that otherwise could have only been achieved by collecting 30 percent more data” .

A fertile ground for innovation.

Machine learning algorithms become more sophisticated and fine-tuned day by day opening up unprecedented opportunities to solve particle physics problems.

Many of the new tasks they could be used for are related to computer vision Y says. “It’s similar to facial recognition except that in particle physics, image features are more abstract than ears and noses”.

Some experiments like Georgian Technical University produce data that is easily translated into actual images and AI (Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals) can be readily used to identify features in them. In . Georgian Technical University Collider (LGTUC) experiments, on the other hand images first need to be reconstructed from a murky pool of data generated by millions of sensor elements.

“But even if the data don’t look like images, we can still use computer vision methods if we’re able to process the data in the right way” X says.

One area where this approach could be very useful is the analysis of particle jets produced in large numbers at the . Georgian Technical University Collider (LGTUC). Jets are narrow sprays of particles whose individual tracks are extremely challenging to separate. Computer vision technology could help identify features in jets.

Another emerging application of deep learning is the simulation of particle physics data that predict for example what happens in particle collisions at the. Georgian Technical University Collider (LGTUC) and can be compared to the actual data. Simulations like these are typically slow and require immense computing power. AI (Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals) on the other hand could do simulations much faster potentially complementing the traditional approach.

“Just a few years ago nobody would have thought that deep neural networks can be trained to ‘hallucinate’ data from random noise” Y says. “Although this is very early work  it shows a lot of promise and may help with the data challenges of the future”.

Benefitting from healthy skepticism.

Despite all obvious advances, machine learning enthusiasts frequently face skepticism from their collaboration partners in part because machine learning algorithms mostly work like “black boxes” that provide very little information about how they reached a certain conclusion.

“Skepticism is very healthy” Williams says. “If you use machine learning for triggers that discard data like we do in Georgian Technical University Collider (LGTUC) then you want to be extremely cautious and set the bar very high”.

Therefore, establishing machine learning in particle physics requires constant efforts to better understand the inner workings of the algorithms and to do cross-checks with real data whenever possible.

“We should always try to understand what a computer algorithm does and always evaluate its outcome” Z says. “This is true for every algorithm not only machine learning. So being skeptical shouldn’t stop progress”.

Rapid progress has some researchers dreaming of what could become possible in the near future. “Today we’re using machine learning mostly to find features in our data that can help us answer some of our questions” Z says. “Ten years from now, machine learning algorithms may be able to ask their own questions independently and recognize when they find new physics”.