Qualcomm - Edge AI and Vision Alliance
https://www.edge-ai-vision.com/category/provider/qualcomm/
Designing machines that perceive and understand.

The History of AI: How Generative AI Grew from Early Research
https://www.edge-ai-vision.com/2023/09/the-history-of-ai-how-generative-ai-grew-from-early-research/
Fri, 29 Sep 2023

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

From how AI started to how it impacts you today, here’s your comprehensive AI primer

When you hear Artificial Intelligence (AI), do you think of the Terminator or Data on “Star Trek: The Next Generation”? While neither example of artificial general intelligence exists today, AI has given rise to a real form of machine intelligence, one trained on huge volumes of publicly available data, proprietary data and/or sensor data.

As we embark on our AI on the Edge series, we’d like to explain exactly what we mean by “AI” — especially since this broad category can include everything from machine learning to neural networks and deep learning. Don’t be embarrassed: “What is AI, exactly?” requires a more technical and nuanced answer than you might expect.


Alan Turing, an outstanding mathematician, introduced the concept of “Computing Machinery and Intelligence.”

History of AI: The origins of AI research

Answering “what is AI” is much easier when you know the history of AI.

By the 1950s, the concept of AI had taken its first steps out of science fiction and into the real world as we began to build capable electronic computers. Researcher Alan Turing began to explore the mathematical possibility of building AI. He suggested that machines, like humans, can use information and reasoning to solve problems and make decisions.

These concepts were introduced in his famous 1950 paper titled “Computing Machinery and Intelligence,” in which he discussed the potential for intelligent machines and proposed a test of their intelligence, now called the Turing Test. The test posits that if a machine can carry on a conversation (over a text interface) that is indistinguishable from a conversation with a human being, then it is reasonable to say that the machine is “thinking.” Using this simplified test, it is easier to argue that a “thinking machine” is at least plausible.


The world’s first programmable, electronic, digital computers were limited in terms of performance.

AI’s proof of concept

In the 1950s, computers were still very limited and very expensive to own and operate, which constrained further AI research. Yet researchers were not deterred. Five years later, a proof of concept arrived in the form of a program called Logic Theorist, likely the first AI program ever written. In 1956, the program was shown at the Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI). This historic conference brought together top researchers from various fields for an open-ended discussion on AI, a term coined at the event by host John McCarthy, then a mathematics professor at Dartmouth.

From 1957 to 1974, AI research flourished as computers could store more information and became faster, cheaper and more accessible. Machine learning algorithms also improved, and people became better at knowing which algorithm to apply to their problem. But mainstream applications were few and far between, and AI research money began to dry up. The optimistic vision of AI researchers like Marvin Minsky in the ’60s and ’70s looked to be going nowhere.


Deep learning took off due to increased processing capabilities, the abundance of data, and improved AI algorithms.

Modern day leaps in AI

Continued improvements in computing and data storage reinvigorated AI research in the 1980s. New algorithms and new funding fed an AI renaissance. During this period, John Hopfield and David Rumelhart popularized “deep learning” techniques which allowed computers to learn using experience.

This milestone was followed by a series of landmark events. In 1997, IBM’s chess-playing computer, Deep Blue, defeated reigning world chess champion and grandmaster Garry Kasparov. It was the first time a reigning world chess champion had lost to a computer. In the same year, speech-recognition software developed by Dragon Systems became widely available. In 2005, a Stanford robot vehicle won the DARPA Grand Challenge by driving autonomously for 131 miles along an unrehearsed desert trail. And just two years later, a vehicle from Carnegie Mellon University won the DARPA Urban Challenge by autonomously navigating 55 miles in an urban environment while avoiding traffic hazards and following all traffic laws. Finally, in February 2011, in a “Jeopardy!” quiz show exhibition match, IBM’s question answering system, named Watson, defeated the two greatest “Jeopardy!” champions of the day.

Exciting as they were, these public demonstrations weren’t mainstream AI solutions. The DARPA challenges, though, did spur autonomous vehicle research that continues to this day.

What really kicked off the explosion in AI applications was the use of math accelerators like graphics processing units (GPUs), digital signal processors (DSP), field programmable gate arrays (FPGA) and neural processing units (NPUs), which increased processing speeds by orders of magnitude over mere CPUs.

While CPUs can process tens of threads, math accelerators like DSPs, GPUs, and NPUs process hundreds or thousands of threads all in parallel. At the same time, AI researchers also got access to vast amounts of training data through cloud services and public data sets.

In 2018, large language models (LLMs) trained on vast quantities of unlabeled data became the foundation models that can be adapted to a wide range of specific tasks. More recent models, such as GPT-3 released by OpenAI in 2020, and Gato released by DeepMind in 2022, pushed AI capabilities to new levels. These generative AI models have made AI more useful for a much wider range of applications. Where previous uses of AI were mostly about recognition such as detecting bad parts in a product line, classification such as recognizing faces in a video feed, and prediction such as determining the path of an autonomous vehicle, generative AI can be used to create new text, images, or other content based on input prompts.


Digital neurons, inspired by biological neurons, are the building blocks of digital neural networks.

How AI works and AI technology definitions

The fundamental approach of modern AI is inspired by the way that animal brains (including human brains) function using a digital neuron modeled after those of the biological brain. Collections of these digital neurons process an input in different layers with the results of each layer feeding the next layer. This structure is called a neural network. Each neuron has multiple inputs that are each given a specific weight. The weighted inputs are summed together, and the output is fed to an activation function. An activation function, such as the popular rectified linear unit, known as a ReLU, introduces the property of nonlinearity to a deep learning model. The outputs of the activation function are the inputs into the next layer of the neural network. The collective weights and any bias applied to the summation function represent the parameters of the model.
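
To make the structure concrete, here is a minimal sketch of two stacked layers of digital neurons in Python with NumPy. The layer sizes, random weights and zero biases are illustrative assumptions, not taken from any particular model:

```python
import numpy as np

def relu(x):
    # Rectified linear unit: keep positive values, zero out negative ones.
    return np.maximum(0.0, x)

def dense_layer(inputs, weights, bias):
    # Each neuron computes a weighted sum of its inputs plus a bias,
    # then passes the result through the activation function.
    return relu(weights @ inputs + bias)

# Toy network: 3 inputs -> 4 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # input vector
w1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # first layer parameters
w2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # second layer parameters

hidden = dense_layer(x, w1, b1)       # outputs of one layer...
output = dense_layer(hidden, w2, b2)  # ...feed the next layer
print(output)
```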

Neural network architectures vary in the number of interconnected neurons per layer and the number of layers, which all impact accuracy at the cost of performance, power and size.


A deep neural network consists of multiple hidden layers between the input and output layers.

The deep in deep learning

The “deep” in “deep learning” refers to the use of many layers in the network. Major increases in computing power, around a thousand-fold or more, especially as delivered by GPUs, NPUs and other math accelerators, have made the standard backpropagation algorithm feasible for training networks that are many layers deep and have reduced training times from many months to days.

The values of digital neuron parameters are determined through a learning process. Humans learn throughout life from experiences and our senses. Because AI itself does not have life experiences or senses, it must learn through a digital imprint that’s called training.

Neural networks “learn” (or are trained) by processing examples.

In supervised learning, the examples contain known “inputs” and “outputs,” and training forms probability-weighted associations between the two that are stored within the data structure of the neural network itself (called the “model”).

The training of a neural network from a given example is usually conducted by determining the difference between the processed output of the network (often a prediction) and a desired output. The network is then adjusted iteratively to minimize that difference until it converges to the desired accuracy. The algorithm used to compute these adjustments is called backpropagation.
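
As a sketch of that loop, the toy example below trains a single linear layer by gradient descent in Python with NumPy. With only one layer there is nothing to propagate backward through, but it shows the core idea: compare the prediction with the desired output and nudge the parameters to shrink the difference. All sizes and values are illustrative assumptions:

```python
import numpy as np

# Toy supervised task: learn weights w so that X @ w matches the desired outputs y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))       # known "inputs"
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                      # known "outputs" (the desired values)

w = np.zeros(3)                     # model parameters, initially untrained
lr = 0.1                            # learning rate (step size)

for step in range(200):
    pred = X @ w                    # processed output of the network (a prediction)
    error = pred - y                # difference from the desired output
    grad = X.T @ error / len(X)     # gradient of the mean squared error
    w -= lr * grad                  # adjust parameters to shrink the difference

print(w)  # converges toward [2.0, -1.0, 0.5]
```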

The more data you feed the neural network, the more examples it accumulates knowledge about. That said, the neural network model itself needs to be relatively large to represent complex sets of information.

Also, a significant number of examples need to be used in training large models to make them more capable and accurate.

The trained neural network model is then used to interpret new inputs and create new outputs. This application of the model processing new data is commonly called inference. Inference is where an AI model is applied to real-world problems.

Training is often performed with 32-bit or 16-bit floating-point math, but inference models can often be scaled down to 8-bit or 4-bit integer precision to save memory, reduce power and improve performance without significantly affecting the accuracy of the model. This scaling down is known as quantization; going from 32-bit to 8-bit shrinks the model to roughly one-quarter of its original size.
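
Here is a minimal sketch of such a scaling-down, assuming a simple symmetric per-tensor scheme; production toolchains use more careful calibration and often per-channel scales:

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map the float range onto [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(2).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes, "bytes (FP32) ->", q.nbytes, "bytes (INT8)")  # 4x smaller
print("max abs rounding error:", np.max(np.abs(w - dequantize(q, scale))))
```

The INT8 copy occupies a quarter of the memory of the FP32 original, at the cost of a small, bounded rounding error per weight.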

A variety of neural network architectures have been introduced over time, offering benefits in performance, efficiency and/or capabilities.

Well-studied neural network architectures, like convolutional neural networks (CNNs), recurrent neural networks (RNNs) and long short-term memory (LSTM) have been used to detect, classify, and predict and have been widely deployed for voice recognition, image recognition, autonomous vehicles and many other applications.

A recently popular class of latent-variable models, called diffusion models, can be used for a number of tasks including image denoising, inpainting, super-resolution upscaling, and image generation. This technique helped start the popularization of generative AI. For example, an image generation model starts with a random noise image and then, after having been trained to reverse the diffusion process on numerous images, can generate new images based on text input prompts. A good example is OpenAI’s text-to-image model DALL-E 2. Other popular examples of text-to-image generative AI models include Stable Diffusion and ControlNet. These models are known as language-vision models, or LVMs.

Many of the latest LLMs, such as Llama 2, GPT-4 and BERT, use the relatively new neural network architecture called the Transformer, which was introduced by Google in 2017. These complex models are leading to the next wave of generative AI, where AI is used to create new content. Research into AI is ongoing, and you should expect continual changes in architectures, algorithms and techniques.
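
The core operation these Transformer models share is attention. Below is a minimal sketch of a single attention head in Python with NumPy; it omits multi-head projections, masking and positional encodings, and the dimensions and random weights are illustrative assumptions only:

```python
import numpy as np

def attention(q, k, v):
    # Core Transformer operation: every position attends to every other,
    # weighting the values by the similarity between queries and keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v

rng = np.random.default_rng(3)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))             # token embeddings (illustrative)
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out = attention(x @ wq, x @ wk, x @ wv)
print(out.shape)  # (5, 8): one updated representation per token
```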


AI is being seamlessly integrated into our daily activities, like medical diagnostics, to enhance lives and improve outcomes.

Real-time AI for everyone, everywhere

Over the years there have been several leaps forward in the development of modern AI.

It started with the idea that you could train neural networks with a process called deep learning, which employs many layers of neurons to store more substantial information and represent more complex functions. Training these neural network models required a lot of computation, but advancements in parallel computing and more sophisticated algorithms have addressed this challenge.

Running neural network training and inference through a DSP, FPGA, GPU, or NPU, made the development and deployment of deep neural networks more practical. The other big breakthrough for large-scale AI was access to large amounts of data through all the cloud services and public data sets.

A complex and nuanced AI model requires lots of generalized data, which can be in the form of text, speech, images and videos. All these data types are fodder for neural network training. Using these vast troves of rich content to train neural networks has made the models smarter and more capable.

Compressing that knowledge into more compact models is allowing them to be shared beyond the cloud and placed into edge devices. The democratization of AI is happening now.

Pat Lawlor
Director, Technical Marketing, Qualcomm Technologies, Inc.

Jerry Chang
Senior Manager, Marketing, Qualcomm Technologies

Qualcomm Launches Its Next Generation XR and AR Platforms, Enabling Immersive Experiences and Slimmer Devices
https://www.edge-ai-vision.com/2023/09/qualcomm-launches-its-next-generation-xr-and-ar-platforms-enabling-immersive-experiences-and-slimmer-devices/
Wed, 27 Sep 2023

Meta to commercialize both in 2023.

Highlights:

  • Snapdragon® XR2 Gen 2 Platform delivers significant performance innovations with 2.5x higher GPU performance and 8x better AI (1).

  • Snapdragon AR1 Gen 1 Platform is the first dedicated processor for sleek smart glasses.

  • Both platforms deliver on-device AI, enabling more complex, immersive, and personalized experiences.

  • Qualcomm remains the spatial computing platform of choice for leading XR players.

Sep 27, 2023 | SAN DIEGO | Qualcomm Technologies, Inc. today announced two new spatial computing platforms – Snapdragon XR2 Gen 2 and Snapdragon AR1 Gen 1 – that will enable the next generation of mixed reality (MR) and virtual reality (VR) devices and smart glasses.

Snapdragon XR2 Gen 2 Platform: The platform brings premium MR and VR technology into a single chip architecture to unlock next-level immersive experiences in thinner and more comfortable headsets that don’t require an external battery pack.

Engineered to deliver a lag-free experience with breathtaking visuals and fully immersive sound, the platform allows users to blend virtual content with their physical surroundings and transition seamlessly between MR and VR experiences.

Snapdragon AR1 Gen 1 Platform: The platform is uniquely designed with power optimizations that fit the thermal budget of sleek, lightweight smart glasses, enabling an unmatched experience.

The platform enables the user to capture, share or live-stream hands-free, directly from the glasses. In addition, on-device AI enables personal assistant experiences such as audio quality enhancement, visual search, and real-time translation. Lastly, support for a visual heads-up display can enable content consumption, including video, that blends seamlessly into the user’s field of view.

The merger of physical and digital spaces is a significant opportunity for Qualcomm Technologies. These next generation platforms were developed in close collaboration with Meta and will commercially debut on Meta devices in 2023: Meta Quest 3 powered by Snapdragon XR2 Gen 2 Platform, and Ray-Ban Meta smart glasses collection powered by Snapdragon AR1 Platform. With other manufacturers to follow next year, leading XR players continue to choose Snapdragon as the preferred platform.

“At Meta, we’re focused on developing the technologies of the future in mixed reality and smart glasses, as well as the foundational innovations that will one day power our vision for AR glasses,” said Andrew “Boz” Bosworth, Meta’s CTO and Head of Reality Labs. “Building this future computing platform requires an industry-leading partner and this is where our long-standing collaboration with Qualcomm Technologies is critical. Together, we are defining next-generation technologies that deliver massive breakthroughs in power, performance, and AI. The latest Snapdragon XR2 Gen 2 and Snapdragon AR1 Platforms, which power Meta Quest 3 and our next-generation Ray-Ban Meta smart glasses, are another testament to the strength of this collaboration and we are thrilled for users around the world to experience them.”

“Qualcomm Technologies has a relentless commitment to build technologies and solutions that will transform the future of computing. The Snapdragon XR2 Gen 2 and Snapdragon AR1 Platforms are the latest purpose-built processors that are designed to power the next generation of MR and VR devices and sleek smart glasses for all,” said Hugo Swart, vice president and GM of XR, Qualcomm Technologies, Inc. “The commercial debut of these two platforms with Meta is a further step forward in realizing our joint vision – unlocking premium, all-in-one XR devices and smart glasses that are affordable to users around the globe.”

For more information visit our websites: Snapdragon XR2 Gen 2 and Snapdragon AR1 Gen 1.

About Qualcomm

Qualcomm is enabling a world where everyone and everything can be intelligently connected. Our one technology roadmap allows us to efficiently scale the technologies that launched the mobile revolution – including advanced connectivity, high-performance, low-power compute, on-device intelligence and more – to the next generation of connected smart devices across industries. Innovations from Qualcomm and our family of Snapdragon platforms will help enable cloud-edge convergence, transform industries, accelerate the digital economy, and revolutionize how we experience the world, for the greater good.

(1) Compared to Snapdragon XR2 Gen 1.

“Generative AI: How Will It Impact Edge Applications and Machine Perception?,” An Embedded Vision Summit Expert Panel Discussion
https://www.edge-ai-vision.com/2023/09/generative-ai-how-will-it-impact-edge-applications-and-machine-perception-an-embedded-vision-summit-expert-panel-discussion/
Thu, 14 Sep 2023

Sally Ward-Foxton, Senior Reporter at EE Times, moderates the “Generative AI: How Will It Impact Edge Applications and Machine Perception?” Expert Panel at the May 2023 Embedded Vision Summit. Other panelists include Greg Kostello, CTO and Co-Founder of Huma.AI, Vivek Pradeep, Partner Research Manager at Microsoft, Steve Teig, CEO of Perceive, and Roland Memisevic, Senior Director at Qualcomm AI Research.

Seemingly overnight, ChatGPT has spurred massive interest in—and excitement around—generative AI, and has become the fastest growing application in history. How will generative AI transform how we think about AI, and how we use it? What types of commercial applications are best suited for solutions powered by today’s generative AI technology?

Will recent advances in generative AI change how we create and use discriminative AI models, like those used for machine perception? Will generative AI obviate the need for massive reservoirs of hand-labeled training data? Will it accelerate our ability to create systems that effortlessly meld multiple types of data, such as text, images and sound?

With state-of-the-art generative models exceeding 100 billion parameters, will generative models ever be suitable for deployment at the edge? If so, for what use cases? This lively and insightful panel discussion explores these and many other questions around the rapidly-evolving role of generative AI in edge and machine-perception applications.

Applying Advanced Compilation Technology to the AI Stack for Better Performance, Productivity and Power Efficiency
https://www.edge-ai-vision.com/2023/08/applying-advanced-compilation-technology-to-the-ai-stack-for-better-performance-productivity-and-power-efficiency/
Thu, 24 Aug 2023

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

How Qualcomm is improving on-device performance with AI compilers

Artificial intelligence (AI) is having its big moment with the rise of generative AI models, but it often comes at an increasing cost in energy consumption and compute. To showcase how large models can be implemented more efficiently, Qualcomm Technologies, Inc. recently presented the world’s first demo of Stable Diffusion running on an Android phone. However, research in the field of energy efficiency for AI is ongoing. Taking the next step towards energy-efficient, on-device AI, Qualcomm is looking at new forms of compilers that can improve model performance.

Power and thermal efficiency are essential for implementing AI models on smaller and smaller devices, but the challenge becomes more complex as the number of AI models grows along several dimensions, including AI model type, application domain, processor type and number of devices. The work of the Qualcomm AI Research team covers several areas of power efficiency, from quantization and conditional compute to neural architecture search and compilation. Below, we focus on our AI compiler work for efficient hardware execution.


Figure 1: The Qualcomm AI Research team is working on multiple axes to efficiently run AI models.

What do compilers need?

A compiler analyzes source code — typically text from a programmer, written in a particular programming language — and produces an executable program. When they were invented in the 20th century, their main job was to read the input program, understand its meaning, and produce an executable program that runs efficiently on a single processor and does what the input program specified.

Nowadays, AI programs (models) are expressed in an even higher-level form, as a graph of so-called layers (you may have heard of convolution layers, fully connected layers etc.). These layers are often themselves expressed in C or C++.

Computing platforms have evolved into hierarchies of parallel processing engines. We call these processing elements (PEs). PEs share a limited power budget and as many as possible need to be kept busy at the same time, otherwise you might as well have fewer PEs. In other words, parallelism is required — operations that can be run simultaneously. However, the path from PEs to dynamic random-access memory (DRAM), which is where input data comes from, is relatively long and slow.


Figure 2: Parallelism and data locality are needed so that data transfers between remote and local memories can be reduced.

Optimization

To tackle this, architectures introduce local memory, which is on the chip and closer to the processing elements, in limited amounts (due to cost constraints). When data is moved to local memory, we want all its uses to happen before it is evicted to make room for other data. This way, we avoid back-and-forth transfers between DRAM and the local memories. Data movement engines called DMAs also accelerate these data transfers when they can be done in bulk. In program optimization jargon, avoiding back-and-forth transfers and transferring data in bulk is called data locality.

Hence, a large part of the optimizing work is to increase the parallelism and data locality of the input program. This is all achieved by the Qualcomm Polyhedral Mapper, developed by our research team.
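
Before looking at the mapper itself, here is a rough, hand-written illustration of the tiling idea in Python with NumPy. It is not compiler output, just a sketch of how blocking a loop nest improves data locality; the tile size and matrix shapes are arbitrary:

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    # Blocked (tiled) matrix multiply: each tile of A and B is reused for many
    # multiply-accumulates while it is "local", instead of streaming the full
    # matrices back and forth between remote and local memory.
    n = a.shape[0]
    c = np.zeros((n, n), dtype=a.dtype)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c

rng = np.random.default_rng(4)
a, b = rng.normal(size=(128, 128)), rng.normal(size=(128, 128))
assert np.allclose(tiled_matmul(a, b), a @ b)  # same result, better locality
```

A polyhedral compiler derives this kind of blocking, the tile sizes and the accompanying bulk transfers automatically from the original loop nest.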

The Qualcomm Polyhedral Mapper

Polyhedral AI compiler optimization can be summarized as shown in Figure 3 below. We model the program as high-dimensional mathematical shapes called “polyhedra” and the optimization process is then a search for good choices of reshaping of the polyhedra to the PEs of the machine. This search can be done very fast using math solvers; optimization becomes solving certain equations fast. The abstraction of high-dimensional polyhedra is well-suited to the very deep loop nests that are implicit in modern ML models. These deep loop nests present many combinatorial choices for reshaping. We can solve for these choices using polyhedra and math solvers, much faster than other kinds of compilers.

We start with an “affine” scheduling transformation, which preconditions our loops for optimization, by finding and exposing independence between computations. This reshaping is finding different orientations of the computations to the machine. It simplifies the decomposition of loop computations into bigger packets named tiles. Then the decomposition step starts, where we optimize the shape and size of tiles for the targeted PEs and memory, after which we determine a distribution of tiles to PEs. Finally, we determine how data should be ordered in local memory and generate bulk transfers between remote and local memory. In addition, it’s important to detect when PEs need to wait for each other and insert just enough synchronization to make the parallel code correct. To prevent PEs from waiting for data to be transferred, data gets prefetched ahead of time using a technique called multi-buffering. This optimization process is performed hierarchically to fit the targeted hierarchy of PEs.

Finally, the code gets generated, first by separating the sequential code from the parallel code, then by converting the polyhedral representation back to loops.

Even having met these challenges, AI compilers need to keep up with the growing variety of machine types, user applications, and size of AI models. Below, we are singling out two ways to achieve this by tackling the tractability (scalability) of compilers through hierarchical mapping and by tackling hardware targets/features through (auto-) vectorization.


Figure 3: Polyhedral AI compiler optimization is an iterative process that allows for parallelism and data locality.

Improving hierarchical mapping

To achieve better tractability of AI compilers, we developed the focalization method. This method allows the compiler to consider a set of tile dimensions independently of other dimensions.

The effect of focalization can be significant for some loop codes. Let’s look at some results (Figure 4) where other tractability-helping techniques have been disabled. The effect is quite dramatic for some programs when targeting just two levels of hierarchy (2-level focalization), as you can see on the left-hand optimization time chart (lower is better).

The charts on the right-hand side in Figure 4 look at the time it takes to optimize a fully connected layer and a linear algebra kernel named doitgen as a function of the depth of the targeted machine (in terms of its levels of PEs).

On the fully connected layer, we see a reduction in the optimization time from ten times to two times. For doitgen, the optimization time goes above threshold after three levels, whereas the two-level focalization version scales well to five levels.


Figure 4: Using focalization, AI compile time reduces significantly.

Improving automated vectorization

Next, we look at automated vectorization. Two underlying processes are key for this: the Qualcomm Polyhedral Mapper finds independent operations suitable for vectorization and its underlying compiler generates optimized calls to vector intrinsics.

Automatic optimization of a sequential code to three levels of parallelism (including vector) produced a 620-times speedup and reached a third of the hand-tuned performance. That’s one third of the hand-tuned performance, achieved quickly and with almost zero effort.
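
For intuition only, the sketch below contrasts an element-at-a-time loop with a vectorized call in Python with NumPy. The Polyhedral Mapper emits vector intrinsics for the target hardware rather than NumPy calls, and the speedups it achieves are the ones quoted above, not whatever this toy comparison prints:

```python
import time
import numpy as np

x = np.random.default_rng(5).normal(size=1_000_000).astype(np.float32)

def relu_scalar(v):
    # Element-at-a-time loop: one lane of work per iteration.
    out = np.empty_like(v)
    for i in range(v.size):
        out[i] = v[i] if v[i] > 0 else 0.0
    return out

def relu_vector(v):
    # One call processes the whole array; the library maps it onto wide
    # SIMD vector instructions operating on many lanes at once.
    return np.maximum(v, 0.0)

t0 = time.perf_counter(); r1 = relu_scalar(x); t1 = time.perf_counter()
r2 = relu_vector(x);      t2 = time.perf_counter()
assert np.allclose(r1, r2)
print(f"scalar: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s")
```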

The Qualcomm Polyhedral Mapper can dramatically increase AI model developer productivity. By supporting more optimizations, we increase the performance and power efficiency of AI models running on Qualcomm platforms. By improving the tractability of these optimizations, we increase the performance of larger, more complex models.


Figure 5: By automating optimization, AI compiler allows programmers to write fewer lines of code (LOCs) and get substantial performance improvement.

Improved AI compilers are key to increased runtime and power efficiency as well as to making machine learning ubiquitous. Not only can we create more AI models more quickly, but we can also run them at high performance on power-constrained devices.


Figure 6: Vectorization and hierarchical mapping are just two of the challenges that we are solving with polyhedral AI compilers.

Toward the future of polyhedral AI compilers

In this blog post, I have presented only two of the many solutions we are working on for improving polyhedral AI compilers. With the iterative improvement of tractable compiler optimizations, as with hierarchical mapping and vectorization, the Qualcomm Polyhedral Mapper can achieve better overall compiler performance and productivity.

Benoit Meister
Principal Engineer, Qualcomm Technologies

5 Benefits of On-device Generative AI
https://www.edge-ai-vision.com/2023/08/5-benefits-of-on-device-generative-ai/
Wed, 16 Aug 2023

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

On-device AI processing offers must-have benefits for privacy, performance, personalization, cost, and energy

In the mid-’90s, the World Wide Web ushered in the era of massive remote data center computing now known as the cloud. And this shift paved the way to advancements in scientific modeling, design and simulation, research, and the world’s recent obsession with generative artificial intelligence (AI).

As discussed in our previous OnQ post Hybrid AI trends by the numbers: Costs, resources, parameters and more, these advancements are accompanied by increasing data center capital and operating costs: prohibitive ones that are increasingly creating a need — and an opportunity — to offload some workloads to edge devices like tablets, smartphones, personal computers (PCs), vehicles, and extended reality (XR) headsets. But the benefits of migrating workloads to these devices extend well beyond just the cost savings to data centers.

On-device AI is not new for us. For more than a decade, Qualcomm Technologies has been researching and working with customers, including original equipment manufacturers and application developers, to enhance the user experience through AI. Today it’s commonly used in radio frequency signal processing, battery management, audio processing, computational photography, video enhancement, and a variety of other on-device applications.

Extending on-device AI support to generative AI through optimized and/or specialized neural network models can further enhance the user experience through increased privacy and security, performance, and personalization while lowering the required costs and energy consumption.


On-device AI has several key benefits.

1. AI privacy and security

The transfer, storage, and use of data on multiple platforms and cloud services increases the potential for data tracking, data manipulation, and data theft.

On-device AI inherently helps protect users’ privacy since queries and personal information remain solely on the device. This is important for consumer data, as well as providing an additional level of protection for medical, enterprise, government, and other sensitive applications.

For example, a programming assistant app for generating code could run on the device without exposing confidential information to the cloud.


On-device AI provides low latency, high performance, and reliability for edge devices.

2. AI performance

AI performance can be measured in many ways, including processing performance and application latency. On-device processing performance of mobile devices has increased by double-digits with each technology generation and is projected to continue this trend, allowing for the use of larger generative AI models over time, especially as they become more optimized.

For generative AI, application latency is also critical. While consumers are more accommodating in waiting for the generation of a report, a commercial chatbot must respond in near real-time for a positive user experience. Processing generative AI models on device avoids the potential for latency caused by congested networks or cloud servers, while increasing the reliability by being able to execute a query anywhere and anytime.


With sensor and contextual data, on-device AI enables personalized experiences.

3. AI and personalization

Along with increased privacy, a strong benefit to consumers of on-device generative AI will be enhanced personalization. On-device generative AI will enable the customization of models and responses to the user’s unique speech patterns, expressions, reactions, usage patterns, environment, and even external data, such as from a fitness tracker or medical device, for full contextual awareness. This capability allows generative AI to essentially create a unique digital persona or personas for each user over time. The same can be done for a group, organization, or enterprise to create common and cohesive responses.

Smartphones are a user’s most personal device, and generative AI will make the entire user experience all-the-more personal.


On-device AI can offload computing from the cloud, saving cost and enabling scale.

4. Cost of AI

As cloud providers struggle with the equipment and operating costs associated with running generative AI models, they are beginning to charge consumer fees for services that were initially free. These fees are likely to continue increasing to meet the rising costs, or until alternative business models can be found to offset them. Running generative AI on device not only reduces the cost to consumers; it can also reduce costs for cloud service providers and networking service providers while freeing valuable resources for other high-value and high-priority tasks.


Efficient on-device AI processing can save energy and offload energy demands from the cloud.

5. AI and energy

The cost of running generative AI models on device versus the cloud translates directly to the amount of power required to run these models. Inference processing of large generative AI models may require the use of several AI accelerators, such as graphics processing units (GPUs) or tensor processing units (TPUs), and possibly even several servers. According to TIRIAS Research Principal Analyst Jim McGregor, the idle power consumption of a single fully populated AI-accelerated server can approach one kilowatt, while the peak power consumption can reach several kilowatts. That figure multiplies by the number of servers required to run a generative AI model and the number of times a model is run, which, as stated previously, is increasing exponentially. Added to this is the cost of the power required to transfer the data over complex networks to and from the cloud. As a result, power consumption is also on an exponential growth trend.

Edge devices with efficient AI processing offer leading performance per watt, especially when compared with the cloud. Edge devices can run generative AI models at a fraction of the energy, especially when considering not only processing but also data transport. This difference is significant in energy costs as well as helping cloud providers offload data center energy consumption to meet their environmental and sustainability goals.

Pushing the boundaries of technology

The evolution of mobile technology pushed the boundaries of efficient processing for applications, images, videos, and sensors, and enabled the use of multiple user interfaces. Generative AI will further push the boundaries of on-device processing and will continue to enhance the personal computing experience. Qualcomm Technologies is working to enhance the performance of future smartphone, PC, vehicle and internet of things platforms while working with partners to bring generative AI on device through an open ecosystem. Look for more details in our future AI on the Edge OnQ posts.

Pat Lawlor
Director, Technical Marketing, Qualcomm Technologies

Jerry Chang
Senior Manager, Marketing, Qualcomm Technologies

Generative AI Trends by the Numbers: Costs, Resources, Parameters and More
https://www.edge-ai-vision.com/2023/08/generative-ai-trends-by-the-numbers-costs-resources-parameters-and-more/
Wed, 16 Aug 2023

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

We did the math: And here’s the cost of today’s AI Big Bang

As indicated in our whitepaper, The Future of AI is Hybrid (part 1): Unlocking the generative AI future with on-device and hybrid AI, a main motivation for on-device and hybrid artificial intelligence (AI) is cost. That cost is driven by the increasing size of AI models, the cost of running these models on cloud resources, and the increasing use of AI across devices and applications. Let’s put that into perspective.

Foundation models, such as general-purpose large language models (LLMs) like GPT-4 and PaLM 2, have achieved unprecedented levels of language understanding, generation capabilities, and world knowledge. Most of these models are quite large and are growing rapidly. GPT-1 was first launched by OpenAI in 2018 with the second, third and fourth version in 2019, 2020 and 2023 respectively.

During that time, the number of parameters in the GPT models increased from roughly 117 million to 1.5 billion, 175 billion and approximately 1.8 trillion (estimated for GPT-4). (1) This is an unprecedented growth rate.

Although the size of state-of-the-art models continues to grow rapidly, another trend is toward much smaller models that still provide high-quality outputs. For example, Llama 2 with 7 or 13 billion parameters performs very well against much bigger models in generative AI benchmarks. (2) Running these smaller models in the cloud will help reduce the cloud resources required.

The cost of cloud computing to ISPs

The current cloud-based computing architecture for LLM inferencing leads to higher operating costs for internet search companies (ISPs), big and small. Consider a future with internet search augmented by generative AI LLMs, like GPT-3 running with 175 billion parameters. Generative AI searches can provide a much better user experience and better results, but the cost per query is estimated to increase by 10 times or more compared to traditional search methods, according to a report by Reuters. (3)

With more than 10 billion search queries per day currently, even if LLM-based searches take just a small fraction of queries, the incremental cost could be multiple billions of dollars annually. (4)
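
A back-of-envelope sketch of that arithmetic is shown below; the per-query cost of traditional search and the share of LLM-assisted queries are illustrative assumptions, not figures from the cited reports:

```python
# Back-of-envelope estimate; only the 10x multiplier and the 10-billion-query
# volume come from the text above, the other figures are illustrative guesses.
queries_per_day = 10e9       # "more than 10 billion search queries per day"
llm_share = 0.05             # assume a "small fraction" of queries use an LLM
traditional_cost = 0.002     # assumed cost of a traditional query, in dollars
llm_multiplier = 10          # LLM query assumed ~10x the traditional cost

extra_per_query = traditional_cost * (llm_multiplier - 1)
annual_incremental = queries_per_day * llm_share * extra_per_query * 365
print(f"~${annual_incremental / 1e9:.1f} billion in incremental cost per year")
```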

That cost translates to expensive graphics-processing-unit and tensor-processing-unit-accelerated servers, the infrastructure to support these high-performance servers, and the energy costs of running these servers. According to TIRIAS Research, AI infrastructure costs could exceed $76 billion by 2028. (5) There is currently no effective business model to pass this cost on to consumers. So, new business models are required, or the costs need to be reduced significantly.

The cost of increased use

Use of these models is also increasing exponentially. Generative AI models, such as GPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts today. Future generative AI models will be able to create complete movies, gaming environments, and even metaverses.

These models are disrupting traditional methods of search, content creation, and recommendation systems — offering significant enhancements in utility, productivity and entertainment with use cases across industries, from the commonplace to the creative.

Architects and artists can explore new ideas, while engineers can create code more efficiently. Virtually any field that works with words, images, video, and automation can benefit.

ChatGPT has captured our imagination and engaged everyone’s curiosity to become the fastest-growing app in history. ChatGPT reached more than 100 million active users in January 2023, just two months after it launched, and now boasts more than 1.5 billion monthly visits, making OpenAI one of the top-20 websites in the world, according to Reuters. (6)

The pace of innovation driven by generative AI is becoming more and more difficult to keep up with. According to a major aggregator site, there are more than 5,000 generative AI apps and features available. (7) AI is having a Big Bang moment, akin to the launch of television, the worldwide web, or the smartphone. And this is just the beginning.

UBS estimates the market that ChatGPT is operating in at $1 trillion across the entire ecosystem. (8)

The cost-efficient path to reaching AI’s full potential

The most efficient solution to reduce the tremendous costs of running generative AI in the cloud is to move models to edge devices. Transitioning models to the edge not only reduces the stress on the cloud infrastructure, but it also reduces execution latency while increasing data privacy and security.

AI models with more than 1 billion parameters are already running on phones with performance and accuracy levels similar to those of the cloud, and models with 10 billion parameters or more are slated to run on devices in the coming months. For example, at Mobile World Congress (MWC) 2023, Qualcomm Technologies demonstrated Stable Diffusion running completely on a smartphone powered by a Snapdragon 8 Gen 2 mobile platform.

Over time, advances in model optimization combined with increased on-device AI processing capabilities will allow many generative AI applications to run on the edge. In a hybrid AI solution, distributing AI processing between the cloud and devices will allow generative AI to scale and reach its full potential — on-device AI processing and cloud computing will complement each other.

The hybrid AI approach is applicable to virtually all generative AI applications and device segments — including phones, laptops, extended reality headsets, vehicles and the internet of things. The approach is crucial for generative AI to meet enterprise and consumer needs globally.

For more information check out previous OnQ blog posts on how on-device and hybrid is allowing AI to scale and how Qualcomm Technologies’ on-device leadership is enabling the global ecosystem for hybrid AI.

References

1. GPT-4. Wikipedia. Retrieved on July 25, 2023 from https://en.wikipedia.org/wiki/GPT-4.

2. Introducing Llama 2/Inside the Model/Benchmarks. Meta. Retrieved on July 25, 2023 from https://ai.meta.com/llama/.

3. Dastin, L. et al. (Feb. 22, 2023). Focus: For tech giants, AI like Bing and Bard poses billion-dollar search problem. Reuters. Retrieved on July 19, 2023 from https://www.reuters.com/technology/tech-giants-ai-like-bing-bard-poses-billion-dollar-search-problem-2023-02-22/.

4. (Feb. 2023). How Large are the Incremental AI Costs. Morgan Stanley.

5. McGregor, Jim. (May 12, 2023). Generative AI Breaks The Data Center: Data Center Infrastructure And Operating Costs Projected To Increase To Over $76 Billion By 2028. Forbes. Retrieved on July 25, 2023 from https://www.forbes.com/sites/tiriasresearch/2023/05/12/generative-ai-breaks-the-data-center-data-center-infrastructure-and-operating-costs-projected-to-increase-to-over-76-billion-by-2028/?sh=f6d1b797c15e.

6. Hu, Krystal. (Feb. 2, 2023). ChatGPT sets record for fastest-growing user base – analyst note. Reuters. Retrieved on July 23, 2023 from https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.

7. There’s an AI for That. Retrieved on July 25, 2023 from https://theresanaiforthat.com/.

8. Garfinkle, Alexandra. (Feb. 2, 2023). ChatGPT on track to surpass 100 million users faster than TikTok or Instagram: UBS. Yahoo! News. Retrieved on July 23, 2023 from https://news.yahoo.com/chatgpt-on-track-to-surpass-100-million-users-faster-than-tiktok-or-instagram-ubs-214423357.html.

Pat Lawlor
Director, Technical Marketing, Qualcomm Technologies

Floating-point Arithmetic for AI Inference: Hit or Miss?
https://www.edge-ai-vision.com/2023/07/floating-point-arithmetic-for-ai-inference-hit-or-miss/
Wed, 26 Jul 2023

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

Our latest whitepaper shows that a new floating-point format doesn’t measure up to integer when you’re quantizing AI models to run on edge devices

Artificial intelligence (AI) has become pervasive in our lives, improving our phones, cars, homes, medical centers, and more. As currently structured, these models primarily run in power-hungry, network-dependent data centers. Running AI on edge devices such as smartphones and PCs would improve reliability, latency, privacy, network bandwidth usage, and overall cost.

To move AI workloads to devices, we need to make neural networks considerably more efficient. Qualcomm has been investing heavily in the tools to do so, most recently showcasing the world’s first Stable Diffusion model on an Android phone. Bringing models like GPT, with its hundreds of billions of parameters, to devices will require even more work.

The Qualcomm AI Research team has been making advances in deep learning model efficiency over the past several years, with state-of-the-art results in neural architecture search, compilation, conditional compute, and quantization. Quantization, which reduces the number of bits needed to represent information, is particularly important because it allows for the largest effective reduction of the weights and activations to improve power efficiency and performance while maintaining accuracy. It also helps enable use cases that run multiple AI models concurrently, which is relevant for industries such as mobile, XR, automotive, and more.

Recently, a new 8-bit floating-point format (FP8) has been suggested for efficient deep-learning network training. As some layers in neural networks can be trained in FP8 as opposed to the incumbent FP16 and FP32 networks, this format would improve efficiency for training tremendously. However, the integer formats such as INT4 and INT8 have traditionally been used for inference, producing an optimal trade-off between network accuracy and efficiency.

We investigate the differences between the FP8 and INT8 formats for efficient inference and conclude that the integer format is superior from a cost and performance perspective. We have also open sourced the code for our investigation for transparency.

Differences between floating point and integer quantization

Our whitepaper compares the efficiency of floating-point and integer quantization. For training, the floating-point formats FP16 and FP32 are commonly used as they have high enough accuracy and no hyper-parameters. They mostly work out of the box, making them easy to use.

Going down in the number of bits improves the efficiency of networks greatly, but the ease-of-use advantage disappears. For formats like INT8 and FP8, you have to set hyper-parameters for the representable range of the distributions. To get your original network accuracy back, you also have to spend some extra time quantizing these networks, either through some simple quantization steps called post-training quantization (PTQ), or by training the network in a quantized way altogether, called quantization-aware training (QAT).
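
As a sketch of what setting that range hyper-parameter can look like in simple PTQ, the example below uses percentile-based calibration on a batch of observed activations. The percentile choice and the data are illustrative assumptions; real tools such as AIMET offer several range estimators and finer-grained schemes:

```python
import numpy as np

def calibrate_range(calibration_activations, pct=99.9):
    # PTQ-style range setting: choose the representable range from a small
    # calibration set, clipping rare outliers instead of using the raw min/max.
    lo = np.percentile(calibration_activations, 100.0 - pct)
    hi = np.percentile(calibration_activations, pct)
    return lo, hi

def fake_quantize_uint8(x, lo, hi):
    # Quantize to 8-bit and dequantize again, so we can measure the error.
    scale = (hi - lo) / 255.0
    q = np.clip(np.round((x - lo) / scale), 0, 255)
    return q * scale + lo

rng = np.random.default_rng(6)
acts = rng.normal(size=100_000)   # stand-in for observed activations
acts[:10] *= 50.0                 # a few large outliers

lo, hi = calibrate_range(acts)
err_clipped = np.mean(np.abs(acts - fake_quantize_uint8(acts, lo, hi)))
err_minmax = np.mean(np.abs(acts - fake_quantize_uint8(acts, acts.min(), acts.max())))
print(err_clipped, "<", err_minmax)  # clipping outliers lowers the average error
```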


Figure 1: Comparison of different INT formats.

Given that most training in the industry is currently conducted with entire networks in FP32, or sometimes FP16 with mixed precision, the step to having some parts of a network run in FP8 is an appealing potential speed-up for the costly and time-intensive training procedures in deep learning. This topic has gained quite some traction lately, so we set out to find out what this development means for efficient inference on edge devices. Specifically, we look at both the hardware considerations for the formats and the effect of the chosen formats on neural network accuracy.

Our whitepaper shows that the hardware implementation of the FP8 format is somewhere between 50% and 180% less efficient than INT8 in terms of chip area and energy usage. This is because of the additional logic needed in the accumulation of FP formats versus integer formats. This seems like a broad range, but the actual efficiency depends on many hardware design choices that vary greatly. A similar conclusion was reached recently by Microsoft and Meta: floating-point arithmetic is just much less efficient than integer arithmetic.

This means that FP8 will have to be significantly more accurate than INT8 to be worthwhile from a hardware-efficiency perspective.

The hardware implementation of the FP8 format is somewhere between 50% and 180% less efficient than INT8 in terms of chip area and energy usage.

Quantization-aware training (QAT) results

Quantization-aware training is the quantization scenario most like how a format such as FP8 would be used in practice: you train with the format while optimizing your neural network. We show the QAT results below for different tested formats. We see that all quantized networks get close to their original floating-point performance. In most cases, we even see an improvement over the baseline results of FP32. The reason for this is simply that training these models for longer generally improves results, even if we were to train longer in FP32.


Figure 2: QAT results for different test formats. FP8-E4 is the most proposed FP8 format with 4 exponent bits. W4A8 is the INT format with 4-bit weights and 8-bit activations.

The results are quite clear: INT8 tends to perform better than other formats for most types of networks. It is only for transformers that FP8 performs better, but in the paper we delve deeper into transformers and show that this difference is easily mitigated. The conclusion is simple, however: there is no a priori reason to believe that the FP8 format is more accurate for neural networks. In some cases, even when going as low as 4-bit weights with the W4A8 format (as indicated in the rightmost column of Figure 2), the accuracy is comparable to the FP8 format.

Can we convert FP8 to INT8 with good accuracy?

Since there are some benefits to using the FP8 data format for training, we also investigated the performance when FP8-E4 (a FP8 format with 4 exponent bits) trained networks are converted naively to INT8 for inference. We found that INT8 can precisely represent roughly 90% of the range covered by the FP8-E4 format without any quantization error.  The remaining 10% of the range close to 0 incurs a small quantization error.


Figure 3: The FP8-E4 and INT8 distributions overlayed. The numbers inside the red box incur a small error when converted from FP8 to INT8, however, the larger values away from 0 can be perfectly captured by the INT8 format without an error.


Figure 4: What happens when we take a FP32 network, quantize it with QAT to FP8-E4, and then naively convert it to INT8? The conversion is smooth most of the time.

The general conclusion is that for networks that were originally easy to quantize from FP32 to INT8, the conversion is expected to be smooth, and can in several cases be done directly.

For networks that were already problematic to convert from FP32 to INT8 with simple PTQ techniques (mostly networks with significant outliers), similar issues arise when converting from FP8 to INT8. However, because these networks have been trained to cope with the reduced precision of the FP8 format, converting them from FP8 to INT8 gives better results than a naive FP32-to-INT8 conversion. Moreover, INT8 QAT can be employed to recover even more accuracy in such cases.

The path towards better AI inference on device

Overall, integer quantization is still the way to do efficient AI inference. With varying effort levels, you can achieve significant efficiency benefits without sacrificing much accuracy.


Figure 5: The INT quantization paradigm.
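
As a concrete reference point for the low-effort end of this paradigm, the snippet below shows the basic post-training quantization arithmetic for a single tensor: pick a scale from the observed range, round to integer codes, and dequantize to measure the error. Real toolchains add calibration data, per-channel scales, zero-points and bias handling on top of this; it is only a minimal sketch.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 post-training quantization."""
    scale = np.abs(x).max() / 127.0                        # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(1000).astype(np.float32)               # stand-in for a weight tensor
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)

print("max absolute error:", np.abs(x - x_hat).max())      # bounded by scale / 2
```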

For optimizing networks even further, opting for QAT can get networks into the W4A8 (4-bit weight and 8-bit activation) regime. This is very achievable for a wide range of networks, as shown in the sketch below. Transformer-based large language models such as GPT, Bloom and Llama benefit greatly from this jump in efficiency from 8-bit to 4-bit weights, as they are weight-bound. Several works have shown that 4-bit weights are not only possible for large language models but also optimal, and achievable even in the PTQ setting. This is an efficiency boost that currently does not exist in the floating-point world.
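
To illustrate what the W4A8 regime means mechanically, the following sketch simulates one linear layer with 4-bit, per-output-channel weight codes and 8-bit per-tensor activation codes, then rescales the accumulated products back to float. It is a simplified illustration of the storage and compute pattern, not the exact scheme behind the LLM results cited above.

```python
import torch

def quantize_weights_w4(w: torch.Tensor):
    """Symmetric per-output-channel 4-bit weight quantization (codes in [-8, 7])."""
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0
    codes = torch.clamp(torch.round(w / scale), -8, 7)     # 4-bit codes (held as floats here)
    return codes, scale

def quantize_activations_a8(x: torch.Tensor):
    """Symmetric per-tensor 8-bit activation quantization (codes in [-128, 127])."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    codes = torch.clamp(torch.round(x / scale), -128, 127)
    return codes, scale

def w4a8_linear(x: torch.Tensor, w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Simulate a W4A8 linear layer: quantize, multiply the codes, rescale to float."""
    wq, w_scale = quantize_weights_w4(w)
    xq, x_scale = quantize_activations_a8(x)
    # Real kernels accumulate the integer products in INT32; the codes here are
    # small enough that float32 emulates that accumulation exactly.
    acc = xq @ wq.t()
    return acc * x_scale * w_scale.t() + b

x = torch.randn(4, 64)                 # activations
w = torch.randn(128, 64)               # weight matrix (out_features x in_features)
b = torch.zeros(128)
print(w4a8_linear(x, w, b).shape)      # torch.Size([4, 128])
```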

To sum it all up, we see that the floating-point format FP8-E4 is not a replacement for INT8 in terms of performance and accuracy; in most cases it performs worse. Only in some extremely specific scenarios, where layers have significant outliers, can the floating-point format perform better in terms of accuracy. We are confident that our proposed solutions will lead to a better and more seamless implementation of large AI models on edge devices. For this purpose, the Qualcomm Innovation Center has open-sourced the AI Model Efficiency Toolkit (AIMET), which allows developers to quantize their models more easily and implement AI on device more efficiently.
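
For readers who want to try simulated INT8 quantization on their own models, the fragment below sketches roughly what that looks like with AIMET's PyTorch flow. It is written against the aimet_torch 1.x QuantizationSimModel API as publicly documented; argument names and defaults can differ between AIMET releases, so treat it as a starting point and check the AIMET documentation for the exact signatures.

```python
# Illustrative sketch: class and argument names follow the aimet_torch 1.x API as
# publicly documented; verify them against your installed AIMET version.
import torch
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Sequential(                 # stand-in for your trained FP32 model
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()
).eval()
dummy_input = torch.randn(1, 3, 224, 224)
calibration_batches = [torch.randn(1, 3, 224, 224) for _ in range(4)]  # representative data

sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    default_param_bw=8,                      # INT8 weights
    default_output_bw=8,                     # INT8 activations
)

def calibrate(sim_model, _):
    # Run a few representative batches so the simulator can collect quantization ranges.
    with torch.no_grad():
        for batch in calibration_batches:
            sim_model(batch)

sim.compute_encodings(calibrate, forward_pass_callback_args=None)

# sim.model can now be evaluated (or fine-tuned, i.e. QAT) with simulated INT8,
# and sim.export(...) writes the encodings used for on-device deployment.
```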

More of the latest in AI technology

Vinesh Sukumar
Senior Director, Product Management, Qualcomm Technologies, Inc.

Tijmen Blankevoort
Director, Engineering, Qualcomm Technologies Netherlands B.V.

The post Floating-point Arithmetic for AI Inference: Hit or Miss? appeared first on Edge AI and Vision Alliance.

Qualcomm Works with Meta to Enable On-device AI Applications Using Llama 2 https://www.edge-ai-vision.com/2023/07/qualcomm-works-with-meta-to-enable-on-device-ai-applications-using-llama-2/ Tue, 18 Jul 2023 18:48:19 +0000

Highlights:
  • Qualcomm is scheduled to make available Llama 2-based AI implementations on flagship smartphones and PCs starting in 2024, enabling developers to usher in new and exciting generative AI applications using the AI capabilities of Snapdragon platforms.

  • On-device AI implementation helps to increase user privacy, address security preferences, enhance application reliability and enable personalization – at a significantly lower cost for developers than relying solely on cloud-based AI implementations and services.

Jul 18, 2023 | San Diego | Qualcomm Technologies, Inc. and Meta are working to optimize the execution of Meta’s Llama 2 large language models directly on-device – without relying solely on cloud services. The ability to run generative AI models like Llama 2 on devices such as smartphones, PCs, VR/AR headsets, and vehicles allows developers to save on cloud costs and to provide users with private, more reliable, and personalized experiences.

As a result, Qualcomm Technologies plans to make available on-device Llama 2-based AI implementations to enable the creation of new and exciting AI applications. This will allow customers, partners, and developers to build use cases, such as intelligent virtual assistants, productivity applications, content creation tools, entertainment, and more. These new on-device AI experiences, powered by Snapdragon®, can work in areas with no connectivity or even in airplane mode.

“We applaud Meta’s approach to open and responsible AI and are committed to driving innovation and reducing barriers-to-entry for developers of any size by bringing generative AI on-device,” said Durga Malladi, senior vice president and general manager of technology, planning and edge solutions businesses, Qualcomm Technologies, Inc. “To effectively scale generative AI into the mainstream, AI will need to run on both the cloud and devices at the edge, such as smartphones, laptops, vehicles, and IoT devices.”

Meta and Qualcomm Technologies have a longstanding history of working together to drive technology innovation and deliver the next generation of premium device experiences. The companies’ current collaboration to support the Llama ecosystem spans research and product engineering efforts. Qualcomm Technologies’ leadership in on-device AI uniquely positions it to support the Llama ecosystem. The Company has an unmatched footprint at the edge, with billions of smartphones, vehicles, XR headsets and glasses, PCs, IoT devices, and more powered by its industry-leading AI hardware and software solutions, enabling the opportunity for generative AI to scale.

Qualcomm Technologies is scheduled to make available Llama 2-based AI implementations on devices powered by Snapdragon starting in 2024. Developers can start optimizing applications for on-device AI today using the Qualcomm® AI Stack – a dedicated set of tools that allows developers to process AI more efficiently on Snapdragon, making on-device AI possible even in small, thin, and light devices. For more updates, subscribe to our monthly developers newsletter.

Cautionary Note Regarding Forward-Looking Statements

In addition to historical information, this news release contains forward-looking statements that are inherently subject to risks and uncertainties, including but not limited to statements regarding our collaboration with Meta and the benefits and impact thereof, our plans to make available Llama 2-based AI implementations on devices powered by Snapdragon and the timing thereof, and the benefits and performance of on-device AI. Forward-looking statements are generally identified by words such as “estimates,” “guidance,” “expects,” “anticipates,” “intends,” “plans,” “believes,” “seeks” and similar expressions. Such forward-looking statements speak only as of the date of this news release, and are based on our current assumptions, expectations and beliefs, and information currently available to us. These forward-looking statements are not guarantees of future performance, and actual results may differ materially from those referred to in the forward-looking statements due to a number of important factors, including but not limited to the risks described in our most recent Annual Report on Form 10-K and subsequent Quarterly Reports on Form 10-Q filed with the U.S. Securities and Exchange Commission (SEC). Our reports filed with the SEC are available on our website at www.qualcomm.com. We undertake no obligation to update, or continue to provide information with respect to, any forward-looking statement or risk factor, whether as a result of new information, future events or otherwise.

About Qualcomm

Qualcomm is the world’s leading wireless technology innovator and the driving force behind the development, launch, and expansion of 5G. When we connected the phone to the internet, the mobile revolution was born. Today, our foundational technologies enable the mobile ecosystem and are found in every 3G, 4G and 5G smartphone. We bring the benefits of mobile to new industries, including automotive, the internet of things, and computing, and are leading the way to a world where everything and everyone can communicate and interact seamlessly.

The post Qualcomm Works with Meta to Enable On-device AI Applications Using Llama 2 appeared first on Edge AI and Vision Alliance.

Qualcomm Delivers Unprecedented Accessibility to Mobile Experiences in the Value Tier with New Snapdragon 4 Gen 2 Mobile Platform https://www.edge-ai-vision.com/2023/06/qualcomm-delivers-unprecedented-accessibility-to-mobile-experiences-in-the-value-tier-with-new-snapdragon-4-gen-2-mobile-platform/ Tue, 27 Jun 2023 14:46:29 +0000

Highlights:
  • Packed with purpose, Snapdragon 4 Gen 2 makes impressive technologies like 5G more accessible worldwide.

  • Designed to meet consumer needs in the value tier by providing effortless multi-tasking, advanced photography and videography, and reliable connections.

  • Commercial devices are expected to be announced in the second half of 2023.

Jun 26, 2023 | San Diego | Qualcomm Technologies, Inc. announced the new Snapdragon® 4 Gen 2 Mobile Platform, which has been creatively engineered to make incredible mobile experiences accessible to more consumers globally. Snapdragon 4 Gen 2 provides effortless, all-day use with fast CPU speeds, sharp photography and videography, plus speedy 5G and Wi-Fi for reliable connections.

“Snapdragon – at its core – is driving innovation while meeting the demands of OEMs and the broader industry,” said Matthew Lopatka, director of product management, Qualcomm Technologies, Inc. “With this generational advancement in the Snapdragon 4-series, consumers will have greater access to the most popular and relevant mobile features and capabilities. We optimized every aspect of the platform in order to maximize the experiences for users.”

Snapdragon 4 Gen 2 is packed with upgrades to provide better performance, optimal 5G connectivity, and richer experiences for users.

  • Performance: The first 4nm platform in the 4-series, Snapdragon 4 Gen 2 was designed to extend battery life and improve overall platform efficiency. The Qualcomm® Kryo™ CPU offers peak speeds up to 2.2 GHz and up to 10% better CPU performance1 for speedy everyday use.  Qualcomm® Quick Charge™ 4+ Technology can refill up to 50% of a battery in just 15 minutes, avoiding the hassle of limiting device interaction throughout the day. The platform offers support for 120fps FHD+ displays for improved clarity and smooth, seamless scrolling.
  • Camera: Razor-sharp photos and videos allow users to capture meaningful experiences. Electronic image stabilization and faster autofocus provide blur reduction for clearer images, even with moving subjects. For the first time in the 4-series, Multi Camera Temporal Filtering (MCTF) is built into the hardware – providing noise reduction for high-quality videos.
  • AI: Exciting new AI enhancements include AI-based low-light photography for crisp, detailed images in dim environments. AI-enhanced background noise removal ensures users are heard clearly on calls and video, whether at work or in a crowded environment.
  • Connectivity: Powered by the Snapdragon X61 5G Modem-RF System, Snapdragon 4 Gen 2 delivers blazing-fast speeds and support for more networks, frequencies, and bandwidths globally. Plus, our Qualcomm Wi-Fi 5 is a robust solution delivering fast, strong Wi-Fi connectivity for gaming, streaming, and more.

Key OEM brands, including Redmi and vivo, will adopt Snapdragon 4 Gen 2, with commercial devices expected to be announced in the second half of 2023. For more information regarding Snapdragon 4 Gen 2 visit here.

About Qualcomm

Qualcomm is enabling a world where everyone and everything can be intelligently connected. Our one technology roadmap allows us to efficiently scale the technologies that launched the mobile revolution – including advanced connectivity, high-performance, low-power compute, on-device intelligence and more – to the next generation of connected smart devices across industries. Innovations from Qualcomm and our family of Snapdragon platforms will help enable cloud-edge convergence, transform industries, accelerate the digital economy, and revolutionize how we experience the world, for the greater good.

The post Qualcomm Delivers Unprecedented Accessibility to Mobile Experiences in the Value Tier with New Snapdragon 4 Gen 2 Mobile Platform appeared first on Edge AI and Vision Alliance.

Qualcomm at CVPR 2023: Advancing Research and Bringing Generative AI to the Edge https://www.edge-ai-vision.com/2023/06/qualcomm-at-cvpr-2023-advancing-research-and-bringing-generative-ai-to-the-edge/ Wed, 21 Jun 2023 14:15:26 +0000

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

ControlNet running entirely on device, fitness coaching with an LLM, 3D reconstruction for XR, our accepted papers and much more

The annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is regarded as one of the most important events not only in computer vision but also in the artificial intelligence (AI) field. This year it takes place in Vancouver from June 18 to June 22, and we are showcasing our accepted research papers and technology demos — drop by our booth 1212 to see us in person. Here are a few of our highlights at CVPR 2023.

Our CVPR demos

Our research in AI, computer vision, extended reality (XR), and autonomous vehicles spans everything from core theoretical innovations to downstream real-world applications. We show such examples through the captivating demonstrations highlighted below.

World’s fastest ControlNet demo running on a phone

A few months ago, we showcased the world’s first demo of Stable Diffusion running on an Android phone, which is an accepted demo at CVPR this year. Now, Qualcomm AI Research is demonstrating ControlNet, a 1.5 billion parameter image-to-image model, running entirely on a phone as well. ControlNet is a class of generative AI solutions known as language-vision models, or LVMs. It allows more precise control for generating images by conditioning on an input image and an input text description. In this demo, AI images are generated on the mobile device in under 12 seconds without requiring any cloud access, allowing an interactive user experience that is efficient, enjoyable, reliable and private. The impressive performance was achieved with a suite of full-stack AI optimizations across the model architecture, AI software and neural hardware accelerators. Our advanced AI tools and hardware used for this process include the AI Model Efficiency Toolkit (AIMET), the Qualcomm AI Stack and the Qualcomm AI Engine.

Fitness coaching with an LLM grounded in real-time vision

Qualcomm AI Research has used generative AI to develop a digital fitness coach that improves upon existing solutions in terms of accuracy and realism. The fitness coach provides real-time interaction by encouraging, correcting, and helping the user meet their fitness goals. Our demo showcases how a visually grounded large language model (LLM) can enable natural interactions that are contextual, multimodal and real-time. A video stream of the user exercising is processed by our action recognition model. Based on the recognized action, our stateful orchestrator grounds the prompt and feeds it to the LLM. The fitness coach provides the LLM answer back to the user through a text-to-speech avatar. This is made possible thanks to three key innovations: a vision model that is trained to detect fine-grained fitness activities, a language model that is trained to generate language grounded in the visual concepts, and an orchestrator that coordinates the fluid interaction between these two modalities to facilitate live dialogue coaching feedback. The result is a fitness coach that provides real-time interaction for an engaging and dynamic user experience.
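
As a purely illustrative sketch of that data flow (and not Qualcomm's implementation), the snippet below stubs out the three components with hypothetical placeholder functions to show how a stateful orchestrator can ground each LLM prompt in the latest vision result.

```python
# Purely illustrative: every function and class here is a hypothetical stand-in.
from dataclasses import dataclass, field

def recognize_action(frame) -> str:
    """Stand-in for the fine-grained fitness action recognition model."""
    return "squat_rep_completed"                     # placeholder label

def run_llm(prompt: str) -> str:
    """Stand-in for the visually grounded large language model."""
    return "Nice depth on that squat - keep your chest up for the next rep."

def speak(text: str) -> None:
    """Stand-in for the text-to-speech avatar."""
    print("coach:", text)

@dataclass
class Orchestrator:
    """Keeps session state and grounds each LLM prompt in the latest vision output."""
    rep_counts: dict = field(default_factory=dict)

    def step(self, frame) -> None:
        action = recognize_action(frame)
        self.rep_counts[action] = self.rep_counts.get(action, 0) + 1
        prompt = (
            f"The user just performed: {action}. "
            f"Session so far: {self.rep_counts}. "
            "Give one short, encouraging coaching cue."
        )
        speak(run_llm(prompt))

coach = Orchestrator()
coach.step(frame=None)   # in the real system, frames arrive from the camera stream
```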

World’s first 1080p neural video coding on a phone

In another world’s first in terms of AI running on device, this demo showcases encoding and decoding 1080p videos on a mobile device. Neural codecs are versatile: they can be customized for specific video needs, can be optimized for perceptual quality through advances in generative AI, can be extended to new modalities, and can run on general-purpose AI hardware. However, they present numerous challenges which make them difficult to implement on compute-constrained devices. We designed a novel and efficient neural interframe video compression architecture which makes it possible to do 1080p video coding on device. In the demo, you can see that the rich visual structures and complex motions of the high-quality video are accurately preserved by the neural video codec.

3D reconstruction for XR

We’ve successfully developed a cutting-edge real-time 3D reconstruction system that excels in accuracy and efficiency, enabling the creation of highly detailed 3D models of any environment. Our solution runs on a mobile device, generates depth maps from individual images, and combines them into a 3D scene representation. With an accurate and real-time 3D map, developers can unlock a vast array of augmented and virtual reality applications. To showcase the capabilities of our innovation, we have designed an engaging demonstration where users can shoot virtual balls against the real objects in the scene, such as walls and furniture, witnessing realistic bounces based on accurate physics calculations. This perception technology fosters immersive experiences and promises to accelerate the widespread adoption of the metaverse.

Computer vision for smart cameras

Photo and video capture continue to improve every year with new capabilities made possible by advancements from AI-based computer vision. Our demonstration shows semantic segmentation, monocular depth estimation, and instance segmentation enabling Bokeh effects, background replacement, cinematic mode, and class-dependent image quality improvement in sharpness, smoothness, clarity and contrast. These neural networks run video enhancement in real time on devices powered by Snapdragon platforms.

Driver monitoring technology for enhanced safety

The driver monitoring system (DMS) demonstration uses computer vision to infer dangerous driving conditions and improve safety. By using active infrared cameras within the cockpit, the DMS monitors the driver’s status in real time, including distraction and drowsiness, based on eye openness, gaze, head pose, facial expression, body activities and much more. The system warns the driver when dangerous driving is detected and can ultimately help save lives. The DMS runs in parallel with Advanced Driver Assistance Systems (ADAS) on the Snapdragon Ride Flex SoC.

Facial avatars for XR

Avatars are an essential ingredient for enabling immersive XR experiences in the metaverse, whether photorealistic or cartoonish. With one or more 2D photos, we use on-device AI to generate a personalized mesh and corresponding texture. For real-time rendering of the avatar, we use headset cameras that see the movements of the user’s eyes and mouth. The resulting demonstration is an avatar that is reconstructed and animated close to ground truth and relighted according to the environment. Our goal is to make a digital human available on the Snapdragon XR platform used in the metaverse and in human-machine interfaces.

Our CVPR papers

Premier conferences, such as CVPR, play a pivotal role in advancing the AI field, as they feature meticulously peer-reviewed papers that establish the new state-of-the-art and contribute impactful research to the rest of the community. We’d like to highlight eight of our accepted papers at the main conference, advancing the frontiers in computer vision for two broad categories: making the best use of data and creating better architectures.

Making the best use of data

In our paper “DistractFlow: Improving Optical Flow Estimation Models via Realistic Distractions and Pseudo-Labeling,” we introduce a novel data augmentation technique that specifically tackles the challenge of limited data availability in training optical flow estimation models. This problem arises when representative and diverse data samples are scarce, which is inherent for motion estimation. Our proposed method overcomes this limitation by incorporating realistic distractions into the labeled input frames, enhancing the model’s generalization ability. When unlabeled data is accessible, we extend our augmentation to self-supervised settings using pseudo-labeling and cross-consistency regularization, which enables us to substantially increase the number of training pairs without requiring complex and expensive data collection. Comprehensive evaluations across multiple benchmarks show that our method consistently improves optical flow estimation performance.

Our paper, “Progressive Random Convolutions for Single Domain Generalization,” presents a data-efficient framework that uses a novel image augmentation method based on Progressive Random Convolutions (Pro-RandConv). This progressive approach mitigates semantic distortions in augmented images by reducing the influence of non-local pixels in the receptive fields of the convolutional kernels, allowing the generation of more effective and representative domains by gradually increasing the style diversity in augmentation. This generalization strategy outperforms state-of-the-art methods on single-domain and multi-domain image classification, recognition, and segmentation benchmarks.

Learning-based gaze estimation requires large amounts of training data with accurate gaze annotations. In our paper “ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection,” we propose a neural network called ReDirTrans, achieving latent-to-latent translation for redirecting gaze directions and head orientations in high-resolution full-face images based on assigned directional values in an interpretable manner. By combining ReDirTrans with a pretrained e4e-StyleGAN pair, we create ReDirTrans-GAN, which enables accurate redirecting gaze while preserving other attributes such as identity, expression, and hairstyle.

In the paper “DejaVu: Regenerative Learning to Enhance Dense Prediction,” we show a novel framework which leverages conditional image regeneration as additional supervision during training to improve deep networks for dense prediction tasks such as segmentation, depth estimation, and surface normal prediction. Our framework encourages the base network to learn to embed accurate scene structure in its dense prediction. This leads to more accurate predictions with clearer boundaries and better spatial consistency. Through extensive experiments on multiple dense prediction benchmarks, we demonstrate the efficacy of employing our framework during training, as it outperforms state-of-the-art methods at no added computation cost.

Creating better architectures

The method presented in “X3-KD: Cross-modal Cross-stage Cross-task Knowledge Distillation for 3D Object Detection” is a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3D object detection (3DOD).  Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the perspective view feature extraction stage providing supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features through the information contained in a LiDAR-based 3DOD teacher. The model outperforms previous state-of-the-art approaches on key datasets and generalizes to RADAR-based 3DOD.

With “EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization,” we present a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner. TTA is primarily conducted on edge devices with limited memory, so reducing memory is crucial but has been overlooked in previous TTA studies. In addition, long-term adaptation often leads to catastrophic forgetting and error accumulation, which hinders applying TTA in real-world deployments. Our method consists of two components to address these issues. First, it uses lightweight meta networks to adapt the original networks to the target domain, minimizing memory by decreasing the size of the intermediate activations required for backpropagation. Second, a novel self-distilled regularization keeps the output of the meta networks from deviating significantly from the output of the original networks, thereby preserving well-trained knowledge from the source domain. This effective strategy outperforms other state-of-the-art methods on image classification and semantic segmentation tasks across various benchmarks.

The problem of incremental learning is tackled in “Dense Network Expansion for Class Incremental Learning.”  A new network expansion method, called dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. This is accomplished by introducing dense connections between the intermediate layers of the task expert networks, which enable the knowledge transfer from old to new tasks via feature sharing and reusing. This sharing is implemented with a cross-task attention mechanism, based on a new task attention block (TAB), that fuses information across tasks. The DNE-based approach outperforms the previous state-of-the-art methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.

With “PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models” we propose a novel approach that enables zero-shot and few-shot, generalizable 3D part segmentation by leveraging the latest advances of pretrained language-vision models (LVMs). Currently, the LVMs can only operate on 2D images and thus cannot be directly applied to 3D part segmentation. We designed a 3D fusion module which processes the results from multiple views of an object, fuses them, and generates the part segmentation on the 3D point cloud, with compelling results against 3D benchmark datasets.

Workshops

CVPR 2023 Workshop on Autonomous Driving, paper: EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation [creating better architectures]

CVPR 2023 Mobile AI Workshop, paper: DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow [creating better architectures]

CVPR 2023 Mobile AI Workshop, paper: QuickSRNet Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms [creating better architectures]

CVPR 2023 Workshop on Learning with Limited Labelled Data for Image and Video Understanding, paper: Neural Transformation Network to Generate Diverse Views for Contrastive Learning  [making the best use of data]

CVPR 2023 Embodied AI Workshop, paper: Situated real-time interaction with a virtually embodied avatar [making the best use of data]

Continuing to push the boundaries of AI

These are just some of our highlights from this year’s edition of CVPR. If you are at CVPR, drop by the Qualcomm booth to find out more about our research work, experience the demos live, and learn more about our machine learning job openings.

Ning Bi
VP of Engineering, Qualcomm Technologies

Fatih Porikli
Senior Director of Technology, Qualcomm Technologies

The post Qualcomm at CVPR 2023: Advancing Research and Bringing Generative AI to the Edge appeared first on Edge AI and Vision Alliance.
