High-performance computing, CUDA, GPGPU, Graphics processing unit, Intel Corporation, Scalable Link Interface

Paper: Flexible Performant GEMM Kernels on GPUs

On Sep 29, 2020
@jeremyphoward shared
RT @Viral_B_Shah: GPU GEMM kernels in native #julialang coming very close to CUBLAS performance, including mixed precision arithmetic. @maleadt https://t.co/WKW3Cbnrhw https://t.co/yFMv1p7Y8M

Thomas Faingnaert, Tim Besard, Bjorn De Sutter

General Matrix Multiplication or GEMM kernels take center place in high performance computing and machine learning. Recent NVIDIA GPUs include GEMM accelerators, such as NVIDIA’s Tensor Cores. In this paper we show how it is possible to ...

juliagpu.org

Volkov and Demmel Paper on GPUs Wins SC19 Test of Time Award

Today SC19 announced the winners of the Test of Time Award. The annual award recognizes an outstanding paper that has deeply influenced the HPC discipline. We are pleased to announce the ...

Exact Gaussian Processes on a Million Data Points

Gaussian processes (GPs) are flexible models with state-of-the-art performance on many impactful applications. However, computational constraints with standard inference procedures have ...

CUDA 10 Features Revealed: Turing, CUDA Graphs and More

CUDA 10 supports the new Turing architecture, including added Tensor Core data types, CUDA graphs, and improved analysis tools

Running GPU-Accelerated Kubernetes Workloads on P3 and P2 EC2 Instances with Amazon EKS

This post was contributed by Scott Malkie, AWS Solutions Architect. Amazon EC2 P3 and P2 instances, featuring NVIDIA GPUs, power some of the most computationally advanced workloads today, ...

NVIDIA GPUs now work with Arm processors, Magnum open source I/O accelerates data workloads for AI

NVIDIA expands its ecosystem, flexes its software muscle, and takes a bet on new processors, workloads, and use cases. The developments paint a new picture in the AI chip race in the cloud ...

Open-sourcing FBGEMM for state-of-the-art server-side inference

Facebook is open-sourcing FBGEMM to enable large-scale production servers to run the newest, most powerful deep learning models efficiently.

What is CUDA? Parallel programming for GPUs

You can accelerate deep learning and other compute-intensive apps by taking advantage of CUDA and the parallel processing power of GPUs

Webinar: Introduction to AMD GPU Programming with HIP

AMD Research will be presenting a webinar titled “Introduction to AMD GPU Programming with HIP” on June 7, 2019, from 1:00 pm to 3:00 pm ET. HIP is ...

Intel, NVIDIA Roll Out New HPC Hardware at SC19

Intel and NVIDIA rolled out new offerings around SC19 in Denver, the annual conference for the high performance computing (HPC) community, which provides a showcase for cutting-edge ...

Deep Learning on GPUs: Successes and Promises

The rise of deep-learning (DL) has been fueled by the improvements in accelerators. Accelerators allow DL models to crunch a large amount of data, which

NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator

The bread and butter of their success in the Volta/Turing generation on AI training and inference, NVIDIA is back with their third generation of tensor cores, and with them significant ...

DEEP LEARNING PLATFORMS

AMAX is a global solutions partner specializing in highly-efficient, rack-integrated computing platforms geared towards optimizing OPEX and CAPEX.

NVIDIA Aims To Change The Computing World With New A100 Line

NVIDIA's A100 and EGX platforms change the scale and internetworking of data center computing. The GPU changes the game in AI processing because it can do both inference and training on one ...

New AI Chips Set to Reshape Data Centers

AI chip startups are hot on the heels of GPU leader Nvidia. At the same time, there is also significant competition in data center inference...

AMD now wants to take on Nvidia in the data center

AMD’s new GPUs designed for the data center give it a chance against Nvidia, but the software is still an issue.