This article is based on findings from a kernel-level GPU trace investigation performed on a real PyTorch issue (#154318) using eBPF uprobes. Trace databases are published in the Ingero open-source ...
Microsoft has released DeepSpeed, a new deep learning optimization library for PyTorch, that is designed to reduce memory use and train models with better parallelism on existing hardware. According ...