Move repo to RHEL9

Terry Barrett requested to merge move-to-rhel9 into main

Updates to move this codebase from the RHEL7 cluster to the RHEL9 cluster.

Key changes made as part of this update:

  • Dropped finetuning demo: I had hoped to bring this demo over to RHEL9, but hit barriers setting it up because it relies on older versions of llama, llama-recipes, and their many dependent packages. Spending more time on it didn't seem worthwhile, particularly since I'll be developing new demos for the Open House workshop.
  • Dropped use of a module for the Python runtime: the module build was a bit of a hassle for the RHEL7 version, so to maximize the team's time for preparing for the upcoming workshop, I focused on conda environments for this release.
  • Added H100 support and dropped AMD GPU support (Black Diamond's AMD GPUs weren't carried over to the new cluster)
  • Added two new benchmarking runs (runs 20 and 21) and updated the included spreadsheet with their data
    • A100 and V100 performance is similar to that on RHEL7
    • Models load and execute faster on the H100s, as expected
