Move repo to RHEL9
Updates to move this codebase from the RHEL7 cluster to the RHEL9 cluster.
Key changes made as part of this update:
- Dropped finetuning demo - I had hoped to bring this demo over to RHEL9, but ran into barriers setting it up because it relies on older versions of llama, llama-recipes, and their many dependencies. Spending more time on it didn't seem worthwhile, particularly since I'll be developing new demos for the Open House workshop
- Dropped use of a module for the Python runtime - The module build was a bit of a hassle for the RHEL7 version, so to free up the team's time for preparing the upcoming workshop, this release uses conda environments instead
- Added H100 support and dropped AMD GPU support (Black Diamond's AMD GPUs weren't carried over to the new cluster)
- Added two new benchmarking runs (runs 20 and 21) and updated the included spreadsheet with their data
  - A100 and V100 performance is similar to that on RHEL7
  - Models load and execute faster on the H100s, as expected