Executable Output


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 1, "n_threads_batch": 1, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 33.944408, "speed_tg": 3.770871, "t": 33.944408, "speed": 3.770871}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_0  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271342 tid 271342 thread 0 bound to OS proc set {0}
OMP: pid 271342 tid 271409 thread 1 bound to OS proc set {32}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 2, "n_threads_batch": 2, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 17.824339, "speed_tg": 7.181192, "t": 17.824339, "speed": 7.181192}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_1  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271429 tid 271429 thread 0 bound to OS proc set {0}
OMP: pid 271429 tid 271497 thread 2 bound to OS proc set {32}
OMP: pid 271429 tid 271496 thread 1 bound to OS proc set {16}
OMP: pid 271429 tid 271498 thread 3 bound to OS proc set {48}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 4, "n_threads_batch": 4, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000001, "speed_pp": 0.000000, "t_tg": 9.463676, "speed_tg": 13.525399, "t": 9.463677, "speed": 13.525397}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_2  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271518 tid 271518 thread 0 bound to OS proc set {0}
OMP: pid 271518 tid 271588 thread 3 bound to OS proc set {24}
OMP: pid 271518 tid 271587 thread 2 bound to OS proc set {16}
OMP: pid 271518 tid 271586 thread 1 bound to OS proc set {8}
OMP: pid 271518 tid 271589 thread 4 bound to OS proc set {32}
OMP: pid 271518 tid 271591 thread 6 bound to OS proc set {48}
OMP: pid 271518 tid 271590 thread 5 bound to OS proc set {40}
OMP: pid 271518 tid 271592 thread 7 bound to OS proc set {56}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 8, "n_threads_batch": 8, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 5.419611, "speed_tg": 23.617931, "t": 5.419611, "speed": 23.617931}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_3  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271663 tid 271663 thread 0 bound to OS proc set {0}
OMP: pid 271663 tid 271732 thread 3 bound to OS proc set {12}
OMP: pid 271663 tid 271730 thread 1 bound to OS proc set {4}
OMP: pid 271663 tid 271731 thread 2 bound to OS proc set {8}
OMP: pid 271663 tid 271741 thread 12 bound to OS proc set {48}
OMP: pid 271663 tid 271743 thread 14 bound to OS proc set {56}
OMP: pid 271663 tid 271742 thread 13 bound to OS proc set {52}
OMP: pid 271663 tid 271740 thread 11 bound to OS proc set {44}
OMP: pid 271663 tid 271733 thread 4 bound to OS proc set {16}
OMP: pid 271663 tid 271737 thread 8 bound to OS proc set {32}
OMP: pid 271663 tid 271735 thread 6 bound to OS proc set {24}
OMP: pid 271663 tid 271736 thread 7 bound to OS proc set {28}
OMP: pid 271663 tid 271739 thread 10 bound to OS proc set {40}
OMP: pid 271663 tid 271738 thread 9 bound to OS proc set {36}
OMP: pid 271663 tid 271734 thread 5 bound to OS proc set {20}
OMP: pid 271663 tid 271744 thread 15 bound to OS proc set {60}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 16, "n_threads_batch": 16, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 3.882208, "speed_tg": 32.970928, "t": 3.882208, "speed": 32.970928}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_4  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271765 tid 271765 thread 0 bound to OS proc set {0}
OMP: pid 271765 tid 271835 thread 3 bound to OS proc set {8}
OMP: pid 271765 tid 271833 thread 1 bound to OS proc set {2}
OMP: pid 271765 tid 271836 thread 4 bound to OS proc set {10}
OMP: pid 271765 tid 271841 thread 9 bound to OS proc set {24}
OMP: pid 271765 tid 271844 thread 12 bound to OS proc set {32}
OMP: pid 271765 tid 271847 thread 15 bound to OS proc set {40}
OMP: pid 271765 tid 271839 thread 7 bound to OS proc set {18}
OMP: pid 271765 tid 271848 thread 16 bound to OS proc set {43}
OMP: pid 271765 tid 271842 thread 10 bound to OS proc set {27}
OMP: pid 271765 tid 271838 thread 6 bound to OS proc set {16}
OMP: pid 271765 tid 271840 thread 8 bound to OS proc set {21}
OMP: pid 271765 tid 271851 thread 19 bound to OS proc set {51}
OMP: pid 271765 tid 271846 thread 14 bound to OS proc set {37}
OMP: pid 271765 tid 271843 thread 11 bound to OS proc set {29}
OMP: pid 271765 tid 271850 thread 18 bound to OS proc set {48}
OMP: pid 271765 tid 271837 thread 5 bound to OS proc set {13}
OMP: pid 271765 tid 271845 thread 13 bound to OS proc set {35}
OMP: pid 271765 tid 271834 thread 2 bound to OS proc set {5}
OMP: pid 271765 tid 271852 thread 20 bound to OS proc set {54}
OMP: pid 271765 tid 271849 thread 17 bound to OS proc set {46}
OMP: pid 271765 tid 271854 thread 22 bound to OS proc set {59}
OMP: pid 271765 tid 271853 thread 21 bound to OS proc set {56}
OMP: pid 271765 tid 271855 thread 23 bound to OS proc set {62}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 24, "n_threads_batch": 24, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 3.696370, "speed_tg": 34.628571, "t": 3.696370, "speed": 34.628571}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_5  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271875 tid 271875 thread 0 bound to OS proc set {0}
OMP: pid 271875 tid 271953 thread 12 bound to OS proc set {24}
OMP: pid 271875 tid 271952 thread 11 bound to OS proc set {22}
OMP: pid 271875 tid 271949 thread 8 bound to OS proc set {16}
OMP: pid 271875 tid 271955 thread 14 bound to OS proc set {28}
OMP: pid 271875 tid 271942 thread 1 bound to OS proc set {2}
OMP: pid 271875 tid 271943 thread 2 bound to OS proc set {4}
OMP: pid 271875 tid 271948 thread 7 bound to OS proc set {14}
OMP: pid 271875 tid 271945 thread 4 bound to OS proc set {8}
OMP: pid 271875 tid 271944 thread 3 bound to OS proc set {6}
OMP: pid 271875 tid 271956 thread 15 bound to OS proc set {30}
OMP: pid 271875 tid 271957 thread 16 bound to OS proc set {32}
OMP: pid 271875 tid 271946 thread 5 bound to OS proc set {10}
OMP: pid 271875 tid 271951 thread 10 bound to OS proc set {20}
OMP: pid 271875 tid 271969 thread 28 bound to OS proc set {56}
OMP: pid 271875 tid 271954 thread 13 bound to OS proc set {26}
OMP: pid 271875 tid 271950 thread 9 bound to OS proc set {18}
OMP: pid 271875 tid 271971 thread 30 bound to OS proc set {60}
OMP: pid 271875 tid 271968 thread 27 bound to OS proc set {54}
OMP: pid 271875 tid 271947 thread 6 bound to OS proc set {12}
OMP: pid 271875 tid 271960 thread 19 bound to OS proc set {38}
OMP: pid 271875 tid 271959 thread 18 bound to OS proc set {36}
OMP: pid 271875 tid 271965 thread 24 bound to OS proc set {48}
OMP: pid 271875 tid 271961 thread 20 bound to OS proc set {40}
OMP: pid 271875 tid 271963 thread 22 bound to OS proc set {44}
OMP: pid 271875 tid 271970 thread 29 bound to OS proc set {58}
OMP: pid 271875 tid 271966 thread 25 bound to OS proc set {50}
OMP: pid 271875 tid 271958 thread 17 bound to OS proc set {34}
OMP: pid 271875 tid 271972 thread 31 bound to OS proc set {62}
OMP: pid 271875 tid 271964 thread 23 bound to OS proc set {46}
OMP: pid 271875 tid 271967 thread 26 bound to OS proc set {52}
OMP: pid 271875 tid 271962 thread 21 bound to OS proc set {42}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 32, "n_threads_batch": 32, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 3.577047, "speed_tg": 35.783707, "t": 3.577047, "speed": 35.783707}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_6  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 271992 tid 271992 thread 0 bound to OS proc set {0}
OMP: pid 271992 tid 272059 thread 1 bound to OS proc set {1}
OMP: pid 271992 tid 272060 thread 2 bound to OS proc set {3}
OMP: pid 271992 tid 272065 thread 7 bound to OS proc set {11}
OMP: pid 271992 tid 272072 thread 14 bound to OS proc set {22}
OMP: pid 271992 tid 272061 thread 3 bound to OS proc set {4}
OMP: pid 271992 tid 272070 thread 12 bound to OS proc set {19}
OMP: pid 271992 tid 272068 thread 10 bound to OS proc set {16}
OMP: pid 271992 tid 272066 thread 8 bound to OS proc set {13}
OMP: pid 271992 tid 272064 thread 6 bound to OS proc set {9}
OMP: pid 271992 tid 272062 thread 4 bound to OS proc set {6}
OMP: pid 271992 tid 272076 thread 18 bound to OS proc set {29}
OMP: pid 271992 tid 272093 thread 35 bound to OS proc set {56}
OMP: pid 271992 tid 272071 thread 13 bound to OS proc set {21}
OMP: pid 271992 tid 272063 thread 5 bound to OS proc set {8}
OMP: pid 271992 tid 272067 thread 9 bound to OS proc set {14}
OMP: pid 271992 tid 272077 thread 19 bound to OS proc set {30}
OMP: pid 271992 tid 272074 thread 16 bound to OS proc set {26}
OMP: pid 271992 tid 272073 thread 15 bound to OS proc set {24}
OMP: pid 271992 tid 272090 thread 32 bound to OS proc set {52}
OMP: pid 271992 tid 272069 thread 11 bound to OS proc set {17}
OMP: pid 271992 tid 272081 thread 23 bound to OS proc set {37}
OMP: pid 271992 tid 272097 thread 39 bound to OS proc set {63}
OMP: pid 271992 tid 272078 thread 20 bound to OS proc set {32}
OMP: pid 271992 tid 272083 thread 25 bound to OS proc set {40}
OMP: pid 271992 tid 272085 thread 27 bound to OS proc set {43}
OMP: pid 271992 tid 272092 thread 34 bound to OS proc set {55}
OMP: pid 271992 tid 272082 thread 24 bound to OS proc set {39}
OMP: pid 271992 tid 272088 thread 30 bound to OS proc set {48}
OMP: pid 271992 tid 272080 thread 22 bound to OS proc set {35}
OMP: pid 271992 tid 272075 thread 17 bound to OS proc set {27}
OMP: pid 271992 tid 272089 thread 31 bound to OS proc set {50}
OMP: pid 271992 tid 272094 thread 36 bound to OS proc set {58}
OMP: pid 271992 tid 272086 thread 28 bound to OS proc set {45}
OMP: pid 271992 tid 272084 thread 26 bound to OS proc set {42}
OMP: pid 271992 tid 272096 thread 38 bound to OS proc set {61}
OMP: pid 271992 tid 272091 thread 33 bound to OS proc set {53}
OMP: pid 271992 tid 272087 thread 29 bound to OS proc set {47}
OMP: pid 271992 tid 272095 thread 37 bound to OS proc set {60}
OMP: pid 271992 tid 272079 thread 21 bound to OS proc set {34}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 40, "n_threads_batch": 40, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 3.510397, "speed_tg": 36.463112, "t": 3.510397, "speed": 36.463112}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_7  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 272167 tid 272167 thread 0 bound to OS proc set {0}
OMP: pid 272167 tid 272235 thread 2 bound to OS proc set {2}
OMP: pid 272167 tid 272234 thread 1 bound to OS proc set {1}
OMP: pid 272167 tid 272244 thread 11 bound to OS proc set {14}
OMP: pid 272167 tid 272241 thread 8 bound to OS proc set {10}
OMP: pid 272167 tid 272236 thread 3 bound to OS proc set {4}
OMP: pid 272167 tid 272243 thread 10 bound to OS proc set {13}
OMP: pid 272167 tid 272245 thread 12 bound to OS proc set {16}
OMP: pid 272167 tid 272261 thread 28 bound to OS proc set {37}
OMP: pid 272167 tid 272260 thread 27 bound to OS proc set {36}
OMP: pid 272167 tid 272277 thread 44 bound to OS proc set {59}
OMP: pid 272167 tid 272268 thread 35 bound to OS proc set {47}
OMP: pid 272167 tid 272249 thread 16 bound to OS proc set {21}
OMP: pid 272167 tid 272257 thread 24 bound to OS proc set {32}
OMP: pid 272167 tid 272251 thread 18 bound to OS proc set {24}
OMP: pid 272167 tid 272242 thread 9 bound to OS proc set {12}
OMP: pid 272167 tid 272240 thread 7 bound to OS proc set {9}
OMP: pid 272167 tid 272267 thread 34 bound to OS proc set {46}
OMP: pid 272167 tid 272252 thread 19 bound to OS proc set {25}
OMP: pid 272167 tid 272280 thread 47 bound to OS proc set {63}
OMP: pid 272167 tid 272239 thread 6 bound to OS proc set {8}
OMP: pid 272167 tid 272266 thread 33 bound to OS proc set {44}
OMP: pid 272167 tid 272247 thread 14 bound to OS proc set {18}
OMP: pid 272167 tid 272237 thread 4 bound to OS proc set {5}
OMP: pid 272167 tid 272259 thread 26 bound to OS proc set {35}
OMP: pid 272167 tid 272248 thread 15 bound to OS proc set {20}
OMP: pid 272167 tid 272265 thread 32 bound to OS proc set {43}
OMP: pid 272167 tid 272279 thread 46 bound to OS proc set {62}
OMP: pid 272167 tid 272264 thread 31 bound to OS proc set {41}
OMP: pid 272167 tid 272250 thread 17 bound to OS proc set {23}
OMP: pid 272167 tid 272276 thread 43 bound to OS proc set {58}
OMP: pid 272167 tid 272246 thread 13 bound to OS proc set {17}
OMP: pid 272167 tid 272269 thread 36 bound to OS proc set {48}
OMP: pid 272167 tid 272258 thread 25 bound to OS proc set {33}
OMP: pid 272167 tid 272263 thread 30 bound to OS proc set {40}
OMP: pid 272167 tid 272273 thread 40 bound to OS proc set {54}
OMP: pid 272167 tid 272262 thread 29 bound to OS proc set {39}
OMP: pid 272167 tid 272278 thread 45 bound to OS proc set {60}
OMP: pid 272167 tid 272253 thread 20 bound to OS proc set {27}
OMP: pid 272167 tid 272256 thread 23 bound to OS proc set {31}
OMP: pid 272167 tid 272238 thread 5 bound to OS proc set {6}
OMP: pid 272167 tid 272255 thread 22 bound to OS proc set {29}
OMP: pid 272167 tid 272275 thread 42 bound to OS proc set {56}
OMP: pid 272167 tid 272271 thread 38 bound to OS proc set {51}
OMP: pid 272167 tid 272272 thread 39 bound to OS proc set {52}
OMP: pid 272167 tid 272270 thread 37 bound to OS proc set {50}
OMP: pid 272167 tid 272254 thread 21 bound to OS proc set {28}
OMP: pid 272167 tid 272274 thread 41 bound to OS proc set {55}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 48, "n_threads_batch": 48, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000001, "speed_pp": 0.000000, "t_tg": 3.455821, "speed_tg": 37.038956, "t": 3.455822, "speed": 37.038944}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_8  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 272300 tid 272300 thread 0 bound to OS proc set {0}
OMP: pid 272300 tid 272381 thread 15 bound to OS proc set {17}
OMP: pid 272300 tid 272378 thread 12 bound to OS proc set {13}
OMP: pid 272300 tid 272380 thread 14 bound to OS proc set {16}
OMP: pid 272300 tid 272377 thread 11 bound to OS proc set {12}
OMP: pid 272300 tid 272374 thread 8 bound to OS proc set {9}
OMP: pid 272300 tid 272373 thread 7 bound to OS proc set {8}
OMP: pid 272300 tid 272376 thread 10 bound to OS proc set {11}
OMP: pid 272300 tid 272382 thread 16 bound to OS proc set {18}
OMP: pid 272300 tid 272384 thread 18 bound to OS proc set {20}
OMP: pid 272300 tid 272375 thread 9 bound to OS proc set {10}
OMP: pid 272300 tid 272383 thread 17 bound to OS proc set {19}
OMP: pid 272300 tid 272370 thread 4 bound to OS proc set {4}
OMP: pid 272300 tid 272368 thread 2 bound to OS proc set {2}
OMP: pid 272300 tid 272369 thread 3 bound to OS proc set {3}
OMP: pid 272300 tid 272398 thread 32 bound to OS proc set {37}
OMP: pid 272300 tid 272415 thread 49 bound to OS proc set {56}
OMP: pid 272300 tid 272410 thread 44 bound to OS proc set {51}
OMP: pid 272300 tid 272417 thread 51 bound to OS proc set {59}
OMP: pid 272300 tid 272414 thread 48 bound to OS proc set {55}
OMP: pid 272300 tid 272416 thread 50 bound to OS proc set {58}
OMP: pid 272300 tid 272409 thread 43 bound to OS proc set {49}
OMP: pid 272300 tid 272372 thread 6 bound to OS proc set {6}
OMP: pid 272300 tid 272413 thread 47 bound to OS proc set {54}
OMP: pid 272300 tid 272379 thread 13 bound to OS proc set {15}
OMP: pid 272300 tid 272367 thread 1 bound to OS proc set {1}
OMP: pid 272300 tid 272371 thread 5 bound to OS proc set {5}
OMP: pid 272300 tid 272406 thread 40 bound to OS proc set {46}
OMP: pid 272300 tid 272394 thread 28 bound to OS proc set {32}
OMP: pid 272300 tid 272412 thread 46 bound to OS proc set {53}
OMP: pid 272300 tid 272418 thread 52 bound to OS proc set {60}
OMP: pid 272300 tid 272402 thread 36 bound to OS proc set {41}
OMP: pid 272300 tid 272405 thread 39 bound to OS proc set {45}
OMP: pid 272300 tid 272385 thread 19 bound to OS proc set {22}
OMP: pid 272300 tid 272396 thread 30 bound to OS proc set {34}
OMP: pid 272300 tid 272408 thread 42 bound to OS proc set {48}
OMP: pid 272300 tid 272397 thread 31 bound to OS proc set {35}
OMP: pid 272300 tid 272392 thread 26 bound to OS proc set {30}
OMP: pid 272300 tid 272399 thread 33 bound to OS proc set {38}
OMP: pid 272300 tid 272407 thread 41 bound to OS proc set {47}
OMP: pid 272300 tid 272411 thread 45 bound to OS proc set {52}
OMP: pid 272300 tid 272389 thread 23 bound to OS proc set {26}
OMP: pid 272300 tid 272395 thread 29 bound to OS proc set {33}
OMP: pid 272300 tid 272393 thread 27 bound to OS proc set {31}
OMP: pid 272300 tid 272404 thread 38 bound to OS proc set {44}
OMP: pid 272300 tid 272419 thread 53 bound to OS proc set {61}
OMP: pid 272300 tid 272401 thread 35 bound to OS proc set {40}
OMP: pid 272300 tid 272400 thread 34 bound to OS proc set {39}
OMP: pid 272300 tid 272403 thread 37 bound to OS proc set {42}
OMP: pid 272300 tid 272388 thread 22 bound to OS proc set {25}
OMP: pid 272300 tid 272421 thread 55 bound to OS proc set {63}
OMP: pid 272300 tid 272386 thread 20 bound to OS proc set {23}
OMP: pid 272300 tid 272391 thread 25 bound to OS proc set {29}
OMP: pid 272300 tid 272390 thread 24 bound to OS proc set {27}
OMP: pid 272300 tid 272387 thread 21 bound to OS proc set {24}
OMP: pid 272300 tid 272420 thread 54 bound to OS proc set {62}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 56, "n_threads_batch": 56, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000001, "speed_pp": 0.000000, "t_tg": 3.568447, "speed_tg": 35.869946, "t": 3.568448, "speed": 35.869934}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_9  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 272441 tid 272441 thread 0 bound to OS proc set {0}
OMP: pid 272441 tid 272510 thread 3 bound to OS proc set {3}
OMP: pid 272441 tid 272519 thread 12 bound to OS proc set {12}
OMP: pid 272441 tid 272522 thread 15 bound to OS proc set {15}
OMP: pid 272441 tid 272509 thread 2 bound to OS proc set {2}
OMP: pid 272441 tid 272521 thread 14 bound to OS proc set {14}
OMP: pid 272441 tid 272518 thread 11 bound to OS proc set {11}
OMP: pid 272441 tid 272520 thread 13 bound to OS proc set {13}
OMP: pid 272441 tid 272515 thread 8 bound to OS proc set {8}
OMP: pid 272441 tid 272523 thread 16 bound to OS proc set {16}
OMP: pid 272441 tid 272517 thread 10 bound to OS proc set {10}
OMP: pid 272441 tid 272526 thread 19 bound to OS proc set {19}
OMP: pid 272441 tid 272514 thread 7 bound to OS proc set {7}
OMP: pid 272441 tid 272525 thread 18 bound to OS proc set {18}
OMP: pid 272441 tid 272511 thread 4 bound to OS proc set {4}
OMP: pid 272441 tid 272516 thread 9 bound to OS proc set {9}
OMP: pid 272441 tid 272531 thread 24 bound to OS proc set {24}
OMP: pid 272441 tid 272513 thread 6 bound to OS proc set {6}
OMP: pid 272441 tid 272524 thread 17 bound to OS proc set {17}
OMP: pid 272441 tid 272569 thread 62 bound to OS proc set {62}
OMP: pid 272441 tid 272527 thread 20 bound to OS proc set {20}
OMP: pid 272441 tid 272557 thread 50 bound to OS proc set {50}
OMP: pid 272441 tid 272530 thread 23 bound to OS proc set {23}
OMP: pid 272441 tid 272529 thread 22 bound to OS proc set {22}
OMP: pid 272441 tid 272508 thread 1 bound to OS proc set {1}
OMP: pid 272441 tid 272535 thread 28 bound to OS proc set {28}
OMP: pid 272441 tid 272570 thread 63 bound to OS proc set {63}
OMP: pid 272441 tid 272558 thread 51 bound to OS proc set {51}
OMP: pid 272441 tid 272568 thread 61 bound to OS proc set {61}
OMP: pid 272441 tid 272555 thread 48 bound to OS proc set {48}
OMP: pid 272441 tid 272553 thread 46 bound to OS proc set {46}
OMP: pid 272441 tid 272556 thread 49 bound to OS proc set {49}
OMP: pid 272441 tid 272559 thread 52 bound to OS proc set {52}
OMP: pid 272441 tid 272539 thread 32 bound to OS proc set {32}
OMP: pid 272441 tid 272561 thread 54 bound to OS proc set {54}
OMP: pid 272441 tid 272565 thread 58 bound to OS proc set {58}
OMP: pid 272441 tid 272567 thread 60 bound to OS proc set {60}
OMP: pid 272441 tid 272541 thread 34 bound to OS proc set {34}
OMP: pid 272441 tid 272563 thread 56 bound to OS proc set {56}
OMP: pid 272441 tid 272566 thread 59 bound to OS proc set {59}
OMP: pid 272441 tid 272543 thread 36 bound to OS proc set {36}
OMP: pid 272441 tid 272534 thread 27 bound to OS proc set {27}
OMP: pid 272441 tid 272533 thread 26 bound to OS proc set {26}
OMP: pid 272441 tid 272540 thread 33 bound to OS proc set {33}
OMP: pid 272441 tid 272538 thread 31 bound to OS proc set {31}
OMP: pid 272441 tid 272512 thread 5 bound to OS proc set {5}
OMP: pid 272441 tid 272542 thread 35 bound to OS proc set {35}
OMP: pid 272441 tid 272554 thread 47 bound to OS proc set {47}
OMP: pid 272441 tid 272551 thread 44 bound to OS proc set {44}
OMP: pid 272441 tid 272544 thread 37 bound to OS proc set {37}
OMP: pid 272441 tid 272547 thread 40 bound to OS proc set {40}
OMP: pid 272441 tid 272545 thread 38 bound to OS proc set {38}
OMP: pid 272441 tid 272562 thread 55 bound to OS proc set {55}
OMP: pid 272441 tid 272546 thread 39 bound to OS proc set {39}
OMP: pid 272441 tid 272548 thread 41 bound to OS proc set {41}
OMP: pid 272441 tid 272550 thread 43 bound to OS proc set {43}
OMP: pid 272441 tid 272564 thread 57 bound to OS proc set {57}
OMP: pid 272441 tid 272560 thread 53 bound to OS proc set {53}
OMP: pid 272441 tid 272552 thread 45 bound to OS proc set {45}
OMP: pid 272441 tid 272532 thread 25 bound to OS proc set {25}
OMP: pid 272441 tid 272549 thread 42 bound to OS proc set {42}
OMP: pid 272441 tid 272528 thread 21 bound to OS proc set {21}
OMP: pid 272441 tid 272536 thread 29 bound to OS proc set {29}
OMP: pid 272441 tid 272537 thread 30 bound to OS proc set {30}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 64, "n_threads_batch": 64, "pp": 0, "tg": 128, "pl": 1, "n_kv": 128, "t_pp": 0.000000, "speed_pp": nan, "t_tg": 3.749776, "speed_tg": 34.135372, "t": 3.749776, "speed": 34.135372}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10

To display your profiling results:
########################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                                COMMAND                                                                                                #
########################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-406-0420/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-25_09-21-13/tools/lprof_npsu_run_10  #
########################################################################################################################################################################################################################################
