Executable Output


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 1, "n_threads_batch": 1, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 73.037079, "speed_pp": 14.020276, "t_tg": 0.000001, "speed_tg": 0.000000, "t": 73.037079, "speed": 14.020276}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_0  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1128787 tid 1128787 thread 0 bound to OS proc set {0}
OMP: pid 1128787 tid 1128854 thread 1 bound to OS proc set {32}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 2, "n_threads_batch": 2, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 36.501072, "speed_pp": 28.053970, "t_tg": 0.000000, "speed_tg": nan, "t": 36.501072, "speed": 28.053970}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_1  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1128876 tid 1128876 thread 0 bound to OS proc set {0}
OMP: pid 1128876 tid 1128944 thread 2 bound to OS proc set {32}
OMP: pid 1128876 tid 1128943 thread 1 bound to OS proc set {16}
OMP: pid 1128876 tid 1128945 thread 3 bound to OS proc set {48}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 4, "n_threads_batch": 4, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 18.291191, "speed_pp": 55.983231, "t_tg": 0.000001, "speed_tg": 0.000000, "t": 18.291193, "speed": 55.983227}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_2  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129015 tid 1129015 thread 0 bound to OS proc set {0}
OMP: pid 1129015 tid 1129084 thread 3 bound to OS proc set {24}
OMP: pid 1129015 tid 1129083 thread 2 bound to OS proc set {16}
OMP: pid 1129015 tid 1129082 thread 1 bound to OS proc set {8}
OMP: pid 1129015 tid 1129085 thread 4 bound to OS proc set {32}
OMP: pid 1129015 tid 1129087 thread 6 bound to OS proc set {48}
OMP: pid 1129015 tid 1129086 thread 5 bound to OS proc set {40}
OMP: pid 1129015 tid 1129088 thread 7 bound to OS proc set {56}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 8, "n_threads_batch": 8, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 9.180444, "speed_pp": 111.541451, "t_tg": 0.000000, "speed_tg": nan, "t": 9.180444, "speed": 111.541451}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_3  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129108 tid 1129108 thread 0 bound to OS proc set {0}
OMP: pid 1129108 tid 1129177 thread 3 bound to OS proc set {12}
OMP: pid 1129108 tid 1129176 thread 2 bound to OS proc set {8}
OMP: pid 1129108 tid 1129186 thread 12 bound to OS proc set {48}
OMP: pid 1129108 tid 1129175 thread 1 bound to OS proc set {4}
OMP: pid 1129108 tid 1129185 thread 11 bound to OS proc set {44}
OMP: pid 1129108 tid 1129182 thread 8 bound to OS proc set {32}
OMP: pid 1129108 tid 1129188 thread 14 bound to OS proc set {56}
OMP: pid 1129108 tid 1129187 thread 13 bound to OS proc set {52}
OMP: pid 1129108 tid 1129184 thread 10 bound to OS proc set {40}
OMP: pid 1129108 tid 1129178 thread 4 bound to OS proc set {16}
OMP: pid 1129108 tid 1129183 thread 9 bound to OS proc set {36}
OMP: pid 1129108 tid 1129181 thread 7 bound to OS proc set {28}
OMP: pid 1129108 tid 1129180 thread 6 bound to OS proc set {24}
OMP: pid 1129108 tid 1129179 thread 5 bound to OS proc set {20}
OMP: pid 1129108 tid 1129189 thread 15 bound to OS proc set {60}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 16, "n_threads_batch": 16, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 4.639952, "speed_pp": 220.691925, "t_tg": 0.000001, "speed_tg": 0.000000, "t": 4.639953, "speed": 220.691879}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_4  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129210 tid 1129210 thread 0 bound to OS proc set {0}
OMP: pid 1129210 tid 1129279 thread 3 bound to OS proc set {8}
OMP: pid 1129210 tid 1129277 thread 1 bound to OS proc set {2}
OMP: pid 1129210 tid 1129280 thread 4 bound to OS proc set {10}
OMP: pid 1129210 tid 1129282 thread 6 bound to OS proc set {16}
OMP: pid 1129210 tid 1129283 thread 7 bound to OS proc set {18}
OMP: pid 1129210 tid 1129288 thread 12 bound to OS proc set {32}
OMP: pid 1129210 tid 1129292 thread 16 bound to OS proc set {43}
OMP: pid 1129210 tid 1129291 thread 15 bound to OS proc set {40}
OMP: pid 1129210 tid 1129293 thread 17 bound to OS proc set {46}
OMP: pid 1129210 tid 1129295 thread 19 bound to OS proc set {51}
OMP: pid 1129210 tid 1129287 thread 11 bound to OS proc set {29}
OMP: pid 1129210 tid 1129294 thread 18 bound to OS proc set {48}
OMP: pid 1129210 tid 1129290 thread 14 bound to OS proc set {37}
OMP: pid 1129210 tid 1129296 thread 20 bound to OS proc set {54}
OMP: pid 1129210 tid 1129289 thread 13 bound to OS proc set {35}
OMP: pid 1129210 tid 1129285 thread 9 bound to OS proc set {24}
OMP: pid 1129210 tid 1129284 thread 8 bound to OS proc set {21}
OMP: pid 1129210 tid 1129286 thread 10 bound to OS proc set {27}
OMP: pid 1129210 tid 1129298 thread 22 bound to OS proc set {59}
OMP: pid 1129210 tid 1129281 thread 5 bound to OS proc set {13}
OMP: pid 1129210 tid 1129278 thread 2 bound to OS proc set {5}
OMP: pid 1129210 tid 1129297 thread 21 bound to OS proc set {56}
OMP: pid 1129210 tid 1129299 thread 23 bound to OS proc set {62}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 24, "n_threads_batch": 24, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 3.484147, "speed_pp": 293.902618, "t_tg": 0.000000, "speed_tg": nan, "t": 3.484147, "speed": 293.902618}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_5  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129319 tid 1129319 thread 0 bound to OS proc set {0}
OMP: pid 1129319 tid 1129386 thread 1 bound to OS proc set {2}
OMP: pid 1129319 tid 1129396 thread 11 bound to OS proc set {22}
OMP: pid 1129319 tid 1129392 thread 7 bound to OS proc set {14}
OMP: pid 1129319 tid 1129387 thread 2 bound to OS proc set {4}
OMP: pid 1129319 tid 1129389 thread 4 bound to OS proc set {8}
OMP: pid 1129319 tid 1129393 thread 8 bound to OS proc set {16}
OMP: pid 1129319 tid 1129399 thread 14 bound to OS proc set {28}
OMP: pid 1129319 tid 1129397 thread 12 bound to OS proc set {24}
OMP: pid 1129319 tid 1129388 thread 3 bound to OS proc set {6}
OMP: pid 1129319 tid 1129390 thread 5 bound to OS proc set {10}
OMP: pid 1129319 tid 1129391 thread 6 bound to OS proc set {12}
OMP: pid 1129319 tid 1129409 thread 24 bound to OS proc set {48}
OMP: pid 1129319 tid 1129394 thread 9 bound to OS proc set {18}
OMP: pid 1129319 tid 1129400 thread 15 bound to OS proc set {30}
OMP: pid 1129319 tid 1129413 thread 28 bound to OS proc set {56}
OMP: pid 1129319 tid 1129395 thread 10 bound to OS proc set {20}
OMP: pid 1129319 tid 1129412 thread 27 bound to OS proc set {54}
OMP: pid 1129319 tid 1129404 thread 19 bound to OS proc set {38}
OMP: pid 1129319 tid 1129415 thread 30 bound to OS proc set {60}
OMP: pid 1129319 tid 1129401 thread 16 bound to OS proc set {32}
OMP: pid 1129319 tid 1129416 thread 31 bound to OS proc set {62}
OMP: pid 1129319 tid 1129402 thread 17 bound to OS proc set {34}
OMP: pid 1129319 tid 1129398 thread 13 bound to OS proc set {26}
OMP: pid 1129319 tid 1129414 thread 29 bound to OS proc set {58}
OMP: pid 1129319 tid 1129408 thread 23 bound to OS proc set {46}
OMP: pid 1129319 tid 1129403 thread 18 bound to OS proc set {36}
OMP: pid 1129319 tid 1129410 thread 25 bound to OS proc set {50}
OMP: pid 1129319 tid 1129411 thread 26 bound to OS proc set {52}
OMP: pid 1129319 tid 1129406 thread 21 bound to OS proc set {42}
OMP: pid 1129319 tid 1129405 thread 20 bound to OS proc set {40}
OMP: pid 1129319 tid 1129407 thread 22 bound to OS proc set {44}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 32, "n_threads_batch": 32, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 2.785860, "speed_pp": 367.570496, "t_tg": 0.000000, "speed_tg": nan, "t": 2.785860, "speed": 367.570496}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_6  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129485 tid 1129485 thread 0 bound to OS proc set {0}
OMP: pid 1129485 tid 1129552 thread 1 bound to OS proc set {1}
OMP: pid 1129485 tid 1129583 thread 32 bound to OS proc set {52}
OMP: pid 1129485 tid 1129553 thread 2 bound to OS proc set {3}
OMP: pid 1129485 tid 1129586 thread 35 bound to OS proc set {56}
OMP: pid 1129485 tid 1129562 thread 11 bound to OS proc set {17}
OMP: pid 1129485 tid 1129563 thread 12 bound to OS proc set {19}
OMP: pid 1129485 tid 1129558 thread 7 bound to OS proc set {11}
OMP: pid 1129485 tid 1129566 thread 15 bound to OS proc set {24}
OMP: pid 1129485 tid 1129554 thread 3 bound to OS proc set {4}
OMP: pid 1129485 tid 1129559 thread 8 bound to OS proc set {13}
OMP: pid 1129485 tid 1129555 thread 4 bound to OS proc set {6}
OMP: pid 1129485 tid 1129582 thread 31 bound to OS proc set {50}
OMP: pid 1129485 tid 1129584 thread 33 bound to OS proc set {53}
OMP: pid 1129485 tid 1129560 thread 9 bound to OS proc set {14}
OMP: pid 1129485 tid 1129557 thread 6 bound to OS proc set {9}
OMP: pid 1129485 tid 1129561 thread 10 bound to OS proc set {16}
OMP: pid 1129485 tid 1129564 thread 13 bound to OS proc set {21}
OMP: pid 1129485 tid 1129556 thread 5 bound to OS proc set {8}
OMP: pid 1129485 tid 1129565 thread 14 bound to OS proc set {22}
OMP: pid 1129485 tid 1129587 thread 36 bound to OS proc set {58}
OMP: pid 1129485 tid 1129589 thread 38 bound to OS proc set {61}
OMP: pid 1129485 tid 1129570 thread 19 bound to OS proc set {30}
OMP: pid 1129485 tid 1129585 thread 34 bound to OS proc set {55}
OMP: pid 1129485 tid 1129579 thread 28 bound to OS proc set {45}
OMP: pid 1129485 tid 1129569 thread 18 bound to OS proc set {29}
OMP: pid 1129485 tid 1129588 thread 37 bound to OS proc set {60}
OMP: pid 1129485 tid 1129580 thread 29 bound to OS proc set {47}
OMP: pid 1129485 tid 1129568 thread 17 bound to OS proc set {27}
OMP: pid 1129485 tid 1129567 thread 16 bound to OS proc set {26}
OMP: pid 1129485 tid 1129581 thread 30 bound to OS proc set {48}
OMP: pid 1129485 tid 1129577 thread 26 bound to OS proc set {42}
OMP: pid 1129485 tid 1129575 thread 24 bound to OS proc set {39}
OMP: pid 1129485 tid 1129574 thread 23 bound to OS proc set {37}
OMP: pid 1129485 tid 1129571 thread 20 bound to OS proc set {32}
OMP: pid 1129485 tid 1129573 thread 22 bound to OS proc set {35}
OMP: pid 1129485 tid 1129578 thread 27 bound to OS proc set {43}
OMP: pid 1129485 tid 1129590 thread 39 bound to OS proc set {63}
OMP: pid 1129485 tid 1129576 thread 25 bound to OS proc set {40}
OMP: pid 1129485 tid 1129572 thread 21 bound to OS proc set {34}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 40, "n_threads_batch": 40, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 2.361309, "speed_pp": 433.657776, "t_tg": 0.000000, "speed_tg": nan, "t": 2.361309, "speed": 433.657776}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_7  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129610 tid 1129610 thread 0 bound to OS proc set {0}
OMP: pid 1129610 tid 1129678 thread 2 bound to OS proc set {2}
OMP: pid 1129610 tid 1129677 thread 1 bound to OS proc set {1}
OMP: pid 1129610 tid 1129687 thread 11 bound to OS proc set {14}
OMP: pid 1129610 tid 1129679 thread 3 bound to OS proc set {4}
OMP: pid 1129610 tid 1129683 thread 7 bound to OS proc set {9}
OMP: pid 1129610 tid 1129685 thread 9 bound to OS proc set {12}
OMP: pid 1129610 tid 1129684 thread 8 bound to OS proc set {10}
OMP: pid 1129610 tid 1129710 thread 34 bound to OS proc set {46}
OMP: pid 1129610 tid 1129711 thread 35 bound to OS proc set {47}
OMP: pid 1129610 tid 1129709 thread 33 bound to OS proc set {44}
OMP: pid 1129610 tid 1129720 thread 44 bound to OS proc set {59}
OMP: pid 1129610 tid 1129686 thread 10 bound to OS proc set {13}
OMP: pid 1129610 tid 1129682 thread 6 bound to OS proc set {8}
OMP: pid 1129610 tid 1129723 thread 47 bound to OS proc set {63}
OMP: pid 1129610 tid 1129688 thread 12 bound to OS proc set {16}
OMP: pid 1129610 tid 1129700 thread 24 bound to OS proc set {32}
OMP: pid 1129610 tid 1129680 thread 4 bound to OS proc set {5}
OMP: pid 1129610 tid 1129681 thread 5 bound to OS proc set {6}
OMP: pid 1129610 tid 1129706 thread 30 bound to OS proc set {40}
OMP: pid 1129610 tid 1129721 thread 45 bound to OS proc set {60}
OMP: pid 1129610 tid 1129722 thread 46 bound to OS proc set {62}
OMP: pid 1129610 tid 1129699 thread 23 bound to OS proc set {31}
OMP: pid 1129610 tid 1129690 thread 14 bound to OS proc set {18}
OMP: pid 1129610 tid 1129703 thread 27 bound to OS proc set {36}
OMP: pid 1129610 tid 1129702 thread 26 bound to OS proc set {35}
OMP: pid 1129610 tid 1129719 thread 43 bound to OS proc set {58}
OMP: pid 1129610 tid 1129698 thread 22 bound to OS proc set {29}
OMP: pid 1129610 tid 1129689 thread 13 bound to OS proc set {17}
OMP: pid 1129610 tid 1129707 thread 31 bound to OS proc set {41}
OMP: pid 1129610 tid 1129712 thread 36 bound to OS proc set {48}
OMP: pid 1129610 tid 1129708 thread 32 bound to OS proc set {43}
OMP: pid 1129610 tid 1129694 thread 18 bound to OS proc set {24}
OMP: pid 1129610 tid 1129691 thread 15 bound to OS proc set {20}
OMP: pid 1129610 tid 1129701 thread 25 bound to OS proc set {33}
OMP: pid 1129610 tid 1129705 thread 29 bound to OS proc set {39}
OMP: pid 1129610 tid 1129695 thread 19 bound to OS proc set {25}
OMP: pid 1129610 tid 1129704 thread 28 bound to OS proc set {37}
OMP: pid 1129610 tid 1129696 thread 20 bound to OS proc set {27}
OMP: pid 1129610 tid 1129693 thread 17 bound to OS proc set {23}
OMP: pid 1129610 tid 1129697 thread 21 bound to OS proc set {28}
OMP: pid 1129610 tid 1129692 thread 16 bound to OS proc set {21}
OMP: pid 1129610 tid 1129715 thread 39 bound to OS proc set {52}
OMP: pid 1129610 tid 1129716 thread 40 bound to OS proc set {54}
OMP: pid 1129610 tid 1129714 thread 38 bound to OS proc set {51}
OMP: pid 1129610 tid 1129713 thread 37 bound to OS proc set {50}
OMP: pid 1129610 tid 1129717 thread 41 bound to OS proc set {55}
OMP: pid 1129610 tid 1129718 thread 42 bound to OS proc set {56}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 48, "n_threads_batch": 48, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 2.053570, "speed_pp": 498.643829, "t_tg": 0.000001, "speed_tg": 0.000000, "t": 2.053571, "speed": 498.643585}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_8  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129743 tid 1129743 thread 0 bound to OS proc set {0}
OMP: pid 1129743 tid 1129812 thread 3 bound to OS proc set {3}
OMP: pid 1129743 tid 1129811 thread 2 bound to OS proc set {2}
OMP: pid 1129743 tid 1129810 thread 1 bound to OS proc set {1}
OMP: pid 1129743 tid 1129813 thread 4 bound to OS proc set {4}
OMP: pid 1129743 tid 1129815 thread 6 bound to OS proc set {6}
OMP: pid 1129743 tid 1129814 thread 5 bound to OS proc set {5}
OMP: pid 1129743 tid 1129824 thread 15 bound to OS proc set {17}
OMP: pid 1129743 tid 1129820 thread 11 bound to OS proc set {12}
OMP: pid 1129743 tid 1129858 thread 49 bound to OS proc set {56}
OMP: pid 1129743 tid 1129857 thread 48 bound to OS proc set {55}
OMP: pid 1129743 tid 1129819 thread 10 bound to OS proc set {11}
OMP: pid 1129743 tid 1129860 thread 51 bound to OS proc set {59}
OMP: pid 1129743 tid 1129817 thread 8 bound to OS proc set {9}
OMP: pid 1129743 tid 1129836 thread 27 bound to OS proc set {31}
OMP: pid 1129743 tid 1129818 thread 9 bound to OS proc set {10}
OMP: pid 1129743 tid 1129861 thread 52 bound to OS proc set {60}
OMP: pid 1129743 tid 1129821 thread 12 bound to OS proc set {13}
OMP: pid 1129743 tid 1129864 thread 55 bound to OS proc set {63}
OMP: pid 1129743 tid 1129833 thread 24 bound to OS proc set {27}
OMP: pid 1129743 tid 1129825 thread 16 bound to OS proc set {18}
OMP: pid 1129743 tid 1129822 thread 13 bound to OS proc set {15}
OMP: pid 1129743 tid 1129859 thread 50 bound to OS proc set {58}
OMP: pid 1129743 tid 1129823 thread 14 bound to OS proc set {16}
OMP: pid 1129743 tid 1129835 thread 26 bound to OS proc set {30}
OMP: pid 1129743 tid 1129834 thread 25 bound to OS proc set {29}
OMP: pid 1129743 tid 1129840 thread 31 bound to OS proc set {35}
OMP: pid 1129743 tid 1129832 thread 23 bound to OS proc set {26}
OMP: pid 1129743 tid 1129863 thread 54 bound to OS proc set {62}
OMP: pid 1129743 tid 1129853 thread 44 bound to OS proc set {51}
OMP: pid 1129743 tid 1129837 thread 28 bound to OS proc set {32}
OMP: pid 1129743 tid 1129827 thread 18 bound to OS proc set {20}
OMP: pid 1129743 tid 1129862 thread 53 bound to OS proc set {61}
OMP: pid 1129743 tid 1129844 thread 35 bound to OS proc set {40}
OMP: pid 1129743 tid 1129826 thread 17 bound to OS proc set {19}
OMP: pid 1129743 tid 1129839 thread 30 bound to OS proc set {34}
OMP: pid 1129743 tid 1129843 thread 34 bound to OS proc set {39}
OMP: pid 1129743 tid 1129855 thread 46 bound to OS proc set {53}
OMP: pid 1129743 tid 1129829 thread 20 bound to OS proc set {23}
OMP: pid 1129743 tid 1129856 thread 47 bound to OS proc set {54}
OMP: pid 1129743 tid 1129828 thread 19 bound to OS proc set {22}
OMP: pid 1129743 tid 1129845 thread 36 bound to OS proc set {41}
OMP: pid 1129743 tid 1129849 thread 40 bound to OS proc set {46}
OMP: pid 1129743 tid 1129847 thread 38 bound to OS proc set {44}
OMP: pid 1129743 tid 1129816 thread 7 bound to OS proc set {8}
OMP: pid 1129743 tid 1129838 thread 29 bound to OS proc set {33}
OMP: pid 1129743 tid 1129842 thread 33 bound to OS proc set {38}
OMP: pid 1129743 tid 1129831 thread 22 bound to OS proc set {25}
OMP: pid 1129743 tid 1129850 thread 41 bound to OS proc set {47}
OMP: pid 1129743 tid 1129851 thread 42 bound to OS proc set {48}
OMP: pid 1129743 tid 1129854 thread 45 bound to OS proc set {52}
OMP: pid 1129743 tid 1129848 thread 39 bound to OS proc set {45}
OMP: pid 1129743 tid 1129846 thread 37 bound to OS proc set {42}
OMP: pid 1129743 tid 1129841 thread 32 bound to OS proc set {37}
OMP: pid 1129743 tid 1129830 thread 21 bound to OS proc set {24}
OMP: pid 1129743 tid 1129852 thread 43 bound to OS proc set {49}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 56, "n_threads_batch": 56, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 1.810754, "speed_pp": 565.510315, "t_tg": 0.000000, "speed_tg": nan, "t": 1.810754, "speed": 565.510315}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9

To display your profiling results:
######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                               #
######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_9  #
######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 1129884 tid 1129884 thread 0 bound to OS proc set {0}
OMP: pid 1129884 tid 1129953 thread 3 bound to OS proc set {3}
OMP: pid 1129884 tid 1129962 thread 12 bound to OS proc set {12}
OMP: pid 1129884 tid 1129965 thread 15 bound to OS proc set {15}
OMP: pid 1129884 tid 1129952 thread 2 bound to OS proc set {2}
OMP: pid 1129884 tid 1129961 thread 11 bound to OS proc set {11}
OMP: pid 1129884 tid 1129958 thread 8 bound to OS proc set {8}
OMP: pid 1129884 tid 1129964 thread 14 bound to OS proc set {14}
OMP: pid 1129884 tid 1129963 thread 13 bound to OS proc set {13}
OMP: pid 1129884 tid 1129957 thread 7 bound to OS proc set {7}
OMP: pid 1129884 tid 1129960 thread 10 bound to OS proc set {10}
OMP: pid 1129884 tid 1129954 thread 4 bound to OS proc set {4}
OMP: pid 1129884 tid 1129951 thread 1 bound to OS proc set {1}
OMP: pid 1129884 tid 1129959 thread 9 bound to OS proc set {9}
OMP: pid 1129884 tid 1129969 thread 19 bound to OS proc set {19}
OMP: pid 1129884 tid 1129956 thread 6 bound to OS proc set {6}
OMP: pid 1129884 tid 1129966 thread 16 bound to OS proc set {16}
OMP: pid 1129884 tid 1129968 thread 18 bound to OS proc set {18}
OMP: pid 1129884 tid 1129955 thread 5 bound to OS proc set {5}
OMP: pid 1129884 tid 1129967 thread 17 bound to OS proc set {17}
OMP: pid 1129884 tid 1129982 thread 32 bound to OS proc set {32}
OMP: pid 1129884 tid 1130000 thread 50 bound to OS proc set {50}
OMP: pid 1129884 tid 1129970 thread 20 bound to OS proc set {20}
OMP: pid 1129884 tid 1129971 thread 21 bound to OS proc set {21}
OMP: pid 1129884 tid 1129981 thread 31 bound to OS proc set {31}
OMP: pid 1129884 tid 1129985 thread 35 bound to OS proc set {35}
OMP: pid 1129884 tid 1129978 thread 28 bound to OS proc set {28}
OMP: pid 1129884 tid 1129977 thread 27 bound to OS proc set {27}
OMP: pid 1129884 tid 1130011 thread 61 bound to OS proc set {61}
OMP: pid 1129884 tid 1130013 thread 63 bound to OS proc set {63}
OMP: pid 1129884 tid 1130002 thread 52 bound to OS proc set {52}
OMP: pid 1129884 tid 1129979 thread 29 bound to OS proc set {29}
OMP: pid 1129884 tid 1129994 thread 44 bound to OS proc set {44}
OMP: pid 1129884 tid 1129986 thread 36 bound to OS proc set {36}
OMP: pid 1129884 tid 1129990 thread 40 bound to OS proc set {40}
OMP: pid 1129884 tid 1129980 thread 30 bound to OS proc set {30}
OMP: pid 1129884 tid 1130001 thread 51 bound to OS proc set {51}
OMP: pid 1129884 tid 1130004 thread 54 bound to OS proc set {54}
OMP: pid 1129884 tid 1129998 thread 48 bound to OS proc set {48}
OMP: pid 1129884 tid 1129983 thread 33 bound to OS proc set {33}
OMP: pid 1129884 tid 1129974 thread 24 bound to OS proc set {24}
OMP: pid 1129884 tid 1129984 thread 34 bound to OS proc set {34}
OMP: pid 1129884 tid 1129975 thread 25 bound to OS proc set {25}
OMP: pid 1129884 tid 1129996 thread 46 bound to OS proc set {46}
OMP: pid 1129884 tid 1129999 thread 49 bound to OS proc set {49}
OMP: pid 1129884 tid 1130010 thread 60 bound to OS proc set {60}
OMP: pid 1129884 tid 1129976 thread 26 bound to OS proc set {26}
OMP: pid 1129884 tid 1130006 thread 56 bound to OS proc set {56}
OMP: pid 1129884 tid 1130005 thread 55 bound to OS proc set {55}
OMP: pid 1129884 tid 1129989 thread 39 bound to OS proc set {39}
OMP: pid 1129884 tid 1130008 thread 58 bound to OS proc set {58}
OMP: pid 1129884 tid 1129973 thread 23 bound to OS proc set {23}
OMP: pid 1129884 tid 1129997 thread 47 bound to OS proc set {47}
OMP: pid 1129884 tid 1129991 thread 41 bound to OS proc set {41}
OMP: pid 1129884 tid 1129995 thread 45 bound to OS proc set {45}
OMP: pid 1129884 tid 1129988 thread 38 bound to OS proc set {38}
OMP: pid 1129884 tid 1129993 thread 43 bound to OS proc set {43}
OMP: pid 1129884 tid 1129987 thread 37 bound to OS proc set {37}
OMP: pid 1129884 tid 1130009 thread 59 bound to OS proc set {59}
OMP: pid 1129884 tid 1130007 thread 57 bound to OS proc set {57}
OMP: pid 1129884 tid 1129972 thread 22 bound to OS proc set {22}
OMP: pid 1129884 tid 1130012 thread 62 bound to OS proc set {62}
OMP: pid 1129884 tid 1129992 thread 42 bound to OS proc set {42}
OMP: pid 1129884 tid 1130003 thread 53 bound to OS proc set {53}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 64, "n_threads_batch": 64, "pp": 128, "tg": 0, "pl": 8, "n_kv": 1024, "t_pp": 1.616599, "speed_pp": 633.428589, "t_tg": 0.000000, "speed_tg": nan, "t": 1.616599, "speed": 633.428589}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/PP128_B8_Q4/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-24_13-52-43/tools/lprof_npsu_run_10  #
#######################################################################################################################################################################################################################################
