Executable Output


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 1, "n_threads_batch": 1, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 92.905777, "speed_pp": 5.510960, "t_tg": 0.000000, "speed_tg": nan, "t": 92.905777, "speed": 5.510960}

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_0  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4051910 tid 4051910 thread 0 bound to OS proc set {0}
OMP: pid 4051910 tid 4051977 thread 1 bound to OS proc set {32}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 2, "n_threads_batch": 2, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 47.287048, "speed_pp": 10.827489, "t_tg": 0.000000, "speed_tg": nan, "t": 47.287048, "speed": 10.827489}

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_1  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4052270 tid 4052270 thread 0 bound to OS proc set {0}
OMP: pid 4052270 tid 4052338 thread 2 bound to OS proc set {32}
OMP: pid 4052270 tid 4052337 thread 1 bound to OS proc set {16}
OMP: pid 4052270 tid 4052339 thread 3 bound to OS proc set {48}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 4, "n_threads_batch": 4, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 24.621490, "speed_pp": 20.794842, "t_tg": 0.000000, "speed_tg": nan, "t": 24.621490, "speed": 20.794842}
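The `OMP:` binding lines of the 4-thread run above can be checked mechanically. The following is a minimal Python sketch, not part of MAQAO or llama.cpp: the names `BINDING`, `parse_bindings` and `shared_procs` are made up for illustration, and the regular expression assumes the proc set is a comma-separated list of integers, as in every line of this log (lines in any other form are skipped).

```python
import re
from collections import Counter

# Binding lines copied from the 4-thread run above.
SAMPLE = """\
OMP: pid 4052270 tid 4052270 thread 0 bound to OS proc set {0}
OMP: pid 4052270 tid 4052338 thread 2 bound to OS proc set {32}
OMP: pid 4052270 tid 4052337 thread 1 bound to OS proc set {16}
OMP: pid 4052270 tid 4052339 thread 3 bound to OS proc set {48}
"""

# Assumed line format: inferred from this log only.
BINDING = re.compile(
    r"^OMP: pid (\d+) tid (\d+) thread (\d+) "
    r"bound to OS proc set \{(\d+(?:,\d+)*)\}"
)

def parse_bindings(text):
    """Return {OpenMP thread number: [OS proc ids]}, sorted by thread number."""
    found = {}
    for line in text.splitlines():
        match = BINDING.match(line.strip())
        if match:
            procs = [int(p) for p in match.group(4).split(",")]
            found[int(match.group(3))] = procs
    return dict(sorted(found.items()))

def shared_procs(bindings):
    """Return the OS procs that more than one thread is bound to."""
    counts = Counter(p for procs in bindings.values() for p in procs)
    return sorted(p for p, n in counts.items() if n > 1)

bindings = parse_bindings(SAMPLE)
shared = shared_procs(bindings)

for thread, procs in bindings.items():
    print(f"thread {thread} -> OS proc {procs}")
print("procs shared by several threads:", shared)
```

Sorting by thread number makes the placement easier to read than the arrival order of the log lines, and an empty `shared` list indicates that each thread has its own OS proc.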

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_2  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4053023 tid 4053023 thread 0 bound to OS proc set {0}
OMP: pid 4053023 tid 4053092 thread 3 bound to OS proc set {24}
OMP: pid 4053023 tid 4053091 thread 2 bound to OS proc set {16}
OMP: pid 4053023 tid 4053090 thread 1 bound to OS proc set {8}
OMP: pid 4053023 tid 4053093 thread 4 bound to OS proc set {32}
OMP: pid 4053023 tid 4053095 thread 6 bound to OS proc set {48}
OMP: pid 4053023 tid 4053094 thread 5 bound to OS proc set {40}
OMP: pid 4053023 tid 4053096 thread 7 bound to OS proc set {56}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 8, "n_threads_batch": 8, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 14.917583, "speed_pp": 34.321915, "t_tg": 0.000000, "speed_tg": nan, "t": 14.917583, "speed": 34.321915}

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_3  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4054714 tid 4054714 thread 0 bound to OS proc set {0}
OMP: pid 4054714 tid 4054782 thread 2 bound to OS proc set {8}
OMP: pid 4054714 tid 4054783 thread 3 bound to OS proc set {12}
OMP: pid 4054714 tid 4054781 thread 1 bound to OS proc set {4}
OMP: pid 4054714 tid 4054794 thread 14 bound to OS proc set {56}
OMP: pid 4054714 tid 4054792 thread 12 bound to OS proc set {48}
OMP: pid 4054714 tid 4054793 thread 13 bound to OS proc set {52}
OMP: pid 4054714 tid 4054791 thread 11 bound to OS proc set {44}
OMP: pid 4054714 tid 4054784 thread 4 bound to OS proc set {16}
OMP: pid 4054714 tid 4054788 thread 8 bound to OS proc set {32}
OMP: pid 4054714 tid 4054790 thread 10 bound to OS proc set {40}
OMP: pid 4054714 tid 4054785 thread 5 bound to OS proc set {20}
OMP: pid 4054714 tid 4054787 thread 7 bound to OS proc set {28}
OMP: pid 4054714 tid 4054786 thread 6 bound to OS proc set {24}
OMP: pid 4054714 tid 4054789 thread 9 bound to OS proc set {36}
OMP: pid 4054714 tid 4054795 thread 15 bound to OS proc set {60}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 16, "n_threads_batch": 16, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 10.475981, "speed_pp": 48.873707, "t_tg": 0.000000, "speed_tg": nan, "t": 10.475981, "speed": 48.873707}

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_4  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4058130 tid 4058130 thread 0 bound to OS proc set {0}
OMP: pid 4058130 tid 4058200 thread 4 bound to OS proc set {10}
OMP: pid 4058130 tid 4058199 thread 3 bound to OS proc set {8}
OMP: pid 4058130 tid 4058204 thread 8 bound to OS proc set {21}
OMP: pid 4058130 tid 4058203 thread 7 bound to OS proc set {18}
OMP: pid 4058130 tid 4058202 thread 6 bound to OS proc set {16}
OMP: pid 4058130 tid 4058208 thread 12 bound to OS proc set {32}
OMP: pid 4058130 tid 4058197 thread 1 bound to OS proc set {2}
OMP: pid 4058130 tid 4058215 thread 19 bound to OS proc set {51}
OMP: pid 4058130 tid 4058211 thread 15 bound to OS proc set {40}
OMP: pid 4058130 tid 4058201 thread 5 bound to OS proc set {13}
OMP: pid 4058130 tid 4058216 thread 20 bound to OS proc set {54}
OMP: pid 4058130 tid 4058198 thread 2 bound to OS proc set {5}
OMP: pid 4058130 tid 4058207 thread 11 bound to OS proc set {29}
OMP: pid 4058130 tid 4058210 thread 14 bound to OS proc set {37}
OMP: pid 4058130 tid 4058205 thread 9 bound to OS proc set {24}
OMP: pid 4058130 tid 4058206 thread 10 bound to OS proc set {27}
OMP: pid 4058130 tid 4058214 thread 18 bound to OS proc set {48}
OMP: pid 4058130 tid 4058217 thread 21 bound to OS proc set {56}
OMP: pid 4058130 tid 4058212 thread 16 bound to OS proc set {43}
OMP: pid 4058130 tid 4058213 thread 17 bound to OS proc set {46}
OMP: pid 4058130 tid 4058209 thread 13 bound to OS proc set {35}
OMP: pid 4058130 tid 4058218 thread 22 bound to OS proc set {59}
OMP: pid 4058130 tid 4058219 thread 23 bound to OS proc set {62}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 24, "n_threads_batch": 24, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 8.928968, "speed_pp": 57.341450, "t_tg": 0.000000, "speed_tg": nan, "t": 8.928968, "speed": 57.341450}

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_5  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4063322 tid 4063322 thread 0 bound to OS proc set {0}
OMP: pid 4063322 tid 4063395 thread 7 bound to OS proc set {14}
OMP: pid 4063322 tid 4063400 thread 12 bound to OS proc set {24}
OMP: pid 4063322 tid 4063389 thread 1 bound to OS proc set {2}
OMP: pid 4063322 tid 4063392 thread 4 bound to OS proc set {8}
OMP: pid 4063322 tid 4063391 thread 3 bound to OS proc set {6}
OMP: pid 4063322 tid 4063390 thread 2 bound to OS proc set {4}
OMP: pid 4063322 tid 4063393 thread 5 bound to OS proc set {10}
OMP: pid 4063322 tid 4063394 thread 6 bound to OS proc set {12}
OMP: pid 4063322 tid 4063396 thread 8 bound to OS proc set {16}
OMP: pid 4063322 tid 4063416 thread 28 bound to OS proc set {56}
OMP: pid 4063322 tid 4063398 thread 10 bound to OS proc set {20}
OMP: pid 4063322 tid 4063399 thread 11 bound to OS proc set {22}
OMP: pid 4063322 tid 4063397 thread 9 bound to OS proc set {18}
OMP: pid 4063322 tid 4063403 thread 15 bound to OS proc set {30}
OMP: pid 4063322 tid 4063404 thread 16 bound to OS proc set {32}
OMP: pid 4063322 tid 4063418 thread 30 bound to OS proc set {60}
OMP: pid 4063322 tid 4063402 thread 14 bound to OS proc set {28}
OMP: pid 4063322 tid 4063417 thread 29 bound to OS proc set {58}
OMP: pid 4063322 tid 4063419 thread 31 bound to OS proc set {62}
OMP: pid 4063322 tid 4063407 thread 19 bound to OS proc set {38}
OMP: pid 4063322 tid 4063406 thread 18 bound to OS proc set {36}
OMP: pid 4063322 tid 4063401 thread 13 bound to OS proc set {26}
OMP: pid 4063322 tid 4063415 thread 27 bound to OS proc set {54}
OMP: pid 4063322 tid 4063405 thread 17 bound to OS proc set {34}
OMP: pid 4063322 tid 4063414 thread 26 bound to OS proc set {52}
OMP: pid 4063322 tid 4063408 thread 20 bound to OS proc set {40}
OMP: pid 4063322 tid 4063412 thread 24 bound to OS proc set {48}
OMP: pid 4063322 tid 4063410 thread 22 bound to OS proc set {44}
OMP: pid 4063322 tid 4063411 thread 23 bound to OS proc set {46}
OMP: pid 4063322 tid 4063409 thread 21 bound to OS proc set {42}
OMP: pid 4063322 tid 4063413 thread 25 bound to OS proc set {50}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 32, "n_threads_batch": 32, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 8.271211, "speed_pp": 61.901459, "t_tg": 0.000000, "speed_tg": nan, "t": 8.271211, "speed": 61.901459}

Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_6  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4070339 tid 4070339 thread 0 bound to OS proc set {0}
OMP: pid 4070339 tid 4070406 thread 1 bound to OS proc set {1}
OMP: pid 4070339 tid 4070407 thread 2 bound to OS proc set {3}
OMP: pid 4070339 tid 4070420 thread 15 bound to OS proc set {24}
OMP: pid 4070339 tid 4070412 thread 7 bound to OS proc set {11}
OMP: pid 4070339 tid 4070408 thread 3 bound to OS proc set {4}
OMP: pid 4070339 tid 4070416 thread 11 bound to OS proc set {17}
OMP: pid 4070339 tid 4070415 thread 10 bound to OS proc set {16}
OMP: pid 4070339 tid 4070409 thread 4 bound to OS proc set {6}
OMP: pid 4070339 tid 4070417 thread 12 bound to OS proc set {19}
OMP: pid 4070339 tid 4070413 thread 8 bound to OS proc set {13}
OMP: pid 4070339 tid 4070418 thread 13 bound to OS proc set {21}
OMP: pid 4070339 tid 4070410 thread 5 bound to OS proc set {8}
OMP: pid 4070339 tid 4070437 thread 32 bound to OS proc set {52}
OMP: pid 4070339 tid 4070411 thread 6 bound to OS proc set {9}
OMP: pid 4070339 tid 4070440 thread 35 bound to OS proc set {56}
OMP: pid 4070339 tid 4070424 thread 19 bound to OS proc set {30}
OMP: pid 4070339 tid 4070414 thread 9 bound to OS proc set {14}
OMP: pid 4070339 tid 4070428 thread 23 bound to OS proc set {37}
OMP: pid 4070339 tid 4070419 thread 14 bound to OS proc set {22}
OMP: pid 4070339 tid 4070441 thread 36 bound to OS proc set {58}
OMP: pid 4070339 tid 4070436 thread 31 bound to OS proc set {50}
OMP: pid 4070339 tid 4070425 thread 20 bound to OS proc set {32}
OMP: pid 4070339 tid 4070439 thread 34 bound to OS proc set {55}
OMP: pid 4070339 tid 4070443 thread 38 bound to OS proc set {61}
OMP: pid 4070339 tid 4070433 thread 28 bound to OS proc set {45}
OMP: pid 4070339 tid 4070423 thread 18 bound to OS proc set {29}
OMP: pid 4070339 tid 4070438 thread 33 bound to OS proc set {53}
OMP: pid 4070339 tid 4070421 thread 16 bound to OS proc set {26}
OMP: pid 4070339 tid 4070442 thread 37 bound to OS proc set {60}
OMP: pid 4070339 tid 4070444 thread 39 bound to OS proc set {63}
OMP: pid 4070339 tid 4070429 thread 24 bound to OS proc set {39}
OMP: pid 4070339 tid 4070422 thread 17 bound to OS proc set {27}
OMP: pid 4070339 tid 4070435 thread 30 bound to OS proc set {48}
OMP: pid 4070339 tid 4070434 thread 29 bound to OS proc set {47}
OMP: pid 4070339 tid 4070426 thread 21 bound to OS proc set {34}
OMP: pid 4070339 tid 4070432 thread 27 bound to OS proc set {43}
OMP: pid 4070339 tid 4070427 thread 22 bound to OS proc set {35}
OMP: pid 4070339 tid 4070430 thread 25 bound to OS proc set {40}
OMP: pid 4070339 tid 4070431 thread 26 bound to OS proc set {42}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 40, "n_threads_batch": 40, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 8.236768, "speed_pp": 62.160305, "t_tg": 0.000000, "speed_tg": nan, "t": 8.236768, "speed": 62.160305}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_7  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4079083 tid 4079083 thread 0 bound to OS proc set {0}
OMP: pid 4079083 tid 4079151 thread 2 bound to OS proc set {2}
OMP: pid 4079083 tid 4079150 thread 1 bound to OS proc set {1}
OMP: pid 4079083 tid 4079161 thread 12 bound to OS proc set {16}
OMP: pid 4079083 tid 4079162 thread 13 bound to OS proc set {17}
OMP: pid 4079083 tid 4079152 thread 3 bound to OS proc set {4}
OMP: pid 4079083 tid 4079165 thread 16 bound to OS proc set {21}
OMP: pid 4079083 tid 4079177 thread 28 bound to OS proc set {37}
OMP: pid 4079083 tid 4079155 thread 6 bound to OS proc set {8}
OMP: pid 4079083 tid 4079164 thread 15 bound to OS proc set {20}
OMP: pid 4079083 tid 4079183 thread 34 bound to OS proc set {46}
OMP: pid 4079083 tid 4079176 thread 27 bound to OS proc set {36}
OMP: pid 4079083 tid 4079153 thread 4 bound to OS proc set {5}
OMP: pid 4079083 tid 4079163 thread 14 bound to OS proc set {18}
OMP: pid 4079083 tid 4079154 thread 5 bound to OS proc set {6}
OMP: pid 4079083 tid 4079173 thread 24 bound to OS proc set {32}
OMP: pid 4079083 tid 4079196 thread 47 bound to OS proc set {63}
OMP: pid 4079083 tid 4079157 thread 8 bound to OS proc set {10}
OMP: pid 4079083 tid 4079156 thread 7 bound to OS proc set {9}
OMP: pid 4079083 tid 4079158 thread 9 bound to OS proc set {12}
OMP: pid 4079083 tid 4079175 thread 26 bound to OS proc set {35}
OMP: pid 4079083 tid 4079160 thread 11 bound to OS proc set {14}
OMP: pid 4079083 tid 4079159 thread 10 bound to OS proc set {13}
OMP: pid 4079083 tid 4079184 thread 35 bound to OS proc set {47}
OMP: pid 4079083 tid 4079195 thread 46 bound to OS proc set {62}
OMP: pid 4079083 tid 4079168 thread 19 bound to OS proc set {25}
OMP: pid 4079083 tid 4079182 thread 33 bound to OS proc set {44}
OMP: pid 4079083 tid 4079180 thread 31 bound to OS proc set {41}
OMP: pid 4079083 tid 4079167 thread 18 bound to OS proc set {24}
OMP: pid 4079083 tid 4079192 thread 43 bound to OS proc set {58}
OMP: pid 4079083 tid 4079193 thread 44 bound to OS proc set {59}
OMP: pid 4079083 tid 4079189 thread 40 bound to OS proc set {54}
OMP: pid 4079083 tid 4079169 thread 20 bound to OS proc set {27}
OMP: pid 4079083 tid 4079179 thread 30 bound to OS proc set {40}
OMP: pid 4079083 tid 4079185 thread 36 bound to OS proc set {48}
OMP: pid 4079083 tid 4079171 thread 22 bound to OS proc set {29}
OMP: pid 4079083 tid 4079174 thread 25 bound to OS proc set {33}
OMP: pid 4079083 tid 4079172 thread 23 bound to OS proc set {31}
OMP: pid 4079083 tid 4079178 thread 29 bound to OS proc set {39}
OMP: pid 4079083 tid 4079181 thread 32 bound to OS proc set {43}
OMP: pid 4079083 tid 4079166 thread 17 bound to OS proc set {23}
OMP: pid 4079083 tid 4079186 thread 37 bound to OS proc set {50}
OMP: pid 4079083 tid 4079187 thread 38 bound to OS proc set {51}
OMP: pid 4079083 tid 4079194 thread 45 bound to OS proc set {60}
OMP: pid 4079083 tid 4079191 thread 42 bound to OS proc set {56}
OMP: pid 4079083 tid 4079170 thread 21 bound to OS proc set {28}
OMP: pid 4079083 tid 4079190 thread 41 bound to OS proc set {55}
OMP: pid 4079083 tid 4079188 thread 39 bound to OS proc set {52}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 48, "n_threads_batch": 48, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 8.329027, "speed_pp": 61.471764, "t_tg": 0.000000, "speed_tg": nan, "t": 8.329027, "speed": 61.471764}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_8  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4089603 tid 4089603 thread 0 bound to OS proc set {0}
OMP: pid 4089603 tid 4089672 thread 3 bound to OS proc set {3}
OMP: pid 4089603 tid 4089671 thread 2 bound to OS proc set {2}
OMP: pid 4089603 tid 4089670 thread 1 bound to OS proc set {1}
OMP: pid 4089603 tid 4089673 thread 4 bound to OS proc set {4}
OMP: pid 4089603 tid 4089675 thread 6 bound to OS proc set {6}
OMP: pid 4089603 tid 4089674 thread 5 bound to OS proc set {5}
OMP: pid 4089603 tid 4089680 thread 11 bound to OS proc set {12}
OMP: pid 4089603 tid 4089684 thread 15 bound to OS proc set {17}
OMP: pid 4089603 tid 4089676 thread 7 bound to OS proc set {8}
OMP: pid 4089603 tid 4089681 thread 12 bound to OS proc set {13}
OMP: pid 4089603 tid 4089720 thread 51 bound to OS proc set {59}
OMP: pid 4089603 tid 4089697 thread 28 bound to OS proc set {32}
OMP: pid 4089603 tid 4089679 thread 10 bound to OS proc set {11}
OMP: pid 4089603 tid 4089717 thread 48 bound to OS proc set {55}
OMP: pid 4089603 tid 4089696 thread 27 bound to OS proc set {31}
OMP: pid 4089603 tid 4089719 thread 50 bound to OS proc set {58}
OMP: pid 4089603 tid 4089682 thread 13 bound to OS proc set {15}
OMP: pid 4089603 tid 4089683 thread 14 bound to OS proc set {16}
OMP: pid 4089603 tid 4089678 thread 9 bound to OS proc set {10}
OMP: pid 4089603 tid 4089718 thread 49 bound to OS proc set {56}
OMP: pid 4089603 tid 4089685 thread 16 bound to OS proc set {18}
OMP: pid 4089603 tid 4089677 thread 8 bound to OS proc set {9}
OMP: pid 4089603 tid 4089713 thread 44 bound to OS proc set {51}
OMP: pid 4089603 tid 4089721 thread 52 bound to OS proc set {60}
OMP: pid 4089603 tid 4089701 thread 32 bound to OS proc set {37}
OMP: pid 4089603 tid 4089700 thread 31 bound to OS proc set {35}
OMP: pid 4089603 tid 4089704 thread 35 bound to OS proc set {40}
OMP: pid 4089603 tid 4089716 thread 47 bound to OS proc set {54}
OMP: pid 4089603 tid 4089723 thread 54 bound to OS proc set {62}
OMP: pid 4089603 tid 4089695 thread 26 bound to OS proc set {30}
OMP: pid 4089603 tid 4089699 thread 30 bound to OS proc set {34}
OMP: pid 4089603 tid 4089694 thread 25 bound to OS proc set {29}
OMP: pid 4089603 tid 4089714 thread 45 bound to OS proc set {52}
OMP: pid 4089603 tid 4089712 thread 43 bound to OS proc set {49}
OMP: pid 4089603 tid 4089686 thread 17 bound to OS proc set {19}
OMP: pid 4089603 tid 4089698 thread 29 bound to OS proc set {33}
OMP: pid 4089603 tid 4089703 thread 34 bound to OS proc set {39}
OMP: pid 4089603 tid 4089707 thread 38 bound to OS proc set {44}
OMP: pid 4089603 tid 4089724 thread 55 bound to OS proc set {63}
OMP: pid 4089603 tid 4089715 thread 46 bound to OS proc set {53}
OMP: pid 4089603 tid 4089705 thread 36 bound to OS proc set {41}
OMP: pid 4089603 tid 4089688 thread 19 bound to OS proc set {22}
OMP: pid 4089603 tid 4089687 thread 18 bound to OS proc set {20}
OMP: pid 4089603 tid 4089711 thread 42 bound to OS proc set {48}
OMP: pid 4089603 tid 4089706 thread 37 bound to OS proc set {42}
OMP: pid 4089603 tid 4089692 thread 23 bound to OS proc set {26}
OMP: pid 4089603 tid 4089708 thread 39 bound to OS proc set {45}
OMP: pid 4089603 tid 4089691 thread 22 bound to OS proc set {25}
OMP: pid 4089603 tid 4089702 thread 33 bound to OS proc set {38}
OMP: pid 4089603 tid 4089710 thread 41 bound to OS proc set {47}
OMP: pid 4089603 tid 4089709 thread 40 bound to OS proc set {46}
OMP: pid 4089603 tid 4089722 thread 53 bound to OS proc set {61}
OMP: pid 4089603 tid 4089690 thread 21 bound to OS proc set {24}
OMP: pid 4089603 tid 4089693 thread 24 bound to OS proc set {27}
OMP: pid 4089603 tid 4089689 thread 20 bound to OS proc set {23}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 56, "n_threads_batch": 56, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 8.353947, "speed_pp": 61.288399, "t_tg": 0.000000, "speed_tg": nan, "t": 8.353947, "speed": 61.288399}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9

To display your profiling results:
#######################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                               COMMAND                                                                                                #
#######################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_9  #
#######################################################################################################################################################################################################################################


* [MAQAO] Info: Detected 1 Lprof instances in ip-172-31-46-37.ec2.internal. 
If this is incorrect, rerun with number-processes-per-node=X
OMP: pid 4101947 tid 4101947 thread 0 bound to OS proc set {0}
OMP: pid 4101947 tid 4102016 thread 3 bound to OS proc set {3}
OMP: pid 4101947 tid 4102028 thread 15 bound to OS proc set {15}
OMP: pid 4101947 tid 4102025 thread 12 bound to OS proc set {12}
OMP: pid 4101947 tid 4102014 thread 1 bound to OS proc set {1}
OMP: pid 4101947 tid 4102024 thread 11 bound to OS proc set {11}
OMP: pid 4101947 tid 4102021 thread 8 bound to OS proc set {8}
OMP: pid 4101947 tid 4102027 thread 14 bound to OS proc set {14}
OMP: pid 4101947 tid 4102015 thread 2 bound to OS proc set {2}
OMP: pid 4101947 tid 4102023 thread 10 bound to OS proc set {10}
OMP: pid 4101947 tid 4102026 thread 13 bound to OS proc set {13}
OMP: pid 4101947 tid 4102022 thread 9 bound to OS proc set {9}
OMP: pid 4101947 tid 4102017 thread 4 bound to OS proc set {4}
OMP: pid 4101947 tid 4102019 thread 6 bound to OS proc set {6}
OMP: pid 4101947 tid 4102032 thread 19 bound to OS proc set {19}
OMP: pid 4101947 tid 4102029 thread 16 bound to OS proc set {16}
OMP: pid 4101947 tid 4102031 thread 18 bound to OS proc set {18}
OMP: pid 4101947 tid 4102018 thread 5 bound to OS proc set {5}
OMP: pid 4101947 tid 4102037 thread 24 bound to OS proc set {24}
OMP: pid 4101947 tid 4102020 thread 7 bound to OS proc set {7}
OMP: pid 4101947 tid 4102030 thread 17 bound to OS proc set {17}
OMP: pid 4101947 tid 4102039 thread 26 bound to OS proc set {26}
OMP: pid 4101947 tid 4102036 thread 23 bound to OS proc set {23}
OMP: pid 4101947 tid 4102038 thread 25 bound to OS proc set {25}
OMP: pid 4101947 tid 4102033 thread 20 bound to OS proc set {20}
OMP: pid 4101947 tid 4102045 thread 32 bound to OS proc set {32}
OMP: pid 4101947 tid 4102035 thread 22 bound to OS proc set {22}
OMP: pid 4101947 tid 4102048 thread 35 bound to OS proc set {35}
OMP: pid 4101947 tid 4102061 thread 48 bound to OS proc set {48}
OMP: pid 4101947 tid 4102068 thread 55 bound to OS proc set {55}
OMP: pid 4101947 tid 4102034 thread 21 bound to OS proc set {21}
OMP: pid 4101947 tid 4102059 thread 46 bound to OS proc set {46}
OMP: pid 4101947 tid 4102072 thread 59 bound to OS proc set {59}
OMP: pid 4101947 tid 4102043 thread 30 bound to OS proc set {30}
OMP: pid 4101947 tid 4102062 thread 49 bound to OS proc set {49}
OMP: pid 4101947 tid 4102063 thread 50 bound to OS proc set {50}
OMP: pid 4101947 tid 4102056 thread 43 bound to OS proc set {43}
OMP: pid 4101947 tid 4102064 thread 51 bound to OS proc set {51}
OMP: pid 4101947 tid 4102060 thread 47 bound to OS proc set {47}
OMP: pid 4101947 tid 4102047 thread 34 bound to OS proc set {34}
OMP: pid 4101947 tid 4102044 thread 31 bound to OS proc set {31}
OMP: pid 4101947 tid 4102067 thread 54 bound to OS proc set {54}
OMP: pid 4101947 tid 4102071 thread 58 bound to OS proc set {58}
OMP: pid 4101947 tid 4102042 thread 29 bound to OS proc set {29}
OMP: pid 4101947 tid 4102058 thread 45 bound to OS proc set {45}
OMP: pid 4101947 tid 4102069 thread 56 bound to OS proc set {56}
OMP: pid 4101947 tid 4102046 thread 33 bound to OS proc set {33}
OMP: pid 4101947 tid 4102070 thread 57 bound to OS proc set {57}
OMP: pid 4101947 tid 4102040 thread 27 bound to OS proc set {27}
OMP: pid 4101947 tid 4102065 thread 52 bound to OS proc set {52}
OMP: pid 4101947 tid 4102073 thread 60 bound to OS proc set {60}
OMP: pid 4101947 tid 4102052 thread 39 bound to OS proc set {39}
OMP: pid 4101947 tid 4102057 thread 44 bound to OS proc set {44}
OMP: pid 4101947 tid 4102055 thread 42 bound to OS proc set {42}
OMP: pid 4101947 tid 4102041 thread 28 bound to OS proc set {28}
OMP: pid 4101947 tid 4102051 thread 38 bound to OS proc set {38}
OMP: pid 4101947 tid 4102049 thread 36 bound to OS proc set {36}
OMP: pid 4101947 tid 4102053 thread 40 bound to OS proc set {40}
OMP: pid 4101947 tid 4102066 thread 53 bound to OS proc set {53}
OMP: pid 4101947 tid 4102074 thread 61 bound to OS proc set {61}
OMP: pid 4101947 tid 4102075 thread 62 bound to OS proc set {62}
OMP: pid 4101947 tid 4102054 thread 41 bound to OS proc set {41}
OMP: pid 4101947 tid 4102050 thread 37 bound to OS proc set {37}
OMP: pid 4101947 tid 4102076 thread 63 bound to OS proc set {63}
{"n_kv_max": 16384, "n_batch": 2048, "n_ubatch": 512, "flash_attn": -1, "is_pp_shared": 0, "n_gpu_layers": -1, "n_threads": 64, "n_threads_batch": 64, "pp": 128, "tg": 0, "pl": 4, "n_kv": 512, "t_pp": 11.234514, "speed_pp": 45.573845, "t_tg": 0.000000, "speed_tg": nan, "t": 11.234514, "speed": 45.573845}





Your experiment path is /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10

To display your profiling results:
########################################################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                                COMMAND                                                                                                #
########################################################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/run/oneview_runs/multicore/armclang/maqao_2025-11-26_15-26-08/tools/lprof_npsu_run_10  #
########################################################################################################################################################################################################################################
