
Loops Index

Loop id | Source Location | Source Function | Level | Max Thread Time / Walltime 1x6 (%) | Max Thread Time / Walltime 1x72 (%) | Max Thread Time / Walltime 1x96 (%) | Max Thread Time / Walltime 1x120 (%) | Max Thread Time / Walltime 1x128 (%) | Max Thread Time / Walltime 1x144 (%) | Max Thread Time / Walltime 1x168 (%) | Max Thread Time / Walltime 1x192 (%) | Exclusive Coverage 1x6 (%) | Exclusive Coverage 1x72 (%) | Exclusive Coverage 1x96 (%) | Exclusive Coverage 1x120 (%) | Exclusive Coverage 1x128 (%) | Exclusive Coverage 1x144 (%) | Exclusive Coverage 1x168 (%) | Exclusive Coverage 1x192 (%) | Inclusive Coverage 1x6 (%) | Inclusive Coverage 1x72 (%) | Inclusive Coverage 1x96 (%) | Inclusive Coverage 1x120 (%) | Inclusive Coverage 1x128 (%) | Inclusive Coverage 1x144 (%) | Inclusive Coverage 1x168 (%) | Inclusive Coverage 1x192 (%) | Max Exclusive Time Over Threads 1x6 (s) | Max Exclusive Time Over Threads 1x72 (s) | Max Exclusive Time Over Threads 1x96 (s) | Max Exclusive Time Over Threads 1x120 (s) | Max Exclusive Time Over Threads 1x128 (s) | Max Exclusive Time Over Threads 1x144 (s) | Max Exclusive Time Over Threads 1x168 (s) | Max Exclusive Time Over Threads 1x192 (s) | Max Inclusive Time Over Threads 1x6 (s) | Max Inclusive Time Over Threads 1x72 (s) | Max Inclusive Time Over Threads 1x96 (s) | Max Inclusive Time Over Threads 1x120 (s) | Max Inclusive Time Over Threads 1x128 (s) | Max Inclusive Time Over Threads 1x144 (s) | Max Inclusive Time Over Threads 1x168 (s) | Max Inclusive Time Over Threads 1x192 (s) | Exclusive Time w.r.t. Wall Time 1x6 (s) | Exclusive Time w.r.t. Wall Time 1x72 (s) | Exclusive Time w.r.t. Wall Time 1x96 (s) | Exclusive Time w.r.t. Wall Time 1x120 (s) | Exclusive Time w.r.t. Wall Time 1x128 (s) | Exclusive Time w.r.t. Wall Time 1x144 (s) | Exclusive Time w.r.t. Wall Time 1x168 (s) | Exclusive Time w.r.t. Wall Time 1x192 (s) | Inclusive Time w.r.t. Wall Time 1x6 (s) | Inclusive Time w.r.t. Wall Time 1x72 (s) | Inclusive Time w.r.t. Wall Time 1x96 (s) | Inclusive Time w.r.t. Wall Time 1x120 (s) | Inclusive Time w.r.t. Wall Time 1x128 (s) | Inclusive Time w.r.t. Wall Time 1x144 (s) | Inclusive Time w.r.t. Wall Time 1x168 (s) | Inclusive Time w.r.t. Wall Time 1x192 (s) | Nb Threads 1x6 | Nb Threads 1x72 | Nb Threads 1x96 | Nb Threads 1x120 | Nb Threads 1x128 | Nb Threads 1x144 | Nb Threads 1x168 | Nb Threads 1x192 | GFLOPS 1x6 | GFLOPS 1x72 | GFLOPS 1x96 | GFLOPS 1x120 | GFLOPS 1x128 | GFLOPS 1x144 | GFLOPS 1x168 | GFLOPS 1x192 | Vectorization Ratio (%) | Vector Length Use (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing 1x6 | Speedup If Perfect Load Balancing 1x72 | Speedup If Perfect Load Balancing 1x96 | Speedup If Perfect Load Balancing 1x120 | Speedup If Perfect Load Balancing 1x128 | Speedup If Perfect Load Balancing 1x144 | Speedup If Perfect Load Balancing 1x168 | Speedup If Perfect Load Balancing 1x192 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Array Access Efficiency | (1x6) Efficiency | (1x6) Potential Speed-Up (%) | (1x72) Efficiency | (1x72) Potential Speed-Up (%) | (1x96) Efficiency | (1x96) Potential Speed-Up (%) | (1x120) Efficiency | (1x120) Potential Speed-Up (%) | (1x128) Efficiency | (1x128) Potential Speed-Up (%) | (1x144) Efficiency | (1x144) Potential Speed-Up (%) | (1x168) Efficiency | (1x168) Potential Speed-Up (%) | (1x192) Efficiency | (1x192) Potential Speed-Up (%)
3017libggml-cpu.so - sgemm.cpp:144-464 [...]void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 4>(long, long, long)Innermost2.400.000.000.000.000.000.000.003.140.000.000.000.000.000.000.003.140.000.000.000.000.000.000.001.300.000.000.000.000.000.000.001.300.000.000.000.000.000.000.001.140.000.000.000.000.000.000.001.140.000.000.000.000.000.000.0060000000191.860.000.000.000.000.000.000.00NANANANANA1.210000000NANANANANA0.0010
3018libggml-cpu.so - sgemm.cpp:144-464 [...]void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 4>(long, long, long)InBetween0.960.000.000.000.000.000.000.001.150.000.000.000.000.000.000.004.290.000.000.000.000.000.000.000.520.000.000.000.000.000.000.001.540.000.000.000.000.000.000.000.420.000.000.000.000.000.000.001.550.000.000.000.000.000.000.0060000000172.350.000.000.000.000.000.000.0076.0829.631.061.492.561.3100000000003050.0010
3325libggml-cpu.so - quants.c:298-355 [...]quantize_row_q8_0Single3.263.513.643.693.513.683.243.270.850.080.060.050.040.040.030.030.850.080.060.050.040.040.030.031.761.741.741.751.711.791.621.261.761.741.741.751.711.791.621.260.310.020.020.010.010.010.010.010.310.020.020.010.010.010.010.011111111117.39220.32291.82363.08397.86428.10551.69813.2760.729.6611.312.681111111102000100.00101.05-01.05-01.04-01.07-01.02-01.13-01.450
2078libggml-cpu.so - vec.h:89-89ggml_compute_forward_soft_maxInnermost0.620.200.220.180.170.200.250.180.760.110.160.140.130.140.170.110.760.110.160.140.130.140.170.110.340.100.110.090.080.100.130.070.340.100.110.090.080.100.130.070.270.030.050.040.040.040.050.020.270.030.050.040.040.040.050.02672961201281441681902.3418.5310.2913.1213.7510.958.6919.65100501121.292.932.222.132.222.372.372.9202000100.00100.660.040.360.10.340.090.330.080.270.10.180.140.360.07
2064libggml-cpu.so - vec.h:1444-1445ggml_compute_forward_soft_maxInnermost0.540.100.080.070.070.060.060.100.640.070.050.040.040.030.030.030.640.070.050.040.040.030.030.030.290.050.040.040.040.030.030.040.290.050.040.040.040.030.030.040.230.020.020.010.010.010.010.010.230.020.020.010.010.010.010.016729612012814416719210.1195.46148.08162.48189.18191.24199.96312.0306.2511161.322.352.523.183.252.843.175.7701000100.00100.890.010.90.011.04-00.9900.900.8701.03-0
386libggml-cpu.so - mmq.cpp:303-1392 [...]void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int,...Innermost0.480.130.130.110.100.120.100.100.540.110.090.080.080.070.050.080.540.110.090.080.080.070.050.080.260.060.060.050.050.060.050.040.260.060.060.050.050.060.050.040.200.030.030.020.030.020.020.020.200.030.030.020.030.020.020.02669901121281281471750.000.000.000.000.000.000.000.0090.9138.761.4711.411.41.92.032.081.952.392.562.323009085.94100.490.050.440.050.430.040.350.050.360.050.40.030.380.05
6libggml-cpu.so - ggml-cpu.c:3204-3207ggml_cpu_fp32_to_fp16Single0.500.000.000.000.000.000.000.000.510.000.000.000.000.000.000.000.510.000.000.000.000.000.000.000.270.000.000.000.000.000.000.000.270.000.000.000.000.000.000.000.180.000.000.000.000.000.000.000.180.000.000.000.000.000.000.00600000000.100.000.000.000.000.000.000.001001001111.54000000002000100.0010
913libggml-cpu.so - binary-ops.cpp:18-32 [...]ggml_compute_forward_mulInnermost0.340.110.070.090.070.070.070.090.410.060.040.040.040.030.030.040.410.060.040.040.040.030.030.040.190.050.040.050.040.040.040.040.190.050.040.050.040.040.040.040.150.020.010.010.010.010.010.010.150.020.010.010.010.010.010.01664861031281281281713.5026.7339.5144.5943.5847.7271.6960.7306.2511.06161.322.72.433.313.132.942.943.6803000100.00100.670.020.70.010.620.010.610.010.570.010.570.010.540.02
1303libggml-cpu.so - vec.h:1084-1101 [...]ggml_vec_soft_max_f32Single0.440.080.070.070.050.050.060.050.400.040.030.030.020.020.020.020.400.040.030.030.020.020.020.020.240.040.040.040.030.030.030.020.240.040.040.040.030.030.030.020.140.010.010.010.010.010.010.000.140.010.010.010.010.010.010.0067296120128144168192148.141900.532186.132619.123127.133109.823231.275215.9088.5875.121.051.051.231.763.623.544.573.583.674.545.090.5002058.33101.07-00.900.9300.9500.8700.7701.13-0
1299libggml-cpu.so - vec.h:1084-1115 [...]ggml_vec_swiglu_f32Single0.310.080.090.120.140.080.100.090.370.040.050.040.070.040.040.060.370.040.050.040.070.040.040.060.170.040.050.050.070.040.050.040.170.040.050.050.070.040.050.040.130.010.010.010.020.010.010.010.130.010.010.010.020.010.010.0166486103128128128171143.261397.201362.901575.09916.891461.191638.381410.769898.131111.332.672.974.063.352.813.32.280.5003056.25100.830.010.610.020.570.020.30.050.440.020.410.020.30.05
815libggml-cpu.so - binary-ops.cpp:10-32 [...]ggml_compute_forward_add_non_quantizedInnermost0.310.090.080.070.100.120.090.120.320.050.050.040.070.060.040.060.320.050.050.040.070.060.040.060.170.050.040.040.050.060.050.050.170.050.040.040.050.060.050.050.120.020.010.010.020.020.010.010.120.020.010.010.020.020.010.01664861031281281261704.4933.1236.6341.6623.1927.2945.1542.1806.2511.06161.52.672.492.422.322.862.573.4403000100.00100.630.020.50.020.460.020.250.050.260.050.310.030.310.04
2075libggml-cpu.so - vec.h:677-682ggml_compute_forward_soft_maxInnermost0.290.070.090.110.080.070.100.080.320.050.060.050.040.040.050.030.320.050.060.050.040.040.050.030.160.040.040.050.040.040.050.030.160.040.040.050.040.040.050.030.120.020.020.010.010.010.010.010.120.020.020.010.010.010.010.01670941171171291521523.9642.9028.2040.1836.6242.3036.92104.151001001111.422.292.593.452.932.753.063.9101000100.00100.630.020.420.030.40.030.430.020.420.020.270.030.590.01
1820libggml-cpu.so - ops.cpp:4325-4326ggml_compute_forward_rms_normInnermost0.250.100.100.110.100.080.080.100.290.080.070.070.070.050.050.060.290.080.070.070.070.050.050.060.130.050.050.050.050.040.040.040.130.050.050.050.050.040.040.040.100.020.020.020.020.020.020.010.100.020.020.020.020.020.020.016729612012814416819210.1241.8748.2148.1649.7761.4859.6764.8110031.2511.382.931.362.142.412.442.342.42.62.9401000100.00100.370.050.310.050.250.050.230.050.260.040.240.040.240.05
2128libggml-cpu.so - ops.cpp:6446-6457ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool)Innermost0.220.050.050.080.060.060.090.060.200.020.030.020.030.030.020.020.200.020.030.020.030.030.020.020.120.020.030.040.030.030.050.030.120.020.030.040.030.030.050.030.070.010.010.010.010.010.010.000.070.010.010.010.010.010.010.006729612012814316319113.15142.12107.00130.34114.91117.83126.10227.5006.2511.5161.783.833.065.713.673.885.626.20202075.00100.8900.540.010.50.010.40.020.380.020.320.020.550.01
2069libggml-cpu.so - ops.cpp:5915-5916ggml_compute_forward_soft_maxInnermost0.280.060.080.040.040.030.050.030.190.020.020.010.010.010.010.010.190.020.020.010.010.010.010.010.150.030.040.020.020.020.030.010.150.030.040.020.020.020.030.010.070.010.010.000.000.000.000.000.070.010.010.000.000.000.000.006729612012714416816418.22275.02217.14443.68387.55509.20440.91928.641005011.1222.255.766.984.954.884.86.945.7502000100.00101.11-00.7500.8600.800.9200.6901.46-0
1libggml-cpu.so - ggml-cpu.c:3215-3218ggml_cpu_fp32_to_fp16Single0.160.010.010.010.020.020.010.010.150.000.000.000.000.000.000.000.150.000.000.000.000.000.000.000.080.010.010.010.010.010.010.010.080.010.010.010.010.010.010.010.050.000.000.000.000.000.000.000.050.000.000.000.000.000.000.0053513213110150.013.540.000.001.991.960.000.00100251.17141.421.5112.212.141102000100.001031.1012.4504.7903.2802.15-06.24-04.160
3015libggml-cpu.so - sgemm.cpp:144-407 [...]void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 4>(long, long, long)Innermost0.140.000.000.000.000.000.000.000.140.000.000.000.000.000.000.000.140.000.000.000.000.000.000.000.080.000.000.000.000.000.000.000.080.000.000.000.000.000.000.000.050.000.000.000.000.000.000.000.050.000.000.000.000.000.000.0060000000175.850.000.000.000.000.000.000.00NANANANANA1.550000000NANANANANA0.0010
712libggml-cpu.so - mmq.cpp:2068-2078ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::$_2::operator()(int, int) const::{lambda()#1}::operator()() constSingle0.120.070.080.070.100.080.120.060.140.050.050.040.050.040.040.040.140.050.050.040.050.040.040.040.060.040.040.040.050.040.060.030.060.040.040.040.050.040.060.030.050.010.020.010.020.010.010.010.050.010.020.010.020.010.010.01670911121241371501589.7626.4119.0328.5724.7024.2036.9039.04NANANANANA1.392.382.552.633.032.914.812.61NANANANANA0.00100.280.030.20.040.20.030.140.040.150.040.160.030.190.03
4libggml-cpu.so - ggml-impl.h:346-404 [...]ggml_cpu_fp32_to_fp16Single0.091.351.341.230.210.250.220.230.101.060.590.230.130.120.110.130.101.060.590.230.130.120.110.130.050.670.640.580.100.120.110.090.050.670.640.580.100.120.110.090.040.340.180.070.040.040.030.030.040.340.180.070.040.040.030.03672961201281371421552.600.841.141.422.092.671.141.869.098.521.463.7513.881.462.013.668.762.483.022.692.7302000100.00100.011.050.010.580.030.220.040.130.040.120.040.110.040.12
97libggml-cpu.so - ggml-cpu.c:1291-1297ggml_compute_forward_mul_matInnermost0.011.371.811.701.591.611.691.590.010.311.011.341.401.191.021.160.010.311.011.341.401.191.021.160.010.680.870.810.780.780.850.610.010.680.870.810.780.780.850.610.000.100.310.400.430.370.330.250.000.100.310.400.430.370.330.25672961201281421641873.232.060.930.990.941.161.081.74011.38112.4626.992.882.041.812.142.562.46NANANANANA0.001000.3101.0101.3401.401.1901.0201.16
3066libggml-cpu.so - sgemm.cpp:144-464 [...]void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 2>(long, long, long)InBetween0.000.140.140.130.090.090.090.130.000.110.080.070.060.060.050.060.001.130.940.700.570.530.510.390.000.070.060.060.050.040.050.050.000.480.420.330.280.260.280.140.000.030.030.020.020.020.020.010.000.360.290.210.180.160.160.08072961201281441681920.002223.483052.763492.723849.604319.664560.015584.7576.3829.71.061.492.5302.12.642.862.382.652.813.860002050.0010101010101010
3065libggml-cpu.so - sgemm.cpp:144-464 [...]void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 2>(long, long, long)Innermost0.000.920.790.640.500.460.480.310.001.020.860.630.510.470.460.330.001.020.860.630.510.470.460.330.000.460.380.300.250.230.240.120.000.460.380.300.250.230.240.120.000.320.260.190.160.140.150.070.000.320.260.190.160.140.150.07072961201281441681920.00661.20817.941142.071363.301482.561453.183101.69NANANANANA01.431.481.641.571.571.651.74NANANANANA0.0010101010101010