천천히, 제대로: Building Intel MKL Examples with Microsoft Visual C++

If you need fast math computation in any of the following domains on a system with an Intel processor, you can use the Intel MKL library.

Linear Algebra
Fast Fourier Transforms (FFT)
Vector Statistics & Data Fitting
Vector Math & Miscellaneous Solvers

This post walks through compiling and linking the example files that ship with Intel MKL using Microsoft Visual C++ to produce executables.

Build environment

The environment below was used to build the Intel MKL examples while writing this post.

System

Operating system: Windows 10 (64-bit)
Processor: Intel Core i7

Installed products

IDE: Microsoft Visual Studio Community 2019 (version 16)
Library: Intel Math Kernel Library 2019 Update 5

Environment variables

Open a Command Prompt window.

Run the script below to set the environment variables INCLUDE, LIB, and PATH.

```
@echo off

set CPRO_PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows
set MKLROOT=%CPRO_PATH%\mkl
set REDIST=%CPRO_PATH%\redist

set INCLUDE=%MKLROOT%\include;%INCLUDE%

set LIB=%MKLROOT%\lib\intel64;%LIB%
set PATH=%REDIST%\intel64\mkl;%PATH%

REM for OpenMP intel thread
set LIB=%CPRO_PATH%\compiler\lib\intel64;%LIB%
set PATH=%REDIST%\intel64\compiler;%PATH%

REM for TBB thread
set LIB=%CPRO_PATH%\tbb\lib\intel64\vc_mt;%LIB%
set PATH=%REDIST%\intel64\tbb\vc_mt;%PATH%

call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
```

Run the script below to set up the Visual C++ command-line environment. To build for 64-bit, pass amd64 as the argument to vcvarsall.bat. Without an argument, the build will target 32-bit.
```
@echo off

call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
```
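
One way to confirm which target the environment actually produces is to compile a tiny C file with cl.exe and look at the pointer width. The sketch below is only an illustration; the function name `pointer_bits` is hypothetical and not part of Intel MKL or Visual C++.

```c
#include <limits.h>
#include <stddef.h>

/* Width of a data pointer, in bits, for the target this file was compiled for.
   A 64-bit (amd64) build yields 64; a 32-bit (x86) build yields 32. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Printing the result of `pointer_bits()` from a small `main` should show 64 when the environment was set up with the amd64 argument.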

Example 1. ex_nlsqp_c

Extract the example archive at the location below into the folder you want to work in, <working folder>.
```
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\examples\examples_core_c.zip
```
In the Command Prompt window, change to the folder below.
```
<working folder>\solverc
```

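Run the script below to build the ex_nlsqp_c example in four configurations: dynamic (DLL) and static linking, each with parallel and sequential threading.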

```
@echo off

nmake dllintel64 threading=parallel compiler=msvs function="ex_nlsqp_c"
nmake libintel64 threading=parallel compiler=msvs function="ex_nlsqp_c"
nmake dllintel64 threading=sequential compiler=msvs function="ex_nlsqp_c"
nmake libintel64 threading=sequential compiler=msvs function="ex_nlsqp_c"
```

Check that the build produced ex_nlsqp_c.exe in each of the folders below.

```
_results\msvs_lp64_parallel_intel64
_results\msvs_lp64_parallel_intel64_dll
_results\msvs_lp64_sequential_intel64
_results\msvs_lp64_sequential_intel64_dll
```

Running ex_nlsqp_c.exe prints the following message.

```
>_results\msvs_lp64_sequential_intel64_dll\ex_nlsqp_c.exe
|         dtrnlsp Powell............PASS
```

Example 2. matrix_multiplication

Visit the site below, download the sample archive, and extract it into the folder you want to work in, <working folder>.
- Intel® Software Development Products Samples and Tutorials
  - Intel® Parallel Studio XE for Windows* - Sample Bundle (Version: Intel® Parallel Studio XE 2019)

In the Command Prompt window, change to the folder below.
```
<working folder>\mkl\mkl_c_samples\matrix_multiplication\src
```

Computation with a for loop

Run the script below to build the matrix_multiplication example.

```
@echo off

set THREADING=sequential
set DLL_SUFF=
set EXAMPLE=matrix_multiplication
set OMP_LIB=
cl.exe /c %EXAMPLE%.c
link.exe mkl_intel_lp64%DLL_SUFF%.lib mkl_core%DLL_SUFF%.lib mkl_%THREADING%%DLL_SUFF%.lib %OMP_LIB% %EXAMPLE%.obj
```

Check that matrix_multiplication.exe was created in the current folder.

Running matrix_multiplication.exe prints the following messages.

```
>matrix_multiplication.exe

 This example measures performance of rcomputing the real matrix product
 C=alpha*A*B+beta*C using a triple nested loop, where A, B, and C are
 matrices and alpha and beta are double precision scalars

 Initializing data for matrix multiplication C=A*B for matrix
 A(2000x200) and matrix B(200x1000)

 Allocating memory for matrices aligned on 64-byte boundary for better
 performance

 Intializing matrix data

 Making the first run of matrix product using triple nested loop
 to get stable run time measurements

 Measuring performance of matrix product using triple nested loop

 == Matrix multiplication using triple nested loop completed ==
 == at 1219.72656 milliseconds ==

 Deallocating memory

 Example completed.
```

Computation time:

1219.72656 milliseconds
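
The loop-based computation timed above can be pictured with a short C sketch. It is not the source of the Intel sample: the function names `matmul_triple_loop` and `average_ms` are hypothetical, the matrices are assumed to be stored row-major, and the standard `clock()` stands in for whatever timer the sample uses. It only mirrors what the output describes, namely one untimed first run followed by an averaged measurement of C=alpha*A*B+beta*C.

```c
#include <stddef.h>
#include <time.h>

/* C = alpha*A*B + beta*C for row-major A (m x k), B (k x n), C (m x n). */
void matmul_triple_loop(int m, int n, int k, double alpha,
                        const double *A, const double *B,
                        double beta, double *C)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                sum += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            }
            C[(size_t)i * n + j] = alpha * sum + beta * C[(size_t)i * n + j];
        }
    }
}

/* Runs the product once without timing it (warm-up), then returns the mean
   time in milliseconds over loop_count timed runs with alpha=1, beta=0.
   C must be initialized by the caller.
   Note: clock() reports wall-clock time with MSVC but processor time on
   POSIX systems, so this is adequate only for a single-threaded loop. */
double average_ms(int loop_count, int m, int n, int k,
                  const double *A, const double *B, double *C)
{
    if (loop_count <= 0) {
        return 0.0;
    }
    matmul_triple_loop(m, n, k, 1.0, A, B, 0.0, C);   /* warm-up run */

    clock_t start = clock();
    for (int r = 0; r < loop_count; ++r) {
        matmul_triple_loop(m, n, k, 1.0, A, B, 0.0, C);
    }
    clock_t stop = clock();

    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC / loop_count;
}
```

Calling `average_ms(10, 2000, 1000, 200, A, B, C)` with a zero-initialized C would time a product of the same shape as in the output above (A is 2000x200, B is 200x1000). The absolute numbers depend on the machine and on the compiler options used.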

Computation with the CBLAS API and multithreading

Run the script below to build the dgemm_threading_effect_example example.

```
@echo off

set THREADING=intel_thread
set DLL_SUFF=
set EXAMPLE=dgemm_threading_effect_example
set OMP_LIB=libiomp5md.lib
cl.exe /c %EXAMPLE%.c
link.exe mkl_intel_lp64%DLL_SUFF%.lib mkl_core%DLL_SUFF%.lib mkl_%THREADING%%DLL_SUFF%.lib %OMP_LIB% %EXAMPLE%.obj
```

Check that dgemm_threading_effect_example.exe was created in the current folder.

Running dgemm_threading_effect_example.exe prints the following messages.

```
>dgemm_threading_effect_example.exe

 This example demonstrates threading impact on computing real matrix product
 C=alpha*A*B+beta*C using Intel(R) MKL function dgemm, where A, B, and C are
 matrices and alpha and beta are double precision scalars

 Initializing data for matrix multiplication C=A*B for matrix
 A(2000x200) and matrix B(200x1000)

 Allocating memory for matrices aligned on 64-byte boundary for better
 performance

 Intializing matrix data

 Finding max number of threads Intel(R) MKL can use for parallel runs

 Running Intel(R) MKL from 1 to 4 threads

 Requesting Intel(R) MKL to use 1 thread(s)

 Making the first run of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface to get stable run time measurements

 Measuring performance of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface on 1 thread(s)

 == Matrix multiplication using Intel(R) MKL dgemm completed ==
 == at 17.27277 milliseconds using 1 thread(s) ==

 Requesting Intel(R) MKL to use 2 thread(s)

 Making the first run of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface to get stable run time measurements

 Measuring performance of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface on 2 thread(s)

 == Matrix multiplication using Intel(R) MKL dgemm completed ==
 == at 9.67240 milliseconds using 2 thread(s) ==

 Requesting Intel(R) MKL to use 3 thread(s)

 Making the first run of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface to get stable run time measurements

 Measuring performance of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface on 3 thread(s)

 == Matrix multiplication using Intel(R) MKL dgemm completed ==
 == at 8.70466 milliseconds using 3 thread(s) ==

 Requesting Intel(R) MKL to use 4 thread(s)

 Making the first run of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface to get stable run time measurements

 Measuring performance of matrix product using Intel(R) MKL dgemm function
 via CBLAS interface on 4 thread(s)

 == Matrix multiplication using Intel(R) MKL dgemm completed ==
 == at 7.69962 milliseconds using 4 thread(s) ==

 Deallocating memory

 It is highly recommended to define LOOP_COUNT for this example on your
 computer as 130 to have total execution time about 1 second for reliability
 of measurements

 Example completed.
```

Computation time:

17.3 milliseconds using 1 thread(s)
9.7 milliseconds using 2 thread(s)
8.7 milliseconds using 3 thread(s)
7.7 milliseconds using 4 thread(s)
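
To read these four timings as thread scaling, it helps to express each one relative to the 1-thread run. The sketch below does that arithmetic in C on the numbers printed above; the names `measured_ms`, `speedup`, and `efficiency` are hypothetical helpers introduced only for this illustration.

```c
/* Times reported by dgemm_threading_effect_example for 1, 2, 3, and 4 threads. */
static const double measured_ms[4] = {17.27277, 9.67240, 8.70466, 7.69962};

/* How many times faster the n-thread run is than the 1-thread run. */
double speedup(double t1_ms, double tn_ms)
{
    return t1_ms / tn_ms;
}

/* Parallel efficiency: speedup divided by the number of threads (1.0 is ideal). */
double efficiency(double t1_ms, double tn_ms, int threads)
{
    return speedup(t1_ms, tn_ms) / threads;
}
```

With these measurements, `speedup(measured_ms[0], measured_ms[n - 1])` comes out at roughly 1.79, 1.98, and 2.24 for n = 2, 3, and 4, so the efficiency falls from about 0.89 at 2 threads to about 0.56 at 4 threads for this matrix size.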

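Summary

Comparison of computation times

Computing the matrix product with the CBLAS API provided by Intel MKL and multithreading cut the computation time to about 1/158 of the for-loop version. Most of that gain comes from dgemm itself: the 1-thread run (17.3 milliseconds) is already about 70 times faster than the loop, and going from 1 to 4 threads adds a further factor of about 2.2.

For loop: 1219.7 milliseconds
CBLAS API + multithreading: 7.7 milliseconds using 4 thread(s)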

Scripts and source files

The environment setup script, the build scripts, and the matrix multiplication source files introduced in this post are available on GitHub.


Tuesday, July 15, 2025
