3.4 检测BLAS和LAPACK数学库

NOTE:此示例代码可以在 https://github.com/dev-cafe/cmake-cookbook/tree/v1.0/chapter-03/recipe-04 中找到，有一个C++示例。该示例在CMake 3.5版(或更高版本)中是有效的，并且已经在GNU/Linux、macOS和Windows上进行过测试。

许多数据算法严重依赖于矩阵和向量运算。例如：矩阵-向量和矩阵-矩阵乘法，求线性方程组的解，特征值和特征向量的计算或奇异值分解。这些操作在代码库中非常普遍，因为操作的数据量比较大，因此高效的实现有绝对的必要。幸运的是，有专家库可用：基本线性代数子程序(BLAS)和线性代数包(LAPACK)，为许多线性代数操作提供了标准API。供应商有不同的实现，但都共享API。虽然，用于数学库底层实现，实际所用的编程语言会随着时间而变化(Fortran、C、Assembly)，但是也都是Fortran调用接口。考虑到调用街扩，本示例中的任务要链接到这些库，并展示如何用不同语言编写的库。

准备工作

为了展示数学库的检测和连接，我们编译一个C++程序，将矩阵的维数作为命令行输入，生成一个随机的方阵A，一个随机向量b，并计算线性系统方程: Ax = b。另外，将对向量b的进行随机缩放。这里，需要使用的子程序是BLAS中的DSCAL和LAPACK中的DGESV来求线性方程组的解。示例C++代码的清单( linear-algebra.cpp)：

#include "CxxBLAS.hpp"
#include "CxxLAPACK.hpp"

#include <iostream>
#include <random>
#include <vector>

int main(int argc, char** argv) {
  if (argc != 2) {
    std::cout << "Usage: ./linear-algebra dim" << std::endl;
    return EXIT_FAILURE;
  }

  // Generate a uniform distribution of real number between -1.0 and 1.0
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_real_distribution<double> dist(-1.0, 1.0);

  // Allocate matrices and right-hand side vector
  int dim = std::atoi(argv[1]);
  std::vector<double> A(dim * dim);
  std::vector<double> b(dim);
  std::vector<int> ipiv(dim);
  // Fill matrix and RHS with random numbers between -1.0 and 1.0
  for (int r = 0; r < dim; r++) {
    for (int c = 0; c < dim; c++) {
      A[r + c * dim] = dist(mt);
    }
    b[r] = dist(mt);
  }

  // Scale RHS vector by a random number between -1.0 and 1.0
  C_DSCAL(dim, dist(mt), b.data(), 1);
  std::cout << "C_DSCAL done" << std::endl;

  // Save matrix and RHS
  std::vector<double> A1(A);
  std::vector<double> b1(b);
  int info;
  info = C_DGESV(dim, 1, A.data(), dim, ipiv.data(), b.data(), dim);
  std::cout << "C_DGESV done" << std::endl;
  std::cout << "info is " << info << std::endl;

  double eps = 0.0;
  for (int i = 0; i < dim; ++i) {
    double sum = 0.0;
    for (int j = 0; j < dim; ++j)
      sum += A1[i + j * dim] * b[j];
    eps += std::abs(b1[i] - sum);
  }
  std::cout << "check is " << eps << std::endl;

  return 0;
}

使用C++11的随机库来生成-1.0到1.0之间的随机分布。C_DSCAL和C_DGESV分别是到BLAS和LAPACK库的接口。为了避免名称混淆，将在下面来进一步讨论CMake模块：

文件CxxBLAS.hpp用extern "C"封装链接BLAS:

#pragma once
#include "fc_mangle.h"
#include <cstddef>
#ifdef __cplusplus
extern "C" {
#endif
extern void DSCAL(int *n, double *alpha, double *vec, int *inc);
#ifdef __cplusplus
}
#endif
void C_DSCAL(size_t length, double alpha, double *vec, int inc);

对应的实现文件CxxBLAS.cpp:

#include "CxxBLAS.hpp"

#include <climits>

// see http://www.netlib.no/netlib/blas/dscal.f
void C_DSCAL(size_t length, double alpha, double *vec, int inc) {
  int big_blocks = (int)(length / INT_MAX);
  int small_size = (int)(length % INT_MAX);
  for (int block = 0; block <= big_blocks; block++) {
    double *vec_s = &vec[block * inc * (size_t)INT_MAX];
    signed int length_s = (block == big_blocks) ? small_size : INT_MAX;
    ::DSCAL(&length_s, &alpha, vec_s, &inc);
  }
}

CxxLAPACK.hpp和CxxLAPACK.cpp为LAPACK调用执行相应的转换。

具体实施

对应的CMakeLists.txt包含以下构建块:

我们定义了CMake最低版本，项目名称和支持的语言:

cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
project(recipe-04 LANGUAGES CXX C Fortran)

使用C++11标准:

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

此外，我们验证Fortran和C/C++编译器是否能协同工作，并生成头文件，这个文件可以处理名称混乱。两个功能都由FortranCInterface模块提供:
```
include(FortranCInterface)

FortranCInterface_VERIFY(CXX)

FortranCInterface_HEADER(
  fc_mangle.h
  MACRO_NAMESPACE "FC_"
  SYMBOLS DSCAL DGESV
  )
```

然后，找到BLAS和LAPACK:

find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

接下来，添加一个库，其中包含BLAS和LAPACK包装器的源代码，并链接到LAPACK_LIBRARIES，其中也包含BLAS_LIBRARIES:

add_library(math "")

target_sources(math
  PRIVATE
    CxxBLAS.cpp
    CxxLAPACK.cpp
  )

target_include_directories(math
  PUBLIC
    ${CMAKE_CURRENT_SOURCE_DIR}
    ${CMAKE_CURRENT_BINARY_DIR}
  )

target_link_libraries(math
  PUBLIC
      ${LAPACK_LIBRARIES}
  )

注意，目标的包含目录和链接库声明为PUBLIC，因此任何依赖于数学库的附加目标也将在其包含目录中。

最后，我们添加一个可执行目标并链接math：

add_executable(linear-algebra "")

target_sources(linear-algebra
  PRIVATE
      linear-algebra.cpp
  )

target_link_libraries(linear-algebra
  PRIVATE
      math
  )

配置时，我们可以关注相关的打印输出:

$ mkdir -p build
$ cd build
$ cmake ..

...
-- Detecting Fortran/C Interface
-- Detecting Fortran/C Interface - Found GLOBAL and MODULE mangling
-- Verifying Fortran/C Compiler Compatibility
-- Verifying Fortran/C Compiler Compatibility - Success
...
-- Found BLAS: /usr/lib/libblas.so
...
-- A library with LAPACK API found.
...

最后，构建并测试可执行文件:

$ cmake --build .
$ ./linear-algebra 1000

C_DSCAL done
C_DGESV done
info is 0
check is 1.54284e-10

工作原理

FindBLAS.cmake和FindLAPACK.cmake将在标准位置查找BLAS和LAPACK库。对于前者，该模块有SGEMM函数的Fortran实现，一般用于单精度矩阵乘积。对于后者，该模块有CHEEV函数的Fortran实现，用于计算复杂厄米矩阵的特征值和特征向量。查找在CMake内部，通过编译一个小程序来完成，该程序调用这些函数，并尝试链接到候选库。如果失败，则表示相应库不存于系统上。

生成机器码时，每个编译器都会处理符号混淆，不幸的是，这种操作并不通用，而与编译器相关。为了解决这个问题，我们使用FortranCInterface模块( https://cmake.org/cmake/help/v3.5/module/FortranCInterface.html )验证Fortran和C/C++能否混合编译，然后生成一个Fortran-C接口头文件fc_mangle.h，这个文件用来解决编译器性的问题。然后，必须将生成的fc_mann .h包含在接口头文件CxxBLAS.hpp和CxxLAPACK.hpp中。为了使用FortranCInterface，我们需要在LANGUAGES列表中添加C和Fortran支持。当然，也可以定义自己的预处理器定义，但是可移植性会差很多。

我们将在第9章中更详细地讨论Fortran和C的互操作性。

NOTE:目前，BLAS和LAPACK的许多实现已经在Fortran外附带了一层C包装。这些包装器多年来已经标准化，称为CBLAS和LAPACKE。

准备工作

具体实施

工作原理

更多信息