I have been struggling to get CUDA working in Mathematica 11.3 on Manjaro Linux (kernel 4.19.16-1-MANJARO). Trying some commands, I get the following:

in: CUDAQ[]
out: False
in: CUDAMap[Cos,1.0*Range[10]]
out: CUDAMap: CUDA was not able to find a valid CUDA driver.
in: CUDADriverVersion[]
out: CUDADriverVersion: CUDALink was not able to locate the NVIDIA driver binary.

However, Mathematica is able to find my GPU (as long as I run it from the terminal using "optirun mathematica"), since I can run neural network training with TargetDevice->"GPU". Also, when running SystemInformation[], Mathematica finds my dedicated graphics card (NVIDIA GTX 950M).

I have installed the CUDAResources paclet:

in: CUDAResourcesInformation[]
out: {{"Name" -> "CUDAResources", "Version" -> "11.3.154", 
   "BuildNumber" -> "", "Qualifier" -> "Lin64", 
   "WolframVersion" -> "11.3", "SystemID" -> {"Linux-x86-64"}, 
   "Description" -> "{ToolkitVersion -> v9.1, MinimumDriver -> 290}", 
   "Category" -> "", "Creator" -> "", "Publisher" -> "", 
   "Support" -> "", "Internal" -> False, 
   "Location" -> 
   "/home/bjorn/.Mathematica/Paclets/Repository/CUDAResources-Lin64-11.\
   3.154", "Context" -> {}, "Enabled" -> True, "Loading" -> Manual, 
   "Hash" -> "2bcd82c65870e597344b0444ebbc5c27"}}

I have the latest drivers:

$ pacman -Qs |grep nvidia
 local/lib32-nvidia-utils 1:415.27-1
 local/linux414-nvidia 1:415.27-2 (linux414-extramodules)
 local/linux419-nvidia 1:415.27-2 (linux419-extramodules)
 local/mhwd-nvidia 1:415.27-1
   MHWD module-ids for nvidia 415.27
 local/nvidia-utils 1:415.27-1
 local/opencl-nvidia 1:415.27-1

I have installed CUDA:

$ pacman -Qs |grep cuda
   local/cuda 10.0.130-2

I can run CUDA outside of Mathematica using the samples that come with the CUDA installation, e.g.:

$ optirun bin/x86_64/linux/release/deviceQuery

  CUDA Device Query (Runtime API) version (CUDART static linking)

  Detected 1 CUDA Capable device(s)

  Device 0: "GeForce GTX 950M"
    CUDA Driver Version / Runtime Version          10.0 / 10.0
    CUDA Capability Major/Minor version number:    5.0
    Total amount of global memory:                 4046 MBytes (4242604032 bytes)
    ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
    GPU Max Clock rate:                            1124 MHz (1.12 GHz)
    Memory Clock rate:                             1001 Mhz
    Memory Bus Width:                              128-bit
    L2 Cache Size:                                 2097152 bytes
    Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
    Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
    Total amount of constant memory:               65536 bytes
    Total amount of shared memory per block:       49152 bytes
    Total number of registers available per block: 65536
    Warp size:                                     32
    Maximum number of threads per multiprocessor:  2048
    Maximum number of threads per block:           1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch:                          2147483647 bytes
    Texture alignment:                             512 bytes
    Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
    Run time limit on kernels:                     Yes
    Integrated GPU sharing Host Memory:            No
    Support host page-locked memory mapping:       Yes
    Alignment requirement for Surfaces:            Yes
    Device has ECC support:                        Disabled
    Device supports Unified Addressing (UVA):      Yes
    Device supports Compute Preemption:            No
    Supports Cooperative Kernel Launch:            No
    Supports MultiDevice Co-op Kernel Launch:      No
    Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
    Compute Mode:
       < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

 deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
  Result = PASS

I do not think it is a problem with the paths, as was the issue in "Are you able to use CUDA on Linux?":

$ ls /usr/lib64/ |grep libnvidia
  ...
  libnvidia-tls.so
  libnvidia-tls.so.415.27

$ ls /usr/lib64/ |grep libcuda
  libcuda.so
  libcuda.so.1
  libcuda.so.415.27

1 Answer

After struggling for two days, I finally figured it out minutes after asking for help... so typical :) Nevertheless, I am posting my solution here in case anyone else gets stuck.

The Mathematica documentation ( https://reference.wolfram.com/language/CUDALink/tutorial/Setup.html#271291502 ) claims that the default values for the path variables are:

$NVIDIA_DRIVER_LIBRARY_PATH   /usr/lib64/libnvidia-tls.so
$CUDA_LIBRARY_PATH            /usr/lib64/libcuda.so

used when the path variables are not already defined. This appears to be incorrect: the variables were undefined for me, yet CUDALink did not fall back to these paths. Explicitly defining them solved the problem.

Add the following lines to your ~/.bashrc (assuming your library files are installed at the default locations):

export NVIDIA_DRIVER_LIBRARY_PATH=/usr/lib64/libnvidia-tls.so
export CUDA_LIBRARY_PATH=/usr/lib64/libcuda.so  

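Before exporting the variables, it is worth confirming that the two files actually exist on your system. A minimal sketch (the helper name `check_lib` is hypothetical; the paths are the defaults quoted above, which may differ on your distribution):

```shell
# check_lib: report whether a given library path exists on this system
check_lib() {
    if [ -e "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Check the two default paths used by the exports above
check_lib /usr/lib64/libnvidia-tls.so
check_lib /usr/lib64/libcuda.so
```

If either line prints "missing", locate the libraries first (e.g. with `find /usr/lib* -name 'libcuda.so*'`) and point the variables there instead.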
Reload the .bashrc file by running:

$ source ~/.bashrc

Start Mathematica with optirun (if you have both integrated and dedicated graphics cards):

$ optirun mathematica
in: Needs["CUDALink`"]
in: CUDAQ[]
out: True
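If you prefer not to edit ~/.bashrc, a session-only variant is to export the variables in the current shell before launching; this sketch assumes the same default paths and guards the launch so it does nothing on machines without optirun:

```shell
# Set the variables only for this shell session instead of permanently
export NVIDIA_DRIVER_LIBRARY_PATH=/usr/lib64/libnvidia-tls.so
export CUDA_LIBRARY_PATH=/usr/lib64/libcuda.so

# Launch through optirun only if it is installed (Bumblebee/hybrid setups)
if command -v optirun >/dev/null 2>&1; then
    optirun mathematica
fi
```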