Original poster | Posted on 2020-1-30 22:06:15
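For reference, here are the details. On Einstein, FGRPB1G hangs at about the 1-minute mark: the driver control panel and GPU-Z stop responding, the task errors out immediately, and GPU utilization then stays locked at 99%. Only a reboot recovers the card.
- 16:00:11 (6976): [normal]: This Einstein@home App was built at: May 8 2019 13:29:27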
- 16:00:11 (6976): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe'.
- 16:00:11 (6976): [debug]: 1e+016 fp, 5.1e+009 fp/s, 2050312 s, 569h31m52s11
- 16:00:11 (6976): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
- command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah1062L33.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.526880e-07 --ldiBins 30 --f0start 380.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah1062L33_0388_15577681.dat --debug 1 --device 0 -o LATeah1062L33_388.0_0_0.0_15577681_2_0.out
- output files: 'LATeah1062L33_388.0_0_0.0_15577681_2_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1062L33_388.0_0_0.0_15577681_2_0' 'LATeah1062L33_388.0_0_0.0_15577681_2_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1062L33_388.0_0_0.0_15577681_2_1'
- 16:00:11 (6976): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
- 16:00:11 (6976): [debug]: Set up communication with graphics process.
- boinc_get_opencl_ids returned [0000000003ba5080 , 00007ffa5d54bfd0]
- Using OpenCL platform provided by: Advanced Micro Devices, Inc.
- Using OpenCL device "gfx1012" by: Advanced Micro Devices, Inc.
- Max allocation limit: 4244635648
- Global mem size: 4278190080
- Couldn't create OpenCL command queue (error: -6)!
- OpenCL shutdown complete!
- initialize_ocl returned error [2013]
- OCL context null
- OCL queue null
- Error generating generic FFT context object [5]
- 16:00:24 (6976): [CRITICAL]: ERROR: MAIN() returned with error '5'
- FPU status flags:
- 16:00:35 (6976): [normal]: done. calling boinc_finish(69).
- 16:00:35 (6976): called boinc_finish
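The `error: -6` in the log above is a raw OpenCL status code. As a rough aid for reading such logs, the sketch below maps the first few codes to their names as defined in the Khronos `CL/cl.h` header. The helper name `describe_cl_error` is made up for illustration and is not part of BOINC or the Einstein@home app.

```python
# Subset of OpenCL status codes, as defined in the Khronos CL/cl.h header.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def describe_cl_error(code):
    """Hypothetical helper: return the symbolic name for an OpenCL status code."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL status ({code})")


if __name__ == "__main__":
    # The code reported by "Couldn't create OpenCL command queue (error: -6)!"
    print(describe_cl_error(-6))
```

Going by the header, -6 is `CL_OUT_OF_HOST_MEMORY`, meaning the runtime reported a host-side allocation failure while creating the command queue. That is only the nominal meaning of the code; on a newly supported GPU such as gfx1012 the underlying cause may well be the driver rather than actual memory exhaustion.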
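PG (PrimeGrid) also stalls at around 1 minute: the progress bar stops advancing, the GPU load drops to nothing, and the task can only be aborted manually.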
- <core_client_version>7.14.2</core_client_version>
- <![CDATA[
- <message>
- aborted by user</message>
- <stderr_txt>
- geneferocl 3.3.3-2 (Windows/OpenCL/32-bit)
- Copyright 2001-2018, Yves Gallot
- Copyright 2009, Mark Rodenkirch, David Underbakke
- Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
- Copyright 2011-2014, Michael Goetz, Ronald Schneider
- Copyright 2011-2018, Iain Bethune
- Genefer is free source code, under the MIT license.
- Running on platform 'AMD Accelerated Parallel Processing', device 'gfx1012', vendor 'Advanced Micro Devices, Inc.', version 'OpenCL 1.2 AMD-APP (3004.8)' and driver '3004.8 (PAL,LC)'.
- 11 computeUnits @ 1737MHz, memSize=3072MB, cacheSize=16kB, cacheLineSize=64B, localMemSize=64kB, maxWorkGroupSize=256.
- Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
- Command line: projects/www.primegrid.com/geneferocl_windows_3.3.3-2.exe -boinc -q 72776242^65536+1
- Normal priority change succeeded.
- Checking available transform implementations...
- OCL transform is past its b limit.
- OCL3 transform is past its b limit.
- OCL4 transform is past its b limit.
- OCL5 transform is past its b limit.
- Using OCL2 transform
- Starting initialization...
- Initialization complete (0.120 seconds).
- Testing 72776242^65536+1...
- Estimated time for 72776242^65536+1 is 0:01:59
- maxErr exceeded for 72776242^65536+1, 1.0000 > 0.4500
- Errors occurred for all available transform implementations
- Waiting 10 minutes before attempting to continue from last checkpoint...
- </stderr_txt>
- ]]>
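When several tasks fail this way, it can help to pull the `maxErr exceeded` lines out of each stderr dump and compare the reported value against the limit. The following is a minimal sketch, assuming the line format is exactly what appears in the log above (other Genefer versions may word it differently). `find_maxerr_failures` is a hypothetical helper, not something shipped with Genefer or BOINC.

```python
import re

# Matches lines such as:
#   maxErr exceeded for 72776242^65536+1, 1.0000 > 0.4500
MAXERR_RE = re.compile(r"maxErr exceeded for (\S+), ([0-9.]+) > ([0-9.]+)")


def find_maxerr_failures(stderr_text):
    """Hypothetical helper: list (candidate, observed, limit) for each maxErr line."""
    return [
        (m.group(1), float(m.group(2)), float(m.group(3)))
        for m in MAXERR_RE.finditer(stderr_text)
    ]


if __name__ == "__main__":
    sample = "- maxErr exceeded for 72776242^65536+1, 1.0000 > 0.4500"
    for candidate, observed, limit in find_maxerr_failures(sample):
        print(candidate, observed, limit)
```

In the log above the reported value (1.0000) is more than double the limit (0.4500), and every transform implementation that was tried failed the same check.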
Milkyway runs without any visible problems, but more than 15% of its results fail validation.