Nvidia Modular diagnostic software - MODS
Nvidia MODS, or “Modular diagnostic software”, is an internal Nvidia
set of tools for GPU diagnostics. Those tools leaked out and are now
used by third party repair shops when troubleshooting broken GPUs.
Let's take a look at what MODS can do and how to use it.
Modular diagnostic software overview
MODS is available as a collection of two tools that can run various
tests that check all aspects of a graphics card - from VRAM chips to
GPU chip specifics. It's used by OEMs to validate a card or by repair
technicians to help track down the broken part of the product.
The software versions that went public are distributed as ZIP
archives containing a miniature Linux distribution with all
dependencies and drivers. The intent is to boot it from a bootable
flash drive, execute tests and look at the results (which are also
saved as a text file on the flash drive). MODS comes with a PDF file
containing full documentation on how to use it.
Creating a bootable MODS flash drive
You will have to search for sites offering MODS for download.
Usually these are Russian forums or sites related to third party
repair. You will also find some tutorials or repair examples on
YouTube. You should not attempt a repair if you have no experience
with this.
From what I could find there are two versions - 367.38.1 with all
tools and documentation, and a partial 400.184 containing only the
“mods” and “mats” tools (there could be a newer version as well). The
367.38.1 version does not support Turing cards, so if you have an RTX
or GTX 16XX card you will need those two newer files as well (from
what I see only “mats” works).
Assuming you have the ZIP file we can create the bootable flash
drive:
Use Rufus to create a bootable FreeDOS flash drive
Extract the MODS ZIP file and copy its contents onto the flash drive
Edit [Link] and add these lines at the end of the file:
copy c:\mods\[Link] c:\mods\pkgname
copy c:\mods\[Link] c:\mods\runmods
\grub --config-file="find --set-root /tiny/kernel; configfile /dos2l
Note that if you have a different version of the tools, the
“[Link]” file name has to be adjusted here. This gives you a
bootable MODS flash drive that boots into Linux via FreeDOS.
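The extract-and-copy step could also be scripted on the machine used to prepare the drive. The following is only a minimal Python sketch: the function name, the archive name and the drive path are placeholders, not something shipped with MODS. It assumes the ZIP contents should land in the root of the flash drive unchanged.

```python
import zipfile
from pathlib import Path


def extract_mods_archive(archive_path, drive_root):
    """Extract every file from the ZIP archive onto the flash drive.

    Both arguments are placeholders for illustration; pass the path of
    the archive you downloaded and the root of your own flash drive.
    Returns the list of extracted archive member names.
    """
    drive_root = Path(drive_root)
    if not drive_root.is_dir():
        raise FileNotFoundError(f"Flash drive path not found: {drive_root}")
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        archive.extractall(drive_root)  # keeps the folder layout of the ZIP
    return names
```

It could be called as extract_mods_archive("mods.zip", "E:/"), with both values replaced by your own. Rufus has to prepare the FreeDOS drive first, since this only copies files.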
On boot the system executes the tests defined in the /mods/ARGS file,
for example:
[Link]
-test 3
-mfg
-null_display
-poll_interrupts
-pstate [Link]
-no_thermal_slowdown
-matsinfo
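To quickly check which tests an ARGS file selects, the "-test N" entries can be pulled out with a few lines of Python. This is just a sketch: the function name is made up, and it assumes arguments are separated by whitespace or line breaks as in the example above. The sample leaves out the two entries whose values did not survive in the listing.

```python
def selected_tests(args_text):
    """Return the test numbers requested via '-test N' entries."""
    tokens = args_text.split()
    tests = []
    for i, token in enumerate(tokens[:-1]):
        # A test is selected by '-test' followed by its number.
        if token == "-test" and tokens[i + 1].isdigit():
            tests.append(int(tokens[i + 1]))
    return tests


# Sample taken from the ARGS example (incomplete entries left out).
SAMPLE_ARGS = """-test 3
-mfg
-null_display
-poll_interrupts
-no_thermal_slowdown
-matsinfo"""

print(selected_tests(SAMPLE_ARGS))  # prints [3]
```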
You can edit the ARGS file to set your preferred set of tests, or
execute them manually after the system boots. For more options on
usage and customization of the software stack you can watch this video:
Using Nvidia MODS
How to use MODS
There are two main tools in this software stack - mods and mats.
The first one is used to test the GPU, the second one to test the
VRAM chips. Weird artifacts, or the famous Turing “xd” artifacts, are
usually associated with a damaged VRAM chip. Other symptoms may
be related to the GPU chip itself or some component on the board.
“mods” won't tell you everything, but if you are a repair specialist
it should help.
For end users/gamers those tools could be quickly used to see if
their GPU is working correctly, especially when buying used cards.
“mods” can run explicit tests from a list (check the PDF for details)
or one of two sets of tests - a quicker OEM one or the full suite:
mods [Link] -mfg (for CEM testing)
mods [Link] -oqa (for OEM outgoing QA testing)
Running “mods” creates a “[Link]” file containing the output of all
the tests that were run.
MODS start: Thu Nov 12 [Link] 2020
Warning : test specifications should be used to control p-states
Command Line : [Link] -test 3 -test 18 -test 19 -test 52 -test 1
CPU
Foundry : GenuineIntel
Name : Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
Family : 6
Model : 14
Stepping : 10
Version
MODS : 367.38
OperatingSystem: Linux (x86_64)
Kernel : 4.1.2-gentoo
KernelDriver : 3.63
HostName : tinylinux
Smbios version [0x302] is not supported
gpu 0 [Link] 0.0
---------------------------
Device Id : GP104
...
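The header of the log is mostly "name : value" lines, so the interesting fields (MODS version, kernel, GPU device id) can be extracted with a small script. This is a hedged sketch, not a tool that comes with MODS: the function name is made up, and it only assumes the layout visible in the excerpt above.

```python
def parse_log_fields(log_text):
    """Collect 'name : value' lines of a log into a dictionary.

    Lines without a colon or without a value (section titles such as
    'CPU' or 'Version') are skipped. If a name repeats, the last
    occurrence wins.
    """
    fields = {}
    for line in log_text.splitlines():
        # Split on the first colon only, values may contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Lines copied from the log excerpt above.
SAMPLE_LOG = """Version
MODS : 367.38
OperatingSystem: Linux (x86_64)
Kernel : 4.1.2-gentoo
Device Id : GP104"""

fields = parse_log_fields(SAMPLE_LOG)
print(fields["Device Id"], fields["MODS"])  # prints GP104 367.38
```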
“mats” can be used to test VRAM chips, for example:
./mats -e 10
This will start displaying weird colors on the screen, and when it's
done it will print a report (and save it as “[Link]”). The result can
look like this:
mats version 400.184. Testing TU106 with 50 MB of memory starting w
Read Error Count: 0
Write Error Count: 0
Unknown Error Count: 0
=== MEMORY ERRORS BY SUBPARTITION ===
SUBPART  READ ERRORS  WRITE ERRORS  UNKNOWN ERRS
-------  -----------  ------------  ------------
FBIOA0             0             0             0
FBIOA1             0             0             0
FBIOB0             0             0             0
FBIOB1             0             0             0
FBIOC0             0             0             0
FBIOC1             0             0             0
FBIOD0             0             0             0
FBIOD1             0             0             0
Failing Bits:
None
Error Code = 00000000 (OK)
####### #### ###### ######
######## ###### ######## ########
## ## ## ## ## # ## #
## ## ## ## ### ###
######## ######## #### ####
####### ######## ### ###
## ## ## # ## # ##
## ## ## ######## ########
## ## ## ###### ######
This lists every memory channel (FBIO) / chip and the errors for each,
if any occurred. Starting from the bottom right chip you can identify
each VRAM chip by its subpart label (starting with higher bits first):
VRAM chip labels on TU106
If you get errors on some of the chips, that could indicate a problem
with that chip - or with the memory controller on the GPU or the
circuitry leading to the memory chip.
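When checking several cards it may help to pick the failing subpartitions out of a saved report automatically. The sketch below is an assumption-laden example, not part of MODS: the function name is made up, and it assumes the rows look like those in the report above (a label starting with "FBIO" followed by three counters).

```python
import re

# One table row: subpartition label plus read, write and unknown counters.
ROW = re.compile(r"^(FBIO\w+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def failing_subpartitions(report_text):
    """Return {label: counters} for every subpartition with errors."""
    failing = {}
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # not a subpartition row
        read, write, unknown = (int(match.group(i)) for i in (2, 3, 4))
        if read or write or unknown:
            failing[match.group(1)] = {
                "read": read, "write": write, "unknown": unknown,
            }
    return failing


# Rows copied from the report above, where every counter is zero.
SAMPLE_REPORT = """SUBPART READ ERRORS WRITE ERRORS UNKNOWN ERRS
------- ----------- ------------ ------------
FBIOA0 0 0 0
FBIOA1 0 0 0
FBIOB0 0 0 0
FBIOB1 0 0 0"""

print(failing_subpartitions(SAMPLE_REPORT))  # prints {}
```

An empty result means no subpartition reported errors. Any label that does show up can then be matched to a physical chip using the label layout for your card.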
If you are interested in GPU repair or in analyzing graphics card
state, power lines and the like, I would recommend checking YouTube
videos where repair specialists go over fixing broken GPUs. I would
not recommend any attempt at fixing a valuable GPU if you have no
prior experience with it.
Hardware benchmarks and reviews, 12 November 2020, Piotr Maliński