Skip to content

Conversation

@rjoursler
Copy link
Contributor

Enables source location forwarding for some simple wrapper functionality. This enables better debugging by enabling developers to focus on the higher level details when using a debugger.

@rjoursler rjoursler requested a review from a team as a code owner April 16, 2025 20:02
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Apr 16, 2025
@rjoursler rjoursler force-pushed the rjoursle/gemm_debug branch from 5e4bfd6 to 00a7fca Compare April 16, 2025 20:47
@rjoursler
Copy link
Contributor Author

make test
disable test_device_cpu
disable test_device_gpu

emul32High(1, dst.ud(), src0, src1, loc);
else
mul(1, dst, src0, src1);
mul(1, dst, src0, src1, loc);
Copy link
Contributor

@petercad petercad Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be thinking about a more general mechanism for "scoping" location information in this way, instead of passing around lots of SourceLocation objects. Something like this:

auto mulHigh = [&](..., SourceLocationScoper scoper = {this}) {
// rest of code as before
};

And then SourceLocationScoper is defined something like:

template <HW hw>
class SourceLocationScoper {
    SourceLocation loc;
    BinaryCodeGenerator *generator = nullptr;
public:
    explicit SourceLocationScoper(ngen::BinaryCodeGenerator<hw> *g)
                : loc{std::source_location::current()}, generator(g) {
        g->enterLocationScope(loc);
    }
    ~SourceLocationScoper() {
        generator->exitLocationScope();
    }
}

The new generator methods {enter,exit}LocationScope would set/clear a location override (new variable in the generator class). You'd have a counter as well to properly support nested scopes (only the outermost is honored).

You could also use this method internally inside nGEN (e.g. pseudo-instructions, etc.) to avoid lots of passing around of loc objects.

Copy link
Contributor Author

@rjoursler rjoursler Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite see what the benefit is here. Is this just supposed to be a performance optimization? For release builds, Source Location will be an empty object, so I expect it would mostly be optimized away anyway. If the goal is to just avoid forwarding loc in the source code, I don't see much benefit as forwarding the same location for a "large" operation seems misguided anyway.

In general, I agree we could use some improvement here, I just haven't been able to come up with a good mechanism. The core problem is that we have multiple source locations that we could reasonably map to each instruction, so I don't see a general mechanism we could use to pick the "right" location as it depends on the use and what is normally being debugged.

Copy link
Contributor

@petercad petercad Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for better readability/easier programmability, rather than performance. The idea is that you only need to change one line to combine location information:

auto mulHigh = [&](..., SourceLocationScoper scoper = {this}) {
      emul32High(...);    // don't have to pass loc here
      mul(...);           // don't have to pass loc here
};

instead of modifying every single instruction:

auto mulHigh = [&](..., SourceLocation loc = {}) {
      emul32High(..., loc);    // have to pass loc here
      mul(..., loc);           // have to pass loc here
};

When there are a lot of nested instructions, it's easy to miss a loc and just a lot of code changes to make. Since this pattern is appearing in lots of places, it'd be nice to simplify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants