fix mstest sdk eval timeout #558

Merged
Evangelink merged 5 commits into main from dev/amauryleve/fix-mstest-sdk-eval-timeout on Apr 21, 2026

@Evangelink
Member

No description provided.

The fixture csproj had MSTest.Sdk/3.8.0 while the prompt claimed the user
already upgraded to 4.1.0. This mismatch caused the model to edit the
csproj and run dotnet build, triggering slow MSTest SDK NuGet downloads
that exceeded the 240s timeout on CI.

- Update fixture csproj to MSTest.Sdk/4.1.0 to match the prompt
- Add specific CS error codes to the prompt so the model provides direct
  fixes instead of building
- Update rubric item to reflect the project already being on 4.1.0
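
The mismatch described above (fixture on 3.8.0, prompt claiming 4.1.0) could be caught before an eval run with a small consistency check. The following is a minimal sketch, assuming the fixture pins the SDK through the `Sdk` attribute of the `<Project>` element. The fixture strings and the helper names `sdk_version` and `fixture_matches_prompt` are hypothetical; the real csproj is not shown on this page.

```python
import xml.etree.ElementTree as ET


def sdk_version(csproj_text, sdk_name="MSTest.Sdk"):
    """Return the version from <Project Sdk="name/version">, or None if absent."""
    root = ET.fromstring(csproj_text)
    name, _, version = root.get("Sdk", "").partition("/")
    if name != sdk_name or not version:
        return None
    return version


def fixture_matches_prompt(csproj_text, claimed_version):
    """True when the fixture is pinned to the version the prompt claims."""
    return sdk_version(csproj_text) == claimed_version


# Hypothetical fixture contents, for illustration only.
OLD_FIXTURE = '<Project Sdk="MSTest.Sdk/3.8.0"><PropertyGroup /></Project>'
NEW_FIXTURE = '<Project Sdk="MSTest.Sdk/4.1.0"><PropertyGroup /></Project>'

if __name__ == "__main__":
    print(fixture_matches_prompt(OLD_FIXTURE, "4.1.0"))
    print(fixture_matches_prompt(NEW_FIXTURE, "4.1.0"))
```

With the old fixture the check fails, which is the situation that pushed the model into editing the csproj and running a build.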

Reduce scope of the 'Fix multiple v4 breaking changes' scenario:
- Remove 3 lower-signal breaking changes from the fixture (AreEqual params,
  IsInstanceOfType out param, Properties.Contains) to cut agent work
- Add max_turns: 5 to cap agent iterations
- Lower timeout from 240 to 180 seconds
@Evangelink
Member Author

/evaluate

@github-actions
Contributor

Skill Validation Results

| Skill | Scenario | Quality | Skills Loaded | Overfit | Verdict |
|---|---|---|---|---|---|
| migrate-mstest-v3-to-v4 | Migrate custom TestMethodAttribute from Execute to ExecuteAsync | 2.0/5 → 3.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | |
| migrate-mstest-v3-to-v4 | Replace ExpectedExceptionAttribute with Assert.ThrowsExactly | 3.7/5 → 4.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | [1] |
| migrate-mstest-v3-to-v4 | Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout | 2.7/5 ⏰ → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill, edit | ✅ 0.07 | [2] |
| migrate-mstest-v3-to-v4 | Handle net6.0 target framework dropped in MSTest v4 | 3.7/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill / ⚠️ NOT ACTIVATED | ✅ 0.07 | |
| migrate-mstest-v3-to-v4 | Fix TestMethodAttribute CallerInfo constructor breaking change | 4.0/5 → 4.7/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | [3] |
| migrate-mstest-v3-to-v4 | Understand behavioral changes after MSTest v4 upgrade | 3.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | |
| migrate-mstest-v3-to-v4 | Handle MSTest.Sdk and MTP changes in v4 | 2.0/5 → 3.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | [4] |
| migrate-mstest-v3-to-v4 | Full MSTest v3 to v4 migration with multiple breaking changes | 5.0/5 → 5.0/5 | ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill, create, bash | ✅ 0.07 | |
| migrate-mstest-v3-to-v4 | Migrate MSTest.Sdk v3 project using ManagedType and TestTimeout | 3.7/5 → 4.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | [5] |
| migrate-mstest-v3-to-v4 | Correctly identify MSTest v3 project and recommend v4 migration | 4.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.07 | [6] |

[1] ⚠️ High run-to-run variance (CV=2.44) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=0.60) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=30.26) — consider re-running with --runs 5. (Isolated) Quality improved but weighted score is -17.6% due to: judgment, quality
[4] ⚠️ High run-to-run variance (CV=0.60) — consider re-running with --runs 5
[5] ⚠️ High run-to-run variance (CV=0.93) — consider re-running with --runs 5. (Isolated) Quality improved but weighted score is -2.8% due to: judgment
[6] ⚠️ High run-to-run variance (CV=1.09) — consider re-running with --runs 5
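
For readers unfamiliar with the CV figures in these footnotes, the sketch below shows how a coefficient of variation behaves, assuming CV here means standard deviation divided by the absolute mean. The report does not say which quantity the validator measures or whether it uses the population or sample deviation, so the function and the sample values are illustrative only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)


# Hypothetical per-run deltas, not taken from the actual evaluation sessions.
STABLE_RUNS = [2.0, 4.0]
NOISY_RUNS = [1.0, -0.9, 0.2]  # mean close to zero, so the ratio blows up

if __name__ == "__main__":
    print(coefficient_of_variation(STABLE_RUNS))
    print(coefficient_of_variation(NOISY_RUNS))
```

Under this reading, a very large value such as the CV=30.26 in footnote [3] is what one would expect when the runs average out close to zero, which fits the bot's advice to re-run with more runs before trusting the number.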

⏰ timeout: run(s) hit the 180s scenario timeout limit; scoring may be affected because model execution was aborted before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

Skill Coverage Report

| Plugin | Skill | Covered | Coverage |
|---|---|---|---|
| ⚠️ dotnet-test | migrate-mstest-v3-to-v4 | 32/41 | 78% |

Uncovered: dotnet-test/migrate-mstest-v3-to-v4
  • [Validation] All MSTest packages updated to 4.x (line 449)
  • [Validation] Project builds with zero errors (line 450)
  • [Validation] All tests pass with dotnet test (line 451)
  • [Validation] Behavioral changes reviewed and addressed (line 458)
  • [CodePattern] [TestMethod] (line 266)
  • [CodePattern] [Timeout] (line 213)
  • [CodePattern] sealed (line 119)
  • [CodePattern] [TestMethodAttribute] (line 173)
  • [CodePattern] Assert.IsInstanceOfType (line 252)
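
The figures in this report can be cross-checked with a little arithmetic. This is a small sketch, assuming the percentage is the covered count over the total, rounded to a whole number; the bot's exact rounding rule is not stated, and the helper name is hypothetical.

```python
def coverage_percent(covered, total):
    """Covered items as a whole-number percentage of the total."""
    return round(100 * covered / total)


COVERED, TOTAL = 32, 41
UNCOVERED_LISTED = 4 + 5  # four [Validation] items plus five [CodePattern] items

if __name__ == "__main__":
    print(coverage_percent(COVERED, TOTAL))
    print(TOTAL - COVERED == UNCOVERED_LISTED)
```

The nine uncovered items listed above account for the full gap between 32 and 41.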

@Evangelink
Member Author

/evaluate

@github-actions
Contributor

Skill Validation Results

| Skill | Scenario | Quality | Skills Loaded | Overfit | Verdict |
|---|---|---|---|---|---|
| migrate-mstest-v3-to-v4 | Migrate custom TestMethodAttribute from Execute to ExecuteAsync | 1.7/5 → 3.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill, read_bash | ✅ 0.08 | |
| migrate-mstest-v3-to-v4 | Replace ExpectedExceptionAttribute with Assert.ThrowsExactly | 3.7/5 → 4.7/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [1] |
| migrate-mstest-v3-to-v4 | Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout | 3.0/5 ⏰ → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | |
| migrate-mstest-v3-to-v4 | Handle net6.0 target framework dropped in MSTest v4 | 4.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [2] |
| migrate-mstest-v3-to-v4 | Fix TestMethodAttribute CallerInfo constructor breaking change | 3.3/5 → 4.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill, read_bash / ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [3] |
| migrate-mstest-v3-to-v4 | Understand behavioral changes after MSTest v4 upgrade | 3.3/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | |
| migrate-mstest-v3-to-v4 | Handle MSTest.Sdk and MTP changes in v4 | 2.0/5 → 3.3/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill, report_intent | ✅ 0.08 | [4] |
| migrate-mstest-v3-to-v4 | Full MSTest v3 to v4 migration with multiple breaking changes | 4.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [5] |
| migrate-mstest-v3-to-v4 | Migrate MSTest.Sdk v3 project using ManagedType and TestTimeout | 4.0/5 → 4.0/5 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [6] |
| migrate-mstest-v3-to-v4 | Verified MSTest v3 to v4 package update with build and test | 3.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | |
| migrate-mstest-v3-to-v4 | Fix sealed custom TestMethodAttribute with Timeout changes | 3.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | |
| migrate-mstest-v3-to-v4 | Fix TestMethodAttribute and TestMethod display name constructor | 5.0/5 → 5.0/5 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [7] |
| migrate-mstest-v3-to-v4 | Fix Assert.IsInstanceOfType out parameter removal | 5.0/5 → 5.0/5 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [8] |
| migrate-mstest-v3-to-v4 | Address TreatDiscoveryWarningsAsErrors and behavioral changes | 2.3/5 → 3.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | |
| migrate-mstest-v3-to-v4 | Correctly identify MSTest v3 project and recommend v4 migration | 4.3/5 → 5.0/5 🟢 | ✅ migrate-mstest-v3-to-v4; tools: skill | ✅ 0.08 | [9] |

[1] ⚠️ High run-to-run variance (CV=1.47) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=1.21) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=2.68) — consider re-running with --runs 5
[4] ⚠️ High run-to-run variance (CV=0.92) — consider re-running with --runs 5
[5] ⚠️ High run-to-run variance (CV=1.29) — consider re-running with --runs 5
[6] ⚠️ High run-to-run variance (CV=1.59) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -6.5% due to: judgment
[7] ⚠️ High run-to-run variance (CV=1.33) — consider re-running with --runs 5
[8] ⚠️ High run-to-run variance (CV=2.12) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -0.5% due to: tokens (111356 → 176456)
[9] ⚠️ High run-to-run variance (CV=0.75) — consider re-running with --runs 5

⏰ timeout: run(s) hit the 180s scenario timeout limit; scoring may be affected because model execution was aborted before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

@Evangelink Evangelink marked this pull request as ready for review April 21, 2026 07:46
Copilot AI review requested due to automatic review settings April 21, 2026 07:46
@Evangelink Evangelink enabled auto-merge (squash) April 21, 2026 07:46
@Evangelink Evangelink merged commit cb7c10b into main Apr 21, 2026
34 of 36 checks passed
@Evangelink Evangelink deleted the dev/amauryleve/fix-mstest-sdk-eval-timeout branch April 21, 2026 07:48
Contributor

Copilot AI left a comment

Pull request overview

This PR streamlines the MSTest v3→v4 migration eval scenario to reduce complexity and help avoid evaluation timeouts by removing some fixture breakages and tightening the scenario prompt/rubric.

Changes:

  • Simplifies the v3-assert-changes fixture by removing some Assert/TestContext-related examples and replacing a TestContext check with a trivial passing test.
  • Updates the Goal 3 eval prompt/rubric/assertions to focus on fewer MSTest v4 breaking changes and reduces the scenario timeout/turns.

| File | Description |
|---|---|
| tests/dotnet-test/migrate-mstest-v3-to-v4/fixtures/v3-assert-changes/CalculatorTests.cs | Removes some previously-included breaking-change examples and simplifies a test method. |
| tests/dotnet-test/migrate-mstest-v3-to-v4/eval.yaml | Narrows the Goal 3 prompt/rubric/assertions and adjusts max turns + timeout. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

pattern: "(ClassCleanup|ClassCleanupBehavior)"
- type: "output_matches"
pattern: "(ContainsKey|IDictionary)"
pattern: "(TestTimeout|int\\.MaxValue|Infinite)"

Copilot AI Apr 21, 2026

The output_matches regex includes TestTimeout and Infinite, which are present in the broken fixture code and may also appear in compiler errors. This can let the scenario pass even if the agent never replaces TestTimeout.Infinite with the MSTest v4-compatible form (e.g., Timeout(int.MaxValue)). Tighten the assertion to only match the expected replacement (such as int\.MaxValue or a numeric timeout) and avoid matching the deprecated symbol names.

Suggested change
- pattern: "(TestTimeout|int\\.MaxValue|Infinite)"
+ pattern: "(int\\.MaxValue|Timeout\\(\\s*\\d+\\s*\\))"
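
The concern in this comment can be reproduced outside the eval harness. The sketch below assumes `output_matches` does an unanchored regex search over the agent's output (the harness's actual matching semantics and regex flavor are not shown here) and uses Python's `re` module as a stand-in. Both sample outputs are hypothetical, and the patterns are written as the regexes the double-quoted YAML strings decode to.

```python
import re

# Regexes as they look after YAML unescapes the double-quoted strings.
LOOSE = re.compile(r"(TestTimeout|int\.MaxValue|Infinite)")
TIGHT = re.compile(r"(int\.MaxValue|Timeout\(\s*\d+\s*\))")

# Hypothetical agent outputs, not taken from real evaluation sessions.
UNFIXED_OUTPUT = "error: [Timeout(TestTimeout.Infinite)] no longer compiles"
FIXED_OUTPUT = "Replace it with [Timeout(int.MaxValue)]"


def matches(pattern, text):
    """Unanchored search, the assumed behaviour of output_matches."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for label, text in (("unfixed", UNFIXED_OUTPUT), ("fixed", FIXED_OUTPUT)):
        print(label, "loose:", matches(LOOSE, text), "tight:", matches(TIGHT, text))
```

Under these assumptions the loose pattern accepts output that merely echoes the broken symbol, while the tightened pattern only accepts output containing the replacement form.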

Comment on lines 59 to +63
  - name: "Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout"
    prompt: |
-     I upgraded from MSTest 3.8 to MSTest 4.0 and my test project has many compilation errors.
-     The errors mention ThrowsException, ClassCleanupBehavior, Contains, TestTimeout, and
-     IsInstanceOfType. Can you review my code and fix all breaking changes?
+     I upgraded from MSTest 3.8 to MSTest 4.0 and my test project has compilation errors.
+     The errors mention ThrowsException, ClassCleanupBehavior, and TestTimeout.
+     Can you review my code and fix the breaking changes?

Copilot AI Apr 21, 2026

The scenario name still mentions TestContext, but the updated prompt no longer lists TestContext-related compilation errors and the fixture no longer exercises a TestContext breaking change. Consider renaming this scenario to remove TestContext (or reintroduce a concrete TestContext breaking-change example) so the name stays aligned with what the scenario actually validates.
