Skip to content

Alter zpool status output to indicate MMP suspended the pool #7296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 15, 2018

Conversation

ofaaland
Copy link
Contributor

@ofaaland ofaaland commented Mar 13, 2018

Description

When the pool is suspended, record whether it was due to an I/O error
or due to MMP writes failing to succeed within the required time.

When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.

In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.

In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.

Motivation and Context

Before this change, the 'zpool status' output did not indicate MMP might be the reason the pool was suspended, and instructed the user to run 'zpool clear' which is confusing (there may be no device FAULTED) and incorrect (only a reboot clears the condition at this time). The latter is a bug which should be fixed, but the instructions to the user should be correct in the meantime.

Similarly, the libzfs code for parsing the config nvlist and producing an enum value reflecting the pool status did not reflect an MMP failure when that had occurred.

How Has This Been Tested?

I ran zloop and the mmp portion of the ZTS locally.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

"system could import the pool undetected.\n"));
(void) printf(gettext("action: Make sure the pool's devices "
"are connected, then reboot your system and import the "
"pool.\n"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mind the 80 char limit in the output, both of these lines need to be manually wrapped with \n\t\.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -6467,6 +6467,15 @@ status_callback(zpool_handle_t *zhp, void *data)
"to be recovered.\n"));
break;

case ZPOOL_STATUS_IO_FAILURE_MMP:
(void) printf(gettext("status: The pool is suspended because "
"multihost writes failed or were delayed, and another "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my ear the phrasing here doesn't flow quite right, maybe replace ", and another system..." with "; another system...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ofaaland ofaaland force-pushed the b_zpool_status_mmp branch 2 times, most recently from f7dd71e to 17d79ec Compare March 13, 2018 23:08
@codecov
Copy link

codecov bot commented Mar 14, 2018

Codecov Report

Merging #7296 into master will decrease coverage by 0.16%.
The diff coverage is 23.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7296      +/-   ##
==========================================
- Coverage   76.36%    76.2%   -0.17%     
==========================================
  Files         328      328              
  Lines      104023   104011      -12     
==========================================
- Hits        79439    79263     -176     
- Misses      24584    24748     +164
Flag Coverage Δ
#kernel 76.04% <33.33%> (-0.02%) ⬇️
#user 65.52% <23.07%> (+0.07%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de4f8d5...543d0a6. Read the comment docs.

ZIO_SUSPEND_NONE = 0,
ZIO_SUSPEND_IOERR,
ZIO_SUSPEND_MMP,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add zio_suspend_reason_t as a typedef and use that throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.

Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.

When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.

In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.

In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.

Signed-off-by: Olaf Faaland <[email protected]>
@ofaaland ofaaland force-pushed the b_zpool_status_mmp branch from 17d79ec to 543d0a6 Compare March 14, 2018 19:33
@openzfs openzfs deleted a comment from ofaaland Mar 14, 2018
@behlendorf behlendorf merged commit cec3a0a into openzfs:master Mar 15, 2018
@ofaaland ofaaland deleted the b_zpool_status_mmp branch April 2, 2018 17:32
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 16, 2018
When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.

Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.

When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.

In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.

In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#7296
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request May 4, 2018
When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.

Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.

When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.

In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.

In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#7296
tonyhutter pushed a commit that referenced this pull request May 10, 2018
When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.

Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.

When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.

In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.

In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7296
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants