summaryrefslogtreecommitdiff
path: root/src/test/isolation/isolationtester.c
AgeCommit message (Collapse)Author
2016-08-30Fix a bunch of places that called malloc and friends with no NULL check.Tom Lane
Where possible, use palloc or pg_malloc instead; otherwise, insert explicit NULL checks. Generally speaking, these are places where an actual OOM is quite unlikely, either because they're in client programs that don't allocate all that much, or they're very early in process startup so that we'd likely have had a fork() failure instead. Hence, no back-patch, even though this is nominally a bug fix. Michael Paquier, with some adjustments by me Discussion: <CAB7nPqRu07Ot6iht9i9KRfYLpDaF2ZuUv5y_+72uP23ZAGysRg@mail.gmail.com>
2016-04-27Use memmove() not memcpy() to slide some pointers down.Tom Lane
The previous coding here was formally undefined, though it seems to accidentally work on most platforms in the buildfarm. Caught by some OpenBSD platforms in which libc contains an assertion check for overlapping areas passed to memcpy(). Thomas Munro
2016-03-09Handle invalid libpq sockets in more placesPeter Eisentraut
Also, make error messages consistent. From: Michael Paquier <[email protected]>
2016-02-22Create a function to reliably identify which sessions block which others.Tom Lane
This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.) The new function has the following behaviors that are painful or impossible to get right via pg_locks: 1. Correctly understands which lock modes block which other ones. 2. In soft-block situations (two processes both waiting for conflicting lock modes), only the one that's in front in the wait queue is reported to block the other. 3. In parallel-query cases, reports all sessions blocking any member of the given PID's lock group, and reports a session by naming its leader process's PID, which will be the pg_backend_pid() value visible to clients. The motivation for doing this right now is mostly to fix the isolation tests. Commit 38f8bdcac4982215beb9f65a19debecaf22fd470 lobotomized isolationtester's is-it-waiting query by removing its ability to recognize nonconflicting lock modes, as a crude workaround for the inability to handle soft-block situations properly. But even without the lock mode tests, the old query was excessively slow, particularly in CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new deadlock-hard test because the deadlock timeout elapses before they can probe the waiting status of all eight sessions. Replacing the pg_locks self-join with use of pg_blocking_pids() is not only much more correct, but a lot faster: I measure it at about 9X faster in a typical dev build with Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds. That should provide enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the test, without having to lengthen deadlock_timeout yet more and thus slow down the test for everyone else.
2016-02-12Revert "isolationtester: don't repeat the is-it-waiting query when retrying ↵Tom Lane
a step." This mostly reverts commit 9c9782f066e0ce5424b8706df2cce147cb78170f. I left in the parts that rearranged removal of completed waiting steps; but the idea of not rechecking a step's blocked-ness isn't working.
2016-02-12isolationtester: don't repeat the is-it-waiting query when retrying a step.Tom Lane
If we're retrying a step, then we already decided it was blocked on a lock, and there's no need to recheck that. The original coding of commit 38f8bdcac4982215beb9f65a19debecaf22fd470 resulted in a large number of is-it-waiting queries when dealing with multiple concurrently-blocked sessions, which is fairly pointless and also results in test failures in CLOBBER_CACHE_ALWAYS builds, where the is-it-waiting query is quite slow. This definition also permits appending pg_sleep() calls to steps where it's needed to control the order of finish of concurrent steps. Before, that did not work nicely because we'd decide that a step performing a sleep was not blocked and hang up waiting for it to finish, rather than noticing the completion of the concurrent step we're supposed to notice first. In passing, revise handling of removal of completed waiting steps to make it a bit less messy.
2016-02-12Re-pgindent isolationtester.c.Tom Lane
Need to do some more hacking on this, and got annoyed that it's not indent clean.
2016-02-12Fix whitespacePeter Eisentraut
2016-02-11Code review for isolationtester changes.Tom Lane
Fix a few oversights in 38f8bdcac4982215beb9f65a19debecaf22fd470: don't leak memory in run_permutation(), remember when we've issued a cancel rather than issuing another one every 10ms, fix some typos in comments.
2016-02-11Modify the isolation tester so that multiple sessions can wait.Robert Haas
This allows testing of deadlock scenarios. Scenarios that would previously have been considered invalid are now simply taken as a scenario in which more than one backend will wait.
2015-08-15Reject isolation test specifications with duplicate step names.Robert Haas
alter-table-1.spec has such a case, so change one instance of step rx1 to rx3 instead.
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-02-15Centralize getopt-related declarations in a new header file pg_getopt.h.Tom Lane
We used to have externs for getopt() and its API variables scattered all over the place. Now that we find we're going to need to tweak the variable declarations for Cygwin, it seems like a good idea to have just one place to tweak. In this commit, the variables are declared "#ifndef HAVE_GETOPT_H". That may or may not work everywhere, but we'll soon find out. Andres Freund
2013-12-20isolationtester: Ensure stderr is unbuffered, tooAlvaro Herrera
2013-12-19Make stdout unbufferedAlvaro Herrera
This ensures that all stdout output is flushed immediately, to match stderr. This eliminates the need for fflush(stdout) calls sprinkled all over the place. Per Daniel Wood in message [email protected]
2013-11-18Replace appendPQExpBuffer(..., <constant>) with appendPQExpBufferStrHeikki Linnakangas
Arguably makes the code a bit more readable, and might give a small performance gain. David Rowley
2013-11-08Fix pg_isolation_regress to work outside its build directory.Robert Haas
This makes it possible to, for example, use the isolation tester to test a contrib module. Andres Freund
2013-10-22Replace pg_asprintf() with psprintf().Tom Lane
This eliminates an awkward coding pattern that's also unnecessarily inconsistent with backend coding. psprintf() is now the thing to use everywhere.
2013-10-13Add use of asprintf()Peter Eisentraut
Add asprintf(), pg_asprintf(), and psprintf() to simplify string allocation and composition. Replacement implementations taken from NetBSD. Reviewed-by: Álvaro Herrera <[email protected]> Reviewed-by: Asif Naeem <[email protected]>
2013-10-04isolationtester: Allow tuples to be returned in more placesAlvaro Herrera
Previously, isolationtester would forbid returning tuples in session-specific teardown (but not global teardown), as well as in global setup. Allow these places to return tuples, too.
2013-04-07In isolationtester, retry after EINTR return from select(2).Tom Lane
Per report from Jaime Casanova. Very curious that no one else has seen this failure ... but the code is clearly wrong as-is.
2013-04-03Minor robustness improvements for isolationtester.Tom Lane
Notice and complain about PQcancel() failures. Also, don't dump core if an error PGresult doesn't contain severity and message subfields, as it might not if it was generated by libpq itself. (We have a longstanding TODO item to improve that, but in the meantime isolationtester had better cope.) I tripped across the latter item while investigating a trouble report on buildfarm member spoonbill. As for the former, there's no evidence that PQcancel failure is actually involved in spoonbill's problem, but it still seems like a bad idea to ignore an error return code.
2013-02-28Flush stderr and stdout in isolation tester.Andrew Dunstan
This is a possibly vain attempt to fix a buffering issue observed for some MSVC builds.
2013-01-23isolationtester: add a few fflush(stderr) callsAlvaro Herrera
The lack of them is causing failures in some BF members. Per Andrew Dunstan.
2013-01-23Improve concurrency of foreign key lockingAlvaro Herrera
This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]
2012-09-05Allow isolation tests to specify multiple setup blocks.Kevin Grittner
Each setup block is run as a single PQexec submission, and some statements such as VACUUM cannot be combined with others in such a block. Backpatch to 9.2. Kevin Grittner and Tom Lane
2012-06-10Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian
commit-fest.
2012-01-28Add simple tests of EvalPlanQual using the isolationtester infrastructure.Tom Lane
Much more could be done here, but at least now we have *some* automated test coverage of that mechanism. In particular this tests the writable-CTE case reported by Phil Sorber. In passing, remove isolationtester's arbitrary restriction on the number of steps in a permutation list. I used this so that a single spec file could be used to run several related test scenarios, but there are other possible reasons to want a step series that's not exactly a permutation. Improve documentation and fix a couple other nits as well.
2012-01-14Detect invalid permutations in isolationtesterAlvaro Herrera
isolationtester is now able to continue running other permutations when it detects that one of them is invalid, which is useful during initial development of spec files. Author: Alexander Shulgin
2012-01-14Avoid NULL pointer dereference in isolationtesterAlvaro Herrera
2012-01-11Validate number of steps specified in permutationAlvaro Herrera
A permutation that specifies more steps than defined causes isolationtester to crash, so avoid that. Using less steps than defined should probably not be a problem, but no spec currently does that.
2011-11-04Unbreak isolationtester on Win32Alvaro Herrera
I broke it in a previous commit because I neglected to install the necessary incantations to have getopt() work on Windows. Per red blots in buildfarm.
2011-11-03Implement a dry-run mode for isolationtesterAlvaro Herrera
This mode prints out the permutations that would be run by the given spec file, in the same format used by the permutation lines in spec files. This helps in building new spec files. Author: Alexander Shulgin, with some tweaks by me
2011-10-25Add debugging aid in isolationtesterAlvaro Herrera
2011-09-27Remove dependency on error ordering in isolation testsAlvaro Herrera
We now report errors reported by the just-unblocked and unblocking transactions identically; this should fix relatively common buildfarm failures reported by animals that are failing the "wrong" session.
2011-09-27Fix typoAlvaro Herrera
2011-08-18Add an SSI regression test that tests all interesting permutations in theHeikki Linnakangas
order of begin, prepare, and commit of three concurrent transactions that have conflicts between them. The test runs for a quite long time, and the expected output file is huge, but this test caught some serious bugs during development, so seems worthwhile to keep. The test uses prepared transactions, so it fails if the server has max_prepared_transactions=0. Because of that, it's marked as "ignore" in the schedule file. Dan Ports
2011-07-19Make isolationtester more robust on locked commandsAlvaro Herrera
Noah Misch diagnosed the buildfarm problems in the isolation tests partly as failure to differentiate backends properly; the old code was using backend IDs, which is not good enough because a new backend might use an already used ID. Use PIDs instead. Also, the code was purposely careless about other concurrent activity, because it isn't expected; and in fact, it doesn't affect the vast majority of the time. However, it can be observed that autovacuum can block tables for long enough to cause sporadic failures. The new code accounts for that by ignoring locks held by processes not explicitly declared in our spec file. Author: Noah Misch
2011-07-12Add support for blocked commands in isolationtesterAlvaro Herrera
This enables us to test that blocking commands (such as foreign keys checks that conflict with some other lock) act as intended. The set of tests that this adds is pretty minimal, but can easily be extended by adding new specs. The intention is that this will serve as a basis for ensuring that further tweaks of locking implementation preserve (or improve) existing behavior. Author: Noah Misch
2011-05-08Fix some portability issues in isolation regression test driver.Tom Lane
Remove random system #includes in favor of using postgres_fe.h. (The alternative to that is letting this module grow its own configuration testing ability...) Also fix the "make clean" target to actually clean things up. Per local testing.
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-02-07Implement genuine serializable isolation level.Heikki Linnakangas
Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen