Commit f303596

JelteF authored and Commitfest Bot committed
Bump postmaster soft open file limit (RLIMIT_NOFILE) when necessary
The default open file limit of 1024 on Linux is extremely low. The reason this hasn't changed is that doing so would break legacy programs that use the select(2) system call in hard-to-debug ways. Instead, programs that want to opt in to a higher open file limit are expected to bump their soft limit to their hard limit on startup. Details on this are very well explained in a blogpost by the systemd author[1]. There's also a similar change done by the Go language[2].

This starts bumping the postmaster's soft open file limit when we realize that we'll run into the soft limit with the requested max_files_per_process GUC. We do so by slightly changing the meaning of the max_files_per_process GUC. The actual (not publicly exposed) limit is max_safe_fds; previously this would be set to:

    max_files_per_process - already_open_files - NUM_RESERVED_FDS

After this change we instead try to set max_safe_fds to max_files_per_process if the system allows that. This is deemed more natural for users to understand, because now the limit on the number of files they can open is actually what they configured in max_files_per_process.

Adding this infrastructure to change RLIMIT_NOFILE when needed is especially useful for the AIO work that Andres is doing, because io_uring consumes a lot of file descriptors. Even without looking at AIO, there is a large number of reports from people who need to change their soft file limit before starting Postgres, sometimes falling back to lowering max_files_per_process when they fail to do so[3-8]. It's also not at all strange to fail at setting the soft open file limit, because there are multiple places where one can configure such limits and usually only one of them is effective (which one depends on how Postgres is started). In cloud environments it's also often not possible for users to change the soft limit, because they don't control the way that Postgres is started.
One thing to note is that we temporarily restore the original soft limit when shelling out to other executables. This is done as a precaution in case those executables use select(2).

[1]: https://0pointer.net/blog/file-descriptor-limits.html
[2]: golang/go#46279
[3]: https://serverfault.com/questions/785330/getting-too-many-open-files-error-for-postgres
[4]: https://serverfault.com/questions/716982/how-to-raise-max-no-of-file-descriptors-for-daemons-running-on-debian-jessie
[5]: https://www.postgresql.org/message-id/flat/CAKtc8vXh7NvP_qWj8EqqorPY97bvxSaX3h5u7a9PptRFHW5x7g%40mail.gmail.com
[6]: https://www.postgresql.org/message-id/flat/113ce31b0908120955w77029099i7ececc053084095a%40mail.gmail.com
[7]: abiosoft/colima#836
[8]: https://www.postgresql.org/message-id/flat/29663.1007738957%40sss.pgh.pa.us#2079ec9e2d8b251593812a3711bfe9e9
1 parent df02bb9 commit f303596

File tree

1 file changed: +184 −15 lines
  • src/backend/storage/file


src/backend/storage/file/fd.c

+184-15
@@ -158,6 +158,13 @@ int max_files_per_process = 1000;
  */
 int		max_safe_fds = FD_MINFREE;	/* default if not changed */
 
+#ifdef HAVE_GETRLIMIT
+static bool saved_original_max_open_files;
+static struct rlimit original_max_open_files;
+static struct rlimit custom_max_open_files;
+#endif
+
+
 /* Whether it is safe to continue running after fsync() fails. */
 bool		data_sync_retry = false;
 
@@ -946,6 +953,152 @@ InitTemporaryFileAccess(void)
 #endif
 }
 
+/*
+ * Returns true if the passed-in highestfd is the last one that we're allowed
+ * to open based on our soft open file limit.
+ */
+static bool
+IsOpenFileLimit(int highestfd)
+{
+#ifdef HAVE_GETRLIMIT
+	if (!saved_original_max_open_files)
+	{
+		return false;
+	}
+
+	return highestfd >= custom_max_open_files.rlim_cur - 1;
+#else
+	return false;
+#endif
+}
+
+/*
+ * Increases the open file limit (RLIMIT_NOFILE) by the requested amount.
+ * Returns true if successful, false otherwise.
+ */
+static bool
+IncreaseOpenFileLimit(int extra_files)
+{
+#ifdef HAVE_GETRLIMIT
+	struct rlimit rlim;
+
+	if (!saved_original_max_open_files)
+	{
+		return false;
+	}
+
+	rlim = custom_max_open_files;
+
+	/* If we're already at the hard limit we can't increase further */
+	if (rlim.rlim_cur == original_max_open_files.rlim_max)
+		return false;
+
+	/* Otherwise try to increase the soft limit to what we need */
+	rlim.rlim_cur = Min(rlim.rlim_cur + extra_files, rlim.rlim_max);
+
+	if (setrlimit(RLIMIT_NOFILE, &rlim) != 0)
+	{
+		/* We made sure not to exceed the hard limit, so this shouldn't fail */
+		ereport(WARNING, (errmsg("setrlimit failed: %m")));
+		return false;
+	}
+
+	custom_max_open_files = rlim;
+
+	elog(LOG, "increased open file limit to %ld", (long) rlim.rlim_cur);
+
+	return true;
+#else
+	return false;
+#endif
+}
+
+/*
+ * Saves the original open file limit (RLIMIT_NOFILE) the first time this is
+ * called. If called again it's a no-op. Logs a warning if the current limit
+ * cannot be read.
+ */
+static void
+SaveOriginalOpenFileLimit(void)
+{
+#ifdef HAVE_GETRLIMIT
+	int			status;
+
+	if (saved_original_max_open_files)
+	{
+		/* Already saved, no need to do it again */
+		return;
+	}
+
+	status = getrlimit(RLIMIT_NOFILE, &original_max_open_files);
+	if (status != 0)
+	{
+		ereport(WARNING, (errmsg("getrlimit failed: %m")));
+		return;
+	}
+
+	custom_max_open_files = original_max_open_files;
+	saved_original_max_open_files = true;
+	return;
+#endif
+}
+
+/*
+ * UseOriginalOpenFileLimit --- Makes the process use the original open file
+ * limit that was present at postmaster start.
+ *
+ * This should be called before spawning subprocesses that might use select(2)
+ * which can only handle file descriptors up to 1024.
+ */
+static void
+UseOriginalOpenFileLimit(void)
+{
+#ifdef HAVE_GETRLIMIT
+	if (!saved_original_max_open_files)
+	{
+		return;
+	}
+
+	if (custom_max_open_files.rlim_cur == original_max_open_files.rlim_cur)
+	{
+		/* Not changed, so no need to call setrlimit at all */
+		return;
+	}
+
+	if (setrlimit(RLIMIT_NOFILE, &original_max_open_files) != 0)
+	{
+		ereport(WARNING, (errmsg("setrlimit failed: %m")));
+	}
+#endif
+}
+
+/*
+ * UseCustomOpenFileLimit --- Makes the process use the custom open file limit
+ * that we configured based on the max_files_per_process GUC.
+ */
+static void
+UseCustomOpenFileLimit(void)
+{
+#ifdef HAVE_GETRLIMIT
+	if (!saved_original_max_open_files)
+	{
+		return;
+	}
+
+	if (custom_max_open_files.rlim_cur == original_max_open_files.rlim_cur)
+	{
+		/* Not changed, so no need to call setrlimit at all */
+		return;
+	}
+
+	if (setrlimit(RLIMIT_NOFILE, &custom_max_open_files) != 0)
+	{
+		ereport(WARNING, (errmsg("setrlimit failed: %m")));
+	}
+#endif
+}
+
 /*
  * count_usable_fds --- count how many FDs the system will let us open,
  * and estimate how many are already open.
@@ -969,38 +1122,39 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 	int			highestfd = 0;
 	int			j;
 
-#ifdef HAVE_GETRLIMIT
-	struct rlimit rlim;
-	int			getrlimit_status;
-#endif
-
 	size = 1024;
 	fd = (int *) palloc(size * sizeof(int));
 
-#ifdef HAVE_GETRLIMIT
-	getrlimit_status = getrlimit(RLIMIT_NOFILE, &rlim);
-	if (getrlimit_status != 0)
-		ereport(WARNING, (errmsg("getrlimit failed: %m")));
-#endif							/* HAVE_GETRLIMIT */
+	SaveOriginalOpenFileLimit();
 
 	/* dup until failure or probe limit reached */
 	for (;;)
 	{
 		int			thisfd;
 
-#ifdef HAVE_GETRLIMIT
-
 		/*
 		 * don't go beyond RLIMIT_NOFILE; causes irritating kernel logs on
 		 * some platforms
 		 */
-		if (getrlimit_status == 0 && highestfd >= rlim.rlim_cur - 1)
-			break;
-#endif
+		if (IsOpenFileLimit(highestfd))
+		{
+			if (!IncreaseOpenFileLimit(max_to_probe - used))
+				break;
+		}
 
 		thisfd = dup(2);
 		if (thisfd < 0)
 		{
+			/*
+			 * Even though we do the pre-check above, it's still possible that
+			 * the call to dup fails with EMFILE. This can happen if the last
+			 * file descriptor was already assigned to an "already open" file.
+			 * One example of this happening is if we're already at the soft
+			 * limit when we call count_usable_fds.
+			 */
+			if (errno == EMFILE && IncreaseOpenFileLimit(max_to_probe - used))
+				continue;
+
 			/* Expect EMFILE or ENFILE, else it's fishy */
 			if (errno != EMFILE && errno != ENFILE)
 				elog(WARNING, "duplicating stderr file descriptor failed after %d successes: %m", used);
@@ -2750,6 +2904,7 @@ pg_system(const char *command, uint32 wait_event_info)
 {
 	int			rc;
 
+	UseOriginalOpenFileLimit();
 	fflush(NULL);
 	pgstat_report_wait_start(wait_event_info);
 
@@ -2772,6 +2927,7 @@ pg_system(const char *command, uint32 wait_event_info)
 	PostRestoreCommand();
 
 	pgstat_report_wait_end();
+	UseCustomOpenFileLimit();
 	return rc;
 }
 
@@ -2805,6 +2961,19 @@ OpenPipeStream(const char *command, const char *mode)
 	ReleaseLruFiles();
 
 TryAgain:
+
+	/*
+	 * It would be great if we could call UseOriginalOpenFileLimit here, but
+	 * since popen() also opens a file in the current process (this side of
+	 * the pipe) we cannot do so safely, because we might already have many
+	 * more files open than the original limit.
+	 *
+	 * The only way to address this would be implementing a custom popen()
+	 * that calls UseOriginalOpenFileLimit only around the actual fork call,
+	 * but that seems too much effort to handle the corner case where this
+	 * external command uses both select() and tries to open more files than
+	 * select() allows for.
+	 */
 	fflush(NULL);
 	pqsignal(SIGPIPE, SIG_DFL);
 	errno = 0;
