WEBVTT 00:00:00.000 --> 00:00:03.240 - Hello YouTube, hello Michael. 00:00:03.240 --> 00:00:05.040 - Hello Michael. 00:00:05.040 --> 00:00:08.280 - It's great to have you here. 00:00:08.280 --> 00:00:10.640 We're gonna talk databases, it's gonna be so fun. 00:00:10.640 --> 00:00:12.080 - We are, thank you for having me, 00:00:12.080 --> 00:00:13.440 I'm looking forward to it. 00:00:13.440 --> 00:00:14.280 - Absolutely. 00:00:14.280 --> 00:00:15.800 All right, folks out there in the audience, 00:00:15.800 --> 00:00:18.740 feel free to ask questions if you're watching live. 00:00:18.740 --> 00:00:19.720 Thanks for watching the replay 00:00:19.720 --> 00:00:21.440 if you're watching afterwards. 00:00:21.440 --> 00:00:23.480 And with that, let's kick it off. 00:00:23.480 --> 00:00:25.380 Michael, welcome to Talk Python to Me. 00:00:25.380 --> 00:00:28.640 - Thank you, thank you for having me. 00:00:28.640 --> 00:00:29.960 It's a pleasure. 00:00:29.960 --> 00:00:31.200 - Yeah, it's a pleasure to have you. 00:00:31.200 --> 00:00:35.080 I'm honestly pretty psyched to talk about database stuff. 00:00:35.080 --> 00:00:39.240 I feel like databases are these magical centerpieces 00:00:39.240 --> 00:00:41.400 of so many applications, 00:00:41.400 --> 00:00:46.400 and yet, yet they are so unoptimized in many situations 00:00:46.400 --> 00:00:49.600 that if you could just sprinkle a few of these ideas 00:00:49.600 --> 00:00:51.400 that we're gonna talk about on them, 00:00:51.400 --> 00:00:54.640 they'll make the entire app and the entire experience 00:00:54.640 --> 00:00:56.440 for everyone involved better, right? 00:00:56.440 --> 00:00:58.640 Easier for the developers to have code that goes fast 00:00:58.640 --> 00:00:59.800 if their queries are fast, 00:00:59.800 --> 00:01:03.640 customers will be happy that it's not like, wait, did I click the button? Oh, yeah, it's still 00:01:03.640 --> 00:01:07.560 spinning. That's right. You know, like all of these experiences, just like every time I have them, 00:01:07.560 --> 00:01:14.040 I'm just like, remain calm. Somebody didn't put an index somewhere. There should have been one. And 00:01:14.040 --> 00:01:18.440 if this thing will probably work eventually, I don't know if you have those feelings as you go 00:01:18.440 --> 00:01:25.800 around the web. Yes, 100%. I might have taken it a bit far in trying to fix them. But yes, 00:01:25.800 --> 00:01:30.640 I'm going to send a support request and here's the explain document that I've come up with 00:01:30.640 --> 00:01:31.640 for you. 00:01:31.640 --> 00:01:32.640 Fantastic. 00:01:32.640 --> 00:01:33.640 All right. 00:01:33.640 --> 00:01:39.600 Well, we're going to dive into specifically Postgres, but we were chatting a bit before 00:01:39.600 --> 00:01:41.440 we hit record. 00:01:41.440 --> 00:01:46.760 And this much of this guidance expands beyond Postgres, right? 00:01:46.760 --> 00:01:48.680 Even beyond relational potentially. 00:01:48.680 --> 00:01:50.920 Yeah, absolutely. 00:01:50.920 --> 00:01:56.400 And going back to something you said just briefly then, the beauty of databases is they've 00:01:56.400 --> 00:02:01.180 worried so much about performance and you can get up and running, you can get a lot 00:02:01.180 --> 00:02:05.680 of performance out of them without doing even the basics right sometimes. 00:02:05.680 --> 00:02:09.040 But at a certain scale that starts to fall down. 00:02:09.040 --> 00:02:14.600 So yeah, once you're past the first few stages, quite often the performance problems you're 00:02:14.600 --> 00:02:16.560 hitting tend to be database related. 00:02:16.560 --> 00:02:19.000 So yeah, that's what I'm excited about. 00:02:19.000 --> 00:02:25.800 But it's not, yes, most of my recent experiences with Postgres, but the performance is mostly 00:02:25.800 --> 00:02:31.900 about trying to work out how to do less work or how to not do the work in the first place. 00:02:31.900 --> 00:02:37.080 And that's the same whether you're, whichever database you're working with, or yeah, data 00:02:37.080 --> 00:02:38.440 in general, basically. 00:02:38.440 --> 00:02:41.680 Yeah, databases seem magical. 00:02:41.680 --> 00:02:45.240 They take all this data and they come up with answers incredibly quickly, but they're not 00:02:45.240 --> 00:02:46.240 magical. 00:02:46.240 --> 00:02:48.660 and then they know how to sort of estimate 00:02:48.660 --> 00:02:50.920 and plan those together and take guesses. 00:02:50.920 --> 00:02:55.920 And our conversation today will be how to understand 00:02:55.920 --> 00:02:58.200 those different things that databases can do, 00:02:58.200 --> 00:03:00.080 how we can interrogate them 00:03:00.080 --> 00:03:01.480 to tell us what they're trying to do 00:03:01.480 --> 00:03:03.560 and then what changes we can make to make them faster 00:03:03.560 --> 00:03:05.360 and some tools along the way. 00:03:05.360 --> 00:03:09.160 But before we do that, how'd you get into databases 00:03:09.160 --> 00:03:13.000 and being associated with databases? 00:03:13.000 --> 00:03:15.200 - Yeah, so I'm gonna start with the database. 00:03:15.200 --> 00:03:19.640 being associated with PG Mustard and all the stuff you're doing? 00:03:19.640 --> 00:03:20.640 Yes. 00:03:20.640 --> 00:03:26.760 So I, it goes back in university, I studied mathematics. 00:03:26.760 --> 00:03:30.760 So that involved a tiny bit of programming, but really not much, just a little bit of 00:03:30.760 --> 00:03:34.400 statistics that I quickly dropped. 00:03:34.400 --> 00:03:36.600 Tiny bit of mechanics that I dropped as well. 00:03:36.600 --> 00:03:37.600 Very pure. 00:03:37.600 --> 00:03:40.320 There must be something about math in the air. 00:03:40.320 --> 00:03:44.120 The last three episodes I've done have all been like people with PhDs in math and doing 00:03:44.120 --> 00:03:48.200 all sorts of interesting stuff. So and I have a math background as well. But yeah. 00:03:48.200 --> 00:03:49.480 Amazing. 00:03:49.480 --> 00:03:52.280 Yeah, it is. So it started in math. Yeah. 00:03:52.280 --> 00:03:59.800 Yeah, absolutely. No PhD here. But I then went to a... Well, I thought I wanted to be a management 00:03:59.800 --> 00:04:05.320 consultant for my sins back in the day. And then the financial crisis happened. So luckily, 00:04:05.320 --> 00:04:10.840 that got scuppered. And I ended up finding a job at a company called Red Gate, which 00:04:11.720 --> 00:04:13.240 I understand you're actually familiar with. 00:04:13.240 --> 00:04:20.200 So they did database tools, and were predominantly for SQL Server, but also developer tools for 00:04:20.200 --> 00:04:24.240 .NET as you've seen, and then expanded out from there. 00:04:24.240 --> 00:04:30.520 So I started at a time where their most popular product was called SQL Compare for comparing 00:04:30.520 --> 00:04:32.080 databases. 00:04:32.080 --> 00:04:35.520 And just a few months into when I was there, they launched a product called SQL Source 00:04:35.520 --> 00:04:36.520 Control. 00:04:36.520 --> 00:04:40.120 So it got into kind of the change management side of things. 00:04:40.120 --> 00:04:46.120 And luckily got so many opportunities there to do various roles and landed mostly in product 00:04:46.120 --> 00:04:47.120 management. 00:04:47.120 --> 00:04:51.040 Luckily, got a chance to take on their Oracle tools team. 00:04:51.040 --> 00:04:53.320 So it's a tiny team compared to the rest of the company. 00:04:53.320 --> 00:04:57.160 And it was the first foray into their tools for Oracle. 00:04:57.160 --> 00:05:00.600 We also span off a couple of tools for MySQL as well. 00:05:00.600 --> 00:05:06.960 And in working in that small team, we started to see, I wouldn't say the rise of Postgres 00:05:06.960 --> 00:05:12.640 because it's been going for so long, but the kind of the resurgence and the slow and steady 00:05:12.640 --> 00:05:18.080 improvements from that. So yeah, that's that was my kind of first five years of getting 00:05:18.080 --> 00:05:24.240 to know databases. And then what a cool place to do it, right? I mean, there are some very 00:05:24.240 --> 00:05:32.640 hot database companies, you know, Postgres, the MongoDB folks, obviously, Oracle, Microsoft, 00:05:32.640 --> 00:05:36.300 where they make them, but then there's only a few places 00:05:36.300 --> 00:05:38.800 where they really say, and what was missing? 00:05:38.800 --> 00:05:40.260 Let's just fix all of that, right? 00:05:40.260 --> 00:05:41.740 Let's make the developer experience 00:05:41.740 --> 00:05:43.180 really right on databases. 00:05:43.180 --> 00:05:46.740 And I feel like Redgate was very much like that, 00:05:46.740 --> 00:05:49.180 like SQL Compare and the source control stuff. 00:05:49.180 --> 00:05:51.080 That was all very neat. 00:05:51.080 --> 00:05:51.920 - Yeah, absolutely. - Probably still is. 00:05:51.920 --> 00:05:52.740 I haven't touched it for a while, 00:05:52.740 --> 00:05:55.060 but I don't mean to speak of it totally in the past. 00:05:55.060 --> 00:05:57.620 That's my experiences that was in the past. 00:05:57.620 --> 00:05:59.660 - Yeah, and they're doing some cool things now. 00:05:59.660 --> 00:06:02.660 They're doing some of their products across database now. 00:06:02.660 --> 00:06:04.140 They're doing some Postgres things. 00:06:04.140 --> 00:06:06.900 They acquired FlywayDB, which a lot of people use. 00:06:06.900 --> 00:06:12.500 But yes, the thing that first attracted me to them 00:06:12.500 --> 00:06:14.740 was their slogan, believe it or not. 00:06:14.740 --> 00:06:17.120 It was ingeniously simple software, 00:06:17.120 --> 00:06:20.860 which is something about, 00:06:20.860 --> 00:06:22.340 I guess it's the math background as well, 00:06:22.340 --> 00:06:24.060 about getting to a simple solution. 00:06:24.060 --> 00:06:26.940 And we've talked about databases sometimes feeling magic. 00:06:26.940 --> 00:06:29.700 But even if you realize they're not magic, 00:06:29.700 --> 00:06:33.020 they're still complicated or there's certain complexity 00:06:33.020 --> 00:06:34.300 to dealing with them. 00:06:34.300 --> 00:06:36.620 And if you can make that easier for folks, 00:06:36.620 --> 00:06:38.980 or at least help them with better defaults, 00:06:38.980 --> 00:06:42.580 like people designing databases are exceptionally smart 00:06:42.580 --> 00:06:44.580 and they pick the defaults for a reason. 00:06:44.580 --> 00:06:46.700 But in the real world, 00:06:46.700 --> 00:06:50.860 sometimes you can afford to be slightly less conservative 00:06:50.860 --> 00:06:54.660 or choose a certain perspective on something 00:06:54.660 --> 00:06:56.740 and make a tool that's only for a certain subset 00:06:56.740 --> 00:07:02.260 of the users. So you can be a bit more opinionated as a tools vendor versus a platform provider. 00:07:02.260 --> 00:07:07.940 And I like that kind of space where you can differentiate a little bit more on usability 00:07:07.940 --> 00:07:14.260 and performance, maybe a little bit less on the absolute fundamentals. 00:07:14.260 --> 00:07:20.180 Sure. Well, there's so many different situations in which databases are used, right? Are you 00:07:20.180 --> 00:07:25.780 Google? Are you talk Python? Or are you a single little app that one person is using? 00:07:25.780 --> 00:07:32.780 Like these are all very unrecognizable to each other in significant ways. 00:07:32.780 --> 00:07:38.880 Yeah absolutely and the mad thing is they can all run off the same type of database. 00:07:38.880 --> 00:07:39.880 So you could run. 00:07:39.880 --> 00:07:40.880 Yeah that is mad. 00:07:40.880 --> 00:07:41.880 Yeah. 00:07:41.880 --> 00:07:46.380 Yeah that's a really good point that you can run off. 00:07:46.380 --> 00:07:52.020 Basically everyone installs the same database, does some sort of schema creation thing and 00:07:52.020 --> 00:07:54.500 goes from there. 00:07:54.500 --> 00:08:01.060 Absolutely. Or even Mongo, right? Run well. I think that's the key thing. Run well, probably 00:08:01.060 --> 00:08:08.260 most I mean, Facebook still runs off, albeit an admittedly highly modified, very impressive 00:08:08.260 --> 00:08:15.380 MySQL setup. And I think just today we saw a couple, well, maybe in the last two days, 00:08:15.380 --> 00:08:23.060 we've seen a couple of really impressive companies launching SQLite on the edge, you know, so there's 00:08:23.940 --> 00:08:31.420 Really, pretty much any database that's made it to this point can probably support you 00:08:31.420 --> 00:08:35.420 at any scale if you know enough about it or if you want it well. 00:08:35.420 --> 00:08:37.420 Yeah, for sure. 00:08:37.420 --> 00:08:38.420 How interesting. 00:08:38.420 --> 00:08:45.460 All right, now before we get into optimizing these, first let's get the order right. 00:08:45.460 --> 00:08:47.780 Understanding the performance and then optimizing, right? 00:08:47.780 --> 00:08:52.220 Not take a shot at optimizing and then we'll figure out if we got the right spot. 00:08:52.220 --> 00:08:55.980 like profiling and performance optimization of code. 00:08:55.980 --> 00:08:58.660 Let's just talk for a minute about 00:08:58.660 --> 00:09:04.260 of we were no matter what, 00:09:04.260 --> 00:09:06.880 GUI tools for managing databases. 00:09:06.880 --> 00:09:09.820 Because I saw that you were using 00:09:09.820 --> 00:09:11.380 in some presentations you're doing, 00:09:11.380 --> 00:09:16.380 you're using DBeaver, which is pretty interesting. 00:09:16.380 --> 00:09:18.460 And I feel like a lot of people 00:09:18.460 --> 00:09:20.120 don't end up using these tools. 00:09:20.120 --> 00:09:21.700 So maybe you could highlight a couple 00:09:21.700 --> 00:09:24.900 that I think sort of stand out for just interesting database stuff. 00:09:24.900 --> 00:09:29.980 I mean, coming from a tool vendor, of course. 00:09:29.980 --> 00:09:32.980 Yeah, no, go ahead. 00:09:32.980 --> 00:09:33.980 - I do apologize. 00:09:33.980 --> 00:09:36.420 I got a tiny bit of lag there. 00:09:36.420 --> 00:09:44.340 But yes, so if you can hear me clearly, then I do like dbeaver for doing demos. 00:09:44.340 --> 00:09:49.900 It's very easy to run multiple queries in a row that you don't have to type out or copy 00:09:49.900 --> 00:09:52.140 paste around. 00:09:52.140 --> 00:09:59.780 It's also exceptional for post GIS, which is for geographic data. 00:09:59.780 --> 00:10:05.140 I know a few people are really keen on its visualizations for that. 00:10:05.140 --> 00:10:11.300 I tend to not use it day to day anymore, but only because there's so many tools out there 00:10:11.300 --> 00:10:13.700 that you can use. 00:10:13.700 --> 00:10:19.140 I think you've listed a few, but a lot of people are very keen on the default one that 00:10:19.140 --> 00:10:21.180 that ships with Postgres called PSQL, 00:10:21.180 --> 00:10:23.060 which is a command line interface. 00:10:23.060 --> 00:10:24.660 Very, very powerful. 00:10:24.660 --> 00:10:27.500 Can do a lot of things that other tools can't, 00:10:27.500 --> 00:10:29.900 but has quite a steep learning curve in my opinion. 00:10:29.900 --> 00:10:32.900 There's some people doing really good tutorials on, 00:10:32.900 --> 00:10:39.340 I think Leticia is doing some really good conference talks 00:10:39.340 --> 00:10:41.620 at the moment on this and has done them in the past 00:10:41.620 --> 00:10:44.660 and has a website around PSQL tips. 00:10:44.660 --> 00:10:48.420 So yeah, there's loads of good gurus out there. 00:10:48.420 --> 00:10:52.060 My, at the moment I'm using a really modern one 00:10:52.060 --> 00:10:56.060 called ArchType, only recently launched, 00:10:56.060 --> 00:11:01.740 but I tend to try out tools probably earlier than most would. 00:11:01.740 --> 00:11:03.300 - But wait, this looks nice. 00:11:03.300 --> 00:11:04.140 - Right? 00:11:04.140 --> 00:11:05.300 (laughing) 00:11:05.300 --> 00:11:06.460 - It's real nice. 00:11:06.460 --> 00:11:10.460 ArchType, A-R-C-T-Y-P-E.com. 00:11:10.460 --> 00:11:11.580 - You got it. 00:11:11.580 --> 00:11:12.820 - Yeah, right on. 00:11:12.820 --> 00:11:14.500 This is cool. 00:11:14.500 --> 00:11:17.100 Okay, and this works on the different databases 00:11:17.100 --> 00:11:18.100 that you might. 00:11:18.100 --> 00:11:23.360 So they stripped back down to a couple, maybe a year or two ago. 00:11:23.360 --> 00:11:29.860 So it was Postgres and MySQL, I believe, but I think they're adding some back in. 00:11:29.860 --> 00:11:31.480 Not 100% sure though, sorry. 00:11:31.480 --> 00:11:34.040 I'm Postgres through and through. 00:11:34.040 --> 00:11:36.300 Yeah, it solves the problem. 00:11:36.300 --> 00:11:38.240 What else do we need to worry about, right? 00:11:38.240 --> 00:11:39.240 Absolutely. 00:11:39.240 --> 00:11:42.520 And if we're talking specifics, actually, because I'm on Postgres on a Mac, I think 00:11:42.520 --> 00:11:48.800 probably the best long-lived option and one that's been around for years, I would say 00:11:48.800 --> 00:11:56.180 is Postico, which is a Mac-only app. But, yes, it's lightning fast, which I -- so Postico 00:11:56.180 --> 00:11:59.180 with an O at the end. 00:11:59.180 --> 00:12:07.620 >> Cool. Okay. Oh, Mac-only. Okay. Yeah, that's pretty nice. But this other one, this 00:12:07.620 --> 00:12:10.820 archetype looks really nice. 00:12:10.820 --> 00:12:12.380 that looks on everything, right? 00:12:12.380 --> 00:12:14.340 - No, no, of course, you're absolutely right, they're not, 00:12:14.340 --> 00:12:16.060 but they need to work well. 00:12:16.060 --> 00:12:18.500 So the two others that I was gonna throw out there 00:12:18.500 --> 00:12:22.500 that are kind of nice, this one clearly is not for Postgres 00:12:22.500 --> 00:12:25.620 because it's DB browser for SQLite. 00:12:25.620 --> 00:12:28.700 But at the same time, I feel like a lot of people 00:12:28.700 --> 00:12:32.180 are interested in databases, generally not just Postgres, 00:12:32.180 --> 00:12:34.220 right, you can come across SQLite all over the place. 00:12:34.220 --> 00:12:37.140 So I think this one is nice, this one is free. 00:12:37.140 --> 00:12:40.740 And then have you done anything with Bkeeper? 00:12:40.740 --> 00:12:42.500 No, I have seen it around. 00:12:42.500 --> 00:12:45.100 - Yeah, Beekeeper's pretty nice. 00:12:45.100 --> 00:12:47.180 It gives you like auto-complete for your queries 00:12:47.180 --> 00:12:50.120 and ways to explore your schemas. 00:12:50.120 --> 00:12:51.500 But what it seems to not have really 00:12:51.500 --> 00:12:56.500 is any sort of exploration of the schema visually, right? 00:12:56.500 --> 00:13:01.260 There's not like a UML looking diagram type thing, so. 00:13:01.260 --> 00:13:03.540 - Yeah, one last one before we, 00:13:03.540 --> 00:13:06.180 I think a company that are doing great things 00:13:06.180 --> 00:13:08.740 in ID space in general, JetBrains, 00:13:08.740 --> 00:13:11.900 and their DataGrip product is very good as well. 00:13:11.900 --> 00:13:15.580 And very, very cross-platform in terms of different support 00:13:15.580 --> 00:13:16.780 for different databases. 00:13:16.780 --> 00:13:18.900 - Yeah, absolutely. 00:13:18.900 --> 00:13:22.580 I think, yeah, DataGrip is really great, right? 00:13:22.580 --> 00:13:25.820 Like if you use PyCharm, for example, 00:13:25.820 --> 00:13:29.540 and when you're doing the database features of PyCharm, 00:13:29.540 --> 00:13:31.100 it really is just DataGrip, right? 00:13:31.100 --> 00:13:33.860 If you just want the standalone edition. 00:13:33.860 --> 00:13:35.420 So super cool. 00:13:35.420 --> 00:13:36.700 Yeah, these are all really nice. 00:13:36.700 --> 00:13:39.580 and I discovered a few myself as well from this. 00:13:39.580 --> 00:13:42.260 So, awesome. 00:13:42.260 --> 00:13:44.820 I just think it's kind of important 00:13:44.820 --> 00:13:46.420 for people to have these tools 00:13:46.420 --> 00:13:50.500 and maybe you end up just on CLI tools like PSQL, 00:13:50.500 --> 00:13:52.100 but when you're starting, 00:13:52.100 --> 00:13:53.660 I think that that can be the difference of, 00:13:53.660 --> 00:13:55.820 I can do this or I can't do this. 00:13:55.820 --> 00:13:56.660 You know what I mean? 00:13:56.660 --> 00:13:58.580 Rather than efficiency. 00:13:58.580 --> 00:13:59.620 People might just look at it and go, 00:13:59.620 --> 00:14:01.780 that's too much for me, I'm out. 00:14:01.780 --> 00:14:03.900 - Yeah, well, it's also about discovery. 00:14:03.900 --> 00:14:06.060 I think sometimes you don't know what's there 00:14:06.060 --> 00:14:08.940 on what's possible and in a command line tool, 00:14:08.940 --> 00:14:12.140 it's really on you to figure that out. 00:14:12.140 --> 00:14:15.220 Whereas in a GUI, they can leave you hints, 00:14:15.220 --> 00:14:17.020 they can show you the tree of things. 00:14:17.020 --> 00:14:20.380 You can start to realize what is and isn't possible. 00:14:20.380 --> 00:14:24.740 And they've got a lot more freedom to educate you 00:14:24.740 --> 00:14:27.900 as a new user, whereas in a command line tool, 00:14:27.900 --> 00:14:30.200 there's very little they can do on that front. 00:14:30.200 --> 00:14:35.260 So yeah, not all of the GUIs take advantage of that, 00:14:35.260 --> 00:14:37.780 but the good ones. - The possibility is there. 00:14:37.780 --> 00:14:39.220 Yeah, the good ones. 00:14:39.220 --> 00:14:40.780 All right, I was talking about schema 00:14:40.780 --> 00:14:42.300 and I'd like to hear your thoughts on this. 00:14:42.300 --> 00:14:46.100 I'm sure you're more well versed in it than I am. 00:14:46.100 --> 00:14:51.100 When I think about schemas, I think about the tables, 00:14:51.100 --> 00:14:55.540 the maybe the columns, the types of columns that are there. 00:14:55.540 --> 00:14:57.380 So this one is a Varchar 16, 00:14:57.380 --> 00:14:58.980 this one is a float or whatever. 00:14:58.980 --> 00:15:02.980 Would you consider indexes part of the schema? 00:15:04.480 --> 00:15:05.740 - I would, yes. 00:15:05.740 --> 00:15:07.480 - Yeah, okay. 00:15:07.480 --> 00:15:13.580 - I think, yeah, I guess it depends on, 00:15:13.580 --> 00:15:16.200 definitions are hard. 00:15:16.200 --> 00:15:18.000 Two mathematicians discussing definitions, 00:15:18.000 --> 00:15:19.100 how long have you got? 00:15:19.100 --> 00:15:24.620 But I guess when I'm thinking about schema changes 00:15:24.620 --> 00:15:27.080 and deployment scripts and things like that, 00:15:27.080 --> 00:15:29.920 I think it's just as important 00:15:29.920 --> 00:15:31.320 for the functioning of your app. 00:15:31.320 --> 00:15:33.480 An index existing or not 00:15:33.480 --> 00:15:36.520 can be the difference between you effectively being up 00:15:36.520 --> 00:15:37.840 or being down. 00:15:37.840 --> 00:15:43.600 And if it's that important, then I'd probably say it does. 00:15:43.600 --> 00:15:46.880 It is integral. 00:15:46.880 --> 00:15:47.760 >> For sure. 00:15:47.760 --> 00:15:50.000 Quick audience question from William. 00:15:50.000 --> 00:15:52.040 What's the most powerful utility you personally 00:15:52.040 --> 00:15:54.960 got out of PSQL? 00:15:54.960 --> 00:15:56.400 >> Good question. 00:15:56.400 --> 00:15:57.920 >> What's the most powerful thing 00:15:57.920 --> 00:15:59.520 you find yourself doing with PSQL, 00:15:59.520 --> 00:16:03.400 maybe is a good way to phrase it. 00:16:03.400 --> 00:16:17.900 So, yeah, I actually don't, it feels like I'm probably not the best person to ask. 00:16:17.900 --> 00:16:20.140 I tend to use it for really quick things. 00:16:20.140 --> 00:16:25.960 So it's more of like, it's by far the easiest and quickest to throw a quick query into and 00:16:25.960 --> 00:16:28.280 get a really quick result back out of. 00:16:28.280 --> 00:16:31.360 There's zero load time or anything like that. 00:16:31.360 --> 00:16:32.360 Right. 00:16:32.360 --> 00:16:35.360 And you get text output straight back. 00:16:35.360 --> 00:16:39.360 You don't have to worry about any formatting or anything. 00:16:39.360 --> 00:16:45.360 Right. Open your terminal, type psql, some query, you see the answers right there. 00:16:45.360 --> 00:16:52.360 Yeah. But that's not like, there's probably people ripping their hair out and screaming at the screen saying, 00:16:52.360 --> 00:16:54.360 "Why didn't you mention this amazing feature of psql?" 00:16:54.360 --> 00:17:00.360 But the truth is I'm not a heavy database user in the grand scheme of things. 00:17:00.360 --> 00:17:02.960 You know, people doing DBA work. 00:17:02.960 --> 00:17:09.440 I make and help design and run a tools company, not a, like we don't have a massive database. 00:17:09.440 --> 00:17:13.240 I don't do tons of analytical queries day to day. 00:17:13.240 --> 00:17:18.280 So those, yeah, it's the DBAs of this world and the people that have got a lot of hands-on 00:17:18.280 --> 00:17:22.720 database experience probably do a lot more of that. 00:17:22.720 --> 00:17:29.400 But I would, in terms of recommendations, I think Leticia Avereux does a, has a website 00:17:29.400 --> 00:17:35.720 called psql tips and I think if you go to it, it just loads a random tip of about 100 00:17:35.720 --> 00:17:39.840 and something that she keeps adding to and I think that's a great, like that's a fun 00:17:39.840 --> 00:17:44.920 way to get some discovery or learn some features in a tool. 00:17:44.920 --> 00:17:45.920 Yeah, absolutely. 00:17:45.920 --> 00:17:50.880 All right, you spoke about the being in a tools place. 00:17:50.880 --> 00:17:53.640 You want to just give a quick shout out to PG Mustard and we'll come back and talk about 00:17:53.640 --> 00:17:54.640 it more later? 00:17:54.640 --> 00:17:55.640 Yeah, sure. 00:17:55.640 --> 00:17:57.560 That's sort of the second question I was gonna ask you, 00:17:57.560 --> 00:18:00.980 or not making a speedy progress, which is great. 00:18:00.980 --> 00:18:03.400 But, so you were at Redgate, 00:18:03.400 --> 00:18:06.400 and now you've gone on to start your own company, 00:18:06.400 --> 00:18:10.080 co-found your own company around database tools. 00:18:10.080 --> 00:18:11.000 - Yeah, absolutely. 00:18:11.000 --> 00:18:12.920 So I worked at a couple of companies in between, 00:18:12.920 --> 00:18:15.880 one of which was a big fan of Postgres. 00:18:15.880 --> 00:18:17.960 So there was a company called GoCardless, 00:18:17.960 --> 00:18:20.320 and the team there were exceptional, 00:18:20.320 --> 00:18:21.800 loved my time there, 00:18:21.800 --> 00:18:24.600 and they ran quite a big payments company 00:18:24.600 --> 00:18:27.960 by the time I left on a single Postgres database. 00:18:27.960 --> 00:18:30.680 And it was, yeah, the team were big fans 00:18:30.680 --> 00:18:32.180 and completely converted me. 00:18:32.180 --> 00:18:35.240 I was already keen on Postgres or the idea of it 00:18:35.240 --> 00:18:37.920 from Redgate time and hearing what customers 00:18:37.920 --> 00:18:39.640 were saying about it. 00:18:39.640 --> 00:18:41.440 So I was pretty sold on the idea. 00:18:41.440 --> 00:18:44.280 I loved the platform, I loved how extensive it was, 00:18:44.280 --> 00:18:46.200 I loved that you could see the source code, 00:18:46.200 --> 00:18:48.920 you could even contribute back to it. 00:18:48.920 --> 00:18:51.000 So I was a big fan of the platform. 00:18:51.000 --> 00:18:58.120 But my first ideas for a business were pretty lukewarm, let's say. 00:18:58.120 --> 00:19:00.920 So there's a regular product I love called P-SQL. 00:19:00.920 --> 00:19:05.120 So you mentioned autocomplete in, I think it was Beekeeper. 00:19:05.120 --> 00:19:10.600 They had an autocomplete tool in SSMS, which is SQL Server Management Studio, 00:19:10.600 --> 00:19:15.840 which people loved and it's an everyday use tool and you could easily see yourself 00:19:15.840 --> 00:19:17.600 making it better and better for years. 00:19:17.600 --> 00:19:19.560 So that was my kind of ideal tool. 00:19:20.200 --> 00:19:24.140 But building it would have entailed either integrating 00:19:24.140 --> 00:19:28.580 into the 25 IDs that we've looked at so far, 00:19:28.580 --> 00:19:32.220 or picking one and really maybe not being that useful. 00:19:32.220 --> 00:19:34.380 So ended up trying to build something 00:19:34.380 --> 00:19:36.020 a little bit more standalone. 00:19:36.020 --> 00:19:41.020 And there's a few really good open source query plan 00:19:41.020 --> 00:19:46.380 visualization tools, and they do a lot of work 00:19:46.380 --> 00:19:49.560 to help you see where the time's going in a query 00:19:49.560 --> 00:19:51.560 and give you some basic kind of guidance 00:19:51.560 --> 00:19:56.560 and maybe not basic, maybe super DBA friendly guidance. 00:19:56.560 --> 00:20:01.760 So it's often built by DBAs for DBAs type tools. 00:20:01.760 --> 00:20:04.720 - Advanced but maybe basic in his presentation. 00:20:04.720 --> 00:20:05.560 Something like that, right? 00:20:05.560 --> 00:20:07.400 It's just like, here's a wall of text. 00:20:07.400 --> 00:20:10.980 It speaks your language, go figure it out. 00:20:10.980 --> 00:20:11.960 - Yeah, exactly. 00:20:11.960 --> 00:20:15.280 And I quite often saw in different places, 00:20:15.280 --> 00:20:18.200 people would post links to them and say, 00:20:18.200 --> 00:20:19.360 I've put it into this tool, 00:20:19.360 --> 00:20:21.200 but what can I do to speed it up? 00:20:21.200 --> 00:20:24.200 So there was clearly still a missing step 00:20:24.200 --> 00:20:26.160 of what can we do about it? 00:20:26.160 --> 00:20:27.520 Okay, here's where the time's going, 00:20:27.520 --> 00:20:30.000 but what can I actually do about that? 00:20:30.000 --> 00:20:33.440 So tried to build something that would help solve that, 00:20:33.440 --> 00:20:36.400 solve that problem, and this is what we came up with. 00:20:36.400 --> 00:20:38.880 - Yeah, it's really cool. 00:20:38.880 --> 00:20:43.880 And the way it works is you give it explain, 00:20:43.880 --> 00:20:48.920 you have Postgres explain your query plan to it, to you, 00:20:48.920 --> 00:20:53.920 and then you give that query plan to bgmustard 00:20:53.920 --> 00:20:56.560 and it puts it into a graphical form 00:20:56.560 --> 00:20:59.920 and into a more digestible form. 00:20:59.920 --> 00:21:00.880 And it's interesting. 00:21:00.880 --> 00:21:03.080 But what's really interesting is it gives you advice. 00:21:03.080 --> 00:21:04.840 It says, you know what? 00:21:04.840 --> 00:21:07.540 This sort is happening without an index 00:21:07.540 --> 00:21:09.140 and that's where you're spending most of your time. 00:21:09.140 --> 00:21:11.240 Here's your big opportunity for a win. 00:21:11.240 --> 00:21:14.400 We'll come back to this, like I said, 00:21:14.400 --> 00:21:17.400 after we talk about finding these problems. 00:21:17.400 --> 00:21:18.320 But I just wanna point out, 00:21:18.320 --> 00:21:21.600 I had, we'd covered this on Talk Python. 00:21:21.600 --> 00:21:23.740 One of our guests mentioned it. 00:21:23.740 --> 00:21:26.440 I think it was Henik. 00:21:26.440 --> 00:21:29.680 And then we also talked about it on Python Bytes. 00:21:29.680 --> 00:21:31.400 My other show is just one of the topics. 00:21:31.400 --> 00:21:35.120 And I don't remember which place we talked about it. 00:21:35.120 --> 00:21:36.800 But somebody made the joke of, 00:21:36.800 --> 00:21:38.720 oh, this is my new consulting job. 00:21:38.720 --> 00:21:42.920 What I'm gonna do is I'm gonna buy a license to PG Mustard 00:21:42.920 --> 00:21:45.240 and I'm gonna go around and optimize people's databases 00:21:45.240 --> 00:21:46.720 by dropping the query plans in here 00:21:46.720 --> 00:21:48.160 and just taking its advice. 00:21:48.160 --> 00:21:52.440 which I thought was hilarious, but also plausible. 00:21:52.440 --> 00:21:55.640 - We have some consultants that use us. 00:21:55.640 --> 00:21:57.440 - I can imagine. 00:21:57.440 --> 00:22:01.240 I mean, it's really good visualizations 00:22:01.240 --> 00:22:05.360 and interesting, often useful advice that it gives. 00:22:05.360 --> 00:22:08.420 So, all right, that is all super cool, 00:22:08.420 --> 00:22:11.960 but let's go back more broadly to Postgres, 00:22:11.960 --> 00:22:14.280 but then maybe even, like I said, more general. 00:22:14.280 --> 00:22:16.280 I feel like a lot of this applies to 00:22:16.280 --> 00:22:20.280 MySQL, Microsoft SQL Server, and even Mongo. 00:22:20.280 --> 00:22:23.280 So let's just start out about 00:22:23.280 --> 00:22:27.280 maybe you're the unfortunate person who is running one of these websites 00:22:27.280 --> 00:22:31.280 that makes me think about the optimizations that were missed 00:22:31.280 --> 00:22:34.280 as I wait for the spinning thing to load. 00:22:34.280 --> 00:22:37.280 And it's your job to figure out what is wrong. 00:22:37.280 --> 00:22:42.280 So maybe we could talk about some of the techniques for finding 00:22:42.280 --> 00:22:44.280 problems with your database. 00:22:44.280 --> 00:22:46.280 Let's just start there. 00:22:46.280 --> 00:22:48.280 My database is slow. Help. 00:22:48.280 --> 00:22:50.280 Perfect. So 00:22:50.280 --> 00:22:52.280 a lot of databases 00:22:52.280 --> 00:22:54.280 will have some version 00:22:54.280 --> 00:22:56.280 of a slow query log 00:22:56.280 --> 00:22:58.280 or a built-in 00:22:58.280 --> 00:23:00.280 monitoring view 00:23:00.280 --> 00:23:02.280 or you'll have a 00:23:02.280 --> 00:23:04.280 monitoring tool set up 00:23:04.280 --> 00:23:06.280 yourself to 00:23:06.280 --> 00:23:08.280 log and aggregate this kind of thing. 00:23:08.280 --> 00:23:10.280 And 00:23:10.280 --> 00:23:15.480 Normally the starting point, so I guess there's a few starting points. 00:23:15.480 --> 00:23:20.640 Either customer reports, customer or team member or somebody reports something's slow 00:23:20.640 --> 00:23:23.800 or your database is on fire. 00:23:23.800 --> 00:23:27.200 It's you know maxed out on some something. 00:23:27.200 --> 00:23:30.800 - CPU memory disk, some combination thereof. 00:23:30.800 --> 00:23:34.960 - Yeah exactly or in the cloud maybe you've run out of your provision. 00:23:34.960 --> 00:23:39.960 Yeah anyway so there's that. 00:23:39.960 --> 00:23:42.720 But yeah, let's take the I've got one slow query. 00:23:42.720 --> 00:23:46.040 So in that case, you probably know or can get 00:23:46.040 --> 00:23:51.040 from the query log, the actual query that was being run. 00:23:51.040 --> 00:23:52.000 - Sure. 00:23:52.000 --> 00:23:55.720 Let me just, yeah, let me just make a little side comment 00:23:55.720 --> 00:23:58.280 here is like, if you're using an ORM, 00:23:58.280 --> 00:24:02.360 it can be a little bit harder to see what that is, right? 00:24:02.360 --> 00:24:05.040 You might have to attach some kind of real time 00:24:05.040 --> 00:24:09.760 logging tool to the database or get it to log it to a file 00:24:09.760 --> 00:24:11.600 and go track it down. 00:24:11.600 --> 00:24:14.780 But also, a lot of the ORMs have features 00:24:14.780 --> 00:24:17.200 that you can make it spit out the queries 00:24:17.200 --> 00:24:18.760 like in super verbose mode. 00:24:18.760 --> 00:24:21.160 So for example, SQLAlchemy, when you create the engine, 00:24:21.160 --> 00:24:23.140 you can say echo equals true, 00:24:23.140 --> 00:24:24.400 but it's not just SQLAlchemy, 00:24:24.400 --> 00:24:25.400 you can do it with many of them, 00:24:25.400 --> 00:24:28.240 and it'll start printing the queries, 00:24:28.240 --> 00:24:31.000 and maybe that's a place to start, right? 00:24:31.000 --> 00:24:34.120 - Yeah, absolutely, and some of them have, 00:24:34.120 --> 00:24:36.320 there's some tools now increasingly 00:24:36.320 --> 00:24:38.040 that let you even do the explain. 00:24:38.040 --> 00:24:40.680 So yeah, the first step is get the query, 00:24:40.680 --> 00:24:43.520 but in some of these frameworks and ORMs, 00:24:43.520 --> 00:24:47.200 you can even ask for the explain plan directly as well. 00:24:47.200 --> 00:24:49.800 Increasingly even with, 00:24:49.800 --> 00:24:52.160 and this is where we start to get a little bit more technical 00:24:52.160 --> 00:24:53.900 and a bit more database specific, 00:24:53.900 --> 00:24:57.240 but I think it's the same across many databases, 00:24:57.240 --> 00:25:01.680 but you want explain will give you the plan 00:25:01.680 --> 00:25:06.200 of what the database has chosen to be, 00:25:06.200 --> 00:25:10.360 probably the optimal route to do that query or to get you that data or to 00:25:10.360 --> 00:25:13.120 insert that data and 00:25:13.120 --> 00:25:21.000 But it if you if you want to get the performance information and get really start to dig into exactly why it's slow 00:25:21.000 --> 00:25:26.620 you need to run something called explain analyze which will also run the query so 00:25:26.620 --> 00:25:34.840 Be careful around that if you're running kind of big delete or something and probably not run it on production, please please please 00:25:35.560 --> 00:25:37.020 Why is this delete slow? 00:25:37.020 --> 00:25:38.380 Oh, wait, why is there no data? 00:25:38.380 --> 00:25:39.300 It's fast now. 00:25:39.300 --> 00:25:39.860 That's great. 00:25:39.860 --> 00:25:41.540 Wait, why is it fast? 00:25:41.540 --> 00:25:42.220 There's no data. 00:25:42.220 --> 00:25:44.020 Or you go to run it a second time 00:25:44.020 --> 00:25:46.100 and don't understand why it's not-- 00:25:46.100 --> 00:25:48.300 Yeah, it's different somehow. 00:25:48.300 --> 00:25:51.500 Yeah, so explain, analyze, we'll run the query behind the scenes. 00:25:51.500 --> 00:25:53.940 But instead of getting the data-- or instead of, let's say, 00:25:53.940 --> 00:25:56.300 getting the data back or getting a normal return, 00:25:56.300 --> 00:26:01.060 you get a query plan back with performance data in it 00:26:01.060 --> 00:26:04.340 with quite a lot of statistics. 00:26:04.340 --> 00:26:09.340 So yes, you just append that to the beginning of your query, 00:26:09.340 --> 00:26:15.240 run it as normal, and then it gives you the query plan. 00:26:15.240 --> 00:26:18.080 So that's step one is get that information 00:26:18.080 --> 00:26:19.660 from your database. 00:26:19.660 --> 00:26:21.980 - Right, and the way that that looks is 00:26:21.980 --> 00:26:24.620 instead of writing SELECT * FROM table where, 00:26:24.620 --> 00:26:28.420 you write EXPLAIN SELECT * FROM that, 00:26:28.420 --> 00:26:31.540 or EXPLAIN ANALYZE, depending on 00:26:31.540 --> 00:26:34.320 whether you want that timing information, right? 00:26:34.320 --> 00:26:39.320 - Exactly, so if you have, and this is probably, 00:26:39.320 --> 00:26:41.760 hopefully not many of you are in this situation ever, 00:26:41.760 --> 00:26:43.980 but if you have a query that never returns, 00:26:43.980 --> 00:26:46.700 let's say it's gonna take six hours maybe, 00:26:46.700 --> 00:26:50.660 you probably wanna get the explain plan, 00:26:50.660 --> 00:26:53.720 just to get a query plan, because explain analyze 00:26:53.720 --> 00:26:55.320 is gonna take as long as your query, 00:26:55.320 --> 00:26:57.280 because it is executing behind the scenes. 00:26:57.280 --> 00:27:00.440 - It runs it and figures out how much IO it's doing 00:27:00.440 --> 00:27:01.880 and stuff like that, right? 00:27:01.880 --> 00:27:04.040 - Yeah, exactly, oh, good question. 00:27:04.040 --> 00:27:08.080 So to add IO information, at least in Postgres, 00:27:08.080 --> 00:27:10.120 you need to include an additional parameter. 00:27:10.120 --> 00:27:13.000 So it would be explain analyze buffers. 00:27:13.000 --> 00:27:16.520 And this is part of my advice to people is, 00:27:16.520 --> 00:27:19.600 especially when, well, probably everybody, 00:27:19.600 --> 00:27:21.380 but especially when you're new, 00:27:21.380 --> 00:27:24.640 and you might need to ask for help from somebody else, 00:27:24.640 --> 00:27:28.000 if you add as many parameters you can to that explain, 00:27:28.000 --> 00:27:30.100 if you're in an IDE, just select them all. 00:27:30.100 --> 00:27:32.560 Once you're getting timing, 00:27:32.560 --> 00:27:34.280 there's not that much additional overhead 00:27:34.280 --> 00:27:37.040 to get in the other information as well. 00:27:37.040 --> 00:27:40.960 But in Postgres, we recommend buffers for both settings. 00:27:40.960 --> 00:27:44.760 And there's even a new one 00:27:44.760 --> 00:27:46.960 that gives you right ahead log information as well. 00:27:46.960 --> 00:27:49.020 So if you ask for all of it, 00:27:49.020 --> 00:27:50.320 then you only have to get it once. 00:27:50.320 --> 00:27:52.240 And even if you have to go to ask, 00:27:52.240 --> 00:27:53.360 maybe you're pasting that into a tool, 00:27:53.360 --> 00:27:55.400 but maybe you're asking a colleague for help, 00:27:55.400 --> 00:27:58.080 they might spot something in that extra information 00:27:58.080 --> 00:27:59.840 that helps them help you. 00:27:59.840 --> 00:28:01.400 - Right, right, right, okay. 00:28:02.360 --> 00:28:05.440 Michael in the audience asked a question which makes me 00:28:05.440 --> 00:28:06.920 think of my next one here. 00:28:06.920 --> 00:28:09.960 Says Google Cloud SQL just added some sweet query 00:28:09.960 --> 00:28:12.320 monitoring and planning visualization. 00:28:12.320 --> 00:28:14.560 So that's pretty nice. 00:28:14.560 --> 00:28:17.560 - Yeah, Google Cloud SQL doing some really cool things. 00:28:17.560 --> 00:28:20.480 They also just recently announced 00:28:20.480 --> 00:28:22.480 that you can do major version upgrades 00:28:22.480 --> 00:28:26.760 much more easily now, which they didn't have before. 00:28:26.760 --> 00:28:28.680 So yeah, the cloud providers are doing 00:28:28.680 --> 00:28:30.760 some really interesting things. 00:28:30.760 --> 00:28:39.720 The query plan visualization stuff is obviously very interesting to us, but there is a natural 00:28:39.720 --> 00:28:42.360 trade-off with performance as always. 00:28:42.360 --> 00:28:49.560 There's a trade-off in general between monitoring constantly for these things versus getting 00:28:49.560 --> 00:28:50.960 them when you need them. 00:28:50.960 --> 00:28:56.240 There's a small overhead, well I say small, but normally small overhead to constantly 00:28:56.240 --> 00:29:01.280 profiling constantly monitoring for these things and getting timings does have some overhead. 00:29:01.280 --> 00:29:06.080 So there is some advice out there to not do this in general for most applications. 00:29:06.080 --> 00:29:12.240 But we have customers that get the query plan for, they don't log it for every query, but they get it 00:29:12.240 --> 00:29:18.080 for every query so that if it's slow, they put it into the logs and it can be investigated later. 00:29:18.080 --> 00:29:43.880 Interesting. Yeah, so my question is, let's suppose I'm going to get some cloud database type thing, right? Managed Postgres, let's say, and I also happen to have Postgres running locally. Or maybe I'm fully insane. And I have SQL lite local, but then Postgres and yeah, that's gonna make make you worry. I'm sure but 00:29:45.200 --> 00:29:49.640 I run one of these explain analyze plans against them. 00:29:49.640 --> 00:29:50.640 Are they comparable? 00:29:50.640 --> 00:29:56.840 Do I need to make sure that I pointed at my production system when I'm asking these questions? 00:29:56.840 --> 00:30:01.720 Yeah, let's take, let's start with a Postgres, Postgres example. 00:30:01.720 --> 00:30:03.280 That's going to be easier. 00:30:03.280 --> 00:30:09.520 And yeah, my main, the main thing that's tricky there is, well, there's a few things to get 00:30:09.520 --> 00:30:10.520 right. 00:30:10.520 --> 00:30:19.360 Naturally, the schema being as similar as possible is pretty important. 00:30:19.360 --> 00:30:22.960 The config of the database as well, as similar as possible. 00:30:22.960 --> 00:30:27.200 So there's a bunch of config changes that you might have made on production that's pretty 00:30:27.200 --> 00:30:31.400 important to have locally as well, so that the planner's making similar decisions. 00:30:31.400 --> 00:30:39.600 And then the big one that we see people do, sadly, far too often, is having a really tiny 00:30:39.600 --> 00:30:46.200 data set locally. So, you know, generating 100 rows or even worse, inserting three rows 00:30:46.200 --> 00:30:56.600 or not having any data in a table. And databases, as we've discussed, magical or let's say clever. 00:30:56.600 --> 00:31:01.360 And they know that if there's the amount of data is really important to them in terms 00:31:01.360 --> 00:31:08.280 of their choices. So, you mentioned different, I think, the join algorithms. Postgres, for 00:31:08.280 --> 00:31:13.160 Yeah, has multiple join algorithms that can use and if there's very little data, it will 00:31:13.160 --> 00:31:17.000 choose one that has a very low startup cost. 00:31:17.000 --> 00:31:21.040 Even if it gets would get expensive at lots of data because it knows there's or it thinks 00:31:21.040 --> 00:31:22.520 there's not much. 00:31:22.520 --> 00:31:27.160 Whereas in a world it's saying if locally you've got not much data but then on production 00:31:27.160 --> 00:31:32.000 you've got a lot of data, it might choose a different join algorithm. 00:31:32.000 --> 00:31:36.000 So the choices it's making are very different depending on the amount of data. 00:31:36.000 --> 00:31:39.960 So that tends to be the one that trips people up most. 00:31:39.960 --> 00:31:44.620 But it's not just algorithms, it's also whether it would pick an index or not. 00:31:44.620 --> 00:31:46.480 That's the one that really confuses people. 00:31:46.480 --> 00:31:50.760 So you're like, I know there's an index in here, why isn't it using it? 00:31:50.760 --> 00:31:51.760 Exactly. 00:31:51.760 --> 00:31:53.120 It's a super common question. 00:31:53.120 --> 00:31:55.880 And it turns out you've only got 10 rows there. 00:31:55.880 --> 00:31:57.720 They all fit on a single block. 00:31:57.720 --> 00:32:01.640 And it's much faster for Postgres just to return that single block to you than it is 00:32:01.640 --> 00:32:04.600 to look up each entry in an index. 00:32:04.600 --> 00:32:07.340 So yeah, exactly. 00:32:07.340 --> 00:32:09.360 - Yeah, okay. 00:32:09.360 --> 00:32:13.060 So I did wanna give a shout out to a couple of tools 00:32:13.060 --> 00:32:15.100 for having fake data. 00:32:15.100 --> 00:32:17.260 Maybe you've got some recommendations as well, 00:32:17.260 --> 00:32:21.740 but there's really cool tools like this one at makaroo.com. 00:32:21.740 --> 00:32:23.420 Are you familiar with Makaroo? 00:32:23.420 --> 00:32:24.700 - I'm not, no. 00:32:24.700 --> 00:32:25.660 - So what's cool about this, 00:32:25.660 --> 00:32:27.500 like you can go and generate fake data, right? 00:32:27.500 --> 00:32:29.300 So of all sorts of things, 00:32:29.300 --> 00:32:30.400 but one of the problems is like, 00:32:30.400 --> 00:32:31.940 well, the fake data is too fake. 00:32:31.940 --> 00:32:34.420 It doesn't look like the right structure 00:32:34.420 --> 00:32:37.240 or it doesn't validate or, you know, 00:32:37.240 --> 00:32:39.700 it's also good just for developing things. 00:32:39.700 --> 00:32:41.460 But like, for example, you can have 00:32:41.460 --> 00:32:44.300 all sorts of stuff in here and it gives you, 00:32:44.300 --> 00:32:46.260 it has the thing that's cool is there's a library 00:32:46.260 --> 00:32:48.000 of types of things that I can get. 00:32:48.000 --> 00:32:51.140 So I could get like all sorts of stuff to do with cars, 00:32:51.140 --> 00:32:53.980 that has real car makes and models and years 00:32:53.980 --> 00:32:58.980 or VIN numbers or credit cards or genders 00:32:58.980 --> 00:33:00.460 or those types of things. 00:33:00.460 --> 00:33:05.000 And then you can go and you can export the schema 00:33:05.000 --> 00:33:11.080 as CSV or JSON or a create table SQL insert script, 00:33:11.080 --> 00:33:14.340 which is pretty cool. 00:33:14.340 --> 00:33:18.360 So if you need more real data, 00:33:18.360 --> 00:33:21.440 places like this for generating them or Faker, 00:33:21.440 --> 00:33:24.800 which is a package for Python and these other ones 00:33:24.800 --> 00:33:28.020 that you can just make enough real fake data, right? 00:33:29.360 --> 00:33:31.760 - Yeah, exactly, and this is super powerful 00:33:31.760 --> 00:33:36.760 in terms of getting as similar data as possible 00:33:36.760 --> 00:33:39.400 to your production system without having to worry 00:33:39.400 --> 00:33:42.080 about personal sensitive information. 00:33:42.080 --> 00:33:43.760 - Yeah, that's a big concern, right? 00:33:43.760 --> 00:33:48.080 It's like, if you take production data, 00:33:48.080 --> 00:33:49.520 let's suppose it's not that massive, 00:33:49.520 --> 00:33:52.080 and you give it to all the developers, 00:33:52.080 --> 00:33:53.320 somebody's gonna lose it. 00:33:53.320 --> 00:33:54.720 Somebody's gonna lose their laptop 00:33:54.720 --> 00:33:55.920 without the drive being encrypted. 00:33:55.920 --> 00:33:58.080 They're gonna put it on a USB stick to transfer it. 00:33:58.080 --> 00:34:00.760 that's just like fat 32 with no security 00:34:00.760 --> 00:34:02.320 or something terrible, right? 00:34:02.320 --> 00:34:04.320 - Yeah, absolutely. 00:34:04.320 --> 00:34:06.000 And then you might not even want people 00:34:06.000 --> 00:34:09.020 to even have select access to some of that stuff 00:34:09.020 --> 00:34:11.220 in certain environments. 00:34:11.220 --> 00:34:13.460 So yeah, but it's, you know, 00:34:13.460 --> 00:34:16.660 probably the more important thing in my world is, 00:34:16.660 --> 00:34:20.240 or the first stumbling block is the sheer amount of data. 00:34:20.240 --> 00:34:22.440 - Yeah, the number of rows, right? 00:34:22.440 --> 00:34:25.560 - Yeah, so even if, you know, 00:34:25.560 --> 00:34:27.240 some of these fields aren't perfect, 00:34:27.240 --> 00:34:30.600 maybe you've got numbers in as names and things like that. 00:34:30.600 --> 00:34:32.200 It's probably not gonna make a huge difference 00:34:32.200 --> 00:34:34.440 to performance unless you're really far out 00:34:34.440 --> 00:34:36.200 in terms of the size of that data. 00:34:36.200 --> 00:34:40.240 So if you're inserting kind of multi-kilobyte blobs 00:34:40.240 --> 00:34:42.840 instead of gender, 00:34:42.840 --> 00:34:45.180 then you're gonna maybe see some differences, 00:34:45.180 --> 00:34:49.000 but not as much as if there were a million rows 00:34:49.000 --> 00:34:50.120 versus if there were two. 00:34:50.120 --> 00:34:53.640 So step one is make sure you've got enough. 00:34:53.640 --> 00:34:54.480 And then step two, 00:34:54.480 --> 00:34:58.320 maybe if you need to make it a lot more realistic. 00:34:58.320 --> 00:35:01.000 - Yeah, size and data type, 00:35:01.000 --> 00:35:03.760 but not necessarily super precise, yeah. 00:35:03.760 --> 00:35:05.800 - Great point on data type as well, yes, 00:35:05.800 --> 00:35:07.720 especially when it comes to indexing. 00:35:07.720 --> 00:35:10.920 - Right, exactly, 'cause indexing a string case 00:35:10.920 --> 00:35:13.340 and sensitive is different than a number. 00:35:13.340 --> 00:35:17.920 - Yes, yes, great point on case and sensitivity. 00:35:17.920 --> 00:35:19.200 The other thing I wanted to call out 00:35:19.200 --> 00:35:21.640 on the Postgres side of this is it's got 00:35:21.640 --> 00:35:24.420 a really powerful generate series function 00:35:24.420 --> 00:35:28.620 that can be used to mock up data quite quickly and easily. 00:35:28.620 --> 00:35:30.300 Not as powerful as some of these, 00:35:30.300 --> 00:35:33.980 but simpler and built into Postgres already. 00:35:33.980 --> 00:35:35.860 So that's quite cool. 00:35:35.860 --> 00:35:40.140 And Ryan Booze from Timescale has been doing 00:35:40.140 --> 00:35:44.240 some quite fancy demos of that, 00:35:44.240 --> 00:35:48.380 of using that to do quite interesting time series, 00:35:48.380 --> 00:35:51.660 you know, huge amounts of data being populated via it. 00:35:51.660 --> 00:35:54.520 So it is surprisingly powerful as well. 00:35:54.520 --> 00:35:58.300 - All right, speaking of time series, 00:35:58.300 --> 00:36:00.980 Michael asks, "Any tips for storing machine learning 00:36:00.980 --> 00:36:03.660 "embeddings in Postgres?" 00:36:03.660 --> 00:36:06.780 Best thing he's seen is storing them as binary blobs. 00:36:06.780 --> 00:36:09.060 I have no advice here. 00:36:09.060 --> 00:36:11.140 Sorry, I don't have any experience in it. 00:36:11.140 --> 00:36:13.620 How about you? 00:36:13.620 --> 00:36:15.140 - I don't think I fully understood 00:36:15.140 --> 00:36:18.440 what would be different about it. 00:36:18.440 --> 00:36:21.300 In terms of binary blobs-- 00:36:21.300 --> 00:36:26.600 I think generally what happens with the machine learning stuff is people will train up these 00:36:26.600 --> 00:36:29.900 models and they just save them as just binary blobs. 00:36:29.900 --> 00:36:37.180 So it's more of a tips for storing binary data perhaps in general. 00:36:37.180 --> 00:36:42.380 So this is second hand. 00:36:42.380 --> 00:36:46.340 This is not something I have direct experience of, but I've read a lot about performance, 00:36:46.340 --> 00:36:48.160 believe it or not, over the years. 00:36:48.160 --> 00:36:55.160 It seems to me that the advice is in the early days, keep it simple. 00:36:55.160 --> 00:37:00.080 Maybe you want to store blobs in the database, but you could also -- there's very cheap blob 00:37:00.080 --> 00:37:06.600 storage out there, and you can store IDs to those in the database. 00:37:06.600 --> 00:37:10.480 But if you want to, if you want to only have one tool to start with, you can store them. 00:37:10.480 --> 00:37:11.880 The database allows you to. 00:37:11.880 --> 00:37:14.520 There's specific types in Postgres for that. 00:37:14.520 --> 00:37:17.400 Jason Beef support in Postgres is amazing. 00:37:17.400 --> 00:37:20.000 And I think probably led to, 00:37:20.000 --> 00:37:22.200 well, it was partly responsible 00:37:22.200 --> 00:37:24.040 for why it's popular today, I think. 00:37:24.040 --> 00:37:25.640 - I think so too. 00:37:25.640 --> 00:37:28.620 I think people see it as a significant, 00:37:28.620 --> 00:37:32.200 let's say a really good choice 00:37:32.200 --> 00:37:33.720 that's like some middle ground 00:37:33.720 --> 00:37:35.440 between a purely relational database 00:37:35.440 --> 00:37:37.280 and document databases like Mongo. 00:37:37.280 --> 00:37:39.920 They're like, but we can kind of have both, right? 00:37:39.920 --> 00:37:41.400 - Yeah, and I don't even know 00:37:41.400 --> 00:37:43.240 if you need to say the word kind of, 00:37:43.240 --> 00:37:46.760 because I think maybe there's, I don't know many, 00:37:46.760 --> 00:37:48.600 but maybe there's some use cases 00:37:48.600 --> 00:37:51.880 where Mongo is gonna really outperform Postgres 00:37:51.880 --> 00:37:55.160 using JSON-B, but I've not seen them 00:37:55.160 --> 00:37:57.320 and I don't know what they are. 00:37:57.320 --> 00:38:01.020 So it's super powerful and I guess the key difference 00:38:01.020 --> 00:38:04.280 between that and pure blob storage is you can index it. 00:38:04.280 --> 00:38:06.040 So you can, yeah. 00:38:06.040 --> 00:38:07.680 - You can index it and query it. 00:38:07.680 --> 00:38:11.920 You can index into, go to this object in JSON 00:38:11.920 --> 00:38:13.060 and then here and then here. 00:38:13.060 --> 00:38:15.760 and if they have that value, I want it back as an index. 00:38:15.760 --> 00:38:18.040 That's the magic. 00:38:18.040 --> 00:38:20.020 - Right, but it's not as performant 00:38:20.020 --> 00:38:22.480 as splitting those out into columns in a schema 00:38:22.480 --> 00:38:27.200 and indexing those, but it's a trade-off again. 00:38:27.200 --> 00:38:28.560 Back to trade-offs. 00:38:28.560 --> 00:38:29.680 - Yeah, yeah, yeah. 00:38:29.680 --> 00:38:33.000 Do you have a billion of those things or a hundred of them? 00:38:33.000 --> 00:38:34.720 'Cause if you got a hundred, you're probably fine. 00:38:34.720 --> 00:38:36.240 I mean, you probably weigh more than a hundred, 00:38:36.240 --> 00:38:39.560 but billions is there, you might be pushing your luck. 00:38:39.560 --> 00:38:41.960 Yeah, okay, pretty interesting there. 00:38:41.960 --> 00:38:44.920 Other real quick shout out just to like another thing that these guys, 00:38:44.920 --> 00:38:47.840 disclosure, they were sponsors of Talk Python 00:38:47.840 --> 00:38:49.440 but I found them before they were sponsors. 00:38:49.440 --> 00:38:51.440 I thought they were cool and started talking to them. 00:38:51.440 --> 00:38:55.520 It's this place called Tonic that you connect it to your real live data 00:38:55.520 --> 00:39:00.400 and it will generate like PPI cleaned up or stuff that looks like your real data, 00:39:00.400 --> 00:39:02.400 but it's not actually your real data. 00:39:02.400 --> 00:39:06.000 So you sort of base it off of your production data. 00:39:06.000 --> 00:39:08.000 Anyway, if people are looking for fake data, 00:39:08.000 --> 00:39:11.600 there's I guess my whole reason of going down that 00:39:11.600 --> 00:39:13.640 is there's not a lot of choices, 00:39:13.640 --> 00:39:17.180 not a lot of reasons to not have enough data. 00:39:17.180 --> 00:39:19.400 There's a lot of options for getting enough data 00:39:19.400 --> 00:39:23.200 at various versions of realism. 00:39:23.200 --> 00:39:24.940 - Yeah, good point. 00:39:24.940 --> 00:39:27.440 - Yeah, all right, so we were talking about, 00:39:27.440 --> 00:39:31.280 can I run this explain, analyze against my local machine 00:39:31.280 --> 00:39:35.360 and against, say, compare the results 00:39:35.360 --> 00:39:36.740 to how it's gonna be in production? 00:39:36.740 --> 00:39:38.700 So one of your big pieces of advice 00:39:38.700 --> 00:39:41.120 is make sure you have enough data 00:39:41.120 --> 00:39:45.200 and data types that are representative and probably relations, right? 00:39:45.200 --> 00:39:51.240 If you're doing joins, like you want to have like similar amount of related data and whatnot. 00:39:51.240 --> 00:39:58.120 Yeah, ideally, but again, in step one, and that website you talked about that has a spinning 00:39:58.120 --> 00:40:03.920 loader, they're going to be fine, you know, they're going to spot their problems without 00:40:03.920 --> 00:40:04.920 very accurate. 00:40:04.920 --> 00:40:05.920 Yeah. 00:40:05.920 --> 00:40:11.880 So if a certain, you know, if a foreign key isn't indexed and they're doing a lookup on 00:40:11.880 --> 00:40:15.600 that, they'll notice it even if the distribution is slightly off. 00:40:15.600 --> 00:40:21.840 So yeah, the nice thing about performance is you don't, often you're talking about 10x 00:40:21.840 --> 00:40:25.120 difference, 100x difference, you know, thousands sometimes. 00:40:25.120 --> 00:40:30.240 All the blog posts talk about, you know, how we made Postgres 20,000 times faster. 00:40:30.240 --> 00:40:31.240 But it's just- 00:40:31.240 --> 00:40:32.240 All the stuff was added in an index. 00:40:32.240 --> 00:40:33.240 Well, there you go. 00:40:33.240 --> 00:40:34.240 Yeah. 00:40:34.240 --> 00:40:36.320 And there's other things, right? 00:40:36.320 --> 00:40:39.400 There are, performance isn't just indexing, 00:40:39.400 --> 00:40:43.380 but when you're talking about the absolute basics, 00:40:43.380 --> 00:40:46.000 it's, yeah, it's normally, 00:40:46.000 --> 00:40:48.440 you're probably gonna get pretty close 00:40:48.440 --> 00:40:50.440 as soon as you've got even, 00:40:50.440 --> 00:40:52.280 like as soon as you've not just got 10 rows 00:40:52.280 --> 00:40:53.440 in your database. 00:40:53.440 --> 00:40:55.560 - Sure, sure, sure. 00:40:55.560 --> 00:40:57.760 Okay, so because there were interesting things 00:40:57.760 --> 00:41:00.240 that you'll find, maybe you'll only find this 00:41:00.240 --> 00:41:03.080 if you do the buffers also, 00:41:03.080 --> 00:41:08.080 in addition to just analyze is if you're doing a sort 00:41:08.080 --> 00:41:10.400 and you have a ton of data, 00:41:10.400 --> 00:41:12.760 it might say there's too much data to sort in memory. 00:41:12.760 --> 00:41:14.320 So it might do a, what was it called? 00:41:14.320 --> 00:41:16.000 A merged sort on disk. 00:41:16.000 --> 00:41:17.960 That's crazy. 00:41:17.960 --> 00:41:20.560 It writes data to the disk and sorts it as the database. 00:41:20.560 --> 00:41:22.240 You're like, wait a minute, what is it doing? 00:41:22.240 --> 00:41:23.640 It's taken out of memory. 00:41:23.640 --> 00:41:24.480 Are you insane? 00:41:24.480 --> 00:41:25.300 What is this? 00:41:25.300 --> 00:41:28.760 - Well, yeah, it's, I mean, it's super clever, as you said, 00:41:28.760 --> 00:41:32.840 but the alternative is risking an out of memory exception 00:41:32.840 --> 00:41:34.840 the whole thing should be. 00:41:34.840 --> 00:41:40.280 - Do you want it fast and wrong or write slow and correct and working, right? 00:41:40.280 --> 00:41:41.960 Like, take your pick, right? 00:41:41.960 --> 00:41:44.200 At some point, it can't do it otherwise. 00:41:44.200 --> 00:41:46.040 But that goes back to the real data. 00:41:46.040 --> 00:41:49.440 If you don't have enough data, you'll never see the problem as it's going to disk because 00:41:49.440 --> 00:41:51.120 why would three records go to disk? 00:41:51.120 --> 00:41:52.120 That would be crazy. 00:41:52.120 --> 00:41:53.120 - Yeah, exactly. 00:41:53.120 --> 00:42:00.960 So, well, this is a really good example because Postgres determines what it would do in memory 00:42:00.960 --> 00:42:07.720 versus on disk by a couple of parameters now, but the main one being work mem or working 00:42:07.720 --> 00:42:15.440 memory and that's set very low for most production workloads. So that's four megabytes by default 00:42:15.440 --> 00:42:21.840 and I've seen a really good one of the top Postgres consultants giving a talk at a conference 00:42:21.840 --> 00:42:27.040 even a few years ago. So this might even be out of date saying they just automatically 00:42:27.040 --> 00:42:32.600 change that to 16 megabytes wherever just always as a starting point and sometimes bump 00:42:32.600 --> 00:42:36.560 it you know for analytical workloads it could be a lot lot higher than that but they have 00:42:36.560 --> 00:42:40.760 there's loads of formulas you can see out there but that's an example of a default that 00:42:40.760 --> 00:42:44.200 probably is a little bit low in Postgres. 00:42:44.200 --> 00:42:49.920 But if it's running on the cheapest server you can get it somewhere like Linode and it's 00:42:49.920 --> 00:42:55.440 got you know 200 megs total well then you you want it to still be right and slow not 00:42:55.440 --> 00:42:58.240 wrong and crashing or fast and crashing. 00:42:58.240 --> 00:42:59.160 - Yeah, exactly. 00:42:59.160 --> 00:43:03.840 - Yeah, but you do need to know these defaults. 00:43:03.840 --> 00:43:06.580 I think part of being successful is like, 00:43:06.580 --> 00:43:08.040 okay, this is my context. 00:43:08.040 --> 00:43:09.560 So these are the defaults that make sense. 00:43:09.560 --> 00:43:12.620 And these are the ones that I should be tweaking. 00:43:12.620 --> 00:43:16.220 So in another presentation I've seen you give, 00:43:16.220 --> 00:43:18.960 you talked about the process that you go through 00:43:18.960 --> 00:43:21.320 for finding these, 00:43:21.320 --> 00:43:23.720 you sort of determine there's a problem, 00:43:23.720 --> 00:43:26.320 figuring out what it is and then solving it. 00:43:26.320 --> 00:43:28.920 You want to talk through your process there? 00:43:28.920 --> 00:43:29.840 - Yeah, absolutely. 00:43:29.840 --> 00:43:31.760 And this is a process I recommend 00:43:31.760 --> 00:43:34.880 and I'm guilty of not always following, 00:43:34.880 --> 00:43:36.920 but this helped me as well. 00:43:36.920 --> 00:43:37.760 So yeah. 00:43:37.760 --> 00:43:38.640 - Learn the rules first, 00:43:38.640 --> 00:43:39.720 so then you can break them later. 00:43:39.720 --> 00:43:40.560 There you go. 00:43:40.560 --> 00:43:41.920 - Right, exactly. 00:43:41.920 --> 00:43:44.120 But I tend to regret breaking these actually. 00:43:44.120 --> 00:43:46.240 So yeah, the first couple, 00:43:46.240 --> 00:43:47.920 I think we've already covered briefly, 00:43:47.920 --> 00:43:50.920 were run on a realistic data set. 00:43:50.920 --> 00:43:57.400 and then second one being make use of the parameters that explain has in your database. 00:43:57.400 --> 00:44:04.200 So for Postgres, use as many as your version supports is my, it tends to be my recommendation. 00:44:04.200 --> 00:44:08.800 So get a lot of information is the short version of that. 00:44:08.800 --> 00:44:15.080 With Postgres query plans, I think, especially if people are used to reading text format 00:44:15.080 --> 00:44:21.280 they'd be very easily forgiven for starting at the English-speaking world, start at the 00:44:21.280 --> 00:44:28.920 top, read left to right, makes loads of sense. But the plans are indented, so they're a list 00:44:28.920 --> 00:44:33.440 of operations that Postgres is doing, and the first one you see is the last one it does 00:44:33.440 --> 00:44:35.480 to give you the data back. 00:44:35.480 --> 00:44:40.360 Think of it like an onion, right? It's the outer shell of the thing that you're being 00:44:40.360 --> 00:44:41.360 given. 00:44:41.360 --> 00:44:45.600 Yeah, exactly. It'd be difficult to start in the inside of an onion, wouldn't it? 00:44:45.600 --> 00:44:53.680 So, yes, my advice tends to be, so firstly, you need to know that the plan is executed 00:44:53.680 --> 00:44:59.840 from the most indented inwards, just to be able to understand what's going on. 00:44:59.840 --> 00:45:04.720 But secondly, it reports some really important statistics right at the bottom. 00:45:04.720 --> 00:45:10.400 So the total execution time being an obvious example, but also the amount of time it spends 00:45:10.400 --> 00:45:16.560 on planning, on triggers, and on just-in-time compilation. 00:45:16.560 --> 00:45:25.600 So these are... don't worry if you don't know what all of those mean, I didn't, but those can all, 00:45:25.600 --> 00:45:32.640 in rare cases, be the dominant issue. So if planning time is 300 milliseconds and your 00:45:32.640 --> 00:45:38.080 execution time is one millisecond, but you've got 20 operations that the query's done in that one 00:45:38.080 --> 00:45:43.920 millisecond, you could spend easily an hour trying to understand that entire query plan, 00:45:43.920 --> 00:45:48.560 maybe more, maybe you've been interrupted a few times, and then you get to the bottom 00:45:48.560 --> 00:45:51.680 and you see that planning time was the problem the whole time. 00:45:51.680 --> 00:45:56.000 You're like, I have five seconds, five milliseconds I could possibly save. 00:45:56.000 --> 00:45:59.760 Yeah. Yeah, exactly. 00:45:59.760 --> 00:46:03.120 Do things like cache the query plans or stuff like that? 00:46:03.120 --> 00:46:06.560 Yeah, yes, so that's a whole topic in itself. 00:46:06.560 --> 00:46:11.860 But a lot of a lot of our arms do that kind of thing for you. 00:46:11.860 --> 00:46:19.080 Yeah. But. The the big advice there is not so much on the planning time front, it's more 00:46:19.080 --> 00:46:20.480 check out that bottom. 00:46:20.480 --> 00:46:24.600 If you're on Postgres, especially check out that bottom section first and look for dominant 00:46:24.600 --> 00:46:27.720 issues before you spend ages reviewing the rest of the query plan. 00:46:28.160 --> 00:46:32.480 So that's my, that's the third of the five. 00:46:32.480 --> 00:46:35.800 And the fourth is then the bit that I sometimes mess up 00:46:35.800 --> 00:46:38.800 and forget to do, which is work out which, 00:46:38.800 --> 00:46:44.960 so if you then have to look through that tree 00:46:44.960 --> 00:46:49.400 of operations, don't look at any of the statistics 00:46:49.400 --> 00:46:53.280 until you've worked out where the time is going 00:46:53.280 --> 00:46:55.760 or where the majority of the work is happening. 00:46:55.760 --> 00:47:05.040 So Postgres especially is difficult in this area because it will report, I think MySQL does the same, I'm not sure about Mongo and others, 00:47:05.040 --> 00:47:10.600 it will report all of the operations including the statistics of their children. 00:47:10.600 --> 00:47:16.120 So you have to do a bunch of subtraction and some slightly more complicated stuff than that, sadly, 00:47:16.120 --> 00:47:22.440 to work out where this time is going and it's very easy to get distracted by a pattern that you've seen before. 00:47:22.440 --> 00:47:28.760 Like, you know, the obvious, the one that gets me and lots of other people is spot a 00:47:28.760 --> 00:47:33.240 sequential scan on a relatively big table and you start to, you just jump to that and 00:47:33.240 --> 00:47:34.880 think that's probably where the problem is. 00:47:34.880 --> 00:47:37.280 - Missed index, we're gonna go fix that, yeah. 00:47:37.280 --> 00:47:42.160 - Exactly, but in a big enough query, a scan of even a few hundred thousand rows might 00:47:42.160 --> 00:47:46.600 be the, especially if it's returning most of them, might be optimal. 00:47:46.600 --> 00:47:49.320 Or it definitely might not be the bottleneck. 00:47:49.320 --> 00:47:52.400 There might be bigger problems. 00:47:52.400 --> 00:47:57.280 So yeah, my advice is work out where the time's going or the work's being done first. 00:47:57.280 --> 00:48:02.320 And then the last step is once you've found like, is there a big bottleneck? 00:48:02.320 --> 00:48:03.920 Where's the time going? 00:48:03.920 --> 00:48:12.000 Then narrow down into looking at just that operational, that cluster, that subtree. 00:48:12.000 --> 00:48:15.000 Like and it could be a programming pattern. 00:48:15.000 --> 00:48:19.960 It could be, well, this query is returning 200,000 rows. 00:48:19.960 --> 00:48:22.120 when it gets back to the app, 00:48:22.120 --> 00:48:24.400 the app is iterating over the first 10 00:48:24.400 --> 00:48:26.640 and deciding it's had enough, right? 00:48:26.640 --> 00:48:27.640 But meanwhile. 00:48:27.640 --> 00:48:31.500 - Yeah, and yeah, exactly. 00:48:31.500 --> 00:48:35.160 So yeah, I mean, it could be that. 00:48:35.160 --> 00:48:37.760 - Yeah, and the fix would be a simple limit 10 00:48:37.760 --> 00:48:38.600 and skip a certain amount. 00:48:38.600 --> 00:48:41.920 Like you just put paging in the app as part of the query 00:48:41.920 --> 00:48:43.020 and you're good to go. 00:48:43.020 --> 00:48:44.740 - Yeah, exactly. 00:48:44.740 --> 00:48:45.880 Pagination's huge, 00:48:45.880 --> 00:48:47.800 especially once you've got good indexing. 00:48:47.800 --> 00:48:49.800 can be so, so powerful. 00:48:49.800 --> 00:48:53.800 But yeah, I mean, to be fair, we're probably, 00:48:53.800 --> 00:48:57.800 this process is mostly for a query that you've 00:48:57.800 --> 00:49:01.800 worked out is slow and you want to make faster, and that would be a great example. 00:49:01.800 --> 00:49:05.800 But then there are also other application side things, so it might not be that any of 00:49:05.800 --> 00:49:09.800 these queries are slow necessarily, but you're firing off 00:49:09.800 --> 00:49:13.800 the typical N+1 example, you're firing off a lot when you 00:49:13.800 --> 00:49:18.240 you don't need to, you could, instead of going, doing a lot of round trips, or... 00:49:18.240 --> 00:49:23.920 Yeah, instead of doing like an eager join, eager load on an ORM relationship, you're 00:49:23.920 --> 00:49:28.560 doing, I got a 50 of these back, and then I'm doing the relationship 50 times. Yeah. 00:49:28.560 --> 00:49:29.560 Yeah, exactly. 00:49:29.560 --> 00:49:34.480 Yeah, and that echoing out the database calls of your ORM, you'll just see stuff ripping 00:49:34.480 --> 00:49:38.440 by like, why is there so much database stuff here? This seems insane. And they're like, 00:49:38.440 --> 00:49:43.600 well, n plus one. That's the problem. Really quick, I want to just follow up on that. There 00:49:43.600 --> 00:49:46.880 There was a question out in the audience saying, 00:49:46.880 --> 00:49:49.400 what would you like to use? 00:49:49.400 --> 00:49:52.960 Would you wanna use maybe Postgres and SQLAlchemy 00:49:52.960 --> 00:49:57.920 or PsychoPG2, maybe the async version these days 00:49:57.920 --> 00:49:59.320 would be really nice to use. 00:49:59.320 --> 00:50:02.840 I said, well, you might also consider a SQL model, 00:50:02.840 --> 00:50:05.480 but with either of those, SQL model, 00:50:05.480 --> 00:50:08.040 both on top of SQLAlchemy or just SQLAlchemy, 00:50:08.040 --> 00:50:10.480 the question was, can it handle getting millions of rows 00:50:10.480 --> 00:50:12.240 back in a short amount of time? 00:50:12.240 --> 00:50:13.520 Yes and no. 00:50:13.520 --> 00:50:16.180 I like to hear what your experience is with this, 00:50:16.180 --> 00:50:17.420 Michael, if you have any. 00:50:17.420 --> 00:50:21.080 What I've seen is a lot of times the database 00:50:21.080 --> 00:50:23.080 will really quickly give those results back 00:50:23.080 --> 00:50:25.880 and actually even ship them over the network to the app. 00:50:25.880 --> 00:50:29.520 But the deserialization of like, here is a record 00:50:29.520 --> 00:50:31.080 and I'm gonna make a class out of it, 00:50:31.080 --> 00:50:34.000 and here's a record and you're making a million classes 00:50:34.000 --> 00:50:36.920 and then setting probably 10 properties on each one of them. 00:50:36.920 --> 00:50:39.900 You know, 10 million assignments, like that's, 00:50:39.900 --> 00:50:42.280 it doesn't matter how fast your database is gonna be, 00:50:42.280 --> 00:50:47.160 just don't return 10 million rows into an RM almost ever. 00:50:47.160 --> 00:50:49.320 Very, very rarely, let's say. 00:50:49.320 --> 00:50:50.200 - Yeah, exactly. 00:50:50.200 --> 00:50:51.760 The general advice I think is 00:50:51.760 --> 00:50:53.960 if you're gonna be doing aggregation work, 00:50:53.960 --> 00:50:55.840 try to do it database side. 00:50:55.840 --> 00:50:57.240 They're designed to do that stuff. 00:50:57.240 --> 00:51:00.720 They have incredibly smart algorithms built into them. 00:51:00.720 --> 00:51:02.820 I'm sure there's exceptions, 00:51:02.820 --> 00:51:05.600 but it seems to be the wise way of doing things. 00:51:05.600 --> 00:51:07.920 I mean, of course, analytical stuff, 00:51:07.920 --> 00:51:09.520 maybe you are presenting a report 00:51:09.520 --> 00:51:12.120 that needs to be hundreds of thousands of lines long, 00:51:12.120 --> 00:51:15.880 But there aren't many apps that there is a good argument 00:51:15.880 --> 00:51:16.720 for that, I don't think. 00:51:16.720 --> 00:51:20.640 - Yeah, and so many of these ORMs have a way 00:51:20.640 --> 00:51:23.400 to limit the returned fields, right? 00:51:23.400 --> 00:51:25.760 Say I only need the title and the ID, 00:51:25.760 --> 00:51:27.560 or I need the title, the price, and the ID, 00:51:27.560 --> 00:51:30.340 'cause I'm gonna do some local data sciencey stuff 00:51:30.340 --> 00:51:32.920 with price or whatever, right? 00:51:32.920 --> 00:51:34.200 But you don't need the full description 00:51:34.200 --> 00:51:36.640 and all the user reviews or whatever, right? 00:51:36.640 --> 00:51:38.720 So that can save a lot. 00:51:38.720 --> 00:51:39.800 - That's such a good point. 00:51:39.800 --> 00:51:43.760 And yes, I think I might even see that one more often. 00:51:43.760 --> 00:51:45.880 So requesting, you know, 00:51:45.880 --> 00:51:48.400 it's effectively the select staff from equivalent. 00:51:48.400 --> 00:51:49.400 - Yeah, it is, yeah. 00:51:49.400 --> 00:51:52.800 - You'd be surprised how many different things 00:51:52.800 --> 00:51:57.380 that can affect the number of operations down that tree 00:51:57.380 --> 00:52:00.420 that can be affected by the number of columns 00:52:00.420 --> 00:52:02.680 you've requested is huge. 00:52:02.680 --> 00:52:06.480 And it affects things like, oh, by the way, 00:52:06.480 --> 00:52:08.720 the same goes for how many columns you ask 00:52:08.720 --> 00:52:14.200 it to be sorted or ordered by. So it can make the difference between being able to do, for 00:52:14.200 --> 00:52:22.880 example, an index only scan versus an index scan. And that can be a huge performance difference. 00:52:22.880 --> 00:52:28.640 Just that one alone. In terms of aggregation, it can make a big difference. Yeah, there's 00:52:28.640 --> 00:52:33.740 so many things you'd be kind of the knock on effects of passing that many data. We talked 00:52:33.740 --> 00:52:40.180 about sorts. If you're having to keep that many rows in memory and each one's bigger, 00:52:40.180 --> 00:52:41.700 they're going to use that work memory more quickly. 00:52:41.700 --> 00:52:45.340 Right. It might fall back to one of those write to disk and sort it rather than sort 00:52:45.340 --> 00:52:46.340 it in memory. 00:52:46.340 --> 00:52:47.340 Exactly. 00:52:47.340 --> 00:52:52.460 Interesting. Okay. I hadn't thought about the knock on effects as much. Let's talk a 00:52:52.460 --> 00:52:57.900 little bit about fixing this. We've set up the problem, things you shouldn't do, but 00:52:57.900 --> 00:53:03.480 how do you know whether you're doing them? So you've got a couple of free tools that 00:53:03.480 --> 00:53:07.380 people can use that you might give a shout out to and then let's talk about 00:53:07.380 --> 00:53:10.780 PG Mustard and maybe talk through some examples there as well. 00:53:10.780 --> 00:53:16.140 Sure, so yeah there's I mean there's so many query plan visualization tools a 00:53:16.140 --> 00:53:22.200 lot of the ideas we discussed also have them inbuilt as well and amazingly all 00:53:22.200 --> 00:53:25.740 of them a lot of them seem to choose to visualize it quite differently so it's 00:53:25.740 --> 00:53:31.940 quite be quite interesting looking through some of them but this the one 00:53:31.940 --> 00:53:38.060 One you've got up on screen at the moment is a free tool written by a French team, a 00:53:38.060 --> 00:53:41.740 Postgres consultancy called, I don't know how to pronounce it, it's either Dalibo or 00:53:41.740 --> 00:53:43.700 Dalibo. 00:53:43.700 --> 00:53:50.740 And they, this tool is inspired, I think it's even a fork of a product that was written 00:53:50.740 --> 00:53:54.020 many years ago by a great guy called Alex Tatyants. 00:53:54.020 --> 00:53:59.540 So a lot of people will recognize the Tatyants name from the URL, that was a tool called 00:53:59.540 --> 00:54:00.540 Pev. 00:54:00.540 --> 00:54:05.500 So back in the day when I started the two tools were Depez and Tatyants. 00:54:05.500 --> 00:54:10.220 So Depez being the one you just flicked to briefly there. 00:54:10.220 --> 00:54:12.980 This is like the original, this is the first one. 00:54:12.980 --> 00:54:15.700 Let's start with the OG first then. 00:54:15.700 --> 00:54:20.220 This guy is a bit of a legend in the Postgres community. 00:54:20.220 --> 00:54:24.620 He's so helpful on so many of the chat forums. 00:54:24.620 --> 00:54:28.420 And he's a Postgres DBA and has been for many, many years. 00:54:28.420 --> 00:54:36.280 And he built this tool because the text format of Explain, I think he described it as unreadable. 00:54:36.280 --> 00:54:41.600 So he's written a whole great blog series on maybe six parts now on reading Explain 00:54:41.600 --> 00:54:49.000 plans and this tool he's using to highlight in red here kind of big problems or where 00:54:49.000 --> 00:54:50.160 a lot of the time is going. 00:54:50.160 --> 00:54:54.600 So that's that step four that we talked about, work out where time's going first. 00:54:54.600 --> 00:55:01.680 So if you've seen the explain text, it's kind of indented and talks about the problem and 00:55:01.680 --> 00:55:03.000 it has things about timing. 00:55:03.000 --> 00:55:07.760 But here's an HTML table that has columns that shows you the timing and then the number 00:55:07.760 --> 00:55:10.040 of rows is very important. 00:55:10.040 --> 00:55:12.560 And then it has, again, still the actual text. 00:55:12.560 --> 00:55:15.920 And it sounds to me like you think this is pretty important to still have the text that 00:55:15.920 --> 00:55:17.680 people expect. 00:55:17.680 --> 00:55:22.840 I think that text is super important for experts and people that have been doing Postgres performance 00:55:22.840 --> 00:55:24.160 work for years. 00:55:24.160 --> 00:55:27.520 So they can fall back to that text. 00:55:27.520 --> 00:55:29.600 If the left-hand side's not helping them, 00:55:29.600 --> 00:55:31.400 or in the case we're looking at right now, 00:55:31.400 --> 00:55:32.720 there's a lot of red, 00:55:32.720 --> 00:55:35.560 it might be tricky to know where to start. 00:55:35.560 --> 00:55:36.840 You might fall back to looking 00:55:36.840 --> 00:55:39.120 at the right-hand side for something. 00:55:39.120 --> 00:55:40.640 Interestingly, that rows column 00:55:40.640 --> 00:55:45.020 is a row count estimate multiplication. 00:55:45.020 --> 00:55:48.560 So it's how many rows did Postgres expect to get 00:55:48.560 --> 00:55:50.600 versus how many actually came back. 00:55:50.600 --> 00:55:54.200 And that's super important for the cases we discussed 00:55:54.200 --> 00:55:57.120 briefly around which join algorithms are gonna choose 00:55:57.120 --> 00:56:00.640 or multiple other, join order is really important as well. 00:56:00.640 --> 00:56:02.680 It'll try and filter out as many as possible 00:56:02.680 --> 00:56:05.640 in the first few joins so that there's less data 00:56:05.640 --> 00:56:08.400 to worry about in the future ones. 00:56:08.400 --> 00:56:11.640 So if it's really far out in an estimate somewhere, 00:56:11.640 --> 00:56:13.200 that's really important. 00:56:13.200 --> 00:56:15.760 So yeah, this tool's doing some of those things 00:56:15.760 --> 00:56:18.240 and doing it really neatly. 00:56:18.240 --> 00:56:20.940 but that text format I think is super expert friendly. 00:56:20.940 --> 00:56:24.700 People that are very used to the Postgres text format, 00:56:24.700 --> 00:56:28.980 this is probably the tool they're gonna like best. 00:56:28.980 --> 00:56:31.140 - Yeah, yeah, it's really nice. 00:56:31.140 --> 00:56:34.980 Okay, and then there's the Dalibo, 00:56:34.980 --> 00:56:37.740 other one that's similar, right? 00:56:37.740 --> 00:56:42.180 - Yes, so this one's the one that's based on PEV. 00:56:42.180 --> 00:56:43.540 So they call it PEV2, I think, 00:56:43.540 --> 00:56:46.980 and it's open source GitHub repos PEV2. 00:56:46.980 --> 00:56:49.740 And this is their hosted tool. 00:56:49.740 --> 00:56:52.300 Again, free to use, open source. 00:56:52.300 --> 00:56:56.540 And like the other one, this supports both text format 00:56:56.540 --> 00:56:58.420 explains and JSON format explains. 00:56:58.420 --> 00:57:00.340 So I think that, yeah, that's what they're saying 00:57:00.340 --> 00:57:02.100 up at the top there. 00:57:02.100 --> 00:57:04.260 This is much more visual as a tool. 00:57:04.260 --> 00:57:06.660 So this displays the query patterns. 00:57:06.660 --> 00:57:09.100 - Sadly, they don't have an example I can just pull up. 00:57:09.100 --> 00:57:10.740 I'd have to like put it all together 00:57:10.740 --> 00:57:11.580 and drop it in there. 00:57:11.580 --> 00:57:12.420 Or do they? 00:57:12.420 --> 00:57:13.500 I didn't see one when I clicked on that. 00:57:13.500 --> 00:57:16.220 - No, it's at the bottom, they might have sample plans. 00:57:16.220 --> 00:57:17.060 There you go. 00:57:17.060 --> 00:57:21.720 So the gray button on the right. 00:57:21.720 --> 00:57:22.720 - Oh yeah, yeah, okay, you're right. 00:57:22.720 --> 00:57:24.420 Okay, let's just go with that one. 00:57:24.420 --> 00:57:27.520 Cool, and then I'll submit it. 00:57:27.520 --> 00:57:28.800 Yeah, how did it, oh, well, it's gray. 00:57:28.800 --> 00:57:29.640 That's why I missed it. 00:57:29.640 --> 00:57:31.720 Okay, yeah, this looks fantastic. 00:57:31.720 --> 00:57:34.360 - Yeah, so this is a visualization, 00:57:34.360 --> 00:57:37.160 very similar color scheme to the last one. 00:57:37.160 --> 00:57:38.520 So you'll see things in red 00:57:38.520 --> 00:57:40.640 that they're saying are really bad, 00:57:40.640 --> 00:57:42.180 orange if they're medium bad, 00:57:42.180 --> 00:57:45.640 and then yellow if there may be a problem. 00:57:45.640 --> 00:57:46.640 Yeah, exactly. 00:57:46.640 --> 00:57:50.880 Don't worry about it for now, get back to it later. 00:57:50.880 --> 00:57:57.640 And then, so this is good for visual stuff if you're that way inclined. 00:57:57.640 --> 00:58:04.080 The other thing, some people that are still extremely competent at this love the left 00:58:04.080 --> 00:58:05.460 hand side especially. 00:58:05.460 --> 00:58:15.600 So it looks unassuming but those little bars are kind of like a flame chart on its side 00:58:15.600 --> 00:58:17.140 And they'll show you, in this case, 00:58:17.140 --> 00:58:20.220 we're looking at the amount of time spent by each operation, 00:58:20.220 --> 00:58:23.220 but you might be able to take buffers there 00:58:23.220 --> 00:58:25.180 is a really popular way of looking at it. 00:58:25.180 --> 00:58:31.620 That's a feature that's not in a lot of the other tools. 00:58:31.620 --> 00:58:35.200 So that's a common reason that people really like this tool. 00:58:35.200 --> 00:58:37.420 - Fantastic. 00:58:37.420 --> 00:58:40.620 Okay, so these are both totally free, web hosted, 00:58:40.620 --> 00:58:41.860 sounds like some of them are open source. 00:58:41.860 --> 00:58:44.060 You can probably download them as well. 00:58:44.060 --> 00:58:45.580 They look nice. 00:58:45.580 --> 00:58:48.780 Let's talk PG Mustard a little bit. 00:58:48.780 --> 00:58:50.780 So like I said, when I came across this, 00:58:50.780 --> 00:58:54.980 I thought, "Wow, this is really a lot nicer 00:58:54.980 --> 00:58:57.280 than the other tools that I've seen for doing this." 00:58:57.280 --> 00:59:01.280 And maybe the best way to understand it would be to talk about two things real quick. 00:59:01.280 --> 00:59:03.280 You can tell me how you want to talk about it. 00:59:03.280 --> 00:59:07.280 But I think maybe talk about the performance advice. 00:59:07.280 --> 00:59:10.280 What do you think about that? We could maybe go through that real quick. 00:59:10.280 --> 00:59:12.280 Some of the things that are in mind for you. 00:59:12.280 --> 00:59:20.360 Well yes, I know there's a lot listed there, but the short version is that we're also trying 00:59:20.360 --> 00:59:24.600 to, we've got some strong opinions on the visual side of things, you know, the red gate 00:59:24.600 --> 00:59:27.880 trying to keep things as simple as possible has not left. 00:59:27.880 --> 00:59:34.280 So my co-founder of this is a guy called Dave, and we worked together for a couple of years 00:59:34.280 --> 00:59:36.080 almost full time building this. 00:59:36.080 --> 00:59:40.320 So we've also spent a lot more time than some of the others on building this. 00:59:40.320 --> 00:59:46.920 There's a different level of polish and attention to detail if something is your job and the 00:59:46.920 --> 00:59:51.360 maker or breaker of your success versus something you work on in your spare time because it'll 00:59:51.360 --> 00:59:52.360 help a little, right? 00:59:52.360 --> 00:59:54.040 Like there just is. 00:59:54.040 --> 00:59:56.280 Yeah, well, absolutely. 00:59:56.280 --> 01:00:02.760 And hopefully that shows but also the others have done an amazing job and there's surprisingly 01:00:02.760 --> 01:00:06.080 few differences if you consider how much time difference there is. 01:00:06.080 --> 01:00:11.160 But yeah, so the big differences are on the visualization front, we try and keep things 01:00:11.160 --> 01:00:12.160 that bit simpler. 01:00:12.160 --> 01:00:14.160 We're also a lot more opinionated. 01:00:14.160 --> 01:00:16.980 We assume you are trying to speed up your query. 01:00:16.980 --> 01:00:21.360 And that might sound stupid, but it's not the only reason you would look at a query 01:00:21.360 --> 01:00:22.360 plan. 01:00:22.360 --> 01:00:27.940 So there are reasons why that we might not be a good fit for people. 01:00:27.940 --> 01:00:30.840 Let's say you're working on the Postgres source code and you're trying to implement a new 01:00:30.840 --> 01:00:33.980 feature and you just want to look at some explain plans. 01:00:33.980 --> 01:00:38.700 You probably want the text format and you probably really want the costs. 01:00:38.700 --> 01:00:43.900 So the things that the plan is using to choose between plans, you probably want those highlighted 01:00:43.900 --> 01:00:45.220 and we don't highlight those. 01:00:45.220 --> 01:00:49.280 We hide those really far away from you because most of the time when you're trying to speed 01:00:49.280 --> 01:00:54.980 up a query, that's not what you're going to be wanting to see. 01:00:54.980 --> 01:00:55.980 So yeah, so it's super. 01:00:55.980 --> 01:00:56.980 Yeah. 01:00:56.980 --> 01:01:01.660 So which ones of these really do you feel like are both highly valuable, highly recommended 01:01:01.660 --> 01:01:05.020 Or maybe some that are like, wow, I can't believe it would find that I would have never found that 01:01:05.020 --> 01:01:10.380 Good question. Sadly, the most valuable are the index related ones 01:01:10.380 --> 01:01:17.340 So as you mentioned that we call it we call it high index potential when there's a lot of rows being filtered 01:01:17.340 --> 01:01:19.660 Versus the amount being scanned 01:01:19.660 --> 01:01:20.060 um 01:01:20.060 --> 01:01:26.220 So that's the big thing to look out for in the text format is when there's high numbers of rows being built not just on 01:01:26.460 --> 01:01:32.140 sequential scans, but also if you've got an index scan that's still having to filter out a lot of rows 01:01:32.140 --> 01:01:37.820 that we call that an inefficient index. But it's only inefficient for that query. It's 01:01:37.820 --> 01:01:42.220 there's nothing wrong with the index. It's more that your query doesn't suit that. It's just 01:01:42.220 --> 01:01:47.740 Postgres. That's the best one Postgres could use. Right, right. So yeah, the indexing one's super 01:01:47.740 --> 01:01:51.500 valuable. Operations on disk we've already discussed a few times that comes up a lot and 01:01:51.500 --> 01:01:52.860 and poor row estimates. 01:01:52.860 --> 01:01:54.840 These are the most common. 01:01:54.840 --> 01:01:57.660 All of the others are either rarer 01:01:57.660 --> 01:02:00.380 or less valuable when you hit them. 01:02:00.380 --> 01:02:02.580 But sometimes when they're rare, 01:02:02.580 --> 01:02:04.540 they can be extremely helpful. 01:02:04.540 --> 01:02:06.260 - Sure, well, once you pass the advice 01:02:06.260 --> 01:02:09.660 of look for table scans and then add an index 01:02:09.660 --> 01:02:13.660 and don't select star if you got a big result set 01:02:13.660 --> 01:02:14.860 so it doesn't go to disk. 01:02:14.860 --> 01:02:18.580 Those are pretty, not super easy, 01:02:18.580 --> 01:02:20.200 but those are somewhat easy. 01:02:20.200 --> 01:02:25.240 If it's not fast enough, you're like, well, now what? 01:02:25.240 --> 01:02:26.240 I added an index. 01:02:26.240 --> 01:02:27.240 It's not fast enough. 01:02:27.240 --> 01:02:28.240 Now what? 01:02:28.240 --> 01:02:29.240 Yeah. 01:02:29.240 --> 01:02:31.680 One more that I'd love to discuss actually is something we talked about really early 01:02:31.680 --> 01:02:34.600 on is the idea of trying to do less work. 01:02:34.600 --> 01:02:37.180 So either not doing the work. 01:02:37.180 --> 01:02:42.620 So what I think something I'm quite proud of is that we'll try and not highlight anything 01:02:42.620 --> 01:02:44.920 if it's pretty optimal already. 01:02:44.920 --> 01:02:48.880 Now that's if your query is pretty optimal already and doing a lot of work and still 01:02:48.880 --> 01:02:53.840 not fast enough for what you want, you kind of need to rethink. You either need to think, 01:02:53.840 --> 01:02:59.200 should we estimate this number? Do we need to materialize it and run it once every few 01:02:59.200 --> 01:03:05.840 minutes and have every user look at kind of a cached version of it? So that's quite powerful. 01:03:05.840 --> 01:03:11.200 And then the other side of it, if it's not efficient, it might be a sign of something 01:03:11.200 --> 01:03:19.640 else going wrong. So in Postgres, one of the things that it gets beaten with is effectively 01:03:19.640 --> 01:03:26.180 its equivalent of garbage collection. So the way it does multi-version concurrency control, 01:03:26.180 --> 01:03:32.120 it ends up bloating the tables and the indexes a little bit, sometimes a lot, which is a 01:03:32.120 --> 01:03:38.880 problem. It has automatic systems that people sometimes disable around that. But it also, 01:03:38.880 --> 01:03:44.300 old versions especially, indexes on heavily updated tables for example can quite quickly 01:03:44.300 --> 01:03:45.580 and easily bloat. 01:03:45.580 --> 01:03:51.820 So pointing out, so if you include the buffers flag we talked about, you can spot when Postgres 01:03:51.820 --> 01:03:56.060 is doing a lot of reads versus the amount of data it's returning and that's sometimes 01:03:56.060 --> 01:04:03.380 a sign that for example an index has got really badly bloated and rebuilding that index, ideally 01:04:03.380 --> 01:04:08.260 concurrently if you're on production, but yeah rebuilding that index. 01:04:08.260 --> 01:04:09.260 This is how you fix it, Michael. 01:04:09.260 --> 01:04:11.820 What you do is you put that little banner up there and you say we're going to be down 01:04:11.820 --> 01:04:14.380 for four hours on Sunday. 01:04:14.380 --> 01:04:19.040 So if you just don't mind, don't use our website on Sundays because that's when we rebuild 01:04:19.040 --> 01:04:20.040 the index. 01:04:20.040 --> 01:04:24.740 Those are often on the same site that have the really bad queries when they're not offline. 01:04:24.740 --> 01:04:26.340 Anyway, yeah. 01:04:26.340 --> 01:04:30.340 So we're trying to help them and trying to educate and trying to teach them how to avoid 01:04:30.340 --> 01:04:31.340 those things. 01:04:31.340 --> 01:04:32.780 But yeah, I know exactly what you mean. 01:04:32.780 --> 01:04:33.780 So yeah, those are the kinds. 01:04:33.780 --> 01:04:34.780 Yeah, that's good. 01:04:34.780 --> 01:04:40.940 Yeah, and that I think the more I talk to people that really know this topic and they've spent 20-30 years on it 01:04:40.940 --> 01:04:44.480 they focus mostly on work done and 01:04:44.480 --> 01:04:48.820 Really trying to minimize and reduce that as much as possible 01:04:48.820 --> 01:04:52.860 Less is more and all of that. So yeah. Yeah. Okay 01:04:52.860 --> 01:04:56.480 So let's maybe highlight this through a couple of examples 01:04:56.480 --> 01:05:01.820 I also have one maybe final tool to suggest you have a couple of examples that show how you can 01:05:03.060 --> 01:05:08.060 Check this out over at app.pgmuster.com. 01:05:08.060 --> 01:05:11.520 It will give you the visualization that you can dive into 01:05:11.520 --> 01:05:14.100 and then the recommendations, right? 01:05:14.100 --> 01:05:16.740 - Yeah, so this is just an example we've published. 01:05:16.740 --> 01:05:20.080 Maybe not the best example ever, but it's one that I like. 01:05:20.080 --> 01:05:20.920 - Sure. 01:05:20.920 --> 01:05:22.540 All right, well, maybe let's just talk 01:05:22.540 --> 01:05:24.540 through the three examples here. 01:05:24.540 --> 01:05:25.740 Maybe really quickly, the UI. 01:05:25.740 --> 01:05:28.700 So you've got these nodes that are connected, 01:05:28.700 --> 01:05:31.680 sort of graph theory-like, 01:05:31.680 --> 01:05:42.360 And you can expand and collapse the sections to get details about very much the indented way of the explain text, right? 01:05:42.360 --> 01:05:45.440 Yes, so we're trying to mimic the text format with that. 01:05:45.440 --> 01:05:50.760 So people that are used to text format, this is the exact same structure but with less data. 01:05:50.760 --> 01:05:58.360 We're doing some of the calculations there, so those 200,000 that you see there, that's the number of rows being returned at each stage. 01:05:58.360 --> 01:06:05.160 And one thing you saw was that those two of those operations were hidden by default at the beginning when you first loaded it. 01:06:05.160 --> 01:06:11.160 That's one of the opinionated things we're doing. By default we'll hide subtrees that are really fast already. 01:06:11.160 --> 01:06:15.960 So you can, in this example, it's not great, you know, there's only four nodes. 01:06:15.960 --> 01:06:22.520 But you can get queries that have hundreds of nodes, and even ones that return quickly. 01:06:22.520 --> 01:06:27.520 So hiding far sub trees is a very opinionated thing to do, 01:06:27.520 --> 01:06:31.720 but I find very valuable. 01:06:31.720 --> 01:06:32.720 - Sure, I like it. 01:06:32.720 --> 01:06:35.520 And it's about find, you know, 01:06:35.520 --> 01:06:36.840 bring your attention to the part 01:06:36.840 --> 01:06:38.880 where you can make improvements. 01:06:38.880 --> 01:06:40.320 - Yeah, exactly. - Where the time is being spent. 01:06:40.320 --> 01:06:41.160 Yeah. 01:06:41.160 --> 01:06:42.640 - So yeah, so that's what we're doing. 01:06:42.640 --> 01:06:44.680 At the top there, we have a timing bar, 01:06:44.680 --> 01:06:47.680 which is a bit like a flame graph, but viewed from the top. 01:06:47.680 --> 01:06:51.200 So kind of like a bird's eye view on that. 01:06:51.200 --> 01:06:53.440 Maybe this one's got a little better breakdown. 01:06:53.440 --> 01:06:54.280 - Definitely. 01:06:54.280 --> 01:06:56.640 So you can see the width of those 01:06:56.640 --> 01:06:58.040 is the amount of time spent 01:06:58.040 --> 01:07:00.200 and they're ordered by slowest to fastest. 01:07:00.200 --> 01:07:02.240 So the idea is to quickly show you 01:07:02.240 --> 01:07:03.600 if there's one dominant issue 01:07:03.600 --> 01:07:06.560 versus maybe quite a few 01:07:06.560 --> 01:07:09.000 that are taking a similar amount of time. 01:07:09.000 --> 01:07:10.320 - Sure, so for people listening, 01:07:10.320 --> 01:07:15.320 there's this tree of nodes that you can explore 01:07:15.320 --> 01:07:16.540 to see what's happening, 01:07:16.540 --> 01:07:18.500 like a gather merge or sort and so on. 01:07:18.500 --> 01:07:22.060 and there's the timing and the likelihood 01:07:22.060 --> 01:07:23.500 of good recommendations. 01:07:23.500 --> 01:07:25.140 They all have numbers and then across the top, 01:07:25.140 --> 01:07:28.100 you can see, well, number one was the worst offender 01:07:28.100 --> 01:07:31.660 of performance and then number zero and then number two. 01:07:31.660 --> 01:07:34.820 - Yeah, I've just realized how un-podcast friendly 01:07:34.820 --> 01:07:35.700 this might be. 01:07:35.700 --> 01:07:37.340 So apologies to people listening. 01:07:37.340 --> 01:07:39.580 - Well, you know, I'm gonna link to these examples 01:07:39.580 --> 01:07:41.500 and people can go check them out. 01:07:41.500 --> 01:07:42.740 - Yeah, well, yeah, thank you. 01:07:42.740 --> 01:07:44.700 And then, so the last thing that we're doing, 01:07:44.700 --> 01:07:47.060 so the other tools do that quite well, in my opinion, 01:07:47.060 --> 01:07:48.900 in terms of showing you where the time's going. 01:07:48.900 --> 01:07:50.660 I prefer a very visual way of seeing that, 01:07:50.660 --> 01:07:52.820 and that's similar to the Dalibor one we looked at. 01:07:52.820 --> 01:07:57.020 You can see quite quickly where those time bars are going. 01:07:57.020 --> 01:08:00.420 The thing we do additionally is these scored tips. 01:08:00.420 --> 01:08:05.420 So we're trying to look for every issue we know about, 01:08:05.420 --> 01:08:06.260 and we're trying to look at them 01:08:06.260 --> 01:08:08.260 on every single operation we see, 01:08:08.260 --> 01:08:10.440 and then we're calculating roughly how much time 01:08:10.440 --> 01:08:12.660 we think you could save if you fix the problem 01:08:12.660 --> 01:08:14.260 that we think might be there, 01:08:14.260 --> 01:08:16.640 and then scoring that so that we only show you 01:08:16.640 --> 01:08:21.360 the top three, maximum three, in this case, just one. 01:08:21.360 --> 01:08:27.480 So even in this case, you can see that most of the time is being spent doing 200,000 loops 01:08:27.480 --> 01:08:28.940 of an index only scan. 01:08:28.940 --> 01:08:33.720 So something that's probably extremely fast on its own, but the fact we're doing 200,000 01:08:33.720 --> 01:08:36.240 loops of it adds up quite quickly. 01:08:36.240 --> 01:08:37.880 Yeah, exactly. 01:08:37.880 --> 01:08:41.560 And the real problem is a bad row estimate. 01:08:41.560 --> 01:08:44.400 So in fact, it's the exact example we were discussing earlier. 01:08:44.400 --> 01:08:52.720 picked a nested loop because it's not expecting to have to do that many loops. 01:08:52.720 --> 01:09:00.200 But in reality there were nearly 200 times more rows than it was expecting and that was 01:09:00.200 --> 01:09:02.520 probably a bad choice of join algorithm. 01:09:02.520 --> 01:09:04.080 Right, okay. 01:09:04.080 --> 01:09:09.880 Yeah, so if I click on, if you just open it up, right on the left it says top tips 4.6 01:09:09.880 --> 01:09:12.200 out of five likelihood of improvement, 01:09:12.200 --> 01:09:16.280 operation zero, row estimate out by a factor of 191.8. 01:09:16.280 --> 01:09:19.280 Then I can click on the details here. 01:09:19.280 --> 01:09:22.040 If I click on the node that it pulls up, 01:09:22.040 --> 01:09:24.200 it says that same thing, you can expand it out 01:09:24.200 --> 01:09:29.200 and it gives you a nice human English language description 01:09:29.200 --> 01:09:32.240 of that and then the operations, 01:09:32.240 --> 01:09:35.560 the nested loop it describes like what that is, 01:09:35.560 --> 01:09:37.200 gives you some stats on the buffers 01:09:37.200 --> 01:09:39.000 and then the operational details 01:09:39.000 --> 01:09:42.120 like it was an inner join and how long it took and so on. 01:09:42.120 --> 01:09:44.280 - Yeah, absolutely. 01:09:44.280 --> 01:09:45.160 And actually, it's a really good point 01:09:45.160 --> 01:09:47.440 on the operation descriptions. 01:09:47.440 --> 01:09:50.520 I also spent ages, probably a couple of months solid, 01:09:50.520 --> 01:09:55.280 making documentation page or a glossary of terms 01:09:55.280 --> 01:09:57.040 that you can come across in WordPress. 01:09:57.040 --> 01:10:01.440 - Writing a manual, but in like two paragraphs at a time 01:10:01.440 --> 01:10:02.880 to be delivered. 01:10:02.880 --> 01:10:04.760 - Yeah, so we put a lot of them in the product, 01:10:04.760 --> 01:10:06.440 but we've also put them online for people 01:10:06.440 --> 01:10:08.680 so they can freely access them as well. 01:10:08.680 --> 01:10:14.280 sort of yeah those are all available on glossary on our website as well. 01:10:14.280 --> 01:10:20.520 Yeah okay so in this example what do I do like it tells me these things what is what 01:10:20.520 --> 01:10:24.040 should I go do to make this better is there anything I can't? 01:10:24.040 --> 01:10:26.880 Well exactly I mean here's what we're looking at. 01:10:26.880 --> 01:10:35.560 Vow estimates are tricky but Postgres has a thing called analyze where it will gather 01:10:35.560 --> 01:10:43.440 statistics on each table, it will take a sample of them, it will look for most common values, 01:10:43.440 --> 01:10:49.720 so it can kind of estimate joins accurately, that kind of thing. And it might be that analyze 01:10:49.720 --> 01:10:52.160 hasn't run in a long time for whatever reason. 01:10:52.160 --> 01:10:56.440 I see the data has dramatically changed since it last took stats. So it's using out of date 01:10:56.440 --> 01:10:59.160 stats to make its decisions. 01:10:59.160 --> 01:11:04.440 Exactly. A really common cause of that is if you do a major version upgrade in Postgres, 01:11:04.440 --> 01:11:08.880 doesn't bring along your statistics so you do need to run and analyze as part of your 01:11:08.880 --> 01:11:13.400 upgrade so that's really important. 01:11:13.400 --> 01:11:18.320 But then if you, there's a learn more link here as well with, it will go into quite a 01:11:18.320 --> 01:11:23.320 few more things you can do so if it's not analyzed there's a bunch of other ways you 01:11:23.320 --> 01:11:29.320 can teach Postgres or ask Postgres to gather more statistics so if it's a heavily skewed 01:11:29.320 --> 01:11:36.080 column. Yeah. Yeah. Anyway, so we've gone into quite a lot of detail in that blog post. 01:11:36.080 --> 01:11:40.560 But yeah, there's a bunch. I love it, though. Because if I if somebody told me this, I'm 01:11:40.560 --> 01:11:46.840 like, well, I don't really know what to do. To be clear, I'm basically a beginner on Postgres, 01:11:46.840 --> 01:11:50.880 but not on relational databases, especially not on databases in general. Right. So I might 01:11:50.880 --> 01:11:55.080 look at this go up. Great. But does Postgres even have a way to fix this? Right. And then 01:11:55.080 --> 01:11:59.160 you're right here next to it. You're like, OK, you just run this command and you'll probably 01:11:59.160 --> 01:12:01.160 And I think that's super valuable, 01:12:01.160 --> 01:12:03.160 because it's one thing to say, "Here's your problem," 01:12:03.160 --> 01:12:05.160 or, "This is where the problem resides." 01:12:05.160 --> 01:12:07.160 It's another, like, "Now what?" 01:12:07.160 --> 01:12:10.160 - Yeah, and these days, engineers and developers 01:12:10.160 --> 01:12:12.160 are expected to know, well, there's so many people 01:12:12.160 --> 01:12:15.160 that started back-end, and then they kind of went full stack, 01:12:15.160 --> 01:12:17.160 but then they have to worry about the database as well. 01:12:17.160 --> 01:12:20.160 And it feels like the surface area is expanding 01:12:20.160 --> 01:12:21.160 in a lot of cases. 01:12:21.160 --> 01:12:26.160 So we meet a lot of extremely smart developers 01:12:26.160 --> 01:12:31.320 developers that have been doing back-end work for decades that only have to deal 01:12:31.320 --> 01:12:35.120 with Postgres every few months or maybe even every couple of years. 01:12:35.120 --> 01:12:39.560 Those people, we kind of want to help them because they know what 01:12:39.560 --> 01:12:43.160 they're doing, it's just this information is not top of mind right now. 01:12:43.160 --> 01:12:48.240 So that tends to be who we can help most easily and we're trying to 01:12:48.240 --> 01:12:50.160 make it a bit more beginner-friendly as well. 01:12:50.160 --> 01:12:52.160 Yeah, that's fantastic. 01:12:52.160 --> 01:12:55.160 Okay, so that's example one, which I'll link to. 01:12:55.160 --> 01:13:03.160 Example two, we've got a more significant challenge here, more stuff happening. 01:13:03.160 --> 01:13:06.160 This one we're throwing away too many rows. 01:13:06.160 --> 01:13:12.160 Yeah, so well this is a great example actually of a tip I didn't mention, but it's probably index related. 01:13:12.160 --> 01:13:22.160 So we're throwing away rows at the end here, going down from 45,000 in the second to last operation, down to 100 in the last one. 01:13:22.160 --> 01:13:30.200 last one, if we could filter down to ideally a much smaller number very early on, we could 01:13:30.200 --> 01:13:36.720 avoid, there's a couple of large sequential scans there, a couple of hash joins, all on 01:13:36.720 --> 01:13:39.640 tens of thousands of rows of data. 01:13:39.640 --> 01:13:45.920 In this case, there's no big filter until the end, so we're not giving an index potential 01:13:45.920 --> 01:13:51.920 tip, but we're saying if you can, yeah, so in fact, we'll actually see on the sequential 01:13:51.920 --> 01:13:56.000 scans we're saying index potential is zero. 01:13:56.000 --> 01:13:58.600 Because no rows are being filtered there. 01:13:58.600 --> 01:14:07.840 But the late filter at the end, the rows discarded, is depending on exactly how it's discarding 01:14:07.840 --> 01:14:13.780 those and what it's doing, I think I remember from this example, we're actually sorting 01:14:13.780 --> 01:14:24.340 by columns in two different tables. So we're ordering by a column from one and then ordering 01:14:24.340 --> 01:14:30.800 by a column from the other and neither of those columns is indexed. So if we could get 01:14:30.800 --> 01:14:35.660 the data from each of those in an ordered manner, this would probably be a lot faster. 01:14:35.660 --> 01:14:40.380 But it's a tricky investigation because we don't know from what Postgres is telling us 01:14:40.380 --> 01:14:45.720 here exactly how those are distributed and how many of them are going to be appropriate 01:14:45.720 --> 01:14:46.720 at each stage. 01:14:46.720 --> 01:14:47.720 Yeah. 01:14:47.720 --> 01:14:53.160 Another thing I find value is not necessarily the advice, but the non-advice. 01:14:53.160 --> 01:14:55.380 Like you know what, there's zero index potential here. 01:14:55.380 --> 01:14:56.700 This is fine. 01:14:56.700 --> 01:14:59.660 So don't look down that path. 01:14:59.660 --> 01:15:00.660 Go look somewhere else. 01:15:00.660 --> 01:15:01.660 Right? 01:15:01.660 --> 01:15:02.660 Row estimates, perfect. 01:15:02.660 --> 01:15:03.660 Operation happened in memory. 01:15:03.660 --> 01:15:04.960 Don't worry about those things. 01:15:04.960 --> 01:15:07.780 Go find the problem somewhere else. 01:15:07.780 --> 01:15:09.520 Yeah, exactly. 01:15:09.520 --> 01:15:12.800 That was actually user feedback quite early on. 01:15:12.800 --> 01:15:16.520 People wanted to see the things that were not an issue. 01:15:16.520 --> 01:15:22.080 So it's been really helpful working with people and trying to understand what they do and 01:15:22.080 --> 01:15:23.080 don't find. 01:15:23.080 --> 01:15:27.520 It wasn't something I thought of, but through feedback, it was really helpful to know that 01:15:27.520 --> 01:15:29.520 is exactly what people want to see. 01:15:29.520 --> 01:15:30.520 Yeah. 01:15:30.520 --> 01:15:35.600 I mean, I think this is super valuable. 01:15:35.600 --> 01:15:37.560 So people want to check out PG Mustard. 01:15:37.560 --> 01:15:39.680 It's worth pointing out that it does cost money, right? 01:15:39.680 --> 01:15:41.880 It is a product. 01:15:41.880 --> 01:15:46.280 - Yeah, sorry, for our sins, it's commercial. 01:15:46.280 --> 01:15:49.720 But if anybody's, if there are students listening, 01:15:49.720 --> 01:15:51.680 for non-commercial use, we give it away to people. 01:15:51.680 --> 01:15:53.620 All you need to do is just contact us. 01:15:53.620 --> 01:15:57.480 And yeah, there's a couple of other reasons 01:15:57.480 --> 01:15:59.000 we give it away for free. 01:15:59.000 --> 01:16:00.840 So yeah, feel free to reach out. 01:16:00.840 --> 01:16:03.080 But we hope it's not too expensive as well, 01:16:03.080 --> 01:16:05.520 especially if you're doing this for work. 01:16:06.800 --> 01:16:09.940 Yeah, it starts at 95 euros per year for a single person. 01:16:09.940 --> 01:16:12.860 So hopefully affordable. 01:16:12.860 --> 01:16:14.580 - Yeah, and I think it's, 01:16:14.580 --> 01:16:17.020 I don't mean to make that derogatory, not at all. 01:16:17.020 --> 01:16:18.060 I just wanted people to know, 01:16:18.060 --> 01:16:19.980 like the other ones we talked about were free 01:16:19.980 --> 01:16:22.100 and open source and this one, it is a product, 01:16:22.100 --> 01:16:23.060 you do pay for it. 01:16:23.060 --> 01:16:28.820 But boy, if you could save a database upgrade 01:16:28.820 --> 01:16:31.220 to a new machine, or if you could save, 01:16:31.220 --> 01:16:35.180 you know, spending $100 a month on a larger server 01:16:35.180 --> 01:16:39.820 in the cloud every month because this thing help you find 01:16:39.820 --> 01:16:40.820 that, oh, you actually don't need it. 01:16:40.820 --> 01:16:42.580 You're just doing it wrong, right? 01:16:42.580 --> 01:16:46.740 Well, $100 once seems like a pretty good bargain, you know? 01:16:46.740 --> 01:16:48.660 - Yeah, absolutely. 01:16:48.660 --> 01:16:50.740 And I mean, not even talking about PG Mustard, 01:16:50.740 --> 01:16:51.700 but tools in general, 01:16:51.700 --> 01:16:53.900 I think developers probably underestimate 01:16:53.900 --> 01:16:55.380 how valuable their time is, 01:16:55.380 --> 01:16:58.500 not just from a salary per hour perspective, 01:16:58.500 --> 01:17:02.100 but also from a, if your management team 01:17:02.100 --> 01:17:04.180 or your boss or whoever could choose 01:17:04.180 --> 01:17:08.460 whether you would ship the next year's worth of features today, 01:17:08.460 --> 01:17:11.340 and they could pay, let's say, 10 times your salary, 01:17:11.340 --> 01:17:13.940 a lot of them would actually take that trade. 01:17:13.940 --> 01:17:16.660 I think every company in the world is hiring developers. 01:17:16.660 --> 01:17:20.220 Most people have a huge backlog that they want to be building. 01:17:20.220 --> 01:17:26.460 So this is an awful example in terms of spending money to solve that, 01:17:26.460 --> 01:17:29.060 but even if it comes to more expensive tools, 01:17:29.060 --> 01:17:31.420 I think you'd be surprised how much, 01:17:31.420 --> 01:17:33.860 if they can make you more efficient, 01:17:33.860 --> 01:17:37.580 companies will probably think of it as a better trade than you do. 01:17:37.580 --> 01:17:43.780 Yeah, I agree. I do think that we often muddle along using 01:17:43.780 --> 01:17:47.060 something that we should be better off paying a decent 01:17:47.060 --> 01:17:50.060 amount. I'm talking about like $2,000 a month subscriptions of 01:17:50.060 --> 01:17:54.980 some insane tool that may or may not be helpful. But also, just 01:17:54.980 --> 01:17:59.020 from a joy, you're like, do you want to go to work and work on 01:17:59.020 --> 01:18:02.260 stuff? Or are you like, God, I'm in the log files again. I just, 01:18:02.740 --> 01:18:05.660 I can't do it, I can't do another day of this, right? 01:18:05.660 --> 01:18:07.580 Versus, oh, like I really have a good understanding 01:18:07.580 --> 01:18:10.020 I'm building something that I'm proud to share with my friends 01:18:10.020 --> 01:18:11.860 look how fast and zippy this site is 01:18:11.860 --> 01:18:15.100 versus, yeah, we're working on it, you know? 01:18:15.100 --> 01:18:18.340 Like there is a pride and a joy aspect of being better 01:18:18.340 --> 01:18:21.020 and if tools get you there, that's great. 01:18:21.020 --> 01:18:23.700 All right, speaking of tools, 01:18:23.700 --> 01:18:25.420 I think we're about out of time for this 01:18:25.420 --> 01:18:27.920 but final two questions before we get out of here. 01:18:31.060 --> 01:18:35.140 I think I want to focus on database GUI editors 01:18:35.140 --> 01:18:37.180 rather than text editors for this episode. 01:18:37.180 --> 01:18:39.660 And we talked about a bunch. 01:18:39.660 --> 01:18:41.900 So maybe we could just quick reference back to that. 01:18:41.900 --> 01:18:46.300 But favorite database GUI tool these days? 01:18:46.300 --> 01:18:48.820 - Oh, I'm going to give it a shot. 01:18:48.820 --> 01:18:52.300 - Do you want to go with PSQL? 01:18:52.300 --> 01:18:54.860 - No, I'm going to risk the wrath of the Postgres crowd 01:18:54.860 --> 01:18:57.060 and say Archetype. 01:18:57.060 --> 01:18:57.460 - Archetype. 01:18:57.460 --> 01:19:00.660 All right, yeah, Archetype looks pretty neat. 01:19:00.660 --> 01:19:05.620 And then, normally I ask about Python extensions, 01:19:05.620 --> 01:19:07.660 how about Postgres, or Python packages, 01:19:07.660 --> 01:19:10.980 how about Postgres extensions for a notable one? 01:19:10.980 --> 01:19:15.220 - Yeah, so I think the Postgres extension 01:19:15.220 --> 01:19:17.020 that everybody should install really, 01:19:17.020 --> 01:19:20.280 if you're using Postgres of course, is pgstatstatements. 01:19:20.280 --> 01:19:21.420 If you're using a cloud provider, 01:19:21.420 --> 01:19:23.880 it's probably on by default already. 01:19:23.880 --> 01:19:26.440 So in that case, I'd give a shout out 01:19:26.440 --> 01:19:28.740 to my favorite, autoexplain. 01:19:30.060 --> 01:19:32.020 - Cool, awesome. 01:19:32.020 --> 01:19:34.060 All right, Michael, well, thanks for giving us insight 01:19:34.060 --> 01:19:36.120 to all these different things we can do 01:19:36.120 --> 01:19:38.820 to make our databases work better, especially Postgres. 01:19:38.820 --> 01:19:40.700 And thanks for making PG Mustard. 01:19:40.700 --> 01:19:42.340 I'm sure it'll help a lot of people. 01:19:42.340 --> 01:19:43.180 Final call to action, 01:19:43.180 --> 01:19:45.580 people wanna get better with their databases. 01:19:45.580 --> 01:19:47.420 Maybe pick up some of these tools or these ideas. 01:19:47.420 --> 01:19:48.500 What do you tell them? 01:19:48.500 --> 01:19:52.940 - Ooh, don't be scared to learn. 01:19:52.940 --> 01:19:56.020 It probably looks more intimidating than it is. 01:19:56.020 --> 01:19:58.000 The Postgres documentation's awesome, 01:19:58.000 --> 01:20:00.720 so a lot of the other databases out there. 01:20:00.720 --> 01:20:03.920 I know it's a bit of a meme and a joke these days, 01:20:03.920 --> 01:20:05.520 but do read the documentation. 01:20:05.520 --> 01:20:09.800 It's written, it's been written and revised over years 01:20:09.800 --> 01:20:11.160 by some really smart people, 01:20:11.160 --> 01:20:12.800 but it's very approachable still. 01:20:12.800 --> 01:20:15.480 - Fantastic, all right. 01:20:15.480 --> 01:20:17.320 Well, thanks a bunch.