diff --git a/transcripts/474-python-performance-for-data-science.txt b/transcripts/474-python-performance-for-data-science.txt
index bc4a1c8f..e439d047 100644
--- a/transcripts/474-python-performance-for-data-science.txt
+++ b/transcripts/474-python-performance-for-data-science.txt
@@ -18,7 +18,7 @@

00:00:46 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy,

-00:00:50 and follow the podcast using @talkpython, both accounts over at fostodon.org.

+00:00:50 and follow the podcast using @talkpython, both accounts over at fosstodon.org.

00:00:56 And keep up with the show and listen to over nine years of episodes at talkpython.fm.

@@ -300,7 +300,7 @@

00:10:56 you might actually care line by line how much time is being taken. And that can be a better

-00:11:00 way to think about it. And so I think the tool is called LineProf. I forget the exact URL, but

+00:11:00 way to think about it. And so I think the tool is called LineProfiler. I forget the exact URL, but

00:11:06 it's an excellent tool in Python for, there's one in R and there's an equivalent one. Yes. Robert

@@ -492,7 +492,7 @@

00:18:56 And that kind of data structures before you apply Numba JIT compilation to it. Does that mean

-00:19:03 list as in bracket bracket or these NumPy type vector things? We all have different definitions.

+00:19:03 list as in bracket or these NumPy type vector things? We all have different definitions.

00:19:10 Yes. That's true. Array to Bay. Generally, yeah. Usually the go-to I talked about is a NumPy array.

@@ -536,7 +536,7 @@

00:20:49 special class that you can create. 
The downside by the way, is that, and the reason we have those,

-00:20:55 and we don't just take, historically Numby used to try and let you pass in a Python list,

+00:20:55 and we don't just take, historically Numba used to try and let you pass in a Python list,

00:21:00 is that wrapper function would have to go recursively through the list of list of lists

@@ -598,11 +598,11 @@

00:23:20 GPU programming. I will say Numba might not be the best place to start with GPU programming in

-00:23:25 Python because there's a great project called Coupy, C-U-P-Y, that is literally a copy of NumPy,

+00:23:25 Python because there's a great project called CuPy, C-U-P-Y, that is literally a copy of NumPy,

-00:23:33 but does all of the computation on the GPU. And Coupy works great with Numba. So I often tell

+00:23:33 but does all of the computation on the GPU. And CuPy works great with Numba. So I often tell

-00:23:38 people, if you're curious, start with Coupy, use some of those NumPy functions to get a sense of,

+00:23:38 people, if you're curious, start with CuPy, use some of those NumPy functions to get a sense of,

00:23:44 you know, when is an array big enough to matter on the GPU, that sort of thing. And then when you

@@ -660,11 +660,11 @@

00:25:58 specifically built for it. Maybe DuckDB has got something going on here, but also MongoDB has,

-00:26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at LanceDB

+00:26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at LanceDB,

00:26:09 is one I've seen mentioned by used by a couple of projects. That's just for vector stuff. It

-00:26:14 doesn't do anything else. LanceDB. LanceDB. Okay. I heard about it in the context of another Python

+00:26:14 doesn't do anything else. LanceDB. LanceDB. Okay. I heard about it in the context of another Python

00:26:20 LLM project. 
Well, that's news to me, but it is a developer friendly open source database for AI.

@@ -702,7 +702,7 @@

00:27:46 those two cases and will generate different code for those two cases. So this is stuff that you as

-00:27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole nother

+00:27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole other

00:27:56 level. So you were like, okay, well, if it's laid out in this order, it's probably this, it appears

@@ -774,7 +774,7 @@

00:30:46 really different use cases and they're getting the same JIT and it has to work for both of them. But

-00:30:50 you know, combinatorially explode that problem, right?

+00:30:50 you know, combinatorially explode that problem, right?

00:30:53 - Yeah. And you know, all the different hardware, I mean, Numba supports a lot of different

@@ -844,7 +844,7 @@

00:33:29 see how rapidly it can evolve. That'll be really interesting. - Yeah. And this whole copy and

-00:33:33 patch and jit is we often hear people say, I'm a computer, I have a computer science degree.

+00:33:33 patch and JIT is we often hear people say, I'm a computer, I have a computer science degree.

00:33:38 And I think what that really means is I have a software engineering degree in, or I am a software

@@ -854,7 +854,7 @@

00:33:55 I write JSON API. So I talk to databases. This is like true new research out of legitimate computer

-00:34:02 science, right? This copy and patch, jit. - Yeah. They mentioned, I mean, they cite a paper from

+00:34:02 science, right? This copy-and-patch JIT. - Yeah. They mentioned, I mean, they cite a paper from

00:34:05 2021 and in computer science, going from paper to implementation in one of the most popular

@@ -1046,7 +1046,7 @@

00:41:24 you compiled ahead of time to a library, capturing the LLVM bitcode so that you could pull it out and

-00:41:29 embed it into your JIT, which might be have other LLVM bitcodes. 
So then you can optimize, you can

+00:41:29 embed it into your JIT, which might have other LLVM bitcodes. So then you can optimize, you can

00:41:35 have a function you wrote in Python that calls a function in C and you could actually optimize

@@ -1104,7 +1104,7 @@

00:43:26 Maybe in 10 years, CPython will be our Python and it'll be written in Rust. I mean,

-00:43:32 if we move to WebAssembly and like PyScript, Pyodad, Land a lot, having that right in,

+00:43:32 if we move to WebAssembly and like PyScript, Pyodide, and all that, having that right in,

00:43:36 there's a non-zero probability, but it's not a high number, I suppose. Speaking of something

@@ -1158,7 +1158,7 @@

00:45:23 or Fortran, as long as you weren't touching Python objects directly, you could release

-00:45:27 the gill. And so Python, so especially in the scientific and computing and data science

+00:45:27 the GIL. And so Python, so especially in the scientific and computing and data science

00:45:31 space, where multi-threaded code has been around for a long time and we've been using

@@ -1166,11 +1166,11 @@

00:45:40 I frequently will use Dask with four threads and that's totally fine because most of the

-00:45:44 codes in NumPy and Pandas, that release the gill. But that's only a few use cases. And

+00:45:44 codes in NumPy and Pandas, that release the GIL. But that's only a few use cases. And

00:45:48 so if you want to expand that to the whole Python interpreter, you have to get rid of

-00:45:52 the gill. You have to have a more fine-grained approach to concurrency. And so this proposal

+00:45:52 the GIL. You have to have a more fine-grained approach to concurrency. And so this proposal

00:45:58 from Sam Gross at Meta was basically a, one of many historical attempts to kind of make

@@ -1192,7 +1192,7 @@

00:46:52 is now 50% slower. And that's what most people do and we don't accept it. All right. 
That's

-00:46:58 the one of the sides, you know, the galactomy and all that was kind of in that realm, I believe.

+00:46:58 the one of the sides, you know, the Gilectomy and all that was kind of in that realm, I believe.

00:47:02 The other is yet to be determined, I think, is much like the Python two to three shift.

@@ -1266,7 +1266,7 @@

00:49:40 you know, if even if I have like read only data, I might if I have to load two gigabytes of data

-00:49:44 in every process, and I want to start start 32 of them because I have a nice big computer.

+00:49:44 in every process, and I want to start 32 of them because I have a nice big computer.

00:49:48 I've just 32 X my data, my memory usage, just so that I can have multiple concurrent computations.

@@ -1370,7 +1370,7 @@

00:53:43 three 13. So now we're at the first rung of the ladder of iOS and Android support in CPython.

-00:53:48 That's awesome. Poga and briefcase, the two components of beware are really focused again

+00:53:48 That's awesome. Toga and Briefcase, the two components of BeeWare are really focused again

00:53:52 on that. Yeah. How do I make apps? How do I make it for desktop and mobile? And so, but it's,

@@ -1434,9 +1434,9 @@

00:56:14 this and you can get the free threaded Docker version or whatever. Right. We've already put

-00:56:18 out conda packages as well. So if you want to build a conda environment, yeah, actually, if

+00:56:18 out conda packages as well. So if you want to build a conda environment, yeah, actually, if

-00:56:21 you jump over to the, the PI free thread page. Yeah. Tell people about this. Yeah. We didn't

+00:56:21 you jump over to the, the py-free-threading page. Yeah. Tell people about this. Yeah. We didn't

00:56:25 make this. This is the, the community made this, the scientific Python community put this together.

@@ -1448,11 +1448,11 @@

00:56:44 what are your options for installing the free threaded CPython? You can get it from Ubuntu or

-00:56:48 high-end for conda. 
If you go look at the you know, and you could build it from source or get

+00:56:48 high-end for conda. If you go look at the, you know, and you could build it from source or get

00:56:53 a container. Yeah. So these are, again, this is very focused on the kind of things the scientific

-00:56:57 Python community cares about, but, but these are things like, you know, have we ported Scython?

+00:56:57 Python community cares about, but, but these are things like, you know, have we ported Cython?

00:57:00 Have we ported NumPy? Is it being automatically tested? Which release has it? And the nice thing

@@ -1466,7 +1466,7 @@

00:57:26 can choose to upload wheels for both versions and make it easier for people to test out stuff. So

-00:57:31 for example, I mean, Scython, it looks like there are nightly wheels already being built. And so

+00:57:31 for example, I mean, Cython, it looks like there are nightly wheels already being built. And so

00:57:36 this is, they're moving fast and, and, you know, definitely, and our condo, we're also very

@@ -1482,7 +1482,7 @@

00:58:01 There was something like this for Python two to three. I remember it showed like the top,

-00:58:05 top 1000 packages on IPI. And then how many of them were compatible with Python three,

+00:58:05 top 1000 packages on PyPI. And then how many of them were compatible with Python three,

00:58:11 basically by expressing their language tag or something like that.

@@ -1566,7 +1566,7 @@

01:01:05 but I think that's a really great way to approach it because often there's always been this tension

-01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different

+01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different

01:01:16 keywords? 
Do I lose the thing I loved about Python, which was how quickly I could express my

@@ -1640,13 +1640,13 @@

01:03:59 I mean, I know it's awesome that Py2App and PyInstaller and PyFreeze are doing their things

-01:04:03 that Togr are doing, doing their things to try to make this happen. But I feel like they're kind of

+01:04:03 that Briefcase is doing, doing their things to try to make this happen. But I feel like they're kind of

01:04:07 looking in at Python and go like, how can we grab what we need out of Python and jam it into an

01:04:12 executable and make it work? Like, should we be encouraging the core developers to just go like a,

-01:04:16 a Python, MyScript, --windows and they're out, you get in .exe or something.

+01:04:16 a Python, MyScript, --windows and they're out, you get an .exe or something.

01:04:22 I don't know, actually, that would be a great question. Actually, I would ask Russell that

@@ -1741,4 +1741,3 @@

01:07:52 at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening.

01:07:58 I really appreciate it. Now get out there and write some Python code.

-