Merged
61 changes: 30 additions & 31 deletions transcripts/474-python-performance-for-data-science.txt
@@ -18,7 +18,7 @@

00:00:46 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy,

00:00:50 and follow the podcast using @talkpython, both accounts over at fostodon.org.
00:00:50 and follow the podcast using @talkpython, both accounts over at fosstodon.org.

00:00:56 And keep up with the show and listen to over nine years of episodes at talkpython.fm.

@@ -300,7 +300,7 @@

00:10:56 you might actually care line by line how much time is being taken. And that can be a better

00:11:00 way to think about it. And so I think the tool is called LineProf. I forget the exact URL, but
00:11:00 way to think about it. And so I think the tool is called LineProfiler. I forget the exact URL, but

00:11:06 it's an excellent tool in Python for, there's one in R and there's an equivalent one. Yes. Robert
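
The function-level vs. line-level distinction discussed here can be sketched with the standard library's cProfile; line_profiler (the third-party tool mentioned) then narrows a hot function down to individual lines. A minimal stdlib sketch — slow_sum is a made-up example, not from the episode:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately slow: a pure-Python loop instead of sum(range(n)).
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Render the per-function stats to a string instead of stdout.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

With line_profiler you would instead decorate slow_sum with @profile and run the script under `kernprof -l -v` to see how much time each individual line takes.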

@@ -492,7 +492,7 @@

00:18:56 And that kind of data structures before you apply Numba JIT compilation to it. Does that mean

00:19:03 list as in bracket bracket or these NumPy type vector things? We all have different definitions.
00:19:03 list as in bracket or these NumPy type vector things? We all have different definitions.

00:19:10 Yes. That's true. Array to Bay. Generally, yeah. Usually the go-to I talked about is a NumPy array.

@@ -536,7 +536,7 @@

00:20:49 special class that you can create. The downside by the way, is that, and the reason we have those,

00:20:55 and we don't just take, historically Numby used to try and let you pass in a Python list,
00:20:55 and we don't just take, historically Numba used to try and let you pass in a Python list,

00:21:00 is that wrapper function would have to go recursively through the list of list of lists
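
The point about Numba preferring NumPy arrays over plain Python lists can be sketched as below. The try/except fallback is only there so the snippet runs even without Numba installed; the function name is illustrative, not from the episode:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: a no-op
    # decorator that returns the function unchanged.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def total(arr):
    # Numba compiles this loop to machine code. With a plain Python
    # list it would first have to walk the whole (possibly nested)
    # structure to infer types -- the recursive inspection described
    # above -- which is why a NumPy array (or numba.typed.List) is
    # the preferred input.
    s = 0.0
    for x in arr:
        s += x
    return s

values = np.arange(1_000, dtype=np.float64)
print(total(values))  # 499500.0
```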

@@ -598,11 +598,11 @@

00:23:20 GPU programming. I will say Numba might not be the best place to start with GPU programming in

00:23:25 Python because there's a great project called Coupy, C-U-P-Y, that is literally a copy of NumPy,
00:23:25 Python because there's a great project called CuPy, C-U-P-Y, that is literally a copy of NumPy,

00:23:33 but does all of the computation on the GPU. And Coupy works great with Numba. So I often tell
00:23:33 but does all of the computation on the GPU. And CuPy works great with Numba. So I often tell

00:23:38 people, if you're curious, start with Coupy, use some of those NumPy functions to get a sense of,
00:23:38 people, if you're curious, start with CuPy, use some of those NumPy functions to get a sense of,

00:23:44 you know, when is an array big enough to matter on the GPU, that sort of thing. And then when you
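
Because CuPy deliberately mirrors NumPy's API, a common way to "start with CuPy" is to alias whichever module is available and write the numeric code once. A sketch that assumes NumPy is installed and falls back to the CPU when CuPy (or a GPU) is absent — the normalize function is a made-up example:

```python
import numpy as np

try:
    import cupy as cp  # GPU-backed, NumPy-compatible array library
    xp = cp
except ImportError:
    xp = np  # no CuPy/GPU available: run the same code on the CPU

def normalize(a):
    # Identical code runs on CPU (NumPy) or GPU (CuPy), because
    # both libraries expose the same functions used here.
    return (a - xp.mean(a)) / xp.std(a)

a = xp.arange(10, dtype=xp.float64)
out = normalize(a)
```

When CuPy is active, `cp.asnumpy(out)` copies the result back to host memory; that round trip is part of judging "when is an array big enough to matter on the GPU."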

@@ -660,11 +660,11 @@

00:25:58 specifically built for it. Maybe DuckDB has got something going on here, but also MongoDB has,

00:26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at LanceDB
00:26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at LanceDB

00:26:09 is one I've seen mentioned by used by a couple of projects. That's just for vector stuff. It

00:26:14 doesn't do anything else. LanceDB. LanceDB. Okay. I heard about it in the context of another Python
00:26:14 doesn't do anything else. LanceDB. LanceDB. Okay. I heard about it in the context of another Python

00:26:20 LLM project. Well, that's news to me, but it is a developer friendly open source database for AI.

@@ -702,7 +702,7 @@

00:27:46 those two cases and will generate different code for those two cases. So this is stuff that you as

00:27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole nother
00:27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole other

00:27:56 level. So you were like, okay, well, if it's laid out in this order, it's probably this, it appears

@@ -774,7 +774,7 @@

00:30:46 really different use cases and they're getting the same JIT and it has to work for both of them. But

00:30:50 you know, combinatorially explode that problem, right?
00:30:50 you know, combinatorially explode that problem, right?

00:30:53 - Yeah. And you know, all the different hardware, I mean, Numba supports a lot of different

@@ -844,7 +844,7 @@

00:33:29 see how rapidly it can evolve. That'll be really interesting. - Yeah. And this whole copy and

00:33:33 patch and jit is we often hear people say, I'm a computer, I have a computer science degree.
00:33:33 patch and JIT is we often hear people say, I'm a computer, I have a computer science degree.

00:33:38 And I think what that really means is I have a software engineering degree in, or I am a software

@@ -854,7 +854,7 @@

00:33:55 I write JSON API. So I talk to databases. This is like true new research out of legitimate computer

00:34:02 science, right? This copy and patch, jit. - Yeah. They mentioned, I mean, they cite a paper from
00:34:02 science, right? This copy-and-patch JIT. - Yeah. They mentioned, I mean, they cite a paper from

00:34:05 2021 and in computer science, going from paper to implementation in one of the most popular

@@ -1046,7 +1046,7 @@

00:41:24 you compiled ahead of time to a library, capturing the LLVM bitcode so that you could pull it out and

00:41:29 embed it into your JIT, which might be have other LLVM bitcodes. So then you can optimize, you can
00:41:29 embed it into your JIT, which might have other LLVM bitcodes. So then you can optimize, you can

00:41:35 have a function you wrote in Python that calls a function in C and you could actually optimize

@@ -1104,7 +1104,7 @@

00:43:26 Maybe in 10 years, CPython will be our Python and it'll be written in Rust. I mean,

00:43:32 if we move to WebAssembly and like PyScript, Pyodad, Land a lot, having that right in,
00:43:32 if we move to WebAssembly and like PyScript, Pyodide, Land a lot, having that right in,

00:43:36 there's a non-zero probability, but it's not a high number, I suppose. Speaking of something

@@ -1158,19 +1158,19 @@

00:45:23 or Fortran, as long as you weren't touching Python objects directly, you could release

00:45:27 the gill. And so Python, so especially in the scientific and computing and data science
00:45:27 the GIL. And so Python, so especially in the scientific and computing and data science

00:45:31 space, where multi-threaded code has been around for a long time and we've been using

00:45:35 it and it's fine, Dask, you can use workers with threads or processes or both. And so

00:45:40 I frequently will use Dask with four threads and that's totally fine because most of the

00:45:44 codes in NumPy and Pandas, that release the gill. But that's only a few use cases. And
00:45:44 codes in NumPy and Pandas, that release the GIL. But that's only a few use cases. And

00:45:48 so if you want to expand that to the whole Python interpreter, you have to get rid of

00:45:52 the gill. You have to have a more fine-grained approach to concurrency. And so this proposal
00:45:52 the GIL. You have to have a more fine-grained approach to concurrency. And so this proposal

00:45:58 from Sam Gross at Meta was basically a, one of many historical attempts to kind of make
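
The "extension code can release the GIL" point can be demonstrated with the standard library alone: hashlib drops the GIL while hashing large buffers, so a thread pool gets real parallelism — the same mechanism that lets NumPy and Pandas kernels scale under Dask's threaded workers. A minimal sketch; the buffer sizes and worker count are arbitrary:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Eight distinct 1 MB buffers standing in for chunks of real data.
chunks = [bytes([i]) * 1_000_000 for i in range(8)]

def digest(chunk):
    # hashlib releases the GIL while hashing buffers this large,
    # so these calls can genuinely overlap across cores.
    return hashlib.sha256(chunk).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(digest, chunks))
```

A pure-Python loop in digest would not overlap this way on a GIL build, which is exactly the limitation free-threaded CPython aims to remove.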

@@ -1192,7 +1192,7 @@

00:46:52 is now 50% slower. And that's what most people do and we don't accept it. All right. That's

00:46:58 the one of the sides, you know, the galactomy and all that was kind of in that realm, I believe.
00:46:58 the one of the sides, you know, the Gilectomy and all that was kind of in that realm, I believe.

00:47:02 The other is yet to be determined, I think, is much like the Python two to three shift.

@@ -1266,7 +1266,7 @@

00:49:40 you know, if even if I have like read only data, I might if I have to load two gigabytes of data

00:49:44 in every process, and I want to start start 32 of them because I have a nice big computer.
00:49:44 in every process, and I want to start 32 of them because I have a nice big computer.

00:49:48 I've just 32 X my data, my memory usage, just so that I can have multiple concurrent computations.
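
The 32x memory blow-up described here is what multiprocessing.shared_memory exists to avoid: one copy of read-only data that each worker attaches to by name. A stdlib sketch — the worker is called in-process for brevity; in real use you would pass shm.name to pool workers:

```python
from multiprocessing import shared_memory

# Parent creates ONE shared block instead of duplicating the data
# into every worker process. (Stand-in for a large read-only dataset.)
data = bytes(range(256)) * 4
shm = shared_memory.SharedMemory(create=True, size=len(data))
shm.buf[:len(data)] = data

def worker(name, size):
    # A worker process attaches to the block by name -- no copy made.
    existing = shared_memory.SharedMemory(name=name)
    total = sum(existing.buf[:size])  # read-only access to shared bytes
    existing.close()
    return total

result = worker(shm.name, len(data))

shm.close()
shm.unlink()  # free the block once all workers are done
```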

@@ -1370,7 +1370,7 @@

00:53:43 three 13. So now we're at the first rung of the ladder of iOS and Android support in CPython.

00:53:48 That's awesome. Poga and briefcase, the two components of beware are really focused again
00:53:48 That's awesome. Toga and Briefcase, the two components of BeeWare are really focused again

00:53:52 on that. Yeah. How do I make apps? How do I make it for desktop and mobile? And so, but it's,

@@ -1434,9 +1434,9 @@

00:56:14 this and you can get the free threaded Docker version or whatever. Right. We've already put

00:56:18 out conda packages as well. So if you want to build a conda environment, yeah, actually, if
00:56:18 out Conda packages as well. So if you want to build a Conda environment, yeah, actually, if

00:56:21 you jump over to the, the PI free thread page. Yeah. Tell people about this. Yeah. We didn't
00:56:21 you jump over to the, the py-free-threading page. Yeah. Tell people about this. Yeah. We didn't

00:56:25 make this. This is the, the community made this, the scientific Python community put this together.

@@ -1448,11 +1448,11 @@

00:56:44 what are your options for installing the free threaded CPython? You can get it from Ubuntu or

00:56:48 high-end for conda. If you go look at the you know, and you could build it from source or get
00:56:48 high-end for Conda. If you go look at the you know, and you could build it from source or get

00:56:53 a container. Yeah. So these are, again, this is very focused on the kind of things the scientific

00:56:57 Python community cares about, but, but these are things like, you know, have we ported Scython?
00:56:57 Python community cares about, but, but these are things like, you know, have we ported Cython?

00:57:00 Have we ported NumPy? Is it being automatically tested? Which release has it? And the nice thing

@@ -1466,7 +1466,7 @@

00:57:26 can choose to upload wheels for both versions and make it easier for people to test out stuff. So

00:57:31 for example, I mean, Scython, it looks like there are nightly wheels already being built. And so
00:57:31 for example, I mean, Cython, it looks like there are nightly wheels already being built. And so

00:57:36 this is, they're moving fast and, and, you know, definitely, and our condo, we're also very

@@ -1482,7 +1482,7 @@

00:58:01 There was something like this for Python two to three. I remember it showed like the top,

00:58:05 top 1000 packages on IPI. And then how many of them were compatible with Python three,
00:58:05 top 1000 packages on PyPI. And then how many of them were compatible with Python three,

00:58:11 basically by expressing their language tag or something like that.

@@ -1566,7 +1566,7 @@

01:01:05 but I think that's a really great way to approach it because often there's always been this tension

01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different
01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different

01:01:16 keywords? Do I lose the thing I loved about Python, which was how quickly I could express my

@@ -1640,13 +1640,13 @@

01:03:59 I mean, I know it's awesome that Py2App and PyInstaller and PyFreeze are doing their things

01:04:03 that Togr are doing, doing their things to try to make this happen. But I feel like they're kind of
01:04:03 that Toga are doing, doing their things to try to make this happen. But I feel like they're kind of

01:04:07 looking in at Python and go like, how can we grab what we need out of Python and jam it into an

01:04:12 executable and make it work? Like, should we be encouraging the core developers to just go like a,

01:04:16 a Python, MyScript, --windows and they're out, you get in .exe or something.
01:04:16 a python myscript.py --windows and they're out, you get a .exe or something.

01:04:22 I don't know, actually, that would be a great question. Actually, I would ask Russell that

@@ -1741,4 +1741,3 @@
01:07:52 at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening.

01:07:58 I really appreciate it. Now get out there and write some Python code.