Commit 371ee4e

Merge pull request #159 from manikanta-hitunik-com/patch-159
Update 474-python-performance-for-data-science.txt
2 parents b191fca + 72ff81c

File tree

1 file changed: +30 −31 lines


transcripts/474-python-performance-for-data-science.txt

Lines changed: 30 additions & 31 deletions
@@ -18,7 +18,7 @@
 
 00:00:46 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy,
 
-00:00:50 and follow the podcast using @talkpython, both accounts over at fostodon.org.
+00:00:50 and follow the podcast using @talkpython, both accounts over at fosstodon.org.
 
 00:00:56 And keep up with the show and listen to over nine years of episodes at talkpython.fm.
 

@@ -300,7 +300,7 @@
 
 00:10:56 you might actually care line by line how much time is being taken. And that can be a better
 
-00:11:00 way to think about it. And so I think the tool is called LineProf. I forget the exact URL, but
+00:11:00 way to think about it. And so I think the tool is called LineProfiler. I forget the exact URL, but
 
 00:11:06 it's an excellent tool in Python for, there's one in R and there's an equivalent one. Yes. Robert
 

@@ -492,7 +492,7 @@
 
 00:18:56 And that kind of data structures before you apply Numba JIT compilation to it. Does that mean
 
-00:19:03 list as in bracket bracket or these NumPy type vector things? We all have different definitions.
+00:19:03 list as in bracket or these NumPy type vector things? We all have different definitions.
 
 00:19:10 Yes. That's true. Array to Bay. Generally, yeah. Usually the go-to I talked about is a NumPy array.
 

@@ -536,7 +536,7 @@
 
 00:20:49 special class that you can create. The downside by the way, is that, and the reason we have those,
 
-00:20:55 and we don't just take, historically Numby used to try and let you pass in a Python list,
+00:20:55 and we don't just take, historically Numba used to try and let you pass in a Python list,
 
 00:21:00 is that wrapper function would have to go recursively through the list of list of lists
 

@@ -598,11 +598,11 @@
 
 00:23:20 GPU programming. I will say Numba might not be the best place to start with GPU programming in
 
-00:23:25 Python because there's a great project called Coupy, C-U-P-Y, that is literally a copy of NumPy,
+00:23:25 Python because there's a great project called Cupy, C-U-P-Y, that is literally a copy of NumPy,
 
-00:23:33 but does all of the computation on the GPU. And Coupy works great with Numba. So I often tell
+00:23:33 but does all of the computation on the GPU. And CUPY works great with Numba. So I often tell
 
-00:23:38 people, if you're curious, start with Coupy, use some of those NumPy functions to get a sense of,
+00:23:38 people, if you're curious, start with CUPY, use some of those NumPy functions to get a sense of,
 
 00:23:44 you know, when is an array big enough to matter on the GPU, that sort of thing. And then when you
 

@@ -660,11 +660,11 @@
 
 00:25:58 specifically built for it. Maybe DuckDB has got something going on here, but also MongoDB has,
 
-00:26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at LanceDB
+00:26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at Lance DB
 
 00:26:09 is one I've seen mentioned by used by a couple of projects. That's just for vector stuff. It
 
-00:26:14 doesn't do anything else. LanceDB. LanceDB. Okay. I heard about it in the context of another Python
+00:26:14 doesn't do anything else. Lance DB. Lance DB. Okay. I heard about it in the context of another Python
 
 00:26:20 LLM project. Well, that's news to me, but it is a developer friendly open source database for AI.
 

@@ -702,7 +702,7 @@
 
 00:27:46 those two cases and will generate different code for those two cases. So this is stuff that you as
 
-00:27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole nother
+00:27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole another
 
 00:27:56 level. So you were like, okay, well, if it's laid out in this order, it's probably this, it appears
 

@@ -774,7 +774,7 @@
 
 00:30:46 really different use cases and they're getting the same JIT and it has to work for both of them. But
 
-00:30:50 you know, combinatorially explode that problem, right?
+00:30:50 you know, combinatorically explode that problem, right?
 
 00:30:53 - Yeah. And you know, all the different hardware, I mean, Numba supports a lot of different
 

@@ -844,7 +844,7 @@
 
 00:33:29 see how rapidly it can evolve. That'll be really interesting. - Yeah. And this whole copy and
 
-00:33:33 patch and jit is we often hear people say, I'm a computer, I have a computer science degree.
+00:33:33 patch and Jit is we often hear people say, I'm a computer, I have a computer science degree.
 
 00:33:38 And I think what that really means is I have a software engineering degree in, or I am a software
 

@@ -854,7 +854,7 @@
 
 00:33:55 I write JSON API. So I talk to databases. This is like true new research out of legitimate computer
 
-00:34:02 science, right? This copy and patch, jit. - Yeah. They mentioned, I mean, they cite a paper from
+00:34:02 science, right? This copy and patch, Jit. - Yeah. They mentioned, I mean, they cite a paper from
 
 00:34:05 2021 and in computer science, going from paper to implementation in one of the most popular
 

@@ -1046,7 +1046,7 @@
 
 00:41:24 you compiled ahead of time to a library, capturing the LLVM bitcode so that you could pull it out and
 
-00:41:29 embed it into your JIT, which might be have other LLVM bitcodes. So then you can optimize, you can
+00:41:29 embed it into your JIT, which might be have other LLVM bit codes. So then you can optimize, you can
 
 00:41:35 have a function you wrote in Python that calls a function in C and you could actually optimize
 

@@ -1104,7 +1104,7 @@
 
 00:43:26 Maybe in 10 years, CPython will be our Python and it'll be written in Rust. I mean,
 
-00:43:32 if we move to WebAssembly and like PyScript, Pyodad, Land a lot, having that right in,
+00:43:32 if we move to Web Assembly and like PyScript, Pyodide, Land a lot, having that right in,
 
 00:43:36 there's a non-zero probability, but it's not a high number, I suppose. Speaking of something
 

@@ -1158,19 +1158,19 @@
 
 00:45:23 or Fortran, as long as you weren't touching Python objects directly, you could release
 
-00:45:27 the gill. And so Python, so especially in the scientific and computing and data science
+00:45:27 the GIL. And so Python, so especially in the scientific and computing and data science
 
 00:45:31 space, where multi-threaded code has been around for a long time and we've been using
 
 00:45:35 it and it's fine, Dask, you can use workers with threads or processes or both. And so
 
 00:45:40 I frequently will use Dask with four threads and that's totally fine because most of the
 
-00:45:44 codes in NumPy and Pandas, that release the gill. But that's only a few use cases. And
+00:45:44 codes in NumPy and Pandas, that release the GIL. But that's only a few use cases. And
 
 00:45:48 so if you want to expand that to the whole Python interpreter, you have to get rid of
 
-00:45:52 the gill. You have to have a more fine-grained approach to concurrency. And so this proposal
+00:45:52 the GIL. You have to have a more fine-grained approach to concurrency. And so this proposal
 
 00:45:58 from Sam Gross at Meta was basically a, one of many historical attempts to kind of make
 

@@ -1192,7 +1192,7 @@
 
 00:46:52 is now 50% slower. And that's what most people do and we don't accept it. All right. That's
 
-00:46:58 the one of the sides, you know, the galactomy and all that was kind of in that realm, I believe.
+00:46:58 the one of the sides, you know, the Gilectomy and all that was kind of in that realm, I believe.
 
 00:47:02 The other is yet to be determined, I think, is much like the Python two to three shift.
 

@@ -1266,7 +1266,7 @@
 
 00:49:40 you know, if even if I have like read only data, I might if I have to load two gigabytes of data
 
-00:49:44 in every process, and I want to start start 32 of them because I have a nice big computer.
+00:49:44 in every process, and I want to start 32 of them because I have a nice big computer.
 
 00:49:48 I've just 32 X my data, my memory usage, just so that I can have multiple concurrent computations.
 

@@ -1370,7 +1370,7 @@
 
 00:53:43 three 13. So now we're at the first rung of the ladder of iOS and Android support in CPython.
 
-00:53:48 That's awesome. Poga and briefcase, the two components of beware are really focused again
+00:53:48 That's awesome. Pega and briefcase, the two components of beware are really focused again
 
 00:53:52 on that. Yeah. How do I make apps? How do I make it for desktop and mobile? And so, but it's,
 

@@ -1434,9 +1434,9 @@
 
 00:56:14 this and you can get the free threaded Docker version or whatever. Right. We've already put
 
-00:56:18 out conda packages as well. So if you want to build a conda environment, yeah, actually, if
+00:56:18 out Conda packages as well. So if you want to build a Conda environment, yeah, actually, if
 
-00:56:21 you jump over to the, the PI free thread page. Yeah. Tell people about this. Yeah. We didn't
+00:56:21 you jump over to the, the Py free thread page. Yeah. Tell people about this. Yeah. We didn't
 
 00:56:25 make this. This is the, the community made this, the scientific Python community put this together.
 

@@ -1448,11 +1448,11 @@
 
 00:56:44 what are your options for installing the free threaded CPython? You can get it from Ubuntu or
 
-00:56:48 high-end for conda. If you go look at the you know, and you could build it from source or get
+00:56:48 high-end for Conda. If you go look at the you know, and you could build it from source or get
 
 00:56:53 a container. Yeah. So these are, again, this is very focused on the kind of things the scientific
 
-00:56:57 Python community cares about, but, but these are things like, you know, have we ported Scython?
+00:56:57 Python community cares about, but, but these are things like, you know, have we ported Cython?
 
 00:57:00 Have we ported NumPy? Is it being automatically tested? Which release has it? And the nice thing
 

@@ -1466,7 +1466,7 @@
 
 00:57:26 can choose to upload wheels for both versions and make it easier for people to test out stuff. So
 
-00:57:31 for example, I mean, Scython, it looks like there are nightly wheels already being built. And so
+00:57:31 for example, I mean, Cython, it looks like there are nightly wheels already being built. And so
 
 00:57:36 this is, they're moving fast and, and, you know, definitely, and our condo, we're also very
 

@@ -1482,7 +1482,7 @@
 
 00:58:01 There was something like this for Python two to three. I remember it showed like the top,
 
-00:58:05 top 1000 packages on IPI. And then how many of them were compatible with Python three,
+00:58:05 top 1000 packages on PyPI. And then how many of them were compatible with Python three,
 
 00:58:11 basically by expressing their language tag or something like that.
 

@@ -1566,7 +1566,7 @@
 
 01:01:05 but I think that's a really great way to approach it because often there's always been this tension
 
-01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different
+01:01:10 of, well, if I make Python statically compliable, is it just, you know, C with, you know, different
 
 01:01:16 keywords? Do I lose the thing I loved about Python, which was how quickly I could express my
 

@@ -1640,13 +1640,13 @@
 
 01:03:59 I mean, I know it's awesome that Py2App and PyInstaller and PyFreeze are doing their things
 
-01:04:03 that Togr are doing, doing their things to try to make this happen. But I feel like they're kind of
+01:04:03 that Torg are doing, doing their things to try to make this happen. But I feel like they're kind of
 
 01:04:07 looking in at Python and go like, how can we grab what we need out of Python and jam it into an
 
 01:04:12 executable and make it work? Like, should we be encouraging the core developers to just go like a,
 
-01:04:16 a Python, MyScript, --windows and they're out, you get in .exe or something.
+01:04:16 a Python, PyScript, --windows and they're out, you get in .exe or something.
 
 01:04:22 I don't know, actually, that would be a great question. Actually, I would ask Russell that
 

@@ -1741,4 +1741,3 @@
 01:07:52 at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening.
 
 01:07:58 I really appreciate it. Now get out there and write some Python code.
-
