String Indexing and Splitting |Section 1|Celestial warrior

1

00:00:00,030 --> 00:00:04,290

Hi again, in the previous lecture we

talked about strings and in this lecture

3

00:00:04,290 --> 00:00:10,500

again we are going deeper into learning

strings and specifically we're going to add

5

00:00:10,500 --> 00:00:16,260

indexing. Indexing is a very important

concept not only with strings but

7

00:00:16,260 --> 00:00:21,150

with all the data types that we'll be

looking at later such as lists,

9

00:00:21,150 --> 00:00:26,880

tuples and dictionaries, so please follow

this lecture. Yeah, let's talk about

11

00:00:26,880 --> 00:00:35,070

indexing. Let me open the Python

interactive session. Let me create a

13

00:00:35,070 --> 00:00:44,070

string, "Hi there!"

exclamation mark and close quote.

15

00:00:44,070 --> 00:00:48,989

Now when you create a string, every string

that you create Python under the hood

17

00:00:48,989 --> 00:00:57,930

what it does is it assigns a number to

each of the items of your string, so it

19

00:00:57,930 --> 00:01:06,780

starts from 0 so H will be assigned an

index of 0 and then a 1 for I, 2 for the

21

00:01:06,780 --> 00:01:12,799

space, 3 for T, four, five, six

seven, eight.
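The numbering described above can be printed out directly. This is a minimal sketch, assuming the string is stored in a variable named c (the name the lecture uses later); index_map is a hypothetical name chosen for illustration.

```python
# Assumption: the string is stored in a variable named c, as later in the lecture.
c = "Hi there!"

# enumerate() pairs every character with its zero-based index.
index_map = list(enumerate(c))

for index, char in index_map:
    # repr() makes the space at index 2 visible as ' '.
    print(index, repr(char))
```

The loop prints nine rows, from index 0 for H up to index 8 for the exclamation mark.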

23

00:01:12,799 --> 00:01:18,990

So the exclamation mark character will

have an index of eight, so you would

25

00:01:18,990 --> 00:01:26,360

ask why is this useful? Well this allows

you to extract certain parts from a

27

00:01:26,360 --> 00:01:33,600

string and there is a certain notation to

do that you'd want to access the string.

29

00:01:33,600 --> 00:01:39,270

So in this case that's variable c

holding the string, then you want to use

31

00:01:39,270 --> 00:01:44,579

square brackets so an opening square

bracket and then a closing square

33

00:01:44,579 --> 00:01:49,350

bracket and inside those square brackets you

pass the index of the item that you want

35

00:01:49,350 --> 00:01:58,170

to extract. Let's say I want the T

character so that would be 0 1 2 3.

37

00:01:58,170 --> 00:02:05,490

I pass number 3 there as an index and that

will extract for me the letter T which is

39

00:02:05,490 --> 00:02:10,800

actually a string.

So you can check the type of the

41

00:02:10,800 --> 00:02:15,810

output and you'll see that that's a

string, so a string is made of strings.
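The square-bracket notation and the type check described above could look like this in the interactive session. A minimal sketch, assuming the string is stored in c; the name letter is hypothetical.

```python
# Assumption: c holds the string created at the start of the lecture.
c = "Hi there!"

# Index 3 is the fourth character, counting from 0.
letter = c[3]

print(letter)        # t
print(type(letter))  # <class 'str'>
```

Python has no separate character type, so indexing a string gives back a string of length one.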

43

00:02:15,810 --> 00:02:22,950

You can say so. If you wanted H you'd

pass zero and if you wanted the last

45

00:02:22,950 --> 00:02:28,260

character well you might have to count

so one two three four five six seven

47

00:02:28,260 --> 00:02:33,330

eight.

And so, sorry, I included type there.

49

00:02:33,330 --> 00:02:38,880

Sorry about that!

so c[8] would give you the exclamation

51

00:02:38,880 --> 00:02:45,720

mark, but there's also another indexing

system there that Python uses so in this

53

00:02:45,720 --> 00:02:50,519

case we had to count from the beginning

to the end of the string but wouldn't it

55

00:02:50,519 --> 00:02:56,370

be useful if you had another system that

lets you count from the end, so from

57

00:02:56,370 --> 00:03:00,780

right to left?

Well, such a system exists and it

59

00:03:00,780 --> 00:03:08,400

starts from minus 1, so the exclamation

mark would be minus 1, and now you can

61

00:03:08,400 --> 00:03:18,989

extract that with that. Minus 2 would be

E, R would be minus 3 and so on up to H, so we

63

00:03:18,989 --> 00:03:25,019

have two indexing systems there which

allow you to extract items depending on

65

00:03:25,019 --> 00:03:30,090

whether the items are in the beginning

or near the end. Now what if you want to

67

00:03:30,090 --> 00:03:36,959

extract more than one item from a string?

Let's say you want to extract Hi.
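Before moving on to extracting several characters, the two indexing systems described above can be compared side by side. A minimal sketch, assuming the string is stored in c; the other variable names are hypothetical.

```python
# Assumption: c holds the string from the lecture.
c = "Hi there!"

# Counting from the left: the last character sits at index 8.
last_from_left = c[8]

# Counting from the right: the last character sits at index -1.
last_from_right = c[-1]

print(last_from_left, last_from_right)  # ! !
print(c[-2], c[-3])                     # e r

# The negative system runs all the way back to the first character.
first_from_right = c[-len(c)]           # same as c[-9]
print(first_from_right)                 # H
```

Both systems address the same characters; the negative one saves counting when the item is near the end.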

69

00:03:36,959 --> 00:03:47,090

Well there is a notation for that too, so

Hi spans from index 0 to index 1.

71

00:03:47,090 --> 00:03:57,380

So up to here. So let's try 0 to 1.

That will not work very well, but let's see.

73

00:03:57,380 --> 00:04:05,130

So as I said this will return only H,

so it doesn't return the item with index 1.

75

00:04:05,130 --> 00:04:11,250

That's because splitting in Python, so

we're splitting a string here

77

00:04:11,250 --> 00:04:17,609

splitting in Python is upper bound

exclusive, which means that the upper

79

00:04:17,609 --> 00:04:24,730

bound of the split here is not included

in the output and so if you want to

81

00:04:24,730 --> 00:04:33,070

include I you want to pass the index

after I which is two so this is the string

83

00:04:33,070 --> 00:04:40,420

just for reference, so we extracted the item

with index 0 and the item with index 1.

85

00:04:40,420 --> 00:04:46,770

And then the split stops right there at

index 2 so it doesn't include the space.
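The upper-bound-exclusive behaviour described above can be checked with two extractions. A minimal sketch, assuming the string is stored in c; the names only_h and hi are hypothetical. What the lecture calls splitting is the operation the Python documentation calls slicing.

```python
# Assumption: c holds the string from the lecture.
c = "Hi there!"

# The upper bound 1 is excluded, so only index 0 is returned.
only_h = c[0:1]
print(repr(only_h))  # 'H'

# To include index 1, pass the index after it as the upper bound.
hi = c[0:2]
print(repr(hi))      # 'Hi'
```

A handy consequence of this rule is that the length of the result equals the upper bound minus the lower bound, here 2 - 0 = 2.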

87

00:04:46,770 --> 00:04:54,430

Just hi. If you pass 3 there, that would

include the white space as well and

89

00:04:54,430 --> 00:05:00,310

similarly you can pass like one.

That will include the I and the space. There

91

00:05:00,310 --> 00:05:06,970

are also shortcuts, so if you pass, instead of zero,

nothing to 3, that is the same

93

00:05:06,970 --> 00:05:18,120

as doing 0 to 3, so it will include 0, 1 and

2 and similarly if you want to extract

95

00:05:18,120 --> 00:05:24,760

there with the exclamation mark

well what you'd want to do is T would

97

00:05:24,760 --> 00:05:32,919

have an index of 0 1 2 3 so 3 and then

you can just pass a colon there and

99

00:05:32,919 --> 00:05:38,830

execute it and that will give you There

with the exclamation mark. Let's now try

101

00:05:38,830 --> 00:05:46,240

to extract R and E using negative

indexing, so something to know is that

103

00:05:46,240 --> 00:05:50,200

even though we are using negative

indexing which starts from right to left

105

00:05:50,200 --> 00:05:56,530

the splitting will work from left to

right which means the first index you

107

00:05:56,530 --> 00:06:05,140

want to pass is that of the first item.

So R we pass minus 3 for R because

109

00:06:05,140 --> 00:06:15,300

you know we have minus 1, minus 2 and minus 3.

So minus 3 for R and then we want E

111

00:06:15,300 --> 00:06:21,450

which is minus 2,

but since splitting is upper bound

113

00:06:21,450 --> 00:06:30,430

exclusive we want to pass minus one there.

So let's try that, minus one, execute and yeah

115

00:06:30,430 --> 00:06:35,360

we get the correct output.
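The negative-index extraction just described could be typed like this. A minimal sketch, assuming the string is stored in c; re_part and with_mark are hypothetical names.

```python
# Assumption: c holds the string from the lecture.
c = "Hi there!"

# Start at r (index -3) and stop before the exclamation mark (index -1).
# The lower bound still comes first, because extraction runs left to right.
re_part = c[-3:-1]
print(repr(re_part))    # 're'

# Leaving the upper bound empty runs to the end of the string.
with_mark = c[-3:]
print(repr(with_mark))  # 're!'
```

Swapping the two bounds, as in c[-1:-3], would return an empty string, since there is nothing between those positions when reading left to right.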

That's the idea. If you didn't pass

117

00:06:35,360 --> 00:06:40,640

anything here you'd get the exclamation

mark as well. So it will take a while

119

00:06:40,640 --> 00:06:45,050

until we get used to this, but with some

practice you'll be able to remember

121

00:06:45,050 --> 00:06:52,010

these notations. Yeah, that's what I wanted

to cover in this lecture, so this was

123

00:06:52,010 --> 00:06:58,100

about string indexing and splitting,

so you split using the index that Python

125

00:06:58,100 --> 00:07:02,660

uses in the background. Yeah, I hope

you enjoyed this and I'll talk to you

127

00:07:02,660 --> 00:07:05,470

later, thanks!
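As a recap, the shortcut notations covered in this lecture can be collected in one place. A minimal sketch, assuming the string is stored in c; the variable names on the left are hypothetical.

```python
# Assumption: c holds the string from the lecture.
c = "Hi there!"

first_three = c[0:3]  # indexes 0, 1 and 2
same_thing = c[:3]    # an empty lower bound means "from the start"
middle = c[1:3]       # the i and the space
to_the_end = c[3:]    # an empty upper bound means "to the end"

print(repr(first_three))  # 'Hi '
print(repr(same_thing))   # 'Hi '
print(repr(middle))       # 'i '
print(repr(to_the_end))   # 'there!'
```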
