Hello, and welcome to Lesson 5 of my tutorial series, “Data Science with Keshav”. To get an overview of what this tutorial series is about, you can check out my other post, Data Science 101. To get to part 1 of the tutorial, where the programming starts, you can follow this link.
In the last article, we talked about some fundamental concepts in Python programming. If you noticed, I included a few very important queries there. In this article, we will talk about data types other than the native types, which are very important for you as a Python programmer (this article might cover lists only). If anything is left regarding data types, I will continue with it in the next article.
Let us start with one of the most used data types in Python, the “LIST”. As the name suggests, a “LIST” lets you put data of the same or different types together. You can access items in a list via indexing and with some other advanced features as well. Let us get into the details with a practical approach.
# Initialization of list
In : x = list() # First method
In : y = [] # Second method
Well, there are two ways of initializing a list. But if we talk about efficient code writing, or say code optimizations, I suggest you go for the second method. You can check the execution time of statements in python using the following code.
In : import timeit
In : exc_time1 = sum(timeit.Timer('x = list()').repeat(repeat=100,number=1000))/100
In : exc_time2 = sum(timeit.Timer('x = []').repeat(repeat=100,number=1000))/100
In : (exc_time2/exc_time1)*100
Out: 26.74822229484029
Here, I am not going to spend time explaining what timeit does; I will cover these things later on. But as you can see, the second method took only about 27% of the time of the first in this run, which makes it roughly 3.7 times faster. If you have any queries regarding this, let’s meet in the comment section at the end of the post. So, initialization is done. Let’s hop into more details. We will start by understanding the range() function, which you will use frequently.
# Understanding range function at first
In : ?range()
Init signature: range(self, /, *args, **kwargs)
Docstring:
range(stop) -> range object
range(start, stop[, step]) -> range object
Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type: type
Let’s start using it.
In : range() # This will throw an error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-5bcbe005bf48> in <module>()
----> 1 range()
TypeError: range expected 1 arguments, got 0
In : range(0)
Out: range(0, 0)
In : list(range(0)) # creates an empty list
Out: []
In : list(range(5))
Out: [0, 1, 2, 3, 4]
In : list(range(5,10))
Out: [5, 6, 7, 8, 9]
In : list(range(5,10,2))
Out: [5, 7, 9]
In : list(range(5,10,0.1)) # Will throw an error, all three arguments should be integers
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-fd2fd60f5526> in <module>()
----> 1 list(range(5,10,0.1))
TypeError: 'float' object cannot be interpreted as an integer
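The docstring above says that step can also be a decrement, which the session does not show. Here is a minimal sketch of that, plus two small properties of range objects; the variable names (countdown, r, empty) are my own and used only for illustration.

```python
# Counting down: with a negative step, range walks from start towards stop.
# The stop value is still excluded.
countdown = list(range(10, 0, -2))
print(countdown)  # [10, 8, 6, 4, 2]

# A range object knows its length and supports indexing
# without being converted to a list first.
r = range(5, 10, 2)
print(len(r), r[0], r[-1])  # 3 5 9

# A range whose start is already past its stop is simply empty, not an error.
empty = list(range(5, 0))
print(empty)  # []
```

So you only need list(...) around a range when you actually want a list, for example to print all of its values at once.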
I believe you can now make your own lists using the range() function. You saw that the call list(range(5,10,0.1)) gave us an error. This is expected, because range() accepts only integers. Now, I am going to give you a challenge.
Can you create a list that starts at 0.9, ends at 11.19, and moves in steps of 0.73? If you can, please give me your answer in the comment section.
Now, the question is, are there any other methods of creating a list of numbers? Yes! There are tons of methods you can use. For example, libraries like numpy can be used to make lists as well. For now, I am going to show you one important concept in list generation called list comprehension.
Here, we are going to create a list of the even numbers between 10 and 20.
# First let's see what even numbers are
In : 1%2 # 1%2 returns the remainder when 1 is divided by 2; 1 is odd, so the remainder is 1
Out: 1
In : 2%2 # 2 is even, so the remainder is 0
Out: 0
In : 22%2 # 22 is even, so the remainder is 0
Out: 0
In : 97877%2 # 97877 is odd, so the remainder is 1
Out: 1
# Now you know how to check for an even number
In : 'even' if 1%2==0 else 'odd'
Out: 'odd'
# Similarly
In : 'even' if 98987%2==0 else 'odd'
Out: 'odd'
# Similarly
In : 'even' if 9898%2==0 else 'odd'
Out: 'even'
Here I have introduced the ternary operator (Python calls it a conditional expression). We may not use this operator much for now, but I want you to understand that if and else are used to check whether some condition is true or false.
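To connect the one-line form with the regular if/else statement, here is a small sketch. The function names parity and parity_short are hypothetical helpers I made up for this comparison; they are not part of any library.

```python
def parity(n):
    """Return 'even' or 'odd' using a regular if/else statement."""
    if n % 2 == 0:
        return 'even'
    else:
        return 'odd'


def parity_short(n):
    """Same check written as a one-line conditional expression."""
    return 'even' if n % 2 == 0 else 'odd'


# Both versions agree on the numbers used in the session above.
for number in (1, 98987, 9898):
    print(number, parity(number), parity_short(number))
```

The conditional expression is handy when you need a value in the middle of a larger expression; the full statement tends to read better once each branch does more than return a single value.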
In : x = [i for i in range(9,21) if i%2==0] # this is a list comprehension
In : x
Out: [10, 12, 14, 16, 18, 20]
I am going to introduce another function, choice(), from the library ‘random’, which randomly selects an element from a list or range object.
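If the comprehension syntax feels dense, it may help to see it next to an ordinary loop that builds the same list. This is only a sketch; the names evens_loop, evens_comp and evens_step are mine.

```python
# Long form: start with an empty list and append every value that passes the test.
evens_loop = []
for i in range(9, 21):
    if i % 2 == 0:
        evens_loop.append(i)

# Short form: the same loop and the same test, written as a list comprehension.
evens_comp = [i for i in range(9, 21) if i % 2 == 0]

# For this particular task, a step of 2 in range() also does the job.
evens_step = list(range(10, 21, 2))

print(evens_loop)                              # [10, 12, 14, 16, 18, 20]
print(evens_loop == evens_comp == evens_step)  # True
```

Read a comprehension from the middle outwards: first the "for" part, then the "if" filter, and finally the expression at the front that decides what goes into the list.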
In : from random import choice
In : choice([1,2,3,4])
Out: 3
In : choice([1,2,3,4])
Out: 4
In : choice([1,2,3,4])
Out: 3
In : choice([1,2,3,4])
Out: 1
In : choice(range(10))
Out: 5
In : choice(range(10))
Out: 2
# I guess now you know the use of choice
# Let's use it in our list comprehension technique to create a list of five random numbers from 23 to 31 (range(23,32) excludes 32).
In : x = [choice(range(23,32)) for i in range(5)]
In : x
Out: [30, 31, 30, 29, 28]
I guess now you can combine these skills to create your own lists. So far we have seen lists of numbers only, but we can create a list of different data types as well.
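Your numbers will differ from mine on every run, because the picks are random. As a hedged sketch, the standard library also offers random.seed() to make a run repeatable and random.choices() to draw several items in one call; the seed value 42 and the variable names below are arbitrary choices of mine.

```python
import random

# Optional: fixing the seed makes the "random" picks repeat on every run.
random.seed(42)

# Same idea as the comprehension above: five picks from 23..31, repeats allowed.
picks = [random.choice(range(23, 32)) for _ in range(5)]

# random.choices draws k items (with replacement) in a single call.
more_picks = random.choices(range(23, 32), k=5)

print(picks)
print(more_picks)
```

The underscore in "for _ in range(5)" is just a conventional name for a loop variable that is never used.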
In : a = list('python')
In : a # list of only characters
Out: ['p', 'y', 't', 'h', 'o', 'n']
In : a = [1, 'hello','c',[1,2,3],range(9,10)] # list of various types of data
In : a
Out: [1, 'hello', 'c', [1, 2, 3], range(9, 10)]
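I mentioned at the start that items in a list are accessed via indexing. Here is a minimal sketch using the same mixed list; the names first, last, inner, middle and type_names are only for illustration.

```python
a = [1, 'hello', 'c', [1, 2, 3], range(9, 10)]

first = a[0]     # indexing starts at 0
last = a[-1]     # negative indices count from the end
inner = a[3][1]  # a[3] is itself a list, so it can be indexed again
middle = a[1:3]  # a slice returns a new list; the stop index is excluded

print(first)   # 1
print(last)    # range(9, 10)
print(inner)   # 2
print(middle)  # ['hello', 'c']

# Each element keeps its own type inside the list.
type_names = [type(item).__name__ for item in a]
print(type_names)  # ['int', 'str', 'str', 'list', 'range']
```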
I think now we should move on to basic operations on lists. I suggest you try them yourself using the techniques I described previously. In the ipython console, you can create a list, type its name followed by a dot, and press Tab to see all the available options.
I think I should not get into more details here and should leave the rest for you to explore. However, if you ever run into any problem, I’ll always be right here to help you out. But for now, I am going to raise one challenge and try to solve it.
Suppose I have the following list:
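As a starting point for that exploration, here is a short sketch of some of the methods you will find on a list. The list nums and its values are made up for the example; the methods themselves are standard list methods.

```python
nums = [3, 1, 2]

nums.append(4)        # add one item at the end      -> [3, 1, 2, 4]
nums.extend([5, 6])   # add every item of a list     -> [3, 1, 2, 4, 5, 6]
nums.insert(0, 0)     # insert the value 0 at index 0 -> [0, 3, 1, 2, 4, 5, 6]
removed = nums.pop()  # remove and return the last item (6)
nums.remove(3)        # remove the first occurrence of the value 3
nums.reverse()        # reverse in place             -> [5, 4, 2, 1, 0]
nums.sort()           # sort in place                -> [0, 1, 2, 4, 5]

print(nums)           # [0, 1, 2, 4, 5]
print(removed)        # 6
print(len(nums), nums.index(4), nums.count(2))  # 5 3 1

# Outside ipython, dir() gives a similar overview to pressing Tab.
public_methods = [name for name in dir(list) if not name.startswith('_')]
print(public_methods)
```

Note that append, extend, insert, remove, reverse and sort all change the list in place and return None, so there is no need to assign their result back to a variable.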
In : x Out: ['21.pdf', '24.pdf', '2.pdf', '20.pdf', '18.pdf', '8.pdf', '10.pdf', '9.pdf', '5.pdf', '6.pdf', '19.pdf', '13.pdf', '23.pdf', '16.pdf', '4.pdf', '14.pdf', '3.pdf', '22.pdf', '17.pdf', '11.pdf', '28.pdf', '1.pdf', '27.pdf', '15.pdf', '26.pdf', '25.pdf', '7.pdf', '12.pdf']
x is a list that contains the names of the pdf files in a directory. I need this list to be in order, like [‘1.pdf’, ‘2.pdf’, ……… ]. Only then can I stack all these pdfs into one single pdf in order.
So how can we do this? We need to sort the list by the number in each file name. I am going to teach you some amazing techniques that you can use in your own work.
# Let us see what the following command does to a string
In : "23.pdf".split('.')
Out: ['23', 'pdf']
# or you can assign "23.pdf" to a variable and do the same
In : a = "23.pdf"
In : a.split('.')
Out: ['23', 'pdf']
# Now just notice what the following does
In : "23.pdf".split('.')[0]
Out: '23'
# We have separated the number from our string "23.pdf" (this works for any string with a similar pattern)
# But the output is still not a number; it is a string, as you can see from the quotes
In : int("23.pdf".split('.')[0])
Out: 23
# This way we get the number from our string as a number, not as a string
I suggest you make sure you understand what we did above. Now we can use the number extracted from each string as the key to sort our list the way we need.
In : x.sort(key=lambda name: int(name.split('.')[0])) # just focus on "int(name.split('.')[0])"
In : x
Out: ['1.pdf', '2.pdf', '3.pdf', '4.pdf', '5.pdf', '6.pdf', '7.pdf', '8.pdf', '9.pdf', '10.pdf', '11.pdf', '12.pdf', '13.pdf', '14.pdf', '15.pdf', '16.pdf', '17.pdf', '18.pdf', '19.pdf', '20.pdf', '21.pdf', '22.pdf', '23.pdf', '24.pdf', '25.pdf', '26.pdf', '27.pdf', '28.pdf']
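To see why the key is needed at all, and as one possible variation, here is a sketch that uses the built-in sorted(), which returns a new list and leaves the original untouched. The helper numeric_key and the shortened list files are my own illustrative names; I use os.path.splitext from the standard library to drop the extension, assuming every name looks like "<number>.pdf".

```python
import os

files = ['21.pdf', '2.pdf', '10.pdf', '1.pdf']


def numeric_key(filename):
    """Return the number in front of the extension, e.g. '21.pdf' -> 21."""
    stem, _extension = os.path.splitext(filename)
    return int(stem)


# Without a key, strings are compared character by character,
# so '10.pdf' ends up before '2.pdf'.
print(sorted(files))  # ['1.pdf', '10.pdf', '2.pdf', '21.pdf']

# With the key, the items are ordered by their numeric value.
ordered = sorted(files, key=numeric_key)
print(ordered)  # ['1.pdf', '2.pdf', '10.pdf', '21.pdf']

# sorted() does not modify the original list, unlike list.sort().
print(files)  # ['21.pdf', '2.pdf', '10.pdf', '1.pdf']
```

A named function does the same job as the lambda; it is just easier to reuse and to test on a single file name, for example numeric_key('23.pdf').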
I know you might have lots of questions. I suggest you write them in comments.
For now, I must put a comma to this article series. See you in the next article.