In this post we will look at indexes in detail. If you have read our earlier posts, you will have noticed that whenever we print a dataframe, the leftmost column is a running number that starts at 0 and goes up to the last row. That is the default index.
Even though these default indexes are unique, they are not very useful on their own. What we can do instead is replace them with any one of the columns, and that column does not need to be unique either. Once that is done, we can use an index label (i.e., its value) to access the matching row. To do that, call set_index on the dataframe.
IN[1]
df.set_index('email')
OUT[1]
You might think the dataframe is now altered, and the answer is both yes and no. set_index returns a new dataframe with email as the index, but pandas does not alter the original unless another argument is passed. So if you print df again, you will still see the default index.
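To make the "yes and no" concrete, here is a minimal sketch. The sample rows, the column name first, and the variable name indexed are made up for illustration and are not the data used in this series.

```python
import pandas as pd

# Hypothetical sample data; only the 'email' column name comes from the post.
df = pd.DataFrame({
    "first": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
})

# set_index returns a new dataframe and leaves df untouched.
indexed = df.set_index("email")

print(list(df.columns))       # 'email' is still a regular column in df
print(list(df.index))         # df still has the default 0, 1, 2 index
print(list(indexed.columns))  # in the copy, 'email' has moved into the index
print(indexed.index.name)
```

If you want to keep the result without the extra argument, assigning it back (df = df.set_index('email')) should have the same effect.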
IN[2]
df.set_index('email', inplace=True)
OUT[2]
By setting the argument named inplace to True, we commit the change to our dataframe, and the email column finally becomes our index. Note that when you print the dataframe, the index header (email) appears on a line/row below the other column headers, whereas the default index has no header at all.
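A small sketch of the in-place variant, again on made-up sample data (the rows and the first column are assumptions). One thing worth knowing is that with inplace=True the call itself returns None, so its result should not be assigned back to df.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "first": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
})

# With inplace=True the dataframe is modified directly and None is returned.
result = df.set_index("email", inplace=True)

print(result)            # None
print(df.index.name)     # the index is now named 'email'
print(list(df.columns))  # 'email' is no longer among the regular columns
```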
IN[3]
df.index
OUT[3]
We can also print the index on its own, so that we can see what the current index labels are and work with them accordingly.
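As a rough illustration of what the index object offers, the sketch below inspects it on made-up sample data (rows and the first column are assumptions). The name, is_unique and tolist members are standard parts of a pandas Index.

```python
import pandas as pd

# Hypothetical sample data, indexed by email.
df = pd.DataFrame({
    "first": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
}).set_index("email")

print(df.index)            # the Index object holding all labels
print(df.index.name)       # name of the index, here 'email'
print(df.index.is_unique)  # whether every label occurs only once
labels = df.index.tolist() # plain Python list of the labels
print(labels)
```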
IN[4]
df.loc['prathapdom@gmail.com']
OUT[4]
In our earlier post we learnt that loc fetches a row based on its label. Since we changed the index to email, we pass in the email address as the label to get the matching row.
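Here is a minimal sketch of label-based lookup on made-up sample data (the addresses and the first column are hypothetical). It also shows that loc accepts a column name as a second argument if only one value is needed.

```python
import pandas as pd

# Hypothetical sample data, indexed by email.
df = pd.DataFrame({
    "first": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
}).set_index("email")

# A unique label returns the whole row as a Series.
row = df.loc["ben@example.com"]
print(row)

# Label plus column name returns a single value.
first_name = df.loc["ben@example.com", "first"]
print(first_name)
```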
IN[5]
df.loc[0]
OUT[5]
As we no longer have the default index, trying to fetch a row by its old integer label with loc makes the interpreter raise a KeyError.
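The sketch below reproduces the error on made-up sample data (rows and the first column are assumptions). It also shows two options that are not covered in this post: iloc for position-based access, and reset_index to bring the default index back.

```python
import pandas as pd

# Hypothetical sample data, indexed by email.
df = pd.DataFrame({
    "first": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
}).set_index("email")

# 0 is not one of the email labels, so loc raises KeyError.
error = None
try:
    df.loc[0]
except KeyError as exc:
    error = exc
print(type(error).__name__)

# iloc looks rows up by position, so it still works with any index.
first_row = df.iloc[0]
print(first_row["first"])

# reset_index returns a copy with the default index and 'email' as a column again.
restored = df.reset_index()
print(list(restored.columns))
```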
IN[6]
df.sort_index(inplace=True)
OUT[6]
Since we are using labels as the index, it can come in handy to sort them. Simply call sort_index.
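A short sketch of sorting by index, using made-up rows that start out unsorted (the data and the first column are assumptions). The ascending parameter is part of sort_index and reverses the order when set to False.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of alphabetical order.
df = pd.DataFrame({
    "first": ["Chen", "Asha", "Ben"],
    "email": ["chen@example.com", "asha@example.com", "ben@example.com"],
}).set_index("email")

# Sort the rows by index label, modifying df directly.
df.sort_index(inplace=True)
print(df.index.tolist())

# Without inplace, a sorted copy is returned; here in descending order.
descending = df.sort_index(ascending=False)
print(descending.index.tolist())
```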