Q1 - How would you do feature scaling in Python?

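import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# mydata is assumed to be an existing DataFrame with a numeric column 'my_column'
scaler = MinMaxScaler()
original_data = pd.DataFrame(mydata['my_column'])
# fit_transform returns a NumPy array, so wrap it back into a DataFrame
scaled_data = pd.DataFrame(scaler.fit_transform(original_data))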
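
For intuition, min-max scaling maps each value x to (x - min) / (max - min), which puts the column in the range [0, 1] (the default feature_range of MinMaxScaler). Below is a minimal pure-Python sketch of that formula; min_max_scale is a hypothetical helper name, not part of scikit-learn, and mapping a constant column to 0.0 is an assumption of this sketch.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; this sketch maps every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30, 50])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```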

Q2 - You have been given the list of positive integers from 1 to n. All the numbers from 1 to n are present except x, and you have to find x. Write code for that.

def find_missing_num(nums):
    # nums holds every integer from 1 to n except one, so n = len(nums) + 1
    n = len(nums) + 1
    expected_sum = n * (n + 1) // 2  # sum of 1..n, kept as an integer
    return expected_sum - sum(nums)

mylist = [1, 5, 6, 3, 4]
print(find_missing_num(mylist))  # 2

Q3 - What is a pickle module in Python?

The pickle module serializes and de-serializes Python objects. Pickling converts an object hierarchy into a byte stream, which can be saved to disk or sent over a network; unpickling reconstructs the object from that byte stream.
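
As a rough illustration, the round trip can be done in memory with pickle.dumps and pickle.loads. The record dictionary below is made-up sample data.

```python
import pickle

record = {"name": "example", "scores": [90, 85], "passed": True}

# Serialize to a bytes object, then rebuild an equal (but distinct) object.
payload = pickle.dumps(record)
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == record)      # True
print(restored is record)      # False
```

To save to a file instead, pickle.dump and pickle.load work the same way on a file opened in binary mode. Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.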

Q4 - How Do You Get Indices of N Maximum Values in a NumPy Array?

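import numpy as np

arr = np.array([10, 30, 20, 40, 50])
N = 3  # number of largest values to find (example value)
# argsort sorts ascending; take the last N indices and reverse them
print(arr.argsort()[-N:][::-1])  # [4 3 1]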

Q5 - What is the difference between .pyc and .py file formats in Python?

.py files contain Python source code, while .pyc files contain the bytecode compiled from that source. When a module is imported, Python caches its bytecode in a .pyc file (under __pycache__ in Python 3) and reuses that cache on later imports as long as the source has not changed, which saves the time of recompiling.
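
One way to see this in practice is to compile a small file by hand with the standard-library py_compile module. This is only a sketch: hello.py is a hypothetical module written to a temporary directory.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "hello.py")  # hypothetical module
    with open(source_path, "w") as f:
        f.write("print('hello')\n")

    # Compile the source to bytecode; the return value is the .pyc path.
    pyc_path = py_compile.compile(source_path)

    # The location where the import system caches bytecode for this source file.
    expected_path = importlib.util.cache_from_source(source_path)
    pyc_exists = os.path.exists(pyc_path)

print(pyc_path == expected_path)  # True
print(pyc_path.endswith(".pyc"))  # True
```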

Q6 - What are global variables and local variables in Python?

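A local variable is a variable assigned inside a function. It exists only in that function's local scope and is not visible from the global scope. Global variables are assigned outside any function, at module level. Any function in the module can read them, but a function must declare the name with the global keyword before assigning to it; otherwise the assignment creates a new local variable.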
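
The difference between reading, shadowing, and rebinding a global can be sketched as follows. The variable and function names are illustrative only.

```python
counter = 0  # global variable


def read_counter():
    # Reading a global needs no declaration.
    return counter


def shadow_counter():
    # Assignment creates a new local name; the global is untouched.
    counter = 100
    return counter


def increment_counter():
    # 'global' is required to rebind the module-level name.
    global counter
    counter += 1
    return counter


print(read_counter())       # 0
print(shadow_counter())     # 100
print(counter)              # 0
print(increment_counter())  # 1
print(counter)              # 1
```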

Q7 - What are lambda functions?

Lambda functions are anonymous functions in Python. They're helpful when you need a very short function that consists of only one expression. Instead of formally defining the small function with a name, body, and return statement, you can write everything in one short line of code using a lambda function.

Here's an example of how lambda functions are defined and used:

# define an anonymous function and call it immediately with the arguments (3, 2)
print((lambda x, y: x + y)(3, 2))  # 5

Q8 - What is a negative index, and how is it used in Python?

A negative index is used in Python to index a list, string, or any other sequence in reverse order (from the end). Thus, [-1] refers to the last element, [-2] refers to the second-to-last element, and so on.
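
A short sketch with a sample list and string shows how negative indices and negative slice bounds behave:

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[-1])   # e
print(letters[-2])   # d
print(letters[-3:])  # ['c', 'd', 'e']
print("python"[-1])  # n

# For a sequence of length n, index -k refers to the same item as n - k.
print(letters[-2] == letters[len(letters) - 2])  # True
```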

Q9 - Do you know about vectorization in pandas?

Vectorization is basically the process of implementing operations on the dataframe without using loops. We instead use functions that are highly optimized. For example, if I want to calculate the sum of all the rows of a column in a dataframe, instead of looping over each row, I can use the aggregation functionality that pandas provides and calculate the sum.
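
To make the contrast concrete, here is a small sketch that computes the same total with a row-by-row loop and with vectorized column operations. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Loop version: iterates row by row in Python.
loop_total = 0.0
for _, row in df.iterrows():
    loop_total += row["price"] * row["quantity"]

# Vectorized version: column-wise multiply, then a built-in aggregation.
vectorized_total = (df["price"] * df["quantity"]).sum()

print(loop_total)        # 140.0
print(vectorized_total)  # 140.0
```

Both give the same answer, but the vectorized form hands the arithmetic to optimized array code instead of running a Python loop over every row.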

Q10 - What is the use of PYTHONPATH?

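PYTHONPATH is an environment variable that tells the Python interpreter where to look for modules imported into a program. Its role is similar to PATH: it holds a list of directories, written in the same format as PATH, which are added to sys.path ahead of the default library locations. It typically lists the directories that hold your own source code and any libraries installed outside the default locations.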
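
The effect can be checked by starting a fresh interpreter with PYTHONPATH set and inspecting sys.path. This is a sketch only; the extra directory is a temporary one created for the demonstration.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra_dir:
    # Copy the current environment and point PYTHONPATH at the extra directory.
    env = dict(os.environ, PYTHONPATH=extra_dir)

    # Start a fresh interpreter and ask whether that directory is on sys.path.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1] in sys.path)", extra_dir],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )

print(result.stdout.strip())  # True
```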

Q10 - What’s the difference between / and // in Python?

Both / and // are division operators. / does true division: it divides the first operand by the second and returns a float, even when both operands are integers. // does floor division: it divides the first operand by the second and rounds the result down to the nearest whole number, returning an int when both operands are ints.

An example: 9 / 2 returns 4.5
An example: 9 // 2 returns 4
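
A few more cases, including negative and float operands, where the two operators are easy to confuse:

```python
print(9 / 2)        # 4.5
print(9 // 2)       # 4
print(8 / 2)        # 4.0  (/ returns a float even when the result is whole)
print(-9 // 2)      # -5   (rounds toward negative infinity, not toward zero)
print(9.0 // 2)     # 4.0  (a float operand gives a float result)
print(int(-9 / 2))  # -4   (truncation toward zero, shown for contrast)
```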

Q11 - Compare pandas and Spark.

Pandas is a good choice for working with small to medium-sized datasets that fit in memory on a single machine, as it is fast at that scale and easy to use. Spark is a better choice for working with large datasets, as it distributes work across a cluster and therefore scales to more data. If the environment is Hadoop-based, Spark integrates smoothly with it.

Q12 - You are given test scores. Write Python code to return bucketed scores of <50, <75, <90, <100.

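import pandas as pd

def test_scores_bucket(df):
    bins = [0, 50, 75, 90, 100]
    labels = ['<50', '<75', '<90', '<100']
    # right=False makes each bin [low, high), so a score of 50 falls in '<75';
    # a score of exactly 100 falls outside every bin and becomes NaN
    df['test score'] = pd.cut(df['test score'], bins, right=False, labels=labels)
    return df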

Q13 - How can you obtain the principal components and the eigenvalues from Scikit-Learn PCA?

import numpy as np
from sklearn.decomposition import PCA

data = np.array([[2.5, 2.4], [0.5, 0.7], [1.1, 0.9]])
pca = PCA()
pca.fit(data)

# principal components (eigenvectors of the covariance matrix), one per row
print(pca.components_)
# eigenvalues (variance explained by each component)
print(pca.explained_variance_)
