COMPANY-WISE INTERVIEW QUESTIONS

Cognizant :

● Draw the architecture of your project

● What is the source of data from the client: CSV files or SQL tables?

● Password file in Sqoop

● Compressed import in Sqoop

● What is incremental append?

● Where is the last value of an incremental append saved, and can we view it?

● DDL statements in Hive

● Partitioning & bucketing

● Why did you use managed tables in your project?

● Types of Hive tables, and where you used each type in your project

● Sqoop import with incremental append and lastmodified: where and when did you use them in your project?

● What types of files do you get in your project?

● Draw and explain your project architecture if the given data looks like: 12,34,56  23,56,86  45,67,56,87  56,77,66

● Only one row has one extra column; how do you process it?

● What Hive processing are you doing in your project? Explain.

● What are the requirements for Hive table creation, and how are you creating tables in your project?

● How do you automate that Hive process, and what are the requirements for the automation?

● Explain the requirements in detail

● What type of tables are you using in your project?

● Are you using partitioning and bucketing in your project?

● What is the default partition used in your project?

● Is it possible to delete and update records in a Hive table? Have you used this in your project?

● What are the important requirements for spark-submit?

● What are the processes you are working on? Explain.

● What is your cluster size and the volume of data handled?

● How do you transfer data from one cluster to another?

● What are the optimization techniques used in Hive?

● Explain bucketing and partitioning. In which scenarios do you use each in your project?

● Write the query to create a bucketed table (see the sketch after this list)

● Difference between CLUSTER BY and DISTRIBUTE BY

● What is meant by a broadcast variable?

● Explain transferring data from SQL to Hadoop

● Write Spark Scala code to create a temp view from a text file (see the sketch after this list)
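
Two of the items above ask for actual code: the bucketed-table DDL and a temp view built from a text file. Below is a minimal Spark Scala sketch; the Hive DDL is shown as a comment since it would normally be run in Hive itself, and the table names, input path, and delimiter are assumptions, not part of the original questions.

    import org.apache.spark.sql.SparkSession

    // The bucketed-table DDL as it would be written in Hive
    // (table and column names are hypothetical):
    //
    //   CREATE TABLE sales_bucketed (id INT, amount DOUBLE)
    //   CLUSTERED BY (id) INTO 4 BUCKETS
    //   STORED AS ORC;

    object TempViewDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("TempViewDemo")
          .getOrCreate()

        // Temp view from a plain text file (assumed comma-delimited, no header).
        val df = spark.read
          .option("inferSchema", "true")
          .csv("/data/input/customers.txt")
        df.createOrReplaceTempView("customers_view")
        spark.sql("SELECT * FROM customers_view LIMIT 10").show()
      }
    }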

Wipro :

● Tell me about your project

● How will you merge small files in Hadoop?

● Why are you going for big data?

● Explain MapReduce with an example

● Difference between Avro and Parquet

● How will you insert values into a Hive table from another table using a CASE condition? (see the sketch after this list)

● Syntax of a map-side join, and why do we go for a map-side join?

● What is the maximum size of the small table in a map-side join?

● If you create an internal and an external table on the same directory and then delete the external table, can you still see the data?

● Internal table details

● Explain Hadoop 2.0 architecture with an example

● Split-brain scenario
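
A minimal sketch of the CASE-condition insert, run here through spark.sql so the example stays in Scala; the same INSERT ... SELECT runs in Hive directly. The source and target tables and the banding rule are hypothetical. For the map-side join syntax question, the usual Hive form is a /*+ MAPJOIN(small_table) */ hint on the SELECT.

    import org.apache.spark.sql.SparkSession

    object CaseInsertDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CaseInsertDemo")
          .enableHiveSupport()
          .getOrCreate()

        // INSERT ... SELECT with a CASE condition (hypothetical tables).
        spark.sql(
          """INSERT INTO TABLE orders_tagged
            |SELECT id,
            |       amount,
            |       CASE WHEN amount >= 1000 THEN 'HIGH'
            |            WHEN amount >= 100  THEN 'MEDIUM'
            |            ELSE 'LOW' END AS amount_band
            |FROM staging_orders""".stripMargin)
      }
    }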

Maverick :

● Project details

● Sqoop command for incremental append

● What is the Hive metastore, and where is it saved in the production cluster?

● Use of Hive external tables

● What do you mean by a data lake?

● How do you utilize Hive buckets in Spark? (see the sketch after this list)
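
A minimal sketch of writing a bucketed table from Spark with the DataFrameWriter bucketBy API; the bucket count, column, and table name are assumptions. Spark manages its own bucketing metadata, which is not the same as Hive's on-disk bucketing layout, so this is a Spark-side answer.

    import org.apache.spark.sql.SparkSession

    object BucketWriteDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("BucketWriteDemo")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        val orders = Seq((1, "A", 100.0), (2, "B", 250.0)).toDF("id", "customer", "amount")

        // Bucketed, sorted table registered in the metastore; joins and
        // aggregations on the bucket column can then avoid a full shuffle.
        orders.write
          .bucketBy(4, "id")
          .sortBy("id")
          .mode("overwrite")
          .saveAsTable("orders_bucketed")
      }
    }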

Legato systems :

● Two input files, a .csv and a .parquet, with policy and insured details

● You have to join them on policy number and, after applying the join and some business logic, insert the data into a Hive table

● This was supposed to be done using the DataFrame DSL or Spark SQL (see the sketch after this list)
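
A minimal DataFrame DSL sketch of that exercise; the file paths, the policy_number join key, the status filter standing in for the business logic, and the target table name are all assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object PolicyJoinDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PolicyJoinDemo")
          .enableHiveSupport()
          .getOrCreate()

        val policies = spark.read.option("header", "true").csv("/data/in/policies.csv")
        val insured  = spark.read.parquet("/data/in/insured.parquet")

        // Join on the policy number, then apply a placeholder business rule.
        val joined = policies.join(insured, Seq("policy_number"))
          .filter(col("status") === "ACTIVE")

        joined.write.mode("append").saveAsTable("policy_insured")
      }
    }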

HCL Technologies:

● How is the folder structure constructed in your project in the Eclipse IDE?

● How do you access a global variable in your Spark code?

● What are the starting lines you write in Spark code?

● The explode function in Spark

● What to do when a Sqoop job fails after some time

● How to restart a failed Sqoop job

● What to do if 1 mapper out of 2 fails every time

● How to join 2 files in Hive without loading them into Hive tables

● How do you pass different parameter values to a Sqoop job?

● What is a case class in Spark?

● How to process nested JSON or nested arrays in Hive and Spark

● How to select a few columns from an RDD without converting it to a DataFrame

● If the data type is not correct, how do you enforce the data type on a column after the DataFrame is created? (see the sketch after this list)

● How to read a Parquet file in Spark

● Difference between foldLeft and foldRight in Scala

● Difference between map and flatMap

● What is a curried function in Scala?

● Closures and their benefits

● Write a Hive query to join 3 tables, where the second-largest table should go to memory, with optimization (see the sketch after this list)
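
A minimal sketch for the two coding asks above: casting a column to the right type after the DataFrame is created, and a 3-table join where the hinted table is held in memory. In Hive that is the /*+ MAPJOIN(...) */ hint; Spark SQL accepts the same hint as a broadcast hint. Table names, columns, and the CSV path are assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.IntegerType

    object HclSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HclSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Enforce a data type on a column after the DataFrame is created.
        val raw   = spark.read.option("header", "true").csv("/data/in/accounts.csv")
        val typed = raw.withColumn("account_id", raw("account_id").cast(IntegerType))
        typed.printSchema()

        // Three-table join; the MAPJOIN hint asks for the named table to be kept in memory.
        spark.sql(
          """SELECT /*+ MAPJOIN(t2) */ t1.id, t2.name, t3.amount
            |FROM t1
            |JOIN t2 ON t1.id = t2.id
            |JOIN t3 ON t1.id = t3.id""".stripMargin).show()
      }
    }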

FIS Global :

● Hadoop copy

● Sqoop incremental append

● Using Sqoop, how do you import the contents of a particular table?

● Creating an external table (see the sketch after this list)

● Loading data into an external table

● Process followed in your company

● What elements are present in your data?
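
A minimal sketch of the external-table creation and load, issued through spark.sql; the same DDL runs unchanged in Hive. The schema, delimiter, and HDFS paths are assumptions.

    import org.apache.spark.sql.SparkSession

    object ExternalTableDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ExternalTableDemo")
          .enableHiveSupport()
          .getOrCreate()

        // External table: only the metadata is managed by Hive; the data stays at LOCATION.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS claims_ext (
            |  claim_id INT, policy_no STRING, amount DOUBLE)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            |STORED AS TEXTFILE
            |LOCATION '/data/external/claims'""".stripMargin)

        // Move a landed file into the table's location.
        spark.sql("LOAD DATA INPATH '/data/landing/claims.csv' INTO TABLE claims_ext")
      }
    }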

DATA PEACE :

● Explain YARN architecture

● Describe all the phases in MapReduce

● Does Hive support OLTP operations?

● How does Hive work internally?

● Difference between Hadoop 1.0 and Hadoop 2.0

● What is the Hive metastore?

● Who submits the job: the NameNode or the DataNode?

● Who executes the job: the NameNode or the DataNode?

● If Hive supports only OLAP, how can it support INSERT commands?

● Does Hive also work on the write-once-read-many model?

GSPANN :

● Bucketing, static and dynamic partitioning in Hive

● How to use explode in Hive (see the sketch after this list)

● About the trim function in Hive

● Hive performance tuning

● Purpose of PageFilter in HBase

● WAL in HBase

● How to use joins in HBase (I don't think HBase supports joins; we need to depend on MapReduce?)

● Can you run HBase and MapReduce on the same cluster? If yes, how?
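
A minimal sketch of Hive-style explode with LATERAL VIEW, run here through spark.sql; the orders_raw table and its array-typed items column are hypothetical.

    import org.apache.spark.sql.SparkSession

    object ExplodeDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ExplodeDemo")
          .enableHiveSupport()
          .getOrCreate()

        // LATERAL VIEW explode produces one output row per array element.
        spark.sql(
          """SELECT order_id, item
            |FROM orders_raw
            |LATERAL VIEW explode(items) t AS item""".stripMargin).show()
      }
    }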

DST Worldwide Services :

● Tell me about yourself & your project

● Difference between HDFS and Hive

● What different kinds of tables can we create in Hive?

● How to recursively delete a directory in HDFS

● What are data warehousing concepts?

● What is a Type 2 dimension?

● What is a data warehouse?

● Difference between a database and a data warehouse

● How to load data through Spark where a few records are updates and a few are new?

● Hive and Spark optimizations

● Can we write a stored procedure in Hive?

● Can we create a view in Hive?

● How much cost and processing time did you save through optimizations?

● How big was the data you were processing?

● Can we define foreign and primary keys in Hive tables?

● How can we do data quality checks in Spark and Hive?

● How to replace null values with some other value, or discard rows with null values, in Spark (see the sketch after this list)

● How to force data type checks at the column level in Scala
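
A minimal sketch covering the last two items: an explicit schema forces column-level types at read time, and df.na either fills or drops nulls. The file path and column names are assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

    object NullHandlingDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("NullHandlingDemo").getOrCreate()

        // Force column-level data types by supplying an explicit schema.
        val schema = StructType(Seq(
          StructField("id", IntegerType, nullable = false),
          StructField("name", StringType, nullable = true),
          StructField("amount", DoubleType, nullable = true)))

        val df = spark.read.option("header", "true").schema(schema).csv("/data/in/customers.csv")

        // Replace nulls with defaults, or drop rows containing any null.
        val filled  = df.na.fill(Map("name" -> "UNKNOWN", "amount" -> 0.0))
        val dropped = df.na.drop()

        println(s"filled=${filled.count()}, dropped=${dropped.count()}")
      }
    }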

Michelin Tyres :

● Project architecture

● How to process a log file in Spark and store the result (see the sketch after this list)

● Spark architecture

● What are SerDe properties?

● Hive context in Spark

● Oozie workflow

● Partitions and buckets in Hive

● HBase architecture and how it works
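
A minimal sketch of log processing in Spark: read raw lines, keep the ERROR entries, split them into columns, and store the result in a table. The log layout (date, time, level, message), the path, and the table name are assumptions.

    import org.apache.spark.sql.SparkSession

    object LogProcessingDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LogProcessingDemo")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Keep only ERROR lines and split them into columns.
        val errors = spark.read.textFile("/data/logs/app.log")
          .filter(_.contains(" ERROR "))
          .map { line =>
            val parts = line.split(" ", 4)   // assumed layout: date time level message
            (parts(0), parts(1), parts(3))
          }
          .toDF("log_date", "log_time", "message")

        errors.write.mode("append").saveAsTable("app_error_logs")
      }
    }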

Global Edge Software, Bangalore :

● Tell me about yourself & your project

● Explain Hadoop architecture

● Difference between Hadoop 1 and 2

● When a file is stored in HDFS, can we modify that file?

● Can multiple clients write to the same file at the same time?

● If a file is being written and another client wants to read the same file, is that possible?

● If I have 10 nodes and a job is running on them, the job has finished on 8 nodes, and then the NameNode goes down, what will happen?

● What is speculative execution?

● What is Spark?

● What is an RDD?

● What are the properties of an RDD?

● Explain your Kafka-Spark POC

● Explain the NiFi tool

● What is Hive?

● What is the use of external tables in Hive?

● If I have millions of records in a CSV file and I want to load them into a Hive table, how do I do that? (see the sketch after this list)

● What is Oozie?

● Can Kafka start without ZooKeeper?
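
A minimal sketch of loading a large CSV into a Hive table with Spark; the path, header and schema options, and table name are assumptions. Spark reads and writes the file in parallel, so the row count does not change the code.

    import org.apache.spark.sql.SparkSession

    object CsvToHiveDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CsvToHiveDemo")
          .enableHiveSupport()
          .getOrCreate()

        val df = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/landing/transactions.csv")

        df.write.mode("overwrite").saveAsTable("transactions")
      }
    }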

Capgemini :

● Higher-order functions and their advantages? (see the sketch after this list)

● Why did you use Scala for Spark?

● How do you do packaging (managing packages)?

● How do you add dependencies for your project/application?

● Why do you need dependencies?

● pom.xml: will things work if we rename pom.xml?

● What are the minimum mandatory imports required for a Scala application?

● What is Kafka, and why is it used rather than another message queue?

● Can we use analytic functions on an RDD? (I don't exactly know what he meant)

● What is vectorization and how does it improve performance?
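
A minimal plain-Scala sketch of a higher-order function (one that takes a function as a parameter) together with a curried form; the function names and values are just for illustration.

    object HigherOrderDemo {
      // Higher-order: applyTwice takes a function as a parameter.
      def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

      // Curried: parameters in separate lists allow partial application.
      def multiply(a: Int)(b: Int): Int = a * b

      def main(args: Array[String]): Unit = {
        println(applyTwice(_ + 3, 10))   // 16
        val double = multiply(2) _       // partially applied: Int => Int
        println(double(21))              // 42
      }
    }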

T-Systems :

● Tell me about yourself & your project

● What are your short-term and long-term goals, and where do you want to see yourself in the coming years?

● In Spark, what are the lower-level and higher-level abstractions?

● What are a DataFrame and a Dataset, and what is the difference between them?

● Let's say you have 5 TB of data and 16 GB of RAM, and you want to provide the top 20 customers; what would your approach be? (see the sketch after this list)

● One scenario-based question on Hive joins (it was quite tricky; it actually didn't require a join but a CASE condition)

● When we submit a job to the Spark cluster, what happens at the back end?

● Can we import data directly into a Hive table through Sqoop?

● How to supply the password and other details through a file in a Sqoop import

● What is the architecture of Flume?

● Which kind of channel in Flume is the most reliable?

● What is an SCD, and how many types of SCDs are there?

● What are OLAP and OLTP database schemas?

● What is a surrogate key in a database?
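
A minimal sketch of one possible approach to the top-20 question: the 5 TB never has to fit in memory, because the aggregation runs distributed across executors and only the final 20 rows come back. The table and column names are assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{desc, sum}

    object TopCustomersDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("TopCustomersDemo")
          .enableHiveSupport()
          .getOrCreate()

        // Aggregate per customer, order by spend, keep only the top 20.
        spark.table("transactions")
          .groupBy("customer_id")
          .agg(sum("amount").as("total_amount"))
          .orderBy(desc("total_amount"))
          .limit(20)
          .show()
      }
    }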

Some other companies :

● What are the components you have used so far?

● Which versions of Spark and Hive?

● What distribution are you using?

● Difference between Spark 1.6 and Spark 2.0

● If you used S3, what is the default number of buckets in S3?

● How to read nested JSON data in Spark (see the sketch after this list)

● How to convert an RDD to a DataFrame

● What is StructType?

● Is it possible to define an array field in a struct?

● What is the data type of an array field in a struct?

● Is it possible to convert a DataFrame into an RDD?

● What is the output of df.rdd?

● Which IDE do you use to develop Spark code?

● How do you deploy the code to production?

● How to use the same Spark jar for different variable inputs
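
A minimal sketch of the nested-JSON and RDD/DataFrame items: dotted paths select struct fields, explode flattens arrays, toDF turns an RDD into a DataFrame, and df.rdd returns an RDD of Rows. The JSON path and field names are assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode}

    object NestedJsonDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("NestedJsonDemo").getOrCreate()
        import spark.implicits._

        // Nested JSON: struct fields via dotted paths, arrays via explode.
        val people = spark.read.json("/data/in/people.json")
        people.select(col("name"), col("address.city"), explode(col("phones")).as("phone")).show()

        // RDD -> DataFrame, and back again (df.rdd is an RDD[Row]).
        val rdd = spark.sparkContext.parallelize(Seq((1, "a"), (2, "b")))
        val df  = rdd.toDF("id", "label")
        println(df.rdd.first())
      }
    }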

Impetus :

● How do you manage duplicate data in Hive?

● Difference between an input split and a block

● How to change the replication factor in HDFS

● What is the default block size, and how do you change it?

● If a job is running and we increase the block size, what will happen to the job?

● How to recover the cluster if both the NameNode and the standby NameNode fail

● What is the fsimage?

● How many ResourceManagers do we have in a cluster?

● Difference between internal and external tables

● Difference between partitioning and bucketing

● If you have a table in Hive, write a query to add a new column (see the sketch after this list)

● How to use a user-defined function in Hive

● Is it possible to create buckets without partitions?

● Syntax of bucketing

● What type of data do we store in Hive?

● Is it possible to create a table in Hive with data like a .log file?

● Which is the default file format in Spark?

● Difference between ORC and Parquet

● Different types of joins used in Spark

● Optimization techniques used in Spark

● Explain sort-merge and broadcast joins

● Difference between map and flatMap

● Difference between map and mapPartitions
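
A minimal sketch of adding a column to an existing Hive table, issued through spark.sql; the same ALTER TABLE statement runs in Hive directly. The table and column names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object AlterTableDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("AlterTableDemo")
          .enableHiveSupport()
          .getOrCreate()

        // Add a new column to an existing Hive table, then confirm the schema.
        spark.sql("ALTER TABLE sales ADD COLUMNS (discount DOUBLE COMMENT 'per-order discount')")
        spark.sql("DESCRIBE sales").show(false)
      }
    }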

Cognizant :

● Project flow

● Difference between partitioning & bucketing

● How to put data on S3?

● What are narrow and wide transformations?

● What is a case statement in Scala? (see the sketch after this list)

● Difference between a DataFrame and a Dataset

● Which file format have you used in your project?

● What is a singleton object in Scala?

● How do multiple tasks get created in a spark-submit job?

● How is the customer reading the final report?

● Explain Spark architecture

● What is a SerDe in Hive?
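
A minimal plain-Scala sketch of a match (case) expression inside a singleton object; the values matched are just for illustration.

    object MatchDemo {   // "object" declares a singleton: exactly one instance, created lazily
      def describe(x: Any): String = x match {
        case 0               => "zero"
        case n: Int if n > 0 => "positive int"
        case s: String       => s"string of length ${s.length}"
        case _               => "something else"
      }

      def main(args: Array[String]): Unit = {
        println(describe(0))
        println(describe(42))
        println(describe("spark"))
      }
    }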

Others :

● Total no. of years of relevant hands-on experience in Big Data Analytics?

● Total no. of years of relevant hands-on experience in Spark using Scala?

● How many projects deployed in production using Spark?

● Total no. of years of relevant Spark streaming experience using Scala?

● How many projects deployed in production using Spark Streaming?

● Rate yourself on Spark Dataframe API from 1 to 5, 5 being highest.

● How many projects deployed in production using Spark Dataframe API?

● Max. size of data processed on a daily basis, in GB/TB/PB?

● How many productionized projects processed data in TB?

● Total no. of years of experience with Hive partitioning, bucketing, and vectorization?

● How many projects deployed in production using Hive?

● Rate yourself on Hive from 1 to 5, 5 being highest.

● Have you used Appworks/Oozie/AirFlow/Control-M scheduler in production?

● How many pipelines were deployed using the scheduler in production?

● Total no. of years of experience on Azure platform/Cloud?

● How many projects were deployed in production on the Azure Data Lake platform?

● Any experience with AWS or GCP Big data technologies?

● How many projects were deployed in production on AWS or GCP platform?

● Have you worked on CI/CD or Git Pipeline?

● What Hadoop distribution was used on premises?

● Any Hadoop or cloud certifications?

● Any exposure to manufacturing domain?

● How much experience with SQL?

● Any experience with performance optimization in SQL?

● Any experience with performance optimization techniques in Spark?

● What Big Data technologies are used in current project?

Accenture :

● What are the storage services in AWS?

● What is an object in S3?

● How do we access data in S3 from EC2?

● How to put data into S3

● Maximum limit on data files in S3

● Can we delete a bucket in S3?

● How do we recover data from EBS when the EC2 instance has failed?

● What is the difference between EBS and S3?

● Various databases in AWS

● Use of DynamoDB, Redshift, and RDS

● Can we create the same user on machines in different regions?

● What is the use of EC2?

● What is a security group?

● Role of a security group

● What is VPC peering?

● How can we transfer data between machines in two regions?

● Types of EC2 instances

● What is an AMI?

● Can we share an AMI?

● Role of a VPC

● Can we change the region of an EC2 instance?

● What can we use for text-to-speech conversion?

● What do we use for huge data transfers?

● Can we run multiple websites on a single machine?

● Big data with AWS