● If the given data looks like the sample below, and only one row has one extra column, how do you process it?
42,34,26
23,56,76
75,17,76,97
22,57,66
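One way to answer this question in practice is to treat the most common column count as the schema and quarantine (or trim) any row that deviates from it. A minimal sketch using only the Python standard library, with the sample rows above inlined:

```python
import csv
from collections import Counter
from io import StringIO

RAW = """42,34,26
23,56,76
75,17,76,97
22,57,66"""

def normalize_rows(text):
    """Parse CSV rows, find the most common column count,
    and trim rows that carry extra columns, keeping the
    originals aside for inspection."""
    rows = list(csv.reader(StringIO(text)))
    # The "expected" width is whatever most rows agree on.
    width = Counter(len(r) for r in rows).most_common(1)[0][0]
    clean, bad = [], []
    for r in rows:
        if len(r) != width:
            bad.append(r)            # quarantine the anomalous row
        clean.append(r[:width])      # keep only the first `width` fields
    return clean, bad

clean, bad = normalize_rows(RAW)
print(clean)  # every row now has 3 columns
print(bad)    # the single 4-column row
```

In a Hive/Spark setting the same idea applies: either declare the table with the majority schema (extra trailing fields are dropped on read) or route malformed records to a bad-records path before loading.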
● Suppose we have an HDFS utility where you have to work with Java. The HDFS utility helps us create and delete folders. Can you work on that?
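In Java this would use `org.apache.hadoop.fs.FileSystem` (`mkdirs(new Path(dir))` to create, `delete(new Path(dir), true)` to remove recursively). As a language-neutral sketch, the same operations can be driven through the standard `hdfs dfs` CLI; the path used here is a made-up example, and `dry_run` keeps the sketch runnable without a cluster:

```python
import subprocess

def hdfs_cmd(action, path):
    """Build the `hdfs dfs` command for creating or deleting a folder."""
    if action == "mkdir":
        return ["hdfs", "dfs", "-mkdir", "-p", path]   # -p: create parents
    if action == "rmdir":
        return ["hdfs", "dfs", "-rm", "-r", path]      # -r: recursive delete
    raise ValueError(f"unknown action: {action}")

def run(action, path, dry_run=True):
    """Execute (or just display) the command; dry_run avoids
    needing a real Hadoop installation."""
    cmd = hdfs_cmd(action, path)
    if dry_run:
        return " ".join(cmd)
    return subprocess.run(cmd, check=True)

print(run("mkdir", "/user/demo/raw"))
```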
● Let's say you have a table, partitioned daily by date, and the table holds five years' worth of history, so we will have close to 1,800 partitions. If you want to perform bucketing on this table, how will you do it?
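Since bucketing cannot be bolted onto an existing table, the usual pattern is to create a new bucketed table with the same partition column and reload it with dynamic partitioning, so every existing date partition is rewritten into bucketed files. A sketch of the HiveQL (table and column names are hypothetical), kept as Python strings so the statements stay in one place:

```python
# Hypothetical names; the pattern is: create a bucketed copy,
# then INSERT OVERWRITE from the old table with dynamic partitions.
ddl = """
CREATE TABLE txns_bucketed (
  account_id BIGINT,
  amount     DOUBLE
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (account_id) INTO 32 BUCKETS
STORED AS ORC
"""

reload_stmt = """
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE txns_bucketed PARTITION (dt)
SELECT account_id, amount, dt FROM txns
"""

print(ddl)
print(reload_stmt)
```

The reload is one heavy batch job, but afterwards each daily partition contains exactly 32 bucket files clustered on `account_id`.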
● Let's say the bank offers 50 products and you have five years' worth of data. If you are partitioning on a daily basis, whatever bucketing you do will sit on top of roughly 1,800 partitions. In this case, aren't we adding a burden to the NameNode by creating separate bucket files under each partition?
● If you are using vectorization, what is the target file format?
● Explain your project.
● What is a data pipeline? How do you build one?
● How do you decide the number of buckets for the given data? And if the data is continuously increasing, how will you decide?
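A common heuristic (not a hard rule) is to target bucket files of roughly one HDFS-block-sized chunk and round up to a power of two, sizing for the *projected* data volume, since the bucket count is fixed at table creation. A sketch, where the 1 GB target is an assumption:

```python
import math

def suggest_buckets(data_size_gb, target_bucket_gb=1.0):
    """Heuristic: aim for ~1 GB per bucket file, then round up
    to the next power of two so bucket counts stay composable
    if the table is later rescaled."""
    raw = max(1, math.ceil(data_size_gb / target_bucket_gb))
    return 2 ** math.ceil(math.log2(raw))

print(suggest_buckets(50))    # 64
print(suggest_buckets(500))   # 512
```

For continuously growing data, run the same calculation against the expected volume a year or two out; oversized buckets today are cheaper than rebuilding the table later.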
● Let's say you are getting 500 GB of data and you have tuned your job accordingly. Some time later, maybe a week or a month after, you start getting 1000 GB. Will the parameters you set earlier (executors, driver, cores, etc.) still work?
● How will you configure your cores, executors, and memory?
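The widely quoted rule of thumb for Spark-on-YARN sizing (a starting point, not a fixed answer) is: leave one core and ~1 GB per node for the OS and Hadoop daemons, cap executors at about 5 cores each for good HDFS throughput, reserve one executor slot for the driver, and hold back roughly 7% of executor memory for YARN overhead. A sketch with an assumed 10-node, 16-core, 64 GB cluster:

```python
def size_executors(nodes, cores_per_node, mem_per_node_gb):
    """Rule-of-thumb Spark executor sizing:
    - 1 core + 1 GB per node reserved for OS/daemons
    - ~5 cores per executor for HDFS throughput
    - ~7% of memory reserved for YARN overhead"""
    usable_cores = cores_per_node - 1
    cores_per_executor = 5
    execs_per_node = usable_cores // cores_per_executor
    total_execs = nodes * execs_per_node - 1          # minus one for the driver
    mem_per_exec = (mem_per_node_gb - 1) / execs_per_node
    heap = round(mem_per_exec * 0.93, 1)              # spark.executor.memory
    return {"num_executors": total_execs,
            "executor_cores": cores_per_executor,
            "executor_memory_gb": heap}

print(size_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64))
```

For a 10-node cluster of 16 cores / 64 GB each, this yields 29 executors with 5 cores and about 19.5 GB of heap apiece.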
● If any job fails, what will be your approach?
● Have you faced any errors in Hive? How did you overcome them, and what kinds of issues have you faced?
● What types of joins have you used in your project?
● How do you run a SELECT statement on a partitioned table?
● If you run "SELECT * FROM table" on a partitioned table, will you get the benefit of partitioning?
● SELECT * FROM table --> will this run a mapper or a reducer? What is a key and a value? SELECT * FROM table WHERE id = 1 --> which will run here, a mapper or a reducer?
● If you don't specify a split-by column or the number of mappers in Sqoop, what will happen?
● Can we partition using employee ID?
● How do you configure reducers in Sqoop?
● Tell me the basic transformations and actions you have worked on.
● Hive table creation syntax, and how to load data into the table.
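A minimal answer for the last question, using a hypothetical `sales` table and file path; the statements are standard HiveQL, kept as Python strings here so all the examples share one language:

```python
# Hypothetical table and path; the HiveQL itself is standard.
create_stmt = """
CREATE TABLE IF NOT EXISTS sales (
  id     INT,
  amount DOUBLE
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
"""

load_stmt = """
LOAD DATA LOCAL INPATH '/tmp/sales.csv'
INTO TABLE sales PARTITION (dt = '2024-01-01')
"""

print(create_stmt)
print(load_stmt)
```

`LOAD DATA LOCAL INPATH` copies a file from the local filesystem into the table's warehouse directory; drop `LOCAL` to move a file that is already on HDFS instead.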