
Top 100 Big Data Hadoop Interview Questions

These Hadoop interview questions were contributed by Charanya Durairajan, who attended Big Data Hadoop interviews at Wipro, Zensar and TCS. The questions below are important ones to prepare for Hadoop interviews.




1 In the hadoop-env.sh file, what does the setting HADOOP_HEAPSIZE establish?



2 What is AVRO? 



3 When archiving Hadoop files, which of the following statements are true? (Choose two) 



4 What is the difference between killed tasks and failed tasks?



5 What are the two methods of the org.apache.hadoop.mapred.InputFormat interface?



6 The file sample.txt has the following content:



7 Which of the following is false about RawComparator?



8 Which of the following are true about fsck? 



9 Based on the following reduce method signature, which two of the following statements are true? 



10 You are given a file that is split into 5 blocks when writing to HDFS. What configuration changes have to be made in order for one mapper to read all five blocks?
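Question 10 turns on how splits are sized. In the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input), the split size is max(minSize, min(maxSize, blockSize)), so raising the minimum split size (mapreduce.input.fileinputformat.split.minsize in Hadoop 2, mapred.min.split.size in Hadoop 1) to at least the file length yields one split and therefore one mapper; overriding isSplitable() to return false is another common approach. The plain-Java sketch below only mirrors that formula; the 128 MB block size and the five-block file are illustrative assumptions, and the class name is hypothetical.

```java
// Plain-Java sketch (no Hadoop dependency) of how FileInputFormat sizes input splits.
class SplitSizeSketch {
    // Mirrors the new-API FileInputFormat.computeSplitSize formula.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate split count: ceil(fileLength / splitSize). Hadoop's real loop also
    // lets the last split be up to 10% larger (SPLIT_SLOP = 1.1); that is ignored here.
    static long splitCount(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;        // assumed HDFS block size
        long fileLength = 5 * blockSize;  // a file occupying five full blocks

        // Defaults: minSize is effectively 1 byte and maxSize is Long.MAX_VALUE, so split == block.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("default splits: " + splitCount(fileLength, defaultSplit)); // 5

        // Raising the minimum split size to the file length gives a single split.
        long bigSplit = computeSplitSize(blockSize, fileLength, Long.MAX_VALUE);
        System.out.println("splits with large minSize: " + splitCount(fileLength, bigSplit)); // 1
    }
}
```

Note that a single split spanning five blocks gives up data locality for most of the file, since only some of those blocks can be local to the one mapper.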



11 Which one of the following statements is false regarding the Distributed Cache? 



12 Which one of the following statements is false regarding the Combiner of a MapReduce job? 



13 What is the Hadoop stack?



14 What is Writable?



15 HDFS is designed for: 



16 What can a workflow expressed in Oozie contain?



17 In a MapReduce program, the reducer receives all values associated with the same key. Which statement is most accurate about the ordering of these values?



18 Given the following code from a MapReduce application: 



19 What are the common problems with map-side join? 



20 In order to apply a combiner, what property must the values emitted from the mapper satisfy?



21 In Flume, which of the following provides scalability at the collector tier? 



22 Out of Pig, Hive, and Jaql, which of the following attributes is specific 



23 When can a reducer class also serve as a combiner without affecting the output of a MapReduce program?
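Questions 20 and 23 both come down to the same property: a reduce function can be reused as a combiner when its operation is commutative and associative (and, in Hadoop, when it consumes and emits the map output key/value types). The plain-Java sketch below, with hypothetical names and no Hadoop classes, compares applying a function once to all values for a key against applying it per map task first. Summing survives the extra step; averaging does not.

```java
import java.util.List;

// Plain-Java illustration of when a reduce function is safe to reuse as a combiner.
class CombinerSketch {
    // Summing: a sum of partial sums equals the overall sum.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Averaging: an average of partial averages is generally NOT the overall average.
    static double mean(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total / values.size();
    }

    public static void main(String[] args) {
        // Values for one key, as emitted by two different map tasks.
        List<Integer> mapA = List.of(1, 2, 3);
        List<Integer> mapB = List.of(4);

        int direct = sum(List.of(1, 2, 3, 4));              // no combiner
        int combined = sum(List.of(sum(mapA), sum(mapB)));  // combiner run per map task
        System.out.println(direct + " == " + combined);     // 10 == 10

        double directMean = mean(List.of(1.0, 2.0, 3.0, 4.0));
        double combinedMean = mean(List.of(mean(List.of(1.0, 2.0, 3.0)), mean(List.of(4.0))));
        System.out.println(directMean + " != " + combinedMean); // 2.5 != 3.0
    }
}
```

A mean can still benefit from a combiner if the combiner emits (sum, count) pairs and only the reducer divides, but then the reducer class and the combiner class are no longer the same.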



24 Which of the following two responsibilities of the JobTracker have been split into separate daemons in MapReduce v2?



25 Where is Hive metadata stored?



26 Which of the following are compelling reasons to benchmark a Hadoop deployment? 



27 What is the purpose of the shuffle in Hadoop MapReduce? 
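For question 27, a single-process sketch of what shuffle and sort delivers to reducers may help: all values for a key are gathered together, and keys arrive in sorted order. This is an in-memory illustration with hypothetical names, not Hadoop's networked implementation. Values here stay in arrival order, whereas in a real job the order of values from different map tasks is not guaranteed unless a secondary sort is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the grouping and sorting performed by shuffle and sort.
class ShuffleSketch {
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        // TreeMap keeps keys sorted, mirroring the sort by key before the reduce phase.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Word-count style mapper output: (word, 1) pairs in arrival order.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hive", 1), Map.entry("hadoop", 1),
                Map.entry("pig", 1), Map.entry("hadoop", 1));
        System.out.println(shuffle(mapOutput)); // {hadoop=[1, 1], hive=[1], pig=[1]}
    }
}
```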



28 Why would a developer create a MapReduce job without the reduce step?



29 When a job is run, your properties file is copied to the distributed cache so that your map tasks can access it. How do you access the properties file?



30 Which of the following are among the duties of the DataNodes in HDFS? 



31 Which daemon is responsible for replication of data in Hadoop?



32 Which configuration file is required to run an Oozie job?



33 Which programming languages are supported for MapReduce?



34 What is Hive?



35 What is the identity mapper?



36 Can a custom data type be implemented for MapReduce processing?



37 Which of the following is true for the output of the shuffle and sort phase? 



38 Which of the following job types is not supported in Oozie?



39 How does Hadoop process large volumes of data? 



40 How can you use binary data in MapReduce? 



41 What happens if the client requests to access a part of the data file during the processing stage? 



42 When exactly does the reducer start?



43 Which of the following are Flume points of extension? 



44 Which one of the following statements is false regarding a MapReduce job? 



45 What is HBase?



46 What is the difference between having zero reducers and one reducer?



47 Which one of the following statements is false regarding the Partitioner of a MapReduce job? 



48 Put the following phases of a MapReduce program in the order in which they execute.



49 Which is faster: Map-side join or Reduce-side join? Why? 



50 The input to a mapper takes the form <k1, v1>. What form does the mapper's output take?



51 Which of the following is a distributed, scalable big data store that can be used when you need random, real-time read/write access to your big data?



52 What is a map-side join?
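For questions 19, 49 and 52, a map-side (replicated or hash) join can be pictured as follows: the smaller dataset is held in memory by every mapper (typically shipped via the distributed cache), and each record of the larger input is joined as it streams past, so no shuffle or sort is needed. That is why it is usually faster than a reduce-side join, and also why it breaks down when the small side does not fit in memory. The plain-Java sketch below uses hypothetical customer and order data and hypothetical names; it is not Hadoop API code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a map-side hash join with inner-join semantics.
class MapSideJoinSketch {
    // Joins each large-side record {customerId, amount} against the in-memory
    // lookup table customerId -> customerName. Unmatched records are dropped.
    static List<String> join(Map<Integer, String> smallTable, List<int[]> largeRecords) {
        List<String> out = new ArrayList<>();
        for (int[] rec : largeRecords) {
            String name = smallTable.get(rec[0]);
            if (name != null) {
                out.add(name + "," + rec[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small side: in a real job this would usually be loaded from the distributed cache.
        Map<Integer, String> customers = Map.of(1, "alice", 2, "bob");
        // Large side: the records streamed through this one mapper.
        List<int[]> orders = List.of(new int[]{1, 30}, new int[]{3, 5}, new int[]{2, 12});
        System.out.println(join(customers, orders)); // [alice,30, bob,12]
    }
}
```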



53 What is the difference between addInputPaths() and setInputPaths() of FileInputFormat?



54 What is Flume? 



55 When writing data to HDFS what is true if the replication factor is three? (Choose 2) 



56 How can you disable the reduce step? 



57 Which of the following components retrieves the input splits directly from HDFS to determine the number of map tasks? 



58 Which two of the following statements are true regarding HCatalog? 



59 What is true about Writable and WritableComparable?



60 What is the default input format? 



61 When using HDFS, what occurs when a file is deleted from the command line? 



62 Will settings made through the Java API override values in configuration files?



63 Out of Pig, Hive, and Jaql, which of the following attributes is specific 



64 Given the following code from a MapReduce program: 



65 Which one of the following is not a main component of HBase? 



66 In Flume, which reliability level guarantees an accepted event reaches the endpoint? 



67 The org.apache.hadoop.io.Writable interface declares which two methods?
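For questions 14, 36 and 67: org.apache.hadoop.io.Writable declares void write(DataOutput out) and void readFields(DataInput in), both throwing IOException; keys use WritableComparable, which adds Comparable so they can be sorted. To keep the sketch runnable without Hadoop on the classpath, it declares a local stand-in interface (WritableLike, a hypothetical name) with those same two signatures and implements it for a hypothetical PointWritable type, then round-trips a value through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for org.apache.hadoop.io.Writable; the method signatures match the real interface.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom type: a 2-D point serialized as two ints.
class PointWritable implements WritableLike {
    int x;
    int y;

    // Hadoop creates Writable instances reflectively, so a no-arg constructor is needed.
    PointWritable() { }

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in exactly the order write() emitted them.
        x = in.readInt();
        y = in.readInt();
    }

    // Serializes p to a byte array and deserializes it into a fresh instance.
    static PointWritable roundTrip(PointWritable p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            p.write(out);
            out.flush();
            PointWritable copy = new PointWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PointWritable copy = roundTrip(new PointWritable(3, -7));
        System.out.println(copy.x + "," + copy.y); // 3,-7
    }
}
```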



68 Based on the following map method signature, which two of the following statements are true? 



69 Can you run MapReduce jobs directly on Avro data?



70 Can you suppress reducer output? 



71 Which of the following two responsibilities of the JobTracker have been split into separate daemons in MapReduce v2?



72 The file sample.txt has the following content:



73 What is the distributed cache?



74 What is the default InputFormat of a MapReduce job? 



75 Which one of the following statements is true regarding <key,value> pairs of a MapReduce job? 



76 The output of shuffle and sort is an Iterator of values, which are iterated over. What does iterator.next() provide?



77 If a file is split into a large number of small chunks/blocks (i.e., the block size is very small), what is the problem?
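For question 77, a quick calculation shows the scale of the issue: the NameNode tracks every block in memory, and with default split settings each block typically becomes its own input split and map task. The sketch below just counts blocks for a hypothetical 10 GB file at two block sizes; the file size and class name are illustrative assumptions.

```java
// Back-of-the-envelope sketch: block size vs. number of blocks the NameNode must track.
class BlockCountSketch {
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 10_240 * mb; // hypothetical 10 GB file
        System.out.println("128 MB blocks: " + blocksFor(file, 128 * mb)); // 80
        System.out.println("1 MB blocks:   " + blocksFor(file, mb));       // 10240
    }
}
```

Going from 80 to 10,240 blocks multiplies NameNode metadata for this one file by 128, and with default splits it also means thousands of short-lived map tasks whose startup overhead dominates the actual work.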



78 There are 100 DataNodes with 100 TB of capacity. How much data can be stored if the replication factor is 3?
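For question 78, a worked calculation, assuming the 100 TB is the cluster's total raw capacity and ignoring disk space reserved for non-HDFS use and intermediate data. Every block is stored three times, so:

```latex
\text{usable capacity} \approx \frac{\text{raw capacity}}{\text{replication factor}} = \frac{100\ \text{TB}}{3} \approx 33.3\ \text{TB}
```

If the 100 TB were instead the capacity of each DataNode, the same formula gives 10,000 TB / 3, roughly 3,333 TB.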



79 A file in HDFS is treated as a small file if its size is



80 What should be carefully coordinated by an administrator when decommissioning multiple DataNodes in a cluster? 



81 Which of the following are configured in core-site.xml? (Choose three)



82 What happens if mapper output does not match reducer input? 



83 What is a reduce-side join?



84 Which of the following is common to Pig, Hive, and Jaql? 



85 Why does the value in a key-value pair not implement WritableComparable?



86 What is the most important feature of MapReduce?



87 On a cluster hosting 10 TB of data, the following command is executed on a JobTracker node. What is the anticipated activity on that cluster? (Choose 1)



88 What is the data type of the return value of the getPartition method in the org.apache.hadoop.mapred.Partitioner interface?
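For question 88: in the old API, Partitioner declares int getPartition(K2 key, V2 value, int numPartitions), and the returned int is the index of the reduce task that receives the record. The plain-Java sketch below mirrors the logic of Hadoop's default HashPartitioner, which ignores the value; the class name is hypothetical and no Hadoop classes are used.

```java
// Plain-Java mirror of the default hash-partitioning logic.
class HashPartitionSketch {
    // Returns the reducer index for a key; the value is unused, as in HashPartitioner.
    static int getPartition(Object key, Object value, int numPartitions) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so negative
        // hash codes still land in the range [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String word : new String[]{"hadoop", "hive", "pig", "hbase"}) {
            System.out.println(word + " -> reducer " + getPartition(word, null, reducers));
        }
    }
}
```

Because the partition depends only on the key's hash, every occurrence of the same key reaches the same reducer, which is what lets the reducer see all values for that key together.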



89 Keys from the output of shuffle and sort implement which of the following interfaces?



90 What happens when the io.sort.spill.percent threshold is exceeded while a Mapper is outputting <key,value> pairs?



91 Which one of the following is not a built-in Pig data type? 



92 You have a file of 300 MB being written to HDFS. What happens if, after 200 MB is written, another user concurrently accesses the file?







Contributor

Charanya Durairajan
