Top 100 Big Data Hadoop Interview Questions

These Hadoop interview questions were contributed by Charanya Durairajan, who attended Big Data Hadoop interviews at Wipro, Zensar and TCS. The questions below are important to prepare for Hadoop interviews.

1 In the hadoop-env.sh file, what does the setting HADOOP_HEAPSIZE establish?

2 What is AVRO?

3 When archiving Hadoop files, which of the following statements are true? (Choose two)

4 What is the difference between killed tasks and failed tasks?

5 What are the two methods of the org.apache.hadoop.mapred.InputFormat interface?

6 The file sample.txt has the following content:

7 Which of the following is false about RawComparator?

8 Which of the following are true about fsck?

9 Based on the following reduce method signature, which two of the following statements are true?

10 You are given a file that is split into 5 blocks when writing to HDFS. What configuration changes have to be made in order for one mapper to read all five blocks?
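
For question 10, a minimal sketch of the arithmetic involved, assuming the newer mapreduce API: FileInputFormat sizes each split as max(minSize, min(maxSize, blockSize)), so raising the minimum split size (mapreduce.input.fileinputformat.split.minsize, or mapred.min.split.size in Hadoop 1) to at least the file size should give one split and therefore one mapper. Another common route is an input format whose isSplitable() returns false. The SplitMath class below is a hypothetical stand-alone helper that only reproduces the split-size rule (assuming a 128 MB block size); it does not call Hadoop.

```java
/** Hypothetical helper mirroring FileInputFormat's split-size rule (simplified). */
class SplitMath {
    // splitSize = max(minSize, min(maxSize, blockSize))
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate split (and map task) count for one splittable file.
    // Ignores the ~10% slack Hadoop allows on the last split.
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long block = 128 * mb;   // assumed block size
        long file = 5 * block;   // file stored as 5 full blocks

        // Defaults: min split size 1 byte, no effective max -> one split per block.
        long dflt = splitSize(block, 1, Long.MAX_VALUE);
        System.out.println(numSplits(file, dflt)); // prints 5

        // Raising the minimum split size to the file size -> a single split.
        long big = splitSize(block, file, Long.MAX_VALUE);
        System.out.println(numSplits(file, big)); // prints 1
    }
}
```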

11 Which one of the following statements is false regarding the Distributed Cache?

12 Which one of the following statements is false regarding the Combiner of a MapReduce job?

13 What is the Hadoop stack?

14 What is Writable?

15 HDFS is designed for:

16 What can a workflow expressed in Oozie contain?

17 In a MapReduce program, the reducer receives all values associated with the same key. Which statement is most accurate about the ordering of these values?

18 Given the following code from a MapReduce application:

19 What are the common problems with map-side join?

20 In order to apply a combiner, what is one property that has to be satisfied by the values emitted from the mapper?

21 In Flume, which of the following provides scalability at the collector tier?

22 Out of Pig, Hive, and Jaql, which of the following attributes is specific

23 When can a reducer class also serve as a combiner without affecting the output of a MapReduce program?
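
Questions 12, 20 and 23 turn on the same property. A reducer can usually be reused as a combiner when its operation is commutative and associative (and its input and output types both match the map output types), because the framework may run the combiner zero, one or several times on partial data. The stand-alone sketch below uses plain Java collections rather than Hadoop classes, and the class and method names are illustrative; it shows that a sum survives partial aggregation while an average does not.

```java
import java.util.List;

/** Illustrative check: sum is safe as a combiner, mean is not. */
class CombinerCheck {
    static int sum(List<Integer> xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }

    static double mean(List<Double> xs) {
        double total = 0;
        for (double x : xs) total += x;
        return total / xs.size();
    }

    public static void main(String[] args) {
        // Values for one key, as emitted by two different mappers.
        List<Integer> mapA = List.of(1, 2, 3);
        List<Integer> mapB = List.of(10);

        // Sum: pre-aggregating per mapper, then reducing the partial sums,
        // gives the same answer as reducing all raw values at once.
        int direct = sum(List.of(1, 2, 3, 10));
        int combined = sum(List.of(sum(mapA), sum(mapB)));
        System.out.println(direct + " " + combined); // prints "16 16"

        // Mean: the mean of per-mapper means differs from the overall mean.
        double directMean = mean(List.of(1.0, 2.0, 3.0, 10.0));
        double meanOfMeans = mean(List.of(mean(List.of(1.0, 2.0, 3.0)), mean(List.of(10.0))));
        System.out.println(directMean + " " + meanOfMeans); // prints "4.0 6.0"
    }
}
```

A common workaround for averages is to emit (sum, count) pairs from the combiner and divide only in the reducer.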

24 Which of the following two responsibilities of the JobTracker have been split into separate daemons in MapReduce v2?

25 Where is Hive metadata stored?

26 Which of the following are compelling reasons to benchmark a Hadoop deployment?

27 What is the purpose of the shuffle in Hadoop MapReduce?
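
For question 27 (and the related questions 17, 37 and 85), a simplified single-process sketch of what the shuffle and sort achieves for word count: map output pairs are grouped by key and handed to the reducer in sorted key order, while the order of values within a key is not something MapReduce guarantees. In a real cluster the shuffle also partitions map output, copies it over the network to the reducers and merge-sorts it; none of that is modeled here, and the names below are illustrative rather than Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of map -> shuffle/sort -> reduce for word count (no Hadoop). */
class ShuffleSketch {
    // Map phase: emit (word, 1) for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Shuffle and sort: group all values by key. TreeMap keeps keys sorted,
    // like the key order a reducer sees; only keys need to be comparable.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: one call per key with all of that key's values.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(line));
        TreeMap<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((k, vs) -> result.put(k, reduce(k, vs)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or", "not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```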

28 Why would a developer create a MapReduce job without the reduce step?

29 When a job is run, your properties file is copied to the distributed cache so that your map tasks can access it. How do you access the properties file?

30 Which of the following are among the duties of the DataNodes in HDFS?

31 Which daemon is responsible for replication of data in Hadoop?

32 Which configuration file is required to run an Oozie job?

33 What programming languages are supported for MapReduce?

34 What is HIVE?

35 What is the identity mapper?

36 Can a custom data type be implemented for MapReduce processing?

37 Which of the following is true for the output of the shuffle and sort phase?

38 Which of the following job types is not supported in Oozie?

39 How does Hadoop process large volumes of data?

40 How can you use binary data in MapReduce?

41 What happens if the client requests to access a part of the data file during the processing stage?

42 When exactly does a reducer start?

43 Which of the following are Flume points of extension?

44 Which one of the following statements is false regarding a MapReduce job?

45 What is HBASE?

46 What is the difference between having 0 reducers and 1 reducer?

47 Which one of the following statements is false regarding the Partitioner of a MapReduce job?

48 Put the following phases of a MapReduce program in the order in which they execute.

49 Which is faster: Map-side join or Reduce-side join? Why?

50 The input to a mapper takes the form <k1, v1>. What form does the mapper's output take?

51 Which of the following is a distributed, scalable big data store that can be used when you need random, real-time read/write access to your Big Data?

52 What is a map-side join?

53 What is the difference between addInputPaths() and setInputPaths() of FileInputFormat?

54 What is Flume?

55 When writing data to HDFS what is true if the replication factor is three? (Choose 2)

56 How can you disable the reduce step?

57 Which of the following components retrieves the input splits directly from HDFS to determine the number of map tasks?

58 Which two of the following statements are true regarding HCatalog?

59 What is true about Writable and WritableComparable?

60 What is the default input format?

61 When using HDFS, what occurs when a file is deleted from the command line?

62 Will settings made using the Java API override values in configuration files?

63 Out of Pig, Hive, and Jaql, which of the following attributes is specific

64 Given the following code from a MapReduce program:

65 Which one of the following is not a main component of HBase?

66 In Flume, which reliability level guarantees an accepted event reaches the endpoint?

67 The org.apache.hadoop.io.Writable interface declares which two methods?
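
For question 67: org.apache.hadoop.io.Writable declares write(DataOutput out) and readFields(DataInput in), both throwing IOException. To keep the example runnable without Hadoop on the classpath, the sketch below defines a local SimpleWritable interface with the same two method signatures (a stand-in, not the real interface) and a hypothetical PointWritable type that round-trips through plain java.io streams, roughly as the framework does when it moves keys and values between tasks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Local stand-in with the same two methods as org.apache.hadoop.io.Writable. */
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical custom type: a 2-D point serialized as two ints. */
class PointWritable implements SimpleWritable {
    int x;
    int y;

    PointWritable() { }  // Hadoop instantiates Writables reflectively, so keep a no-arg constructor
    PointWritable(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in exactly the order write() wrote them.
        x = in.readInt();
        y = in.readInt();
    }

    // Serialize to bytes, then deserialize into a fresh instance.
    static PointWritable roundTrip(PointWritable p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bytes);
            p.write(dos);
            dos.flush();
            PointWritable copy = new PointWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A key type would additionally need to be comparable (WritableComparable in Hadoop), since keys are sorted during the shuffle; values only need to be serializable.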

68 Based on the following map method signature, which two of the following statements are true?

69 Can you run MapReduce jobs directly on Avro data?

70 Can you suppress reducer output?

71 Which of the following two responsibilities of the JobTracker have been split into separate daemons in MapReduce v2?

72 The file sample.txt has the following content:

73 What is the distributed cache?

74 What is the default InputFormat of a MapReduce job?

75 Which one of the following statements is true regarding <key,value> pairs of a MapReduce job?

76 The output of shuffle and sort is an Iterator of values, which are iterated over. What does iterator.next() provide?

77 If a file is split into a large number of small chunks/blocks (i.e. the block size is very small), what is the problem?
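
For question 77, a back-of-the-envelope sketch: the NameNode keeps every file and block object in memory, and a widely quoted rule of thumb puts each object at roughly 150 bytes (an estimate, not an exact figure). With default input formats, more blocks also tend to mean more input splits and therefore more map tasks. The NameNodeMemory helper below is hypothetical and only does the arithmetic for one 1 GB file at two assumed block sizes.

```java
/** Hypothetical NameNode heap estimate based on a rule-of-thumb constant. */
class NameNodeMemory {
    static final long BYTES_PER_OBJECT = 150; // rule of thumb, not exact

    static long blocks(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    // One object for the file itself plus one object per block.
    static long heapBytesForFile(long fileBytes, long blockBytes) {
        return (1 + blocks(fileBytes, blockBytes)) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long file = 1024 * mb; // a 1 GB file
        for (long blockMb : new long[] {128, 1}) {
            System.out.println(blockMb + " MB blocks: " + blocks(file, blockMb * mb)
                + " blocks, ~" + heapBytesForFile(file, blockMb * mb) + " bytes of NameNode heap");
        }
        // prints "128 MB blocks: 8 blocks, ~1350 bytes of NameNode heap"
        // prints "1 MB blocks: 1024 blocks, ~153750 bytes of NameNode heap"
    }
}
```

The per-file cost looks small, but it grows linearly with block count across millions of files, which is why very small blocks (and very many small files) put pressure on NameNode memory.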

78 There are 100 DataNodes with 100 TB capacity. How much data can be stored (replication factor is 3)?
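
For question 78, the question leaves open whether 100 TB is per node or in total. Assuming 100 TB of total raw capacity, every block is stored three times, so roughly 100 / 3 ≈ 33.3 TB of data fits, before subtracting space reserved for non-HDFS use and intermediate data. If each node instead had 100 TB, the same arithmetic would give about 3,333 TB. The HdfsCapacity class below is a hypothetical helper for that division.

```java
import java.util.Locale;

/** Rough HDFS usable-capacity estimate (hypothetical helper). */
class HdfsCapacity {
    // Each block is stored `replication` times, so logical capacity is raw
    // capacity divided by the replication factor. Ignores non-DFS reserved space.
    static double usableTb(double rawTb, int replication) {
        return rawTb / replication;
    }

    public static void main(String[] args) {
        // Assumption: 100 TB is the total raw capacity across all 100 DataNodes.
        System.out.println(String.format(Locale.ROOT, "%.1f TB", usableTb(100, 3))); // prints "33.3 TB"
    }
}
```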

79 A file in HDFS is treated as a small file if its size is

80 What should be carefully coordinated by an administrator when decommissioning multiple DataNodes in a cluster?

81 Which of the following are configured in core-site.xml? (Choose three)

82 What happens if mapper output does not match reducer input?

83 What is a reduce-side join?

84 Which of the following is common to Pig, Hive, and Jaql?

85 Why doesn't the value in a key-value pair implement WritableComparable?

86 What is the most important feature of MapReduce?

87 On a cluster hosting 10 TB of data, the following command is executed on a JobTracker node. What is the anticipated activity on that cluster? (Choose 1)

88 What is the data type of the return value of the getPartition method in the org.apache.hadoop.mapred.Partitioner interface?
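
For question 88: getPartition returns an int, the index of the reduce task that receives the key, in the range 0 to numPartitions - 1. The default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The stand-alone sketch below reproduces that rule in plain Java; the class name is illustrative, and it takes an Object key instead of Hadoop's generic signature (Hadoop's method also receives the value, which the hash rule ignores).

```java
/** Local sketch of the default hash partitioning rule (not Hadoop's class). */
class HashPartitionSketch {
    // Returns the reducer index (0 .. numReduceTasks - 1) for this key.
    static int getPartition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode() cannot produce a negative partition number.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(word + " -> reducer " + getPartition(word, 4));
        }
    }
}
```

Because the result depends only on the key, every occurrence of a key lands on the same reducer, which is what lets a reducer see all values for that key.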

89 Keys from the output of shuffle and sort implement which of the following interfaces?

90 What happens when the io.sort.spill.percent threshold is exceeded when a Mapper is outputting <key,value> pairs?

91 Which one of the following is not a built-in Pig data type?

92 You have a 300 MB file being written to HDFS. What happens if, after 200 MB is written, another user concurrently accesses the file?

Contributor

Charanya Durairajan

Datawarehouse Architect
