Abstract Details

ID / Submission Time

33 / 2017-05-09 21:27:52

Title

A Kind of Access Method Research for Massive Small Files in Hadoop

Keywords

Topics and Special Sessions

All Topics

Status

Full paper under review

Authors

通郑 / East China University of Science and Technology

卫斌郭 / East China University of Science and Technology

贵生范 / East China University of Science and Technology

Abstract

As a new open-source project, Hadoop provides a new way to store data. Because of its high scalability, low cost, flexibility, speed, and strong fault tolerance, it is widely used by internet companies. However, Hadoop's performance degrades severely when it handles massive numbers of small files. This paper therefore proposes a solution that merges small files, which occupy a large amount of NameNode memory, into large files, establishes the mapping between the small files and the merged large files, and stores the mapping information in HBase. The method also includes a prefetching mechanism that improves read performance by analysing the access logs and placing the metadata of frequently accessed merged files in client memory. Experimental results show that this scheme reduces the NameNode memory occupied by massive small files and improves the read and write speed of small files, thereby improving the overall performance of HDFS when dealing with massive numbers of files.
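The merge-and-map idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a Python `bytes` object stands in for a merged file on HDFS, a plain dictionary stands in for the HBase mapping table, and the function names (`merge_small_files`, `read_small_file`, `hot_entries`) are hypothetical. The prefetching step is reduced to picking the most frequently accessed names from an access log, which is only an assumption about how such a mechanism might select metadata to cache.

```python
import io
from collections import Counter


def merge_small_files(files):
    """Concatenate small files into one blob.

    Returns (blob, index), where index maps each file name to its
    (offset, length) inside the blob. The index plays the role of the
    small-file-to-large-file mapping (kept in HBase in the paper).
    """
    buf = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))  # record position before writing
        buf.write(data)
    return buf.getvalue(), index


def read_small_file(blob, index, name):
    """Read one small file back out of the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


def hot_entries(access_log, index, top_n):
    """Pick index entries for the top_n most frequently accessed files.

    access_log is a list of file names, one per access. The returned
    dictionary models metadata that a client could keep in memory.
    """
    counts = Counter(access_log)
    return {
        name: index[name]
        for name, _ in counts.most_common(top_n)
        if name in index
    }
```

In this sketch the NameNode would track one large file instead of many small ones, while lookups by small-file name go through the index; a client holding the result of `hot_entries` could resolve frequent reads without querying the mapping store.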