33 / 2017-05-09 21:27:52
A Kind of Access Method Research for Massive Small Files in Hadoop
Full text pending review
通 郑 / East China University of Science and Technology
卫斌 郭 / East China University of Science and Technology
贵生 范 / East China University of Science and Technology
As an open source project, Hadoop provides a new way to store data. Because of its high scalability, low cost, flexibility, speed, and strong fault tolerance, it is widely used by internet companies. However, Hadoop's performance degrades severely when it handles massive numbers of small files. This paper therefore proposes a method that merges small files, which occupy a large amount of NameNode memory, into large files, establishes the mapping between the small files and the merged files, and stores the mapping information in HBase. The method also includes a prefetching mechanism that analyses the access logs and places the metadata of frequently accessed merged files in client memory to improve read performance. Experimental results show that the scheme reduces the NameNode memory occupied by massive small files and improves the read and write speed of small files, thereby improving the overall performance of HDFS when dealing with massive numbers of files.
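The merge-and-map idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a plain Python dict stands in for the HBase mapping table, an in-memory byte string stands in for the merged file on HDFS, and the function names (`merge_small_files`, `read_small_file`, `hot_entries`) are hypothetical. The prefetch step is reduced to picking the most frequently accessed names from a log, which is only one possible reading of the mechanism described.

```python
import io
from collections import Counter


def merge_small_files(files):
    """Concatenate small files into one blob and record each file's position.

    `files` maps a file name to its bytes. The returned index maps each name
    to (offset, length) inside the merged blob; in the paper's scheme this
    mapping would live in HBase (here it is just a dict, as an assumption).
    """
    merged = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (merged.tell(), len(data))
        merged.write(data)
    return merged.getvalue(), index


def read_small_file(merged, index, name):
    """Recover one small file from the merged blob using the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


def hot_entries(access_log, index, top_n):
    """Pick index entries for the most frequently accessed files.

    A stand-in for the prefetching step: the result is what a client
    might keep in local memory to avoid a lookup in the mapping store.
    """
    counts = Counter(access_log)
    return {
        name: index[name]
        for name, _ in counts.most_common(top_n)
        if name in index
    }
```

With this layout the name service only has to track one large file, while each small file is still reachable through a single index lookup followed by a ranged read.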
Important Dates
  • Conference dates

    July 22, 2017 to July 23, 2017

  • May 15, 2017

    Final manuscript deadline

  • July 23, 2017

    Registration deadline
