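The first is, of course, the well-known "Hadoop: The Definitive Guide", essentially the bible of the field, and now in its second edition. I read the first edition last year; its examples were written against the old API. For the new edition, here is Amazon's description: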

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

* Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
* Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
* Use Pig, a high-level query language for large-scale data processing
* Analyze datasets with Hive, Hadoop’s data warehousing system
* Take advantage of HBase, Hadoop’s database for structured and semi-structured data
* Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk."

--Doug Cutting, Cloudera


About the Author

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.

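The second is "Hadoop in Action". It is not a thick book; I am about halfway through it, and it is very practical. If you want to get up to speed quickly and start practicing, this is the one I recommend. Here is Amazon's description: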

Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.

The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.

Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.

This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.

About the Author

Chuck Lam is a Senior Engineer at RockYou!. Chuck received his B.S. from San Jose State University and his Ph.D. in Electrical Engineering from Stanford University, where his thesis topic was computational data acquisition.

Then there is "Pro Hadoop". As the name suggests, it is the advanced one, and it is next on my reading list. Here is Amazon's description:

You’ve heard the hype about Hadoop: it runs petabyte–scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it’s been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it’s completely open source (thus free). But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running?

From Apress, the name you’ve come to trust for hands–on technical knowledge, Pro Hadoop brings you up to speed on Hadoop. You learn the ins and outs of MapReduce; how to structure a cluster, design, and implement the Hadoop file system; and how to build your first cloud–computing tasks using Hadoop. Learn how to let Hadoop take care of distributing and parallelizing your software—you just focus on the code, Hadoop takes care of the rest.

Best of all, you’ll learn from a tech professional who’s been in the Hadoop scene since day one. Written from the perspective of a principal engineer with down–in–the–trenches knowledge of what to do wrong with Hadoop, you learn how to avoid the common, expensive first errors that everyone makes with creating their own Hadoop system or inheriting someone else’s.

Skip the novice stage and the expensive, hard–to–fix mistakes...go straight to seasoned pro on the hottest cloud–computing framework with Pro Hadoop. Your productivity will blow your managers away.

What you’ll learn

* Set up a stand–alone Hadoop cluster the smart way, laid out simply and step by step so you can get up and running quickly to build your next data center, collaborative, data–intensive Internet services application, Software as a Service (SaaS), and more.
* Optimize your Hadoop production tasks like an experienced pro.
* Work with time–proven, bulletproof standard patterns that have been tested and debugged in high–volume production.
* Understand just enough theoretical knowledge to know why something works in Hadoop, without getting bogged down in abstruse walls of theory.
* Get detailed explanations of not only how to do something with Hadoop, but also why, from a front–line coder with years in the Hadoop game.
* Turn someone else’s expensive cluster–wide “wrong” into an orderly, productive “right” with professional–level debugging and testing.

Who this book is for

IT professionals interested in investigating Hadoop and implementing it in their organizations, and existing Hadoop users who want to deepen their professional toolkits.

Table of Contents

1. Getting Started with Hadoop Core
2. The Basics of a MapReduce Job
3. The Basics of Multimachine Clusters
4. HDFS Details for Multimachine Clusters
5. MapReduce Details for Multimachine Clusters
6. Tuning Your MapReduce Jobs
7. Unit Testing and Debugging
8. Advanced and Alternate MapReduce Techniques
9. Solving Problems with Hadoop
10. Projects Based On Hadoop and Future Directions

About the Author

Jason Venner has 20+ years of experience in software engineering, management, design, and coding. He has been a VP, director, and consultant. Currently, his interests and expertise are in Java, Hadoop, cloud computing, and more. For more, visit www.prohadoopbook.com.

Finally, as further reading, there is "HBase: The Definitive Guide". It has not been released yet and is only available for pre-order.

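Note: Hadoop 0.20 introduced a completely new API, so much of the code written before it has to be rewritten. It is well worth understanding these changes. If you are starting directly with 0.20, that is no problem either.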
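To give a rough feel for what the API change means for mapper code, here is a minimal sketch in plain Java. It does not use Hadoop at all: `Sink`, `OldStyleMapper`, and `NewStyleMapper` are hypothetical local stand-ins written for this illustration, so the example runs with no Hadoop jar on the classpath. They only mimic the shape of the real types. In the old API (package `org.apache.hadoop.mapred`) `Mapper` is an interface whose `map` method receives an `OutputCollector` and a `Reporter`; in the new API (package `org.apache.hadoop.mapreduce`) `Mapper` is a class you extend, and `map` receives a single `Context`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in types only: nothing here is a Hadoop class. The names and
// signatures are simplified assumptions that copy the shape of the real API.
class ApiShapeSketch {

    // Plays the role of OutputCollector (old API) and Mapper.Context (new API).
    // For brevity it also sums the counts, which a real job leaves to the reducer.
    static class Sink {
        final Map<String, Integer> counts = new LinkedHashMap<>();

        // Old-style method name: OutputCollector.collect(key, value)
        void collect(String key, int value) {
            counts.merge(key, value, Integer::sum);
        }

        // New-style method name: Context.write(key, value)
        void write(String key, int value) {
            collect(key, value);
        }
    }

    // Old style: Mapper is an interface you implement. The real method also
    // takes a Reporter argument, omitted in this sketch.
    interface OldStyleMapper {
        void map(long offset, String line, Sink output);
    }

    // New style: Mapper is a class you extend. It is modelled as abstract here;
    // the real class ships a default identity map().
    abstract static class NewStyleMapper {
        abstract void map(long offset, String line, Sink context);
    }

    static Map<String, Integer> countOldStyle(String line) {
        OldStyleMapper mapper = (offset, text, output) -> {
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    output.collect(word, 1);
                }
            }
        };
        Sink sink = new Sink();
        mapper.map(0L, line, sink);
        return sink.counts;
    }

    static Map<String, Integer> countNewStyle(String line) {
        NewStyleMapper mapper = new NewStyleMapper() {
            @Override
            void map(long offset, String text, Sink context) {
                for (String word : text.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        context.write(word, 1);
                    }
                }
            }
        };
        Sink sink = new Sink();
        mapper.map(0L, line, sink);
        return sink.counts;
    }

    public static void main(String[] args) {
        System.out.println(countOldStyle("to be or not to be")); // {to=2, be=2, or=1, not=1}
        System.out.println(countNewStyle("to be or not to be")); // {to=2, be=2, or=1, not=1}
    }
}
```

The mapping logic is identical in both versions; what changes is how the mapper is declared and where its output goes. That is why porting older code is mostly mechanical, but still touches every mapper and reducer.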

Downloads: the English editions can all be found in the CSDN download section.

Purchase: china-pub and dangdang carry the Chinese edition, 《Hadoop权威指南(中文版)》; the original English edition is too expensive.
