<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[HDFS Installation and Deployment]]></title><description><![CDATA[<p dir="auto">1. HDFS Architecture Overview<br />
<img src="/assets/uploads/files/1616580767372-3b2455ed-f5f4-436d-9dcc-430eb87ee62c-image.png" alt="3b2455ed-f5f4-436d-9dcc-430eb87ee62c-image.png" class=" img-responsive img-markdown" /><br />
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.</p>
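<p dir="auto">2. Installation and Configuration<br />
2.1 Prepare the accounts and network environment<br />
Our network environment and resources are shown in the figure below.<br />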
<img src="/assets/uploads/files/1616641252619-592a3987-593c-4b4e-b358-9222b10ecf03-image.png" alt="592a3987-593c-4b4e-b358-9222b10ecf03-image.png" class=" img-responsive img-markdown" /><br />
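Cluster network:<br />
Server A (compute node): 192.168.1.2<br />
Server B (storage node): 192.168.1.3<br />
Create the anneng account on both servers as the main Hadoop account, and use it to set up passwordless SSH login between the two machines:<br />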
<a href="http://an.forum.genostack.com/topic/3/ubuntu-18-04-ssh-key%E7%9A%84%E9%85%8D%E7%BD%AE%E4%BB%A5%E5%8F%8A%E5%85%8D%E5%AF%86%E7%99%BB%E5%BD%95?_=1616574188290">http://an.forum.genostack.com/topic/3/ubuntu-18-04-ssh-key的配置以及免密登录?_=1616574188290</a><br />
Configure hosts name resolution on both the master and the slave node:<br />
192.168.1.2  master<br />
192.168.1.3  slave1</p>
<p dir="auto">2.2 Install the software<br />
<a href="https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions" rel="nofollow ugc">https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions</a><br />
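According to the page above, Apache Hadoop 3.3 and later is compiled with Java 8 and supports Java 8 and Java 11 at runtime; the 3.0.x to 3.2.x releases (including the 3.2.2 release downloaded below) support Java 8 only.<br />
We use OpenJDK 8 for the rest of the deployment:<br />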
sudo apt-get update<br />
sudo apt-get install openjdk-8-jdk<br />
Note the actual installation path of javac:<br />
which javac<br />
readlink -f /usr/bin/javac<br />
or: dpkg -l | grep openjdk</p>
<p dir="auto">Download Hadoop:<br />
<a href="https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz" rel="nofollow ugc">https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz</a></p>
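<p dir="auto">Set the environment variables:<br />
# Hadoop related options (HADOOP_HOME assumes the 3.2.2 tarball was extracted in the anneng home directory; adjust to the actual location)<br />
export HADOOP_HOME=/home/anneng/hadoop-3.2.2<br />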
export HADOOP_INSTALL=$HADOOP_HOME<br />
export HADOOP_MAPRED_HOME=$HADOOP_HOME<br />
export HADOOP_COMMON_HOME=$HADOOP_HOME<br />
export HADOOP_HDFS_HOME=$HADOOP_HOME<br />
export YARN_HOME=$HADOOP_HOME<br />
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native<br />
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin<br />
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"</p>
<p dir="auto">2.3 Configure Hadoop<br />
Set JAVA_HOME in hadoop-env.sh:<br />
sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh<br />
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64</p>
<p dir="auto">Configure the master node:<br />
sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml<br />
&lt;configuration&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;hadoop.tmp.dir&lt;/name&gt;<br />
&lt;value&gt;/ceph_disk2/hadoop_temp&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;fs.default.name&lt;/name&gt;<br />
&lt;value&gt;hdfs://192.168.1.2:7000&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;/configuration&gt;</p>
<p dir="auto">Configure HDFS in $HADOOP_HOME/etc/hadoop/hdfs-site.xml:<br />
&lt;configuration&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;dfs.namenode.name.dir&lt;/name&gt;<br />
&lt;value&gt;/data_raid1/hdfs_namenode&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;dfs.datanode.data.dir&lt;/name&gt;<br />
&lt;value&gt;/ceph_disk3/hdfs_datanode&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;dfs.replication&lt;/name&gt;<br />
&lt;value&gt;1&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;/configuration&gt;</p>
<p dir="auto">Configure the slave node:<br />
core-site.xml<br />
&lt;configuration&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;hadoop.tmp.dir&lt;/name&gt;<br />
&lt;value&gt;/ceph_disk1/hadoop_temp&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;fs.default.name&lt;/name&gt;<br />
&lt;value&gt;hdfs://192.168.1.2:7000&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;/configuration&gt;</p>
<p dir="auto">hdfs-site.xml<br />
&lt;configuration&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;dfs.datanode.data.dir&lt;/name&gt;<br />
&lt;value&gt;/ceph_disk1/hdfs_datanode&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;property&gt;<br />
&lt;name&gt;dfs.replication&lt;/name&gt;<br />
&lt;value&gt;1&lt;/value&gt;<br />
&lt;/property&gt;<br />
&lt;/configuration&gt;</p>
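<p dir="auto">Both nodes have to point at the same NameNode address. As a rough sketch (the helper name parse_hdfs_uri is made up for illustration and is not part of Hadoop), a small shell function can split an address such as hdfs://192.168.1.2:7000 into host and port, so the values configured on the two nodes can be compared:</p>

```shell
# Hypothetical helper (not a Hadoop command): split an fs.default.name style
# URI such as hdfs://192.168.1.2:7000 into "host port".
parse_hdfs_uri() {
  uri="${1#hdfs://}"   # drop the scheme prefix
  uri="${uri%%/*}"     # drop any trailing path component
  printf '%s %s\n' "${uri%:*}" "${uri##*:}"
}

parse_hdfs_uri "hdfs://192.168.1.2:7000"   # prints: 192.168.1.2 7000
```

<p dir="auto">On a configured node the address could be taken from hdfs getconf -confKey fs.defaultFS instead of being typed in (fs.defaultFS is the current name of the deprecated fs.default.name key), assuming the hdfs command is on the PATH.</p>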
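<p dir="auto">3. Initialization and Startup<br />
Format the NameNode (only before the first start, since formatting wipes existing metadata), then start HDFS from the master:<br />
hdfs namenode -format<br />
$HADOOP_HOME/sbin/start-dfs.sh</p>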
<p dir="auto">You can run sudo jps on each server to check its processes.<br />
Master node:<br />
17329 DataNode<br />
47634 Jps<br />
17818 SecondaryNameNode</p>
<p dir="auto">Slave node:<br />
55904 DataNode</p>
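<p dir="auto">Reading jps listings by eye gets tedious with more nodes. The sketch below shows one way to script the check; has_daemon is a hypothetical helper name, and it only assumes the two-column "pid name" format shown above. It compares the name column exactly, so NameNode does not match SecondaryNameNode.</p>

```shell
# Hypothetical helper: succeed if the jps listing on stdin contains a process
# whose name is exactly "$1" (so NameNode does not match SecondaryNameNode).
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# A canned listing stands in for real output here.
# On a real node: jps | has_daemon DataNode
printf '%s\n' '17329 DataNode' '47634 Jps' '17818 SecondaryNameNode' | has_daemon DataNode
echo "DataNode check exit status: $?"   # 0 means the process was found
```

<p dir="auto">After start-dfs.sh a NameNode process is normally expected on the master as well, so jps | has_daemon NameNode is a reasonable extra check there; if it fails, the NameNode log under $HADOOP_HOME/logs is the usual place to look.</p>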
]]></description><link>http://an.forum.genostack.com/topic/256/hdfs-安装和部署</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 12:32:06 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/256.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 23 Mar 2021 07:34:38 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to HDFS Installation and Deployment on Thu, 18 Aug 2022 02:11:57 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571545/" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571545/</a><br />
<img src="/assets/uploads/files/1660788715440-9271cf73-f88b-408f-a4ae-dfe20eda8ab5-image.png" alt="9271cf73-f88b-408f-a4ae-dfe20eda8ab5-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1804</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1804</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Aug 2022 02:11:57 GMT</pubDate></item><item><title><![CDATA[Reply to HDFS Installation and Deployment on Thu, 18 Aug 2022 01:57:42 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://support.huaweicloud.com/intl/en-us/prtg-kunpenghpcs/kunpenggatk_02_0011.html" rel="nofollow ugc">https://support.huaweicloud.com/intl/en-us/prtg-kunpenghpcs/kunpenggatk_02_0011.html</a></p>
]]></description><link>http://an.forum.genostack.com/post/1803</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1803</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Aug 2022 01:57:42 GMT</pubDate></item><item><title><![CDATA[Reply to HDFS Installation and Deployment on Thu, 18 Aug 2022 01:53:24 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://cordis.europa.eu/docs/projects/cnect/1/317871/080/deliverables/001-Ares20143999552D23.pdf" rel="nofollow ugc">https://cordis.europa.eu/docs/projects/cnect/1/317871/080/deliverables/001-Ares20143999552D23.pdf</a><br />
<a href="/assets/uploads/files/1660787601833-001-ares20143999552d23.pdf">001-Ares20143999552D23.pdf</a></p>
]]></description><link>http://an.forum.genostack.com/post/1802</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1802</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Aug 2022 01:53:24 GMT</pubDate></item><item><title><![CDATA[Reply to HDFS Installation and Deployment on Thu, 25 Mar 2021 04:17:38 GMT]]></title><description><![CDATA[<p dir="auto">Running hdfs dfs -ls failed with the error below. Port 9000 turned out to be occupied by Docker, so the NameNode port was changed to 7000.<br />
ls: DestHost:destPort master:9000 , LocalHost:localPort anneng01/103.114.101.5:0. Failed on local exception: java.io.IOException: Connection reset by peer</p>
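<p dir="auto">A port clash like this can be caught before choosing the NameNode port. The following is a minimal sketch, assuming the usual column layout of ss -ltn output (state in the first column, local address and port in the fourth); port_listed is a hypothetical helper name.</p>

```shell
# Hypothetical helper: read `ss -ltn` style output on stdin and succeed if any
# listening socket has a local address that ends in the given port.
port_listed() {
  awk -v port="$1" '
    $1 == "LISTEN" { if ($4 ~ (":" port "$")) found = 1 }
    END { exit !found }'
}

# A canned line stands in for real output here.
# On a real host: ss -ltn | port_listed 9000
if printf 'LISTEN 0 4096 0.0.0.0:9000 0.0.0.0:*\n' | port_listed 9000; then
  echo "port 9000 is already taken"
fi
```

<p dir="auto">Adding the -p option to ss (usually with sudo) also shows which process owns the socket, which is how a Docker listener on the port would show up.</p>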
]]></description><link>http://an.forum.genostack.com/post/504</link><guid isPermaLink="true">http://an.forum.genostack.com/post/504</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 25 Mar 2021 04:17:38 GMT</pubDate></item><item><title><![CDATA[Reply to HDFS Installation and Deployment on Thu, 25 Mar 2021 03:54:49 GMT]]></title><description><![CDATA[<p dir="auto">The HDFS read process<br />
<img src="/assets/uploads/files/1616642522582-f51bf043-3a1a-4271-8b86-bfbd832b72b6-image.png" alt="f51bf043-3a1a-4271-8b86-bfbd832b72b6-image.png" class=" img-responsive img-markdown" /><br />
A client initiates a read request by calling the 'open()' method of a FileSystem object; here it is an object of type DistributedFileSystem.<br />
This object connects to the NameNode using RPC and gets metadata such as the locations of the file's blocks. Note that these addresses cover only the first few blocks of the file.<br />
In response to this metadata request, the addresses of the DataNodes holding a copy of each block are returned.<br />
Once the DataNode addresses are received, an object of type FSDataInputStream is returned to the client. FSDataInputStream contains a DFSInputStream, which takes care of interactions with the DataNodes and the NameNode. In step 4 of the diagram above, the client invokes the 'read()' method, which causes DFSInputStream to establish a connection with the first DataNode holding the first block of the file.<br />
Data is read as a stream, with the client invoking 'read()' repeatedly. This continues until the end of the block is reached.<br />
Once the end of a block is reached, DFSInputStream closes the connection and moves on to locate the next DataNode for the next block.<br />
Once the client is done reading, it calls the close() method.</p>
<p dir="auto">The write process<br />
<img src="/assets/uploads/files/1616642577382-e81b1e74-a55c-4b62-9bc2-e064140ce59e-image.png" alt="e81b1e74-a55c-4b62-9bc2-e064140ce59e-image.png" class=" img-responsive img-markdown" /><br />
A client initiates a write operation by calling the 'create()' method of the DistributedFileSystem object, which creates a new file (step 1 in the diagram above).<br />
The DistributedFileSystem object connects to the NameNode using an RPC call and initiates the new file creation. However, this create operation does not associate any blocks with the file. It is the responsibility of the NameNode to verify that the file being created does not already exist and that the client has the correct permissions to create it. If the file already exists or the client lacks sufficient permission, an IOException is thrown to the client. Otherwise, the operation succeeds and the NameNode creates a new record for the file.<br />
Once the new record is created in the NameNode, an object of type FSDataOutputStream is returned to the client. The client uses it to write data into HDFS by invoking the write method (step 3 in the diagram).<br />
FSDataOutputStream contains a DFSOutputStream object, which looks after communication with the DataNodes and the NameNode. While the client continues writing data, DFSOutputStream keeps creating packets from this data. These packets are enqueued into a queue called the DataQueue.<br />
Another component, the DataStreamer, consumes this DataQueue. The DataStreamer also asks the NameNode to allocate new blocks, thereby picking suitable DataNodes to be used for replication.<br />
Now the process of replication starts by creating a pipeline of DataNodes. The example in the diagram uses a replication level of 3, so there are 3 DataNodes in the pipeline (the cluster configured in the main post sets dfs.replication to 1, so its pipeline has a single DataNode).<br />
The DataStreamer pours packets into the first DataNode in the pipeline.<br />
Every DataNode in the pipeline stores the packets it receives and forwards them to the next DataNode in the pipeline.<br />
Another queue, the 'Ack Queue', is maintained by DFSOutputStream to store packets that are waiting for acknowledgment from the DataNodes.<br />
Once acknowledgment for a packet in the queue is received from all DataNodes in the pipeline, it is removed from the 'Ack Queue'. If any DataNode fails, packets from this queue are used to reinitiate the operation.<br />
After the client is done writing data, it calls the close() method (step 9 in the diagram). The call to close() flushes the remaining data packets to the pipeline and then waits for acknowledgment.<br />
Once the final acknowledgment is received, the NameNode is contacted and told that the file write operation is complete.</p>
<p dir="auto">Basic operations<br />
Copy a local file to HDFS:<br />
$HADOOP_HOME/bin/hdfs dfs -copyFromLocal temp.txt /<br />
List the file system contents:<br />
$HADOOP_HOME/bin/hdfs dfs -ls /<br />
Copy an HDFS file to the local machine:<br />
$HADOOP_HOME/bin/hdfs dfs -copyToLocal /temp.txt<br />
Create a new directory:<br />
$HADOOP_HOME/bin/hdfs dfs -mkdir /mydirectory</p>
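<p dir="auto">The commands above can be combined into a quick smoke test that uploads a file, downloads it again and compares the two copies. This is only a sketch: hdfs_round_trip and the HDFS_CMD variable are names invented here, and it assumes the hdfs command is on the PATH and that the local download target does not exist yet.</p>

```shell
# HDFS_CMD is an assumption of this sketch: it lets the hdfs binary be swapped out.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Hypothetical helper: upload a file, download it again, and compare the copies.
# Arguments: local source file, HDFS directory, local path for the downloaded copy.
hdfs_round_trip() {
  src="$1"; remote_dir="$2"; back="$3"
  "$HDFS_CMD" dfs -mkdir -p "$remote_dir" || return 1
  "$HDFS_CMD" dfs -copyFromLocal -f "$src" "$remote_dir/" || return 1
  "$HDFS_CMD" dfs -copyToLocal "$remote_dir/$(basename "$src")" "$back" || return 1
  cmp -s "$src" "$back"   # exit status 0 when the two files are identical
}
```

<p dir="auto">For example, hdfs_round_trip temp.txt /mydirectory /tmp/temp.copy returns 0 when the file read back from HDFS matches the original.</p>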
]]></description><link>http://an.forum.genostack.com/post/503</link><guid isPermaLink="true">http://an.forum.genostack.com/post/503</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 25 Mar 2021 03:54:49 GMT</pubDate></item><item><title><![CDATA[Reply to HDFS Installation and Deployment on Thu, 25 Mar 2021 03:05:00 GMT]]></title><description><![CDATA[<p dir="auto">Our systems already have the anneng account. On a new machine, an account can be created as follows:<br />
Create a regular hadoop account:<br />
sudo adduser hadoop<br />
su - hadoop (the - starts a fresh login session as hadoop; plain su hadoop keeps the current user's environment)</p>
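<p dir="auto">Install OpenSSH and set up key-based passwordless login (the commands below authorize the key for localhost; for login between the servers, the public key must also be added to authorized_keys on the other server):<br />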
sudo apt install openssh-server openssh-client -y<br />
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa<br />
cat ~/.ssh/id_rsa.pub &gt;&gt; ~/.ssh/authorized_keys<br />
chmod 0600 ~/.ssh/authorized_keys<br />
ssh localhost</p>
]]></description><link>http://an.forum.genostack.com/post/496</link><guid isPermaLink="true">http://an.forum.genostack.com/post/496</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 25 Mar 2021 03:05:00 GMT</pubDate></item></channel></rss>