### HDFS Pseudo-Distributed Configuration Tutorial
In a pseudo-distributed setup, all Hadoop daemons run on a single machine, each in its own Java process, simulating a small cluster. This configuration is useful for testing and development.
#### Setting Up Environment Variables
Ensure that Java and SSH are installed, since both are required by the Hadoop components. Set the `JAVA_HOME` variable in the `.bashrc` file:
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
```
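Hadoop's start scripts also expect passwordless SSH to localhost. If `ssh localhost` still prompts for a password, the key setup recommended in the Hadoop single-node documentation looks like this:
```bash
# Generate a passphrase-less key and authorize it for the local account
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```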
#### Configuring core-site.xml
Edit `$HADOOP_HOME/etc/hadoop/core-site.xml` and add the `fs.defaultFS` property, which specifies the default filesystem URI. Pointing it at localhost indicates that this is a local HDFS instance running in pseudo-distributed mode[^1]:
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
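As a quick check that Hadoop picks up this setting, the `hdfs getconf` helper prints the value resolved from the configuration files (no daemons need to be running):
```bash
# Print the filesystem URI that HDFS resolves from core-site.xml
$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS
# Expected output: hdfs://localhost:9000
```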
#### Formatting NameNode
Before starting any daemons, format the NameNode with the command-line tool included in the Hadoop distribution[^2]. Run the following from a terminal:
```bash
$HADOOP_HOME/bin/hdfs namenode -format
```
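Note that the format step writes to whichever directory `dfs.namenode.name.dir` resolves to at the time you run it, so if you intend to set that property (next step), do so before formatting. After a successful format, the metadata directory contains a `current/` subfolder; the path below reuses the placeholder from the next step:
```bash
# A freshly formatted NameNode metadata directory contains current/
# with fsimage and VERSION files
ls /path/to/name/directory/current
```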
#### Adjusting hdfs-site.xml File
Edit hdfs-site.xml in the same `etc/hadoop` folder, setting the replication factor (1 is sufficient on a single node) and the local directories the NameNode and DataNode use for metadata and block storage. The HDFS directories for temporary data (`/tmp`) and the Hive warehouse are created later, once the daemons are running.
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/path/to/name/directory</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/path/to/data/directory</value>
  </property>
</configuration>
```
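The `/path/to/...` values above are placeholders; whichever local paths you substitute must be writable by the account that runs the Hadoop daemons. A minimal sketch, reusing the placeholder paths:
```bash
# Create the local NameNode/DataNode storage directories and give
# ownership to the account that runs the Hadoop daemons
sudo mkdir -p /path/to/name/directory /path/to/data/directory
sudo chown -R "$USER:$USER" /path/to/name/directory /path/to/data/directory
```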
#### Starting Daemons
Start the HDFS and YARN daemons by running the scripts in the `sbin` subdirectory of the installation root:
```bash
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
```
Verify that the daemons started successfully by checking the web interfaces: the NameNode UI on port 50070 and the ResourceManager UI on port 8088. (On Hadoop 3.x the NameNode UI moves to port 9870.)
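You can also confirm from the shell that all the daemon processes came up; `jps`, which ships with the JDK, lists the running Java processes:
```bash
# Each daemon runs in its own JVM, so all of these should appear:
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
jps
```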
#### Creating Directories Required By Hive
If you plan to run Apache Hive on top of this HDFS instance, create the directories it expects and grant group write permission so users can write data to them:
```bash
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
```
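To verify the result, list the directories back and check that the group-write bit is set:
```bash
# Both entries should show group write permission, e.g. drwxrwxr-x
$HADOOP_HOME/bin/hadoop fs -ls -d /tmp /user/hive/warehouse
```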