Apache Hadoop installation with Cloudera 5 Update

Cloudera Manager --> Support --> About

Version: Cloudera Express 5.16.2

Hadoop Cloudera Installation CM5

No upgrade needed

Updating Cloudera Manager & CDH (If Needed)

For example, on a cluster that is still running an older release:

Cloudera Manager --> Support --> About

Version: Cloudera Express 5.13.0

Parcels --> CDH 5 --> Download --> Distribute --> Activate
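The same Download --> Distribute --> Activate sequence comes up again below for SPARK2 and Anaconda. As a hedged sketch that is not part of the original procedure, it can also be driven through the Cloudera Manager REST API parcel commands (`startDownload`, `startDistribution`, `activate`). The host, port and `admin:admin` credentials are assumptions to override; the API version is asked from the server's `/api/version` endpoint unless `CM_API` is set. Cluster names must be URL-encoded (for example `Cluster%201`), and the exact parcel version string should be copied from the Parcels page. `cm_parcel` and `parcel_stage` are hypothetical helper names.

```shell
# Sketch: Download --> Distribute --> Activate one parcel via the Cloudera Manager REST API.
# Assumed defaults (not from the original post); override them in the environment.
CM_URL="${CM_URL:-http://localhost:7180}"
CM_AUTH="${CM_AUTH:-admin:admin}"

# parcel_stage JSON: print the "stage" field (e.g. DOWNLOADED) of a parcel JSON document.
parcel_stage() {
  printf '%s\n' "$1" |
    sed -n 's/.*"stage"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# cm_parcel CLUSTER PRODUCT VERSION: issue each parcel command in turn and poll until
# the parcel reaches the matching stage (about 30 minutes per step before giving up).
cm_parcel() {
  local api base step cmd want stage tries
  # Ask the server for the highest API version it supports unless CM_API is set.
  api="${CM_API:-$(curl -sf -u "$CM_AUTH" "$CM_URL/api/version")}" || return 1
  base="$CM_URL/api/$api/clusters/$1/parcels/products/$2/versions/$3"
  for step in startDownload:DOWNLOADED startDistribution:DISTRIBUTED activate:ACTIVATED; do
    cmd="${step%%:*}"
    want="${step#*:}"
    curl -sf -u "$CM_AUTH" -X POST "$base/commands/$cmd" >/dev/null || return 1
    tries=0
    # Poll the parcel resource until it reports the stage this command leads to.
    while stage="$(parcel_stage "$(curl -sf -u "$CM_AUTH" "$base")")"; [ "$stage" != "$want" ]; do
      tries=$((tries + 1))
      if [ "$tries" -ge 180 ]; then
        echo "cm_parcel: $cmd stuck at stage '$stage'" >&2
        return 1
      fi
      sleep 10
    done
    echo "$2 $3: $stage"
  done
}
```

Usage: `cm_parcel 'Cluster%201' SPARK2 '<version string from the Parcels page>'`. If a parcel is already past one of the stages, the corresponding POST may be rejected, so the sketch suits a parcel that is still available remotely.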


Cloudera Manager --> Stop

# service cloudera-scm-server stop
# service cloudera-scm-agent stop

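Edit the repository file so that baseurl points at the Cloudera Manager release you are upgrading to:

# vi /etc/yum.repos.d/cloudera-manager.repo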

# yum clean all

# yum upgrade -y cloudera-manager-server cloudera-manager-agent cloudera-manager-daemons

# service cloudera-scm-server start
# service cloudera-scm-agent start
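The repository edit in the steps above can also be scripted. This is a minimal sketch, assuming RHEL/CentOS 7 on x86_64 and the archive.cloudera.com layout that Cloudera Manager 5 releases used; access to that archive may now require credentials, so substitute your own mirror if needed. `write_cm_repo` is a hypothetical helper name.

```shell
# write_cm_repo FILE VERSION: write a yum repository definition for one
# Cloudera Manager 5 release (RHEL/CentOS 7, x86_64 assumed).
write_cm_repo() {
  local repo_file="$1" cm_version="$2"
  local base="https://archive.cloudera.com/cm5/redhat/7/x86_64/cm"
  # Overwrite the file so that only the requested release is configured.
  cat > "$repo_file" <<EOF
[cloudera-manager]
name=Cloudera Manager ${cm_version}
baseurl=${base}/${cm_version}/
gpgkey=${base}/RPM-GPG-KEY-cloudera
gpgcheck=1
EOF
}
```

Usage (as root, before `yum clean all`): `write_cm_repo /etc/yum.repos.d/cloudera-manager.repo 5.16.2`. Pinning an explicit release in baseurl keeps `yum upgrade` from jumping to a newer 5.x than intended.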

Cloudera Manager --> Upgrade Cluster

Updating Java

# service cloudera-scm-agent stop
# service cloudera-scm-server stop

JDK 8 installation:
# wget http://archive.cloudera.com/director/redhat/7/x86_64/director/2.8.1/RPMS/x86_64/oracle-j2sdk1.8-1.8.0+update121-1.x86_64.rpm

# yum localinstall -y oracle-j2sdk1.8-1.8.0+update121-1.x86_64.rpm
# yum remove -y oracle-j2sdk1.7.x86_64

# vi /etc/default/cloudera-scm-server

export JAVA_HOME=/usr/java/latest

# ln -sfn /usr/java/jdk1.8.0_121-cloudera /usr/java/latest

# vi /etc/profile.d/java.sh
#### JDK8 #######################

export JAVA_HOME=/usr/java/latest
export PATH=${JAVA_HOME}/bin:$PATH

#### JDK8 #######################
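Before starting the services again, it can help to confirm that /usr/java/latest really resolves to a 1.8 JDK. A minimal sketch, assuming the JDK's `java -version` output contains the usual `version "1.8.0_..."` string; `jdk_version` and `require_jdk8` are hypothetical helper names.

```shell
# jdk_version DIR: print the version string reported by DIR/bin/java
# (for example 1.8.0_121); fails if DIR has no executable java.
jdk_version() {
  local java_bin="$1/bin/java"
  [ -x "$java_bin" ] || return 1
  # java -version writes to stderr, e.g.: java version "1.8.0_121"
  "$java_bin" -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# require_jdk8 DIR: succeed only if DIR (symlinks followed) is a Java 1.8 JDK.
require_jdk8() {
  local v
  v="$(jdk_version "$1")" || { echo "no java under $1" >&2; return 1; }
  case "$v" in
    1.8.*) echo "$(readlink -f "$1") -> Java $v" ;;
    *) echo "expected Java 1.8, found '$v' under $1" >&2; return 1 ;;
  esac
}
```

Usage: `require_jdk8 /usr/java/latest` should print the jdk1.8.0_121-cloudera directory.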

# service cloudera-scm-server start
# service cloudera-scm-agent start

Cloudera Manager --> Hosts --> All Hosts --> Configuration --> Java Home Directory, and point it at the JDK 8 installation.

Restart Cloudera Manager, then restart the cluster services.

Installing Spark 2

Administration --> Settings --> Custom Service Descriptors

# cp SPARK2_ON_YARN-2.2.0.cloudera1.jar /opt/cloudera/csd
# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
# chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
# service cloudera-scm-server restart
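The copy, chown and chmod steps above can be folded into a single GNU coreutils `install` call. A minimal sketch, assuming the default CSD directory /opt/cloudera/csd and the cloudera-scm user and group; `install_csd` and the `CSD_OWNER`/`CSD_GROUP` overrides are hypothetical. The Cloudera Manager server still has to be restarted afterwards.

```shell
# install_csd JAR [CSD_DIR]: copy a Custom Service Descriptor jar into the CSD
# directory with the owner and mode used above (cloudera-scm, 0644).
install_csd() {
  local jar="$1" csd_dir="${2:-/opt/cloudera/csd}"
  local owner="${CSD_OWNER:-cloudera-scm}" group="${CSD_GROUP:-cloudera-scm}"
  [ -f "$jar" ] || { echo "install_csd: no such file: $jar" >&2; return 1; }
  # install copies the file and sets mode, owner and group in one step.
  install -m 0644 -o "$owner" -g "$group" "$jar" "$csd_dir/$(basename "$jar")"
}
```

Usage (as root): `install_csd SPARK2_ON_YARN-2.2.0.cloudera1.jar && service cloudera-scm-server restart`.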

Cloudera Management Service --> Restart

Cloudera Manager --> Parcels --> Configuration --> Remote Parcel Repository URLs --> add (+) the Spark 2 parcel repository URL


Save Changes

SPARK2 --> Download --> Distribute --> Activate

Cloudera Manager --> Add Service --> Spark 2

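If Cloudera Manager reports the following warning during setup, review these two settings in the YARN service configuration and raise them if a Spark executor container cannot fit:

Please check the values of ‘yarn.scheduler.maximum-allocation-mb’ and/or ‘yarn.nodemanager.resource.memory-mb’.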
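A rough sketch of the arithmetic behind that warning: in Spark 2.x on YARN, each executor container requests the executor memory plus an overhead that defaults to max(384 MB, 10% of executor memory). If the sum exceeds yarn.scheduler.maximum-allocation-mb or yarn.nodemanager.resource.memory-mb, YARN cannot grant the container. The helper names are hypothetical, and YARN may also round each request up to the scheduler's allocation increment, so treat the result as a lower bound.

```shell
# spark_container_mb EXECUTOR_MB [OVERHEAD_MB]: memory one Spark 2.x executor asks
# YARN for. Default overhead: max(384 MB, 10% of executor memory).
spark_container_mb() {
  local exec_mb="$1" overhead="${2:-}"
  if [ -z "$overhead" ]; then
    overhead=$(( exec_mb / 10 ))
    if [ "$overhead" -lt 384 ]; then overhead=384; fi
  fi
  echo $(( exec_mb + overhead ))
}

# fits_yarn EXECUTOR_MB MAX_ALLOCATION_MB NODEMANAGER_MB: succeed only if one executor
# container fits under both yarn.scheduler.maximum-allocation-mb and
# yarn.nodemanager.resource.memory-mb.
fits_yarn() {
  local need
  need="$(spark_container_mb "$1")"
  if [ "$need" -le "$2" ] && [ "$need" -le "$3" ]; then
    echo "ok: ${need} MB per executor container"
  else
    echo "too big: ${need} MB needed, max-allocation=$2 MB, nodemanager=$3 MB" >&2
    return 1
  fi
}
```

For example, `fits_yarn 8192 8192 16384` fails: an 8192 MB executor needs 9011 MB once the 819 MB overhead is added, which exceeds an 8192 MB maximum allocation.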


# su hdfs

$ pyspark2

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0.cloudera2
      /_/

$ spark2-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0.cloudera2
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

IPython: Supercharge Your PySpark Shell

IPython greatly improves the PySpark shell: it adds tab completion (typeahead) and lets you run shell commands from the REPL, among other features. You can install the IPython 5.x LTS (long-term support) release on its own, but it is simpler to install Anaconda, which bundles many of the most popular Python packages, including IPython.

To install the Anaconda parcel:

1. Navigate to the Parcels page and click Configuration.
2. Below the CDH 5 parcel repository URLs, insert a new row by clicking +, enter the Anaconda parcel repository URL, and click Save Changes.
3. Find the Anaconda parcel entry and click Download. The parcel is fairly large, so the download may take some time.
4. Once it has downloaded, click Distribute and then Activate.

Parcel: Anaconda, version 4.3.1

Now you need to restart stale services.

Next, you may need to modify a couple of files. First, edit spark-env.sh (adjust the CDH parcel directory to the version installed on your cluster):

# vi /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/etc/spark/conf.dist/spark-env.sh

and add the following lines:

export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/ipython
export PATH=/opt/cloudera/parcels/Anaconda/bin:$PATH
export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera

Then modify ~/.bashrc for your user (hdfs in this example):

# su hdfs
$ vi ~/.bashrc

and add the same exports:

export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/ipython
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PATH=/opt/cloudera/parcels/Anaconda/bin:$PATH
export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera
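Editing these files by hand twice leaves duplicate exports. A minimal sketch of an idempotent alternative: `ensure_line` appends a line only if an identical line is not already present, and `anaconda_env` (a hypothetical helper) applies the four exports from the steps above to a target file, ~/.bashrc by default. The paths are the ones used in this post.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line is already there.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# anaconda_env [FILE]: add the Anaconda/PySpark exports to FILE (default ~/.bashrc).
# Single quotes keep $PATH literal so it is expanded when the file is sourced.
anaconda_env() {
  local target="${1:-$HOME/.bashrc}"
  ensure_line "$target" 'export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/ipython'
  ensure_line "$target" 'export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python'
  ensure_line "$target" 'export PATH=/opt/cloudera/parcels/Anaconda/bin:$PATH'
  ensure_line "$target" 'export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera'
}
```

Usage: run `anaconda_env` as the hdfs user for ~/.bashrc, or as root with the spark-env.sh path above as the argument.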

Now start pyspark2 and confirm that IPython is active: test tab completion, and run a shell command directly from the REPL by prefixing it with ! (for example, !hostname).
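If pyspark2 still opens the plain Python prompt, one quick check is whether the interpreters named by the two PySpark variables exist and are executable in the current shell (for example, after `source ~/.bashrc`). A minimal sketch; `check_pyspark_env` is a hypothetical helper and relies on bash indirect expansion.

```shell
# check_pyspark_env: report whether PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are set
# and point at executable files; returns non-zero if either check fails.
check_pyspark_env() {
  local var path rc=0
  for var in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON; do
    path="${!var}"   # value of the variable whose name is in $var
    if [ -z "$path" ]; then
      echo "$var is not set" >&2; rc=1
    elif [ ! -x "$path" ]; then
      echo "$var=$path is not executable" >&2; rc=1
    else
      echo "$var=$path"
    fi
  done
  return "$rc"
}
```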