Apache Hadoop installation with Cloudera 5 Update


Cloudera Manager --> Support --> About

Version: Cloudera Express 5.16.2


Hadoop Cloudera Installation CM5

This cluster already runs 5.16.2, so no upgrade is needed.
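Whether an upgrade is needed comes down to comparing the installed version with the target. The following is a minimal sketch, assuming GNU `sort -V` (version sort, available on RHEL/CentOS 7); the function name `needs_upgrade` is hypothetical.

```shell
#!/usr/bin/env bash
# needs_upgrade INSTALLED TARGET (hypothetical helper)
# Succeeds (exit 0) only when INSTALLED is strictly older than TARGET.
# GNU version sort orders 5.9.3 < 5.13.0 < 5.16.2 correctly, unlike a plain string sort.
needs_upgrade() {
  local installed="$1" target="$2"
  [ "$installed" != "$target" ] &&
    [ "$(printf '%s\n%s\n' "$installed" "$target" | sort -V | head -n1)" = "$installed" ]
}
```

On an RPM-based install, the installed server version can be read from the RPM database and passed in, for example `needs_upgrade "$(rpm -q --qf '%{VERSION}' cloudera-manager-server)" 5.16.2 && echo "upgrade needed"`.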


Updating Cloudera Manager & CDH (If Needed)


Cloudera Manager --> Support --> About

Version: Cloudera Express 5.13.0

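Parcels --> CDH 5 --> Download --> Distribute --> Activate

Note that the Cloudera Manager version must be equal to or higher than the CDH version, so Cloudera Manager has to be upgraded (steps below) before a newer CDH parcel is activated.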

http://archive.cloudera.com/cdh5/parcels/5.16.2/
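A parcel repository such as the one above publishes a manifest.json that lists its parcel files, one per OS release (for example a file ending in -el7.parcel for RHEL/CentOS 7). A minimal sketch for listing the parcels that match an OS suffix, assuming the manifest has been downloaded locally and uses the `"parcelName"` key; `parcels_for_distro` is a hypothetical helper that uses grep/sed rather than a JSON parser.

```shell
#!/usr/bin/env bash
# parcels_for_distro MANIFEST DISTRO (hypothetical helper)
# Prints each parcel file name in a manifest.json that ends in "-DISTRO.parcel",
# e.g. DISTRO=el7. Text-based extraction; a JSON tool such as jq is more robust.
parcels_for_distro() {
  local manifest="$1" distro="$2"
  grep -o '"parcelName": *"[^"]*"' "$manifest" |
    sed 's/.*"\([^"]*\)"$/\1/' |
    grep -- "-${distro}\.parcel\$"
}
```

For example: `curl -s -o manifest.json http://archive.cloudera.com/cdh5/parcels/5.16.2/manifest.json && parcels_for_distro manifest.json el7`.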


Cloudera Manager --> Stop

# service cloudera-scm-server stop
# service cloudera-scm-agent stop

# vi /etc/yum.repos.d/cloudera-manager.repo
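The edit to the repo file points yum at the target release. A minimal sketch that writes the file for a given version, assuming RHEL/CentOS 7 x86_64 and the archive.cloudera.com cm5 directory layout (adjust `base` for a local mirror or another OS release); `write_cm_repo` and its optional file argument are hypothetical.

```shell
#!/usr/bin/env bash
# write_cm_repo VERSION [FILE] (hypothetical helper)
# Writes a yum repo definition pinned to one Cloudera Manager 5 release.
# Assumes RHEL/CentOS 7 x86_64 and the archive.cloudera.com cm5 layout.
write_cm_repo() {
  local version="$1"
  local repo_file="${2:-/etc/yum.repos.d/cloudera-manager.repo}"
  local base="https://archive.cloudera.com/cm5/redhat/7/x86_64/cm"
  cat > "$repo_file" <<EOF
[cloudera-manager]
name=Cloudera Manager ${version}
baseurl=${base}/${version}/
gpgkey=${base}/RPM-GPG-KEY-cloudera
gpgcheck=1
EOF
}
```

Running `write_cm_repo 5.16.2` followed by `yum clean all` has the same effect as the manual edit.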

# yum clean all

# yum upgrade -y cloudera-manager-server cloudera-manager-agent cloudera-manager-daemons

# service cloudera-scm-server start
# service cloudera-scm-agent start


Cloudera Manager --> Upgrade Cluster


Updating Java

# service cloudera-scm-agent stop
# service cloudera-scm-server stop


// JDK8 Installation
# wget http://archive.cloudera.com/director/redhat/7/x86_64/director/2.8.1/RPMS/x86_64/oracle-j2sdk1.8-1.8.0+update121-1.x86_64.rpm

# yum localinstall -y oracle-j2sdk1.8-1.8.0+update121-1.x86_64.rpm
# yum remove -y oracle-j2sdk1.7.x86_64


# vi /etc/default/cloudera-scm-server


export JAVA_HOME=/usr/java/latest


# ln -sfn /usr/java/jdk1.8.0_121-cloudera /usr/java/latest


# vi /etc/profile.d/java.sh
#### JDK8 #######################

export JAVA_HOME=/usr/java/latest
export PATH=${JAVA_HOME}/bin:$PATH

#### JDK8 #######################
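Before restarting, it can help to confirm that /usr/java/latest really resolves to a Java 8 runtime. A minimal sketch: the hypothetical `java_major` function reads `java -version` output on stdin and prints the major version, handling both the old "1.8.0_121" numbering and the newer "11.0.2" style.

```shell
#!/usr/bin/env bash
# java_major (hypothetical helper): reads `java -version` output on stdin and
# prints the major version, e.g. "1.8.0_121" -> 8 and "11.0.2" -> 11.
java_major() {
  awk -F'"' '/version/ { split($2, v, "."); print ((v[1] == "1") ? v[2] : v[1]); exit }'
}

# Check the JDK behind the symlink, if it exists on this host.
# `java -version` writes to stderr, hence 2>&1.
if [ -x /usr/java/latest/bin/java ]; then
  echo "Java major version: $(/usr/java/latest/bin/java -version 2>&1 | java_major)"
fi
```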


# service cloudera-scm-server start
# service cloudera-scm-agent start


Cloudera Manager --> Hosts --> All Hosts --> Configuration --> Java Home Directory


Restart Cloudera Manager.
Restart the cluster services so they pick up the new Java Home Directory.
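The same restarts can also be triggered through the Cloudera Manager REST API, which is convenient when scripting the upgrade (restarting the Cloudera Manager Server process itself still uses `service cloudera-scm-server restart`). A minimal sketch, assuming Cloudera Manager on port 7180 over plain HTTP and placeholder admin:admin credentials; `CM_URL`, `CM_AUTH`, and both helper functions are hypothetical. It reads the supported API version from `/api/version`, then calls the Cloudera Management Service restart and the cluster restart endpoints. "Cluster 1" is only the default cluster name.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): override CM_URL and CM_AUTH for your environment.
CM_URL="${CM_URL:-http://localhost:7180}"
CM_AUTH="${CM_AUTH:-admin:admin}"

# cluster_restart_url API_VERSION CLUSTER_NAME (hypothetical helper)
# Spaces in the cluster name (e.g. "Cluster 1") are URL-encoded as %20.
cluster_restart_url() {
  local api="$1" cluster="$2"
  printf '%s/api/%s/clusters/%s/commands/restart\n' "$CM_URL" "$api" "${cluster// /%20}"
}

# cm_restart_all CLUSTER_NAME (hypothetical helper)
cm_restart_all() {
  local api
  # /api/version returns the highest API version the server supports ("v" plus a number).
  api=$(curl -sf -u "$CM_AUTH" "$CM_URL/api/version") || return 1
  # Restart the Cloudera Management Service, then every service in the cluster.
  curl -sf -u "$CM_AUTH" -X POST "$CM_URL/api/$api/cm/service/commands/restart" || return 1
  curl -sf -u "$CM_AUTH" -X POST "$(cluster_restart_url "$api" "$1")"
}
```

Usage: `cm_restart_all "Cluster 1"`. Both POST calls return a command object immediately; the restarts continue in the background and can be followed in the Cloudera Manager UI.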


SPARK2

https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html

https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_requirements.html

Administration --> Settings --> Custom Service Descriptors


# cp SPARK2_ON_YARN-2.2.0.cloudera1.jar /opt/cloudera/csd
# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
# chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
# service cloudera-scm-server restart
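The three CSD commands above can be wrapped so they are repeatable. A minimal sketch, assuming the default CSD directory /opt/cloudera/csd (overridable through the hypothetical `CSD_DIR` variable); `install_csd` is also hypothetical. The packaging page linked above lists a CSD jar for each Spark 2 release, and the jar used here (2.2.0.cloudera1) does not match the 2.4.0.cloudera2 parcel added later, so it is worth confirming which CSD belongs to the parcel you install.

```shell
#!/usr/bin/env bash
# CSD directory (assumption: Cloudera Manager default); override if yours differs.
CSD_DIR="${CSD_DIR:-/opt/cloudera/csd}"

# install_csd JAR (hypothetical helper): copies the jar into CSD_DIR with mode 644,
# then hands it to cloudera-scm when that account exists on this host.
install_csd() {
  local jar="$1" dest
  dest="$CSD_DIR/$(basename "$jar")"
  install -m 644 "$jar" "$dest" || return 1
  if id cloudera-scm >/dev/null 2>&1; then
    chown cloudera-scm:cloudera-scm "$dest"
  fi
}
```

Usage: `install_csd SPARK2_ON_YARN-2.2.0.cloudera1.jar && service cloudera-scm-server restart`.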


Cloudera Management Service --> Restart


Cloudera Manager --> Parcels --> Configuration --> Add

http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/


Save Changes


SPARK2 --> Download --> Distribute --> Activate


Cloudera Manager --> Add Service


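Launching a Spark 2 application (for example pyspark2 or spark2-shell) can fail with an error that ends in "Please check the values of ‘yarn.scheduler.maximum-allocation-mb’ and/or ‘yarn.nodemanager.resource.memory-mb’." Spark reports this when the memory it requests for an executor or ApplicationMaster container, including overhead, is larger than the biggest container YARN will allocate.

Both properties are set in yarn-conf/yarn-site.xml. On a cluster managed by Cloudera Manager, change them in the YARN service configuration rather than editing the file directly, because Cloudera Manager regenerates the file whenever client configuration is deployed.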
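With default settings, Spark 2.4 sizes an executor container as the executor memory plus an overhead of 10% of that memory, with a floor of 384 MB. A minimal sketch for checking the executor numbers against the cluster limit, assuming the name and value elements of each property sit on the same or consecutive lines of yarn-site.xml; `required_container_mb` and `site_property` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# required_container_mb EXECUTOR_MB (hypothetical helper)
# Executor memory plus Spark's default overhead: max(384, 10% of executor memory).
required_container_mb() {
  local executor_mb="$1" overhead
  overhead=$(( executor_mb / 10 ))
  if [ "$overhead" -lt 384 ]; then overhead=384; fi
  echo $(( executor_mb + overhead ))
}

# site_property FILE NAME (hypothetical helper)
# Prints the <value> of a property from a Hadoop *-site.xml file.
site_property() {
  local file="$1" name="$2"
  grep -A1 "<name>${name}</name>" "$file" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, `required_container_mb 1024` prints 1408. If `site_property yarn-conf/yarn-site.xml yarn.scheduler.maximum-allocation-mb` prints a smaller number, either request less memory (for instance `spark2-shell --executor-memory 512m`) or raise yarn.scheduler.maximum-allocation-mb, and yarn.nodemanager.resource.memory-mb if needed, in the YARN configuration.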


# su hdfs

$ pyspark2


Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0.cloudera2
      /_/


$ spark2-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0.cloudera2
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
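Beyond opening the shells, a batch job confirms that Spark 2 can actually obtain executors from YARN. A minimal sketch that submits the bundled SparkPi example, assuming the parcel's default location /opt/cloudera/parcels/SPARK2/lib/spark2 with an examples/jars directory (overridable through the hypothetical `SPARK2_HOME` variable); both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Default parcel layout (assumption); override SPARK2_HOME if yours differs.
SPARK2_HOME="${SPARK2_HOME:-/opt/cloudera/parcels/SPARK2/lib/spark2}"

# find_examples_jar (hypothetical helper): prints the first spark-examples jar found.
find_examples_jar() {
  local jar
  for jar in "$SPARK2_HOME"/examples/jars/spark-examples_*.jar; do
    [ -f "$jar" ] && { echo "$jar"; return 0; }
  done
  return 1
}

# spark2_smoke_test (hypothetical helper): runs SparkPi with 10 partitions on YARN;
# a successful run prints a line starting with "Pi is roughly".
spark2_smoke_test() {
  local jar
  jar=$(find_examples_jar) || { echo "no examples jar under $SPARK2_HOME" >&2; return 1; }
  spark2-submit --master yarn --class org.apache.spark.examples.SparkPi "$jar" 10
}
```

Run `spark2_smoke_test` as the same user that opened the shells (hdfs in this walkthrough).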


IPython: Supercharge Your PySpark Shell

IPython considerably improves the PySpark shell: it adds tab completion and lets you run shell commands from the REPL, among other features. The release to install is the 5.x LTS (long-term support) line, and the simplest way to get it is to install Anaconda, which bundles many popular Python packages, including IPython.

To install the Anaconda parcel:

1. Navigate to the Parcels page and click Configuration.
2. Below the CDH 5 parcel repository URLs, insert a new row by clicking +, then click Save Changes.
3. Find the Anaconda parcel entry and click Download. The parcel is large, so the download can take some time.
4. Once it has downloaded, click Distribute and then Activate.

Anaconda parcel version: 4.3.1

Now you need to restart stale services.

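Next, you may need to modify a couple of files. First, spark-env.sh in the CDH parcel (the path below is for CDH 5.13.1; adjust the parcel directory name to the CDH version you activated):

# vi /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/etc/spark/conf.dist/spark-env.sh

And add: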

export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/ipython
export PATH=/opt/cloudera/parcels/Anaconda/bin:$PATH
export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera

Then modify .bashrc for your user:

# su hdfs
$ vi ~/.bashrc

export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/ipython
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PATH=/opt/cloudera/parcels/Anaconda/bin:$PATH
export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera
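Editing spark-env.sh and .bashrc by hand means that repeating the step leaves duplicate exports. A minimal sketch that adds each line only if it is not already present; `append_once` and `add_anaconda_env` are hypothetical helpers, and the paths are the ones used in this section.

```shell
#!/usr/bin/env bash
# append_once FILE LINE (hypothetical helper): appends LINE unless FILE already
# contains it as an exact, whole line. Creates FILE if it does not exist.
append_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# add_anaconda_env FILE (hypothetical helper): the four exports from this section.
add_anaconda_env() {
  local file="$1" anaconda=/opt/cloudera/parcels/Anaconda
  append_once "$file" "export PYSPARK_PYTHON=${anaconda}/bin/python"
  append_once "$file" "export PYSPARK_DRIVER_PYTHON=${anaconda}/bin/ipython"
  # \$PATH is escaped so the literal text $PATH is written to the file.
  append_once "$file" "export PATH=${anaconda}/bin:\$PATH"
  append_once "$file" "export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera"
}
```

Usage: `add_anaconda_env ~/.bashrc` as the hdfs user, and `add_anaconda_env` with the spark-env.sh path as root.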

Now open pyspark2 and confirm that IPython is active by testing tab completion and by executing a shell command directly from the REPL.
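If pyspark2 still starts the plain Python prompt, a quick check is whether the variables are visible in the current shell (for example after `source ~/.bashrc`). A minimal sketch; `check_pyspark_env` is a hypothetical helper that relies on bash indirect expansion (`${!var}`).

```shell
#!/usr/bin/env bash
# check_pyspark_env (hypothetical helper): verifies that PYSPARK_PYTHON and
# PYSPARK_DRIVER_PYTHON point at executable files; returns non-zero otherwise.
check_pyspark_env() {
  local var path status=0
  for var in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON; do
    path="${!var}"
    if [ -n "$path" ] && [ -x "$path" ]; then
      echo "$var -> $path"
    else
      echo "$var is unset or not executable: '$path'" >&2
      status=1
    fi
  done
  return "$status"
}
```

Once IPython is active, typing `spark.` and pressing Tab lists the SparkSession members, and a line such as `!hostname` runs a shell command from inside the session.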