Hadoop, Hive, Shell Script (Bash), Beeline and HiveServer2 (Data Pipeline Automation Part 2)

Tariqul Islam
Nov 6, 2023


Using shell scripts (Bash) with Hive and Hadoop, we can build a robust, automated data pipeline solution for a project. In this part, I will cover the minimal shell scripting knowledge base: how to create functions, how to run shell scripts, and how to set environment variables for shell scripts and Hive. We will also build the shell scripts that run HQL (Hive Query Language) files and commands to generate a sample report for the application in CSV format.

In the next article (Part 3), I will add the Azkaban scheduler, creating jobs and schedules that run these shell scripts to automate the daily data pipelining tasks.

Prerequisite

1. Linux Debian (or Ubuntu)
2. Hive and Hadoop
3. Configured Bash

Create the hadoop User and Log in as That User in Debian

> sudo adduser hadoop
> sudo adduser hadoop sudo
> sudo reboot

After the restart, log in as the hadoop user to run all of the operations below.

> su - hadoop
> pwd
/home/hadoop

Creating the Application Structure

> mkdir -p app
> cd app
> mkdir -p conf data ddl dml func src reports
# Application Structure
/home/hadoop/app
|-- conf
| |-- hive.conf.ini.hql
|-- data
| |-- *.csv
|-- ddl
| |-- tbl_<table_name>*.hql
| |-- create_database.hql
|-- dml
| |-- *.hql
|-- func
| |-- func_<name>.sh
|-- src
| |-- exe_<process_name>*.sh
|
|-- reports
| |--gen_<report_name>_<datetime>.csv

All of the data used by the application can be found in the GitHub repository linked at the end of this article.

If you want to know how to install and configure Hive and Hadoop, please read my previously published article.

Basic Shell Scripting Knowledge, and Creating the Functions Used by the Data Pipeline Application

Configure Bash in Debian first, because by default Debian's /bin/sh is configured to use Dash.
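You can check which shell /bin/sh points to, and repoint it to Bash, with the commands below (a minimal sketch; dpkg-reconfigure dash is Debian's standard way to switch, and answering "No" makes /bin/sh point to bash):

# show which shell /bin/sh is linked to (dash by default on Debian)
> readlink -f /bin/sh
# repoint /bin/sh; answer "No" to "Use dash as the default system shell"
> sudo dpkg-reconfigure dash
# alternatively, run any of the scripts below explicitly with bash
> bash src/exe_create_database.sh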

Create the sample logging functions, which will print log lines while the scripts run in the terminal. They live in /home/hadoop/app/func/func_logs.sh:

#!/bin/bash

# -------
# arguments
# $1 = type of log (level)
# $2 = filename of script
# $3 = message
# -------

function __log {
local _level="$1";
local _script="$2";
local _message="$3";
# Add the datetime into log rows
local _date_time=$(date +"%F %T");
local _print_msg="[${_level}] [${_date_time}]${_script} ${_message}";
echo "${_print_msg}";
}

# caller returns the context of any active subroutine call
# Log function to return the error message with argument
function error {
local _caller=($(caller));
__log "ERROR" "[${_caller[1]}](${_caller[0]})" "$@";
}

# Warn function to return the warn message with arguments
function warn {
local _caller=($(caller));
__log "WARN" "[${_caller[1]}](${_caller[0]})" "$@";
}

# log function to return any log and success message with aruguments
function inform {
local _caller=($(caller));
__log "INFO" "[${_caller[1]}](${_caller[0]})" "$@"
}
function <function_name> {
} -> function (method) declaration block in a bash script

local -> block-scoped variable declaration; the variable exists only
within the function's block

caller [expr] -> returns the context (line number and script name) of
any active subroutine call (a shell function, or a script executed
within it)

$1 -> the first positional argument passed to the script or function
from the terminal or shell

$@ -> internal bash variable that expands to all of the parameters
passed into a function or a script
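A tiny demo script (a hypothetical file, demo_args.sh, used only for illustration) shows these pieces together:

#!/bin/bash

function where_am_i {
# caller prints "<line> <script>" for the call site of this function
local _caller=($(caller));
echo "called from line ${_caller[0]} of ${_caller[1]}";
}

echo "number of arguments: $#";
echo "all arguments: $@";
where_am_i

Running it as `bash demo_args.sh one two` prints the argument count (2), both arguments, and the line and file of the where_am_i call.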

A sample script to test the log function:

#!/bin/bash
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "function file is missing"; exit 1; }

function main {
inform "[Test ] Logging function is working perfectly";
}

main "$@"
> sh src/exe_insert_department_info.sh 
[INFO] [2023-10-25 11:32:47][src/exe_insert_department_info.sh](8) [Test ] Logging function is working perfectly
source -> sources another shell file, making that file's functions and
variables available within the caller script and its functions

Create a date-format function to use when validating a date argument from the terminal:

#!/bin/bash

function is_date_yyyy_mm_dd {
local _check_date_format=$1

# if condition to check the date format
if [[ "$(date +'%Y-%m-%d' -d ${_check_date_format} 2> /dev/null)" == "${_check_date_format}" ]]; then
true
else
false
fi
}
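The create-database and create-table scripts later in this article also call a convert_readable_time function to print the elapsed ${SECONDS}. Its body is not shown in the article; a minimal sketch, assuming it also lives in func/func_date.sh, could be:

# convert a number of seconds into HH:MM:SS for the execution-time logs
function convert_readable_time {
local _total_seconds=$1;
printf '%02d:%02d:%02d\n' $((_total_seconds / 3600)) \
$(((_total_seconds % 3600) / 60)) \
$((_total_seconds % 60));
}

The following script tests the date function with an optional date argument: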
#!/bin/bash
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }

function main {
inform "[Test ] is_date_yyyy_mm_dd function";

# Check the Date function
local _dt=
if [[ $# -eq 0 ]]; then
_dt=$(date +"%Y-%m-%d");
elif [[ $# -eq 1 ]]; then
_dt=$1;
else
error "Invalid number of arugments missing date";
exit 1;
fi

is_date_yyyy_mm_dd ${_dt} || {
error "[Invalid] ${_dt} is in invalid format"
exit 1;
}

}
main "$@"
<function_name> "$@" --> pass the arugment into following functions

exit 1; --> exit the excution of functions and script

$# --> Stores the total number of arguments

is_date_yyyy_mm_dd -> is the function to check the date format which is
source is from func_date.sh file

$(date +'%Y-%m-%d' -d ${_check_date_format} 2> /dev/null) -> bash function
to check the date formated by following '%Y-%m-%d' format
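Assuming the test script above is saved as src/exe_check_date.sh (a name chosen here for illustration), it can be exercised with valid and invalid dates:

# no argument: today's date is used and passes the check
> sh src/exe_check_date.sh
# a valid date argument passes
> sh src/exe_check_date.sh 2023-10-10
# an invalid date logs an error and the script exits with status 1
> sh src/exe_check_date.sh 2023-13-45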

Create the Hive properties (initialization) config file, which is used whenever Hive runs to read and write data in the Hadoop cluster:

/home/hadoop/app/conf/hive.conf.ini.hql

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

SET hivevar:DATABASE_NAME=hive_shell_sample;
SET hivevar:TBL_EMPLOYEE=${DATABASE_NAME}.employees;
SET hivevar:TBL_DEPARTMENT=${DATABASE_NAME}.departments;
SET hivevar:TBL_SALES_INFO=${DATABASE_NAME}.sales_info;
-- TBL_SALES_REPORT is used by the report DDL and DML later in this article
SET hivevar:TBL_SALES_REPORT=${DATABASE_NAME}.sales_report;

Database Creation: DDL and Shell Scripts That Create the Data Structures in Hive

Create the database HQL file at /home/hadoop/app/ddl/create_database.hql

CREATE DATABASE IF NOT EXISTS ${DATABASE_NAME};

Create the shell script that creates the database, at /home/hadoop/app/src/exe_create_database.sh

#!/bin/bash
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }

function main {
inform "[OP] Start ";

local _hive_ini_file=${app_root_dir}/conf/hive.conf.ini.hql
local _create_db_hql=${app_root_dir}/ddl/create_database.hql

SECONDS=0

# -i loads the config file that defines ${DATABASE_NAME} for the HQL below
hive -i ${_hive_ini_file} -f ${_create_db_hql} || {
error "Creating Database failed";
exit 1;
}

inform "Database is Created Successfully";
inform "Execution Time: ${convert_readable_time ${SECONDS}}";
inform "[OP] End";

}

main "${@}"
inform -> a function sourced from the func_logs.sh file that prints an
info log line while the shell script runs

error -> a function that prints an error log line while the shell script
runs in the terminal

Run the shell script, then check that the database was created in Hive:

> sh src/exe_create_database.sh
> hive -e "show databases"

# hive -e takes a native query string and runs
# DML (Data Manipulation Language)
# or DDL (Data Definition Language) statements directly

Create the department table in the following file: /home/hadoop/app/ddl/tbl_department.hql

CREATE TABLE IF NOT EXISTS ${TBL_DEPARTMENT} (
department_id INT,
department_name STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ("transactional" = 'false');
# Notes and Discussion

CREATE TABLE [IF NOT EXISTS] <TABLE NAME> ()
[PROPERTIES ...]

You can create a table with the command structure above.

PARTITIONED BY (column_name data_type) -> creates the partition in the
Hive and Hadoop cluster for the given column name and data type

ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
-> these properties describe how the data is stored
in the Hadoop cluster; the table holds delimited data with
fields separated by ',' and each row terminated by a new line

STORED AS [STORE TYPE] -> defines the file format the data is saved in
at the Hive warehouse location in the Hadoop cluster

TBLPROPERTIES ("transactional" = 'false') -> specifies whether the table
supports ACID transactions; we set it to false because, with text-formatted
data, UPDATE and DELETE will not work anyway, so we overwrite instead
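Each partition value becomes its own directory under the table's warehouse path in HDFS, which is what makes filtering by dt cheap. For the departments table above, the partition directories look like:

/user/hive/warehouse/hive_shell_sample.db/departments/dt=2023-10-10/
/user/hive/warehouse/hive_shell_sample.db/departments/dt=2023-10-11/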

Create the employee table at /home/hadoop/app/ddl/tbl_employee.hql

CREATE TABLE IF NOT EXISTS ${TBL_EMPLOYEE} (
employee_id INT,
employee_name STRING,
salary_emp DOUBLE,
department_id INT
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ("transactional" = 'false');

Create the sales info table at /home/hadoop/app/ddl/tbl_sales_info.hql, which will be used to store each employee's sales records for every day.


CREATE TABLE IF NOT EXISTS ${TBL_SALES_INFO} (
employee_id INT,
product_name STRING,
sales_count INT,
sales_price DOUBLE
)
PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ("transactional" = 'false');

Create the sales report table at /home/hadoop/app/ddl/tbl_sales_report.hql, which will store the sales report data generated from the sales info table and serve as the report table.

CREATE TABLE IF NOT EXISTS ${TBL_SALES_REPORT} (
employee_id INT,
employee_name STRING,
department_id INT,
department_name STRING,
product_name STRING,
sales_count INT,
sales_price DOUBLE,
total_sales DOUBLE
)
PARTITIONED BY (dt STRING)
TBLPROPERTIES ("transactional" = 'false');

Next, create the shell script that creates the tables in Hive from bash; I created it at the following location: /home/hadoop/app/src/exe_create_table.sh


#!/bin/bash
set -e
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }

function main {
inform "[OP] Start ";
local _table_name=
if [[ "$#" -eq "0" ]]; then
error "Please provide the table name to create table info Database";
exit 1;
elif [[ "$#" -eq "1" ]];then
_table_name=$1
else
error "Error happend during provide the table name";
exit 1;
fi

local _hive_ini_file=${app_root_dir}/conf/hive.conf.ini.hql
local _create_table_hql=${app_root_dir}/ddl/tbl_${_table_name}.hql
# Check the Date function

SECONDS=0

hive -i ${_hive_ini_file} -f ${_create_table_hql} || {
error "Creating Table is failed.";
exit 1;
}

inform "Table is Created Successfully";
inform "Execution Time: $(convert_readable_time ${SECONDS})";
inform "[OP] End";
}
main "${@}"

Run the commands below to create the department, employee, sales_info and sales_report tables from the shell or terminal:


> chmod +x /home/hadoop/app/src/exe_create_table.sh

> sh src/exe_create_table.sh department
> sh src/exe_create_table.sh employee
> sh src/exe_create_table.sh sales_info
> sh src/exe_create_table.sh sales_report

> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
-e 'USE ${DATABASE_NAME}; SHOW TABLES;'
chmod +x -> gives the user execute permission on the shell script

hive -i <init file> -e <HQL query> -> runs a Hive query string directly
from the shell

hive -i <init file> -f <HQL file name> ->
-i takes the initialization (configuration) file for Hive
-f takes the script file containing the HQL to run
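For example, with department as the argument, the table-creation script above effectively runs the first command below; the second verifies the resulting DDL (single quotes are used so Hive, not bash, expands the variable):

> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
-f /home/hadoop/app/ddl/tbl_department.hql

> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
-e 'SHOW CREATE TABLE ${TBL_DEPARTMENT};'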

Create the DML and Shell Script Files That Insert Data into the Hadoop Cluster via Hive

To insert data from a local data source into a Hive table, create the DML file at /home/hadoop/app/dml/load_data_into_table.hql

SET hivevar:TBL_NAME=${DATABASE_NAME}.${TABLE_NAME};
LOAD DATA LOCAL INPATH '${DATA_SOURCE_FILE_PATH}'
OVERWRITE INTO TABLE ${TBL_NAME}
PARTITION(${PARTITION_NAME}='${PARTITION_VALUE}');
SET hivevar:<VARIABLE_NAME>=<VALUE>
# sets a hive variable when working with Hive from the terminal

# LOAD DATA [LOCAL] INPATH <SOURCE/LOCATION>
# [OVERWRITE] INTO TABLE <TABLE_NAME>
# [PARTITION (COLUMN_NAME=VALUE)]

[LOCAL] -> takes a local file system (absolute) path for the data source.
If we do not specify it, the command looks for the file at an HDFS location

[OVERWRITE] -> overwrites the partition's existing data; we use it because
the table's storage format is TEXT, where UPDATE and DELETE are unavailable

[PARTITION] -> creates the partition in the specified table where the data
will be saved in the hadoop cluster
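For reference, the data files under /home/hadoop/app/data are plain comma-delimited text matching the FIELDS TERMINATED BY ',' clause in the DDL. A hypothetical department.csv (illustrative values only, not the repository's actual data) would look like:

1,Sales
2,Marketing
3,Engineering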

Create shell file at /home/hadoop/app/src/exe_load_data_to_table.sh

#!/bin/bash
# exit immediately if any command fails
set -e
app_root_dir=/home/hadoop/app

# source the log and date function from different file
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }
_hive_config_file=${app_root_dir}/conf/hive.conf.ini.hql

# Create the function to load data from local file system to hive
function main {
inform "[OP ] START";

# select the HQL file created above for loading
# data from the local system
local _hql_file_path=${app_root_dir}/dml/load_data_into_table.hql

# check that the shell command received at least 4 arguments
if [[ $# -lt 4 ]]; then
error "Please insert the correct arguments"
exit 1;
fi


# local variable for data source file
local _DATA_SOURCE_FILE_PATH="${app_root_dir}/data/$1.csv";
# which hive table will receive the data
local _TABLE_NAME=$2;
# which table partition column will contain the data
local _PARTITION_NAME=$3;
# which hive partition (and which location in the hadoop cluster)
# will contain the data
local _PARTITION_VALUE=$4;

# logging the information
inform "DataSource: ${_DATA_SOURCE_FILE_PATH}"
inform "Table Name: ${_TABLE_NAME}"
inform "Partition Name: ${_PARTITION_NAME}"
inform "Partition Value: ${_PARTITION_VALUE}"

# Run hive command to load data from the local system
# hive -i <HIVE_CONFIG>
# [--hivevar variable_name=variable_value]
# -f [Hive query file name]
# || -> "or" conditional structure in Bash
hive -i ${_hive_config_file} \
--hivevar DATA_SOURCE_FILE_PATH=${_DATA_SOURCE_FILE_PATH} \
--hivevar PARTITION_NAME=${_PARTITION_NAME} \
--hivevar PARTITION_VALUE=${_PARTITION_VALUE} \
--hivevar TABLE_NAME=${_TABLE_NAME} \
-f ${_hql_file_path} || {
# error log to terminal (the error function is sourced from func_logs.sh)
error "Loading the data file into HDFS failed, or the HDFS service is not running"
# exit the shell script if an error occurred
exit 1;
}

inform "Data Insert Into Database table....";
inform "[OP ] END"
}

main "${@}"
<bash command> || { ... } -> shorthand "or" operator; we can use it in
place of an (if ... else) statement, as shown below
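The two forms below are equivalent; the || block is just a compact failure branch (some_command is a placeholder):

# shorthand used throughout these scripts
some_command || {
error "some_command failed";
exit 1;
}

# equivalent if...else form
if ! some_command; then
error "some_command failed";
exit 1;
fi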

To load the employee data into the employees table with a partition, run the following commands and check:

# run the shell file with sh, passing 4 arguments
# arg 1 -> local file name without extension
# arg 2 -> table name in hive
# arg 3 -> partition name
# arg 4 -> partition value
> sh src/exe_load_data_to_table.sh employee employees dt 2023-10-10
> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
-e 'SELECT * FROM ${TBL_EMPLOYEE};'

To load the department data into the departments table, run the following commands and check:

# run the shell file with sh, passing 4 arguments
# arg 1 -> local file name without extension
# arg 2 -> table name in hive
# arg 3 -> partition name
# arg 4 -> partition value

> sh src/exe_load_data_to_table.sh department departments dt 2023-10-10
> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
-e 'SELECT * FROM ${TBL_DEPARTMENT};'

To load the sales info data into the sales_info table, run the following commands and check:

> sh \
> src/exe_load_data_to_table.sh sales_info_2023-10-10 sales_info dt 2023-10-10

> sh \
> src/exe_load_data_to_table.sh sales_info_2023-10-11 sales_info dt 2023-10-11

> hive -e "SELECT * FROM ${TBL_SALES_INFO}"

Generate the Sales Report (CSV File) with HQL and a Shell Script

Create the HQL file that processes the sales report data and saves it into the sales report table by date partition, at /home/hadoop/app/dml/process_sales_report.hql

INSERT OVERWRITE TABLE ${TBL_SALES_REPORT} 
PARTITION (dt='${dt}')
SELECT * FROM (
WITH emp AS (
SELECT
emp.employee_id,
emp.employee_name,
emp.department_id
FROM
${TBL_EMPLOYEE} as emp
WHERE
-- the employee master data was loaded under the 2023-10-10 partition,
-- so the date is fixed here rather than taken from ${dt}
dt='2023-10-10'
),
dept AS (
SELECT
dept.department_id,
dept.department_name
FROM
${TBL_DEPARTMENT} as dept
WHERE
-- the department master data was also loaded under the 2023-10-10 partition
dt='2023-10-10'
)
SELECT
emp.employee_id,
emp.employee_name,
dept.department_id,
dept.department_name,
sales.product_name,
SUM(sales.sales_count) as sales_count,
sales.sales_price,
SUM(sales.sales_count * sales.sales_price) as total_sales
FROM
emp
JOIN
dept ON emp.department_id = dept.department_id
JOIN
${TBL_SALES_INFO} as sales ON sales.employee_id = emp.employee_id
WHERE
sales.dt='${dt}'
GROUP BY
emp.employee_id,
employee_name,
dept.department_id,
department_name,
product_name,
sales_count,
sales_price,
sales.dt
) AS SALES_INFO;
INSERT [OVERWRITE | INTO] TABLE <TABLE_NAME>
[PARTITION (PARTITION_NAME=PARTITION_VALUE)]
SELECT [col1, col2, ...] FROM <TABLE_NAME>
-> structure for transferring data from one table into another, with a partition

${HIVE_VARIABLE} -> substitutes the value of a hive variable

WITH <NAME> AS () -> common table expression, used for handling subqueries
and reducing the apparent complexity of the query

JOIN -> runs an inner join between tables

SUM -> the sum aggregate function in Hive

GROUP BY -> because the query uses aggregate functions, every other
selected field must appear in the GROUP BY clause
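Once the INSERT OVERWRITE has run, the written partition can be confirmed with the standard SHOW PARTITIONS statement (single quotes keep bash from expanding the hive variable):

> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
-e 'SHOW PARTITIONS ${TBL_SALES_REPORT};'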

Using Beeline and HiveServer2 to Process the Report

To run HiveServer2, open another terminal tab, or run it as a service on Linux or another operating system:

> hiveserver2

Then we can connect with Beeline as follows; we are using basic authentication (username and password) to access Hive:

> beeline -u jdbc:hive2://<hostname>/<database_name> \
-n <username>\
-p <password> \
-i <hive initialization file> \
--hivevar <hive_variable_name>=<hive_variable_value> \
[-f <hive query file location>]
[-e <hive Query>]
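For a local single-node setup, a connection typically looks like the sketch below; localhost, the default HiveServer2 port 10000, and the hadoop username are assumptions for this environment:

> beeline -u jdbc:hive2://localhost:10000/hive_shell_sample \
-n hadoop \
-p <password> \
-i /home/hadoop/app/conf/hive.conf.ini.hql \
-e 'SHOW TABLES;'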

Run the insert query with the script below; I created it at the following location: /home/hadoop/app/src/exe_process_sales_report.sh

#!/bin/bash
set -e
# Application root directory
app_root_dir=/home/hadoop/app
# Source the log and date function from following below file
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }
# hive configuration or initialization file location
_hive_config_file=${app_root_dir}/conf/hive.conf.ini.hql

# function for run the process the report data
function main {
inform "[Test ] Logging function is working perfectly";

# get the datetime information from the script's argument
local _dt=
# check that the shell command has an argument
if [[ $# -eq 0 ]]; then
error "Please provie date information";
exit 1;
elif [[ $# -eq 1 ]]; then
# take the arg 1 as date time
_dt=$1
else
error "Error occured during provide arguments";
exit 1;
fi

# check the format of the date
is_date_yyyy_mm_dd ${_dt} || {
error "[Invalid] ${_dt} is in invalid format"
exit 1;
}


local _HOST_IP="<hiveserver2 ip>:<Running Port>/default"
local _HIVE_USER_NAME=<hive_user_name>
local _HIVE_PASSWORD=<hive_password>
# output format for beeline (not used in this script; the report
# generation script below passes --outputformat directly)
local _OUTPUT_FORMAT=csv2
# file location of HQL run the beeline
local _HQL_FILE_PATH=${app_root_dir}/dml/process_sales_report.hql

# beeline command to run hive query
beeline -u jdbc:hive2://${_HOST_IP} -n ${_HIVE_USER_NAME} \
-p ${_HIVE_PASSWORD} \
-i ${_hive_config_file} \
--hivevar dt=${_dt} \
-f ${_HQL_FILE_PATH} || {
error "Something Went Wrong During Generating Report Please see the logs";
exit 1;
}

inform "Report data processed successfully.";
inform "[OP ] END"

}
main "${@}"

Then we can run the shell script from the terminal:

# export the datetime as an environment variable in the shell
> export _dt=2023-10-10
# run the report-processing script with the ${_dt} environment variable
> sh /home/hadoop/app/src/exe_process_sales_report.sh ${_dt}

After running the report processing script, check that Hive created the partition in HDFS and that it already contains data:

# export the database and table names as environment variables
> export _database=hive_shell_sample
> export _table=sales_report
# hdfs dfs -ls <hadoop directory location> lists the directories and
# files where Hive saved the data in HDFS
# /user/hive/warehouse is the default hive warehouse location
> hdfs dfs -ls \
/user/hive/warehouse/${_database}.db/${_table}/dt=${_dt}/
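You can also peek at the raw partition data directly; hdfs dfs -cat prints the contents of the files under the partition directory:

> hdfs dfs -cat \
/user/hive/warehouse/${_database}.db/${_table}/dt=${_dt}/*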

Generate the Daily Sales Report in CSV Format

Create the HQL file at /home/hadoop/app/dml/generate_sales_report.hql to generate the sales report by date partition:

SELECT 
employee_id as `Employee ID`,
employee_name as `Employee Name`,
department_id as `Department ID`,
department_name as `Department Name`,
product_name as `Product Name`,
sales_count as `Sales Count`,
sales_price as `Sales Price`,
total_sales as `Total Sales`
FROM
${TBL_SALES_REPORT}
WHERE dt='${dt}';
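With --showHeader=true and --outputformat=csv2 (used by the script below), the first line of the generated CSV should be the column aliases from this query:

Employee ID,Employee Name,Department ID,Department Name,Product Name,Sales Count,Sales Price,Total Sales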

Create the shell script at /home/hadoop/app/src/exe_generate_sales_report.sh to generate the report into the /home/hadoop/app/reports folder.

#!/bin/bash
# exit on error; pipefail makes a beeline failure inside the pipeline
# below trigger the || error handler
set -eo pipefail
# App root directory
app_root_dir=/home/hadoop/app

# source the log and date functions from the files below
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }
# Hive initialization file
_hive_config_file=${app_root_dir}/conf/hive.conf.ini.hql

# shell function for generating the sales report
function main {
inform "[OP ] Start Generating Report";
# Take the datetime arguments
local _dt=
if [[ $# -eq 0 ]]; then
error "Please provie date information";
exit 1;
elif [[ $# -eq 1 ]]; then
_dt=$1
else
error "Error occured during provide arguments";
exit 1;
fi

inform "[Check ] Start checking Datetime format";

# check the format of the date
is_date_yyyy_mm_dd ${_dt} || {
error "[Invalid] ${_dt} is in invalid format"
exit 1;
}

local _HOST_IP="<HiveServer2 IP>:<Running Port>/<Database Name>"
local _HIVE_USER_NAME=<Hive User name>
local _HIVE_PASSWORD=<Hive Password>
# Output format [CSV2, TSV2, CSV, TSV]
local _OUTPUT_FORMAT=csv2
# HQL file which contains Query for generate report
local _HQL_FILE_PATH=${app_root_dir}/dml/generate_sales_report.hql
# which local folder will contain the generated report
local _EXPORT_TO=${app_root_dir}/reports
# Generated File name with date (CSV formatted file)
local _EXPORT_FILE=${_EXPORT_TO}/gen_sales_report_${_dt}.csv


# create the sample logs for running beeline command
inform "[CMD] beeline -u jdbc:hive2://${_HOST_IP} \
-n ${_HIVE_USER_NAME} \
-p ${_HIVE_PASSWORD} \
-i ${_hive_config_file} \
--outputformat=csv2 \
--verbose=false \
--showHeader=true \
--silent=true \
--fastConnect=true \
--hivevar dt=${_dt} \
-f ${_HQL_FILE_PATH} | sed '/^$/d' > ${_EXPORT_FILE}"

# --outputformat = takes the output file format in file system
# --verbose = print the logs with error and warning, takes (true,false)
# --showHeader = Add the header row into output File
# --silent = beeline run without create the logs into terminal
# --fastConnect = When connecting, skip building a list of all tables
# and columns for tab-completion of HiveQL statements (true)
# or build the list (false)
# sed '/^$/d' = removes blank lines from the output file
beeline -u jdbc:hive2://${_HOST_IP} \
-n ${_HIVE_USER_NAME} \
-p ${_HIVE_PASSWORD} \
-i ${_hive_config_file} \
--outputformat=csv2 \
--verbose=false \
--showHeader=true \
--silent=true \
--fastConnect=true \
--hivevar dt=${_dt} \
-f ${_HQL_FILE_PATH} | sed '/^$/d' > ${_EXPORT_FILE} ||
{
error "Something Went Wrong During Generating Report Please see the logs";
exit 1;
}


inform "Report Generated Successfully.";
inform "[OP ] END"

}
main "${@}"

To generate the report, run the commands below:

# export the _dt (datetime) environment variable in the shell
> export _dt=2023-10-10
# run the shell script with the datetime argument (_dt)
> sh /home/hadoop/app/src/exe_generate_sales_report.sh ${_dt}

To check that the report was generated in the reports folder, run the commands below:

# export the datetime environment variable
> export _dt=2023-10-10
# use the cat command to see the generated file
> cat /home/hadoop/app/reports/gen_sales_report_${_dt}.csv

All of the codebase is available at the following GitHub link.


Written by Tariqul Islam

Tariqul Islam has 9+ years of software development experience. He knows Python, Node.js, Java, C#, and PHP.