Hadoop, Hive, Shell Script (Bash), Beeline and HiveServer2 (Data Pipeline Automation Part 2)

14 min read · Nov 6, 2023

Using shell scripts (Bash) with Hive and Hadoop, we can build a robust automated data pipeline for a project. In this part, I cover the minimum shell scripting knowledge needed: how to create functions, how to run a shell script, and how to define variables for shell scripts and Hive. We then build shell scripts that run HQL (Hive Query Language) files and commands to generate a sample report in CSV format.

In the next article (Part 3), I will add the Azkaban Scheduler to create jobs and schedules that run the shell scripts and automate the daily data pipeline tasks.

Prerequisites

1. Debian Linux (or Ubuntu)
2. Hive and Hadoop
3. Bash configured as the shell

Create the hadoop User and Log In as That User in Debian

> sudo adduser hadoop
> sudo adduser hadoop sudo
> sudo reboot

After the restart, log in as the hadoop user to run all of the operations below

> su - hadoop
> pwd
/home/hadoop

Creating application Structure

> mkdir -p app
> cd app
> mkdir -p conf data ddl dml func src reports

# Application Structure
/home/hadoop/app
|-- conf
|   |-- hive.conf.ini.hql
|-- data
|   |-- *.csv
|-- ddl
|   |-- tbl_<table_name>*.hql
|   |-- create_database.hql
|-- dml
|   |-- *.hql
|-- func
|   |-- func_<name>.sh
|-- src
|   |-- exe_<process_name>*.sh
|-- reports
|   |-- gen_<report_name>_<datetime>.csv

All of the sample data for this application is available in the GitHub repository:

hive-hadoop-shell-sla/data at main · tariqulislam/hive-hadoop-shell-sla (github.com)

If you want to know how to install and configure Hive and Hadoop, please read my previously published article:

Hadoop , Hive, Python and Azkaban Scheduler (Data Pipeline Automation Part 1) (tariqul-islam-rony.medium.com)

Basic Shell Script Knowledge and Helper Functions for the Data Pipeline Application

Configure Bash as sh in Debian, because Debian links /bin/sh to Dash by default. The scripts below use Bash-only features (function, [[ ]], caller, source) and are run with sh, so sh must point to Bash:

How to use Bash for sh in Ubuntu (unix.stackexchange.com)

Create a sample logging function in /home/hadoop/app/func/func_logs.sh; it prints log lines to the terminal while a script runs

#!/bin/bash

# -------
# arguments
# $1 = log level (INFO, WARN, ERROR)
# $2 = calling script and line number
# $3 = message
# -------

function __log {
  local _level="$1";
  local _script="$2";
  local _message="$3";
  # Add the datetime into log rows 
  local _date_time=$(date +"%F %T");
  local _print_msg="[${_level}] [${_date_time}]${_script} ${_message}";
  echo "${_print_msg}"; 
}

# caller returns the line number and file name of the current function call
# error: print an error message built from the arguments
function error {
  local _caller=($(caller));
  __log "ERROR" "[${_caller[1]}](${_caller[0]})" "$*";
}

# warn: print a warning message built from the arguments
function warn {
  local _caller=($(caller));
  __log "WARN" "[${_caller[1]}](${_caller[0]})" "$*";
}

# inform: print an informational or success message built from the arguments
function inform {
  local _caller=($(caller));
  __log "INFO" "[${_caller[1]}](${_caller[0]})" "$*";
}

function <function_name> {
} -> function declaration block in a Bash script

local -> declares a variable that is visible only inside the function

caller -> caller [expr] returns the context of any active subroutine
call (a shell function or a script executed with source); without expr
it prints the line number and file name of the call

$1 -> first positional argument passed to the script or function

$@ -> special Bash parameter that expands to all arguments
passed to a function or a script
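The notes above can be tried out with a small sketch. The function names show_args and where_am_i are made up for this demonstration and are not part of the application.

```shell
#!/bin/bash
# Hypothetical helper, used only for this demonstration.
function show_args {
  local _count=$#          # number of arguments received
  local _first="${1:-}"    # first argument (empty if none was passed)
  echo "count=${_count} first=${_first}"
  # "$@" expands to every argument as a separate, still-quoted word
  local _arg
  for _arg in "$@"; do
    echo "arg=${_arg}"
  done
}

# caller prints "<line> <file>" for the place this function was called from
function where_am_i {
  local _caller=($(caller))
  echo "called from line ${_caller[0]}"
}

show_args "2023-10-10" "sales info"
where_am_i
```

Because "$@" keeps the quoting, "sales info" arrives as one argument even though it contains a space.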

Sample script to test log function

#!/bin/bash
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "function file is missing"; exit 1; }

function main {
    inform "[Test ] Logging function is working perfectly";
}

main "$@"

> sh src/exe_insert_department_info.sh 
[INFO] [2023-10-25 11:32:47][src/exe_insert_department_info.sh](8) [Test ] Logging function is working perfectly

source -> loads another shell file, so that its functions and variables
become available in the calling script

Create a date format function in /home/hadoop/app/func/func_date.sh to validate the date argument passed from the terminal

#!/bin/bash

function is_date_yyyy_mm_dd {
    local _check_date_format="$1"

    # The date is valid only if GNU date can parse it and formatting it
    # back as YYYY-MM-DD gives exactly the same string
    if [[ "$(date +'%Y-%m-%d' -d "${_check_date_format}" 2> /dev/null)" == "${_check_date_format}" ]]; then
      true
    else
      false
    fi
}

#!/bin/bash
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }

function main {
    inform "[Test ] is_date_yyyy_mm_dd function";
    
    # Use today's date when no argument is given, otherwise use the argument
    local _dt=
    if [[ $# -eq 0 ]]; then
      _dt=$(date +"%Y-%m-%d");
    elif [[ $# -eq 1 ]]; then
      _dt=$1;
    else
       error "Invalid number of arguments: expected at most one date";
       exit 1;
    fi

    is_date_yyyy_mm_dd "${_dt}" || {
       error "[Invalid] ${_dt} is in invalid format"
       exit 1;
    }
  
}
main "$@"

<function_name> "$@" --> passes all of the script's arguments to the function

exit 1; --> stops the script and returns exit status 1 (failure)

$# --> holds the total number of arguments

is_date_yyyy_mm_dd -> the function that checks the date format; it is
sourced from the func_date.sh file

$(date +'%Y-%m-%d' -d ${_check_date_format} 2> /dev/null) -> runs the date
command to parse the value and print it back in '%Y-%m-%d' format;
error output is discarded
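To see how the round-trip check behaves, here is a small sketch that repeats the function so it runs on its own. It assumes GNU date (the -d option), as shipped with Debian and Ubuntu; the three sample values are chosen only for illustration.

```shell
#!/bin/bash
# Same check as in func_date.sh, repeated so the example is self-contained.
# Assumes GNU date, which provides the -d option.
function is_date_yyyy_mm_dd {
  local _check_date_format="$1"
  [[ "$(date +'%Y-%m-%d' -d "${_check_date_format}" 2> /dev/null)" == "${_check_date_format}" ]]
}

# A real date, an impossible date, and a real date in the wrong format
for _candidate in "2023-10-10" "2023-02-30" "20231010"; do
  if is_date_yyyy_mm_dd "${_candidate}"; then
    echo "${_candidate}: valid"
  else
    echo "${_candidate}: invalid"
  fi
done
```

Only the first value passes. The second cannot be parsed at all, and the third parses but prints back as 2023-10-10, which no longer matches the input string.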

Create the Hive initialization (config) file, which Hive loads on startup when reading and writing data in the Hadoop cluster

/home/hadoop/app/conf/hive.conf.ini.hql

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

SET hivevar:DATABASE_NAME=hive_shell_sample;
SET hivevar:TBL_EMPLOYEE=${DATABASE_NAME}.employees;
SET hivevar:TBL_DEPARTMENT=${DATABASE_NAME}.departments;
SET hivevar:TBL_SALES_INFO=${DATABASE_NAME}.sales_info;
SET hivevar:TBL_SALES_REPORT=${DATABASE_NAME}.sales_report;

Database Creation, DDL Design and Shell Scripts to Create the Data Structure in Hive

Create the database HQL file at /home/hadoop/app/ddl/create_database.hql

CREATE DATABASE IF NOT EXISTS ${DATABASE_NAME};

Create the shell script that creates the database at /home/hadoop/app/src/exe_create_database.sh

#!/bin/bash
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }

function main {
    inform "[OP] Start ";
    
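    # Init file that defines the DATABASE_NAME Hive variable
    local _hive_ini_file=${app_root_dir}/conf/hive.conf.ini.hql
    local _create_db_hql=${app_root_dir}/ddl/create_database.hql
    
    SECONDS=0

    hive -i ${_hive_ini_file} -f ${_create_db_hql} || {
        error "Creating Database failed";
        exit 1;
    }

    inform "Database is Created Successfully";
    inform "Execution Time: $(convert_readable_time ${SECONDS})";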
    inform "[OP] End";

}

main "${@}"

inform -> function sourced from func_logs.sh that prints log lines
while the shell script runs

error -> function sourced from func_logs.sh that prints error logs
to the terminal
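The scripts in this article also call convert_readable_time, whose body is not shown here. The sketch below is one possible implementation, assuming it takes a number of seconds and that it would live in func_date.sh; the output format is my own choice, not necessarily the one used in the repository.

```shell
#!/bin/bash
# Hypothetical implementation: the article's scripts call this function
# but do not show its body, so the output format here is an assumption.
function convert_readable_time {
  local _total_seconds="${1:-0}"
  local _hours=$(( _total_seconds / 3600 ))
  local _minutes=$(( (_total_seconds % 3600) / 60 ))
  local _seconds=$(( _total_seconds % 60 ))
  printf '%dh %dm %ds\n' "${_hours}" "${_minutes}" "${_seconds}"
}

convert_readable_time 3725   # prints: 1h 2m 5s
```

Bash increments the special variable SECONDS once per second, so resetting it to 0 before a step and passing it to this function afterwards gives the elapsed time of that step.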

Run the following shell script and check that the database was created in Hive

> sh src/exe_create_database.sh
> hive -e "show databases"

# hive -e takes a query string and runs it directly;
# it works for both DML (Data Manipulation Language)
# and DDL (Data Definition Language) statements

Create the department table in the file /home/hadoop/app/ddl/tbl_department.hql

CREATE TABLE IF NOT EXISTS ${TBL_DEPARTMENT} (
    department_id INT,
    department_name STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ("transactional" = 'false');

# Notes and Discussion

CREATE TABLE [IF NOT EXISTS] <TABLE NAME> ()
[PROPERTIES ...]
-> general form of the statement that creates a table

PARTITIONED BY (column_name data_type) -> creates a partition in Hive
and the Hadoop cluster for the given column name and data type

ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
 -> defines how rows are laid out in the Hadoop cluster: the table
holds delimited data, with fields separated by ',' and each row
ending with a new line

STORED AS [STORE TYPE] -> defines the file format used to save the data
in the Hive warehouse location of the Hadoop cluster

TBLPROPERTIES ("transactional" = 'false') -> controls ACID transactions;
we set it to false because UPDATE and DELETE do not work on
text-formatted data, so we have to overwrite the data instead

Create the employee table in /home/hadoop/app/ddl/tbl_employee.hql

CREATE TABLE IF NOT EXISTS ${TBL_EMPLOYEE} (
    employee_id INT,
    employee_name STRING,
    salary_emp DOUBLE,
    department_id INT
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ("transactional" = 'false');

Create the sales info table in /home/hadoop/app/ddl/tbl_sales_info.hql; it stores each employee's sales for every day.


CREATE TABLE IF NOT EXISTS ${TBL_SALES_INFO} (
    employee_id INT,
    product_name STRING,
    sales_count INT,
    sales_price DOUBLE
)
PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ("transactional" = 'false');

Create the sales report table in /home/hadoop/app/ddl/tbl_sales_report.hql; it stores the report data generated from the sales info table and serves as the report table

CREATE TABLE IF NOT EXISTS ${TBL_SALES_REPORT} (
    employee_id INT,
    employee_name STRING,
    department_id INT,
    department_name STRING,
    product_name STRING,
    sales_count INT,
    sales_price DOUBLE,
    total_sales DOUBLE
)
PARTITIONED BY (dt STRING)
TBLPROPERTIES ("transactional" = 'false');

Create the shell script that creates tables in Hive from Bash; I saved it at /home/hadoop/app/src/exe_create_table.sh


#!/bin/bash
set -e
app_root_dir=/home/hadoop/app
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }

function main {
    inform "[OP] Start ";
    local _table_name=
    if [[ "$#" -eq "0" ]]; then
        error "Please provide the name of the table to create in the database";
        exit 1;
    elif [[ "$#" -eq "1" ]];then
       _table_name=$1
    else
       error "Too many arguments: expected exactly one table name";
       exit 1;
    fi

    local _hive_ini_file=${app_root_dir}/conf/hive.conf.ini.hql
    local _create_table_hql=${app_root_dir}/ddl/tbl_${_table_name}.hql
    
    SECONDS=0

    hive -i ${_hive_ini_file} -f ${_create_table_hql} || {
        error "Creating Table is failed.";
        exit 1;
    }

    inform "Table is Created Successfully";
    inform "Execution Time: $(convert_readable_time ${SECONDS})";
    inform "[OP] End";
}
main "${@}"

Run the commands below from the terminal to create the employee, department, sales_info and sales_report tables


> chmod +x /home/hadoop/app/src/exe_create_table.sh

> sh src/exe_create_table.sh department
> sh src/exe_create_table.sh employee
> sh src/exe_create_table.sh sales_info
> sh src/exe_create_table.sh sales_report

# single quotes stop the shell from expanding ${DATABASE_NAME}; Hive substitutes it
> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
    -e 'USE ${DATABASE_NAME}; SHOW TABLES'

chmod +x -> gives the user permission to execute the shell script

hive -i <init file> -e <HQL query> -> runs a Hive query string directly
from the shell

hive -i <init file> -f <HQL file name> -> 
    -i takes the initialization (configuration) file for Hive
    -f takes the script file that contains the HQL to run
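When a query passed with -e contains a Hive variable such as ${DATABASE_NAME}, the shell's quoting decides who expands it. The sketch below needs no Hive installation; it only shows what string the shell would hand over, assuming DATABASE_NAME is not set as a shell variable.

```shell
#!/bin/bash
# DATABASE_NAME is deliberately NOT set in the shell; in this article it
# only exists as a Hive variable defined by the init file.
unset DATABASE_NAME

double_quoted="USE ${DATABASE_NAME}; SHOW TABLES"   # the shell expands it (to nothing)
single_quoted='USE ${DATABASE_NAME}; SHOW TABLES'   # passed through untouched

echo "double quotes -> ${double_quoted}"
echo "single quotes -> ${single_quoted}"
```

With double quotes Hive receives "USE ; SHOW TABLES", so single quotes (or an escaped \$) are the safer choice whenever the variable is meant for Hive rather than for the shell.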

Create the DML and Shell Script Files to Insert Data into the Hadoop Cluster with Hive

To insert data from a local data source into a Hive table, create the DML file at /home/hadoop/app/dml/load_data_into_table.hql

SET hivevar:TBL_NAME=${DATABASE_NAME}.${TABLE_NAME};
LOAD DATA LOCAL INPATH '${DATA_SOURCE_FILE_PATH}'
OVERWRITE INTO TABLE ${TBL_NAME}
PARTITION(${PARTITION_NAME}='${PARTIITON_VALUE}');

SET hivevar:<VARIABLE_NAME>=<VALUE>
# Sets a Hive variable for the current Hive session

# LOAD DATA [LOCAL] INPATH <SOURCE/LOCATION>
# [OVERWRITE] INTO TABLE <TABLE_NAME>
# [PARTITION (COLUMN_NAME=VALUE)]

[LOCAL] -> reads the data source from the local file system (absolute path).
If it is omitted, the command looks for the file in HDFS

[OVERWRITE] -> replaces the existing data in the target table or partition
instead of appending to it

[PARTITION] -> selects the partition of the table where the data will
be saved in the Hadoop cluster
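Since LOAD DATA LOCAL reads from the local file system, it can help to confirm that the CSV file exists before starting Hive at all. This is a minimal sketch; require_data_file is a hypothetical helper name, not a function from the repository.

```shell
#!/bin/bash
# Hypothetical pre-flight check; the function name is not from the article.
function require_data_file {
  local _path="$1"
  # -f: the path exists and is a regular file
  if [[ ! -f "${_path}" ]]; then
    echo "missing data file: ${_path}" >&2
    return 1
  fi
  # -s: the file is larger than zero bytes
  if [[ ! -s "${_path}" ]]; then
    echo "empty data file: ${_path}" >&2
    return 1
  fi
  return 0
}

# Possible use inside a load script:
#   require_data_file "/home/hadoop/app/data/employee.csv" || exit 1
```

Failing early like this gives a clearer message than waiting for Hive to report the missing path.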

Create shell file at /home/hadoop/app/src/exe_load_data_to_table.sh

#!/bin/bash
# Exit immediately if a command fails
set -e
app_root_dir=/home/hadoop/app

# source the log and date function from different file
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }
_hive_config_file=${app_root_dir}/conf/hive.conf.ini.hql

# Create the function to load data from local file system to hive
function main {
    inform "[OP ] START";

    # HQL file created above for loading
    # data from the local system 
    local _hql_file_path=${app_root_dir}/dml/load_data_into_table.hql
    
    # Check that the command received at least 4 arguments
    if [[ $# -lt 4 ]]; then
      error "Please provide the correct arguments"
      exit 1;
    fi
     

    # Local data source file
    local _DATA_SOURCE_FILE_PATH="${app_root_dir}/data/$1.csv";
    # Hive table that receives the data
    local _TABLE_NAME=$2;
    # Partition column of the table
    local _PARTITION_NAME=$3;
    # Partition value, which decides the partition directory
    # in the Hadoop cluster that holds the data
    local _PARTIITON_VALUE=$4;
     
    # Log the inputs (the path already ends with .csv)
    inform "DataSource: ${_DATA_SOURCE_FILE_PATH}"
    inform "Table Name: ${_TABLE_NAME}"
    inform "Partition Name: ${_PARTITION_NAME}"
    inform "Partition Value: ${_PARTIITON_VALUE}"
    
    # Run hive command to load data from local system
    # hive -i <HIVE_CONFIG> 
    #   [--hivevar variable_name=variable_value]
    # -f [Hive Query file name]
    # || -> or conditional Structure for Bash
    hive -i ${_hive_config_file} \
          -hivevar  DATA_SOURCE_FILE_PATH=${_DATA_SOURCE_FILE_PATH} \
          -hivevar  PARTITION_NAME=${_PARTITION_NAME} \
          -hivevar  PARTIITON_VALUE=${_PARTIITON_VALUE} \
          -hivevar  TABLE_NAME=${_TABLE_NAME} \
          -f ${_hql_file_path} || {
        # Log the error to the terminal (error is sourced from func_logs.sh)
        error "Loading the file into Hive failed, or the HDFS service is not running"
        # Exit the script if an error occurred
        exit 1;
    }

    inform "Data loaded into the table";
    inform "[OP ] END"
}

main "${@}"

<command> || { ... } -> shorthand OR operator: the block runs only when
the command fails, like the else branch of an if ... else statement
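The following sketch shows the same pattern without Hive. run_step is a hypothetical stand-in for a command such as hive or beeline, and guarded plays the role of main.

```shell
#!/bin/bash
# Hypothetical stand-in for a command such as hive or beeline:
# succeeds only when its argument is "ok".
function run_step {
  [[ "$1" == "ok" ]]
}

function guarded {
  # The block after || runs only if run_step returns a non-zero status
  run_step "$1" || {
    echo "step failed"
    return 1
  }
  echo "step succeeded"
}

guarded ok             # prints "step succeeded"
guarded fail || true   # prints "step failed"
```

Inside a function, return 1 hands the failure back to the caller; the article's scripts use exit 1 instead because they want the whole script to stop.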

To load the employee data into the employees table with a partition, run the following commands and check the result

# run the shell file with sh and pass 4 arguments
# arg 1 -> local file name without extension
# arg 2 -> table name in hive
# arg 3 -> partition name
# arg 4 -> partition value
> sh src/exe_load_data_to_table.sh employee employees dt 2023-10-10
# single quotes stop the shell from expanding ${TBL_EMPLOYEE}; Hive substitutes it
> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
    -e 'SELECT * FROM ${TBL_EMPLOYEE}'

To load the department data into the departments table, run the following commands and check the result

# run the shell file with sh and pass 4 arguments
# arg 1 -> local file name without extension
# arg 2 -> table name in hive
# arg 3 -> partition name
# arg 4 -> partition value

> sh src/exe_load_data_to_table.sh department departments dt 2023-10-10
# single quotes stop the shell from expanding ${TBL_DEPARTMENT}; Hive substitutes it
> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
    -e 'SELECT * FROM ${TBL_DEPARTMENT}'

To load the sales info data into the sales_info table, run the following commands and check the result

> sh src/exe_load_data_to_table.sh sales_info_2023-10-10 sales_info dt 2023-10-10

> sh src/exe_load_data_to_table.sh sales_info_2023-10-11 sales_info dt 2023-10-11

# the init file defines TBL_SALES_INFO, so it must be passed with -i
> hive -i /home/hadoop/app/conf/hive.conf.ini.hql \
    -e 'SELECT * FROM ${TBL_SALES_INFO}'

Generate the Sales Report (CSV File) with HQL and Shell Script

Create the HQL file that processes the sales report data and saves it into the sales report table by date partition: /home/hadoop/app/dml/process_sales_report.hql

INSERT OVERWRITE TABLE ${TBL_SALES_REPORT} 
PARTITION (dt='${dt}')
SELECT * FROM (
WITH emp AS (
SELECT
     emp.employee_id,
     emp.employee_name,
     emp.department_id
   FROM
    ${TBL_EMPLOYEE} as emp
   WHERE
     dt='2023-10-10'
),
dept AS (
   SELECT
     dept.department_id,
     dept.department_name
   FROM
    ${TBL_DEPARTMENT} as dept
   WHERE
     dt='2023-10-10'
)
SELECT 
   emp.employee_id,
   emp.employee_name,
   dept.department_id,
   dept.department_name,
   sales.product_name,
   SUM(sales.sales_count) as sales_count,
   sales.sales_price,
   SUM(sales.sales_count * sales.sales_price) as total_sales
FROM 
   emp
JOIN
   dept ON emp.department_id = dept.department_id
JOIN 
   ${TBL_SALES_INFO} as sales ON sales.employee_id = emp.employee_id
WHERE
   sales.dt='${dt}'
GROUP BY
   emp.employee_id,
   emp.employee_name,
   dept.department_id,
   dept.department_name,
   sales.product_name,
   sales.sales_price
) AS SALES_INFO;

INSERT {OVERWRITE | INTO} TABLE <TABLE_NAME>
[PARTITION (PARTITION_NAME=PARTITION_VALUE)]
SELECT [col1,col2...] FROM <TABLE_NAME>
-> structure for transferring data from one table to another with a partition

${HIVE_VARIABLE} -> substitutes the value of the Hive variable

WITH <NAME> AS () -> common table expression; names a subquery so the
main query is easier to read

JOIN -> runs an inner join between tables

SUM -> aggregate sum function in Hive

GROUP BY -> because the query uses aggregate functions, every selected
column that is not aggregated must be listed in GROUP BY

Using Beeline and HiveServer2 to process the report

To run HiveServer2, open another terminal tab, or run it as a service on Linux or another operating system:

> hiveserver2

Then we can connect with Beeline as follows. We use basic authentication with a username and password to access Hive

> beeline -u jdbc:hive2://<hostname>/<database_name> \
     -n <username> \
     -p <password> \
     -i <hive initialization file> \
     --hivevar <hive_variable_name>=<hive_variable_value> \
     [-f <hive query file location>] \
     [-e <hive query>]
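Because -p places the password on the command line, it is worth keeping it out of anything a script prints to its logs. The sketch below is one way to do that, under the assumption that the arguments are kept in a Bash array; mask_password is a hypothetical helper and the connection values are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: prints a command line with the value after -p hidden.
function mask_password {
  local _out=() _hide_next=false _arg
  for _arg in "$@"; do
    if ${_hide_next}; then
      # This argument is the password, so log a mask instead
      _out+=("****")
      _hide_next=false
    else
      _out+=("${_arg}")
      if [[ "${_arg}" == "-p" ]]; then
        _hide_next=true
      fi
    fi
  done
  echo "${_out[*]}"
}

# Placeholder connection values, not real credentials.
_beeline_args=(-u "jdbc:hive2://localhost:10000/default" -n hadoop -p secret -f report.hql)
mask_password beeline "${_beeline_args[@]}"
```

The same array can then be passed to the real command as beeline "${_beeline_args[@]}", so the logged line and the executed line cannot drift apart.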

Run the insert query with the script below, which I created at /home/hadoop/app/src/exe_process_sales_report.sh

#!/bin/bash
set -e
# Application root directory
app_root_dir=/home/hadoop/app
# Source the log and date function from following below file 
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }
# hive configuration or initialization file location
_hive_config_file=${app_root_dir}/conf/hive.conf.ini.hql

# function that processes the report data
function main {
    inform "[OP ] START";
    
    # date taken from the script argument
    local _dt=
    # check that the command has exactly one argument
    if [[ $# -eq 0 ]]; then
       error "Please provide the date";
       exit 1;
    elif [[ $# -eq 1 ]]; then
       # take arg 1 as the date
       _dt=$1
    else
       error "Too many arguments: expected exactly one date";
       exit 1;
    fi
    
    # check the format of the date
    is_date_yyyy_mm_dd ${_dt} || {
       error "[Invalid] ${_dt} is in invalid format"
       exit 1;
    }


    # Replace the placeholders with your own connection details
    local _HOST_IP="<hiveserver2 ip>:<Running Port>/default"
    local _HIVE_USER_NAME="<hive_user_name>"
    local _HIVE_PASSWORD="<hive_password>"
    # HQL file that Beeline runs
    local _HQL_FILE_PATH=${app_root_dir}/dml/process_sales_report.hql

    # beeline command to run hive query
    beeline -u jdbc:hive2://${_HOST_IP} -n ${_HIVE_USER_NAME} \
     -p ${_HIVE_PASSWORD}  \
     -i ${_hive_config_file} \
     --hivevar dt=${_dt} \
     -f ${_HQL_FILE_PATH} || {
        error "Something went wrong while processing the report data. Please see the logs";
        exit 1;
      }
    
    inform "Report data processed successfully.";
    inform "[OP ] END"
    
}
main "${@}"

Then we can run the shell file from the terminal

# export the date as an environment variable in the shell
> export _dt=2023-10-10
# run the report processing script with the ${_dt} env variable
> sh /home/hadoop/app/src/exe_process_sales_report.sh ${_dt}

After running the report processing script, check that Hive created the partition in HDFS and that it contains data

# export the database name and table name as environment variables
> export _database=hive_shell_sample
> export _table=sales_report
# hdfs dfs -ls <Hadoop directory location> lists the directory and
# files where Hive saved the data in HDFS
# /user/hive/warehouse is the Hive warehouse location
# a partition directory is named <partition column>=<value>
> hdfs dfs -ls \
     /user/hive/warehouse/${_database}.db/${_table}/dt=${_dt}/
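Hive stores each partition of a managed table in a directory named <column>=<value> under the table directory. The helper below builds that path; partition_path is a hypothetical name, and the sketch assumes the default warehouse location and a table created without a LOCATION clause.

```shell
#!/bin/bash
# Hypothetical helper. Assumes the default warehouse location
# /user/hive/warehouse and a managed table created without LOCATION.
function partition_path {
  local _database="$1" _table="$2" _column="$3" _value="$4"
  echo "/user/hive/warehouse/${_database}.db/${_table}/${_column}=${_value}"
}

partition_path hive_shell_sample sales_report dt 2023-10-10
# prints: /user/hive/warehouse/hive_shell_sample.db/sales_report/dt=2023-10-10
```

The result can be passed straight to hdfs dfs -ls to list the files of a single partition.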

Generate the Sales Report Every Day in CSV Format

Create the HQL file at /home/hadoop/app/dml/generate_sales_report.hql to generate the sales report by date partition

SELECT 
    employee_id as `Employee ID`,
    employee_name as `Employee Name`,
    department_id as `Department ID`,
    department_name as `Department Name`,
    product_name as `Product Name`,
    sales_count as `Sales Count`,
    sales_price as `Sales Price`,
    total_sales as  `Total Sales`
FROM 
   ${TBL_SALES_REPORT}
WHERE dt='${dt}';

Create the shell script /home/hadoop/app/src/exe_generate_sales_report.sh to generate the report in the /home/hadoop/app/reports folder.

#!/bin/bash
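set -e
# Make a pipeline fail when any command in it fails, so a beeline error
# is not hidden by the exit status of sed
set -o pipefail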
# App root directory
app_root_dir=/home/hadoop/app

# sourcing the log and date function from following below files
source ${app_root_dir}/func/func_logs.sh || { >&2 echo "${app_root_dir}/func/func_logs.sh function file is missing"; exit 1; }
source ${app_root_dir}/func/func_date.sh || { >&2 echo "${app_root_dir}/func/func_date.sh function file is missing"; exit 1; }
# Hive initialization file
_hive_config_file=${app_root_dir}/conf/hive.conf.ini.hql

# shell function for generating the sales report
function main {
    inform "[OP ] Start Generating Report";
    # Take the date argument
    local _dt=
    if [[ $# -eq 0 ]]; then
       error "Please provide the date";
       exit 1;
    elif [[ $# -eq 1 ]]; then
       _dt=$1
    else
       error "Too many arguments: expected exactly one date";
       exit 1;
    fi

    inform "[Check ] Start checking Datetime format";

    # check the format of the date
    is_date_yyyy_mm_dd ${_dt} || {
       error "[Invalid] ${_dt} is in invalid format"
       exit 1;
    }
    
    # Replace the placeholders with your own connection details
    local _HOST_IP="<HiveServer2 IP>:<Running Port>/<Database Name>"
    local _HIVE_USER_NAME="<Hive User name>"
    local _HIVE_PASSWORD="<Hive Password>"
    # Output format [csv2, tsv2, csv, tsv]
    local _OUTPUT_FORMAT=csv2
    # HQL file that contains the query for generating the report
    local _HQL_FILE_PATH=${app_root_dir}/dml/generate_sales_report.hql
    # Local folder that receives the generated report
    local _EXPORT_TO=${app_root_dir}/reports
    # Generated file name with the date (CSV formatted file)
    local _EXPORT_FILE=${_EXPORT_TO}/gen_sales_report_${_dt}.csv
 
   
   # log the beeline command that is about to run (password masked)
   inform "[CMD]  beeline -u jdbc:hive2://${_HOST_IP} \
     -n ${_HIVE_USER_NAME} \
     -p ****** \
     -i ${_hive_config_file} \
     --outputformat=csv2 \
     --verbose=false \
     --showHeader=true \
     --silent=true \
     --fastConnect=true \
     --hivevar dt=${_dt} \
     -f ${_HQL_FILE_PATH} |  sed '/^$/d' > ${_EXPORT_FILE}"

   # --outputformat = format of the query result (csv2 here)
   # --verbose = show verbose error messages and debug information (true, false)
   # --showHeader = include the column names as the first row of the output
   # --silent = reduce the informational messages that beeline prints
   # --fastConnect = When connecting, skip building a list of all tables 
   # and columns for tab-completion of HiveQL statements (true) 
   # or build the list (false)
   # sed '/^$/d' = remove empty lines from the output
   beeline -u jdbc:hive2://${_HOST_IP} \
     -n ${_HIVE_USER_NAME} \
     -p ${_HIVE_PASSWORD} \
     -i ${_hive_config_file} \
     --outputformat=csv2 \
     --verbose=false \
     --showHeader=true \
     --silent=true \
     --fastConnect=true \
     --hivevar dt=${_dt} \
     -f ${_HQL_FILE_PATH} |  sed '/^$/d' > ${_EXPORT_FILE} ||
     {
        error "Something Went Wrong During Generating Report Please see the logs";
        exit 1;
      }


    # Log the successful completion
    inform "Report Generated Successfully.";
    inform "[OP ] END"
    
}
main "${@}"
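In the script above the output of beeline is piped through sed, and by default Bash reports only the exit status of the last command in a pipeline. The sketch below shows the difference that set -o pipefail makes; false stands in for a failing beeline call and cat stands in for sed.

```shell
#!/bin/bash
# false stands in for a failing beeline call; cat stands in for sed.

set +o pipefail
false | cat
echo "without pipefail: exit status $?"   # 0, the failure is hidden

set -o pipefail
false | cat || echo "with pipefail: failure detected"
```

Without pipefail, the || { ... } error block after such a pipeline only runs when the last command fails, so a report script can log success even though the query itself failed.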

To generate the report, run the commands below

# export the _dt (date) environment variable in the shell
> export _dt=2023-10-10
# run the shell file below with the date argument (_dt)
> sh /home/hadoop/app/src/exe_generate_sales_report.sh ${_dt}

To check that the report was generated in the reports folder, run the commands below

# export the date environment variable
> export _dt=2023-10-10
# use the cat command to see the generated file
> cat /home/hadoop/app/reports/gen_sales_report_${_dt}.csv
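Beyond reading the file with cat, a scheduled job may want an automatic check that the report actually contains data. The sketch below counts the data rows, assuming the first line is the header row written by --showHeader=true; report_row_count is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical check: counts data rows, assuming the first line of the
# file is the header row (the report is generated with --showHeader=true).
function report_row_count {
  local _file="$1"
  [[ -f "${_file}" ]] || return 1
  local _lines
  _lines=$(wc -l < "${_file}")
  if (( _lines > 0 )); then
    echo $(( _lines - 1 ))
  else
    echo 0
  fi
}

# Possible use after generating a report:
#   report_row_count "/home/hadoop/app/reports/gen_sales_report_${_dt}.csv"
```

A result of 0 means the file holds only the header (or nothing), which usually points to a missing partition for that date.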

The full codebase is available in the following GitHub repository:

GitHub - tariqulislam/hive-hadoop-shell-sla: Create the scheduled flow application with hive hadoop and Azkaban Scheduler