Unit 4. Advanced Shell Programming

INDEX
4.1. Splitting, Comparing, Sorting, Merging & Ordering Files.
4.2. Filtering utilities: grep, sed, etc.
4.3. awk utility
🔧 Advanced Shell Programming: File Operations in Linux
When working with large sets of data, files, or logs, shell programming provides powerful tools to split, compare, sort, merge, and order files.
These operations help automate data processing and improve productivity.
📂 1. Splitting Files
The split command divides a large file into smaller chunks.
✅ Syntax:
split [options] file prefix
✅ Example:
split -l 1000 bigfile.txt part_
🔹 This splits bigfile.txt into smaller files of 1000 lines each, named part_aa, part_ab, etc.
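To see split at work without touching real data, the sketch below builds a throwaway 2,500-line file, splits it, and checks that the pieces reassemble into the original. It assumes a POSIX shell with mktemp and seq available (both are standard on Linux); the file contents are made up.

```shell
#!/bin/sh
# Work in a scratch directory that is deleted when the script exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up input: 2500 numbered lines
seq 1 2500 > bigfile.txt

# 1000 lines per piece -> part_aa, part_ab (1000 each) and part_ac (500)
split -l 1000 bigfile.txt part_
wc -l part_*

# Concatenating the pieces in name order rebuilds the original file
cat part_* > rebuilt.txt
if cmp -s bigfile.txt rebuilt.txt; then
    echo "rebuilt.txt is identical to bigfile.txt"
fi
```

Because split names its pieces in alphabetical order, the glob part_* lists them in the right order for cat.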
🆚 2. Comparing Files
Linux offers multiple tools to compare files:
a. diff
Compares two files line-by-line.
diff file1.txt file2.txt
🔹 Outputs the changes needed to turn file1.txt into file2.txt; no output means the files are identical.
b. cmp
Compares files byte-by-byte.
cmp file1.txt file2.txt
🔹 Reports the byte and line number of the first difference; prints nothing if the files are identical.
c. comm
Compares two sorted files line-by-line.
comm file1.txt file2.txt
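🔹 Prints three columns: lines only in file1.txt, lines only in file2.txt, and lines common to both. The options -1, -2 and -3 suppress the corresponding column.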
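The three tools answer different questions, which is easiest to see on two tiny files. This is a minimal sketch with made-up contents; it assumes a POSIX shell with mktemp, and the diff output in the comment is diff's default (normal) format.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Two small, already-sorted sample files (made-up data)
printf 'apple\nbanana\ncherry\n' > file1.txt
printf 'apple\nblueberry\ncherry\n' > file2.txt

# diff prints "2c2 / < banana / --- / > blueberry"; exit status 1 means "files differ"
diff file1.txt file2.txt || echo "diff exit status: $?"

# cmp -s prints nothing; scripts test its exit status instead
if cmp -s file1.txt file2.txt; then
    echo "same"
else
    echo "different"
fi

# comm -12 hides column 1 (only in file1) and column 2 (only in file2),
# leaving only the lines common to both files
comm -12 file1.txt file2.txt
# comm -3 hides the common column, leaving only the differences
comm -3 file1.txt file2.txt
```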
🔃 3. Sorting Files
The sort command arranges the lines of a file in alphabetical (default) or numerical order.
✅ Syntax:
sort [options] filename
✅ Examples:
Sort alphabetically:
sort names.txt
Sort numerically:
sort -n numbers.txt
Reverse sort:
sort -r data.txt
Sort by a specific column (-k 2,2 uses field 2 only; -k 2 alone uses field 2 through the end of the line):
sort -k 2,2 employees.txt
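The difference between text and numeric ordering, and the effect of restricting the key to one field, shows up quickly on small inputs. The sketch below uses hypothetical contents for numbers.txt and employees.txt (name, department and salary separated by spaces) in a scratch directory created with mktemp.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '10\n9\n100\n' > numbers.txt
# Text order compares character by character: 10, 100, 9
sort numbers.txt
# Numeric order compares values: 9, 10, 100
sort -n numbers.txt

# Made-up employee records: name department salary
printf 'asha sales 4200\nravi it 5100\nmeena hr 3900\n' > employees.txt
# -k 2,2 uses only field 2 (department) as the key
sort -k 2,2 employees.txt
# Key on field 3 only, numeric (n), reversed (r): highest salary first
sort -k 3,3nr employees.txt
```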
🔀 4. Merging Files
You can combine files with cat (one after another), merge already-sorted files into a single sorted stream with sort -m, or use paste for side-by-side merging.
✅ a. cat – Concatenation (all of file1.txt, then all of file2.txt):
cat file1.txt file2.txt > merged.txt
✅ b. paste – Horizontal merge:
paste file1.txt file2.txt
🔹 Joins corresponding lines of each file with a tab.
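A minimal sketch contrasting the three merge styles on two made-up, already-sorted files; it assumes a POSIX shell with mktemp. Note that sort -m trusts its inputs to be sorted and does not re-sort them, so unsorted input gives unsorted output.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'a\nc\ne\n' > file1.txt
printf 'b\nd\nf\n' > file2.txt

# cat: all of file1, then all of file2 -> a c e b d f
cat file1.txt file2.txt > merged.txt
# sort -m: interleaves the sorted inputs -> a b c d e f
sort -m file1.txt file2.txt > sorted_merge.txt
# paste: line N of file1 and line N of file2 joined by a tab -> "a<TAB>b", ...
paste file1.txt file2.txt > side_by_side.txt
```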
📊 5. Ordering Files
Ordering refers to organizing data logically, especially for reports or processing.
Alphabetical order:
sort names.txt > ordered_names.txt
Numeric order by field:
sort -k 3 -n employee_data.txt
Unique lines only:
sort -u file.txt
Sort and remove duplicates (case insensitive):
sort -fu file.txt
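Duplicate removal depends on how lines are compared, so case handling matters. The sketch below uses a made-up file.txt and sets LC_ALL=C so the comparison is plain byte order and the result does not vary with the system locale.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up list with duplicates that differ only in letter case
printf 'Apple\nbanana\napple\nApple\n' > file.txt

# -u keeps one copy of each distinct line: Apple, apple, banana (3 lines)
LC_ALL=C sort -u file.txt
# -f folds lower case to upper case, so Apple and apple count as equal (2 lines)
LC_ALL=C sort -fu file.txt
```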
Using awk and cut for Custom File Handling
Extract specific columns:
cut -d',' -f1,3 data.csv
Conditional sorting/filtering:
awk '$3 > 50' marks.txt | sort -k 3,3n
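Putting cut and the awk-plus-sort pipeline together on one file: this sketch assumes a hypothetical comma-separated data.csv (roll, name, marks), so awk needs -F',' and sort needs -t',' to split fields on commas instead of whitespace.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up CSV: roll,name,marks
printf '1,asha,72\n2,ravi,45\n3,meena,88\n4,john,61\n' > data.csv

# Columns 1 and 3 (roll and marks)
cut -d',' -f1,3 data.csv

# Keep rows with marks above 50, then order them by marks (field 3, numeric)
awk -F',' '$3 > 50' data.csv | sort -t',' -k 3,3n
```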
🔍 Advanced Shell Programming: Filtering Utilities.
Filtering utilities in shell programming are powerful tools used to extract, manipulate, and format data from text files or streams.
They are especially useful for log processing, pattern matching, and text transformation in automated scripts.
1. grep – Global Regular Expression Print
grep searches files for a pattern and prints every line that matches it.
✅ Basic Syntax:
grep [options] pattern filename
✅ Examples:
Search for the word “error” in a log file:
grep "error" server.log
Case-insensitive search:
grep -i "login" auth.log
Show line numbers with matches:
grep -n "failed" access.log
Use regular expressions:
grep "^user" users.txt    # lines starting with 'user'
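A few more grep options that are handy in scripts, shown on a made-up server.log: -c counts matching lines, -v inverts the match, -E enables extended regular expressions such as alternation, and -q suppresses output so only the exit status is used. The scratch-directory setup assumes a POSIX shell with mktemp.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up log lines
printf 'INFO login ok\nERROR disk full\nerror timeout\nINFO logout\n' > server.log

grep -i "error" server.log          # both the ERROR and the error line
grep -c -i "error" server.log       # number of matching lines: 2
grep -v "INFO" server.log           # lines that do NOT match
grep -E "login|logout" server.log   # extended regex with alternation

# -q prints nothing; the exit status alone says whether a match exists
if grep -q "disk full" server.log; then
    echo "alert: disk full"
fi
```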
2. sed – Stream Editor
sed is a powerful text-processing utility that can find and replace text, insert or delete lines, and more.
✅ Basic Syntax:
sed [options] 'command' filename
✅ Examples:
Replace the first occurrence of “foo” on each line with “bar”:
sed 's/foo/bar/' file.txt
Replace all occurrences on each line:
sed 's/foo/bar/g' file.txt
Delete every line containing the pattern “delete”:
sed '/delete/d' file.txt
Print only the lines where a substitution was made:
sed -n 's/old/new/p' file.txt
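The sed commands above can be tried together on a small made-up file. By default sed writes to standard output and leaves the input file unchanged; in-place editing (-i) exists in GNU and BSD sed but takes its argument differently in each, so this sketch saves the result to a new file instead. It assumes a POSIX shell with mktemp.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'foo foo\nkeep me\ndelete this line\nold value\n' > file.txt

sed 's/foo/bar/' file.txt        # first match per line: line 1 becomes "bar foo"
sed 's/foo/bar/g' file.txt       # every match: line 1 becomes "bar bar"
sed '/delete/d' file.txt         # drops "delete this line"
sed -n 's/old/new/p' file.txt    # prints only "new value"

# Several commands in one pass with -e; redirect to keep the result
sed -e 's/foo/bar/g' -e '/delete/d' file.txt > edited.txt
```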
3. awk – Pattern Scanning and Processing
awk is a full scripting language designed for data extraction and reporting.
✅ Basic Syntax:
awk 'pattern {action}' filename
✅ Examples:
Print the second column of a file:
awk '{print $2}' data.txt
Print lines where the third column is greater than 50:
awk '$3 > 50' marks.txt
Use a different field separator (e.g., comma):
awk -F, '{print $1, $3}' data.csv
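Patterns and actions can be combined or used alone, as this sketch on a hypothetical space-separated marks.txt (roll, name, marks) shows. A pattern with no action prints the whole matching line, and the ~ operator tests a field against a regular expression.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up whitespace-separated file: roll name marks
printf '1 asha 72\n2 ravi 45\n3 meena 88\n' > marks.txt

# Pattern only: the default action prints the whole line
awk '$3 > 50' marks.txt
# Pattern plus action: print chosen fields, separated by OFS (a space by default)
awk '$3 > 50 { print $2, $3 }' marks.txt
# Regex pattern on one field: names starting with "m"
awk '$2 ~ /^m/ { print $2 }' marks.txt
```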
4. cut – Extract Columns
cut extracts specific fields from each line of a file, especially files whose fields are separated by a delimiter (the default delimiter is a tab).
✅ Examples:
Get the first column:
cut -d',' -f1 file.csv
Extract multiple fields:
cut -d',' -f1,3 file.csv
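A sketch of a few other cut forms on a made-up file.csv: an open-ended field range, character positions with -c, and the usual workaround for text aligned with several spaces. It assumes a POSIX shell with mktemp.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'id,name,city\n1,asha,pune\n2,ravi,delhi\n' > file.csv

cut -d',' -f2 file.csv        # one field: name, asha, ravi
cut -d',' -f2- file.csv       # field 2 through the last field
cut -c1-2 file.csv            # characters 1-2 of each line, no delimiter needed

# cut treats every delimiter literally: two spaces in a row make an empty field.
# For space-aligned text, squeeze repeated spaces with tr -s first.
printf 'a  b\n' | tr -s ' ' | cut -d' ' -f2
```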
5. sort, uniq, and tr – Supporting Filters
sort: Sorts input lines, e.g. sort names.txt
uniq: Removes adjacent duplicate lines, so its input is normally sorted first, e.g. sort names.txt | uniq
tr: Translates or deletes characters; it reads only standard input, e.g. tr 'a-z' 'A-Z' < input.txt
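These three filters are often chained. The classic example below counts word frequencies in a made-up input.txt; it is a sketch that assumes words are separated only by spaces (punctuation is not stripped) and that a POSIX shell with mktemp is available.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'the cat and The dog\nthe end\n' > input.txt

# 1. tr 'A-Z' 'a-z'  : lower-case everything so "The" and "the" match
# 2. tr -s ' ' '\n'  : turn spaces into newlines, one word per line
# 3. sort            : bring identical words together (uniq needs this)
# 4. uniq -c         : collapse each run and prefix it with its count
# 5. sort -rn        : most frequent words first ("3 the" on top)
tr 'A-Z' 'a-z' < input.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn

# tr -d deletes characters instead of translating them: prints "rd"
printf 'r2d2\n' | tr -d '0-9'
```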
AWK Utility in Advanced Shell Programming.
awk is a powerful text-processing tool in Unix/Linux systems, commonly used for pattern scanning, field extraction, and reporting. It works by reading input line by line, splitting each line into fields, and performing actions based on patterns or conditions.
✅ Key Features of awk:
Works line-by-line and splits lines into fields (default delimiter is whitespace).
Supports patterns, conditionals, loops, functions, and string manipulation.
Useful for filtering, transforming, summarizing, and formatting data.
📌 Basic Syntax:
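awk 'pattern { action }' filename
pattern – a condition or regex; if omitted, the action runs for every line.
action – code to execute when the pattern matches; if omitted, the matching line is printed.
filename – the file to process (standard input is read if no file is given).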
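🔹 Common Built-in Variables:
$0 – the entire current line
$1, $2, ... – the first, second, ... field of the current line
NF – number of fields in the current line
NR – number of the current line (record), counted across all input
FS – input field separator (default: whitespace)
OFS – output field separator used by print (default: a single space)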
💡 Examples:
Print each line:
awk '{ print }' file.txt
Print the first column:
awk '{ print $1 }' file.txt
Print lines where the second field equals “pass”:
awk '$2 == "pass" { print $0 }' results.txt
Add up the numbers in column 3:
awk '{ sum += $3 } END { print "Total:", sum }' data.txt
Change the field separator (e.g., CSV file):
awk -F',' '{ print $2 }' data.csv
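The examples above use one rule each; a real report usually combines a BEGIN block, pattern rules, and an END block, together with built-in variables such as FS, OFS and NR. The sketch below runs on a hypothetical data.csv with name, result and score columns and writes its report to report.txt; it assumes a POSIX shell with mktemp and any POSIX awk.

```shell
#!/bin/sh
# Scratch directory, removed on exit
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up CSV: name,result,score
printf 'asha,pass,72\nravi,fail,35\nmeena,pass,88\n' > data.csv

awk '
BEGIN { FS = ","; OFS = " | "; print "name", "score" }   # runs before any input
$2 == "pass" { passed++ }                                 # count matching rows
{ total += $3; print $1, $3 }                             # runs for every row
END {                                                     # runs after the last row
    if (NR > 0)
        printf "rows=%d passed=%d average=%.1f\n", NR, passed, total / NR
}' data.csv > report.txt

cat report.txt
```

Setting FS inside BEGIN has the same effect as -F',' on the command line; it simply keeps the whole program in one place.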
🧠 When to Use AWK:
Extract specific columns from files.
Generate reports from structured data.
Perform inline calculations on data.
Filter and reformat text files.
✅ Summary.
Operation | Command | Purpose |
---|---|---|
Split | split | Divide files into parts |
Compare | diff, cmp, comm | Find differences |
Sort | sort | Alphabetical/numerical sorting |
Merge | cat, paste | Combine files |
Order | sort, uniq, awk | Organize data |

Utility | Purpose | Example |
---|---|---|
grep | Search text using patterns | grep "login" log.txt |
sed | Modify text streams | sed 's/error/fixed/' file.txt |
awk | Extract and process fields | awk '{print $1}' data.txt |
cut | Cut specific columns | cut -d',' -f1 file.csv |
sort | Sort lines | sort names.txt |
uniq | Remove adjacent duplicate lines | sort file.txt \| uniq |
tr | Translate characters | tr 'a-z' 'A-Z' < input.txt |