join
Basic Usage of join - Merging Files by Common Fields
The join command is used to combine lines from two files based on matching fields. It works similarly to database joins and is commonly used in shell scripting, reporting, CSV processing, and automation.
join file1 file2
Example file1:
1 Alice
2 Bob
3 Charlie
Example file2:
1 Admin
2 User
3 Developer
Command:
join file1.txt file2.txt
Output:
1 Alice Admin
2 Bob User
3 Charlie Developer
- join merges lines with matching fields
- By default, matching is performed on the first field
- Both files must usually be sorted first
- Commonly used for structured text processing
Important Requirement: Files Must Be Sorted
The join command expects sorted input.
Incorrect workflow:
join users.txt roles.txt
Possible error (the exact wording varies between join versions):
join: file is not in sorted order
Correct workflow:
sort users.txt -o users.txt
sort roles.txt -o roles.txt
join users.txt roles.txt
- sort arranges lines in lexical (collation) order, not numeric order
- join depends on input ordered that way
- Very important requirement in production usage
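The sorting requirement can be tried end to end with a small sketch. The sample data, the scratch directory, and the choice of LC_ALL=C are assumptions for illustration; the keys 2, 9, and 10 are picked because their lexical order differs from their numeric order.

```shell
export LC_ALL=C                       # assumption: one fixed collation for both sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '9 Eve' '10 Judy' '2 Bob'         > "$workdir/users.txt"
printf '%s\n' '2 User' '9 Developer' '10 Admin' > "$workdir/roles.txt"

# Sort on the join field only (field 1), lexically, into new files
sort -k1,1 "$workdir/users.txt" > "$workdir/users.sorted"
sort -k1,1 "$workdir/roles.txt" > "$workdir/roles.sorted"

merged=$(join "$workdir/users.sorted" "$workdir/roles.sorted")
printf '%s\n' "$merged"
# 10 Judy Admin
# 2 Bob User
# 9 Eve Developer

rm -rf "$workdir"
```

Sorting into new files instead of using -o leaves the originals untouched, and exporting one locale for both commands keeps their idea of sorted order identical.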
Understanding Matching Fields
Example file1:
1001 Alice
1002 Bob
1003 Charlie
Example file2:
1001 IT
1002 HR
1003 Finance
Command:
join employees.txt departments.txt
Matching occurs using:
1001
1002
1003
These values form the join key.
Result:
1001 Alice IT
1002 Bob HR
1003 Charlie Finance
Joining Using Different Fields
By default, join uses field 1 from both files.
To specify different fields:
join -1 2 -2 1 file1.txt file2.txt
Breakdown:
- -1 2 means use field 2 from the first file
- -2 1 means use field 1 from the second file
Useful when file structures differ.
Example with Different Join Fields
File1:
Alice 1001
Bob 1002
File2:
1001 Admin
1002 User
Command:
join -1 2 -2 1 file1.txt file2.txt
Output:
1001 Alice Admin
1002 Bob User
- Matches second field from file1
- Matches first field from file2
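When the key is not field 1, the file has to be sorted on the field that is actually joined. A minimal sketch, assuming hypothetical files names.txt and roles.txt with made-up contents:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' 'Bob 1002' 'Alice 1001' 'Carol 1003'    > "$workdir/names.txt"
printf '%s\n' '1001 Admin' '1002 User' '1003 Auditor' > "$workdir/roles.txt"   # already sorted on field 1

# names.txt is joined on field 2, so sort it on field 2
# (-b ignores the blank that precedes the field)
sort -b -k2,2 "$workdir/names.txt" > "$workdir/names.by_id"

by_id=$(join -1 2 -2 1 "$workdir/names.by_id" "$workdir/roles.txt")
printf '%s\n' "$by_id"
# 1001 Alice Admin
# 1002 Bob User
# 1003 Carol Auditor

rm -rf "$workdir"
```

The join field is printed first, followed by the remaining fields of the first file and then those of the second.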
Using Custom Delimiters
By default, fields are separated by blanks (spaces or tabs).
For CSV files:
join -t "," file1.csv file2.csv
Example file1:
1,Alice
2,Bob
Example file2:
1,Admin
2,User
Output:
1,Alice,Admin
2,Bob,User
- -t "," defines the comma as the field delimiter
- Essential for CSV processing
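The same delimiter has to be given to sort when preparing the input. A sketch assuming simple CSV without quoted fields or header rows (join splits on every comma and does not understand CSV quoting); file names and data are illustrative:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '2,Bob' '1,Alice'  > "$workdir/users.csv"
printf '%s\n' '1,Admin' '2,User' > "$workdir/roles.csv"   # already sorted on field 1

# Sort on field 1 using the comma as field separator
sort -t ',' -k1,1 "$workdir/users.csv" > "$workdir/users.sorted.csv"

csv_merged=$(join -t ',' "$workdir/users.sorted.csv" "$workdir/roles.csv")
printf '%s\n' "$csv_merged"
# 1,Alice,Admin
# 2,Bob,User

rm -rf "$workdir"
```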
Displaying Unmatched Lines
Normally, unmatched lines are ignored.
To display unmatched lines from first file:
join -a1 users.txt roles.txt
Example:
File1:
1 Alice
2 Bob
3 Charlie
File2:
1 Admin
2 User
Output:
1 Alice Admin
2 Bob User
3 Charlie
- -a1 includes unmatched lines from file1
- Missing fields remain empty
To include unmatched lines from second file:
join -a2 users.txt roles.txt
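Giving -a1 and -a2 together keeps unmatched lines from both sides, roughly like a full outer join. A sketch with assumed sample data, in which user 3 has no role and role 4 has no user:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '1 Alice' '2 Bob' '3 Charlie' > "$workdir/users.txt"
printf '%s\n' '1 Admin' '2 User' '4 Guest'  > "$workdir/roles.txt"

# Keep unpaired lines from file 1 and from file 2
outer=$(join -a 1 -a 2 "$workdir/users.txt" "$workdir/roles.txt")
printf '%s\n' "$outer"
# 1 Alice Admin
# 2 Bob User
# 3 Charlie
# 4 Guest

rm -rf "$workdir"
```

The last two lines have the same shape, so the output alone does not say which file each one came from. The -e and -o options covered in the following sections make the gap visible.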
Displaying Only Unmatched Lines
To display only unmatched lines:
join -v1 users.txt roles.txt
- -v1 shows unmatched lines from first file only
- Useful for audits and comparisons
Example output:
3 Charlie
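The same check works in the other direction with -v2. A short sketch, assuming sample data where user 3 has no role and role 4 has no user:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '1 Alice' '2 Bob' '3 Charlie' > "$workdir/users.txt"
printf '%s\n' '1 Admin' '2 User' '4 Guest'  > "$workdir/roles.txt"

# Lines of users.txt with no partner in roles.txt
only_users=$(join -v 1 "$workdir/users.txt" "$workdir/roles.txt")
# Lines of roles.txt with no partner in users.txt
only_roles=$(join -v 2 "$workdir/users.txt" "$workdir/roles.txt")

printf 'users without role: %s\n' "$only_users"   # users without role: 3 Charlie
printf 'roles without user: %s\n' "$only_roles"   # roles without user: 4 Guest

rm -rf "$workdir"
```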
Handling Empty Fields
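To replace missing fields with a custom value, combine -e with -a (to keep unmatched lines) and -o (to name the output fields):
join -a1 -e "N/A" -o 0,1.2,2.2 users.txt roles.txt
Example output:
1 Alice Admin
2 Bob User
3 Charlie N/A
- -e defines the replacement string
- It only applies to fields listed with -o, where 0 stands for the join field
- Useful for reporting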
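Gaps can be filled on both sides at once. In the POSIX specification and in GNU join, -e only affects fields listed with -o, so this sketch names them explicitly; the sample data (user 3 without a role, role 4 without a user) is assumed for illustration:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '1 Alice' '2 Bob' '3 Charlie' > "$workdir/users.txt"
printf '%s\n' '1 Admin' '2 User' '4 Guest'  > "$workdir/roles.txt"

# 0 = join field, 1.2 = name from users.txt, 2.2 = role from roles.txt
filled=$(join -a 1 -a 2 -e 'N/A' -o 0,1.2,2.2 \
    "$workdir/users.txt" "$workdir/roles.txt")
printf '%s\n' "$filled"
# 1 Alice Admin
# 2 Bob User
# 3 Charlie N/A
# 4 N/A Guest

rm -rf "$workdir"
```

Every output line now has the same three columns, which makes the result easier to feed into later processing steps.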
Selecting Specific Output Fields
By default, all fields are displayed.
To customize output:
join -o 1.2,2.2 users.txt roles.txt
Breakdown:
| Value | Meaning |
|---|---|
| 1.2 | field 2 from file1 |
| 2.2 | field 2 from file2 |
Example output:
Alice Admin
Bob User
- Useful for report generation
- Similar to SQL SELECT behavior
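The field list also controls column order. A sketch that prints the role first, then the name, then the key; the file names and data are assumptions for illustration:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '1 Alice' '2 Bob'  > "$workdir/users.txt"
printf '%s\n' '1 Admin' '2 User' > "$workdir/roles.txt"

# 2.2 = role, 1.2 = name, 0 = join field
reordered=$(join -o 2.2,1.2,0 "$workdir/users.txt" "$workdir/roles.txt")
printf '%s\n' "$reordered"
# Admin Alice 1
# User Bob 2

rm -rf "$workdir"
```

Writing the list as one comma-separated argument keeps it unambiguous next to the file names.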
Combining join with Pipes
The join command is commonly used with sort.
Example:
sort users.txt > users.sorted
sort roles.txt > roles.sorted
join users.sorted roles.sorted
Another example:
join <(sort file1) <(sort file2)
- Process substitution (a Bash, Ksh, and Zsh feature) avoids temporary files
- Common in advanced shell scripting
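Where process substitution is not available, one of the two inputs can come from a pipe by passing - as its file name. A sketch with assumed sample files, where only users.txt needs sorting:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' '3 Charlie' '1 Alice' '2 Bob'      > "$workdir/users.txt"
printf '%s\n' '1 Admin' '2 User' '3 Developer'   > "$workdir/roles.sorted"

# "-" tells join to read the first input from standard input
piped=$(sort -k1,1 "$workdir/users.txt" | join - "$workdir/roles.sorted")
printf '%s\n' "$piped"
# 1 Alice Admin
# 2 Bob User
# 3 Charlie Developer

rm -rf "$workdir"
```

Only one input can be read this way, so the other file still has to exist in sorted form.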
Combining Multiple Options
Example:
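join -t "," -a1 -e "N/A" -o 0,1.2,2.2 users.csv roles.csv
Breakdown:
- -t "," uses the comma as delimiter
- -a1 includes unmatched lines from the first file
- -e "N/A" fills missing values
- -o 0,1.2,2.2 names the output fields, which -e needs in order to take effect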
Common Administrative Examples
Join usernames with login shells:
join users.txt shells.txt
Merge employee IDs with departments:
join -t "," employees.csv departments.csv
Compare inventory records:
join stock_old.txt stock_new.txt
Find users missing roles:
join -v1 users.txt roles.txt
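The inventory comparison can be taken one step further by filtering the joined lines. A sketch assuming a hypothetical two-column layout (item name, quantity) in both stock files, with awk used to keep only items whose quantity changed:

```shell
export LC_ALL=C                       # assumption: same collation for sort and join
workdir=$(mktemp -d)                  # scratch directory for the hypothetical sample files
printf '%s\n' 'bolt 120' 'nut 300' 'washer 75' > "$workdir/stock_old.txt"
printf '%s\n' 'bolt 120' 'nut 280' 'washer 75' > "$workdir/stock_new.txt"

# Joined lines look like: item old_qty new_qty
# awk keeps the lines where the two quantities differ
changed=$(join "$workdir/stock_old.txt" "$workdir/stock_new.txt" | awk '$2 != $3')
printf '%s\n' "$changed"
# nut 300 280

rm -rf "$workdir"
```

Items that exist in only one of the two files do not appear in this output; join -v1 and join -v2 list those separately.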
Practical Script Example (Step-by-Step Explanation)
Script
#!/bin/bash
USERS="users.txt"
ROLES="roles.txt"
sort "$USERS" -o "$USERS"
sort "$ROLES" -o "$ROLES"
echo "Merged user data:"
join "$USERS" "$ROLES"
Step 1: Shebang
#!/bin/bash
- Selects Bash as the interpreter
- Ensures the script runs under Bash even when it is started from another shell
Step 2: Defining file variables
USERS="users.txt"
ROLES="roles.txt"
- Stores file names in variables
- Makes script easier to maintain
Example values:
users.txt
roles.txt
Step 3: Sorting first file
sort "$USERS" -o "$USERS"
Breakdown:
- sort sorts the file contents
- -o writes the output back into the same file
This prepares file for join.
Step 4: Sorting second file
sort "$ROLES" -o "$ROLES"
- Performs same operation for second file
- Both files must be sorted before joining
Step 5: Displaying informational message
echo "Merged user data:"
- Prints readable heading
- Organizes script output
Step 6: Joining the files
join "$USERS" "$ROLES"
- Matches lines using first field
- Combines matching records
Example:
Input (users.txt):
1 Alice
Input (roles.txt):
1 Admin
Output:
1 Alice Admin
What this script does
Step-by-step flow:
- Defines input files
- Sorts both files
- Displays heading
- Merges matching records
- Prints combined output
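The flow above overwrites both input files while sorting them. A possible variant, sketched here as a function, sorts copies in a scratch directory instead. The name merge_sorted and its error message are hypothetical and not part of the original script:

```shell
#!/bin/bash
# merge_sorted FILE1 FILE2
# Join two files on field 1 without modifying them.
# (Hypothetical helper, shown as a sketch rather than a finished tool.)
merge_sorted() {
    local left="$1" right="$2" tmpdir status f

    # Fail early with a clear message if an input cannot be read
    for f in "$left" "$right"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    done

    tmpdir=$(mktemp -d) || return 1

    # Sort copies on the join field; use one locale for sort and join
    LC_ALL=C sort -k1,1 "$left"  > "$tmpdir/left.sorted"
    LC_ALL=C sort -k1,1 "$right" > "$tmpdir/right.sorted"

    LC_ALL=C join "$tmpdir/left.sorted" "$tmpdir/right.sorted"
    status=$?

    rm -rf "$tmpdir"
    return "$status"
}
```

It would be called as merge_sorted users.txt roles.txt, printing the merged lines to standard output and returning a non-zero status if an input is missing or join fails.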
Why this matters in production
The join command is useful for:
- merging structured data
- generating reports
- comparing datasets
- CSV processing
- automation workflows
It is commonly used in:
- DevOps automation
- shell scripting
- system audits
- inventory management
- reporting systems
Common Beginner Mistakes
Trying to join unsorted files:
join file1 file2
without sorting first.
Correct workflow:
sort file1 -o file1
sort file2 -o file2
join file1 file2
Another mistake:
Using wrong delimiter.
Incorrect:
join file1.csv file2.csv
Correct:
join -t "," file1.csv file2.csv
Another mistake:
Expecting SQL-style relational joins.
join is simpler and works strictly on sorted text files.
Summary
In this guide, you learned:
- how the join command merges files
- why sorting is required
- using custom join fields
- handling CSV delimiters
- displaying unmatched lines
- customizing output fields
- replacing missing values
- combining join with pipes
- practical shell scripting with join
These skills are essential for:
- Linux administration
- shell scripting
- data processing
- report generation
- automation
Additional join parameters not covered in this guide include:
- --check-order: Verify input sorting
- --nocheck-order: Disable order checking
- -j: Shortcut for setting the same join field in both files
- -o auto: Automatic output formatting
- -z: Use NUL-terminated lines
- --help: Display help information
- --version: Display version information