join

Basic Usage of join - Merging Files by Common Fields

The join command is used to combine lines from two files based on matching fields. It works similarly to database joins and is commonly used in shell scripting, reporting, CSV processing, and automation.

join file1 file2

Example file1:

1 Alice
2 Bob
3 Charlie

Example file2:

1 Admin
2 User
3 Developer

Command:

join file1.txt file2.txt

Output:

1 Alice Admin
2 Bob User
3 Charlie Developer
  • join merges lines whose join fields match
  • By default, matching is performed on the first field of each file
  • Both files must be sorted on the join field
  • Commonly used for structured text processing

Important Requirement: Files Must Be Sorted

The join command expects both inputs to be sorted on the join field.

Incorrect workflow:

join users.txt roles.txt

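Possible error (exact wording varies by implementation and version):

join: file is not in sorted order

join may still print partial output alongside this message, so rows can go missing.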

Correct workflow:

sort users.txt -o users.txt
sort roles.txt -o roles.txt

join users.txt roles.txt
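  • sort arranges lines in the current locale's collation order (text order, not numeric order)
  • join depends on input sorted in that same order
  • Unsorted input can produce errors or missing rows, so always sort first in production usage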
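
Because keys are compared as text, numeric IDs deserve a closer look. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the file names and sample rows are hypothetical. Setting LC_ALL=C for both commands is one way to keep sort and join on the same ordering.

```shell
# Hypothetical sample data in a scratch directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' '2 Bob' '10 Judy' '1 Alice' > "$tmp/users.txt"
printf '%s\n' '10 Auditor' '1 Admin' '2 User' > "$tmp/roles.txt"

# Sort on field 1 as text (not with -n), in the same collation join will use.
LC_ALL=C sort -k1,1 "$tmp/users.txt" > "$tmp/users.sorted"
LC_ALL=C sort -k1,1 "$tmp/roles.txt" > "$tmp/roles.sorted"

# Text order of the keys is 1, 10, 2, which is what join expects.
joined=$(LC_ALL=C join "$tmp/users.sorted" "$tmp/roles.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Key 10 appears before key 2 in the output because that is the text order. Sorting with sort -n instead would place 10 last, which does not match the order join expects.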

Understanding Matching Fields

Example file1:

1001 Alice
1002 Bob
1003 Charlie

Example file2:

1001 IT
1002 HR
1003 Finance

Command:

join employees.txt departments.txt

Matching occurs using:

1001
1002
1003

These values form the join key.

Result:

1001 Alice IT
1002 Bob HR
1003 Charlie Finance

Joining Using Different Fields

By default, join uses field 1 from both files.

To specify different fields:

join -1 2 -2 1 file1.txt file2.txt

Breakdown:

  • -1 2 means use field 2 from the first file
  • -2 1 means use field 1 from the second file

Useful when file structures differ.


Example with Different Join Fields

File1:

Alice 1001
Bob 1002

File2:

1001 Admin
1002 User

Command:

join -1 2 -2 1 file1.txt file2.txt

Output:

1001 Alice Admin
1002 Bob User
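  • Uses the second field of file1 as the join key
  • Uses the first field of file2 as the join key
  • Each file must be sorted on its own join field
  • The join field is printed first, followed by the remaining fields of each file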
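
When the key sits in a different column in each file, each file also needs to be sorted on that column. A minimal sketch, assuming a POSIX shell with mktemp; the rows are hypothetical and deliberately out of order:

```shell
tmp=$(mktemp -d)
# file1 has the ID in field 2; file2 has it in field 1 (hypothetical rows).
printf '%s\n' 'Charlie 1003' 'Alice 1001' 'Bob 1002' > "$tmp/file1.txt"
printf '%s\n' '1002 User' '1003 Developer' '1001 Admin' > "$tmp/file2.txt"

# Sort each file on its own join field.
LC_ALL=C sort -k2,2 "$tmp/file1.txt" > "$tmp/file1.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2.txt" > "$tmp/file2.sorted"

# Join field 2 of the first file against field 1 of the second.
by_id=$(LC_ALL=C join -1 2 -2 1 "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$by_id"

rm -rf "$tmp"
```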

Using Custom Delimiters

By default, fields are separated by blanks (spaces or tabs), and leading blanks are ignored.

For CSV files:

join -t "," file1.csv file2.csv

Example file1:

1,Alice
2,Bob

Example file2:

1,Admin
2,User

Output:

1,Alice,Admin
2,Bob,User
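  • -t "," sets the comma as both the input and output field delimiter
  • Essential for simple CSV processing
  • join does not understand CSV quoting, so quoted fields containing commas will be split incorrectly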
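
The same option works for other single-character delimiters. As a sketch, assuming a POSIX shell with mktemp, the following joins tab-separated files whose values contain spaces; the file names and rows are hypothetical:

```shell
tmp=$(mktemp -d)
tab=$(printf '\t')   # a literal tab character

# Tab-separated rows whose values contain spaces (hypothetical data).
printf '1\tAlice Smith\n2\tBob Jones\n' > "$tmp/names.tsv"
printf '1\tSite Admin\n2\tRead Only\n' > "$tmp/roles.tsv"

# With -t set to a tab, spaces inside a value are no longer field breaks.
tsv=$(join -t "$tab" "$tmp/names.tsv" "$tmp/roles.tsv")
printf '%s\n' "$tsv"

rm -rf "$tmp"
```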

Displaying Unmatched Lines

By default, lines without a match in the other file are omitted from the output.

To display unmatched lines from first file:

join -a1 users.txt roles.txt

Example:

File1:

1 Alice
2 Bob
3 Charlie

File2:

1 Admin
2 User

Output:

1 Alice Admin
2 Bob User
3 Charlie
  • -a1 includes unmatched lines from file1
  • Unmatched lines are printed as they are; no placeholder is added for the missing fields

To include unmatched lines from second file:

join -a2 users.txt roles.txt
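
Both options can be given together to keep unmatched lines from either side, similar to a full outer join. A minimal sketch, assuming a POSIX shell with mktemp; the sample rows are hypothetical and already sorted:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 Alice' '2 Bob' '3 Charlie' > "$tmp/users.txt"
printf '%s\n' '1 Admin' '2 User' '4 Guest' > "$tmp/roles.txt"

# -a1 keeps the unmatched user (3), -a2 keeps the unmatched role (4).
outer=$(join -a1 -a2 "$tmp/users.txt" "$tmp/roles.txt")
printf '%s\n' "$outer"

rm -rf "$tmp"
```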

Displaying Only Unmatched Lines

To display only unmatched lines:

join -v1 users.txt roles.txt
  • -v1 prints only the unmatched lines from the first file and suppresses the joined lines
  • -v2 does the same for the second file
  • Useful for audits and comparisons

Example output:

3 Charlie
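
As a sketch of an audit in both directions, the following collects users without a role and roles without a user. It assumes a POSIX shell with mktemp; the rows and variable names are hypothetical:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 Alice' '2 Bob' '3 Charlie' > "$tmp/users.txt"
printf '%s\n' '1 Admin' '2 User' '4 Guest' > "$tmp/roles.txt"

# Lines that exist only in users.txt.
missing_roles=$(join -v1 "$tmp/users.txt" "$tmp/roles.txt")
# Lines that exist only in roles.txt.
unused_roles=$(join -v2 "$tmp/users.txt" "$tmp/roles.txt")

printf 'Users without a role: %s\n' "$missing_roles"
printf 'Roles without a user: %s\n' "$unused_roles"

rm -rf "$tmp"
```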

Handling Empty Fields

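To replace missing fields with a custom value, combine -e with -a1 (so unmatched lines are printed) and -o (so join knows which fields to fill):

join -a1 -e "N/A" -o 0,1.2,2.2 users.txt roles.txt

Example output:

1 Alice Admin
2 Bob User
3 Charlie N/A
  • -e defines the replacement string
  • -e only affects fields selected with -o
  • -o 0,1.2,2.2 prints the join field, then field 2 of each file
  • Useful for reporting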
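
A runnable sketch of the placeholder behavior, assuming a POSIX shell with mktemp and hypothetical sample rows. It relies on -e applying only to fields listed with -o, and on -a1 to print the unmatched line in the first place:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 Alice' '2 Bob' '3 Charlie' > "$tmp/users.txt"
printf '%s\n' '1 Admin' '2 User' > "$tmp/roles.txt"

# 0 is the join field, 1.2 is field 2 of file1, 2.2 is field 2 of file2.
filled=$(join -a1 -e 'N/A' -o 0,1.2,2.2 "$tmp/users.txt" "$tmp/roles.txt")
printf '%s\n' "$filled"

rm -rf "$tmp"
```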

Selecting Specific Output Fields

By default, all fields are displayed.

To customize output:

join -o 1.2,2.2 users.txt roles.txt

Breakdown:

Value   Meaning
1.2     field 2 from file1
2.2     field 2 from file2

Example output:

Alice Admin
Bob User
  • Useful for report generation
  • Similar to SQL SELECT behavior
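
The field list can also reorder columns and include the join field itself, which is written as 0. A minimal sketch, assuming a POSIX shell with mktemp and hypothetical sample rows:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 Alice' '2 Bob' > "$tmp/users.txt"
printf '%s\n' '1 Admin' '2 User' > "$tmp/roles.txt"

# Print the role first, then the name, then the join field.
reordered=$(join -o 2.2,1.2,0 "$tmp/users.txt" "$tmp/roles.txt")
printf '%s\n' "$reordered"

rm -rf "$tmp"
```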

Combining join with Pipes

The join command is commonly used with sort.

Example:

sort users.txt > users.sorted
sort roles.txt > roles.sorted

join users.sorted roles.sorted

Another example:

join <(sort file1) <(sort file2)
  • Process substitution avoids temporary files
  • It is a feature of Bash, Zsh, and Ksh, not of plain POSIX sh
  • Common in advanced shell scripting
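
Where process substitution is not available, one input can arrive through a pipe instead, because join reads standard input when a file name is given as -. A minimal sketch, assuming a POSIX shell with mktemp; the file names and rows are hypothetical:

```shell
tmp=$(mktemp -d)
printf '%s\n' '3 Charlie' '1 Alice' '2 Bob' > "$tmp/users.txt"
printf '%s\n' '1 Admin' '2 User' '3 Developer' > "$tmp/roles.sorted"

# Sort the unsorted file on the fly and feed it to join as the first input.
piped=$(LC_ALL=C sort -k1,1 "$tmp/users.txt" | LC_ALL=C join - "$tmp/roles.sorted")
printf '%s\n' "$piped"

rm -rf "$tmp"
```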

Combining Multiple Options

Example:

join -t "," -a1 -e "N/A" -o 0,1.2,2.2 users.csv roles.csv

Breakdown:

  • -t "," uses the comma as delimiter
  • -a1 includes unmatched lines from the first file
  • -e "N/A" fills missing values
  • -o 0,1.2,2.2 selects the output fields; without -o, -e has no effect

Common Administrative Examples

Join usernames with login shells:

join users.txt shells.txt

Merge employee IDs with departments:

join -t "," employees.csv departments.csv

Compare inventory records:

join stock_old.txt stock_new.txt

Find users missing roles:

join -v1 users.txt roles.txt

Practical Script Example (Step-by-Step Explanation)

Script

#!/bin/bash

USERS="users.txt"
ROLES="roles.txt"

sort -o "$USERS" "$USERS"
sort -o "$ROLES" "$ROLES"

echo "Merged user data:"

join "$USERS" "$ROLES"

Step 1: Shebang

#!/bin/bash
  • Tells the system to run the script with Bash
  • Must be the first line of the script

Step 2: Defining file variables

USERS="users.txt"
ROLES="roles.txt"
  • Stores file names in variables
  • Makes script easier to maintain

Example values:

users.txt
roles.txt

Step 3: Sorting first file

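sort -o "$USERS" "$USERS"

Breakdown:

  • sort sorts the file contents
  • -o writes the output to the named file, here the input file itself
  • Quoting "$USERS" keeps file names with spaces intact

This prepares the file for join. Note that it overwrites the original file.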


Step 4: Sorting second file

sort -o "$ROLES" "$ROLES"
  • Performs the same operation for the second file
  • Both files must be sorted before joining

Step 5: Displaying informational message

echo "Merged user data:"
  • Prints readable heading
  • Organizes script output

Step 6: Joining the files

join "$USERS" "$ROLES"
  • Matches lines using the first field
  • Combines matching records

Example:

Input (one line from users.txt, one line from roles.txt):

1 Alice
1 Admin

Output:

1 Alice Admin

What this script does

Step-by-step flow:

  1. Defines input files
  2. Sorts both files
  3. Displays heading
  4. Merges matching records
  5. Prints combined output
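
The script above rewrites its input files in place. As a sketch of an alternative with the same flow, the function below sorts copies in a scratch directory and leaves the inputs untouched. The name merge_users is hypothetical, and the sketch assumes Bash with mktemp available:

```shell
#!/bin/bash

# merge_users is a hypothetical helper, not part of join or the original script.
merge_users() {
    local users=$1 roles=$2 workdir status
    workdir=$(mktemp -d) || return 1

    # Sort copies on the join field so the input files stay unchanged.
    LC_ALL=C sort -k1,1 "$users" > "$workdir/users.sorted"
    LC_ALL=C sort -k1,1 "$roles" > "$workdir/roles.sorted"

    echo "Merged user data:"
    LC_ALL=C join "$workdir/users.sorted" "$workdir/roles.sorted"
    status=$?

    # Remove the scratch directory and report join's exit status.
    rm -rf "$workdir"
    return "$status"
}

# Example call (assumes users.txt and roles.txt exist in the current directory):
# merge_users users.txt roles.txt
```

Keeping the sorted copies separate costs some disk space, but it means a failed run cannot leave the source data reordered.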

Why this matters in production

The join command is useful for:

  • merging structured data
  • generating reports
  • comparing datasets
  • CSV processing
  • automation workflows

It is commonly used in:

  • DevOps automation
  • shell scripting
  • system audits
  • inventory management
  • reporting systems

Common Beginner Mistakes

Running join on files that have not been sorted first:

join file1 file2

Correct workflow:

sort file1 -o file1
sort file2 -o file2

join file1 file2

Another mistake:

Using wrong delimiter.

Incorrect:

join file1.csv file2.csv

Correct:

join -t "," file1.csv file2.csv

Another mistake:

Expecting SQL-style relational joins.

join is simpler: it matches on one field per file, compares keys as text, and works strictly on sorted text files.
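
One behavior that does resemble a relational join is the handling of repeated keys: every line with a given key in one file is paired with every line with that key in the other. A minimal sketch, assuming a POSIX shell with mktemp and hypothetical rows in which one user holds two roles:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 Alice' '2 Bob' > "$tmp/users.txt"
# Key 1 appears twice in the second file.
printf '%s\n' '1 Admin' '1 Auditor' '2 User' > "$tmp/roles.txt"

# Alice is printed once per matching role.
pairs=$(join "$tmp/users.txt" "$tmp/roles.txt")
printf '%s\n' "$pairs"

rm -rf "$tmp"
```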


Summary

In this guide, you learned:

  • how the join command merges files
  • why sorting is required
  • using custom join fields
  • handling CSV delimiters
  • displaying unmatched lines
  • customizing output fields
  • replacing missing values
  • combining join with pipes
  • practical shell scripting with join

These skills are essential for:

  • Linux administration
  • shell scripting
  • data processing
  • report generation
  • automation

Additional join parameters not covered in this guide include (several are GNU coreutils extensions):

--check-order: Fail with an error if the input is not correctly sorted
--nocheck-order: Disable order checking
-j FIELD: Shortcut equivalent to -1 FIELD -2 FIELD
-o auto: Build the output format from the first line of each file
-z: Use NUL-terminated lines instead of newline-terminated lines
--help: Display help information
--version: Display version information