uniq

Basic Usage of uniq - Removing Duplicate Lines

The uniq command filters or removes repeated adjacent lines from text input. It is commonly used together with the sort command in Linux shell scripting, log analysis, and data processing.

uniq file.txt

Example file:

apple
apple
banana
banana
orange

Command:

uniq file.txt

Output:

apple
banana
orange
  • uniq removes duplicate consecutive lines
  • Only repeated adjacent lines are detected
  • Frequently combined with sort
  • Common tool in shell scripting and automation

Important Concept: Consecutive Duplicates Only

The uniq command only works on consecutive duplicate lines.

Example file:

apple
banana
apple
orange

Command:

uniq file.txt

Output:

apple
banana
apple
orange

Nothing changes because duplicate lines are not next to each other.

Correct workflow:

sort file.txt | uniq

Output:

apple
banana
orange
  • sort groups identical lines together
  • uniq then removes duplicates
  • This combination is extremely common in Linux
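The sort | uniq pair has a common shorthand: sort -u sorts and de-duplicates in one step (though it cannot replace uniq's -c, -d, or -u filters). A quick sketch, using a hypothetical fruits.txt:

```shell
# Build a sample file with out-of-order duplicates (hypothetical name)
printf 'apple\nbanana\napple\norange\n' > fruits.txt

# sort -u sorts the lines and drops duplicates in one step
sort -u fruits.txt
# apple
# banana
# orange
```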

Displaying Duplicate Counts

To count duplicate occurrences:

uniq -c file.txt

Example input:

apple
apple
banana
banana
banana
orange

Output:

      2 apple
      3 banana
      1 orange
  • -c means count occurrences
  • Displays number of repeated lines
  • Useful for statistics and log analysis

Displaying Only Duplicate Lines

To display only duplicated entries:

uniq -d file.txt

Example input:

apple
apple
banana
banana
orange

Output:

apple
banana
  • -d means duplicates only
  • Unique lines are hidden
  • Useful for finding repeated data
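The -d flag also combines with -c to count only the duplicated entries. A sketch with a hypothetical fruits.txt:

```shell
# Sample file: two duplicated entries and one unique line
printf 'apple\napple\nbanana\nbanana\norange\n' > fruits.txt

# -c adds counts, -d restricts output to duplicated lines only
uniq -cd fruits.txt
# prints a count of 2 for apple and 2 for banana; orange is omitted
```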

Displaying Only Unique Lines

To display only lines that appear once:

uniq -u file.txt

Example input:

apple
apple
banana
orange

Output:

banana
orange
  • -u means unique lines only
  • Duplicate lines are removed completely
  • Useful for identifying uncommon entries

Ignoring Case Differences

By default, uniq is case-sensitive.

Example input:

Apple
apple
APPLE

Default behavior:

uniq file.txt

Output:

Apple
apple
APPLE

To ignore case differences:

uniq -i file.txt

Output:

Apple
  • -i means case-insensitive comparison
  • The first variant in each group is kept
  • Useful for user-generated text data
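As with plain uniq, -i only merges case variants that are adjacent. GNU sort's -f flag folds case while sorting, so the two pair naturally. A sketch with a hypothetical words.txt:

```shell
# Case variants scattered through the file
printf 'Apple\nbanana\napple\nAPPLE\n' > words.txt

# sort -f groups the case variants together; uniq -i then merges them,
# leaving one "apple" variant plus "banana"
sort -f words.txt | uniq -i
```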

Skipping Fields

Sometimes only part of each line should be compared.

Example file:

2026 apple
2027 apple
2028 banana

Command:

uniq -f1 file.txt
  • -f1 skips first field
  • Comparison starts from second field

Result:

2026 apple
2028 banana
  • Useful for log processing and structured data

Skipping Characters

To ignore first characters during comparison:

uniq -s3 file.txt
  • -s3 skips first 3 characters
  • Useful for fixed-width text formats
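A short sketch of -s with hypothetical fixed-width data, where each line starts with a 3-character code followed by the value being compared:

```shell
# Each line: 3-character prefix, then the value uniq should compare
printf 'AAA101\nBBB101\nCCC202\n' > codes.txt

# Skip the first 3 characters, so only "101"/"101"/"202" are compared
uniq -s3 codes.txt
# AAA101
# CCC202
```

The first line of each group is kept, so AAA101 survives while BBB101 is dropped.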

Combining uniq with sort

This is the most common real-world usage pattern.

Example:

sort names.txt | uniq

Count unique values:

sort access.log | uniq -c

Find duplicate IP addresses:

sort ips.txt | uniq -d
  • sort prepares data for uniq
  • Essential workflow in Linux text processing

Using uniq with Pipes

The uniq command is commonly used in pipelines.

Example:

cat users.txt | sort | uniq

Count active shells:

cut -d ":" -f7 /etc/passwd | sort | uniq -c

Example output:

     15 /bin/bash
      3 /usr/sbin/nologin
  • Useful for reports and audits
  • Common in system administration

Combining Multiple Options

Example:

uniq -ci file.txt

Breakdown:

  • -c counts duplicates
  • -i ignores case

Another example:

uniq -du file.txt

This combination produces no output because:

  • -d shows duplicates only
  • -u shows unique lines only

No line can be both duplicated and unique, so the two conditions exclude each other.
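If the goal is to see every occurrence of a duplicated line (rather than one line per group), GNU uniq provides -D (--all-repeated). A sketch with a hypothetical fruits.txt:

```shell
printf 'apple\napple\nbanana\norange\n' > fruits.txt

# -D prints every line that belongs to a duplicated group
uniq -D fruits.txt
# apple
# apple
```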


Common Administrative Examples

Count failed login attempts:

cut -d " " -f1 auth.log | sort | uniq -c

Find duplicate usernames:

sort users.txt | uniq -d

List unique shells:

cut -d ":" -f7 /etc/passwd | sort | uniq

Count most common IP addresses:

sort access.log | uniq -c | sort -nr

Practical Script Example (Step-by-Step Explanation)

Script

#!/bin/bash

LOG="/var/log/access.log"

echo "Top IP addresses:"

cut -d " " -f1 "$LOG" | sort | uniq -c | sort -nr | head -10

Step 1: Shebang

#!/bin/bash
  • Defines Bash interpreter
  • Ensures script executes correctly

Step 2: Defining log file

LOG="/var/log/access.log"
  • Stores log file location
  • Makes script easier to maintain

Example value:

/var/log/access.log

Step 3: Printing informational message

echo "Top IP addresses:"
  • Displays readable heading
  • Helps organize script output

Step 4: Extracting IP addresses

cut -d " " -f1 "$LOG"

Breakdown:

  • -d " " defines space delimiter
  • -f1 extracts first field
  • First field usually contains client IP address

Example log line:

192.168.1.15 GET /index.html

Extracted result:

192.168.1.15

Step 5: Sorting the data

sort
  • Groups identical IP addresses together
  • Required before using uniq

Example:

192.168.1.10
192.168.1.10
192.168.1.20

Step 6: Counting duplicates

uniq -c
  • Counts repeated lines
  • Displays frequency of each IP address

Example output:

     15 192.168.1.10
      8 192.168.1.20

Step 7: Sorting by highest count

sort -nr

Breakdown:

  • -n numeric sorting
  • -r reverse order

Displays most frequent IP addresses first.


Step 8: Limiting output

head -10
  • Displays first 10 lines only
  • Prevents overwhelming output
  • Shows top 10 IP addresses

What this script does

Step-by-step flow:

  1. Reads access log
  2. Extracts IP addresses
  3. Sorts addresses
  4. Counts occurrences
  5. Sorts by highest frequency
  6. Displays top 10 results
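The pipeline can be dry-run against a hypothetical miniature log to see the stages working together:

```shell
# Fake three-request log in the expected "IP METHOD PATH" format
printf '10.0.0.1 GET /a\n10.0.0.2 GET /b\n10.0.0.1 GET /c\n' > access.log

# Same pipeline as the script, on the miniature log:
# 10.0.0.1 appears twice and is listed first, then 10.0.0.2
cut -d " " -f1 access.log | sort | uniq -c | sort -nr | head -10
```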

Why this matters in production

This workflow is heavily used for:

  • traffic analysis
  • security monitoring
  • detecting brute-force attacks
  • identifying abusive clients
  • web server analytics

The combination of:

  • cut
  • sort
  • uniq

forms one of the most important Linux text-processing pipelines.


Common Beginner Mistakes

Using uniq without sorting first:

uniq file.txt

This only removes consecutive duplicates.

Correct workflow:

sort file.txt | uniq

Another mistake:

Assuming uniq removes all duplicates automatically.

Incorrect expectation:

apple
banana
apple

Result remains unchanged without sorting.

Another mistake:

Using conflicting options:

uniq -du file.txt

This combination prints nothing, because no line can be both duplicated and unique.


Summary

In this guide, you learned:

  • how uniq removes duplicate lines
  • why sorting is important before using uniq
  • counting duplicates
  • displaying duplicate-only lines
  • displaying unique-only lines
  • case-insensitive comparison
  • skipping fields and characters
  • combining uniq with pipes
  • practical shell scripting with uniq

These skills are essential for:

  • Linux administration
  • shell scripting
  • log analysis
  • automation
  • text processing

Additional uniq parameters not covered in this guide include:

-w: Compare only specified number of characters
-z: Use null-terminated lines
--group: Display grouped duplicates
--all-repeated: Show all duplicate groups
--help: Display help information
--version: Display version information