OS: Linux Tools

Linux provides several tools to search inside files and across the filesystem. These commands help you quickly locate information without manually browsing directories or opening every file.

Searching in Linux (grep, find, locate)

 `grep` – Search Inside Files

grep looks for text patterns inside files and outputs matching lines. It’s one of the most used commands in Linux.

  • Search for a word in a file:

    ```bash
    grep "error" logfile.txt
    ```

  • Search case-insensitively:

    ```bash
    grep -i "error" logfile.txt
    ```

  • Search recursively in a directory:

    ```bash
    grep -r "TODO" ~/projects
    ```

  • Show line numbers:

    ```bash
    grep -n "main" program.c
    ```

  • Exclude matching lines (`-v`):

    ```bash
    grep -v "debug" logfile.txt
    ```

    → Displays all lines that do not contain the word _debug_.

 grep is excellent for searching logs, source code, or configuration files.
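These flags combine freely. A runnable sketch (the log file and its contents are invented for the demo):

```bash
# Create a small throwaway log to search (demo data, not a real log).
printf 'ERROR: disk full\nall ok\nerror: retrying\n' > /tmp/grep_demo.log

# -i (ignore case) and -n (line numbers) together:
grep -in "error" /tmp/grep_demo.log
# 1:ERROR: disk full
# 3:error: retrying

# -c counts matching lines instead of printing them:
grep -ic "error" /tmp/grep_demo.log
# 2
```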


 `find` – Search for Files by Attributes

find locates files and directories based on name, type, size, or modification time. Unlike locate, it searches the filesystem in real time.

  • Find a file by name:

    ```bash
    find /home -name "notes.txt"
    ```

  • Find all .log files modified in the last 7 days:

    ```bash
    find /var/log -name "*.log" -mtime -7
    ```

  • Find large files (>100MB):

    ```bash
    find / -size +100M
    ```

 find is extremely flexible, though slower than locate.
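find can also act on whatever it matches via `-exec`. A sketch against a throwaway directory (all paths here are invented for the demo):

```bash
# Build a demo tree.
mkdir -p /tmp/find_demo
touch /tmp/find_demo/a.log /tmp/find_demo/b.log /tmp/find_demo/notes.txt

# Run a command on every match; {} is the found path, \; terminates the command.
find /tmp/find_demo -name "*.log" -exec wc -c {} \;

# Delete matches outright (use with care):
find /tmp/find_demo -name "*.log" -delete
```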


 `locate` – Instant File Search (Database-Based)

locate searches a pre-built database of files, making it much faster than find.

  • Simple search:

    ```bash
    locate passwd
    ```

  • Update the database (required if new files don’t show up):

    ```bash
    sudo updatedb
    ```

Great for quick searches, but may not show files created after the last database update.


Text Processing (less, head, tail, wc, cut)

Linux provides lightweight but powerful tools to read, preview, and manipulate text directly from the command line.

 `less` – View Files Page by Page

  • Lets you scroll through large files without loading the whole file into memory.
  • Usage:

    ```bash
    less logfile.txt
    ```

  • Controls:
    • Arrow keys → move up/down
    • `/pattern` → search for text
    • `q` → quit

 `head` – Show the Beginning of a File

  • Displays the first 10 lines by default.
  • Example:

    ```bash
    head logfile.txt
    ```

  • Show first 20 lines:

    ```bash
    head -n 20 logfile.txt
    ```


 `tail` – Show the End of a File

  • Displays the last 10 lines by default.
  • Example:

    ```bash
    tail logfile.txt
    ```

  • Show last 50 lines:

    ```bash
    tail -n 50 logfile.txt
    ```

  • Follow a file in real time (useful for logs):

    ```bash
    tail -f /var/log/syslog
    ```
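A related form worth knowing: with a leading `+`, `tail -n +K` prints from line K onward, which is handy for skipping a header row (the demo file below is invented):

```bash
printf 'header\nrow1\nrow2\n' > /tmp/tail_demo.txt

# +2 means "start at line 2", i.e. drop the first line.
tail -n +2 /tmp/tail_demo.txt
# row1
# row2
```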


 `wc` – Word, Line, and Character Count

  • Counts lines, words, and characters.
  • Example:

    ```bash
    wc notes.txt
    ```

    Output format: `lines words characters filename`

  • Just count lines:

    ```bash
    wc -l notes.txt
    ```


 `cut` – Extract Columns from Text

  • Useful for splitting text into fields (like CSV or log files).
  • Extract the first 10 characters of each line:

    ```bash
    cut -c 1-10 file.txt
    ```

  • Extract by delimiter (e.g., CSV with commas):

    ```bash
    cut -d',' -f2 data.csv
    ```

    → Shows the 2nd column.
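cut slots naturally into pipelines. A sketch over an invented colon-separated file, in the spirit of /etc/passwd:

```bash
# Demo data: username:shell
printf 'alice:/bin/bash\nbob:/bin/zsh\ncarol:/bin/bash\n' > /tmp/cut_demo.txt

# Field 1, with ':' as the delimiter:
cut -d':' -f1 /tmp/cut_demo.txt
# alice
# bob
# carol

# Piped: the distinct shells in use.
cut -d':' -f2 /tmp/cut_demo.txt | sort | uniq
# /bin/bash
# /bin/zsh
```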


Text Processing Essentials: `sort`, `uniq`, `sed`, `awk`

Linux shines at working with text. These four tools are the backbone of log parsing, data cleanup, quick reports, and one-liners. They’re powerful alone—and even better together in pipelines.

 `sort` — Order lines

sort arranges input lines lexicographically by default. It can also sort numerically, by columns (fields), reverse, and more.

Common options

  • -n → numeric sort (lexicographic order puts 10 before 2; -n compares by value)
  • -r → reverse order
  • -u → unique (suppress duplicates _after_ sorting)
  • -k M[,N] → sort by key/field range (uses whitespace by default)
  • -t 'SEP' → field separator (e.g., -t, for CSV)
  • -h → human numbers (e.g., 1K 2M 900), handy for du -h
  • -V → “version” sort (e.g., v1.9 < v1.10)

Examples

```bash
# Sort lines alphabetically
sort file.txt

# Numeric sort descending (e.g., top 10)
sort -nr scores.txt | head

# Sort CSV by 3rd column numerically
sort -t, -k3,3n data.csv

# Human-readable sizes (e.g., from du -h)
du -h /var/log | sort -h
```

Gotcha: sort -u removes duplicates _only after sorting_; to deduplicate unsorted data while preserving the first occurrence, use awk '!seen[$0]++'.


 `uniq` — Collapse adjacent duplicates

uniq filters out adjacent duplicate lines. It pairs naturally with sort (which groups duplicates).

Common options

  • _(no flag)_ → remove adjacent duplicates
  • -c → prefix counts
  • -d → only duplicates
  • -u → only unique (non-repeated) lines
  • -i → case-insensitive

Examples

```bash
# Count occurrences of each line (case-sensitive)
sort access.log | uniq -c | sort -nr

# Show only lines that appear exactly once
sort items.txt | uniq -u

# Case-insensitive unique
sort names.txt | uniq -ci
```

Gotcha: Without sort, duplicates that are not adjacent remain.
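The adjacency rule is easy to see with a three-line demo file (contents invented):

```bash
printf 'apple\nbanana\napple\n' > /tmp/uniq_demo.txt

# The two "apple" lines are not adjacent, so uniq alone keeps both:
uniq /tmp/uniq_demo.txt | wc -l
# 3

# sort groups the duplicates first, so uniq can collapse them:
sort /tmp/uniq_demo.txt | uniq | wc -l
# 2
```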


 `sed` — Stream editor (substitute, delete, insert)

sed edits text streams non-interactively. The most common task is substitution; it also deletes lines, prints ranges, and performs simple transforms.

Common patterns

  • s/OLD/NEW/ → substitute first match per line
  • s/OLD/NEW/g → substitute all matches per line
  • -i → edit file in place (use with care; consider -i.bak)
  • Addressing: N (line number), /regex/, and ranges like 1,10 or /start/,/end/

Examples

```bash
# Replace first occurrence of foo with bar per line
sed 's/foo/bar/' file.txt

# Replace all occurrences
sed 's/foo/bar/g' file.txt

# In-place rename .txt to .md inside links (make backup)
sed -i.bak 's/\.txt)/.md)/g' README.md

# Delete blank lines
sed '/^$/d' notes.txt

# Print lines 10..20
sed -n '10,20p' file.txt

# Change only in lines matching a pattern
sed '/ERROR/s/timeout/Timed Out/g' app.log
```

Gotchas

  • Delimiter can be changed (useful with slashes): sed 's|/var/log|/logs|g'
  • macOS sed -i requires a backup suffix (e.g., -i '' for none).

 `awk` — Pattern scanning & field processing

awk reads line by line, splits into fields (default: whitespace), and runs actions on matches. It’s great for column reports, filtering, and small computations.

Core syntax

```bash
awk 'pattern { action }' file
```

Special blocks: BEGIN { … } (before input), END { … } (after input)

Built-ins

  • Variables: $1, $2, … (fields), $0 (whole line), NF (num fields), NR (record/line number)
  • -F 'SEP' → custom field separator (CSV, TSV, etc.)
  • printf for controlled formatting

Examples

```bash
# Print 1st and 3rd fields
awk '{print $1, $3}' data.txt

# Sum the 2nd column (numeric)
awk '{sum += $2} END {print sum}' numbers.txt

# CSV: 2nd and 5th fields
awk -F, '{print $2, $5}' data.csv

# Filter rows where 3rd field > 100 and print id + value
awk '$3 > 100 {print $1, $3}' table.txt

# Pretty table with header
awk 'BEGIN {printf "%-10s %-10s\n","Name","Score"} {printf "%-10s %-10s\n",$1,$2}' scores.txt
```

Gotchas

  • For true CSV (quotes/commas inside quotes), prefer csvtool, xsv, mlr, or Python.
  • Use -v var=value to pass shell vars: awk -v th=100 '$3 > th {print $1,$3}' file.
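The -v pattern above, as a runnable sketch (the data file is invented):

```bash
printf 'widget 50\ngadget 150\ngizmo 200\n' > /tmp/awk_demo.txt

# Pass a shell value into awk instead of splicing it into the script text:
th=100
awk -v th="$th" '$2 > th {print $1}' /tmp/awk_demo.txt
# gadget
# gizmo
```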

Powerful Pipelines (combine them)

  • Top N frequent items:

    ```bash
    sort items.txt | uniq -c | sort -nr | head
    ```

  • Unique lines, original order preserved:

    ```bash
    awk '!seen[$0]++' file.txt
    ```

  • Extract ERROR timestamps (first 2 fields), count per minute:

    ```bash
    awk '/ERROR/ {print $1, $2}' app.log | sort | uniq -c | sort -nr
    ```

  • CSV: average of column 3:

    ```bash
    awk -F, '{sum+=$3; n++} END {if (n) print sum/n}' data.csv
    ```

  • Normalize whitespace, lowercase, then count:

    ```bash
    tr -s '[:space:]' ' ' < text.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr
    ```

  • Batch rename in text with backup:

    ```bash
    sed -i.bak 's/\.jpeg/.jpg/g' gallery.md
    ```


Quick Reference

  • sort
    • Alphabetic: sort
    • Numeric: sort -n
    • By field: sort -t, -k2,2n
    • Human sizes: sort -h
  • uniq
    • Count: uniq -c
    • Only dups: uniq -d
    • Only uniques: uniq -u
  • sed
    • Replace all: sed 's/old/new/g'
    • Delete lines: sed '/regex/d'
    • In place: sed -i.bak 's/a/b/' file
  • awk
    • Fields: awk '{print $1,$3}'
    • Filter + sum: awk '$2>100 {s+=$2} END{print s}'
    • CSV: awk -F, '{print $2}'

Pipelines & Redirection (>, >>, |, 2>)

In Linux, the shell provides ways to redirect input/output and chain commands together. This is what makes the command line so powerful.

Output Redirection

  • > → Redirect output to a file (overwrite).

    ```bash
    ls > files.txt
    ```

    → Saves the output of ls into files.txt, replacing existing content.

  • >> → Append output to a file.

    ```bash
    echo "New entry" >> log.txt
    ```

    → Adds text at the end of log.txt without deleting old content.
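The overwrite-vs-append distinction in one runnable sketch (file path invented):

```bash
echo "first"  >  /tmp/redir_demo.txt   # creates or overwrites
echo "second" >> /tmp/redir_demo.txt   # appends
cat /tmp/redir_demo.txt
# first
# second

echo "third" > /tmp/redir_demo.txt     # > again: old contents are gone
cat /tmp/redir_demo.txt
# third
```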


Input Redirection

  • < → Take input from a file instead of the keyboard.

    ```bash
    sort < names.txt
    ```

    → Sorts the contents of names.txt.


Pipelines

  • | → Send the output of one command into another command.

    ```bash
    ls -l | grep ".txt"
    ```

    → Lists files and filters only .txt files.

  • Example: Count the number of lines containing "error" in a log:

    ```bash
    grep "error" logfile.txt | wc -l
    ```


Error Redirection

  • 2> → Redirect errors to a file.

    ```bash
    ls /root 2> errors.txt
    ```

    → Saves permission-denied errors into errors.txt.

  • 2>&1 → Redirect errors to the same place as normal output.

    ```bash
    command > output.txt 2>&1
    ```
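A sketch of merging versus splitting the two streams; the `>&2` here just fabricates an error message for the demo:

```bash
# Merge: stdout and stderr both land in one file.
{ echo "normal"; echo "oops" >&2; } > /tmp/both.txt 2>&1
cat /tmp/both.txt    # contains both lines

# Split: each stream gets its own file.
{ echo "normal"; echo "oops" >&2; } > /tmp/out.txt 2> /tmp/err.txt
cat /tmp/err.txt
# oops
```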