Searching in Linux (grep, find, locate)
`grep` – Search Inside Files
grep looks for text patterns inside files and outputs matching lines. It’s one of the most used commands in Linux.
- Search for a word in a file:
grep "error" logfile.txt
- Search case-insensitively:
grep -i "error" logfile.txt
- Search recursively in a directory:
grep -r "TODO" ~/projects
- Show line numbers:
grep -n "main" program.c
- Exclude matching lines (`-v`):
grep -v "debug" logfile.txt
→ Displays all lines that do not contain the word _debug_.
grep is excellent for searching logs, source code, or configuration files.
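A quick sketch tying a few of these flags together; the filename and contents below are illustrative, not from a real system:

```shell
# Create a small sample log (filename and contents are illustrative)
printf 'ERROR disk full\ninfo: all good\nerror: timeout\n' > sample.log

# Case-insensitive search with line numbers
grep -in "error" sample.log

# Count matching lines instead of printing them
grep -ic "error" sample.log
```

Combining `-i` with `-n` or `-c` like this is common when skimming logs: first count the matches, then locate them by line number.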
`find` – Search for Files by Attributes
find locates files and directories based on name, type, size, or modification time. Unlike locate, it searches the filesystem in real time.
- Find a file by name:
find /home -name "notes.txt"
- Find all `.log` files modified in the last 7 days:
find /var/log -name "*.log" -mtime -7
- Find large files (>100MB):
find / -size +100M
find is extremely flexible, though slower than locate.
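Part of that flexibility is `-exec`, which runs a command on every match. A small sketch with a throwaway directory (all names are illustrative):

```shell
# Scratch directory (names are illustrative)
mkdir -p demo/logs
printf 'one line\n' > demo/logs/app.log
printf 'note\n' > demo/notes.txt

# Find .log files and run a command on each match with -exec
# ({} is replaced by each filename; \; terminates the command)
find demo -name "*.log" -exec wc -l {} \;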
`locate` – Instant File Search (Database-Based)
locate searches a pre-built database of files, making it much faster than find.
- Simple search:
locate passwd
- Update the database (required if new files don’t show up):
sudo updatedb
Great for quick searches, but may not show files created after the last database update.
Text Processing (less, head, tail, wc, cut)
Linux provides lightweight but powerful tools to read, preview, and manipulate text directly from the command line.
`less` – View Files Page by Page
- Lets you scroll through large files without loading the whole file into memory.
- Usage:
less logfile.txt
- Controls:
- Arrow keys → move up/down
- `/pattern` → search for text
- `q` → quit
`head` – Show the Beginning of a File
- Displays the first 10 lines by default.
- Example:
head logfile.txt
- Show first 20 lines:
head -n 20 logfile.txt
`tail` – Show the End of a File
- Displays the last 10 lines by default.
- Example:
tail logfile.txt
- Show last 50 lines:
tail -n 50 logfile.txt
- Follow a file in real-time (useful for logs):
tail -f /var/log/syslog
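Besides `-f`, `tail` accepts `-n +N` to print everything *from* line N onward. A minimal sketch with a generated file (`nums.txt` is illustrative):

```shell
# Five numbered lines to play with
seq 5 > nums.txt

# Last two lines
tail -n 2 nums.txt

# Everything from line 3 onward (note the leading +)
tail -n +3 nums.txt
```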
`wc` – Word, Line, and Character Count
- Counts lines, words, and characters.
- Example:
wc notes.txt
Output format: lines words characters filename
- Just count lines:
wc -l notes.txt
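A runnable sketch of the output formats (the file contents here are illustrative):

```shell
# Two lines, three words (illustrative file)
printf 'one two\nthree\n' > notes.txt

wc notes.txt      # lines, words, characters, then the filename
wc -w notes.txt   # words only
wc -l < notes.txt # reading stdin prints the bare number, no filename
```

The stdin form (`wc -l < file`) is handy in scripts because the output contains no filename to strip.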
`cut` – Extract Columns from Text
- Useful for splitting text into fields (like CSV or log files).
- Extract first 10 characters of each line:
cut -c 1-10 file.txt
- Extract by delimiter (e.g., CSV with commas):
cut -d',' -f2 data.csv
→ Shows the 2nd column.
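A self-contained sketch with a tiny CSV (columns `id,name,score` are an assumption for illustration):

```shell
# Illustrative CSV: id,name,score
printf '1,alice,90\n2,bob,75\n3,carol,88\n' > data.csv

# 2nd field only (names)
cut -d',' -f2 data.csv

# Fields 1 and 3 together (id and score)
cut -d',' -f1,3 data.csv
```

`-f` accepts comma-separated field lists and ranges like `-f1-3`.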
Text Processing Essentials: `sort`, `uniq`, `sed`, `awk`
Linux shines at working with text. These four tools are the backbone of log parsing, data cleanup, quick reports, and one-liners. They’re powerful alone—and even better together in pipelines.
`sort` — Order lines
sort arranges input lines lexicographically by default. It can also sort numerically, by columns (fields), reverse, and more.
Common options
- `-n` → numeric sort (lexicographically `10` sorts before `2`; `-n` fixes that)
- `-r` → reverse order
- `-u` → unique (suppress duplicates _after_ sorting)
- `-k M[,N]` → sort by key/field range (fields split on whitespace by default)
- `-t 'SEP'` → field separator (e.g., `-t,` for CSV)
- `-h` → human-readable numbers (e.g., `1K 2M 900`), handy for `du -h`
- `-V` → "version" sort (e.g., `v1.9 < v1.10`)
Examples
# Sort lines alphabetically
sort file.txt
# Numeric sort descending (e.g., top 10)
sort -nr scores.txt | head
# Sort CSV by 3rd column numerically
sort -t, -k3,3n data.csv
# Human-readable sizes (e.g., from du -h)
du -h /var/log | sort -h
Gotcha: sort -u removes duplicates _only after sorting_; to deduplicate unsorted data while preserving the first occurrence, use awk '!seen[$0]++'.
`uniq` — Collapse adjacent duplicates
uniq filters out adjacent duplicate lines. It pairs naturally with sort (which groups duplicates).
Common options
- _(no flag)_ → remove adjacent duplicates
- `-c` → prefix each line with its count
- `-d` → only duplicated lines
- `-u` → only unique (non-repeated) lines
- `-i` → case-insensitive comparison
Examples
# Count occurrences of each line (case-sensitive)
sort access.log | uniq -c | sort -nr
# Show only lines that appear exactly once
sort items.txt | uniq -u
# Case-insensitive counts (sort -f groups case variants together first)
sort -f names.txt | uniq -ci
Gotcha: Without sort, duplicates that are not adjacent remain.
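The gotcha is easy to demonstrate with inline data:

```shell
# Non-adjacent duplicates survive a bare uniq...
printf 'a\nb\na\n' | uniq

# ...but sorting first groups them, so uniq can collapse them
printf 'a\nb\na\n' | sort | uniq
```

The first pipeline still prints three lines (`a`, `b`, `a`); the second prints two.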
`sed` — Stream editor (substitute, delete, insert)
sed edits text streams non-interactively. The most common task is substitution; it also deletes lines, prints ranges, and performs simple transforms.
Common patterns
- `s/OLD/NEW/` → substitute first match per line
- `s/OLD/NEW/g` → substitute all matches per line
- `-i` → edit file in place (use with care; consider `-i.bak`)
- Addressing: `N` (line number), `/regex/`, ranges like `1,10` or `/start/,/end/`
Examples
# Replace first occurrence of foo with bar per line
sed 's/foo/bar/' file.txt
# Replace all occurrences
sed 's/foo/bar/g' file.txt
# In-place rename .txt to .md inside links (make backup)
sed -i.bak 's/\.txt)/.md)/g' README.md
# Delete blank lines
sed '/^$/d' notes.txt
# Print lines 10..20
sed -n '10,20p' file.txt
# Change only in lines matching a pattern
sed '/ERROR/s/timeout/Timed Out/g' app.log
Gotchas
- The delimiter can be changed (useful when the pattern contains slashes): `sed 's|/var/log|/logs|g'`
- On macOS (BSD sed), `-i` requires a backup suffix argument (e.g., `-i ''` for none).
`awk` — Pattern scanning & field processing
awk reads line by line, splits into fields (default: whitespace), and runs actions on matches. It’s great for column reports, filtering, and small computations.
Core syntax
awk 'pattern { action }' file
Special blocks: BEGIN { … } (before input), END { … } (after input)
Built-ins
- Variables: `$1, $2, …` (fields), `$0` (whole line), `NF` (number of fields), `NR` (record/line number)
- `-F 'SEP'` → custom field separator (CSV, TSV, etc.)
- `printf` for controlled formatting
Examples
# Print 1st and 3rd fields
awk '{print $1, $3}' data.txt
# Sum the 2nd column (numeric)
awk '{sum += $2} END {print sum}' numbers.txt
# CSV: 2nd and 5th fields
awk -F, '{print $2, $5}' data.csv
# Filter rows where 3rd field > 100 and print id + value
awk '$3 > 100 {print $1, $3}' table.txt
# Pretty table with header
awk 'BEGIN {printf "%-10s %-10s\n","Name","Score"} {printf "%-10s %-10s\n",$1,$2}' scores.txt
Gotchas
- For true CSV (quoted fields, commas inside quotes), prefer `csvtool`, `xsv`, `mlr`, or Python.
- Use `-v var=value` to pass shell variables: `awk -v th=100 '$3 > th {print $1,$3}' file`.
Powerful Pipelines (combine them)
- Top N frequent items
sort items.txt | uniq -c | sort -nr | head
- Unique lines, original order preserved
awk '!seen[$0]++' file.txt
- Extract ERROR timestamps (first 2 fields), count per minute
awk '/ERROR/ {print $1, $2}' app.log | sort | uniq -c | sort -nr
- CSV: average of column 3
awk -F, '{sum+=$3; n++} END {if (n) print sum/n}' data.csv
- Normalize whitespace, lowercase, then count
tr -s '[:space:]' ' ' < text.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr
- Batch rename in text with backup
sed -i.bak 's/\.jpeg/\.jpg/g' gallery.md
Quick reference
- sort
  - Alphabetic: `sort`
  - Numeric: `sort -n`
  - By field: `sort -t, -k2,2n`
  - Human sizes: `sort -h`
- uniq
  - Count: `uniq -c`
  - Only dups: `uniq -d`
  - Only uniques: `uniq -u`
- sed
  - Replace all: `sed 's/old/new/g'`
  - Delete lines: `sed '/regex/d'`
  - In place: `sed -i.bak 's/a/b/' file`
- awk
  - Fields: `awk '{print $1,$3}'`
  - Filter + sum: `awk '$2>100 {s+=$2} END{print s}'`
  - CSV: `awk -F, '{print $2}'`
Pipelines & Redirection (>, >>, |, 2>)
In Linux, the shell provides ways to redirect input/output and chain commands together. This is what makes the command line so powerful.
Output Redirection
`>` → Redirect output to a file (overwrite).
ls > files.txt
→ Saves the output of ls into files.txt, replacing existing content.
`>>` → Append output to a file.
echo "New entry" >> log.txt
→ Adds text at the end of log.txt without deleting old content.
Input Redirection
`<` → Take input from a file instead of the keyboard.
sort < names.txt
→ Sorts the contents of names.txt.
Pipelines
`|` → Send the output of one command into another command.
ls -l | grep "\.txt"
→ Lists files and shows only entries containing .txt (the backslash makes the dot literal; a bare . matches any character).
- Example: Count the number of lines containing "error" in a log:
grep "error" logfile.txt | wc -l
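A slightly longer sketch of the same idea, ranking error lines by frequency; the log contents below are illustrative:

```shell
# Illustrative log data
printf 'error: disk\nok\nerror: disk\nerror: net\n' > app.log

# Rank error lines by frequency: filter, group, count, sort descending
grep "error" app.log | sort | uniq -c | sort -nr
```

This filter → sort → `uniq -c` → `sort -nr` shape is one of the most reused pipelines in log analysis.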
Error Redirection
`2>` → Redirect errors (stderr) to a file.
ls /root 2> errors.txt
→ Saves permission-denied errors into errors.txt.
`2>&1` → Redirect errors to the same place as normal output.
command > output.txt 2>&1
→ Saves both normal output and errors into output.txt. The order matters: redirect stdout to the file first, then point stderr at stdout.
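A self-contained sketch; the compound command below stands in for any program that writes to both streams:

```shell
# A command group that writes to both streams (illustrative)
{ echo "to stdout"; echo "to stderr" >&2; } > all.txt 2>&1

cat all.txt   # both lines were captured in the file
```

Reversing the order (`2>&1 > all.txt`) would send stderr to the terminal instead, because stderr would be duplicated onto the *old* stdout before the file redirection took effect.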