The Linux Command Line by William E. Shotts Jr
Author: William E. Shotts Jr.
Language: eng
Format: mobi, epub
Tags: COMPUTERS / Operating Systems / Linux
ISBN: 9781593274269
Publisher: No Starch Press, Inc.
Published: 2012-01-13T05:00:00+00:00
POSIX Character Classes
The traditional character ranges are an easily understood and effective way to handle the problem of quickly specifying sets of characters. Unfortunately, they don’t always work. While we have not encountered any problems with our use of grep so far, we might run into problems using other programs.
Back in Chapter 4, we looked at how wildcards are used to perform pathname expansion. In that discussion, we said that character ranges could be used in a manner almost identical to the way they are used in regular expressions, but here’s the problem:
[me@linuxbox ~]$ ls /usr/sbin/[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*
/usr/sbin/MAKEFLOPPIES
/usr/sbin/NetworkManagerDispatcher
/usr/sbin/NetworkManager
(Depending on the Linux distribution, we will get a different list of files, possibly an empty list. This example is from Ubuntu.) This command produces the expected result — a list of only the files whose names begin with an uppercase letter. But with this command we get an entirely different result (only a partial listing of the results is shown):
[me@linuxbox ~]$ ls /usr/sbin/[A-Z]*
/usr/sbin/biosdecode
/usr/sbin/chat
/usr/sbin/chgpasswd
/usr/sbin/chpasswd
/usr/sbin/chroot
/usr/sbin/cleanup-info
/usr/sbin/complain
/usr/sbin/console-kit-daemon
Why is that? It’s a long story, but here’s the short version.
Back when Unix was first developed, it only knew about ASCII characters, and this feature reflects that fact. In ASCII, the first 32 characters (numbers 0–31) are control codes (things like tabs, backspaces, and carriage returns). The next 32 (32–63) contain printable characters, including most punctuation characters and the numerals zero through nine. The next 32 (numbers 64–95) contain the uppercase letters and a few more punctuation symbols. The final 32 (numbers 96–127) contain the lowercase letters and yet more punctuation symbols. Based on this arrangement, systems using ASCII used a collation order that looked like this:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
This differs from proper dictionary order, which is like this:
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
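The ASCII arrangement described above can be checked from the shell. The following is a minimal sketch, not part of the original text: it prints the numeric codes of a few letters and then sorts a handful of mixed-case letters in the C locale, which collates by byte value. The variable names are chosen only for illustration.

```shell
# A leading quote in the argument makes printf output the
# numeric code of the character that follows it.
code_A=$(printf '%d' "'A")
code_Z=$(printf '%d' "'Z")
code_a=$(printf '%d' "'a")
echo "A=$code_A Z=$code_Z a=$code_a"

# Sort mixed-case letters in the C locale (byte-value order),
# then join the result onto one line.
ascii_sorted=$(printf '%s\n' b A a B | LC_ALL=C sort | tr -d '\n')
echo "$ascii_sorted"
```

Every uppercase letter has a lower code than every lowercase letter, so the C-locale sort places A and B ahead of a and b.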
As the popularity of Unix spread beyond the United States, there grew a need to support characters not found in US English. The ASCII table was expanded to use a full 8 bits, adding character numbers 128–255, which accommodated many more languages. To support this ability, the POSIX standards introduced a concept called a locale, which could be adjusted to select the character set needed for a particular location. We can see the language setting of our system using this command:
[me@linuxbox ~]$ echo $LANG
en_US.UTF-8
With this setting, POSIX-compliant applications will use a dictionary collation order rather than ASCII order. This explains the behavior of the commands above. A character range of [A-Z], when interpreted in dictionary order, includes all of the alphabetic characters except the lowercase a — hence our results.
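One way to observe the effect of the locale on a range like [A-Z] is to force the C locale for a single expansion. The sketch below is an illustration only, not something this section of the book prescribes: it builds a scratch directory with made-up file names and expands the range inside a subshell where LC_ALL is set to C, so the change does not leak into the rest of the session.

```shell
# Scratch directory with hypothetical mixed-case file names.
tmpdir=$(mktemp -d)
touch "$tmpdir/Alpha" "$tmpdir/beta" "$tmpdir/Gamma" "$tmpdir/zeta"

# The command substitution runs in a subshell. With the C locale
# forced there, [A-Z] is interpreted in ASCII order and matches
# only names that begin with an uppercase letter.
upper_only=$(cd "$tmpdir" && export LC_ALL=C && echo [A-Z]*)
echo "$upper_only"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

Without the forced locale, the result of the same glob depends on the system's locale and on the shell. Recent bash releases also provide a globasciiranges shell option that affects range matching, so the outcome may differ from the listing shown earlier.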
To partially work around this problem, the POSIX standard includes a number of character classes, which provide useful ranges of characters. They are described in Table 19-2.
Table 19-2. POSIX Character Classes
Character Class
Description
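As a brief illustration of the character classes, the sketch below uses [:upper:], the standard POSIX class for uppercase letters, in both a pathname expansion and a grep regular expression. The scratch directory and file names are made up for the example.

```shell
# Scratch directory with hypothetical mixed-case file names.
tmpdir=$(mktemp -d)
touch "$tmpdir/Alpha" "$tmpdir/beta" "$tmpdir/Gamma" "$tmpdir/zeta"

# Pathname expansion: [[:upper:]] matches one uppercase letter,
# independent of how ranges collate in the current locale.
class_glob=$(cd "$tmpdir" && echo [[:upper:]]*)
echo "$class_glob"

# The same class inside a regular expression: count the names
# that begin with an uppercase letter.
class_count=$(ls "$tmpdir" | grep -c '^[[:upper:]]')
echo "$class_count"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

Note the doubled brackets: the outer pair forms the bracket expression and the inner [:upper:] names the class.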
