
If you wish to include a double quote inside the string, that can be done by escaping it
with a backslash (\), for example, "This string contains \"double quotes\".". To insert a
literal backslash, one must double it, e.g. "A backslash looks like this: \\".

Backslashes may be used to enter control characters, etc., into a string:


Escape  Meaning
\\      Literal backslash
\"      Double quote
\'      Single quote
\n      Newline (line feed)
\r      Carriage return
\b      Backspace
\t      Horizontal tab
\f      Form feed
\a      Alert (bell)
\v      Vertical tab
\?      Question mark (used to escape trigraphs)
\nnn    Character with octal value nnn
\xhh    Character with hexadecimal value hh

The use of other backslash escapes is not defined by the C standard, although compiler
vendors often provide additional escape codes as language extensions.
String literal concatenation

Adjacent string literals are concatenated at compile time; this allows long strings to be
split over multiple lines, and also allows string literals resulting from C preprocessor
defines and macros to be appended to strings at compile time:

printf(__FILE__ ": %d: Hello "
       "world\n", __LINE__);

will expand to

printf("helloworld.c" ": %d: Hello "
       "world\n", 10);

which is syntactically equivalent to

printf("helloworld.c: %d: Hello world\n", 10);

Character constants

Individual character constants are written in single quotes, e.g. 'A', and have type int
(in C++, char). The difference from a string literal is that "A" represents a pointer to the
first element of a null-terminated array, whereas 'A' directly represents the code value
(65 if ASCII is used). The same backslash escapes are supported as for strings, except
that " can validly be used as a character without being escaped, whereas ' must now be
escaped. A character constant cannot be empty (i.e. '' is invalid syntax), although a
string may be (it still contains the terminating null character). Multi-character constants
(e.g. 'xy') are valid, although rarely useful; they let one store several characters in an
integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the
order in which the characters are packed into one int is implementation-defined, portable
use of multi-character constants is difficult.
Wide character strings

Since type char is usually 1 byte wide, a single char value typically can represent at most
256 distinct character codes, not nearly enough for all the characters in use worldwide.
To provide better support for international characters, the first C standard (C89)
introduced wide characters (encoded in type wchar_t) and wide character strings, which
are written as L"Hello world!".

Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as
UTF-16) or 4 bytes (usually UTF-32), but Standard C does not specify the width for
wchar_t, leaving the choice to the implementor. Microsoft Windows generally uses
UTF-16, thus the above string would be 26 bytes long (including the terminating null)
for a Microsoft compiler; the Unix world prefers UTF-32, thus compilers such as GCC
would generate a 52-byte string. A 2-byte wide wchar_t suffers the same limitation as
char, in that certain characters (those outside the BMP) cannot be represented in a
single wchar_t, but must be represented using surrogate pairs.

The original C standard specified only minimal functions for operating with wide
character strings; in 1995 the standard was modified to include much more extensive
support, comparable to that for char strings. The relevant functions are mostly named
after their char equivalents, with the addition of a "w" or the replacement of "str" with
"wcs"; they are specified in <wchar.h>, with <wctype.h> containing wide-character
classification and mapping functions.
Variable width strings

A common alternative to wchar_t is to use a variable-width encoding, whereby a logical
character may extend over multiple positions of the string. Variable-width strings may be
encoded into literals verbatim, at the risk of confusing the compiler, or using numerical
backslash escapes (e.g. "\xc3\xa9" for "é" in UTF-8). The UTF-8 encoding was
specifically designed (under Plan 9) for compatibility with the standard library string
functions; supporting features of the encoding include a lack of embedded nulls, no valid
interpretations for subsequences, and trivial resynchronisation. Encodings lacking these
features are likely to prove incompatible with the standard library functions;
encoding-aware string functions are often used in such cases.
Library functions

Strings, both constant and variable, may be manipulated without using the standard
library. However, the library contains many useful functions for working with null-
terminated strings. It is the programmer's responsibility to ensure that enough storage has
been allocated to hold the resulting strings.

The most commonly used string functions are:

* strcat(dest, source) - appends the string source to the end of string dest
* strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it
or a null pointer if c is not found
* strcmp(a, b) - compares strings a and b (lexicographical ordering); returns negative if
a is less than b, 0 if equal, positive if greater.
* strcpy(dest, source) - copies the string source onto the string dest
* strlen(st) - returns the length of string st, not counting the terminating null
* strncat(dest, source, n) - appends a maximum of n characters from the string source
to the end of string dest and always null terminates the result, so up to n+1 characters
may be written
* strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexical
ordering); returns negative if a is less than b, 0 if equal, positive if greater
* strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to
it or a null pointer if c is not found

Other standard string functions include:

* strcoll(s1, s2) - compares two strings according to a locale-specific collating sequence
* strcspn(s1, s2) - returns the index of the first character in s1 that matches any
character in s2
* strerror(errno) - returns a string with an error message corresponding to the code in
errno
* strncpy(dest, source, n) - copies at most n characters from the string source onto the
string dest, padding with null bytes if source is shorter than n; does not null terminate
dest if source is n characters or longer
* strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any
character in s2 or a null pointer if not found
* strspn(s1, s2) - returns the index of the first character in s1 that matches no character
in s2
* strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a
null pointer if no such substring exists
* strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
* strxfrm(s1, s2, n) - transforms s2 onto s1, such that s1 used with strcmp gives the
same results as s2 used with strcoll

There is a similar set of functions for handling wide character strings.


Structures and unions
Structures

Structures in C are defined as data containers consisting of a sequence of named members
of various types. They are similar to records in other programming languages. The
members of a structure are stored in consecutive locations in memory, although the
compiler is allowed to insert padding between or after members (but not before the first
member) for efficiency. The size of a structure is equal to the sum of the sizes of its
members, plus the size of the padding.
Unions

Unions in C are related to structures and are defined as objects that may hold (at different
times) objects of different types and sizes. They are analogous to variant records in other
programming languages. Unlike structures, the components of a union all refer to the
same location in memory. In this way, a union can be used at various times to hold
different types of objects, without the need to create a separate object for each new type.
The size of a union is at least the size of its largest member; the compiler may add
trailing padding for alignment.
Declaration

Structures are declared with the struct keyword and unions are declared with the union
keyword. The specifier keyword is followed by an optional identifier name, which is used
to identify the form of the structure or union. The identifier is followed by the declaration
of the structure or union's body: a list of member declarations, contained within curly
braces, with each declaration terminated by a semicolon. Finally, the declaration
concludes with an optional list of identifier names, which are declared as instances of the
structure or union.

For example, the following statement declares a structure named s that contains three
members; it will also declare an instance of the structure known as t:

struct s
{
    int x;
    float y;
    char *z;
} t;

And the following statement will declare a similar union named u and an instance of it
named n:

union u
{
    int x;
    float y;
    char *z;
} n;

Once a structure or union body has been declared and given a name, it can be considered
a new data type using the specifier struct or union, as appropriate, and the name. For
example, the following statement, given the above structure declaration, declares a new
instance of the structure s named r:

struct s r;

It is also common to use the typedef specifier to eliminate the need for the struct or union
keyword in later references to the structure. The first identifier after the body of the
structure is taken as the new name for the structure type. For example, the following
statement will declare a new type known as s_type that will contain some structure:

typedef struct { /* member declarations */ } s_type;

Future statements can then use the specifier s_type (instead of the expanded struct
specifier) to refer to the structure.
Accessing members
