In simple terms, UTF-8 is an 8 bit encoding of 16 bit Unicode to achieve an international character representation.
In more technical terms, in UTF-8, characters are encoded using sequences of 1 to 6 octets. The only octet of a sequence of one has the higher-order bit set to 0, the remaining 7 bits are used to encode the character value. UTF-8 uses all bits of an octet, but has the quality of preserving the full US-ASCII range. The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF is exactly same as the original ASCII file and all non-ASCII characters are guaranteed to have the most significant bit set (bit 0x80). This means that normal tools for text searching work as expected.