Some comments on Shannon 1948
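When H(M) < |M|, there is redundancy: the source's entropy falls short of the maximum its alphabet allows, so on average each symbol carries less information than it could.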
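A minimal Python sketch of this claim. It assumes, following the notes' own use of |R| = log2(26), that |M| means log2 of the alphabet size, i.e. the entropy of a uniformly random source over that alphabet. The helper names `entropy` and `max_entropy` are hypothetical, chosen for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def max_entropy(n):
    """Assumed meaning of |M|: log2(n), reached only when all n symbols are equally likely."""
    return math.log2(n)

uniform = [0.25] * 4            # every symbol equally likely
skewed = [0.7, 0.1, 0.1, 0.1]   # one symbol dominates

print(entropy(uniform), max_entropy(4))  # 2.0 2.0 -> H(M) = |M|, no redundancy
print(entropy(skewed) < max_entropy(4))  # True -> H(M) < |M|, redundancy
```

Any departure from the uniform distribution pulls the entropy below log2(n), which is exactly the gap the notes call redundancy.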
Human languages have this kind of redundancy:
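- If the symbols were truly random (each one equally likely):
  - With 26 characters: H(R) = |R| = log2(26) ≈ 4.7 bits per symbol
- Real English has redundancy:
  - Statistical estimates put H(E) at about 2.9 bits per symbol (the figure depends on how much preceding context the estimate takes into account)
- The figure is slightly different for other human languages!
- It differs again for computer languages and for animal communication!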
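The arithmetic behind these figures, as a small Python sketch. It takes redundancy to be the difference |R| − H(E) in bits per symbol, a common convention but an assumption here since the notes do not define it; the 2.9 figure is simply the estimate quoted in the notes.

```python
import math

ALPHABET_SIZE = 26   # letters only, as in the notes
H_ENGLISH = 2.9      # estimated entropy of English in bits per symbol (value from the notes)

h_max = math.log2(ALPHABET_SIZE)   # |R|: entropy of uniformly random letters
redundancy = h_max - H_ENGLISH     # assumed definition: bits per symbol carrying no new information
relative = redundancy / h_max      # share of the maximum that is redundant

print(f"|R| = {h_max:.2f} bits/symbol")                                  # |R| = 4.70 bits/symbol
print(f"redundancy = {redundancy:.2f} bits/symbol ({relative:.0%})")    # redundancy = 1.80 bits/symbol (38%)
```

Under these assumptions, roughly 1.8 of the 4.7 bits per letter, about 38%, are redundant.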
- An exercise worth trying:
  - Write a content-analysis program (for example, one that counts symbol frequencies and computes the entropy) and run it on many different kinds of files. What do you notice about them?
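One possible starting point for this exercise, sketched in Python. It assumes "content analysis" means measuring order-0 byte statistics: it counts how often each byte value occurs in a file and reports the resulting entropy in bits per byte (maximum 8). The function names are hypothetical.

```python
import math
import sys
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def analyze(path: str) -> None:
    """Print size, number of distinct byte values and entropy for one file."""
    with open(path, "rb") as f:
        data = f.read()
    h = byte_entropy(data)
    print(f"{path}: {len(data)} bytes, {len(set(data))} distinct values, "
          f"H = {h:.3f} bits/byte (max 8)")

if __name__ == "__main__":
    # Usage: python entropy.py file1 file2 ...
    for p in sys.argv[1:]:
        analyze(p)
```

Because it only counts single-byte frequencies and ignores context between symbols, this sketch will generally report a higher figure for English text than context-aware estimates such as the 2.9 bits per symbol quoted above. Comparing plain text, source code, and binary files is a good way to see how much the per-byte entropy varies.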