Regardless of how well your code is written and running on the machine, there will be relatively long periods of CPU idle time where the CPU is just waiting on something to happen. Cache misses are a subset of the problem, waiting for I/O, user input, etc. can all lead to lengthy stalls in the CPU where the progress can still be made on the second set of registers. Also, there are several causes of cache misses that you can't plan for/around (an example is pushing new instructions on a branch since you executable probably doesn't all fit into Level 3 cache).
One of the main reasons that Silvermont went away from HT is the fact that at 22 nm, you have a lot of die (relatively) to play with. As a result, you can get away with more physical cores for increased parallelism.
ARM and AMD have not implemented hyper threading because it is Intel's proprietary technology.