Thursday, December 22, 2011


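The CSDN email scandal has been all over the net today, and some users have even tallied the distribution of the leaked passwords. I'll play the bored engineer too and tally the registration emails - -

The data has 6428632 lines, each containing a username, a password, and a registration email.

The top 100 registration email suffixes are as follows: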


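  1. QQ and NetEase mailboxes are far ahead, which matches the current state of email in China.

  2. gmail.com does not do well. I think it simply arrived late, when people had already registered with other mailboxes. I hope it grows.

  3. sina and sohu clearly do poorly because their experience fell behind later on: people used them at first, then fewer and fewer did.


  5. hotmail.com is clearly riding on MSN's coattails.

  6. Worth noting are the schools whose .edu.cn suffixes made the list, including Shanghai Jiao Tong, Xi'an Jiaotong, and Zhejiang University. Bored engineering types are the majority again, haha, which matches where programmers come from.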

  7. Also, Huawei people evidently browse CSDN a lot. I wonder how their internal wiki is coming along?

[Chart: registrations per email suffix, top 100, in descending order. Axis labels: QQ.COM, 163.COM, 126.COM, SINA.COM, qq.COM, YAHOO.COM.CN, HOTMAIL.COM, SOHU.COM, TOM.COM, 139.COM, 21CN.COM, GMAIL.COM]

Values: 1909280, 1740884, 796352, 348518, 203274, 200273, 185130, 103293, 86101, 71085, 52795, 49928, 38002, 34868, 28218, 25965, 25653, 24576, 18967, 18656, 18523, 18252, 16209, 15043, 14023, 13239, 11520, 10718, 8492, 7120, 6548, 5333, 4817, 4211, 3913, 3857, 2925, 2779, 2617, 2550, 2544, 2491, 2407, 2154, 2141, 2034, 1910, 1872, 1833, 1788, 1781, 1718, 1662, 1631, 1554, 1491, 1475, 1472, 1362, 1321, 1311, 1276, 1274, 1245, 1170, 1153, 1054, 1047, 1038, 1032, 1021, 996, 991, 983, 980, 962, 954, 928, 920, 919, 871, 869, 853, 835, 826, 817, 801, 795, 793, 758, 750, 739, 719, 717, 693, 690, 690, 655, 637, 615
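
A tally like this can be sketched with standard tools. This is only a minimal sketch, assuming every line of the data ends with the registration email and that the suffix is whatever follows the last `@`. The function name `top_suffixes`, the lowercasing of suffixes, and the file name in the usage line are my own assumptions, not details of the actual data.

```shell
#!/usr/bin/env bash
# top_suffixes [N]: read account lines on stdin, print the N most common
# email suffixes as "suffix count" (N defaults to 100).
# Assumption: the email is the last field on the line and contains an "@".
top_suffixes() {
  awk -F'@' 'NF > 1 { print tolower($NF) }' |  # keep the text after the last "@"
    tr -d ' \r' |                              # drop stray spaces and CRs
    sort | uniq -c |                           # count identical suffixes
    sort -k1,1nr -k2,2 |                       # highest count first, ties by name
    head -n "${1:-100}" |
    awk '{ print $2, $1 }'
}
```

Usage would look like `top_suffixes 100 < accounts.txt`, where `accounts.txt` is a hypothetical file name.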

Friday, December 9, 2011



The JavaScript code on this page attempts to guess whether you have recently visited a website by loading an image from the target website. If the load completes fast (in less than 10ms), it is highly likely that the image came from the browser's local cache, as the network latency and speed of most Internet connections cannot deliver sub-10ms loads. If it takes longer, it's not in the cache. To avoid polluting the cache, the loading is interrupted at the 10ms mark. This is important because it means any subsequent tests will yield the same results.

In short: it judges from how fast an image from each site loads whether it came from the local cache; if the load takes long, it did not come from the cache.

Although the idea is simple, two factors affect the speed and accuracy of the test: the speed at which the browser loads content from its cache, and the realtimeness of the OS and the browser event system to allow interrupting the request at a precise time.



Wednesday, November 30, 2011

programming languages and their statements

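I recently read Hackers & Painters, and one takeaway was that languages differ in power. Leaving aside the question of which one is king, it never hurts to know the strengths and weaknesses of each. I happened to see this article on Hacker News, programming languages and their statements, which covers individual languages and statements about them, and also lets you compare languages with each other. It lists 50 languages in alphabetical order, as follows: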

programming languages

  1. Ada

  2. Agda

  3. APL

  4. Assembler

  5. AWK

  6. C

  7. C#

  8. C++

  9. Clojure

  10. Cobol

  11. Common Lisp

  12. Coq

  13. D

  14. Delphi

  15. Eiffel

  16. ELisp

  17. Erlang

  18. F#

  19. Factor

  20. Forth

  21. Fortran

  22. Go

  23. Groovy

  24. Haskell

  25. Haxe

  26. Io

  27. J

  28. Java

  29. Javascript

  30. Lua

  31. Mathematica

  32. Matlab

  33. Mozart-Oz

  34. Objective C

  35. O'Caml

  36. Pascal

  37. Perl

  38. PHP

  39. Prolog

  40. Python

  41. R

  42. REBOL

  43. Ruby

  44. Scala

  45. Scheme

  46. Shell

  47. Smalltalk

  48. Standard ML

  49. TCL

  50. Visual Basic 


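  1. States how many people responded for this language in the survey

  2. Lists the statements on which the language ranks highest and lowest, 10 of each; from these statements it is easy to see what the language is and is not suited for

  3. Then lists the 5 languages most similar to it and the 5 least similar, for easy comparison

  4. Offers a compare feature: pick another language and the two are compared

  5. Gives all statements about the language (a higher rank means more people agree), from which you can learn more about it. Of course a better way to understand a language is to use it, but not everyone is able to do that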



Based on 87822 responses from 7295 people, we've built up the following picture of Python

All statements

  1. I would use this language for casual scripting

  2. This language would be good for teaching children to write software

  3. This language is good for beginners

  4. Code written in this language is very readable

  5. I find this language easy to prototype in

  6. I would use this language as a scripting language embedded inside a larger application

  7. It is easy to tell at a glance what code in this language does

  8. This language excels at text processing

  9. I would use this language to write a command-line app

  10. This language is well suited for an agile development approach using short iterations.

  11. I would use this language for a web project

  12. This language has a good community

  13. This language has a good library distribution mechanism.

  14. This language encourages writing code that is easy to maintain.

  15. Libraries in this language tend to be well documented.

  16. This language has a wide variety of agreed-upon conventions, which are generally adhered to reasonably well, and which increase my productivity

  17. This language is best for very small projects

  18. I would use this language for a desktop GUI project

  19. The resources for learning this language are of high quality

  20. This is a high level language

  21. This language is expressive

  22. I find code written in this language very elegant

  23. This language is good for scientific computing

  24. Third-party libraries are readily available, well-documented, and of high quality

  25. I often write things in this language with the intent of rewriting them in something else later

  26. This language encourages writing reusable code.

  27. I use this language out of choice

  28. I usually use this language on solo projects

  29. I enjoy using this language

  30. This language has well-organized libraries with consistent, carefully thought-out interfaces

  31. I can imagine this will be a popular language in twenty years time

  32. This language is very flexible

  33. I can imagine using this language in my day job

  34. I rarely have difficulty abstracting patterns I find in my code

  35. This language is well documented

  36. I regularly use this language

  37. There are many good open-source tools for this language

  38. I would use this language for writing server programs

  39. There is a wide variety of open source code written in this language

  40. I would like to write more of this language than I currently do

  41. It is easy to debug programs written in this language when it goes wrong

  42. This language excels at symbolic manipulation

  43. This language has unusual features that I often miss when using other languages

  44. Programs written in this language tend to play well with others

  45. When I run into problems my colleagues can provide me with immediate help with this language

  46. Code written in this language tends to be terse

  47. I still discover new features of this language on a fairly regular basis

  48. This language has a very coherent design

  49. Code written in this language will usually run in all the major implementations if it runs in one of them.

  50. I would list this language on my resume

  51. I would recommend most programmers learn this language, regardless of whether they have a specific need for it

  52. This language is good for numeric computing

  53. I usually use this language on projects with many other members

  54. This language is best for very large projects

  55. This language has a very rigid idea of how things should be done

  56. This language has a very dogmatic community

  57. This language is likely to have a strong influence on future languages

  58. When I write code in this language I can be very sure it is correct

  59. Learning this language significantly changed how I use other languages.

  60. I know many other people who use this language

  61. I know this language well

  62. Code written in this language tends to be very reliable

  63. There are many good tools for this language

  64. Learning this language improved my ability as a programmer

  65. This language is easier to use for its problem domain by removing unneeded expressiveness (such as not being Turing complete).

  66. This language is good for distributed computing

  67. I would use this language for mobile applications

  68. This language has a high quality implementation

  69. This language is likely to be a passing fad

  70. I use many applications written in this language

  71. If this language didn't exist, I would have trouble finding a satisfactory replacement

  72. This language is large

  73. This language matches its problem domain particularly well.

  74. It's unusual for me to discover unfamiliar features

  75. This is a mainstream language

  76. This language has many features which feel "tacked on"

  77. I find it easy to write efficient code in this language

  78. This language is built on a small core of orthogonal features

  79. This language is likely to be around for a very long time

  80. The semantics of this language are much different than other languages I know.

  81. This language excels at concurrency

  82. This language is minimal

  83. There are many good commercial tools for this language

  84. I use a lot of code written in this language which I really don't want to have to make changes to

  85. I would use this language for writing embedded programs

  86. This language is frequently used for applications it isn't suitable for

  87. If my code in this language successfully compiles, there is a good chance my code is correct.

  88. I would use this language for writing programs for an embedded hardware platform

  89. This language allows me to write programs where I know exactly what they are doing under the hood

  90. I enjoy playing with this language but would never use it for "real code"

  91. This language has a niche in which it is great

  92. Programs written in this language will usually work in future versions of the language

  93. It is too easy to write code in this language that looks like it does one thing but actually does something else

  94. I am sometimes embarrassed to admit to my peers that I know this language

  95. Code written in this language tends to be verbose

  96. Programs written in this language tend to be efficient

  97. This language is suitable for real-time applications

  98. I am reluctant to admit to knowing this language

  99. I learned this language early in my career as a programmer

  100. The thought that I may still be using this language in twenty years time fills me with dread

  101. This language has a strong static type system

  102. This is a low level language

  103. Developers who primarily use this language often burn out after a few years

  104. This language has a niche outside of which I would not use it

  105. I often feel like I am not smart enough to write this language

  106. This language makes it easy to shoot yourself in the foot

  107. This language has an annoying syntax

  108. There is a lot of accidental complexity when writing code in this language

  109. I often get angry when writing code in this language

  110. Writing code in this language is a lot of work

  111. This language is unusually bad for beginners

Saturday, November 26, 2011

Could Not Find HelloAndroid.apk! win7


Could Not Find HelloAndroid.apk!

I set my time zone to a US time zone, ran helloworld, and it worked. Afterwards you can change the time zone back :)


Thursday, November 3, 2011

bash conditional expressions: 'test', '[', '[['

I had never studied test, [, and [[ in depth. Today my code got nitpicked on exactly these points, so I decided to study them properly. I went through a few blogs and also read the Bash Reference. The text of this post is mostly excerpted from the Bash Reference, plus a few explanations and examples of my own.

First, look at 6.4 Bash Conditional Expressions

Bash Conditional Expressions

It opens by saying that conditional expressions are used by the [[ compound command and the test and [ builtin commands.

Expressions may be unary or binary. Unary expressions are often used to examine the status of a file. There are string operators and numeric comparison operators as well (that is: checks on file status, plus string operations and arithmetic comparisons). If the file argument to one of the primaries is of the form /dev/fd/N, then file descriptor N is checked. If the file argument to one of the primaries is one of /dev/stdin, /dev/stdout, or /dev/stderr, file descriptor 0, 1, or 2, respectively, is checked.

When used with ‘[[’, the ‘<’ and ‘>’ operators sort lexicographically using the current locale (that is, in dictionary order).

Unless otherwise specified, primaries that operate on files follow symbolic links and operate on the target of the link, rather than the link itself.

-a file
True if file exists.
-b file
True if file exists and is a block special file.
-c file
True if file exists and is a character special file.
-d file
True if file exists and is a directory.
-e file
True if file exists.
-f file
True if file exists and is a regular file.
-g file
True if file exists and its set-group-id bit is set.
-h file
True if file exists and is a symbolic link.
-k file
True if file exists and its "sticky" bit is set.
-p file
True if file exists and is a named pipe (FIFO).
-r file
True if file exists and is readable.
-s file
True if file exists and has a size greater than zero.
-t fd
True if file descriptor fd is open and refers to a terminal.
-u file
True if file exists and its set-user-id bit is set.
-w file
True if file exists and is writable.
-x file
True if file exists and is executable.
-O file
True if file exists and is owned by the effective user id.
-G file
True if file exists and is owned by the effective group id.
-L file
True if file exists and is a symbolic link.
-S file
True if file exists and is a socket.
-N file
True if file exists and has been modified since it was last read.
file1 -nt file2
True if file1 is newer (according to modification date) than file2, or if file1 exists and file2 does not.
file1 -ot file2
True if file1 is older than file2, or if file2 exists and file1 does not.
file1 -ef file2
True if file1 and file2 refer to the same device and inode numbers.
-o optname
True if shell option optname is enabled. The list of options appears in the description of the -o option to the set builtin (see The Set Builtin).
-z string
True if the length of string is zero.
-n string
True if the length of string is non-zero.
string1 == string2
string1 = string2
True if the strings are equal. ‘=’ should be used with the test command for POSIX conformance.
string1 != string2
True if the strings are not equal.
string1 < string2
True if string1 sorts before string2 lexicographically.
string1 > string2
True if string1 sorts after string2 lexicographically.
arg1 OP arg2
OP is one of ‘-eq’, ‘-ne’, ‘-lt’, ‘-le’, ‘-gt’, or ‘-ge’. These arithmetic binary operators return true if arg1 is equal to, not equal to, less than, less than or equal to, greater than, or greater than or equal to arg2, respectively. Arg1 and arg2 may be positive or negative integers (they must be integers).
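
To make a few of these primaries concrete, here is a small sketch of my own (not from the Bash Reference). The function names `describe_path` and `compare_both_ways` are made up for illustration. The first checks `-L` before anything else, because the other file primaries follow symbolic links. The second shows that `<` inside `[[ ]]` compares strings while `-lt` compares integers.

```shell
#!/usr/bin/env bash
# describe_path PATH: classify PATH using the unary file primaries.
describe_path() {
  if [ -L "$1" ]; then echo "symbolic link"        # tests the link itself
  elif [ -d "$1" ]; then echo "directory"
  elif [ -f "$1" ] && [ -s "$1" ]; then echo "non-empty regular file"
  elif [ -f "$1" ]; then echo "empty regular file"
  elif [ -e "$1" ]; then echo "other kind of file"
  else echo "missing"
  fi
}

# compare_both_ways A B: compare two integers as strings and as numbers.
compare_both_ways() {
  local as_string as_integer
  if [[ "$1" < "$2" ]]; then as_string="before"; else as_string="not before"; fi
  if [ "$1" -lt "$2" ]; then as_integer="less"; else as_integer="not less"; fi
  echo "string: $as_string, integer: $as_integer"
}

compare_both_ways 10 9   # "10" sorts before "9" as a string, but 10 is not less than 9
```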
Next, let's look at '[[' and then 'test' and '[' in turn.


Found in 3.2.4.2 Conditional Constructs


          [[ expression ]]

Return a status of 0 or 1 depending on the evaluation of the conditional expression expression. Expressions are composed of the primaries described below in Bash Conditional Expressions. Word splitting and filename expansion are not performed on the words between the ‘[[’ and ‘]]’; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed. Conditional operators such as ‘-f’ must be unquoted to be recognized as primaries.

When used with ‘[[’, the ‘<’ and ‘>’ operators sort lexicographically using the current locale.

When the ‘==’ and ‘!=’ operators are used, the string to the right of the operator is considered a pattern and matched according to the rules described below in Pattern Matching. If the shell option nocasematch (see the description of shopt in The Shopt Builtin) is enabled, the match is performed without regard to the case of alphabetic characters. The return value is 0 if the string matches (‘==’) or does not match (‘!=’) the pattern, and 1 otherwise. Any part of the pattern may be quoted to force it to be matched as a string.

An additional binary operator, ‘=~’, is available, with the same precedence as ‘==’ and ‘!=’. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch (see the description of shopt in The Shopt Builtin) is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force it to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.

Expressions may be combined using the following operators, listed in decreasing order of precedence:

( expression )
Returns the value of expression. This may be used to override the normal precedence of operators.
! expression
True if expression is false.
expression1 && expression2
True if both expression1 and expression2 are true.
expression1 || expression2
True if either expression1 or expression2 is true.
The && and || operators do not evaluate expression2 if the value of expression1 is sufficient to determine the return value of the entire conditional expression.
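
As an illustration of ‘=~’ and BASH_REMATCH, here is a minimal sketch of my own. The function name `parse_version` and the "major.minor" input format are assumptions made for the example. The regular expression is kept in a variable and used unquoted, so that it is treated as a regular expression rather than as a literal string.

```shell
#!/usr/bin/env bash
# parse_version STRING: split "major.minor" using =~ and BASH_REMATCH.
parse_version() {
  local re='^([0-9]+)\.([0-9]+)$'   # two parenthesized subexpressions
  if [[ "$1" =~ $re ]]; then
    # index 0 is the whole match, 1 and 2 are the subexpressions
    echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}"
  else
    echo "no match"
  fi
}

parse_version 4.2    # major=4 minor=2
parse_version 4.x    # no match
```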

Now for 'test' and '['

Found in 4.1 Bourne Shell Builtins

test and [ are equivalent, but they are written differently in use: if test expression versus if [ expression ]

Evaluate a conditional expression expr. Each operator and operand must be a separate argument. Expressions are composed of the primaries described below in Bash Conditional Expressions. test does not accept any options, nor does it accept and ignore an argument of -- as signifying the end of options.

When the [ form is used, the last argument to the command must be a ].

Expressions may be combined using the following operators, listed in decreasing order of precedence. The evaluation depends on the number of arguments; see below.
! expr
True if expr is false.
( expr )
Returns the value of expr. This may be used to override the normal precedence of operators.
expr1 -a expr2
True if both expr1 and expr2 are true.
expr1 -o expr2
True if either expr1 or expr2 is true.
The test and [ builtins evaluate conditional expressions using a set of rules based on the number of arguments. (In other words, test behaves like a function whose result is decided by how many arguments it is given.)

0 arguments
The expression is false.
1 argument
The expression is true if and only if the argument is not null.
2 arguments
If the first argument is ‘!’, the expression is true if and only if the second argument is null. If the first argument is one of the unary conditional operators (see Bash Conditional Expressions), the expression is true if the unary test is true. If the first argument is not a valid unary operator, the expression is false.
3 arguments
If the second argument is one of the binary conditional operators (see Bash Conditional Expressions), the result of the expression is the result of the binary test using the first and third arguments as operands. The ‘-a’ and ‘-o’ operators are considered binary operators when there are three arguments. If the first argument is ‘!’, the value is the negation of the two-argument test using the second and third arguments. If the first argument is exactly ‘(’ and the third argument is exactly ‘)’, the result is the one-argument test of the second argument. Otherwise, the expression is false.
4 arguments
If the first argument is ‘!’, the result is the negation of the three-argument expression composed of the remaining arguments. Otherwise, the expression is parsed and evaluated according to precedence using the rules listed above.
5 or more arguments
The expression is parsed and evaluated according to precedence using the rules listed above.
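
The argument-count rules explain a classic surprise with unquoted variables. The sketch below is my own illustration; the helper name `report` and the variable `empty` are made up. An unquoted empty variable expands to nothing, so `[ -n $empty ]` becomes the one-argument test `[ -n ]`, which is true because the string "-n" is not null.

```shell
#!/usr/bin/env bash
# report LABEL COMMAND...: run COMMAND and print whether it succeeded.
report() {
  local label=$1
  shift
  if "$@"; then echo "$label: true"; else echo "$label: false"; fi
}

count_rule_demo() {
  local empty=""
  report "0 arguments" [ ]
  report "1 argument, null" [ "$empty" ]
  report "-n with unquoted empty variable" [ -n $empty ]    # really the 1-argument test of "-n"
  report "-n with quoted empty variable" [ -n "$empty" ]    # 2 arguments: unary -n on ""
  report "! with null argument" [ ! "$empty" ]
}

count_rule_demo
```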

Differences between [ ] and [[ ]]



With [[ ]], the right operand of == and != is treated as a pattern and matched according to 3.5.8.1 Pattern Matching


$ [[ "GFW" == G*W ]]
$ echo $?
$ [[ "GFW" == "G*W" ]]
$ echo $?


$ [[ "GFW" =~ "G*W" ]]
$ echo $?
$ [[ "GFW" =~ 'G*W' ]]
$ echo $?
$ [[ "GFW" =~ G*W ]]
$ echo $?



$ [ "GFW" == $fuck  ]
bash: [: GFW: unary operator expected


$ [ "GFW" == "$fuck" ]
$ echo $?


$ [[ "GFW" == $fuck ]]
$ echo $?
$ [[ "GFW" == "$fuck" ]]
$ echo $?
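
The comparisons above can be collected into one script that prints each outcome. This is a sketch of my own: the function name `compare_demo` and the variable `empty` are made up, and the quoted-regex line assumes the default behavior of bash 3.2 or later, where a quoted right-hand side of =~ is matched as a literal string.

```shell
#!/usr/bin/env bash
compare_demo() {
  local empty=""
  # == inside [[ ]]: unquoted right side is a pattern, quoted is a literal string
  [[ "GFW" == G*W ]]   && echo "glob unquoted: match"  || echo "glob unquoted: no match"
  [[ "GFW" == "G*W" ]] && echo "glob quoted: match"    || echo "glob quoted: no match"
  # =~: G*W as a regex means "zero or more G, then W", which matches the final W
  [[ "GFW" =~ G*W ]]   && echo "regex unquoted: match" || echo "regex unquoted: no match"
  [[ "GFW" =~ "G*W" ]] && echo "regex quoted: match"   || echo "regex quoted: no match"
  # an unquoted empty variable vanishes inside [ ], leaving a malformed test
  [ "GFW" == $empty ] 2>/dev/null && echo "single bracket: true" || echo "single bracket: status $?"
  # inside [[ ]] there is no word splitting, so the comparison is simply false
  [[ "GFW" == $empty ]] && echo "double bracket: match" || echo "double bracket: no match"
}

compare_demo
```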

There is also (( ))


          (( expression ))

The arithmetic expression is evaluated according to the rules described below (see Shell Arithmetic). If the value of the expression is non-zero, the return status is 0; otherwise the return status is 1. This is exactly equivalent to
          let "expression"

See Bash Builtins, for a full description of the let builtin.

Shell Arithmetic is described as follows:

The shell allows arithmetic expressions to be evaluated, as one of the shell expansions or by the let and the -i option to the declare builtins.

Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language. The following list of operators is grouped into levels of equal-precedence operators. The levels are listed in order of decreasing precedence.

id++ id--
variable post-increment and post-decrement
++id --id
variable pre-increment and pre-decrement
- +
unary minus and plus
! ~
logical and bitwise negation
* / %
multiplication, division, remainder
+ -
addition, subtraction
<< >>
left and right bitwise shifts
<= >= < >
comparison
== !=
equality and inequality
&
bitwise AND
^
bitwise exclusive OR
|
bitwise OR
&&
logical AND
||
logical OR
expr ? expr : expr
conditional operator
= *= /= %= += -= <<= >>= &= ^= |=
assignment
expr1 , expr2
comma

Shell variables are allowed as operands; parameter expansion is performed before the expression is evaluated. Within an expression, shell variables may also be referenced by name without using the parameter expansion syntax. A shell variable that is null or unset evaluates to 0 when referenced by name without using the parameter expansion syntax. The value of a variable is evaluated as an arithmetic expression when it is referenced, or when a variable which has been given the integer attribute using ‘declare -i’ is assigned a value. A null value evaluates to 0. A shell variable need not have its integer attribute turned on to be used in an expression.

Constants with a leading 0 are interpreted as octal numbers. A leading ‘0x’ or ‘0X’ denotes hexadecimal. Otherwise, numbers take the form [base#]n, where base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base. If base# is omitted, then base 10 is used. The digits greater than 9 are represented by the lowercase letters, the uppercase letters, ‘@’, and ‘_’, in that order. If base is less than or equal to 36, lowercase and uppercase letters may be used interchangeably to represent numbers between 10 and 35.

Operators are evaluated in order of precedence. Sub-expressions in parentheses are evaluated first and may override the precedence rules above.
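
A few of these arithmetic rules in action, as a short sketch of my own; the function name `arith_demo` and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
arith_demo() {
  local n=5 blank=""
  echo $(( n * 2 + 1 ))   # variables can be referenced by name, without $
  echo $(( 010 ))         # leading 0: octal
  echo $(( 0x1F ))        # leading 0x: hexadecimal
  echo $(( 2#1010 ))      # base#n form, here base 2
  echo $(( blank + 1 ))   # a null variable evaluates to 0
  # (( )) returns status 0 when the value is non-zero, 1 when it is zero
  (( n > 3 )) && echo "n > 3 is non-zero, status 0"
  (( n - 5 )) || echo "n - 5 is zero, status 1"
}

arith_demo
```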

Friday, October 14, 2011

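Getting root on an S-ON G12 (HTC Desire S)

The G12 I bought a while ago is a sealed Hong Kong version, S-ON, and I had never rooted it. The system apps cannot be removed, but that did not really get in my way. Recently, though, the Android Market getting screwed by the GFW made me take rooting much more seriously.

I am not a phone tinkerer, so all I could do was Google it. The search was painful, because most guides cover rooting S-OFF devices, and some people say that turning an S-ON device into S-OFF can only be done by shady dealers. Come on, the dealers crack it themselves too, so I decided to do it myself.


First, go from S-ON to S-OFF. Just follow the guide step by step, and at the end you can check that your device has become S-OFF.

Then root the S-OFF device, following "HTC G12 Taiwan latest ROM version 1.47.950.3 and the best ROOT method". From the latter I only used some of the steps, namely steps 2, 3, and 4 of the method for beginners, because I did not want to flash a ROM. Following those steps one by one was enough.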

Saturday, October 8, 2011

Steve Jobs’ death

Steve Jobs has passed away. Tim Cook, Larry Page, Gates, and Obama have all shared their thoughts.


Tim Cook: "No words can adequately express our sadness at Steve's death"

Tim Cook, who took over for Steve Jobs as Apple CEO in August of 2011, has sent a new e-mail to employees in the wake of Steve Jobs' recent death. In the e-mail seen by Ars Technica, Cook addresses Apple's employees to announce that Jobs has passed and offers reflections on Jobs' influence on the company. Below is the full text of the e-mail:

I have some very sad news to share with all of you. Steve passed away earlier today.

Apple has lost a visionary and creative genius, and the world has lost an amazing human being. Those of us who have been fortunate enough to know and work with Steve have lost a dear friend and an inspiring mentor. Steve leaves behind a company that only he could have built, and his spirit will forever be the foundation of Apple.

We are planning a celebration of Steve’s extraordinary life for Apple employees that will take place soon. If you would like to share your thoughts, memories and condolences in the interim, you can simply email

No words can adequately express our sadness at Steve’s death or our gratitude for the opportunity to work with him. We will honor his memory by dedicating ourselves to continuing the work he loved so much.


The e-mail address given in Cook's message is the same as the one Apple has publicly posted on its website. If you have further stories and memories to share, we'd love to hear them in the comments.


I am very, very sad to hear the news about Steve. He was a great man with incredible achievements and amazing brilliance. He always seemed to be able to say in very few words what you actually should have been thinking before you thought it. His focus on the user experience above all else has always been an inspiration to me. He was very kind to reach out to me as I became CEO of Google and spend time offering his advice and knowledge even though he was not at all well. My thoughts and Google's are with his family and the whole Apple family.

Sergey Brin  -  2011-10-6  -  Public

From the earliest days of Google, whenever Larry and I sought inspiration for vision and leadership, we needed to look no farther than Cupertino. Steve, your passion for excellence is felt by anyone who has ever touched an Apple product (including the macbook I am writing this on right now). And I have witnessed it in person the few times we have met.

On behalf of all of us at Google and more broadly in technology, you will be missed very much. My condolences to family, friends, and colleagues at Apple.


I’m truly saddened to learn of Steve Jobs’ death. Melinda and I extend our sincere condolences to his family and friends, and to everyone Steve has touched through his work.

Steve and I first met nearly 30 years ago, and have been colleagues, competitors and friends over the course of more than half our lives.

The world rarely sees someone who has had the profound impact Steve has had, the effects of which will be felt for many generations to come. 

For those of us lucky enough to get to work with him, it’s been an insanely great honor.  I will miss Steve immensely. 

The White House Blog

President Obama on the Passing of Steve Jobs: "He changed the way each of us sees the world."

Posted by Kori Schulman on October 05, 2011 at 09:15 PM EDT

Following the loss of visionary Apple co-founder Steve Jobs, President Obama released this statement:
Michelle and I are saddened to learn of the passing of Steve Jobs. Steve was among the greatest of American innovators - brave enough to think differently, bold enough to believe he could change the world, and talented enough to do it.

By building one of the planet’s most successful companies from his garage, he exemplified the spirit of American ingenuity.  By making computers personal and putting the internet in our pockets, he made the information revolution not only accessible, but intuitive and fun.  And by turning his talents to storytelling, he has brought joy to millions of children and grownups alike. Steve was fond of saying that he lived every day like it was his last.  Because he did, he transformed our lives, redefined entire industries, and achieved one of the rarest feats in human history: he changed the way each of us sees the world.

The world has lost a visionary. And there may be no greater tribute to Steve’s success than the fact that much of the world learned of his passing on a device he invented. Michelle and I send our thoughts and prayers to Steve’s wife Laurene, his family, and all those who loved him.



Monday, September 26, 2011

What is a good explanation of Latent Dirichlet Allocation





  • I ate a banana and spinach smoothie for breakfast

  • I like to eat broccoli and bananas.

  • Chinchillas and kittens are cute.

  • My sister adopted a kitten yesterday.

  • Look at this cute hamster munching on a piece of broccoli.


  • Sentences 1 and 2: 100% Topic A

  • Sentences 3 and 4: 100% Topic B

  • Sentence 5: 60% Topic A, 40% Topic B

  • Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, ... (at which point, you could interpret topic A to be about food)

  • Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, ... (at which point, you could interpret topic B to be about cute animals)


LDA Model


  • Decide on the number of words N the document will have (say, according to a Poisson distribution).

  • Choose a topic mixture for the document (according to a Dirichlet distribution over a fixed set of K topics). For example, assuming that we have the two food and cute animal topics above, you might choose the document to consist of 1/3 food and 2/3 cute animals.

  • Generate each word in the document by:

    • First picking a topic (according to the multinomial distribution that you sampled above; for example, you might pick the food topic with 1/3 probability and the cute animals topic with 2/3 probability).

    • Then using the topic to generate the word itself (according to the topic's multinomial distribution). For instance, the food topic might output the word "broccoli" with 30% probability, "bananas" with 15% probability, and so on.




  • Decide that D will be 1/2 about food and 1/2 about cute animals.

  • Pick 5 to be the number of words in D.

  • Pick the first word to come from the food topic, which then gives you the word "broccoli".

  • Pick the second word to come from the cute animals topic, which gives you "panda".

  • Pick the third word to come from the cute animals topic, giving you "adorable".

  • Pick the fourth word to come from the food topic, giving you "cherries".

  • Pick the fifth word to come from the food topic, giving you "eating".

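The process above essentially simulates how a person writes a document. The document generated by the LDA model here is "broccoli panda adorable cherries eating" (LDA is a bag-of-words model).


First suppose you have a set of documents and have chosen K topics to discover, and you want to use LDA to learn the topic representation of each document and the words associated with each topic. How do you do this? One way is as follows (collapsed Gibbs sampling):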

  • Go through each document, and randomly assign each word in the document to one of the K topics. (For each document, every word is randomly given some topic.)

  • Notice that this random assignment already gives you both topic representations of all the documents and word distributions of all the topics (albeit not very good ones). (This random assignment is itself a learned result, but it needs improving, and the way to improve it is iteration.)

  • So to improve on them, for each document d...

    • Go through each word w in d...

      • And for each topic t, compute two things: 1) p(topic t | document d) = the proportion of words in document d that are currently assigned to topic t, and 2) p(word w | topic t) = the proportion of assignments to topic t over all documents that come from this word w. Reassign w a new topic, where you choose topic t with probability p(topic t | document d) * p(word w | topic t) (according to our generative model, this is essentially the probability that topic t generated word w, so it makes sense that we resample the current word's topic with this probability). (Also, I'm glossing over a couple of things here, such as the use of priors/pseudocounts in these probabilities.) (That is: compute p(t|d), the probability of topic t in document d, and p(w|t), the probability of word w under topic t, then use them to get the probability of giving word w a new topic, namely p(t|d) * p(w|t).)

      • In other words, in this step, we're assuming that all topic assignments except for the current word in question are correct, and then updating the assignment of the current word using our model of how documents are generated. (Each time the probability of a word belonging to a topic is computed, the topics of all other words are taken as fixed.)

  • After repeating the previous step a large number of times, you'll eventually reach a roughly steady state where your assignments are pretty good. So use these assignments to estimate the topic mixtures of each document (by counting the proportion of words assigned to each topic within that document) and the words associated to each topic (by counting the proportion of words assigned to each topic overall). (Iterate until convergence, then use the final values as each document's topic representation and each topic's word representation.)
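
The final counting step is easy to sketch. The snippet below is only an illustration of that last step, not a Gibbs sampler: it assumes the word-to-topic assignments already exist, one "document word topic" triple per line. That input layout and the function name `topic_mixture` are my own assumptions.

```shell
#!/usr/bin/env bash
# topic_mixture: read "doc word topic" lines on stdin and print, for every
# document, the proportion of its words assigned to each topic.
topic_mixture() {
  awk '{
         words[$1]++              # words per document
         assigned[$1 " " $3]++    # words per (document, topic) pair
       }
       END {
         for (key in assigned) {
           split(key, part, " ")
           printf "%s %s %.2f\n", part[1], part[2], assigned[key] / words[part[1]]
         }
       }' | sort
}

# A 5-word document with 3 food words (topic A) and 2 animal words (topic B)
printf '%s\n' 'D broccoli A' 'D panda B' 'D adorable B' 'D cherries A' 'D eating A' | topic_mixture
```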












Tuesday, August 23, 2011


I saw this article recommended on daily hack news. The original is fairly old (2007): How To Become a Better Programmer by Not Programming


  • Asked "Does accumulating experience through the years necessarily make programming easier?", Gates answered:

    • I think after the first three or four years, it's pretty cast in concrete whether you're a good programmer or not.

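    • Three to four years is enough to tell whether you are a good programmer; accumulating more programming experience does not by itself make programming easier.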

  • The older I get, the more I believe that the only way to become a better programmer is by not programming. You have to come up for air, put down the compiler for a moment, and take stock of what you're really doing. Code is important, but it's a small part of the overall process.

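    • You need to step away from the computer for a while and think carefully about what you are doing. Code matters, but it is only a small part of the whole process.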

  • To truly become a better programmer, you have to cultivate passion for everything else that goes on around the programming.

    • Pour your passion into everything that surrounds programming.

  • You won't-- you cannot-- become a better programmer through sheer force of programming alone. You can only complement and enhance your existing programming skills by branching out. Learn about your users. Learn about the industry. Learn about your business. The more things you are interested in, the better your work will be.

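    • If you rely on programming alone, you cannot become a better programmer.

    • Learn about your users, the industry, and your business. The more things you are interested in, the better your work will be.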

Monday, August 22, 2011

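才储 (Caichu) analysis: your personality type tends toward "ENFP"

才储 (Caichu) analysis: your personality type tends toward "ENFP" (Extraversion, Intuition, Feeling, Perceiving; preference scores: E70 N52 F83 P52; spontaneity index: 19)

Enthusiastic and imaginative. Sees life as full of possibilities. Quickly connects events and information, then confidently solves problems based on their own judgment. Needs a lot of affirmation from others, and readily appreciates and supports others in turn. Flexible, natural and unaffected, with a strong ability to improvise and fluent speech.

ENFPs are full of enthusiasm and new ideas. They are optimistic, spontaneous, creative, and confident, with original minds and a strong sense of the possible. For ENFPs, life is an exciting drama. Because they are so interested in possibilities, they see the deeper significance in all things. They are perceptive, keen observers who notice anything out of the ordinary. ENFPs are curious and prefer understanding to judging. They are imaginative, adaptable, and flexible; they value inspiration above all else and are often resourceful inventors. ENFPs are unconventional and good at finding new ways of doing things, opening new paths for ideas or actions and keeping them open. In carrying out a novel idea, ENFPs rely on bursts of impulsive energy. They have a great deal of initiative and find problems exciting. They also draw energy from the people around them and successfully combine their own talents with the strengths of others. ENFPs are charming and full of life. They are warm, courteous, and sympathetic, and willing to help others solve problems. They have excellent insight and powers of observation and often care about other people's development. ENFPs avoid conflict and like harmony. They put more energy into maintaining personal relationships than into objective matters, and like to keep a wide circle of relationships.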



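· Human resources manager
· Change management consultant
· Marketing manager
· Corporate/team trainer
· Advertising account manager
· Strategic planner
· Publicist
· Career development consultant
· Environmental lawyer
· Research assistant
· Broadcaster
· Development director
· Advertising creative
· Advertising copywriter
· Marketing and publicity planner
· Market researcher
· Public relations specialist
· Corporate spokesperson
· Children's teacher
· University teacher (humanities)
· Psychologist
· Counselor
· Career planning consultant
· Social worker
· Public speaker
· Journalist (interviews)
· Program planner and host
· Columnist
· Playwright
· Designer
· Cartoonist
· Film and television producer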


-- Deepak


· Corporate/team trainer
· Advertising account manager
· Strategic planner


Saturday, August 20, 2011



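1. The Pragmatic Programmer (程序员修炼之道): I have read the foreword and a bit more. It seems to be very highly rated, practically a must-read for any IT grunt, so I decided to get through it; there would be no excuse otherwise - -. Estimated time: 2-3 weeks.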


3. The Effective Executive (卓有成效的管理者): a gift from my boss, which I jokingly call "effective management". Estimated time: 2-3 weeks.



Wednesday, August 10, 2011

Linux: creating and managing multiple shell windows with screen

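  GNU Screen does the job. From a single console login (for example in Xshell or SecureCRT), screen lets you create and manage multiple shell windows and switch quickly from one to another. Most importantly, when the day's work is done you do not have to close all of those shell windows: you just detach them and log out, and the next time you log in you can use screen to restore the windows from the previous session. You can think of it as a virtual remote host: services you started keep running even after you log out and log back in.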

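  • screen: the first time, just type screen. This creates a new session with one default shell window; the status bar at the bottom shows the window name, usually "0 bash".

  • ctrl a + A: rename the current window (press ctrl a, then A)

  • ctrl a + c: create a new window

  • screen -t name: create a new window named name

  • ctrl a + n or p: switch windows (n for next, p for previous)

  • ctrl a + ": list all windows (they can also be seen in the status bar)

  • screen -S name: create a new session named name

  • ctrl a + d: detach the current session. This is a very important command: it lets you leave the session temporarily and reconnect to it later.

  • screen -ls: list the ids of all screen sessions. To reconnect you need to know which session is yours, then run screen -r id. A session created with screen -S name also has a name, which is easier to remember.

  • screen -r id/name: reattach to screen session id/name, restoring access to the windows you had open

  • exit: close the current window

  • ctrl a + k: kill the current window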

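  These commands are enough for me for now. On to configuration: with the default configuration, screen cannot show all windows in the status bar, only the state of the current one, although you can still reach the other windows through the window list or by switching. The windows can be presented in a nicer, tab-like way by using the status bar to distinguish them visually. Open your favorite text editor and create a .screenrc file in your home directory containing the following: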
    hardstatus on
    hardstatus alwayslastline
    hardstatus string "%{.bW}%-w%{.rW}%n %t%{-}%+w %=%{..G} %H %{..Y} %m/%d %C%a "

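    Every new window then appears in the status bar with its number and name, for example "0 econsh".
    Speaking UNIX: Creating and managing multiple shells with Screen
    Linux tips: Managing your remote sessions with screen
    Configuring the screen tool on Linux (on string escapes)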

    Friday, July 29, 2011












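    Fairly old news that I only saw today: Google Forecloses On Content Farms With “Panda” Algorithm Update
      Google's official description of Panda is not phrased this way, but Matt Cutts said “it was going after shallow and low quality content.” In effect this is a warning: if the content of your blog, personal space, or SNS page comes mainly from copying and reposting, your site will be treated as a content farm, judged low quality, and penalized in ranking.
      For the latest on Panda, see Official: Google Panda 2.3 Update Is Live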

    Sunday, July 3, 2011


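      As of today I have cooked rice twice and it tasted good, mainly because the manual of the electric pressure cooker is so good - -. I have stir-fried shredded potatoes and shredded pork with green peppers, and made crucian carp soup and tomato egg soup. It is all just about edible. I'll invite my friends over to taste it after some more practice; asking them now would be too hard on them.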

    Sunday, June 26, 2011









    Sunday, June 5, 2011


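    On Poynter I saw an introduction to Google Chart Tools, How to make searchable, Web-based Google charts. Following that article, I tried Google Chart Tools myself.

      Data visualization on any scale usually takes specialist knowledge or a great deal of time, effort, and resources. Google's web-based Google Visualization API makes the job simple, whether you are a designer, developer, Web producer, or hobbyist.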


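    1. First open the Google Visualization API homepage and choose Bar Chart.

    2. Under the chart, click the words Google Visualization API playground.

    3. A new page opens. On the left is a list of APIs or code samples, on the right is the code of the selected one, and below is the corresponding bar chart. Edit the code and click run code, and the bar chart changes accordingly.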

    4. The original code is as follows:
      [javascript]
      function drawVisualization() {
        // Create and populate the data table.
        var data = new google.visualization.DataTable();
        var raw_data = [['Austria', 1336060, 1538156, 1576579, 1600652, 1968113, 1901067],
                        ['Bulgaria', 400361, 366849, 440514, 434552, 393032, 517206],
                        ['Denmark', 1001582, 1119450, 993360, 1004163, 979198, 916965],
                        ['Greece', 997974, 941795, 930593, 897127, 1080887, 1056036]];

        var years = [2003, 2004, 2005, 2006, 2007, 2008];

        data.addColumn('string', 'Year');
        for (var i = 0; i < raw_data.length; ++i) {
          data.addColumn('number', raw_data[i][0]);
        }

        data.addRows(years.length);

        for (var j = 0; j < years.length; ++j) {
          data.setValue(j, 0, years[j].toString());
        }
        for (var i = 0; i < raw_data.length; ++i) {
          for (var j = 1; j < raw_data[i].length; ++j) {
            data.setValue(j-1, i+1, raw_data[i][j]);
          }
        }

        // Create and draw the visualization.
        new google.visualization.BarChart(document.getElementById('visualization')).
            draw(data,
                 {title:"Yearly Coffee Consumption by Country",
                  width:600, height:400,
                  vAxis: {title: "Year"},
                  hAxis: {title: "Cups"}}
            );
      }
      [/javascript]

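    5. In most cases you only need to change the following:

      1. Line 4: the contents of var raw_data

      2. Line 9: the contents of var years

      3. Lines 11 and 32: the name of the Y axis

      4. Line 33: the name of the X axis

      5. Line 30: the title of the chart

      6. For other detailed settings, see the Configuration Options section of the documentation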

    6. My modified code and the resulting chart are shown below:
      [javascript]
      function drawVisualization() {
        // Create and populate the data table.
        var data = new google.visualization.DataTable();
        var raw_data = [['econsh', 40000, 55381],
                        ['mushi', 20000, 43816],
                        ['wods', 30000, 5816]];

        var years = ['article num', 'time'];

        data.addColumn('string', 'properties');
        for (var i = 0; i < raw_data.length; ++i) {
          data.addColumn('number', raw_data[i][0]);
        }

        data.addRows(years.length);

        for (var j = 0; j < years.length; ++j) {
          data.setValue(j, 0, years[j].toString());
        }
        for (var i = 0; i < raw_data.length; ++i) {
          for (var j = 1; j < raw_data[i].length; ++j) {
            data.setValue(j-1, i+1, raw_data[i][j]);
          }
        }

        // Create and draw the visualization.
        new google.visualization.BarChart(document.getElementById('visualization')).
            draw(data,
                 {title:"SBBSert statictics",
                  width:600, height:400,
                  vAxis: {title: "properties"},
                  hAxis: {title: "nums"}}
            );
      }
      [/javascript]

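    7. The resulting chart is shown below.

    Note: this bar chart does not seem to support Chinese. With Chinese text in it, it stops working properly. I have reported this to Google.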
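The loops in drawVisualization reshape raw_data, which has one row per series, into a table with one row per category and one column per series. As a rough illustration of that reshaping only (it does not call the Google API), here is a small Python sketch. The helper name `to_table` and the variable `labels` are hypothetical and exist only for this example.

```python
raw_data = [['econsh', 40000, 55381],
            ['mushi', 20000, 43816],
            ['wods', 30000, 5816]]
labels = ['article num', 'time']  # plays the role of the `years` array


def to_table(raw_data, labels, first_column='properties'):
    """Transpose per-series rows into (header, rows), as the JavaScript loops do."""
    # One column for the category label, then one column per series name.
    header = [first_column] + [series[0] for series in raw_data]
    # Row j holds label j followed by the j-th value of every series.
    rows = [
        [label] + [series[j + 1] for series in raw_data]
        for j, label in enumerate(labels)
    ]
    return header, rows


header, rows = to_table(raw_data, labels)
for row in [header] + rows:
    print(row)
```

Seen this way, each inner array of raw_data needs exactly one value per entry in years (here, labels), which is why only those two variables have to change together.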

    Wednesday, June 1, 2011

    Machine Learning Demos

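    This is the work of Dr. Basilio Noris. Existing source code for machine learning algorithms (classification, distance, regression, and so on) is often hard to use and understand, so he built an interactive GUI that brings a number of libraries and examples together to visualize and compare these algorithms better. The GUI supports Windows, Linux, and Mac, so users can install the version for their own machine and try it out. For detailed usage and an introduction, see Machine Learning Demos.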



    Classification:
      Support Vector Machine (SVM) (C, nu, Pegasos)
      Relevance Vector Machine (RVM)
      Gaussian Mixture Models (GMM)
      Multi-Layer Perceptron + BackPropagation
      Gentle AdaBoost + Naive Bayes
      Approximate K-Nearest Neighbors (KNN)

    Regression:
      Support Vector Regression (SVR)
      Relevance Vector Regression (RVR)
      Gaussian Mixture Regression (GMR)
      MLP + BackProp
      Approximate KNN
      Sparse Optimized Gaussian Processes (SOGP)
      Locally Weighted Projection Regression (LWPR)

    Dynamical Systems:
      SOGP (Slow!)

    Clustering:
      Soft K-Means
      Kernel K-Means
      One Class SVM

    Projections:
      Principal Component Analysis (PCA)
      Kernel PCA
      Independent Component Analysis (ICA)
      Linear Discriminant Analysis (LDA)
      Fisher Linear Discriminant
      EigenFaces to 2D (using PCA)

    Tuesday, May 24, 2011









    Friday, May 20, 2011

    A CSS builder for shadows, backgrounds, and borders


      Layer Styles is an online CSS builder for shadows, backgrounds, and borders. The styles it supports are:

    1. Drop shadow

    2. Inner shadow

    3. Background

    4. Border

    5. Border radius

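      Each style has several properties (for example, Background has Opacity, Gradient, and the gradient's style and angle). When a property changes, the div on the page updates in real time and the CSS code below it changes too, so we can write CSS that suits us without being CSS experts. For IE, of course, nothing can be done.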




    <code>border: 1px solid black;
    border-radius: 13px;
    background-image: -moz-linear-gradient(top, white, black);
    background-image: -webkit-gradient(linear, center top, center bottom, from(white), to(black));
    box-shadow: 0 1px 5px 6px rgba(14,232,50,0.75), inset 0 1px 1px 1px #ffb812;</code>



    Monday, May 16, 2011


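    Feedburner's most basic function is to burn your RSS feed and give it a permanent address, so that however many times you change blogs you never have to worry about the feed address changing. I need such a feed address too. Feedburner also offers other features such as statistics, SNS, and email subscription, which you can look up and configure yourself.

      If you want to use Feedburner and still keep the original RSS address, the best approach is to add a Feedburner subscription link somewhere prominent on the blog (the top right corner, say) and recommend it to readers. A serious problem, however, is that Feedburner is blocked by the GFW, and not every user can get past the firewall. One workaround is to turn on Feedburner's Email subscription: select Email subscription in the Publicize menu and activate it. The Email subscription address of this blog, for example, is

      If you want to use Feedburner without keeping the original RSS address, on WordPress you can use FeedBurner_FeedSmith_Plugin. The plugin lets you configure the Feedburner addresses for the blog and for the blog's comments; once it is enabled, the blog's RSS address redirects automatically to the corresponding Feedburner address. The drawback of not keeping the RSS address is again the GFW: users who cannot get past the firewall can only subscribe through the Email subscription described above, or by adding the Feedburner feed directly to Google Reader. Some users do not like Email subscriptions, though, or do not use Google Reader, so the first approach is the one I recommend. Note: after FeedBurner_FeedSmith_Plugin has been enabled, disabling it does not take effect immediately; it takes a while before the RSS address stops redirecting. I do not know why yet (explanations welcome). Wait patiently and it will take effect; do not assume the plugin was not disabled.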


    Sunday, May 15, 2011

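    Activated Google Storage

    I have just activated Google Storage (see here for how to activate it). A word of warning about one annoying detail: activating Google Storage requires you to turn on billing, as follows: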
    Turn on billing.

    Before you can use Google Storage, you need to enable billing for your project. To do so, click the Billing tab and enable billing. Enabling billing does not necessarily mean you will be charged. See Pricing and Terms for more information.

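    I followed the steps and used Google Checkout, which showed a payment of USD 0. The country list for payment had no option for China, so I chose Hong Kong and entered an address somewhere in Kowloon. When the transaction went through I received a text message saying that 8.00 in local currency had been deducted. This appears to be a transaction fee, something like a handling charge, rather than the price of the purchase. So activation is not truly free, or perhaps the charge arose because my card is from mainland China while the transaction location I chose was Hong Kong.

    After activation, the 5 GB of space is ready to use. There are two ways to access it: Google Storage Manager and GSUtil. Since I had only just activated it, I browsed the former briefly. Its features are basically creating buckets and folders, uploading folders and files, plus deleting and sharing; there is no code-viewing feature of the kind I had imagined (since it is storage for developers). GSUtil is much more powerful and provides operations similar to the Linux command line, such as "gsutil cp" and "gsutil cat".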


    Saturday, May 14, 2011





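    1. Subscribe to my own comments feed, but that means checking Google Reader, and it is not timely.

    2. Burn the RSS into a Feedburner address and enable the email subscription service. This does not seem to work, probably because postfix is not running, so no mail can be sent.

    3. Other plugins, such as subscribe to comments and mail to commenter, do not meet my needs.

    Then, by chance, I came across the article "Configure SMTP: configuring the mail system of a WordPress blog". Installing Configure SMTP and configuring it with my own Gmail account is all it takes.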


    Friday, May 13, 2011











    Wednesday, May 11, 2011



    Thursday, April 14, 2011

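    C #define special characters: # ## #@ \

        By chance I read an article on CoolShell (酷壳) about ambiguities in programming languages. Its fourth puzzle stuck with me, mostly because of one sentence: "Of course, you first need to understand how ## and # are used; if you don't, you can skip this puzzle." I could have skipped it, but I was not willing to, so I went straight off to look up ## and #. I found their definitions in Preprocessor directives; the original text is as follows: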

    Function macro definitions accept two special operators (# and ##) in the replacement sequence:

    If the operator # is used before a parameter is used in the replacement sequence, that parameter is replaced by a string literal (as if it were enclosed between double quotes)

    #define str(x) #x
    cout << str(test);
    This would be translated into:
    cout << "test";

    The operator ## concatenates two arguments leaving no blank spaces between them:
    #define glue(a,b) a ## b
    glue(c,out) << "test";
    This would also be translated into:
    cout << "test";


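    The #@ operator (the charizing operator, a Microsoft-specific extension) turns a parameter into a character literal:
    #define ToChar(x) #@x
    char a = ToChar(1); is equivalent to a = '1'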


    Wednesday, March 9, 2011

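    Implementing CSS @import replacement in Python


    My first Python program. However bad it is, I am posting it so I can look back on it later.

    Along the way I got a lot of guidance and help from liangqing; thanks again.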
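    import re

    def replaceimport(i, directory, filename):
        # i is the recursion depth, printed for debugging
        print(i)
        i = i + 1
        with open(directory + filename) as f:
            content = f.read()
        result = content
        # find every statement of the form: @import "dir/file.css";
        matches = re.findall(r'@import\s+"[^"]+"\s*;', content)
        for match in matches:
            print(match)
            # split the quoted path into a one-level directory and a file name
            dirfile = re.compile(r'"([^/]+/)([^/]+)"').search(match)
            ndir = dirfile.group(1)
            nfile = dirfile.group(2)
            print(ndir + ',' + nfile)
            # recursively inline the imported file in place of the @import
            replace = replaceimport(i, directory + ndir, nfile)
            result = content.replace(match, replace)
            # result = re.sub(match, replace, content)
            # I did not understand re.sub at first and used it blindly; the
            # backslashes caused an "invalid group reference" error.
            # Plain str.replace turned out to be all that is needed.
            content = result
            # print(result)
        return result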
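The "invalid group reference" error mentioned in the code comment can be reproduced in isolation. The sketch below assumes the imported CSS contained a backslash hack such as `\9`; that is a guess at the cause, and the sample strings are invented for illustration. `re.sub` interprets backslashes in the replacement string as group references, whereas `str.replace` and a function replacement both insert the text literally.

```python
import re

content = '@import "base/reset.css"; body { color: red; }'
match = '@import "base/reset.css";'
# Hypothetical imported CSS containing a backslash hack.
replacement = r'.box { width: 100px\9; }'

# With a string replacement, re.sub parses "\9" as a reference to group 9,
# which does not exist in the pattern, so it raises re.error.
try:
    re.sub(re.escape(match), replacement, content)
    failed = False
except re.error:
    failed = True

# Two ways to insert the replacement literally:
via_replace = content.replace(match, replacement)
via_lambda = re.sub(re.escape(match), lambda m: replacement, content)

print(failed)
print(via_replace == via_lambda)
```

Note also that `match` is passed through `re.escape` here. The original `re.sub(match, replace, content)` used the matched text as a pattern without escaping it, which would additionally misinterpret characters such as `.` in file names.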

    BlogSpot Twitter

    Sunday, March 6, 2011


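    1. Less time on Twitter; mostly reading the timelines of people around me and occasionally checking what is new. Effect: +
    2. Less time on Google Reader, mainly due to holiday inertia. What I need to do now is stop procrastinating and cut down the number of feeds as far as I can to make this work. +
    3. More time on Renren, mostly watching videos; a sign of laziness. -
    4. Kept the journaling habit and rarely forget it; keep cultivating it. +
    5. Getting into the zone on thesis-related work fairly easily; need to keep that up. +
    6. Staying indoors a bit too much; need more outdoor exercise. -
    7. Have started slowly learning company-related knowledge. +
    8. Watching a bit too many movies. Not sure whether that is good or bad, but reading more books cannot be wrong. I want to finish one book before I leave.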

    Sunday, January 23, 2011

    Pulling myself together and heading home


    Monday, January 17, 2011

