天涯海角

My Web Home

Monthly Archives: July 2010

The Lies Women Tell in Bed and What Lies Behind Them

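Men are said to be the most accomplished liars in the world, lying almost as often as they blink. And women? Do women never lie? In fact, in bed, once the conversation turns to love and sex, almost every sentence a woman says is a lie. Below are the ten most frequent lies and the truth behind each one.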

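Lie 1: "Darling, you look so handsome when you're absorbed in your game." Or she asks, "Are you writing your blog? How many hits has it had?"
  In fact she is already dropping hints and is simply looking for something to talk about. If you are reading, she asks what the book is about; if you are feeding the tropical fish, she asks whether they have spawned yet. So get into bed at once. A woman with nothing on her mind would instead say "Keep it down!" or "Aren't you going to work tomorrow?", in a tone that is mostly complaint.
Lie 2: "Hey, honey, look at your belly now. Doesn't it look like mine when I was four or five months pregnant? Back then you used to rub it the way I'm doing now, gently and slowly, and it felt so good."
  A man in this situation must not imagine that his pot belly is pleasant for his wife. She is reminding you that a belly this size is bound to hamper your movements and harm your health. The real point is that she already finds you less vigorous than before, just as she herself, when pregnant, waddled like a fat penguin with little grace or desire to speak of. Cut back on the business dinners and exercise more; bluntly calling you a fat pig in bed would, after all, spoil the mood.
Lie 3: "You're wonderful, still like a young lad, not a bit different from when we were dating."
  "Young lad" here stands for rash and clumsy: after all these years of marriage you still have not learned to pace yourself. In saying this, she is hoping for more tenderness and consideration from you.
Lie 4: "You sing so well. Will you sing me a love song?"
  She is diverting your attention. She wants you to know that even your tone-deaf so-called love song lets her feel your deep affection at this moment.
  One can imagine you are not quite up to the task, but she does not blame you at all. She is a clever woman: when a man falters, she knows it is wise to change the subject and leave the sore spot alone so that you do not carry a psychological burden, because she has no wish to "listen to you sing love songs every day."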
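Lie 5: "My body knows. Does yours?"
  The most foolish question a man asks in bed is "Are you happy?" or "Was I great?" The man may be that foolish, but the woman will not be so dim. "My body knows": knows what? Guess! I won't give you a definite answer; I must not dampen your passion, but neither will I indulge your habitual self-satisfaction. "Does yours?" By then the answer is already quite clear.
Lie 6: "Oh, you would be even better than Tony Leung in Lust, Caution."
  Don't be smug. There is only one Tony Leung; otherwise, why didn't Ang Lee come looking for you? Do not chase fancy, exotic moves. The "paper clip", the "chain link" and the like were all carefully choreographed and rehearsed. You are not making a film, and even if you were recording on a DV camera you would not intend to share it with a third person, so keep it conventional. If you throw your back out, are you really going to sue Ang Lee?
Lie 7: "Darling, do you remember the last time you held me and we looked at the night sky?"
  Your wife is tired today and not in the mood, but seeing your enthusiasm she does not want to disappoint you. Hold her, turn off the light, open the window, and quietly look together at the starless night sky.
  At this moment your hearts are close. She murmurs about old memories, gradually cooling your ardor, yet she is happy to be held like this, enjoying the love and sweetness.
Lie 8: "Faster, faster, darling!"
  Women will not, and cannot, move in bed as exaggeratedly as men, but they know very well how much men dislike a woman who is nun-like in bed. When the man charges ahead, the woman produces well-judged exclamations. She hopes in this way to satisfy the man's sense of conquest; all she wants is for him to be more devoted to her and to care more about her.
  "Faster, I'm so happy I could fly!" is the ultimate lie a woman tells in bed, and also the most considerate one. If a man keeps congratulating himself on her performance without actually doing better, the lie will be exposed one day.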
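Lie 9: "No one could replace you in my heart. You are perfect."
  Just as a woman always feels her wardrobe is one garment short, a man always feels he is not perfect; in truth this is male vanity at work. Men readily share so-called "potent medicines" and "folk remedies" with one another and fret over size. Women understand the importance of encouragement; they do not want to see a man embarrassed in bed, but they do not like a man who relies on "remedies" either.
  Also, some men like to ask how they compare with her previous boyfriend. Even if there really is a gap, what woman would be silly enough to say you are not as good as he was?
Lie 10: "If you were a little more romantic, you would be the perfect man."
  "Let's have a glass of red wine!" Perhaps she never drinks, and perhaps you have never had a drink in the bedroom, but she asks for red wine rather than mineral water, which shows she has grown tired of your unromantic married life. Change the setting and add some atmosphere: try a little red wine and some music, and don't treat it like homework. A woman always needs mood and romance, and many small changes in life can make everything look different.
  A man should understand that on every occasion he will fall a little short in a woman's eyes: "a little taller", "a little richer", "a little gentler"... and then he would be the perfect man.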

Mozi said: A state on the verge of ruin always has seven afflictions

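National defense: neglecting defenses while building lavish palaces
Diplomacy: isolated and without allies when the enemy is at the gates
Finance: extravagance and waste that exhaust the people's resources
Internal affairs: officials pursue private gain; laws are tightened to silence dissent
The ruler: closed-minded arrogance, like a frog viewing the sky from a well
The team: petty men hold power; hearts and loyalties are divided
The regime: no able men in the state; rewards and punishments carry no authority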

墨子曰:国有七患。七患者何?城郭沟池不可守而治宫室,一患也;边国至境(2),四邻莫救,二患也;先尽民力无用之功,赏赐无能之人,民力尽于无用,财宝虚于待客,三患也;仕者持禄,游者爱佼(3),君修法讨臣,臣慑而不敢拂,四患也;君自以为圣智而不问事,自以为安强而无守备,四邻谋之不知戒,五患也;所信者不忠,所忠者不信,六患也;畜种菽粟不足以食之,大臣不足以事之,赏赐不能喜,诛罚不能威,七患也。 
以七患居国,必无社稷;以七患守城,敌至国倾。七患之所当,国必有殃。 
凡五谷者,民之所仰也,君之所以为养也。故民无仰,则君无养;民无食,则不可事。故食不可不务也,地不可不力也,用不可不节也。五谷尽收,则五味尽御于主,不尽收则不尽御。一谷不收谓之馑,二谷不收谓之旱,三谷不收谓之凶,四谷不收谓之馈(4),五谷不收谓之饥。 
岁馑,则仕者大夫以下皆损禄五分之一;旱,则损五分之二;凶,则损五分之三;馈,则损五分之四;饥,则尽无禄,禀食而已矣。故凶饥存乎国,人君彻鼎食五分之五(5),大夫彻县(6),士不入学,君朝之衣不革制;诸侯之客,四邻之使,雍食而不盛(7);彻骖騑,涂不芸(8),马不食粟,婢妾不衣帛,此告不足之至也。 
今有负其子而汲者,队其子于井中(9),其毋必从而道之。今岁凶,民饥,道饿,重其子此疚于队,其可无察邪!故时年岁善,则民仁且良;时年岁凶,则民吝且恶。夫民何常此之有!为者疾,食者众,则岁无丰。 
故曰:财不足则反之时,食不足则反之用。故先民以时生财,固本而用财,则财足。故虽上世之圣王,岂能使五谷常收而旱水不至哉!然而无冻饿之民者,何也? 其力时急而自养俭也。故《夏书》曰:“禹七年水。”《殷书》曰:“汤五年旱。”此其离凶饿甚矣(10),然而民不冻饿者,何也?其生财密,其用之节也。故仓无备粟,不可以待凶饥;库无备兵,虽有义不能征无义;城郭不备全,不可以自守;心无备虑,不可以应卒(11),是若庆忌无去之心,不能轻出。 
夫桀无待汤之备,故放;纣无待武之备,故杀。桀纣贵为天子,富有天下,然而皆灭亡于百里之君者,何也?有富贵而不为备也。故备者,国之重也。食者,国之宝也;兵者,国之爪也;城者,所以自守也;此三者,国之具也。
故曰:以其极赏,以赐无功;虚其府库,以备车马、衣裘、奇怪;苦其役徒,以治宫室观乐;死又厚为棺椁,多为衣裘。生时治台榭,死又修坟墓。故民苦于外,府库单于内(12),上不厌其乐(13),下不堪其苦。故国离寇敌则伤,民见凶饥则亡,此皆备不具之罪也。且夫食者,圣人之所宝也。故《周书》曰:“国无三年之食者,国非其国也;家无三年之食者,子非其子也。”此之谓国备。 
注释—————————— 
(1)本篇首先分析了给国家造成危亡的七种祸患,然后指出国家防治祸患的根本在于增加生产和节省财用,并对当时统治者竭尽民力和府库之财以追求享乐生活的 做法提出了严正警告。(2)边:“敌”字之误。(3)佼:通“交”。(4)馈:通“匮”,缺乏。(5)五分之五:疑作“五分之三”。(6)县:通“悬”,此指钟磬等悬挂的乐器。(7)雍:当作“饔”,指早餐和晚餐。(8)涂:通“途”。(9)队:通“坠”。(10)离:通“罹”,遭受。(11)卒:通 “猝”。(12)单:通“殚”。(13)厌:通“餍”,满足。
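Vernacular translation
Mozi said: A state has seven afflictions. What are they? When the inner and outer walls and moats cannot be defended and yet palaces are being built, this is the first. When enemy troops reach the border and none of the neighboring states will come to the rescue, this is the second. When the people's strength is exhausted on useless undertakings and rewards go to men without ability, so that the people's strength is spent on useless things and the treasury is emptied entertaining guests, this is the third. When officials care only about keeping their salaries, unemployed scholars care only about forming cliques, the ruler revises the laws to punish his ministers, and the ministers are too afraid to oppose him, this is the fourth. When the ruler thinks himself sage and wise and does not inquire into affairs of state, thinks himself secure and strong and makes no defensive preparations, and remains unguarded while the neighboring states plot against him, this is the fifth. When those who are trusted are not loyal and those who are loyal are not trusted, this is the sixth. When livestock and grain are insufficient to feed the people, the ministers are not equal to their duties, rewards fail to please, and punishments fail to deter, this is the seventh.
A state governed with these seven afflictions will surely perish; a city defended with these seven afflictions will fall when the enemy arrives. Wherever the seven afflictions are found, that state is bound to meet calamity.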
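The five grains are what the people depend on to live and what the ruler draws on to support himself and the people. If the people lose what they depend on, the ruler has nothing to support him; once the people have no food, they cannot be put to work. Therefore grain production must be pressed, the fields must be worked to the full, and expenditure must be frugal. When all five grains are harvested, the ruler may enjoy all five flavors; when they are not all harvested, he may not enjoy them in full. Failure of one grain is called jin (dearth), of two han (drought), of three xiong (calamity), of four kui (want), and of all five ji (famine).
In a year of dearth, officials from the rank of grand officer down all lose one fifth of their salary; in a year of drought, two fifths; in a year of calamity, three fifths; in a year of want, four fifths; in a year of famine they receive no salary at all, only their food. So when calamity or famine strikes a state, the ruler cuts three fifths of his ritual meals, the grand officers give up music, scholars leave their studies to farm, and the ruler has no new court robes made. Guests of the feudal lords and envoys from neighboring states are served plain meals; the two outer horses of the four-horse team are removed, the roads are left unrepaired, the horses are not fed grain, and maids and concubines do not wear silk. All this signals that the state is in extreme want.
Suppose someone carrying a child on her back goes to draw water and drops the child into the well: the mother will certainly do everything to get the child out. Now when the year is bad and people starve on the roads, the suffering is graver than a child falling into a well. Can such a situation be ignored? In good years the people are kind and well behaved; in bad years they are miserly and vicious. How could the people's temperament be fixed? When producers are few and consumers many, there can be no year of plenty.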
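Hence it is said: when wealth is insufficient, attend to the farming seasons; when food is insufficient, attend to thrift. The worthies of old produced wealth in step with the seasons, secured the agricultural base, and spent sparingly, so wealth was naturally sufficient. Even the sage kings of earlier ages could not make the five grains ripen every year or keep floods and droughts away. Yet in their time no one froze or starved. Why? Because they worked hard in step with the seasons and lived frugally themselves. The Book of Xia says, "In Yu's time there were seven years of flood." The Book of Yin says, "In Tang's time there were five years of drought." The disasters they met were severe, yet the people did not freeze or starve. Why? Because they produced much and used it sparingly. So without reserve grain in the granaries one cannot withstand calamity and famine; without weapons in the arsenal, even a righteous state cannot punish an unrighteous one; if the inner and outer walls are incomplete, the state cannot defend itself; without forethought one cannot cope with sudden events. It is like Qingji, who had no thought of driving Yao Li away and so should not have ventured out lightly, to his death.
Jie made no preparations against Tang and so was banished; Zhou made no preparations against King Wu and so was killed. Jie and Zhou were Sons of Heaven in rank and possessed the whole world in wealth, yet both were destroyed by rulers of states a mere hundred li across. Why? Because they had wealth and rank but made no preparations. Preparation is therefore the most important matter for a state. Food is the state's treasure, arms are its claws, and walls are its means of self-defense: these three are the instruments that sustain a state.
Hence it is said: rulers give the highest rewards to those without merit; drain the treasury to procure carriages and horses, robes and furs, and curiosities; make conscripts and slaves suffer to build palaces and pleasure grounds; and at death have thick coffins and many garments made. In life they build terraces and pavilions; in death they build tombs. So the people suffer outside while the treasury is emptied within; the ruler above is never sated with his pleasures, and the people below cannot bear their hardship. Such a state is damaged at the first enemy raid, and its people perish at the first famine. All this is the fault of failing to prepare. Moreover, food is what the sages treasured. The Book of Zhou says: "If a state lacks three years' store of food, it can no longer be counted that ruler's state; if a family lacks three years' store of food, its children can no longer be counted that family's children." This is what is meant by "state preparedness" (a state's basic reserves).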

Science: Volvox Genome Sequencing Completed

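Bielefeld University in Germany reported on July 9 that an international team including German researchers has recently finished sequencing the genome of Volvox, one of the simplest multicellular organisms. The researchers hope this will help uncover how single-celled organisms evolved into multicellular ones. How a single-celled organism could evolve into a multicellular one, and ultimately into something as complex as a human, has long been a major question in biology. The team, made up of researchers from Germany, the United States, Canada, and Japan, chose to start with Volvox because it has very few cell types. Volvox also has a single-celled close relative, Chlamydomonas reinhardtii, whose genome sequencing was completed in 2007.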

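In a report published on July 9 in the US journal Science, the team found that the Volvox genome has roughly 140 million base pairs and contains about 14,500 genes, more than half the number of human genes. Bielefeld University experts who took part said that, when comparing the Volvox and Chlamydomonas reinhardtii genomes, the team unexpectedly found that although the two organisms differ greatly in complexity and life history, their genomes have similar protein-coding potential. Compared with Chlamydomonas, the researchers found very few genes unique to Volvox. They infer that evolving from a single-celled to a multicellular organism does not require a large increase in gene number; what is decisive in this transition is how and when genes are expressed to produce particular proteins.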

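The German experts said that sequencing the Volvox genome is an important step toward understanding the molecular mechanisms behind the transition from single-celled to multicellular life. In the long run, studying the molecular mechanisms of simple organisms will help us better understand the evolutionary history of complex organisms such as humans. (Bioon.net)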

Science DOI: 10.1126/science.1188800

Genomic Analysis of Organismal Complexity in the Multicellular Green Alga Volvox carteri
Simon E. Prochnik,1,* James Umen,2,*, Aurora M. Nedelcu,3 Armin Hallmann,4 Stephen M. Miller,5 Ichiro Nishii,6 Patrick Ferris,2 Alan Kuo,1 Therese Mitros,7 Lillian K. Fritz-Laylin,7 Uffe Hellsten,1 Jarrod Chapman,1 Oleg Simakov,8 Stefan A. Rensing,9 Astrid Terry,1 Jasmyn Pangilinan,1 Vladimir Kapitonov,10 Jerzy Jurka,10 Asaf Salamov,1 Harris Shapiro,1 Jeremy Schmutz,11 Jane Grimwood,11 Erika Lindquist,1 Susan Lucas,1 Igor V. Grigoriev,1 Rüdiger Schmitt,12 David Kirk,13 Daniel S. Rokhsar1,7,
The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are well suited for the investigation of the evolution of multicellularity and development. We sequenced the 138–mega–base pair genome of V. carteri and compared its ~14,500 predicted proteins to those of its unicellular relative Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials and few species-specific protein-coding gene predictions. Volvox is enriched in volvocine-algal–specific proteins, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.
1 U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA 94598, USA.
2 The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
3 University of New Brunswick, Department of Biology, Fredericton, New Brunswick E3B 5A3, Canada.
4 Department of Cellular and Developmental Biology of Plants, University of Bielefeld, D-33615 Bielefeld, Germany.
5 Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA.
6 Biological Sciences, Nara Women’s University, Nara-shi, Nara Prefecture 630-8506, Japan.
7 Center for Integrative Genomics, Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, CA 94720, USA.
8 European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
9 Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany.
10 Genetic Information Research Institute, 1925 Landings Drive, Mountain View, CA 94043, USA.
11 HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA.
12 Department of Genetics, University of Regensburg, D-93040 Regensburg, Germany.
13 Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA.

Nucleic Acids Res.: A New Method for Identifying CpG Islands in Genomes

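Researchers at the College of Bioinformatics Science and Technology, Harbin Medical University, have obtained important results in identifying molecular markers for major diseases, including cancer, with clinical significance for early detection and treatment. The work has been published in more than ten journals, including Nucleic Acids Research.
The research was led by Professor Xia Li, dean of the college. A leading figure in bioinformatics, she has undertaken 28 research projects, including 2 under the national 863 high-technology program (principal investigator of 1, deputy leader of 1), 1 national 973 preliminary project (principal investigator), and 4 National Natural Science Foundation of China projects (principal investigator of 3). She has received 3 provincial or ministerial awards and 8 bureau-level awards, has published more than 110 papers in major journals and conferences at home and abroad (42 indexed in SCI, 18 in EI) with a cumulative SCI impact factor of 90.137 and 114 total citations, and has edited 8 books.
Major diseases such as cancer, heart disease, hypertension, and diabetes arise and progress through the combined action of genetic and environmental factors spanning many genes, pathways, and levels. Finding ways to detect, in time, the molecular markers specific to a disease matters greatly for understanding pathogenesis, diagnosing early, and choosing personalized treatment, and it reflects a shift in how modern medicine thinks about treating disease. The Li group's breakthrough in finding molecular markers of major diseases helps address the problem of identifying signature genes in complex-disease research.
More than a hundred tumor markers have been discovered, but only about 20 are in routine clinical use, and even fewer are suitable for large-scale population screening. The high incidence and mortality of tumors create an urgent need for new biomarkers for early diagnosis and new techniques for detecting tumor markers.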
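Separately, James Mobley, director of clinical proteomics at the University of Alabama at Birmingham, recently set out to use serum, plasma, and urine to find fluid-based protein markers for pancreatic cancer. From the start, however, he was caught in a "Catch-22" (an American expression derived from the novel Catch-22, meaning a predicament with no escape because of mutually conflicting rules or conditions): to win funding for an in-depth study, the group first needed pilot data. Yet when they tried to identify protein markers for pancreatic cancer they faced a great deal of uncertainty: they had to choose which body fluid or tissue to analyze, and whether to use a low-, medium-, or high-throughput platform.
To resolve these questions, the researchers first surveyed the literature on published experimental data from blood and urine samples. They found that most earlier studies had used small pooled samples combined with low-throughput platforms, such as 2D gels, to identify and detect proteins.
Tumor markers were first discovered using monoclonal antibodies derived from tumor cells. After antiserum is obtained, immunoadsorption removes what reacts with normal cells and leaves the tumor-specific antigens; screening and evaluating these candidates became the classic way to find new tumor markers. But this approach is slow and laborious, and when a lab is also short of funds, making further progress and securing more funding is a headache for many labs. The study described above offers a good example of the experimental strategy to adopt in such a situation: from the very first step, researchers should weigh all the relevant factors, including funding, the difficulties likely to arise, and what to do if the experiment fails. (Bioon.net)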
Original source:
Nucleic Acids Research 2010 38(1):e6; doi:10.1093/nar/gkp882

CpG_MI: a novel approach for identifying functional CpG islands in mammalian genomes
Jianzhong Su, Yan Zhang*, Jie Lv, Hongbo Liu, Xiaoyan Tang, Fang Wang, Yunfeng Qi, Yujia Feng and Xia Li*

CpG islands (CGIs) are CpG-rich regions compared to CpG-depleted bulk DNA
of mammalian genomes and are generally regarded as the epigenetic
regulatory regions in association with unmethylation, promoter activity
and histone modifications. Accurate identification of CpG islands with
epigenetic regulatory function in bulk genomes is of wide interest.
Here, the common features of functional CGIs are identified using an
average mutual information method to differentiate functional CGIs from
the remaining CGIs. A new approach (CpG mutual information, CpG_MI) was
further explored to identify functional CGIs based on the cumulative
mutual information of physical distances between two neighboring CpGs.
Compared to current approaches, CpG_MI achieved the highest prediction
accuracy. This approach also identified new functional CGIs overlapping
with gene promoter regions which were missed by other algorithms. Nearly
all CGIs identified by CpG_MI overlapped with histone modification
marks. CpG_MI could also be used to identify potential functional CGIs
in other mammalian genomes, as the CpG dinucleotide contents and
cumulative mutual information distributions are almost the same among
six mammalian genomes in our analysis. It is a reliable quantitative
tool for the identification of functional CGIs from bulk genomes and
helps in understanding the relationships between genomic functional
elements and epigenomic modifications.

A New miRNA Signaling Mechanism Revealed

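miRNAs are small molecules encoded in the genomes of higher eukaryotes. By base-pairing with the mRNA of a target gene, they guide the RNA-induced silencing complex (RISC) to degrade the mRNA or block its translation. They are also highly conserved across species.
Scientists long held that miRNA molecules do not move between cells and instead stay put in a single cell. According to a paper published July 9 in Molecular Cell, "Secreted Monocytic miR-150 Enhances Targeted Endothelial Cell Migration", the findings of Professor Chen-Yu Zhang of the School of Life Sciences at Nanjing University and his colleagues overturn this traditional view.
The study centers on a microRNA called miR-150. The results show that monocytes or macrophages, under specific stimulation, secrete more miR-150; the miR-150 then travels through the bloodstream into endothelial cells, where it reduces translation of its target gene in the recipient cell and thereby stimulates endothelial cell migration.
Conventional intercellular signaling, such as cytokine-receptor or antigen-antibody interactions, generally acts directly on only one or a few molecules, so the signal is one-way. With miRNA, by contrast, every cell type can secrete and take up miRNAs, and under particular physiological and pathophysiological conditions a cell can secrete several miRNAs at once. Moreover, in the target cell an miRNA can regulate the translation of multiple genes. Its signaling can therefore be two-way or multi-way.
Professor Zhang said that the discovery of this signaling function of miRNA, more efficient than that of conventional signaling proteins, will help scientists better understand the nature of information transfer in biological systems and help uncover the pathogenesis of diseases such as diabetes and lupus erythematosus. It may also open up entirely new ways to treat and prevent disease, for example by cutting off cell signaling pathways.
Related research
According to a recent Bioon report, a study of root development in the classic plant model Arabidopsis found that a microRNA (miRNA165/6) is involved in cell-to-cell communication and is a determinant of root cell fate.
Patterning of the xylem vessels, which carry water and solutes from root to shoot, was found to depend on a novel bidirectional signaling pathway involving cell-to-cell movement of a transcription factor in one direction and of microRNAs in the other. The transcription factor, SHORT ROOT, is produced in the vascular cylinder and moves into the endodermis, where together with SCARECROW it activates the microRNAs MIR165a and 166b; these then move back into the vascular cells and degrade their target, the messenger RNA encoding "Class III homeodomain-leucine zipper" transcription factors.
The involvement of a cascade of evolutionarily conserved transcription factors and miRNAs in this regulatory pathway suggests that it may be an evolutionary adaptation to growth on land.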

Key English Bioinformatics Terms and Definitions (with Chinese equivalents)

Abstract Syntax Notation (ASN.1)(NCBI发展的许多程序,如显示蛋白质三维结构的Cn3D等所使用的内部格式)
A language that is used to describe structured data types formally. Within bioinformatics, it has been used by the National Center for Biotechnology Information to encode sequences, maps, taxonomic information, molecular structures, and bibliographic information in such a way that it can be easily accessed and exchanged by computer software.
Accession number(记录号)
A unique identifier that is assigned to a single database entry for a DNA or protein sequence.
Affine gap penalty(一种设置空位罚分策略)
A gap penalty score that is a linear function of gap length, consisting of a gap opening penalty and a gap extension penalty multiplied by the length of the gap. Using this penalty scheme greatly enhances the performance of dynamic programming methods for sequence alignment. See also Gap penalty.
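As a minimal sketch of the definition above, the penalty can be written as an opening cost plus an extension cost times the gap length. The function name and the default costs (11 and 1) are illustrative assumptions, and some programs charge the extension cost for only length - 1 positions.

```python
def affine_gap_penalty(gap_length, gap_open=11, gap_extend=1):
    """Penalty for one gap: opening cost plus extension cost times gap length.

    The default costs are illustrative values, not taken from this glossary.
    """
    if gap_length <= 0:
        return 0  # no gap, no penalty
    return gap_open + gap_extend * gap_length


# One long gap is cheaper than several short gaps of the same total length.
one_long_gap = affine_gap_penalty(6)
three_short_gaps = 3 * affine_gap_penalty(2)
```

Because every new gap pays the opening cost again, this scheme favors alignments with a few long gaps over many scattered ones.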
Algorithm(算法)
A systematic procedure for solving a problem in a finite number of steps, typically involving a repetition of operations. Once specified, an algorithm can be written in a computer language and run as a program.
Alignment(联配/比对)
Refers to the procedure of comparing two or more sequences by looking for a series of individual characters or character patterns that are in the same order in the sequences. Of the two types of alignment, local and global, a local alignment is generally the most useful. See also Local and Global alignments.
Alignment score(联配/比对值)
An algorithmically computed score based on the number of matches, substitutions, insertions, and deletions (gaps) within an alignment. Scores for matches and substitutions are derived from a scoring matrix such as the BLOSUM and PAM matrices for proteins, and affine gap penalties suitable for the matrix are chosen. Alignment scores are in log odds units, often bit units (log to the base 2). Higher scores denote better alignments. See also Similarity score, Distance in sequence analysis.
Alphabet(字母表)
The total number of distinct symbols that can appear in a sequence: 4 for DNA sequences and 20 for protein sequences.
Annotation(注释)
The prediction of genes in a genome, including the location of protein-encoding genes, the sequence of the encoded proteins, any significant matches to other proteins of known function, and the location of RNA-encoding genes. Predictions are based on gene models; e.g., hidden Markov models of introns and exons in protein-encoding genes, and models of secondary structure in RNA.
Anonymous FTP(匿名FTP)
When an FTP service allows anyone to log in, it is said to provide anonymous FTP service. A user can log in to an anonymous FTP server by typing anonymous as the user name and an e-mail address as the password. Most Web browsers now negotiate anonymous FTP logon without asking the user for a user name and password. See also FTP.
ASCII
The American Standard Code for Information Interchange (ASCII) encodes unaccented letters a-z, A-Z, the numbers 0-9, most punctuation marks, space, and a set of control characters such as carriage return and tab. ASCII specifies 128 characters that are mapped to the values 0-127. ASCII files are commonly called plain text, meaning that they only encode text without extra markup.
BAC clone(细菌人工染色体克隆)
Bacterial artificial chromosome vector carrying a genomic DNA insert, typically 100–200 kb. Most of the large-insert clones sequenced in the project were BAC clones.
Back-propagation(反向传输)
When training feed-forward neural networks, a back-propagation algorithm can be used to modify the network weights. After each training input pattern is fed through the network, the network’s output is compared with the desired output and the amount of error is calculated. This error is back-propagated through the network by using an error function to correct the network weights. See also Feed-forward neural network.
Baum-Welch algorithm(Baum-Welch算法)
An expectation maximization algorithm that is used to train hidden Markov models.
Bayes' rule(贝叶斯法则)
Forms the basis of conditional probability by calculating the likelihood of an event occurring based on the history of the event and relevant background information. In terms of two parameters A and B, the theorem is stated as an equation: the conditional probability of A given B, P(A|B), is equal to the probability of A, P(A), times the conditional probability of B given A, P(B|A), divided by the probability of B, P(B); that is, P(A|B) = P(A) P(B|A) / P(B). P(A) is the historical or prior distribution value of A, P(B|A) is a new prediction for B for a particular value of A, and P(B) is the sum of the newly predicted values for B over all values of A. P(A|B) is a posterior probability, representing a new prediction for A given the prior knowledge of A and the newly discovered relationships between A and B.
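The rule can be illustrated with a small two-outcome calculation. This is a sketch only: the function name and the numbers (a 1% prior, a 90% chance of B when A holds, a 5% chance of B otherwise) are made up for illustration.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) for a two-outcome case (A or not A)."""
    # P(B) sums the predicted values for B over both possibilities for A.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return prior_a * p_b_given_a / p_b


# Hypothetical numbers: A is rare (1%), B is seen in 90% of A cases
# and in 5% of non-A cases.
p_a_given_b = posterior(0.01, 0.90, 0.05)
```

With these assumed inputs the posterior stays well below one half, because the low prior for A outweighs the strong evidence from B.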
Bayesian analysis(贝叶斯分析)
A statistical procedure used to estimate parameters of an underlying distribution based on an observed distribution. See also Bayes' rule.
Biochips(生物芯片)
Miniaturized arrays of large numbers of molecular substrates, often oligonucleotides, in a defined pattern. They are also called DNA microarrays and microchips.
Bioinformatics (生物信息学)
The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology. Also, the discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the Internet, or locally at your institution.
Bit score (二进制值/ Bit值)
The bit score S' is derived from the raw alignment score S by taking into account the statistical properties of the scoring system used. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
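One common way to express this normalization, as used in BLAST-style statistics, is S' = (lambda * S - ln K) / ln 2, where lambda and K are the statistical parameters of the scoring system. The sketch below assumes that formula; the parameter values in the example are illustrative, not taken from this glossary.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits: (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameter values for a protein scoring system.
example_bits = bit_score(100, lam=0.267, k=0.041)
```

Two searches run with different scoring matrices produce raw scores on different scales, but their bit scores can be compared directly.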
Bit units
From information theory, a bit denotes the amount of information required to distinguish between two equally likely possibilities. The number of bits of information, N, required to convey a message that has M possibilities is log2 M = N bits.
BLAST (基本局部联配搜索工具,一种主要数据库搜索程序)
Basic Local Alignment Search Tool. A set of programs used to perform fast similarity searches. Nucleotide sequences can be compared with nucleotide sequences in a database using BLASTN, for example. Complex statistics are applied to judge the significance of each match. Reported sequences may be homologous to, or related to, the query sequence. The BLASTP program is used to search a protein database for a match against a query protein sequence. There are several other flavours of BLAST. BLAST2 is a newer release of BLAST that allows for insertions or deletions in the sequences being aligned; such gapped alignments may be more biologically significant.
Block(蛋白质家族中保守区域的组块)
Conserved ungapped patterns approximately 3-60 amino acids in length in a set of related proteins.
BLOSUM matrices(模块替换矩阵,一种主要替换矩阵)
An alternative to PAM tables, BLOSUM tables were derived using local multiple alignments of more distantly related sequences than were used for the PAM matrix. These are used to assess the similarity of sequences when performing alignments.
Boltzmann distribution(Boltzmann 分布)
Describes the number of molecules that have energies above a certain level, based on the Boltzmann gas constant and the absolute temperature.
Boltzmann probability function(Boltzmann概率函数)
See Boltzmann distribution.
Bootstrap analysis
A method for testing how well a particular data set fits a model. For example, the validity of the branch arrangement in a predicted phylogenetic tree can be tested by resampling columns in a multiple sequence alignment to create many new alignments. The appearance of a particular branch in trees generated from these resampled sequences can then be measured. Alternatively, a sequence may be left out of an analysis to determine how much the sequence influences the results of an analysis.
Branch length(分支长度)
In sequence analysis, the number of sequence changes along a particular branch of a phylogenetic tree.
CDS or cds (编码序列)
Coding sequence.
Chebyshev's inequality
The probability that a random variable deviates from its mean by at least k standard deviations is less than or equal to 1/k^2, the square of 1 over the number of standard deviations from the mean.
Clone (克隆)
Population of identical cells or molecules (e.g. DNA), derived from a single ancestor.
Cloning Vector (克隆载体)
A molecule that carries a foreign gene into a host, and allows/facilitates the multiplication of that gene in a host. When sequencing a gene that has been cloned using a cloning vector (rather than by PCR), care should be taken not to include the cloning vector sequence when performing similarity searches. Plasmids, cosmids, phagemids, YACs and PACs are example types of cloning vectors.
Cluster analysis(聚类分析)
A method for grouping together a set of objects that are most similar from a larger group of related objects. The relationships are based on some criterion of similarity or difference. For sequences, a similarity or distance score or a statistical evaluation of those scores is used.
Cobbler
A single sequence that represents the most conserved regions in a multiple sequence alignment. The BLOCKS server uses the cobbler sequence to perform a database similarity search as a way to reach sequences that are more divergent than would be found using the single sequences in the alignment for searches.
Coding system (neural networks)
Regarding neural networks, a coding system needs to be designed for representing input and output. The level of success found when training the model will be partially dependent on the quality of the coding system chosen.
Codon usage
Analysis of the codons used in a particular gene or organism.
COG(直系同源簇)
Clusters of orthologous groups: a set of groups of related sequences in microorganisms and yeast (S. cerevisiae). These groups are found by whole proteome comparisons and include orthologs and paralogs. See also Orthologs and Paralogs.
Comparative genomics(比较基因组学)
A comparison of gene numbers, gene locations, and biological functions of genes in the genomes of diverse organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.
Complexity (of an algorithm)(算法的复杂性)
Describes the number of steps required by the algorithm to solve a problem as a function of the amount of data; for example, the length of sequences to be aligned.
Conditional probability(条件概率)
The probability of a particular result (or of a particular value of a variable) given one or more events or conditions (or values of other variables).
Conservation (保守)
Changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
Consensus(一致序列)
A single sequence that represents, at each subsequent position, the variation found within corresponding columns of a multiple sequence alignment.
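A simple way to build such a sequence is to take the most common character in each alignment column. This is a sketch under that assumption; real tools often use ambiguity codes or thresholds instead of a plain majority vote, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_sequences):
    """Return the most common character of each column of an alignment."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned_sequences):
        # most_common(1) gives the single most frequent character in the column
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


consensus = majority_consensus(["ACGT", "ACGA", "ATGT"])
```

A majority vote discards the minority characters in each column, so it summarizes the alignment rather than preserving its full variation.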
Context-free grammars
A recursive set of production rules for generating patterns of strings. These consist of a set of terminal characters that are used to create strings, a set of nonterminal symbols that correspond to rules and act as placeholders for patterns that can be generated using terminal characters, a set of rules for replacing nonterminal symbols with terminal characters, and a start symbol.
Contig (序列重叠群/拼接序列)
A set of clones that can be assembled into a linear order. A DNA sequence that overlaps with another contig. The full set of overlapping sequences (contigs) can be put together to obtain the sequence for a long region of DNA that cannot be sequenced in one run in a sequencing assay. Important in genetic mapping at the molecular level.
CORBA(国际对象管理协作组制定的使OOP对象与网络接口统一起来的一套跨计算机、操作系统、程序语言和网络的共同标准)
The Common Object Request Broker Architecture (CORBA) is an open industry standard for working with distributed objects, developed by the Object Management Group. CORBA allows the interconnection of objects and applications regardless of computer language, machine architecture, or geographic location of the computers.
Correlation coefficient(相关系数)
A numerical measure, falling between -1 and 1, of the degree of the linear relationship between two variables. A positive value indicates a direct relationship, a negative value indicates an inverse relationship, and the distance of the value away from zero indicates the strength of the relationship. A value near zero indicates no relationship between the variables.
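The measure described here is commonly computed as the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. The sketch below assumes that definition; the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance / math.sqrt(spread_x * spread_y)


direct = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly direct
inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly inverse
```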
Covariation (in sequences)(共变)
Coincident change at two or more sequence positions in related sequences that may influence the secondary structures of RNA or protein molecules.
Coverage (or depth) (覆盖率/厚度)
The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a ‘high-quality base’ is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).
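The link between accuracy and PHRED score follows from the standard definition Q = -10 log10(P_error). A short sketch of the conversion in both directions (function names are illustrative):

```python
import math


def phred_quality(error_probability):
    """PHRED score for a base call with the given error probability."""
    return -10.0 * math.log10(error_probability)


def base_accuracy(phred_score):
    """Probability that a base call with this PHRED score is correct."""
    return 1.0 - 10.0 ** (-phred_score / 10.0)


q_for_99_percent = phred_quality(0.01)   # 1% error, i.e. 99% accuracy
accuracy_at_q30 = base_accuracy(30)
```

Each additional 10 PHRED units corresponds to a tenfold lower error probability.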
Database(数据库)
A computerized storehouse of data that provides a standardized way for locating, adding, removing, and changing data. See also Object-oriented database, Relational database.
Dendrogram
A form of a tree that lists the compared objects (e.g., sequences or genes in a microarray analysis) in a vertical order and joins related ones by levels of branches extending to one side of the list.
Depth (厚度)
See coverage
Dirichlet mixtures
Defined as the conjugate prior of a multinomial distribution. One use is for predicting the expected pattern of amino acid variation found in the match state of a hidden Markov model (representing one column of a multiple sequence alignment of proteins), based on prior distributions found in conserved protein domains (blocks).
Distance in sequence analysis(序列距离)
The number of observed changes in an optimal alignment of two sequences, usually not counting gaps.
DNA Sequencing (DNA测序)
The experimental process of determining the nucleotide sequence of a region of DNA. This is done by labelling each nucleotide (A, C, G or T) with either a radioactive or fluorescent marker which identifies it. There are several methods of applying this technology, each with their advantages and disadvantages. For more information, refer to a current text book. High throughput laboratories frequently use automated sequencers, which are capable of rapidly reading large numbers of templates. Sometimes, the sequences may be generated more quickly than they can be characterised.
Domain (功能域)
A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.
Dot matrix(点标矩阵图)
Dot matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. Dots are placed within the graph at the intersection of the same letter appearing in both sequences. A series of diagonal lines in the graph indicate regions of alignment. The matrix may be filtered to reveal the most-alike regions by scoring a minimal threshold number of matches within a sequence window.
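The procedure above, including the window filter, can be sketched in a few lines. The function name, the "*" and "." symbols, and the parameter names are illustrative choices rather than any particular program's conventions.

```python
def dot_matrix(top_seq, left_seq, window=1, min_matches=1):
    """Return the rows of a dot matrix as strings.

    top_seq runs across the top (columns) and left_seq down the left side
    (rows). A "*" is placed where at least min_matches positions agree
    within a window starting at that row and column.
    """
    rows = []
    for i in range(len(left_seq) - window + 1):
        row = ""
        for j in range(len(top_seq) - window + 1):
            matches = sum(
                left_seq[i + k] == top_seq[j + k] for k in range(window)
            )
            row += "*" if matches >= min_matches else "."
        rows.append(row)
    return rows


unfiltered = dot_matrix("AAC", "AAC")                      # single letters
filtered = dot_matrix("AAC", "AAC", window=2, min_matches=2)
```

In the unfiltered case the repeated A produces off-diagonal dots; requiring two matches in a window of two leaves only the diagonal.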
Draft genome sequence (基因组序列草图)
The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.
DUST (一种低复杂性区段过滤程序)
A program for filtering low complexity regions from nucleic acid sequences.
Dynamic programming(动态规划法)
A dynamic programming algorithm solves a problem by combining solutions to sub-problems that are computed once and saved in a table or matrix. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found. This algorithm is used for producing sequence alignments, given a scoring system for sequence comparisons.
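As a minimal sketch of the idea, the function below fills a table of sub-problem scores to find the best global alignment score of two sequences (in the style of the Needleman-Wunsch method). The scoring values are illustrative, and a simple per-position gap cost is assumed instead of an affine one.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the best score for aligning a[:i] with b[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
    return table[-1][-1]


score = global_alignment_score("ACGT", "AGT")
```

Each cell is computed once from three previously saved neighbors, which is what keeps the search over all possible alignments tractable.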
EMBL (欧洲分子生物学实验室,EMBL数据库是主要公共核酸序列数据库之一)
European Molecular Biology Laboratory. Maintains the EMBL database, one of the major public sequence databases.
EMBnet (欧洲分子生物学网络)
European Molecular Biology Network: http://www.embnet.org/ was established in 1988, and provides services including local molecular databases and software for molecular biologists in Europe. There are several large outposts of EMBnet, including EXPASY.
Entropy(熵)
From information theory, a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.
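For a set of symbols, this is usually computed as the Shannon entropy, H = -sum(p * log2 p) over the observed symbol frequencies p, measured in bits. A short sketch (the function name is illustrative):

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies in a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


uniform_column = shannon_entropy("ACGT")    # four equally frequent symbols
conserved_column = shannon_entropy("AAAA")  # no variation at all
```

A fully conserved alignment column has zero entropy, while a DNA column with all four bases equally frequent reaches the maximum of 2 bits.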
Erdos and Renyi law
In a toss of a "fair" coin, the number of heads in a row that can be expected is the logarithm of the number of tosses to the base 2. The law may be generalized for more than two possible outcomes by changing the base of the logarithm to the number of outcomes. This law was used to analyze the number of matches and mismatches that can be expected between random sequences as a basis for scoring the statistical significance of a sequence alignment.
EST (表达序列标签的缩写)
See Expressed Sequence Tag
Expect value (E)(E值)
E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, this is the number of alignments scoring as well as the one found between a query sequence and a database sequence that would be expected in as many comparisons between random sequences as were done to find the matching sequence. In other types of sequence analysis, E has a similar meaning.
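In BLAST-style statistics the expect value is often written in terms of the bit score S' as E = m * n * 2^(-S'), where m is the query length and n the total database length. The sketch below assumes that formula; the lengths in the example are made up.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance alignments scoring at least bit_score."""
    search_space = query_length * database_length
    return search_space * 2.0 ** (-bit_score)


# Hypothetical search: a 250-residue query against 50 million residues.
e_low_score = expect_value(30, 250, 50_000_000)
e_high_score = expect_value(60, 250, 50_000_000)
```

Each extra bit of score halves the expected number of chance hits, while a larger database raises it proportionally.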
Expectation maximization (sequence analysis)
An algorithm for locating similar sequence patterns in a set of sequences. A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment, this pattern is matched to each sequence, and the scoring matrix values are then updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.
Exon (外显子)
Coding region of DNA. See CDS.
Expressed Sequence Tag (EST) (表达序列标签)
Randomly selected, partial cDNA sequence; represents its corresponding mRNA. dbEST is a large database of ESTs at GenBank, NCBI.
FASTA (一种主要数据库搜索程序)
The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup" variable which specifies the size of a "word". (Pearson and Lipman)
Extreme value distribution(极值分布)
Some measurements are found to follow a distribution that has a long tail which decays at high values much more slowly than that found in a normal distribution. This slow-falling type is called the extreme value distribution. The alignment scores between unrelated or random sequences are an example. These scores can reach very high values, particularly when a large number of comparisons are made, as in a database similarity search. The probability of a particular score may be accurately predicted by the extreme value distribution, which follows a double negative exponential function after Gumbel.
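The double negative exponential form mentioned above is usually written as P(S >= x) = 1 - exp(-exp(-lambda * (x - u))), where u is the location (mode) of the distribution and lambda its scale parameter. A sketch under that assumption, with illustrative parameter values:

```python
import math


def evd_tail_probability(score, location, scale_lambda):
    """P(S >= score) under a Gumbel extreme value distribution."""
    return 1.0 - math.exp(-math.exp(-scale_lambda * (score - location)))


# Illustrative parameters: location u = 30, lambda = 0.25.
p_at_mode = evd_tail_probability(30, 30, 0.25)
p_high_score = evd_tail_probability(60, 30, 0.25)
```

The probability falls off as the score rises above the location, but more slowly than the tail of a normal distribution would.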
False negative(假阴性)
A negative data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as positive.
False positive (假阳性)
A positive data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as negative.
Feed-forward neural network (前馈神经网络)
Organizes nodes into sequence layers in which the nodes in each layer are fully connected with the nodes in the next layer, except for the final output layer. Input is fed from the input layer through the layers in sequence in a “feed-forward” direction, resulting in output at the final layer. See also Neural network.
Filtering (window size)
During pair-wise sequence alignment using the dot matrix method, random matches can be filtered out by using a sliding window to compare the two sequences. Rather than comparing a single sequence position at a time, a window of adjacent positions in the two sequences is compared and a dot, indicating a match, is generated only if a certain minimal number of matches occur.
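The sliding-window filter described above can be sketched in a few lines. The function name and the default `window` and `min_matches` values are illustrative assumptions, not settings taken from any particular dot matrix program.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) positions where a window of seq_a starting at i and a
    window of seq_b starting at j share at least min_matches identical positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.append((i, j))
    return dots
```

With `window=1` and `min_matches=1` this reduces to the unfiltered dot matrix, in which every single matching position produces a dot.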
Filtering (过滤)
Also known as Masking. The process of hiding regions of (nucleic acid or amino acid) sequence having characteristics that frequently lead to spurious high scores. See SEG and DUST.
Finished sequence(完成序列)
Complete sequence of a clone or genome, with an accuracy of at least 99.99% and no gaps.
Fourier analysis
Studies the approximations and decomposition of functions using trigonometric polynomials.
Format (file)(格式)
Different programs require that information be specified to them in a formal manner, using particular keywords and ordering. This specification is a file format.
Forward-backward algorithm
Used to train a hidden Markov model by aligning the model with training sequences. The algorithm then refines the model to reduce the error when fitted to the given data using a gradient descent approach.
FTP (File Transfer Protocol)(文件传输协议)
Allows a person to transfer files from one computer to another across a network using an FTP-capable client program. The FTP client program can only communicate with machines that run an FTP server. The server, in turn, will make a specific portion of its file system available for FTP access, providing that the client is able to supply a recognized user name and password to the server.
Full shotgun clone (鸟枪法克隆)
A large-insert clone for which full shotgun sequence has been produced.
Functional genomics(功能基因组学)
Assessment of the function of genes identified by between-genome comparisons. The function of a newly identified gene is tested by introducing mutations into the gene and then examining the resultant mutant organism for an altered phenotype.
gap (空位/间隙/缺口)
A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acid is also penalized in the scoring of an alignment.
Gap penalty(空位罚分)
A numeric score used in sequence alignment programs to penalize the presence of gaps within an alignment. The value of a gap penalty affects how often gaps appear in alignments produced by the algorithm. Most alignment programs suggest gap penalties that are appropriate for particular scoring matrices.
Genetic algorithm(遗传算法)
A kind of search algorithm that was inspired by the principles of evolution. A population of initial solutions is encoded and the algorithm searches through these by applying a pre-defined fitness measurement to each solution, selecting those with the highest fitness for reproduction. New solutions can be generated during this phase by crossover and mutation operations, defined in the encoded solutions.
Genetic map (遗传图谱)
A genome map in which polymorphic loci are positioned relative to one another on the basis of the frequency with which they recombine during meiosis. The unit of distance is centimorgans (cM), denoting a 1% chance of recombination.
Genome(基因组)
The genetic material of an organism, contained in one haploid set of chromosomes.
Gibbs sampling method
An algorithm for finding conserved patterns within a set of related sequences. A guessed alignment of all but one sequence is made and used to generate a scoring matrix that represents the alignment. The matrix is then matched to the left-out sequence, and a probable location of the corresponding pattern is found. This prediction is then input into a new alignment and another scoring matrix is produced and tested on a new left-out sequence. The process is repeated until there is no further improvement in the matrix.
Global alignment(整体联配)
Attempts to match as many characters as possible, from end to end, in a set of two or more sequences.
Gopher (一个文档发布系统,允许检索和显示文本文件)
Graph theory(图论)
A branch of mathematics which deals with problems that involve a graph or network structure. A graph is defined by a set of nodes (or points) and a set of arcs (lines or edges) joining the nodes. In sequence and genome analysis, graph theory is used for sequence alignments and clustering alike genes.
GSS(基因综述序列)
Genome survey sequence.
GUI(图形用户界面)
Graphical user interface.
H (相对熵值)
H is the relative entropy of the target and background residue frequencies. (Karlin and Altschul, 1990). H can be thought of as a measure of the average information (in bits) available per position that distinguishes an alignment from chance. At high values of H, short alignments can be distinguished from chance, whereas at lower H values, a longer alignment may be necessary. (Altschul, 1991)
Half-bits
Some scoring matrices are in half-bit units. These units are logarithms to the base 2 of odds scores times 2.
Heuristic(启发式方法)
A procedure that progresses along empirical lines by using rules of thumb to reach a solution. The solution is not guaranteed to be optimal.
Hexadecimal system(16制系统)
The base 16 counting system that uses the digits 0-9 followed by the letters A-F.
HGMP (人类基因组图谱计划)
Human Genome Mapping Project.
Hidden Markov Model (HMM)(隐马尔可夫模型)
In sequence analysis, an HMM is usually a probabilistic model of a multiple sequence alignment, but can also be a model of periodic patterns in a single sequence, representing, for example, patterns found in the exons of a gene. In a model of multiple sequence alignments, each column of symbols in the alignment is represented by a frequency distribution of the symbols called a state, and insertions and deletions by other states. One then moves through the model along a particular path from state to state trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that particular state from a previous one (the transition probability). State and transition probabilities are then multiplied to obtain a probability of the given sequence. Generally speaking, an HMM is a statistical model for an ordered sequence of symbols, acting as a stochastic state machine that generates a symbol each time a transition is made from one state to the next. Transitions between states are specified by transition probabilities.
Hidden layer(隐藏层)
An inner layer within a neural network that receives its input and sends its output to other layers within the network. One function of the hidden layer is to detect covariation within the input data, such as patterns of amino acid covariation that are associated with a particular type of secondary structure in proteins.
Hierarchical clustering(分级聚类)
The clustering or grouping of objects based on some single criterion of similarity or difference. An example is the clustering of genes in a microarray experiment based on the correlation between their expression patterns. The distance method used in phylogenetic analysis is another example.
Hill climbing
A nonoptimal search algorithm that selects the singular best possible solution at a given state or step. The solution may result in a locally best solution that is not a globally best solution.
Homology(同源性)
A similar component in two organisms (e.g., genes with strongly similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.
Horizontal transfer(水平转移)
The transfer of genetic material between two distinct species that do not ordinarily exchange genetic material. The transferred DNA becomes established in the recipient genome and can be detected by a novel phylogenetic history and codon content compared to the rest of the genome.
HSP (高比值片段对)
High-scoring segment pair. Local alignments with no gaps that achieve one of the top alignment scores in a given search.
HTGS/HTG(高通量基因组序列)
High-throughput genome sequences.
HTML(超文本标识语言)
The Hyper-Text Markup Language (HTML) provides a structural description of a document using a specified tag set. HTML currently serves as the Internet lingua franca for describing hypertext Web page documents.
Hyperplane
A generalization of the two-dimensional plane to N dimensions.
Hypercube
A generalization of the three-dimensional cube to N dimensions.
Identity (相同性/相同率)
The extent to which two (nucleotide or amino acid) sequences are invariant.
Indel(插入或删除的缩略语)
An insertion or deletion in a sequence alignment.
Information content (of a scoring matrix)
A representation of the degree of sequence conservation in a column of a scoring matrix representing an alignment of related sequences. It is also the number of questions that must be asked to match the column to a position in a test sequence. For bases, the maximum possible number is 2, and for proteins, 4.32 (logarithm to the base 2 of the number of possible sequence characters).
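As a rough sketch of how these maxima arise, the information content of one alignment column can be computed as log2 of the alphabet size minus the column's Shannon entropy. This assumes a uniform background and ignores small-sample corrections; the function name is hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet_size):
    """Information content (bits) of one alignment column.

    column        -- string of the characters observed in that column
    alphabet_size -- 4 for bases, 20 for amino acids
    """
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in Counter(column).values())
    return math.log2(alphabet_size) - entropy
```

A fully conserved column of bases reaches the maximum of 2 bits, a fully conserved protein column reaches log2(20), about 4.32 bits, and a column with all four bases equally represented carries 0 bits.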
Information theory(信息理论)
A branch of mathematics that measures information in terms of bits, the minimal amount of structural complexity needed to encode a given piece of information.
Input layer(输入层)
The initial layer in a feed-forward neural net. This layer encodes input information that will be fed through the network model.
Interface definition language
Used to define an interface to an object model in a programming language neutral form, where an interface is an abstraction of a service defined only by the operations that can be performed on it.
Internet(因特网)
The network infrastructure, consisting of cables interconnected by routers, that provides global connectivity for individual computers and private networks of computers. A second sense of the word internet is the collective computer resources available over this global network.
Interpolated Markov model
A type of Markov model of sequences that examines sequences for patterns of variable length in order to discriminate best between genes and non-gene sequences.
Intranet(内部网)
Intron (内含子)
Non-coding region of DNA.
Iterative(反复的/迭代的)
A sequence of operations in a procedure that is performed repeatedly.
Java(一种由SUN Microsystem开发的编程语言)
K (BLAST程序的一个统计参数)
A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for search space size. The value K is used in converting a raw score (S) to a bit score (S’).
K-tuple(字/字长)
Identical short stretches of sequences, also called words.
lambda (λ,BLAST程序的一个统计参数)
A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for scoring system. The value lambda is used in converting a raw score (S) to a bit score (S’).
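The roles of K and lambda can be illustrated with the standard Karlin-Altschul relations: the bit score is S' = (lambda*S - ln K) / ln 2, and the expected number of chance hits is E = m*n*2^(-S'), where m and n are the query and database lengths. The sketch below assumes these relations; the function and parameter names are chosen for illustration.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(raw_score, lam, k, query_len, db_len):
    """Expected number of chance alignments scoring at least raw_score."""
    return query_len * db_len * 2.0 ** (-bit_score(raw_score, lam, k))
```

Because the bit score absorbs K and lambda, bit scores from searches run with different scoring systems can be compared directly, which raw scores cannot.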
LAN(局域网)
Local area network.
Likelihood(似然性)
The hypothetical probability that an event which has already occurred would yield a specific outcome. Unlike probability, which refers to future events, likelihood refers to past events.
Linear discriminant analysis
An analysis in which a straight line is located on a graph between two sets of data points in a location that best separates the data points into two groups.
Local alignment(局部联配)
Attempts to align regions of sequences with the highest density of matches. In doing so, one or more islands of subalignments are created in the aligned sequences.
Log odds score(概率对数值)
The logarithm of an odds score. See also Odds score.
Low Complexity Region (LCR) (低复杂性区段)
Regions of biased composition including homopolymeric runs, short-period repeats, and more subtle overrepresentation of one or a few residues. The SEG program is used to mask or filter LCRs in amino acid queries. The DUST program is used to mask or filter LCRs in nucleic acid queries.
Machine learning(机器学习)
The training of a computational model of a process or classification scheme to distinguish between alternative possibilities.
Markov chain(马尔可夫链)
Describes a process that can be in one of a number of states at any given time. The Markov chain is defined by probabilities for each transition occurring; that is, probabilities of the occurrence of state s_j given that the current state is s_i. Substitutions in nucleic acid and protein sequences are generally assumed to follow a Markov chain in that each site changes independently of the previous history of the site. With this model, the number and types of substitutions observed over a relatively short period of evolutionary time can be extrapolated to longer periods of time. In performing sequence alignments and calculating the statistical significance of alignment scores, sequences are assumed to be Markov chains in which the choice of one sequence position is not influenced by another.
Masking (过滤)
Also known as Filtering. The removal of repeated or low complexity regions from a sequence in order to improve the sensitivity of sequence similarity searches performed with that sequence.
Maximum likelihood (phylogeny, alignment)(最大似然法)
The most likely outcome (tree or alignment), given a probabilistic model of evolutionary change in DNA sequences.
Maximum parsimony(最大简约法)
The minimum number of evolutionary steps required to generate the observed variation in a set of sequences, as found by comparison of the number of steps in all possible phylogenetic trees.
Method of moments
The mean or expected value of a variable is the first moment of the values of the variable around the mean, defined as that number from which the sum of deviations to all values is zero. The standard deviation is the second moment of the values about the mean, and so on.
Minimum spanning tree
Given a set of related objects classified by some similarity or difference score, the minimum spanning tree joins the most-alike objects on adjacent outer branches of a tree and then sequentially joins less-alike objects by more inward branches. The tree branch lengths are calculated by the same neighbor-joining algorithm that is used to build phylogenetic trees of sequences from a distance matrix. The sum of the resulting branch lengths between each pair of objects will be approximately that found by the classification scheme.
MMDB (分子建模数据库)
Molecular Modelling Database. A taxonomy assigned database of PDB (see PDB) files, and related information.
Molecular clock hypothesis(分子钟假设)
The hypothesis that sequences change at the same rate in the branches of an evolutionary tree.
Monte Carlo(蒙特卡罗法)
A method that samples possible solutions to a complex problem as a way to estimate a more general solution.
Motif (模序)
A short conserved region in a protein sequence. Motifs are frequently highly conserved parts of domains.
Multiple Sequence Alignment (多序列联配)
An alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions and/or ancestral residues are aligned in the same column. Clustal W is one of the most widely used multiple sequence alignment programs.
Mutation data matrix(突变数据矩阵,即PAM矩阵)
A scoring matrix compiled from the observation of point mutations between aligned sequences. Also refers to a Dayhoff PAM matrix in which the scores are given as log odds scores.
N50 length (N50长度,即覆盖50%所有核苷酸的最大序列重叠群长度)
A measure of the contig length (or scaffold length) containing a ‘typical’ nucleotide. Specifically, it is the maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of size at least L.
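The definition above translates directly into a short computation: sort the contigs from longest to shortest and accumulate lengths until half of all nucleotides are covered. The function name is illustrative.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least
    half of all nucleotides. Returns 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

For example, for contigs of lengths 2, 2, 2, 3, 3, 4, 8 and 8 (32 nucleotides in total), the two contigs of length 8 already hold 16 nucleotides, so the N50 length is 8.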
Nats (natural logarithm)
A number expressed in units of the natural logarithm.
NCBI (美国国家生物技术信息中心)
National Center for Biotechnology Information (USA). Created by the United States Congress in 1988, to develop information systems to support the biological research community.
Needleman-Wunsch algorithm(Needleman-Wunsch算法)
Uses dynamic programming to find global alignments between sequences.
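A minimal sketch of the dynamic programming recurrence follows. It computes only the optimal global alignment score, not the alignment itself, and it assumes a simple linear gap penalty; the match, mismatch, and gap values are illustrative defaults, not recommended settings.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the matrix is enough for the score; recovering the alignment itself would require storing the full matrix (or traceback pointers) and walking back from the final cell.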
Neighbor-joining method(邻接法)
Clusters together alike pairs within a group of related objects (e.g., genes with similar sequences) to create a tree whose branches reflect the degrees of difference among the objects.
Neural network(神经网络)
From artificial intelligence algorithms, techniques that involve a set of many simple units that hold symbolic data, which are interconnected by a network of links associated with numeric weights. Units operate only on their symbolic data and on the inputs that they receive through their connections. Most neural networks use a training algorithm (see Back-propagation) to adjust connection weights, allowing the network to learn associations between various input and output patterns. See also Feed-forward neural network.
NIH (美国国家卫生研究院)
National Institutes of Health (USA).
Noise(噪音)
In sequence analysis, a small amount of randomly generated variation in sequences that is added to a model of the sequences; e.g., a hidden Markov model or scoring matrix, in order to avoid the model overfitting the sequences. See also Overfitting.
Normal distribution(正态分布)
The distribution found for many types of data such as body weight, size, and exam scores. The distribution is a bell-shaped curve that is described by a mean and standard deviation of the mean. Local sequence alignment scores between unrelated or random sequences do not follow this distribution but instead the extreme value distribution which has a much extended tail for higher scores. See also Extreme value distribution.
Object Management Group (OMG)(国际对象管理协作组)
A not-for-profit corporation that was formed to promote component-based software by introducing standardized object software. The OMG establishes industry guidelines and detailed object management specifications in order to provide a common framework for application development. Within OMG is a Life Sciences Research group, a consortium representing pharmaceutical companies, academic institutions, software vendors, and hardware vendors who are working together to improve communication and inter-operability among computational resources in life sciences research. See CORBA.
Object-oriented database(面向对象数据库)
Unlike relational databases (see entry), which use a tabular structure, object-oriented databases attempt to model the structure of a given data set as closely as possible. In doing so, object-oriented databases tend to reduce the appearance of duplicated data and the complexity of query structure often found in relational databases.
Odds score(概率/几率值)
The ratio of the likelihoods of two events or outcomes. In sequence alignments and scoring matrices, the odds score for matching two sequence characters is the ratio of the frequency with which the characters are aligned in related sequences divided by the frequency with which those same two characters align by chance alone, given the frequency of occurrence of each in the sequences. Odds scores for a set of individually aligned positions are obtained by multiplying the odds scores for each position. Odds scores are often converted to logarithms to create log odds scores that can be added to obtain the log odds score of a sequence alignment.
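The relationship between odds scores and log odds scores can be sketched as below. The frequencies in the usage note are made-up illustrative numbers, and base 2 logarithms (bits) are an assumption; other bases are also used in practice.

```python
import math

def log_odds_bits(aligned_freq, freq_a, freq_b):
    """Log odds score (bits) for aligning characters a and b.

    aligned_freq   -- frequency with which a and b are aligned in related sequences
    freq_a, freq_b -- background frequencies of a and b
    """
    return math.log2(aligned_freq / (freq_a * freq_b))

def alignment_log_odds(position_scores):
    """Log odds score of an alignment: the sum of the per-position log odds."""
    return sum(position_scores)
```

If two residues each occur with background frequency 0.1 but are found aligned with frequency 0.02, the odds score is 2 and the log odds score is about 1 bit. Summing log odds over positions is equivalent to multiplying the underlying odds scores.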
OMIM (一种人类遗传疾病数据库)
Online Mendelian Inheritance in Man. Database of genetic diseases with references to molecular medicine, cell biology, biochemistry and clinical details of the diseases.
Optimal alignment(最佳联配)
The highest-scoring alignment found by an algorithm capable of producing multiple solutions. This is the best possible alignment that can be found, given any parameters supplied by the user to the sequence alignment program.
ORF (开放阅读框)
Open Reading Frame. A series of codons (base triplets) which can be translated into a protein. There are six potential reading frames of an unidentified sequence; TBLASTN (see BLAST) translates a nucleotide sequence in all six reading frames, into a protein, then attempts to align the results to sequences in a protein database, returning the results as a nucleotide sequence. The most likely reading frame can be identified using on-line software (e.g. ORF Finder).
Orthologous(直系同源)
Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. A pair of genes found in two species are orthologous when the encoded proteins are 60-80% identical in an alignment. The proteins almost certainly have the same three-dimensional structure, domain structure, and biological function, and the encoding genes have originated from a common ancestor gene at an earlier evolutionary time. Two orthologs I and II in genomes A and B, respectively, may be identified when the complete genomes of two species are available: (1) in a database similarity search of all of the proteome of B using I as a query, II is the best hit found, and (2) I is the best hit when II is used as a query of the proteome of A. The best hit is the database sequence with the lowest expect value (E). Orthology is also predicted by a very close phylogenetic relationship between sequences or by a cluster analysis. Compare to Paralogs. See also Cluster analysis.
Output layer(输出层)
The final layer of a neural network in which signals from lower levels in the network are input into output states where they are weighted and summed to give an output signal. For example, the output signal might be the prediction of one type of protein secondary structure for the central amino acid in a sequence window.
Overfitting
Can occur when using a learning algorithm to train a model such as a neural net or hidden Markov model. Overfitting refers to the model becoming too highly representative of the training data and thus no longer representative of the overall range of data that is supposed to be modeled.

P value (P值/概率值)
The probability of an alignment occurring with the score in question or better. The P value is calculated by relating the observed alignment score, S, to the expected distribution of HSP scores from comparisons of random sequences of the same length and composition as the query to the database. The most highly significant P values will be those close to 0. P values and E values are different ways of representing the significance of the alignment.
Pair-wise sequence alignment(双序列联配)
An alignment performed between two sequences.
PAM (可接受突变百分率/可以观察到的突变百分率,它可作为一种进化时间单位)
Percent Accepted Mutation. A unit introduced by Dayhoff et al. to quantify the amount of evolutionary change in a protein sequence. 1.0 PAM unit is the amount of evolution which will change, on average, 1% of amino acids in a protein sequence. A PAM(x) substitution matrix is a look-up table in which scores for each amino acid substitution have been calculated based on the frequency of that substitution in closely related proteins that have experienced a certain amount (x) of evolutionary divergence.
Paralogous (旁系同源)
Homologous sequences within a single species that arose by gene duplication. Genes that are related through gene duplication events. These events may lead to the production of a family of related proteins with similar biological functions within a species. Paralogous gene families within a species are identified by using an individual protein as a query in a database similarity search of the entire proteome of an organism. The process is repeated for the entire proteome and the resulting sets of related proteins are then searched for clusters that are most likely to have a conserved domain structure and should represent a paralogous gene family.
Parametric sequence alignment
An algorithm that finds a range of possible alignments based on varying the parameters of the scoring system for matches, mismatches, and gap penalties. An example is the Bayes block aligner.
PDB (主要蛋白质结构数据库之一)
Brookhaven Protein Data Bank. A database and format of files which describe the 3D structure of a protein or nucleic acid, as determined by X-ray crystallography or nuclear magnetic resonance (NMR) imaging. The molecules described by the files are usually viewed locally by dedicated software, but can sometimes be visualised on the world wide web.
Pearson correlation coefficient(Pearson相关系数)
A measure of the correlation between two variables that reflects the degree to which the two variables are related. For example, the coefficient is used as a measure of similarity of gene expression in a microarray experiment. See also Correlation coefficient.
Percent identity
The percentage of the columns in an alignment of two sequences that includes identical amino acids. Columns in the alignment that include gaps are not scored in the calculation.
Percent similarity(相似百分率)
The percentage of the columns in an alignment of two sequences that includes either identical amino acids or amino acids that are frequently found substituted for each other in sequences of related proteins (conservative substitutions). These substitutions may be found in an amino acid substitution matrix such as the Dayhoff PAM and Henikoff BLOSUM matrices. Columns in the alignment that include gaps are not scored in the calculation.
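Percent identity and percent similarity can be sketched for a pair of already aligned sequences as follows. The sketch assumes that "not scored" means gap-containing columns are excluded from both the numerator and the denominator, and the set of conservative substitutions is supplied by the caller rather than derived from a PAM or BLOSUM matrix; all names are illustrative.

```python
def _scored_columns(aln_a, aln_b):
    """Columns of the alignment in which neither sequence has a gap."""
    return [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]

def percent_identity(aln_a, aln_b):
    cols = _scored_columns(aln_a, aln_b)
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)

def percent_similarity(aln_a, aln_b, similar_pairs):
    """similar_pairs: set of frozensets, e.g. {frozenset("IL")} (hypothetical choice)."""
    cols = _scored_columns(aln_a, aln_b)
    if not cols:
        return 0.0
    hits = sum(x == y or frozenset((x, y)) in similar_pairs for x, y in cols)
    return 100.0 * hits / len(cols)
```

For the aligned pair `AC-GT` and `ACTGA`, four columns are scored and three are identical, giving 75% identity.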
Perceptron(感知器,模拟人类视神经控制系统的图形识别机)
A neural network in which input and output states are directly connected without intervening hidden layers.
PHRED (一种广泛应用的原始序列分析程序,可以对序列的各个碱基进行识别和质量评价)
A widely used computer program that analyses raw sequence to produce a ‘base call’ with an associated ‘quality score’ for each position in the sequence. A PHRED quality score of X corresponds to an error probability of approximately 10^(-X/10). Thus, a PHRED quality score of 30 corresponds to 99.9% accuracy for the base call in the raw read.
PHRAP (一种广泛应用的原始序列组装程序)
A widely used computer program that assembles raw sequence into sequence contigs and assigns to each position in the sequence an associated ‘quality score’, on the basis of the PHRED scores of the raw sequence reads. A PHRAP quality score of X corresponds to an error probability of approximately 10^(-X/10). Thus, a PHRAP quality score of 30 corresponds to 99.9% accuracy for a base in the assembled sequence.
Phylogenetic studies(系统发育研究)
PIR (主要蛋白质序列数据库之一,翻译自GenBank)
A database of translated GenBank nucleotide sequences. PIR is a redundant (see Redundancy) protein sequence database. The database is divided into four categories:
PIR1 – Classified and annotated.
PIR2 – Annotated.
PIR3 – Unverified.
PIR4 – Unencoded or untranslated.
Poisson distribution(帕松分布)
Used to predict the occurrence of infrequent events over a long period of time or when there are a large number of trials. In sequence analysis, it is used to calculate the chance that one pair of a large number of pairs of unrelated sequences may give a high local alignment score.
Position-specific scoring matrix (PSSM)(特定位点记分矩阵,PSI-BLAST等搜索程序使用)
The PSSM gives the log-odds score for finding a particular matching amino acid in a target sequence. Represents the variation found in the columns of an alignment of a set of related sequences. Each subsequent matrix column corresponds to the next column in the alignment and each row corresponds to a particular sequence character (one of four bases in DNA sequences or 20 amino acids in protein sequences). Matrix values are log odds scores obtained by taking the counts of the residue in the alignment, dividing by the expected number of counts based on sequence composition, and converting the ratio to a log score. The matrix is moved along sequences to find similar regions by adding the matching log odds scores and looking for high values. There is no allowance for gaps. Also called a weight matrix or scoring matrix.
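The construction and scanning steps described above can be sketched as follows. This is a simplified illustration for ungapped, equal-length DNA sequences; the pseudocount scheme, the uniform default background, and the function names are assumptions made here, not the method of any specific program.

```python
import math

def build_pssm(aligned, alphabet="ACGT", background=None, pseudocount=1.0):
    """One dict of log odds scores (bits) per alignment column."""
    background = background or {ch: 1.0 / len(alphabet) for ch in alphabet}
    n = len(aligned)
    pssm = []
    for col in zip(*aligned):  # iterate over alignment columns
        scores = {}
        for ch in alphabet:
            # Pseudocounts keep unseen characters from producing log(0).
            freq = (col.count(ch) + pseudocount * background[ch]) / (n + pseudocount)
            scores[ch] = math.log2(freq / background[ch])
        pssm.append(scores)
    return pssm

def best_match(pssm, sequence):
    """Start index of the highest-scoring ungapped window in sequence."""
    width = len(pssm)
    return max(range(len(sequence) - width + 1),
               key=lambda i: sum(pssm[k][sequence[i + k]] for k in range(width)))
```

Scanning simply slides the matrix along the sequence and sums the per-column scores for each window, which is why no gaps can be accommodated.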
Posterior (Bayesian analysis)
A conditional probability based on prior knowledge and newly evaluated relationships among variables using Bayes rule. See also Bayes rule.
Prior (Bayesian analysis)
The expected distribution of a variable based on previous data.
Profile(分布型)
A matrix representation of a conserved region in a multiple sequence alignment that allows for gaps in the alignment. The rows include scores for matching sequential columns of the alignment to a test sequence. The columns include substitution scores for amino acids and gap penalties. See also PSSM.
Profile hidden Markov model(分布型隐马尔可夫模型)
A hidden Markov model of a conserved region in a multiple sequence alignment that includes gaps and may be used to search new sequences for similarity to the aligned sequences.
Proteome(蛋白质组)
The entire collection of proteins that are encoded by the genome of an organism. Initially the proteome is estimated by gene prediction and annotation methods but eventually will be revised as more information on the sequence of the expressed genes is obtained.
Proteomics (蛋白质组学)
Systematic analysis of protein expression of normal and diseased tissues that involves the separation, identification and characterization of all of the proteins in an organism.
Pseudocounts
A small number of counts that is added to the columns of a scoring matrix to increase the variability either to avoid zero counts or to add more variation than was found in the sequences used to produce the matrix.
PSI-BLAST (BLAST系列程序之一)
Position-Specific Iterative BLAST. An iterative search using the BLAST algorithm. A profile is built after the initial search, which is then used in subsequent searches. The process may be repeated, if desired with new sequences found in each cycle used to refine the profile. Details can be found in this discussion of PSI-BLAST. (Altschul et al.)
PSSM (特定位点记分矩阵)
See position-specific scoring matrix and profile.
Public sequence databases (公共序列数据库,指GenBank、EMBL和DDBJ)
The three coordinated international sequence databases: GenBank, the EMBL data library and DDBJ.
Q20 (Quality score 20)
A quality score of ≥ 20 indicates that there is at most a 1 in 100 chance that the base call is incorrect. These are consequently high-quality bases. Specifically, the quality value "q" assigned to a basecall is defined as:
q = -10 x log10(p)
where p is the estimated error probability for that basecall. Note that high quality values correspond to low error probabilities, and conversely.
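The formula above and its inverse can be written as two small helper functions; the names are illustrative.

```python
import math

def phred_quality(error_prob):
    """Quality value q = -10 * log10(p) for an estimated error probability p."""
    return -10.0 * math.log10(error_prob)

def error_probability(quality):
    """Inverse relation: p = 10^(-q/10)."""
    return 10.0 ** (-quality / 10.0)
```

A quality value of 20 corresponds to an error probability of 0.01 (1 in 100), and a quality value of 30 to 0.001 (99.9% accuracy).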
Quality trimming
This is an algorithm which uses a sliding window of 50 bases and trims from the 5′ end of the read followed by the 3′ end. With each window, the number of low quality (10 or less) bases is determined. If more than 5 bases are below the threshold quality, the window is incremented by one base and the process is repeated. When the low quality test fails, the position where it stopped is recorded. The parameters for window length, low quality threshold, and number of low quality bases tolerated are fixed. The positions of the 5′ and 3′ boundaries of the quality region are noted in the plot of quality values presented in the "Chromatogram Details" report.
Query (待查序列/搜索序列)
The input sequence (or other type of search term) with which all of the entries in a database are to be compared.
Radiation hybrid (RH) map (辐射杂交图谱)
A genome map in which STSs are positioned relative to one another on the basis of the frequency with which they are separated by radiation-induced breaks. The frequency is assayed by analysing a panel of human–hamster hybrid cell lines, each produced by lethally irradiating human cells and fusing them with recipient hamster cells such that each carries a collection of human chromosomal fragments. The unit of distance is centirays (cR), denoting a 1% chance of a break occurring between two loci.
Raw Score (初值,指最初得到的联配值S)
The score of an alignment, S, calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (see PAM, BLOSUM). Gap scores are typically calculated as the sum of G, the gap opening penalty, and L, the gap extension penalty. For a gap of length n, the gap cost would be G+Ln. The choice of gap costs, G and L, is empirical, but it is customary to choose a high value for G (10-15) and a low value for L (1-2).
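A minimal Python sketch of the raw score under the affine gap cost G+Ln is shown below. The match and mismatch values stand in for a real look-up table such as BLOSUM and are purely illustrative, as are the function names. Adjacent gaps in opposite rows are treated as one run, which is a simplification.

```python
def gap_cost(n, gap_open=11, gap_extend=1):
    """Cost of one gap of length n: G + L*n."""
    return gap_open + gap_extend * n

def raw_score(row_a, row_b, match=5, mismatch=-4, gap_open=11, gap_extend=1):
    """Sum substitution scores and subtract the cost of each gap run.

    row_a and row_b are the two rows of an alignment, with '-' for gaps.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    run = 0  # length of the gap run currently being read
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            run += 1
            continue
        if run:
            score -= gap_cost(run, gap_open, gap_extend)
            run = 0
        score += match if a == b else mismatch
    if run:
        score -= gap_cost(run, gap_open, gap_extend)
    return score

s = raw_score("ACGT--AC", "ACGTTTAC")
print(s)
```

In this toy alignment the six matches contribute 30 and the single gap of length 2 costs 11 + 1×2 = 13.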
Raw sequence (原始序列/读胶序列)
Individual unassembled sequence reads, produced by sequencing of clones containing DNA inserts.
Receiver operator characteristic
The receiver operator characteristic (ROC) curve describes the probability that a test will correctly declare the condition present against the probability that the test will declare the condition present when it is actually absent. This is shown through a graph of the test's sensitivity against one minus the test's specificity for different possible threshold values.
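The points of such a curve can be computed directly from scored examples. The Python sketch below assumes that higher scores mean "condition present" and that labels are 1 (present) or 0 (absent); the function name and toy data are illustrative.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points

pts = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
print(pts)
```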
Redundancy (冗余)
The presence of more than one identical item represents redundancy. In bioinformatics, the term is used with reference to the sequences in a sequence database. If a database is described as being redundant, more than one identical (redundant) sequence may be found. If the database is said to be non-redundant (nr), the database managers have attempted to reduce the redundancy. The term is ambiguous with reference to genetics, and as such, the degree of non-redundancy varies according to the database manager’s interpretation of the term. One can argue whether or not two alleles of a locus define the limit of redundancy, or whether the same locus in different, closely related organisms constitutes redundancy. Non-redundant databases are, in some ways, superior, but are less complete. These factors should be taken into consideration when selecting a database to search.
Regular expressions
This computational tool provides a method for expressing the variations found in a set of related sequences including a range of choices at one position, insertions, repeats, and so on. For example, these expressions are used to characterize variations found in protein domains in the PROSITE catalog.
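As an illustration, a PROSITE-style pattern can be rewritten as an ordinary regular expression. The Python sketch below handles only the common pattern elements (x for any residue, square brackets for allowed residues, curly braces for forbidden residues, parentheses for repeat counts, and the < and > end anchors). The helper name is hypothetical, and the N-glycosylation pattern N-{P}-[ST]-{P} is used as an example.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into Python regex syntax."""
    out = pattern.rstrip(".").replace("-", "")   # drop element separators
    out = out.replace("x", ".")                   # x = any residue
    out = out.replace("{", "[^").replace("}", "]")  # {P} = any residue but P
    out = out.replace("(", "{").replace(")", "}")    # x(2,4) = 2 to 4 repeats
    out = out.replace("<", "^").replace(">", "$")    # sequence-end anchors
    return out

glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(glyco, "MKNGSAQ")
print(glyco, hit.group())
```

The conversion is purely textual, so it is only suitable for well-formed patterns written with upper-case residue codes.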
Regularization
A set of techniques for reducing data overfitting when training a model. See also Overfitting.
Relational database(关系数据库)
Organizes information into tables where each column represents the fields of information that can be stored in a single record. Each row in the table corresponds to a single record. A single database can have many tables and a query language is used to access the data. See also Object-oriented database.
Scaffold (支架,由序列重叠群拼接而成)
The result of connecting contigs by linking information from paired-end reads from plasmids, paired-end reads from BACs, known messenger RNAs or other sources. The contigs in a scaffold are ordered and oriented with respect to one another.
Scoring matrix(记分矩阵)
See Position-specific scoring matrix.
SEG (一种蛋白质程序低复杂性区段过滤程序)
A program for filtering low complexity regions in amino acid sequences. Residues that have been masked are represented as "X" in an alignment. SEG filtering is performed by default in the blastp subroutine of BLAST 2.0. (Wootton and Federhen)
Selectivity (in database similarity searches)(数据库相似性搜索的选择准确性)
The ability of a search method to locate members of a protein family without making a false-positive classification of members of other families.
Sensitivity (in database similarity searches)(数据库相似性搜索的灵敏性)
The ability of a search method to locate as many members of a protein family as possible, including distant members of limited sequence similarity.
Sequence Tagged Site (序列标签位点)
Short cDNA sequences of regions that have been physically mapped. STSs provide unique landmarks, or identifiers, throughout the genome. Useful as a framework for further sequencing.
Significance(显著水平)
A significant result is one that has not simply occurred by chance, and therefore is probably true. Significance levels show how likely a result is due to chance, expressed as a probability. In sequence analysis, the significance of an alignment score may be calculated as the chance that such a score would be found between random or unrelated sequences. See Expect value.
Similarity score (sequence alignment) (相似性值)
Similarity means the extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score. The similarity score is the sum of the number of identical matches and conservative (high scoring) substitutions in a sequence alignment divided by the total number of aligned sequence characters. Gaps are usually ignored.
Simulated annealing
A search algorithm that attempts to solve the problem of finding global extrema. The algorithm was inspired by the physical cooling process of metals and the freezing process in liquids where atoms slow down in movement and line up to form a crystal. The algorithm traverses the energy levels of a function, always accepting energy levels that are smaller than previous ones, but sometimes accepting energy levels that are greater, according to the Boltzmann probability distribution.
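The acceptance rule described above can be sketched in a few lines of Python. This is a generic one-dimensional minimiser under assumed settings (geometric cooling, uniform random steps, a fixed seed for repeatability), not a reference implementation; all parameter names and defaults are illustrative.

```python
import math
import random

def simulated_annealing(energy, start, steps=5000, t_start=1.0,
                        cooling=0.999, step_size=0.5, seed=0):
    """Minimise energy(x), accepting uphill moves with Boltzmann probability."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    t = t_start
    for _ in range(steps):
        cand = x + rng.uniform(-step_size, step_size)
        cand_e = energy(cand)
        delta = cand_e - e
        # Always accept lower energies; accept higher ones with
        # probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # lower the temperature gradually
    return best_x, best_e

best_x, best_e = simulated_annealing(lambda x: (x - 3.0) ** 2, start=10.0)
print(best_x, best_e)
```

The toy energy function has a single minimum at x = 3. On functions with many local minima, the occasional uphill moves are what allow the search to escape them while the temperature is still high.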
Single-linkage cluster analysis
An analysis of a group of related objects, e.g., similar proteins in different genomes, to identify both close and more distant relationships, represented on a tree or dendrogram. The method joins the most closely related pairs by the neighbor-joining algorithm by representing these pairs as outer branches on the tree. More distant objects are then progressively added to lower tree branches. The method is also used to predict phylogenetic relationships by distance methods. See also Hierarchical clustering, Neighbor-joining method.
Smith-Waterman algorithm(Smith-Waterman算法)
Uses dynamic programming to find local alignments between sequences. The key feature is that all negative scores calculated in the dynamic programming matrix are changed to zero in order to avoid extending poorly scoring alignments and to assist in identifying local alignments starting and stopping anywhere within the matrix.
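A minimal Python sketch of the scoring recurrence is given below. It returns only the best local score, uses a simple linear gap penalty, and takes illustrative match and mismatch values rather than a real substitution matrix. Traceback is omitted for brevity.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Negative values are reset to zero so that a new local
            # alignment can start at any cell.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best

sw = smith_waterman_score("TTTACGTTT", "GGACGTGG")
print(sw)
```

Here the shared segment ACGT gives the best local score of four matches, and the flanking mismatches are never included because extending into them would lower the score.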
SNP (单核苷酸多态性)
Single nucleotide polymorphism, or a single nucleotide position in the genome sequence for which two or more alternative alleles are present at appreciable frequency (traditionally, at least 1%) in the human population.
Space or time complexity(时间或空间复杂性)
An algorithm's complexity is the maximum amount of computer memory (space) or number of algorithmic steps (time) required to solve a problem.
Specificity (in database similarity searches)(数据库相似性搜索的特异性)
The ability of a search method to locate members of one protein family, including distantly related members.
SSR (简单序列重复)
Simple sequence repeat, a sequence consisting largely of a tandem repeat of a specific k-mer (such as (CA)15). Many SSRs are polymorphic and have been widely used in genetic mapping.
Stochastic context-free grammar
A formal representation of groups of symbols in different parts of a sequence; i.e., not in the same context. An example is complementary regions in RNA that will form secondary structures. The stochastic feature introduces variability into such regions.
Stringency
Refers to the minimum number of matches required within a window. See also Filtering.
STS (序列标签位点的缩写)
See Sequence Tagged Site
Substitution (替换)
The presence of a non-identical amino acid at a given position in an alignment. If the aligned residues have similar physico-chemical properties the substitution is said to be "conservative".
Substitution Matrix (替换矩阵)
A matrix containing values proportional to the probability that amino acid i mutates into amino acid j for all pairs of amino acids. Such matrices are constructed by assembling a large and diverse sample of verified pairwise alignments of amino acids. If the sample is large enough to be statistically significant, the resulting matrices should reflect the true probabilities of mutations occurring through a period of evolution.
Sum of pairs method
Sums the substitution scores of all possible pairwise combinations of sequence characters in one column of a multiple sequence alignment.
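For a single alignment column, the sum of pairs can be sketched in Python as follows. The +1/-1 scoring function is a placeholder for a real substitution matrix, and gap characters are not treated specially here; both are assumptions.

```python
from itertools import combinations

def toy_score(a, b):
    """Placeholder substitution score: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1

def sum_of_pairs(column, score=toy_score):
    """Sum the scores of every unordered pair of characters in one column."""
    return sum(score(a, b) for a, b in combinations(column, 2))

sp = sum_of_pairs("AAC")
print(sp)
```

The column AAC has three pairs: one identical pair (A, A) and two mismatched pairs (A, C), giving 1 - 1 - 1 = -1.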
SWISS-PROT (主要蛋白质序列数据库之一)
A non-redundant (See Redundancy) protein sequence database. Thoroughly annotated and cross referenced. A subdivision is TrEMBL.
Synteny
The presence of a set of homologous genes in the same order on two genomes.
Threading
In protein structure prediction, the aligning of the sequence of a protein of unknown structure with a known three-dimensional structure to determine whether the amino acid sequence is spatially and chemically compatible with that structure.
TrEMBL (蛋白质数据库之一,翻译自EMBL)
A protein sequence database of Translated EMBL nucleotide sequences.
Uncertainty(不确定性)
From information theory, a logarithmic measure of the average number of choices that must be made for identification purposes. See also Information content.
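Assuming that the measure meant here is Shannon's, H = -Σ p·log2(p), the uncertainty of one alignment column can be computed as in this Python sketch. The function name is illustrative.

```python
import math
from collections import Counter

def shannon_uncertainty(symbols):
    """Uncertainty in bits of the symbol distribution observed in `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

h_mixed = shannon_uncertainty("ACGT")      # four equally likely bases
h_conserved = shannon_uncertainty("AAAA")  # fully conserved column
print(h_mixed, h_conserved)
```

Four equally frequent bases give 2 bits, that is, two yes/no choices are needed to identify the base, while a fully conserved column gives 0 bits.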
Unified Modeling Language (UML)
A standard sanctioned by the Object Management Group that provides a formal notation for describing object-oriented design.
UniGene (人类基因数据库之一)
Database of unique human genes, at NCBI. Entries are selected by near identical presence in GenBank and dbEST databases. The clusters of sequences produced are considered to represent a single gene.
Unitary Matrix (一元矩阵)
Also known as Identity Matrix. A scoring system in which only identical characters receive a positive score.
URL(统一资源定位符)
Uniform resource locator.
Viterbi algorithm
Calculates the optimal path of a sequence through a hidden Markov model of sequences using a dynamic programming algorithm.
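The dynamic programming step can be sketched in Python as below. The two-state model, its symbols, and all probabilities are made-up toy values chosen only to exercise the code. Practical implementations usually work with log probabilities to avoid numerical underflow on long sequences.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs and its probability."""
    prob = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        prob.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at position t.
            prev, p = max(((r, prob[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda rp: rp[1])
            prob[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: prob[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])  # follow back-pointers
    path.reverse()
    return path, prob[-1][last]

# Hypothetical toy model with two hidden states and three symbols.
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"x": 0.5, "y": 0.4, "z": 0.1},
          "S2": {"x": 0.1, "y": 0.3, "z": 0.6}}
path, p_best = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, p_best)
```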
Weight matrix
See Position-specific scoring matrix.

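International team completes genome sequencing of the multicellular alga Volvox

A step toward better understanding the evolutionary history of complex organisms such as humans

Bielefeld University in Germany reported on July 9 that an international research team has recently completed the genome sequence of Volvox, the simplest multicellular organism. The researchers hope the sequence will help explain how single-celled organisms evolved into multicellular ones.

How single-celled organisms could evolve into multicellular organisms, and eventually into organisms as complex as humans, has long been a major question in biology. A team of researchers from Germany, the United States, Canada, and Japan chose to start with Volvox because it has very few cell types. Volvox also has a unicellular close relative, Chlamydomonas reinhardtii, whose genome sequence was completed in 2007.

In a report published on July 9 in the US journal Science, the team found that the Volvox genome has about 140 million base pairs and contains about 14,500 genes, which is fewer than the human total by less than half. Experts at Bielefeld University who took part in the study said that, when comparing the Volvox and Chlamydomonas reinhardtii genomes, the team was surprised to find that the two genomes have similar protein-coding potential even though the organisms differ greatly in complexity and life history. Compared with Chlamydomonas reinhardtii, the experts found only a few genes specific to Volvox. The researchers therefore infer that the transition from unicellular to multicellular life did not require a large increase in gene number; what is decisive in this transition is how and when genes direct the synthesis of particular proteins.

The German experts said that the Volvox genome sequence is an important step toward understanding the molecular mechanisms by which single-celled organisms evolved into multicellular ones. In the long run, studying the molecular mechanisms of simple organisms will help us better understand the evolutionary history of complex organisms such as humans.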

 

Science 9 July 2010:
Vol. 329. no. 5988, pp. 223 – 226
DOI: 10.1126/science.1188800

Reports

Genomic Analysis of Organismal Complexity in the Multicellular Green Alga Volvox carteri

Simon E. Prochnik,1,* James Umen,2,*,† Aurora M. Nedelcu,3 Armin Hallmann,4 Stephen M. Miller,5 Ichiro Nishii,6 Patrick Ferris,2 Alan Kuo,1 Therese Mitros,7 Lillian K. Fritz-Laylin,7 Uffe Hellsten,1 Jarrod Chapman,1 Oleg Simakov,8 Stefan A. Rensing,9 Astrid Terry,1 Jasmyn Pangilinan,1 Vladimir Kapitonov,10 Jerzy Jurka,10 Asaf Salamov,1 Harris Shapiro,1 Jeremy Schmutz,11 Jane Grimwood,11 Erika Lindquist,1 Susan Lucas,1 Igor V. Grigoriev,1 Rüdiger Schmitt,12 David Kirk,13 Daniel S. Rokhsar1,7,†

The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are well suited for the investigation of the evolution of multicellularity and development. We sequenced the 138–mega–base pair genome of V. carteri and compared its ~14,500 predicted proteins to those of its unicellular relative Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials and few species-specific protein-coding gene predictions. Volvox is enriched in volvocine-algal–specific proteins, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.

1 U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA 94598, USA.
2 The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
3 University of New Brunswick, Department of Biology, Fredericton, New Brunswick E3B 5A3, Canada.
4 Department of Cellular and Developmental Biology of Plants, University of Bielefeld, D-33615 Bielefeld, Germany.
5 Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA.
6 Biological Sciences, Nara Women’s University, Nara-shi, Nara Prefecture 630-8506, Japan.
7 Center for Integrative Genomics, Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, CA 94720, USA.
8 European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
9 Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany.
10 Genetic Information Research Institute, 1925 Landings Drive, Mountain View, CA 94043, USA.
11 HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA.
12 Department of Genetics, University of Regensburg, D-93040 Regensburg, Germany.
13 Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA.

* These authors contributed equally to this work.

† To whom correspondence should be addressed. E-mail: umen@salk.edu (J.U.); dsrokhsar@gmail.com (D.S.R.)


Brain energy replenishment during sleep

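According to a study published June 30 in the Journal of Neuroscience, during the initial stage of sleep, energy levels rise sharply in brain regions that are active during wakefulness. The study suggests that the sleeping brain undergoes a replenishment of cellular energy, a process needed for the brain to function normally when awake.
A good night's rest restores our energy, but the biological processes that actually take place during sleep have been elusive. Scientists believe that brain energy levels are key to this nightly restoration. Study author Professor Basheer said that the work addresses an important biological question, the function of sleep. The authors were also somewhat surprised that in recent years no researchers had used the most sensitive measurement methods to study brain energy.
In the study, the researchers measured levels of adenosine triphosphate (ATP) in rats. They found that during non-rapid eye movement sleep, ATP levels rose in four key brain regions that are active during wakefulness, accompanied by an overall decrease in brain activity. When the rats were awake, ATP levels were stable. In addition, when rats were kept awake by gentle handling for 3 or 6 hours past their normal sleep time, ATP levels did not rise. The researchers therefore concluded that sleep is necessary for the ATP surge, and that keeping rats awake prevents ATP levels from rising.
Dr. Robert Greene, a US sleep researcher, said the study provides intriguing evidence for the theory that "a sleep-dependent energy surge is needed to facilitate the restorative biosynthetic processes." Greene also raised a question about the finding: the authors attribute the energy surge during sleep to reduced cellular activity in the brain, but it could equally be caused by other factors, including cell signaling processes in the brain.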

The Journal of Neuroscience doi:10.1523/JNEUROSCI.1423-10.2010
Sleep and Brain Energy Levels: ATP Changes during Sleep
Markus Dworak, Robert W. McCarley, Tae Kim, Anna V. Kalinchuk, and Radhika Basheer
Laboratory of Neuroscience, Department of Psychiatry, Veterans Affairs Boston Healthcare System and Harvard Medical School, West Roxbury, Massachusetts 02132
Sleep is one of the most pervasive biological phenomena, but one whose function remains elusive. Although many theories of function, indirect evidence, and even common sense suggest sleep is needed for an increase in brain energy, brain energy levels have not been directly measured with modern technology. We here report that ATP levels, the energy currency of brain cells, show a surge in the initial hours of spontaneous sleep in wake-active but not in sleep-active brain regions of rat. The surge is dependent on sleep but not time of day, since preventing sleep by gentle handling of rats for 3 or 6 h also prevents the surge in ATP. A significant positive correlation was observed between the surge in ATP and EEG non-rapid eye movement delta activity (0.5–4.5 Hz) during spontaneous sleep. Inducing sleep and delta activity by adenosine infusion into basal forebrain during the normally active dark period also increases ATP. Together, these observations suggest that the surge in ATP occurs when the neuronal activity is reduced, as occurs during sleep. The levels of phosphorylated AMP-activated protein kinase (P-AMPK), well known for its role in cellular energy sensing and regulation, and ATP show reciprocal changes. P-AMPK levels are lower during the sleep-induced ATP surge than during wake or sleep deprivation. Together, these results suggest that sleep-induced surge in ATP and the decrease in P-AMPK levels set the stage for increased anabolic processes during sleep and provide insight into the molecular events leading to the restorative biosynthetic processes occurring during sleep.

mRNA does far more than encode proteins

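The central dogma tells us that genetic information passes from DNA to mRNA and from mRNA to protein, completing the processes of transcription and translation. Under this dogma, mRNA appears to have a single function: encoding protein. Recently, researchers in the cancer genetics group at Beth Israel Deaconess Medical Center (BIDMC) in the United States argued that the function of mRNA is not so limited.
"Undercover agents" in the competition
In a paper published recently in Nature, the team reports that, beyond encoding proteins, the ability of RNAs to communicate with one another gives them a new function: regulating gene expression through competition. This function also appears in thousands of non-coding RNAs. The finding is significant because it could greatly expand the repertoire of functional genetic information known to scientists.
Earlier work held that microRNAs suppress gene expression by binding to mRNAs and blocking the transfer of genetic information from mRNA to protein, and that they are therefore linked to many human diseases, including cancer. The new study finds that nature has staged an elaborate "undercover" operation: thousands of mRNAs, non-coding RNAs, and so-called pseudogenes act as decoys for microRNAs, working "undercover" together and forming a new class of genetic elements. When these elements are altered, they can contribute to cancer or other human diseases.
Pier Paolo Pandolfi, director of the BIDMC cancer research center, who led the study, said that although microRNAs are traditionally thought to suppress mRNA function, the reverse may also be true. In other words, rather than the microRNA binding the mRNA, the RNA sequesters the microRNA, which protects expression of the mRNA and leaves the microRNA unable to act on its other target genes. The researchers call RNAs that act this way competitive endogenous RNAs.
To test their hypothesis further, the team turned to pseudogenes, which do not encode proteins. Because pseudogenes are more or less identical to their ancestral genes, they compete with the normal genes and can recognize and compete for the same microRNAs.
The team analyzed the interaction between RNA from the tumor suppressor gene PTEN and its closely related pseudogene PTENP1. Through this new mechanism, they showed that PTENP1 is also a tumor suppressor. They then applied the same approach to show that KRAS1P, a pseudogene related to the oncogene KRAS, is also oncogenic.
The enticing secret language of the "undercover agents"
Pandolfi, who is also the George C. Reisman Professor of Medicine at Harvard Medical School, said that non-coding RNA molecules in the cell share this new function. This means not only that scientists have found a new mode of action for mRNA, but also that the "language" used by 17,000 pseudogenes and as many as 10,000 long non-coding RNAs may be decoded. An estimated 30,000 new genetic elements could thus be functionally characterized, raising the understanding of regulation in cell and tumor biology to a new level and doubling the size of the functional genome.
Pandolfi said that scientists have now begun to pay attention to competition among RNA molecules. Although it was previously hard to sort out the relevant information, they now know how to listen to the language of RNA, use it to predict which RNAs are competitive endogenous RNAs, and work out their functions. Scientists have already identified thousands of RNA molecules associated with human disease. These findings lay a new foundation for biology and should help in developing new ways to quickly identify genes associated with human disease and to determine their function and role, improving diagnosis and treatment.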

Nature doi:10.1038/nature09144
A coding-independent function of gene and pseudogene mRNAs regulates tumour biology
Laura Poliseno, Leonardo Salmena, Jiangwen Zhang, Brett Carver, William J. Haveman & Pier Paolo Pandolfi
The canonical role of messenger RNA (mRNA) is to deliver protein-coding information to sites of protein synthesis. However, given that microRNAs bind to RNAs, we hypothesized that RNAs could possess a regulatory role that relies on their ability to compete for microRNA binding, independently of their protein-coding function. As a model for the protein-coding-independent role of RNAs, we describe the functional relationship between the mRNAs produced by the PTEN tumour suppressor gene and its pseudogene PTENP1 and the critical consequences of this interaction. We find that PTENP1 is biologically active as it can regulate cellular levels of PTEN and exert a growth-suppressive role. We also show that the PTENP1 locus is selectively lost in human cancer. We extended our analysis to other cancer-related genes that possess pseudogenes, such as oncogenic KRAS. We also demonstrate that the transcripts of protein-coding genes such as PTEN are biologically active. These findings attribute a novel biological role to expressed pseudogenes, as they can regulate coding gene expression, and reveal a non-coding function for mRNAs.

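Octopuses: high-IQ monsters of the sea

Fast growth
    Jules Verne dreamed of it, and the Pacific created it. A giant octopus known as Octopus dofleini rules the coastal waters of the North Pacific from California to Japan.
    This octopus is indeed large: males average 3 meters across and weigh 25 to 30 kg, while females measure under 2.5 meters across and weigh 15 kg. But no one knows exactly how large an octopus can grow, and the ocean sometimes casts giant specimens onto beaches. Until 1957, people did not even imagine that the sea held octopuses so large. In that year, a giant octopus stranded on a beach in British Columbia, Canada; it measured 9.6 meters across and weighed 272 kg. Jim Cosgrove, who has studied cephalopods for 25 years, believes that octopuses this large exist only in the waters off Canada's west coast, where the mild climate and abundant food allow them to grow to enormous size.
    Mimicry and camouflage
    Hunting is the octopus's main activity and the secret of its astonishing growth rate. By 2 to 3 years of age, an octopus feeds voraciously and reaches adult size, growing from 20 centimeters to 3 meters across. Scientists learned the octopus's diet from the piles of crab shells at the entrance to its den. Octopuses eat all kinds of crustaceans and shellfish. When prey passes near the den, the octopus grabs it with one flexible arm (easy enough, since it has eight). For fish and smaller sharks, the octopus has more refined hunting techniques, such as the "parachute" attack: it locates the prey, spreads its arms with their 1,600 suckers, and pounces, leaving the prey no chance to escape. Octopuses can also mimic their surroundings like the most adept chameleon, changing their color and texture to resemble an algae-covered rock and then pouncing before the prey realizes what has happened. Camouflage also helps octopuses evade fierce predators such as seals.
    Over millions of years, octopuses have evolved in different directions. Some can secrete a toxin potent enough to kill a human; in others, such as deep-sea octopuses, the suckers have become light-emitting organs that attract prey. But all octopuses possess "conceptual thinking" (a term that experts accepted after years of observation).
    Highly unusual anatomy
    Jim Cosgrove says: "The octopus is one of the creatures most different from humans ever to appear on Earth." Its well-developed eyes are its only similarity to humans. In other respects it is very different: it has three hearts, two memory systems (one in the brain and one linked directly to the suckers), and highly sensitive chemical and tactile receptors. The octopus brain has 500 million neurons and an extraordinary way of thinking that humans are still far from understanding. An octopus can solve complex problems on its own; that is, it has so-called "conceptual intelligence." Ever since the first experiments by the Cousteau research team more than 30 years ago (in which an octopus opened a corked jar and seized a lobster placed inside), octopuses have continued to astonish scientists, and sometimes to unsettle them. Some assert that the lurking octopus is waiting for its own reign to begin.
   Solitary by nature
    Octopuses live alone from the day they hatch. A young octopus needs very little time to learn the skills it requires, and unlike most animals, its learning does not rest on instruction from elders. It learns on its own to hunt, to camouflage itself, and to find a better den. Although its parents pass on some abilities through inheritance, it must work out the answers to new problems alone, and mastering them would take many years of learning. Yet the octopus's greatest misfortune is that it lives no longer than 3 to 5 years. A short life limits its chance to acquire knowledge and cuts off any prospect of deeper learning. Asked what would happen if octopuses lived longer, Jim Cosgrove answered: "Then octopuses would probably come onto land and do some reporting on humans!"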