FAPESP Genome Program
ONSA is a network of research laboratories in the State of São Paulo funded by FAPESP to implement its Genome Program. The Sugarcane EST project will be part of the ONSA network.
SUCEST - THE SUGARCANE EST PROJECT
INTRODUCTION
Sugarcane is one of the world's most important crop plants and is cultivated in tropical and subtropical regions of more than 80 countries. In 1995, 1.2 × 10^9 tons of sugarcane were produced on an estimated 18 million ha, used mainly for sugar production or as an energy source (ethanol and electricity). Brazil is responsible for 25% of world production, half of which comes from São Paulo State.
The cultivated sugarcane varieties are the result of interspecific hybridization involving Saccharum officinarum, S. barberi, S. sinense and the wild species S. spontaneum and S. robustum. S. officinarum is thought to have been originally selected by humans in Papua New Guinea, perhaps from S. robustum germplasm. Because of its multispecies origin, sugarcane is thought to have one of the most complex plant genomes, with variable chromosome numbers (generally 2n = 70-120) and a commensurately large DNA content. This complexity hinders the application of conventional genetics and breeding techniques.
Other plant species also have complex genomes, which has stimulated researchers to develop comparative mapping strategies that use less complex genomes as a reference and/or a source of genetic material. Using probes that hybridize across several species, consensus maps are being developed for related species with similar genomes. Such maps are based on the conservation of linkage groups composed of homologous chromosomes; they merge information derived from closely related species and are useful for cross-referencing genetic information from more distantly related species.
Over the past few years, comparative mapping of grasses, including the major cereal crops, has revealed a much greater relatedness than previously expected. Gene content and gene order along chromosomes are proving to be highly conserved. This implies that information can be transferred between species once the appropriate cross-map comparison has been confirmed. A full array of grass species could be surveyed in the near future to identify useful alleles for engineering and breeding purposes. To manage this large amount of information, organizations and consortia have been created to give researchers working on grass species (including the major cereals, the forage grasses, sugarcane and their wild relatives) access to all the DNA resources they require. Because of its small genome, the existence of detailed genetic maps and a project to sequence its entire genome, rice was chosen as the reference species. The genome of each grass species will be related to that of rice using several strategies, including anchor probes, sequence-tagged markers and ESTs. ESTs should play a major role in generating information on expressed genes in grasses. In the last five years, two large-scale plant EST sequencing projects have been undertaken, for rice and for Arabidopsis, and several more have recently been launched, including projects on the major crops maize and soybean. However, neither the data already available nor the data expected in the near future will supply all the information needed to resolve the architecture of complex genomes such as that of sugarcane.
It is clear that sugarcane researchers will benefit from the information generated by comparative mapping, but it is also clear that this information will not reveal the full architecture of the sugarcane genome. At present, sugarcane genome projects are being conducted in Australia, South Africa and the USA. In Australia and the USA, the projects focus mainly on mapping and on applying DNA markers to sugarcane genetics and breeding, while South Africa is conducting a small EST project. The information generated by these projects has been used in comparative mapping of the grass family, using common markers that hybridize with sugarcane, rice, maize, hexaploid wheat, barley, diploid oat and sorghum. However, the molecular information developed to date for sugarcane is minimal compared with the information needed to identify and characterize loci encoding traits of physiological and agronomic importance. A large-scale EST sequencing project could identify genetic systems that regulate differentiation and development or control important traits such as pest resistance and amino acid and sugar metabolism, among many others. This information could then be used to identify homologous loci in other grasses by comparative mapping, complementing the information generated by the rice genome project.
To date, more than 2.1 million ESTs have been deposited in the public databases (Table I), of which about 59% correspond to human and 19% to mouse genes. Only about 4.7% of the publicly available ESTs are derived from plants.
Table I. Summary of ESTs in the dbEST database (last update February 25, 1999)
Total number of public entries: 2,153,771
| Homo sapiens (human) | 1,270,291 |
| Mus musculus + domesticus (mouse) | 413,370 |
| Rattus sp. (rat) | 94,118 |
| Caenorhabditis elegans (nematode) | 72,568 |
| Drosophila melanogaster (fruit fly) | 64,186 |
| Arabidopsis thaliana (thale cress) | 37,673 |
| Oryza sativa (rice) | 35,215 |
| Brugia malayi (parasitic nematode) | 16,642 |
| Emericella nidulans | 12,998 |
| Danio rerio (zebrafish) | 11,340 |
| Dictyostelium discoideum | 10,700 |
| Toxoplasma gondii | 10,676 |
| Neurospora crassa | 10,229 |
| Schizosaccharomyces pombe (fission yeast) | 8,118 |
| Schistosoma mansoni (blood fluke) | 7,821 |
| Onchocerca volvulus | 7,092 |
| Trypanosoma cruzi | 6,911 |
| Bombyx mori (domestic silkworm) | 6,458 |
| Glycine max (soybean) | 4,992 |
| Populus tremula x Populus tremuloides | 4,809 |
| Zea mays (maize) | 4,468 |
| Trypanosoma brucei rhodesiense | 4,043 |
| Saccharomyces cerevisiae (baker's yeast) | 3,041 |
| Plasmodium falciparum (malaria parasite) | 2,871 |
| Caenorhabditis briggsae | 2,424 |
| Sus scrofa (pig) | 2,374 |
| Leishmania major | 2,191 |
| Pinus taeda (loblolly pine) | 1,953 |
| Brassica napus (oilseed rape) | 1,702 |
| Oryctolagus cuniculus (rabbit) | 1,662 |
| Gossypium hirsutum (upland cotton) | 1,327 |
| Magnaporthe grisea (rice blast fungus) | 1,152 |
| Mesembryanthemum crystallinum (common ice plant) | 1,094 |
| Brassica campestris (field mustard) | 966 |
| Medicago truncatula (barrel medic) | 899 |
| Populus balsamifera subsp. trichocarpa | 883 |
| Citrus unshiu | 874 |
| Schistosoma japonicum (blood fluke) | 799 |
| Ricinus communis (castor bean) | 747 |
| Pristionchus pacificus | 703 |
| Pyrococcus furiosus (hyperthermophilic archaeon) | 577 |
| Cryptosporidium parvum | 567 |
| Paralichthys olivaceus | 550 |
| Brassica rapa ssp. pekinensis (Chinese cabbage) | 536 |
| Manduca sexta (tobacco hornworm) | 524 |
| Toxocara canis | 519 |
| Saccharum sp. | 495 |
| Other plants (12) | 2,219 |
| Other organisms (31) | 6,269 |
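The percentages quoted for Table I can be recomputed directly from its rows. The sketch below is a minimal illustration, assuming that the rows listed as plants are exactly the land plants named in the table plus the "Other plants (12)" aggregate; that classification and the helper name `share` are ours, not part of the dbEST summary.

```python
# Counts copied from Table I (dbEST summary, February 25, 1999).
TOTAL_PUBLIC_ENTRIES = 2_153_771
HUMAN_ESTS = 1_270_291
MOUSE_ESTS = 413_370

# Rows of Table I treated as plants (our classification, an assumption),
# plus the "Other plants (12)" aggregate row.
PLANT_ESTS = {
    "Arabidopsis thaliana": 37_673,
    "Oryza sativa": 35_215,
    "Glycine max": 4_992,
    "Populus tremula x Populus tremuloides": 4_809,
    "Zea mays": 4_468,
    "Pinus taeda": 1_953,
    "Brassica napus": 1_702,
    "Gossypium hirsutum": 1_327,
    "Mesembryanthemum crystallinum": 1_094,
    "Brassica campestris": 966,
    "Medicago truncatula": 899,
    "Populus balsamifera subsp. trichocarpa": 883,
    "Citrus unshiu": 874,
    "Ricinus communis": 747,
    "Brassica rapa ssp. pekinensis": 536,
    "Saccharum sp.": 495,
    "Other plants (12)": 2_219,
}


def share(count: int, total: int = TOTAL_PUBLIC_ENTRIES) -> float:
    """Percentage of all public dbEST entries represented by `count`."""
    return 100.0 * count / total


human_pct = share(HUMAN_ESTS)
mouse_pct = share(MOUSE_ESTS)
plant_pct = share(sum(PLANT_ESTS.values()))
print(f"human {human_pct:.1f}%  mouse {mouse_pct:.1f}%  plants {plant_pct:.1f}%")
```

With the table as printed, this gives roughly 59% human, 19% mouse and 4.7% plant ESTs; the plant figure shifts slightly if other rows are classified differently.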
Objectives
The main goal of SUCEST is to undertake a large-scale EST program by sequencing random clones from cDNA libraries prepared from several sugarcane tissues (calli, root, stalk, etiolated leaves, flowers, developing seed, etc.). The aim of the project is to identify around 50,000 sugarcane genes; the project will be considered finished when this goal is reached or when 300,000 reads have been deposited. Computational analysis and similarity searches against known sequences in the public databases will be performed to assign putative identities to the ESTs. The research community can exploit the information provided by SUCEST in studies that use sugarcane genes as a source of markers for agriculturally significant traits, and as a molecular basis for studies of plant growth and development that address questions in plant physiology, biochemistry, cell biology, pathology and, ultimately, plant breeding. The sequenced cDNA clones will be used to complement the sugarcane molecular map and to fabricate microarrays of immobilized DNA, which will be used to survey the expression of each gene in different sugarcane tissues under different environmental conditions.
General Project Organization
SUCEST will be overseen by a Steering Committee (SC), nominated by FAPESP, composed of two international experts in EST sequencing and two scientists from the State of São Paulo.
DNA Coordinator:
The DNA Coordinator will be appointed by FAPESP to run the project and will be responsible for generating the cDNA libraries, distributing the clones, and coordinating the flow of information from the network laboratories to the Bio-informatic Center. The DNA Coordinator should devise strategies for constructing the cDNA libraries and distributing the clones for sequencing, and will maintain all clones until their sequences have been deposited in the database.
Bio-informatic Coordinator:
The Bio-informatic Coordinator will be responsible for establishing computerized tools to collect the sequence data generated by the sequencing laboratories, to analyze and annotate the sequences, and to create a Sugarcane Genome Database. The Bio-informatic Coordinator will centralize all the sequences generated by the network, evaluate the quality of the sequences delivered, and perform analyses according to the following scheme:

Sequencing Laboratories
The sequencing effort in SUCEST will be carried out by members of the ONSA network. In addition to the existing ONSA laboratories, five to ten new laboratories or groups will be admitted to ONSA through SUCEST. Any interested laboratory, including less experienced ones, may apply to become one of the new sequencing laboratories. ONSA members involved in the Xylella Genome Project may participate in SUCEST under the conditions described below.
Storage and Distribution Coordinator:
A laboratory will be selected by the SC to store, as bacterial cultures, clones representing each cluster deposited in the database. A distribution strategy should be organized to supply, upon request, each clone and any related information to researchers anywhere in the world.
International Cooperation Coordinator:
The International Cooperation Coordinator will be responsible for establishing cooperation with IGGI and the International Consortium of Sugarcane Biotechnology (ICSB) to facilitate the use of the information produced by SUCEST. The Coordinator should also establish interactions with other plant genome projects and with any group or organization interested in SUCEST.
Data Mining Laboratories
Laboratories should submit proposals to continuously identify and annotate, from the SUCEST database, all the proteins related to a specific theme.
Expression Analysis Laboratories
Laboratories will be selected by the SC, on the basis of a submitted proposal, to establish the microarray technologies that SUCEST will use to study the expression of sugarcane genes. The proposal should include equipment and personnel training in a well-established laboratory. The selected laboratories will have one year to establish the proposed technology, after which they should make microarray technologies available to the ONSA network for gene expression studies.
Conditions for joining ONSA through SUCEST
Joining ONSA through SUCEST requires that laboratories contribute to the sequencing, bioinformatic analysis, storage, data mining and expression analysis of sugarcane cDNAs according to an agreed schedule. Sequencing laboratories of the ONSA network already involved in the Xylella fastidiosa Genome Project are welcome to join SUCEST under the special conditions described below. Any group willing to undertake large-scale sequencing, data mining and expression analysis of ESTs is welcome to join the collaborative effort, provided it agrees to the following guidelines:
Sequencing labs: To maintain membership, a group must agree to produce a minimum of 4,000 single-pass sequences per year (phred quality > 20, minimum sequence length 300 nucleotides). Members agree to declare their sequencing plans and to provide detailed progress reports. Funding will be granted in the form of contracts between the laboratory and FAPESP, the SUCEST funding agency. Members will receive US$ 12.00/EST: 50% paid up front, 25% upon delivery of the first 2,000 sequences to the Bio-informatic Coordinator, and the remaining 25% after delivery of the final 2,000 sequences. The new groups admitted to ONSA will have the same conditions as those involved in the X. fastidiosa Genome Project: upon admission they will receive R$ 12.00/EST plus an amount to set up the sequencing laboratory. Equipment bought with this money will be formally transferred to the laboratory only after the sequences are delivered. Failure to comply with either the schedule or the quality requirements will result in the laboratory being excluded from the network, forfeiting the remaining funds, and having its equipment relocated to another laboratory. These decisions will be based on the recommendation of the SC. Conversely, a laboratory that delivers its sequences can apply for a second assignment.
Bio-informatic: Candidates should submit a detailed proposal, including a description of the computer facilities available to support the start of the project. The proposal should also describe the computerized tools for collecting the sequence data generated by the sequencing laboratories, analyzing and annotating the sequences, and creating a Sugarcane Genome Database.
Data mining labs: Groups must agree to search the sugarcane database for genes related to one of the following topics: sugar metabolism, amino acid metabolism, transcription factors, nuclear and membrane receptors, pathogen resistance (viruses, bacteria, fungi), environmental stress, cell cycle, cell signaling, secondary metabolites, and phytohormone biosynthesis and regulation. The annotation of these genes should be deposited continuously at the Bio-informatic Lab and should include a list of relevant references. A progress report should be submitted to the DNA Coordinator every two months. Each group should submit a proposal for equipment and other expenses related to the project.
Expression Analysis Laboratories: Groups must agree to set up microarray technology for use by the ONSA network. A proposal including personnel training and equipment should be submitted, and it must suggest an experienced laboratory where personnel will be trained. A progress report should be submitted to the project coordinator every two months. The selected laboratories will have one year to establish the proposed technology in their own laboratory, after which they should be able to use it to study the expression of sugarcane ESTs.
Storage and Distribution Laboratory: Groups must send a detailed proposal to set up a laboratory for the storage and distribution of clones. The proposal should include quality control strategies. Quality control should be checked by the SUCEST network by sequencing random clones every 6 months.
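The sequencing-lab guidelines above combine a per-read quality threshold with a staged payment. The sketch below illustrates both under stated assumptions: it reads "phred quality > 20, minimum sequence length 300 nucleotides" as "at least 300 bases whose phred score exceeds 20" (the guidelines could also mean, for example, a mean score over a trimmed read), and it leaves the currency unspecified because the text quotes both US$ and R$. The function names `read_is_acceptable` and `payment_schedule` are hypothetical.

```python
from typing import Dict, Sequence

# Thresholds taken from the sequencing-lab guidelines (interpretation assumed).
MIN_PHRED = 20    # a base counts as high quality if its phred score is > 20
MIN_LENGTH = 300  # minimum number of such bases per accepted read


def read_is_acceptable(phred_scores: Sequence[int]) -> bool:
    """Hypothetical acceptance test: at least MIN_LENGTH bases with phred > MIN_PHRED."""
    high_quality_bases = sum(1 for q in phred_scores if q > MIN_PHRED)
    return high_quality_bases >= MIN_LENGTH


def payment_schedule(n_ests: int = 4_000, rate_cents: int = 1_200) -> Dict[str, int]:
    """Split a contract of n_ests at rate_cents per EST into 50/25/25 tranches.

    Amounts are returned in cents (of whichever currency applies) so the split is exact.
    """
    total = n_ests * rate_cents
    upfront = total * 50 // 100
    after_first_half = total * 25 // 100
    after_final_half = total - upfront - after_first_half  # remainder, no rounding loss
    return {
        "up-front": upfront,
        "after first 2,000 ESTs": after_first_half,
        "after final 2,000 ESTs": after_final_half,
    }


if __name__ == "__main__":
    print(read_is_acceptable([35] * 320 + [10] * 80))  # True: 320 bases above phred 20
    print(payment_schedule())
```

For the minimum annual assignment of 4,000 ESTs at 12.00 per EST, this yields 24,000 up front and 12,000 at each of the two delivery milestones. Working in integer cents and taking the last tranche as the remainder keeps the three payments summing exactly to the contract value.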
Fellowships
Fellowships (see the categories and amounts on the Xylella fastidiosa Genome Project web site) are not included in the amounts paid for the activities described above. Fellowship applications should refer to SUCEST and be submitted to FAPESP under the Foundation's normal rules. The local members of the SC will evaluate the applications.
Conditions for ONSA members in Xylella project
Members of the Xylella Genome Project network whose sequencing effort has been completed may apply once approved by the DNA Coordinator of the Xylella project. A letter of intent should be sent electronically using the forms available at http://watson.fapesp.br/sucest.htm. Selected candidates will be invited to a meeting with the DNA coordination at FAPESP before the final contracts are signed. Questions regarding applications can be sent by e-mail to Sucest@trieste.fapesp.br. Only electronic applications will be accepted. The deadline for applications is April 6th 1999.
How to become a member
If you are interested in joining the ONSA network through SUCEST, you should submit an electronic letter of intent using the forms available at http://watson.fapesp.br/sucest.htm. Selected candidates will be invited to a meeting with the overall coordinator at FAPESP before the final contracts are signed. Questions regarding applications can be sent by e-mail (Sucest@trieste.fapesp.br). Only electronic applications will be accepted. The deadline for applications is April 6th 1999.
Preliminary Timetable
| April 6th 1999 | All applications deposited in FAPESP. |
| April 30th 1999 | Selection finished by SC, contracts signed, and first stage payments made. |
| May 5th 1999 | Start of clone distribution. |
| December 1999 | 20,000 EST sequences deposited. |
| July 2000 | 60,000 EST sequences deposited. |
| December 2000 | 100,000 EST sequences deposited. |
| July 2001 | 140,000 EST sequences deposited. |
| December 2001 | 180,000 EST sequences deposited. |
| July 2002 | 220,000 EST sequences deposited. |
| December 2002 | 260,000 EST sequences deposited. |
| July 2003 | 300,000 EST sequences deposited and project finished. |
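The timetable's cumulative targets rise by 40,000 ESTs per milestone, which can be related to the 4,000-EST annual minimum for sequencing laboratories. The sketch below is a rough capacity estimate under stated assumptions: sequencing is spread evenly between December 1999 and July 2003, and every laboratory produces exactly the minimum. Real laboratories may exceed that minimum, so the result is only a lower bound on the number of laboratory-equivalents needed. The helper names are hypothetical.

```python
import math

# Cumulative targets from the Preliminary Timetable: ((year, month), ESTs deposited).
MILESTONES = [
    ((1999, 12), 20_000),
    ((2000, 7), 60_000),
    ((2000, 12), 100_000),
    ((2001, 7), 140_000),
    ((2001, 12), 180_000),
    ((2002, 7), 220_000),
    ((2002, 12), 260_000),
    ((2003, 7), 300_000),
]
MIN_ESTS_PER_LAB_PER_YEAR = 4_000  # membership minimum for sequencing labs


def months_between(start, end):
    """Whole months between two (year, month) pairs."""
    (y0, m0), (y1, m1) = start, end
    return (y1 - y0) * 12 + (m1 - m0)


def implied_rate_and_labs(milestones=MILESTONES, per_lab=MIN_ESTS_PER_LAB_PER_YEAR):
    """Average ESTs/year between first and last milestone, and labs needed at the minimum rate."""
    (start, first), (end, last) = milestones[0], milestones[-1]
    months = months_between(start, end)
    ests_per_year = (last - first) * 12 / months
    labs_at_minimum = math.ceil(ests_per_year / per_lab)
    return ests_per_year, labs_at_minimum


rate, labs = implied_rate_and_labs()
print(f"{rate:,.0f} ESTs/year -> at least {labs} labs at the 4,000/year minimum")
```

Over the 43 months from December 1999 to July 2003, the 280,000 additional ESTs correspond to roughly 78,000 ESTs per year, or about 20 laboratories working at the minimum rate.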