This signature was driven by genes that are typically expressed only upon differentiation of LPs into secretory alveolar cells in a hormone-dependent manner during gestation/lactation and included caseins (Csn1s1, Csn1s2a, Csn2, and Csn3), milk mucins (Muc1/15), lactose synthase (Lalba), apolipoprotein D (Apod), and milk proteins (Glycam1, Spp1, and Wap; Fig. Two independent sgRNAs/genes were used, and data were combined (see Supplementary Fig. It was an innovation of the limma package to show that exact small-sample inference could be conducted using the empirical Bayes posterior variance estimators (16). Kdm6afl/fl;LSL-Cas9-EGFP mice were crossed to Krt5-CreERT2 mice [Krt5tm1.1(cre/ERT2)Blh, #029155 from The Jackson Laboratory] and then to Kdm6afl/fl;R26-LSL-Pik3caH1047R mice to generate Kdm6afl/fl;R26-LSL-Pik3caH1047R;LSL-Cas9-EGFP;Krt5-CreERT2 mice. In the first ATAC-seq paper (Buenrostro et al., 2013), all reads aligning to the + strand were offset by +4 bp, and all reads aligning to the − strand were offset by −5 bp, since Tn5 transposase has been shown to bind as a dimer and insert two adaptors
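The strand-specific read offsets described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited paper: the `Read` type, the `shift_read` function, and the choice to shift the whole interval (rather than only the 5' end) are assumptions made for the example.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read; coordinates are 0-based, half-open."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


# Offsets as stated in the text: +4 bp for "+" strand, -5 bp for "-" strand.
PLUS_OFFSET = 4
MINUS_OFFSET = -5


def shift_read(read: Read) -> Read:
    """Return a copy of the read with the Tn5 offset applied.

    Assumption: the whole interval is shifted; some tools adjust only
    the 5' end of each read instead.
    """
    if read.strand == "+":
        offset = PLUS_OFFSET
    elif read.strand == "-":
        offset = MINUS_OFFSET
    else:
        raise ValueError(f"unknown strand: {read.strand!r}")
    return read._replace(start=read.start + offset, end=read.end + offset)


if __name__ == "__main__":
    print(shift_read(Read("chr1", 100, 150, "+")))
    print(shift_read(Read("chr1", 100, 150, "-")))
```

Keeping the two offsets as named constants makes it easy to adapt the sketch to a tool that uses a different convention.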
C, Whole-mount image of mammary glands 4 weeks and 7.5 weeks after Ad-K5-Cre injection showing K14+/K8+ (empty arrowheads) as well as K14/K8 double-positive and K14/K8+ GFP+ lineage-traced cells (filled arrowheads). In the case of RGList, MAList, EListRaw and EList, rows correspond to probes/genes and columns to different samples. EpiDriver mutations are found in 39% of human breast cancers, and 50% of ductal carcinoma in situ express casein, suggesting that lineage infidelity and alveogenic mimicry may significantly contribute to early steps of breast cancer etiology. Genomic DNA from epithelial and tumor cells was isolated with the DNeasy Blood and Tissue Kit (Qiagen). Cases with casein staining did not show statistically significant differences with regard to ipsilateral breast cancer recurrence, although trends toward poorer outcome were observed, especially in PR+ and HER2+ HR+ cases (Supplementary Fig. We optimized the parameters for an in vivo CRISPR screen by using a mixture of lentiviruses expressing GFP or RFP to determine the viral titer that transduces the mammary epithelium at clonal density (multiplicity of infection <1). Egan, The Hospital for Sick Children], R26-LSL-Cas9-GFP [Gt(ROSA)26Sortm1(CAG-xstpx-cas9,-EGFP)Fezh/J, #026175, in C57/Bl6 background from The Jackson Laboratory], LSL-TdTomato [B6;129S6-Gt(ROSA)26Sortm14(CAG-tdTomato)Hze/J, #007908 from The Jackson Laboratory], Asxl2fl/fl [C57BL/6N-Asxl2tm1c(EUCOMM)Hmgu/Tcp generated by The Canadian Mouse Mutant Repository], and Kdm6afl/fl [Kdm6atm1.1Kaig] mice kindly provided by Jacob Hanna, Weizmann Institute of Science. 
Small, compact genomes confer a selective advantage to viruses, yet human cytomegalovirus (HCMV) expresses the long non-coding RNAs (lncRNAs) RNA1.2, RNA2.7, RNA4.9, and RNA5.0. Our data now show that, given the right combination of oncogene and cooperating epigenetic alteration, basal cells can also be the cell of origin of luminal tumors. The orange line between the two cohorts indicates the significant difference of absence variation between the two groups. Significance of the difference between groups was calculated by a two-tailed Student t test (with Welch correction when variances were significantly different), Wilcoxon rank-sum test (when data were not normally distributed), or log-rank test for survival data using Prism 7 (GraphPad Software) unless otherwise specified in the figure legends. This capability to sequence DNA at high throughput and low cost has enabled the development of a growing number of sequencing-based methods and applications. limma, however, is able to analyse RNA-seq read counts with high precision by converting counts to the log-scale and estimating the mean-variance relationship empirically (Figure 3A). The limma package has benefited from many other people, too many to list here, who have made suggestions, reported bugs or contributed code. Vignette: examples (not run) for deviations from SSWM. This bug was reported by Christopher Wilks. 1A). (B) Venn diagram showing overlap in the number of DE genes for three comparisons from the same study as (A), generated by the vennDiagram function. The gene expression was quantified using featureCounts (2.0.1) (43). Bugfix for d=NA with specified subset.row= in fastMNN(). Finally, the individual files resulting from the batch analysis were consolidated in RStudio using phenoptr reports to determine the percentage of total casein per TMA core, and this information was aligned with known clinical data. The top 20 enriched pathways are shown. 
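The conversion of read counts to the log-scale mentioned above can be illustrated with a small Python sketch. It follows the log-counts-per-million form log2((count + 0.5) / (library size + 1) × 10^6); the function name `log_cpm` and the list-of-lists input layout (genes in rows, samples in columns) are assumptions for illustration, and the sketch does not estimate the mean-variance relationship or compute any weights.

```python
import math
from typing import List


def log_cpm(counts: List[List[float]]) -> List[List[float]]:
    """Convert a genes-by-samples count matrix to log2 counts per million.

    The 0.5 added to each count and the 1 added to each library size keep
    the logarithm finite when a count is zero.
    """
    n_samples = len(counts[0])
    # Library size = total count in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]


if __name__ == "__main__":
    toy_counts = [[10, 0], [90, 100]]  # 2 genes x 2 samples, made-up values
    for row in log_cpm(toy_counts):
        print([round(v, 3) for v in row])
```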
Metascape analysis was performed using default settings (85). I typically use the GENCODE annotation as it combines comprehensive gene annotation and transcript sequences. Chromatin was sheared into 200- to 500-bp fragments with 8 cycles of 30 seconds of sonication and 30 seconds of pause at 4 °C using the Bioruptor Pico sonicator (Diagenode). Bumped version (to for new BioC devel. The fact that the same linear model is fitted to each gene allows us to borrow strength between genes in order to moderate the residual variances (16). Counts were obtained using featureCounts (Subread package version 2.0.0) with the settings -s2 and -t gene. Linear models allow researchers to test very flexible hypotheses, not just simple comparisons between groups but also interaction effects or more complex customized comparisons. The plot_river function now shows the number of mutations per sample. The package supports both single-cell and single-molecule Shifting reads. Models can be fit robustly or by least squares. The Hyperion Imaging System (Fluidigm) was calibrated using a tuning slide, and IMC images were acquired at 1-μm resolution at 200 Hz. Focusing specifically on EpiDriver-mutant versus control sgNT Pik3caHR tumors revealed that EpiDriver inactivation leads to upregulation of epithelial-to-mesenchymal transition (EMT) and proinflammatory interferon-α/γ responses and downregulation of cellular metabolism (oxidative phosphorylation and fatty acid metabolism) and estrogen responses (Supplementary Fig. Bug fixes in handling of divide and conquer inference. One way to address this question is to count the overlap in differentially expressed genes from the two treatments, as in Figure 4B. As happens with technical sequences, trying to align reads that contain low-quality ends can lead to misplacement or poor mapping quality. E, UMAP plots showing open chromatin associated with the alveolar/lactation-associated genes Lalba and Csn2. 
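The idea of borrowing strength between genes to moderate the residual variances, mentioned above, can be made concrete with the standard empirical Bayes posterior variance: a weighted average of a gene's own sample variance and a prior variance shared across genes, with the degrees of freedom as weights. The sketch below takes the prior variance and prior degrees of freedom as given inputs; estimating them from all genes is not shown here. The function and parameter names are hypothetical.

```python
def moderated_variance(
    s2: float, df: float, s2_prior: float, df_prior: float
) -> float:
    """Posterior (moderated) variance for one gene.

    s2       : the gene's residual sample variance
    df       : its residual degrees of freedom
    s2_prior : prior variance shared across genes (assumed given)
    df_prior : prior degrees of freedom (assumed given)
    """
    if df + df_prior <= 0:
        raise ValueError("total degrees of freedom must be positive")
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


if __name__ == "__main__":
    # Made-up values: a noisy gene (variance 4.0 on 2 df) is pulled
    # toward the shared prior variance of 1.0 (4 prior df).
    print(moderated_variance(s2=4.0, df=2, s2_prior=1.0, df_prior=4))
```

With few residual degrees of freedom the prior dominates, so extreme per-gene variances are shrunk the most, which is the behaviour wanted in small experiments.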
The merged samples were first embedded in UMAP by running one iteration of the iterative latent semantic indexing (LSI) function. To remove technical/contaminant sequences and low-quality ends, read trimming tools like. Hence the length of a sequence has no effect on the coverage of that sequence. Mice were monitored for tumor formation by mammary gland palpation for 6 months. In this secondary screen, the histone lysine demethylase and nuclear receptor corepressor hairless (Hr), the interleukin 4 receptor (Il4ra), and the transcription repressor Bcl6 scored as hits, indicating that these shared downregulated genes function themselves as tumor suppressors (Supplementary Fig. B, Volcano plots showing differentially accessible chromatin peaks between Pik3caH1047R;Kdm6afl/fl and wild-type control, between Pik3caH1047R;Kdm6afl/fl and Pik3caH1047R, or between Pik3caH1047R and wild-type control LP cells. Starting from an aligned BAM file, we show how to perform quality These new analyses are described briefly later in this article. The result is a normalized data matrix K=RAS, a product features per sample. In contrast, ASXL2-, KDM6A-, KMT2C-, or PTEN-mutant spheres showed a transformed phenotype with large branching protrusions (Supplementary Fig. The genomic regions are often genes or exons, but could in principle be any genomic feature of interest. The RelTime algorithm employed in the command line version of MEGA7 was used to infer the relative divergence times. Fold change over input tracks were generated using the macs2 bdgcmp utility. scRNA-seq reveals basal-to-alveolar transdifferentiation at the onset of breast cancer initiation. 
Probing deeper into the mechanism of how inactivation of Kdm6a affects transcription and chromatin accessibility at the onset of transformation, we performed parallel single-cell RNA-seq (scRNA-seq) and single-nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq). In human tumors, EpiDriver genes are deleted or harbor nonsense or missense mutations. D, UMAP and violin plots showing alveogenesis signature. For instance, as there is not a reference sequence for the genome of the coyote, we can use that of the closely related dog for the read alignment. For paraffin sections, samples were embedded in paraffin, sectioned, and rehydrated, and antigen retrieval was performed with sodium citrate buffer. G.D. Bader: Conceptualization. The highly parallel nature of gene expression experiments lends itself to a particular class of statistical methods, called parametric empirical Bayes, that borrow information between genes in a dynamic way (14,15). For instance, suboptimal DNA preparation procedures may leave a high proportion of DNA-converted ribosomal RNA (rRNA) in the sample. D.W. Cescon: Data curation, investigation. Trp53- and Apc-mutant tumors presented mostly as squamous or basal-like tumors. Heat map depicts how these pathways are altered in the three major epithelial lineages. A consensus peak set was generated per histone modification by merging peak sets from wild-type and knockout conditions. In small, complex experiments, the potential compromises involved in modelling expression values using parametric distributions, which can never be perfectly correct, are outweighed by the gains in precision and accuracy by modelling the variance structure more realistically. Eight hours after transfection, media were added to the plates supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin antibiotic solution (w/v). H. Bergholtz: Data curation, formal analysis. 
Gene set-based analysis of differentially expressed genes by RNA sequencing (RNA-seq) again revealed EMT and differentiation as the most significant sets upregulated in cultured Kdm6a-mutant mammary tumor cells (Supplementary Fig. 