python-webscraping-BeautifulSoup


Notebook Creator: Salah Ahmed

Jupyter notebooks

Notebooks are great for exploring code and ideas.

They provide an interactive environment with tab completion for objects in code, so you can explore libraries and methods in a way that is not possible in a plain terminal or in some IDEs.

Some important things to help you develop with Jupyter:

key            function
h              opens the keyboard-shortcut help menu (in command mode)
shift + enter  runs the current notebook cell
?              after an object (function, method, or class instance), shows its signature and docstring

example

import os
os?
os.<tab>
# shows functions in "os" module
In [ ]:
import os
os?
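
The `?` and `<tab>` helpers only work inside Jupyter or IPython. As a rough equivalent for a plain Python session, the sketch below uses the built-in `dir` function and the standard `inspect` module. The variable names are illustrative.

```python
import inspect
import os

# Roughly what os.<tab> shows: the public names defined in the module
public_names = [name for name in dir(os) if not name.startswith('_')]
print(public_names[:5])

# Roughly what os.listdir? shows: the docstring of the object
doc = inspect.getdoc(os.listdir)
print(doc.splitlines()[0])

# For functions written in Python, the signature is also available
print(inspect.signature(os.path.join))
```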

Python

  • "lists" are similar to arrays in other languages, but they can grow and can hold items of mixed types
  • "dictionaries" are hash tables that map keys to values

Explore list and dictionary methods:

In [1]:
py_list = [1, 'salh', [1, 2]]
py_list.append(40)
py_list.append(2)

py_dictionary = {'alex': 100, 'mike': 80}
py_dictionary['jon'] = 50
py_dictionary['k'] = 4
print(py_dictionary)
print(py_list)
{'mike': 80, 'alex': 100, 'k': 4, 'jon': 50}
[1, 'salh', [1, 2], 40, 2]
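
A few more list and dictionary methods worth exploring are sketched below. The sample values are made up for illustration, and the comments show the state after each call.

```python
py_list = [3, 1, 2]
py_list.append(4)        # add one item to the end -> [3, 1, 2, 4]
py_list.insert(0, 10)    # insert at index 0 -> [10, 3, 1, 2, 4]
last = py_list.pop()     # remove and return the last item (4)
py_list.sort()           # sort in place -> [1, 2, 3, 10]
print(py_list, last)

scores = {'alex': 100, 'mike': 80}
scores['jon'] = 50                   # add or overwrite a key
missing = scores.get('sam', 0)       # default value instead of a KeyError
removed = scores.pop('mike')         # remove a key and return its value
for name, score in scores.items():   # iterate over key/value pairs
    print(name, score)
```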

OS library

In [4]:
import os
# explore os library
# use method in os module that *lists* names of entries in directory '.', 
#            look at function signature to know what parameters to use
path = '.'
os.listdir(path)
Out[4]:
['workshop.ipynb', 'titles-hypnosis.csv', 'pubmed.ipynb', '.ipynb_checkpoints']
In [15]:
# use join to build a file path, then write to that file
# save it to the tmp dir

from os.path import join
join?
directory = '/tmp'
filename = 'xyz'
path = join(directory, filename)
print(path)
# 'with' closes the file automatically, even if an error occurs
with open(path, 'w') as f:
    f.write('whee')
with open(path, 'r') as f:
    print(f.read())
/tmp/xyz
whee
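
The cell above writes to '/tmp', which only exists on Unix-like systems. A more portable variant, sketched here under the assumption that a throwaway directory is good enough, uses the standard `tempfile` module so the scratch files are cleaned up automatically.

```python
import os
import tempfile
from os.path import join

# TemporaryDirectory creates a scratch directory and deletes it on exit
with tempfile.TemporaryDirectory() as directory:
    path = join(directory, 'xyz')

    # 'with' closes the file even if an error occurs while writing
    with open(path, 'w') as f:
        f.write('whee')

    with open(path) as f:
        contents = f.read()

    entries = os.listdir(directory)

print(contents)
print(entries)
```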

HTTP

The two most important HTTP methods are GET and POST; the other methods (such as PUT, DELETE, and HEAD) are worth checking out too.
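
To make the difference concrete, the sketch below builds one GET and one POST request with the standard library's `urllib.request.Request`. Nothing is sent over the network, and the endpoint 'https://example.com/search' is a hypothetical placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the requests are only built, never sent
url = 'https://example.com/search'
fields = urlencode({'q': 'cats'})

# GET: parameters travel in the URL itself
get_request = Request(url + '?' + fields)

# POST: parameters travel in the request body as bytes
post_request = Request(url, data=fields.encode('utf-8'))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```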

GET requests

GET requests retrieve data from a URL. You can add parameters to the query to get the page you want.

For example:

site: https://youtube.com
query: 'cats'
Type the query into the YouTube search bar; you should get the following URL:
"https://www.youtube.com/results?search_query=cats"
Notice the '?' followed by the query keyword and its value, connected by '='.

You can search for anything by substituting 'cats' with whatever you like and opening that URL in your browser.
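
The same URL can be assembled in code. This is a minimal sketch using the standard library's `urllib.parse`; the search phrase is made up, and it contains spaces to show that special characters get escaped.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = 'https://www.youtube.com/results'

# urlencode escapes characters that are not allowed in a URL
query = urlencode({'search_query': 'cats and dogs'})
url = base + '?' + query
print(url)

# Going the other way: split a URL and read its parameters back
parts = urlsplit(url)
params = parse_qs(parts.query)
print(parts.path, params)
```

Each value returned by `parse_qs` is a list, because a key may appear more than once in a query string.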

HTML

HTML is the skeleton of websites. You can view a website's HTML through the browser's developer console. To open the developer console in Chrome:

os             key
linux/windows  ctrl + shift + j
osx            Cmd + Opt + J
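
To see the "skeleton" idea in code, the sketch below prints the nesting of tags in a small, made-up page. It uses the standard library's `html.parser` rather than BeautifulSoup, and it assumes every tag in the sample is properly closed. The class name `TagPrinter` is illustrative.

```python
from html.parser import HTMLParser

class TagPrinter(HTMLParser):
    """Collect tag names and text, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append('  ' * self.depth + '<' + tag + '>')
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        # skip whitespace-only text between tags
        if data.strip():
            self.lines.append('  ' * self.depth + data.strip())

page = '<html><body><h1>Title</h1><p>Some <b>bold</b> text</p></body></html>'
printer = TagPrinter()
printer.feed(page)
printer.close()
print('\n'.join(printer.lines))
```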

Requests

requests is a Python library for handling HTTP requests.

It is simple and intuitive to use:

  • GET requests use the 'get' method: requests.get
  • POST requests use the 'post' method: requests.post

Use '?' to find out function signatures.

  • to get the HTML of a page, use the response object's content attribute (raw bytes) or its text attribute (decoded string)
In [ ]:
import requests
requests.get?
In [7]:
import requests
url = 'https://nbconvert.readthedocs.io/en/latest/usage.html'
r = requests.get(url)
print(r.url)
print(r.content)
https://nbconvert.readthedocs.io/en/latest/usage.html


<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
  <meta charset="utf-8">
  
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
  <title>Using as a command line tool &mdash; nbconvert 5.2.0.dev documentation</title>
  

  
  
  
  

  

  
  
    

  

  
  

  
    <link rel="stylesheet" href="https://media.readthedocs.org/css/sphinx_rtd_theme.css" type="text/css" />
  

  
        <link rel="index" title="Index"
              href="genindex.html"/>
        <link rel="search" title="Search" href="search.html"/>
    <link rel="top" title="nbconvert 5.2.0.dev documentation" href="index.html"/>
        <link rel="next" title="Using nbconvert as a library" href="nbconvert_library.html"/>
        <link rel="prev" title="Installation" href="install.html"/> 

  
  <script src="_static/js/modernizr.min.js"></script>


<!-- RTD Extra Head -->

<!-- 
Always link to the latest version, as canonical.
http://docs.readthedocs.org/en/latest/canonical.html
-->
<link rel="canonical" href="http://nbconvert.readthedocs.io/en/latest/usage.html" />

<link rel="stylesheet" href="https://media.readthedocs.org/css/readthedocs-doc-embed.css" type="text/css" />

<script type="text/javascript" src="_static/readthedocs-data.js"></script>

<!-- Add page-specific data, which must exist in the page js, not global -->
<script type="text/javascript">
READTHEDOCS_DATA['page'] = 'usage' 		
READTHEDOCS_DATA['source_suffix'] = '.rst'
</script>

<script type="text/javascript" src="_static/readthedocs-dynamic-include.js"></script>

<!-- end RTD <extrahead> --></head>

<body class="wy-body-for-nav" role="document">

   
  <div class="wy-grid-for-nav">

    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search">
          

          
            <a href="index.html" class="icon icon-home"> nbconvert
          

          
          </a>

          
            
            
            
              <div class="version">
                latest
              </div>
            
          

          
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>

          
        </div>

        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
          
            
            
              
            
            
              <p class="caption"><span class="caption-text">User Documentation</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Using as a command line tool</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#default-output-format-html">Default output format - HTML</a></li>
<li class="toctree-l2"><a class="reference internal" href="#supported-output-formats">Supported output formats</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#html">HTML</a></li>
<li class="toctree-l3"><a class="reference internal" href="#latex">LaTeX</a></li>
<li class="toctree-l3"><a class="reference internal" href="#pdf">PDF</a></li>
<li class="toctree-l3"><a class="reference internal" href="#reveal-js-html-slideshow">Reveal.js HTML slideshow</a></li>
<li class="toctree-l3"><a class="reference internal" href="#markdown">Markdown</a></li>
<li class="toctree-l3"><a class="reference internal" href="#restructuredtext">reStructuredText</a></li>
<li class="toctree-l3"><a class="reference internal" href="#executable-script">Executable script</a></li>
<li class="toctree-l3"><a class="reference internal" href="#notebook-and-preprocessors">Notebook and preprocessors</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#converting-multiple-notebooks">Converting multiple notebooks</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="nbconvert_library.html">Using nbconvert as a library</a></li>
<li class="toctree-l1"><a class="reference internal" href="latex_citations.html">LaTeX citations</a></li>
<li class="toctree-l1"><a class="reference internal" href="execute_api.html">Executing notebooks</a></li>
</ul>
<p class="caption"><span class="caption-text">Configuration</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="config_options.html">Configuration options</a></li>
<li class="toctree-l1"><a class="reference internal" href="customizing.html">Customizing nbconvert</a></li>
<li class="toctree-l1"><a class="reference internal" href="external_exporters.html">Customizing exporters</a></li>
<li class="toctree-l1"><a class="reference internal" href="external_exporters.html#parameters-controlled-by-an-external-exporter">Parameters controlled by an external exporter</a></li>
<li class="toctree-l1"><a class="reference internal" href="external_exporters.html#writing-a-custom-exporter">Writing a custom <code class="docutils literal"><span class="pre">Exporter</span></code></a></li>
</ul>
<p class="caption"><span class="caption-text">Developer Documentation</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="architecture.html">Architecture of nbconvert</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/index.html">Python API for working with nbconvert</a></li>
</ul>
<p class="caption"><span class="caption-text">About nbconvert</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="changelog.html">Changes in nbconvert</a></li>
</ul>
<p class="caption"><span class="caption-text">Questions? Suggestions?</span></p>
<ul>
<li class="toctree-l1"><a class="reference external" href="https://groups.google.com/forum/#!forum/jupyter">Jupyter mailing list</a></li>
<li class="toctree-l1"><a class="reference external" href="https://jupyter.org">Jupyter website</a></li>
<li class="toctree-l1"><a class="reference external" href="https://stackoverflow.com/questions/tagged/jupyter">Stack Overflow - Jupyter</a></li>
<li class="toctree-l1"><a class="reference external" href="https://stackoverflow.com/questions/tagged/jupyter-notebook">Stack Overflow - Jupyter-notebook</a></li>
</ul>

            
          
        </div>
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
        
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="index.html">nbconvert</a>
        
      </nav>


      
      <div class="wy-nav-content">
        <div class="rst-content">
          















<div role="navigation" aria-label="breadcrumbs navigation">

  <ul class="wy-breadcrumbs">
    
      <li><a href="index.html">Docs</a> &raquo;</li>
        
      <li>Using as a command line tool</li>
    
    
      <li class="wy-breadcrumbs-aside">
        
            
            
              <a href="https://github.com/jupyter/nbconvert/blob/master/docs/source/usage.rst" class="fa fa-github"> Edit on GitHub</a>
            
          
        
      </li>
    
  </ul>

  
  <hr/>
</div>
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
            
  
<style>
/* CSS overrides for sphinx_rtd_theme */

/* 24px margin */
.nbinput.nblast,
.nboutput.nblast {
    margin-bottom: 19px;  /* padding has already 5px */
}

/* ... except between code cells! */
.nblast + .nbinput {
    margin-top: -19px;
}

/* nice headers on first paragraph of info/warning boxes */
.admonition .first {
    margin: -12px;
    padding: 6px 12px;
    margin-bottom: 12px;
    color: #fff;
    line-height: 1;
    display: block;
}
.admonition.warning .first {
    background: #f0b37e;
}
.admonition.note .first {
    background: #6ab0de;
}
.admonition > p:before {
    margin-right: 4px;  /* make room for the exclamation icon */
}
</style>
<div class="section" id="using-as-a-command-line-tool">
<h1>Using as a command line tool<a class="headerlink" href="#using-as-a-command-line-tool" title="Permalink to this headline">¶</a></h1>
<p>The command-line syntax to run the <code class="docutils literal"><span class="pre">nbconvert</span></code> script is:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ jupyter nbconvert --to FORMAT notebook.ipynb
</pre></div>
</div>
<p>This will convert the Jupyter notebook file <code class="docutils literal"><span class="pre">notebook.ipynb</span></code> into the output
format given by the <code class="docutils literal"><span class="pre">FORMAT</span></code> string.</p>
<div class="section" id="default-output-format-html">
<h2>Default output format - HTML<a class="headerlink" href="#default-output-format-html" title="Permalink to this headline">¶</a></h2>
<p>The default output format is HTML, for which the <code class="docutils literal"><span class="pre">--to</span></code> argument may be
omitted:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ jupyter nbconvert notebook.ipynb
</pre></div>
</div>
</div>
<div class="section" id="supported-output-formats">
<span id="supported-output"></span><h2>Supported output formats<a class="headerlink" href="#supported-output-formats" title="Permalink to this headline">¶</a></h2>
<p>The currently supported output formats are:</p>
<blockquote>
<div><ul class="simple">
<li><a class="reference internal" href="#convert-html"><span class="std std-ref">HTML</span></a>,</li>
<li><a class="reference internal" href="#convert-latex"><span class="std std-ref">LaTeX</span></a>,</li>
<li><a class="reference internal" href="#convert-pdf"><span class="std std-ref">PDF</span></a>,</li>
<li><a class="reference internal" href="#convert-revealjs"><span class="std std-ref">Reveal.js HTML slideshow</span></a>,</li>
<li><a class="reference internal" href="#convert-markdown"><span class="std std-ref">Markdown</span></a>,</li>
<li><a class="reference internal" href="#convert-rst"><span class="std std-ref">reStructuredText</span></a>,</li>
<li><a class="reference internal" href="#convert-script"><span class="std std-ref">executable script</span></a>,</li>
<li><a class="reference internal" href="#convert-notebook"><span class="std std-ref">notebook</span></a>.</li>
</ul>
</div></blockquote>
<p>Jupyter also provides a few templates for output formats. These can be
specified via an additional <code class="docutils literal"><span class="pre">--template</span></code> argument and are listed in the
sections below.</p>
<div class="section" id="html">
<span id="convert-html"></span><h3>HTML<a class="headerlink" href="#html" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">html</span></code></p>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--template</span> <span class="pre">full</span></code> (default)</p>
<p>A full static HTML render of the notebook.
This looks very similar to the interactive view.</p>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">--template</span> <span class="pre">basic</span></code></p>
<p>Simplified HTML, useful for embedding in webpages, blogs, etc.
This excludes HTML headers.</p>
</li>
</ul>
</li>
</ul>
</div>
<div class="section" id="latex">
<span id="convert-latex"></span><h3>LaTeX<a class="headerlink" href="#latex" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">latex</span></code></p>
<p>Latex export.  This generates <code class="docutils literal"><span class="pre">NOTEBOOK_NAME.tex</span></code> file,
ready for export.</p>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--template</span> <span class="pre">article</span></code> (default)</p>
<p>Latex article, derived from Sphinx&#8217;s howto template.</p>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">--template</span> <span class="pre">report</span></code></p>
<p>Latex report, providing a table of contents and chapters.</p>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">--template</span> <span class="pre">basic</span></code></p>
<p>Very basic latex output - mainly meant as a starting point for custom
templates.</p>
</li>
</ul>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">nbconvert uses <a class="reference external" href="http://pandoc.org/">pandoc</a> to convert between various markup languages,
so pandoc is a dependency when converting to latex or reStructuredText.</p>
</div>
</li>
</ul>
</div>
<div class="section" id="pdf">
<span id="convert-pdf"></span><h3>PDF<a class="headerlink" href="#pdf" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">pdf</span></code></p>
<p>Generates a PDF via latex. Supports the same templates as <code class="docutils literal"><span class="pre">--to</span> <span class="pre">latex</span></code>.</p>
</li>
</ul>
</div>
<div class="section" id="reveal-js-html-slideshow">
<span id="convert-revealjs"></span><h3>Reveal.js HTML slideshow<a class="headerlink" href="#reveal-js-html-slideshow" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">slides</span></code></p>
<p>This generates a Reveal.js HTML slideshow.
It must be served by an HTTP server. The easiest way to do this is adding
<code class="docutils literal"><span class="pre">--post</span> <span class="pre">serve</span></code> on the command-line. The <code class="docutils literal"><span class="pre">serve</span></code> post-processor proxies
Reveal.js requests to a CDN if no local Reveal.js library is present.
To make slides that don&#8217;t require an internet connection, just place the
Reveal.js library in the same directory where your_talk.slides.html is
located, or point to another directory using the <code class="docutils literal"><span class="pre">--reveal-prefix</span></code> alias.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">In order to designate a mapping from notebook cells to Reveal.js slides,
from within the Jupyter notebook, select menu item
View &#8211;&gt; Cell Toolbar &#8211;&gt; Slideshow. That will reveal a drop-down menu
on the upper-right of each cell.  From it, one may choose from
&#8220;Slide,&#8221; &#8220;Sub-Slide&#8221;, &#8220;Fragment&#8221;, &#8220;Skip&#8221;, and &#8220;Notes.&#8221;  On conversion,
cells designated as &#8220;skip&#8221; will not be included, &#8220;notes&#8221; will be included
only in presenter notes, etc.</p>
</div>
</li>
</ul>
</div>
<div class="section" id="markdown">
<span id="convert-markdown"></span><h3>Markdown<a class="headerlink" href="#markdown" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">markdown</span></code></p>
<p>Simple markdown output.  Markdown cells are unaffected,
and code cells indented 4 spaces.</p>
</li>
</ul>
</div>
<div class="section" id="restructuredtext">
<span id="convert-rst"></span><h3>reStructuredText<a class="headerlink" href="#restructuredtext" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">rst</span></code></p>
<p>Basic reStructuredText output. Useful as a starting point for embedding
notebooks in Sphinx docs.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">nbconvert uses <a class="reference external" href="http://pandoc.org/">pandoc</a> to convert between various markup languages,
so pandoc is a dependency when converting to latex or reStructuredText.</p>
</div>
</li>
</ul>
</div>
<div class="section" id="executable-script">
<span id="convert-script"></span><h3>Executable script<a class="headerlink" href="#executable-script" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">script</span></code></p>
<p>Convert a notebook to an executable script.
This is the simplest way to get a Python (or other language, depending on
the kernel) script out of a notebook. If there were any magics in an
Jupyter notebook, this may only be executable from a Jupyter session.</p>
<p>For example, to convert a Julia notebook to a Julia executable script:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">script</span> <span class="n">my_julia_notebook</span><span class="o">.</span><span class="n">ipynb</span>
</pre></div>
</div>
</li>
</ul>
</div>
<div class="section" id="notebook-and-preprocessors">
<span id="convert-notebook"></span><h3>Notebook and preprocessors<a class="headerlink" href="#notebook-and-preprocessors" title="Permalink to this headline">¶</a></h3>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">--to</span> <span class="pre">notebook</span></code></p>
<div class="versionadded">
<p><span class="versionmodified">New in version 3.0.</span></p>
</div>
<p>This doesn&#8217;t convert a notebook to a different format <em>per se</em>,
instead it allows the running of nbconvert preprocessors on a notebook,
and/or conversion to other notebook formats. For example:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">notebook</span> <span class="o">--</span><span class="n">execute</span> <span class="n">mynotebook</span><span class="o">.</span><span class="n">ipynb</span>
</pre></div>
</div>
</li>
</ul>
<p>This will open the notebook, execute it, capture new output, and save the
result in <code class="file docutils literal"><span class="pre">mynotebook.nbconvert.ipynb</span></code>. By default, <code class="docutils literal"><span class="pre">nbconvert</span></code> will
abort conversion if any exceptions occur during execution of a cell. If you
specify <code class="docutils literal"><span class="pre">--allow-errors</span></code> (in addition to the <code class="docutils literal"><span class="pre">--execute</span></code> flag) then
conversion will continue and the output from any exception will be included
in the cell output.</p>
<p>The following command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">notebook</span> <span class="o">--</span><span class="n">nbformat</span> <span class="mi">3</span> <span class="n">mynotebook</span>
</pre></div>
</div>
<p>will create a copy of <code class="file docutils literal"><span class="pre">mynotebook.ipynb</span></code> in <code class="file docutils literal"><span class="pre">mynotebook.v3.ipynb</span></code>
in version 3 of the notebook format.</p>
<p>If you want to convert a notebook in-place, you can specify the ouptut file
to be the same as the input file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">notebook</span> <span class="n">mynb</span> <span class="o">--</span><span class="n">output</span> <span class="n">mynb</span>
</pre></div>
</div>
<p>Be careful with that, since it will replace the input file.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">nbconvert uses <a class="reference external" href="http://pandoc.org/">pandoc</a> to convert between various markup languages,
so pandoc is a dependency when converting to latex or reStructuredText.</p>
</div>
<p>The output file created by <code class="docutils literal"><span class="pre">nbconvert</span></code> will have the same base name as
the notebook and will be placed in the current working directory. Any
supporting files (graphics, etc) will be placed in a new directory with the
same base name as the notebook, suffixed with <code class="docutils literal"><span class="pre">_files</span></code>:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ jupyter nbconvert notebook.ipynb
$ ls
notebook.ipynb   notebook.html    notebook_files/
</pre></div>
</div>
<p>For simple single-file output, such as html, markdown, etc.,
the output may be sent to standard output with:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ jupyter nbconvert --to markdown notebook.ipynb --stdout
</pre></div>
</div>
</div>
</div>
<div class="section" id="converting-multiple-notebooks">
<h2>Converting multiple notebooks<a class="headerlink" href="#converting-multiple-notebooks" title="Permalink to this headline">¶</a></h2>
<p>Multiple notebooks can be specified from the command line:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ jupyter nbconvert notebook*.ipynb
$ jupyter nbconvert notebook1.ipynb notebook2.ipynb
</pre></div>
</div>
<p>or via a list in a configuration file, say <code class="docutils literal"><span class="pre">mycfg.py</span></code>, containing the text:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">c</span> <span class="o">=</span> <span class="n">get_config</span><span class="p">()</span>
<span class="n">c</span><span class="o">.</span><span class="n">NbConvertApp</span><span class="o">.</span><span class="n">notebooks</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;notebook1.ipynb&quot;</span><span class="p">,</span> <span class="s2">&quot;notebook2.ipynb&quot;</span><span class="p">]</span>
</pre></div>
</div>
<p>and using the command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ jupyter nbconvert --config mycfg.py
</pre></div>
</div>
</div>
</div>


           </div>
           <div class="articleComments">
            
           </div>
          </div>
          <footer>
  
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
      
        <a href="nbconvert_library.html" class="btn btn-neutral float-right" title="Using nbconvert as a library" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
      
      
        <a href="install.html" class="btn btn-neutral" title="Installation" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
      
    </div>
  

  <hr/>

  <div role="contentinfo">
    <p>
        &copy; Copyright 2015-2017, Jupyter Development Team.
      
        <span class="commit">
          Revision <code>b557f3fc</code>.
        </span>
      

    </p>
  </div>
  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 

</footer>

        </div>
      </div>

    </section>

  </div>
  

  <div class="rst-versions" data-toggle="rst-versions" role="note" aria-label="versions">
    <span class="rst-current-version" data-toggle="rst-current-version">
      <span class="fa fa-book"> Read the Docs</span>
      v: latest
      <span class="fa fa-caret-down"></span>
    </span>
    <div class="rst-other-versions">
      <dl>
        <dt>Versions</dt>
        
          <dd><a href="/en/latest/">latest</a></dd>
        
          <dd><a href="/en/stable/">stable</a></dd>
        
          <dd><a href="/en/5.1.1/">5.1.1</a></dd>
        
          <dd><a href="/en/5.1/">5.1</a></dd>
        
          <dd><a href="/en/5.0.0/">5.0.0</a></dd>
        
          <dd><a href="/en/4.3.0/">4.3.0</a></dd>
        
          <dd><a href="/en/4.2.0/">4.2.0</a></dd>
        
          <dd><a href="/en/5.x/">5.x</a></dd>
        
      </dl>
      <dl>
        <dt>Downloads</dt>
        
          <dd><a href="//readthedocs.org/projects/nbconvert/downloads/pdf/latest/">pdf</a></dd>
        
          <dd><a href="//readthedocs.org/projects/nbconvert/downloads/htmlzip/latest/">htmlzip</a></dd>
        
          <dd><a href="//readthedocs.org/projects/nbconvert/downloads/epub/latest/">epub</a></dd>
        
      </dl>
      <dl>
        <dt>On Read the Docs</dt>
          <dd>
            <a href="//readthedocs.org/projects/nbconvert/?fromdocs=nbconvert">Project Home</a>
          </dd>
          <dd>
            <a href="//readthedocs.org/builds/nbconvert/?fromdocs=nbconvert">Builds</a>
          </dd>
      </dl>
      <hr/>
      Free document hosting provided by <a href="http://www.readthedocs.org">Read the Docs</a>.

    </div>
  </div>



  

    <script type="text/javascript">
        var DOCUMENTATION_OPTIONS = {
            URL_ROOT:'./',
            VERSION:'5.2.0.dev',
            COLLAPSE_INDEX:false,
            FILE_SUFFIX:'.html',
            HAS_SOURCE:  true,
            SOURCELINK_SUFFIX: '.txt'
        };
    </script>
      <script type="text/javascript" src="https://media.readthedocs.org/javascript/jquery/jquery-2.0.3.min.js"></script>
      <script type="text/javascript" src="https://media.readthedocs.org/javascript/jquery/jquery-migrate-1.2.1.min.js"></script>
      <script type="text/javascript" src="https://media.readthedocs.org/javascript/underscore.js"></script>
      <script type="text/javascript" src="https://media.readthedocs.org/javascript/doctools.js"></script>
      <script type="text/javascript" src="https://media.readthedocs.org/javascript/readthedocs-doc-embed.js"></script>

  

  
  

  
  
  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.StickyNav.enable();
      });
  </script>
   

</body>
</html>
In [8]:
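# GET request: pass the query parameters as a dictionary
query = 'pianos'
p = {'q': query}
print(p)
# requests encodes the dictionary into the URL's query string
r = requests.get('https://youtube.com', params=p)
print(r.url)
# add a second parameter; parameters are joined with '&'
p['time'] = 'now'
print(p)
r = requests.get('https://google.com', params=p)
print(r.url)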
{'q': 'pianos'}
https://www.youtube.com/?q=pianos
{'q': 'pianos', 'time': 'now'}
https://www.google.com/?q=pianos&time=now

BeautifulSoup

BeautifulSoup is a library used to parse HTML and XML.

Important methods:

  • find
    • finds the first element matching a tag name, class, id, or other attribute
    • examples
      • soup.find('p')
      • soup.find(id='myid')
      • soup.find(class_='myclass')
  • find_all
    • finds all matching elements and returns them in a list
      • soup.find_all(class_='myclass')
In [21]:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<div class="title"><p> Pie </p></div><div class="title"><p> xyz </p></div>',
                    'html5lib')
print(soup.text)
print(soup.find('p'))
print(soup.find(class_='title'))
print(soup.find_all(class_='title'))
 Pie  xyz 
<p> Pie </p>
<div class="title"><p> Pie </p></div>
[<div class="title"><p> Pie </p></div>, <div class="title"><p> xyz </p></div>]
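
Two details tend to matter in real scrapers: what happens when nothing matches, and how to read text and attributes from a match. The sketch below reuses the markup from the cell above with an added, illustrative id attribute, and it assumes the built-in 'html.parser' backend instead of html5lib.

```python
from bs4 import BeautifulSoup

html = ('<div class="title" id="first"><p> Pie </p></div>'
        '<div class="title"><p> xyz </p></div>')
# 'html.parser' ships with Python; the cell above uses the html5lib parser
soup = BeautifulSoup(html, 'html.parser')

# find returns None when nothing matches, so check before using the result
missing = soup.find('h1')
print(missing)

# find_all returns a list-like result; loop over it to read each tag
titles = [div.get_text(strip=True) for div in soup.find_all('div', class_='title')]
print(titles)

# attributes are read like dictionary keys; .get avoids a KeyError
first = soup.find(id='first')
print(first['class'], first.get('href'))
```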

Fetch this URL and pull out its main heading:

In [22]:
url = 'https://nbconvert.readthedocs.io/en/latest/usage.html'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html5lib')
print(soup.find('h1').text)
Using as a command line tool¶

pandas.DataFrame

DataFrame from the pandas module transforms a list or dictionary into a table.

In turn, we can export the table to CSV.
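
The cell further down builds a table from a list of tuples. For the dictionary case, a minimal sketch (with made-up column names and values) could look like this:

```python
from pandas import DataFrame

# Each dictionary key becomes a column name, each list a column of values
data = {'x': [1, 2, 4], 'y': [2, 3, 5]}
df = DataFrame(data)
print(df)

# With no path argument, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)
print(csv_text)
```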

csv

CSV stands for comma-separated values. It is a common plain-text format for tabular data and can be opened in spreadsheet programs such as Excel.

The format is

row1col1,row1col2
row2col1,row2col2

which is equivalent to the table

row1col1 row1col2
row2col1 row2col2
In [28]:
from pandas import DataFrame
data = [(1, 2), (2, 3), (4, 5)]
df = DataFrame(data, columns=['x', 'y'])
print('table:')
print(df)

print('--------------')
print('csv:')
print(df.to_csv(index=False))
table:
   x  y
0  1  2
1  2  3
2  4  5
--------------
csv:
x,y
1,2
2,3
4,5
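
CSV text like the output above can also be written and read back without pandas. The sketch below uses the standard library's `csv` module with an in-memory buffer; the rows mirror the table above.

```python
import csv
import io

rows = [('x', 'y'), (1, 2), (2, 3), (4, 5)]

# Write the rows as CSV text into an in-memory buffer
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back; every value comes back as a string
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed)
```

Because CSV carries no type information, numbers need to be converted explicitly after reading, for example with `int`.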